epishman May 6, 2019 at 09:57

Swift vs Rust - benchmarking on Linux with an (un)clear finish

Hello, Habr!

I periodically look at Swift as an application programming language for Linux: simple, dynamic, compiled, without a garbage collector, which means that, in theory, it is also suitable for devices. I decided to compare it with something equally young and fashionable, namely Rust. As a test, I took an applied task: parsing and aggregating a large JSON file containing an array of objects. I tried to write both programs in a single style and compared them on 4 parameters: execution speed, binary size, source size, and subjective impressions of coding.

Task details. There is a 100 MB JSON file containing an array of a million objects. Each object is a debt record: the name of the company, a list of phone numbers and the amount of debt. Different companies can share the same phones, and records must be grouped by this criterion, i.e. we need to identify the real debtors, each with a list of names, a list of phones, and a total debt. The original objects are “dirty”, i.e. the data can be written as strings / numbers / arrays / objects.

The benchmarking results puzzled me. Details and source texts are under the cut.

Source JSON:

[
    {"company":"Рога и копыта", "debt": 800, "phones": [123, 234, 456]},
    {"company":"Первая коллекторская", "debt": 1200, "phones": ["2128506", 456, 789]},
    {"company":"Святой престол", "debt": "666", "phones": 666},
    {"company": "Казачий спас", "debt": 1500, "phones": [234567, "34567"], "phone": 666},
    {"company": {"name": "Шестерочка"}, "debt": 2550, "phones": 788, "phone": 789},
...

Implementation details

The task is divided into 4 stages:

1) Buffered character-by-character reading of the file, streaming parsing, and extraction of objects from the array. I did not bother searching for libraries like YAJL, because we know for sure that the file contains an array, and objects can be selected by counting opening and closing curly braces {}, since they are ASCII characters, not multi-byte Unicode ones. Surprisingly, I found no function in either language for character-by-character reading from a Unicode stream, which is a complete disgrace. Thank God the JSON parsers take this work upon themselves, otherwise I would have had to loop with bit masks and shifts.
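
To make stage 1 concrete, here is a minimal standalone sketch of the brace-counting idea in Rust. The function name split_objects is hypothetical and is not part of the programs listed at the end of the post. Like the real code, the sketch assumes that curly braces never occur inside string values (a company name containing { or } would break the count).

```rust
/// Splits a byte stream holding a JSON array of objects into the raw bytes
/// of each top-level object by counting curly braces.
/// Assumption: braces never occur inside string values.
fn split_objects(input: &[u8]) -> Vec<Vec<u8>> {
    let mut objects = Vec::new();
    let mut current = Vec::new();
    let mut depth = 0usize;
    for &b in input {
        match b {
            b'{' => {
                depth += 1;
                current.push(b);
            }
            b'}' if depth > 0 => {
                depth -= 1;
                current.push(b);
                if depth == 0 {
                    // A complete top-level object has been collected.
                    objects.push(std::mem::take(&mut current));
                }
            }
            // Inside an object: keep every byte, including nested content.
            _ if depth > 0 => current.push(b),
            // Outside any object: skip brackets, commas and whitespace.
            _ => {}
        }
    }
    objects
}

fn main() {
    let data = br#"[{"company": {"name": "A"}, "debt": 1}, {"company": "B", "debt": "2"}]"#;
    for obj in split_objects(data) {
        println!("{}", String::from_utf8_lossy(&obj));
    }
}
```

Each returned chunk can then be handed to a regular JSON parser, which is exactly what stage 2 does.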

2) The object strings selected in stage 1 are passed to a regular JSON parser, and the output is a dynamic structure (Any in Swift and serde_json's Value in Rust).

3) We dig through the dynamic data and build a typed structure from it:

//source data
class DebtRec {
    var company: String
    var phones: Array<String>
    var debt: Double
}

4) Aggregate the debt record: we look up the debtor by phone number (or create a new one) and update its attributes. For this we use 2 more structures:

//result data
class Debtor {
    var companies: Set<String>
    var phones: Set<String>
    var debt: Double
}
class Debtors {
    var all: Array<Debtor>
    var index_by_phone: Dictionary<String, Int>
}

We store the final debtors in a dynamic array (vector), and for quick lookups by phone we use an index hash table that stores, for each phone, a reference to the debtor. Oops... Remembering that Rust does not encourage storing real references (even immutable ones), instead of a reference we use the numeric index of the debtor in the all array; access by index is a cheap operation. Although, of course, if everyone switches to access by indexes and hashes, we end up with not an application but some kind of DBMS. Is that what Rust wants from us?
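
The index scheme can be sketched in isolation like this. It is a simplified illustration, not the benchmarked code: the add method and its signature are assumptions made for the example, and Default is derived instead of writing new() by hand. Like the full program, it attaches a record to the first debtor found by any of its phones and does not merge two already existing debtors.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
struct Debtor {
    companies: HashSet<String>,
    phones: HashSet<String>,
    debt: f64,
}

#[derive(Default)]
struct Debtors {
    all: Vec<Debtor>,
    // phone -> position of the debtor in `all`
    index_by_phone: HashMap<String, usize>,
}

impl Debtors {
    /// Merges one debt record into the debtor that already owns any of the
    /// phones, or creates a new debtor if none of the phones is known.
    /// (Hypothetical helper, named for this example only.)
    fn add(&mut self, company: &str, phones: &[&str], debt: f64) {
        let found = phones
            .iter()
            .find_map(|p| self.index_by_phone.get(*p).copied());
        let i = match found {
            Some(i) => i,
            None => {
                self.all.push(Debtor::default());
                self.all.len() - 1
            }
        };
        // `all` and `index_by_phone` are separate fields, so both can be
        // borrowed mutably at the same time.
        let d = &mut self.all[i];
        d.companies.insert(company.to_string());
        d.debt += debt;
        for p in phones {
            d.phones.insert(p.to_string());
            self.index_by_phone.insert(p.to_string(), i);
        }
    }
}

fn main() {
    let mut res = Debtors::default();
    res.add("A", &["123", "456"], 800.0);
    res.add("B", &["456", "789"], 1200.0);
    res.add("C", &["666"], 666.0);
    for (i, d) in res.all.iter().enumerate() {
        println!("#{}: debt {} companies {:?} phones {:?}", i, d.debt, d.companies, d.phones);
    }
}
```

Storing a usize in the map keeps the borrow checker out of the picture entirely: the map never holds a reference into the vector, so the vector is free to grow.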

PS
My Rust code is far from ideal: for example, it has a lot of to_string() calls, while it would be more correct to work with references and lifetimes (although I assume the clever compiler did this for me). As for Swift, the code is also very far from perfect, but that is the point of this benchmark: to show how an ordinary person is inclined to solve problems in a particular language, and what comes of it.

Test results

Projects were compiled with standard options:
swift build -c release
cargo build --release

The debug version of Rust showed a monstrous run time of 86 seconds, perhaps honestly executing all my to_string() calls (or maybe it was not translated to machine code at all? <joke>). For Swift, the difference between the debug and release versions was not significant. Below I compare only release versions.

Reading and processing speed for 1 million objects
Swift: 50 seconds
Rust: 4.31 seconds, that is, about 11.6 times faster

Binary size
Swift:
The binary itself is 62 KB, but the runtime libraries are 9 files totaling 54.6 MB (I counted only those without which the program does not start at all)
Rust:
The binary turned out to be not small, 1.9 MB, but it is a single file ("lto = true" shrinks it to 950 KB, but compilation takes much longer).

Source code size
Swift: 189 lines, 4.5 KB
Rust: 230 lines, 5.8 KB

Impressions of the language
As for coding, it is indisputable that Swift is smooth and pleasing to the eye, especially compared with the "rough" Rust, and the programs are more compact. I will not nitpick over trifles; I will note only the rakes I stepped on myself while learning. Sorry if I am being subjective.

1) The naming in the Swift standard library (as well as in Foundation) is not as intuitive and structured as in Rust, apparently because of the need to carry the legacy of its ancestors. Without documentation, it is sometimes difficult to guess which method or object to look for. Overloaded constructors certainly add pleasant magic, but this approach does not feel modern at all, and Rust's principle of naming factory methods is closer to me: from_str(), from_utf8(), etc.

2) The abundance of inherited types plus method overloading in Swift makes it easier for a novice programmer to shoot himself in the foot. For example, as an intermediate buffer for bytes read from the file, I first used Data(), which is exactly what the JSON parser takes as input. Data has the same methods as Array, i.e. it lets you append bytes, and in essence it is the same thing. However, performance with Data was several times (!) lower than in the current version with Array. In Rust, the performance difference between vectors and slices is almost imperceptible, and their access APIs are so different that you cannot mix them up.
PS
In the comments, Swift specialists managed to speed up the code several times, but that is the magic of professionals, whereas the Rust code could only be sped up by 14%. It turns out that the entry threshold for Rust is actually lower, not higher, as is commonly thought, and the evil compiler leaves no freedom to "do something wrong."

3) Swift's Optional type (as well as the cast operator) is syntactically more elegant, using the postfixes ? and !, unlike Rust's clumsy unwrap(). However, Rust's match lets you uniformly process the Option, Result and Value types, getting access to the error text if necessary. Swift uses an Optional return in some places and a thrown exception in others, and this is sometimes confusing.
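
A small sketch of what "uniform processing" means here: the same match form handles both Option and Result, and the Err arm gives access to the error text. The function names below are made up for the illustration and do not appear in the benchmark code.

```rust
use std::num::ParseFloatError;

/// Parses a debt amount; the error carries the reason the parse failed.
fn parse_debt(raw: &str) -> Result<f64, ParseFloatError> {
    raw.trim().parse::<f64>()
}

/// Returns the first phone, if there is one.
fn first_phone(phones: &[String]) -> Option<&String> {
    phones.first()
}

fn describe_debt(raw: &str) -> String {
    // Result: both the value and the error text are reachable from match.
    match parse_debt(raw) {
        Ok(v) => format!("debt = {}", v),
        Err(e) => format!("bad debt {:?}: {}", raw, e),
    }
}

fn describe_phone(phones: &[String]) -> String {
    // Option: the same construct, just with Some/None arms.
    match first_phone(phones) {
        Some(p) => format!("phone = {}", p),
        None => "no phone".to_string(),
    }
}

fn main() {
    println!("{}", describe_debt("666"));
    println!("{}", describe_debt("abc"));
    println!("{}", describe_phone(&["123".to_string()]));
    println!("{}", describe_phone(&[]));
}
```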

4) Declarations of nested functions in Swift are not hoisted, so they have to be declared earlier in the text, which is strange, because in other languages nested functions can be declared at the end.

5) Rust has some awkward syntax constructs; for example, to check whether a JSON value is null, you have to write one of these 2 funny bits of nonsense:

if let Null = myVal {
    ...
}
match myVal {
    Null => {
        ...
    }
    _ => {}
}

although more obvious options suggest themselves:

if myVal is Null {
    ...
}
if myVal == Option::Null {
    ...
}

Therefore, libraries have to provide a bunch of is_str(), is_null(), is_f64() methods for each enum type, which are, of course, terrible syntactic crutches.
PS
Apparently, this will be fixed soon; in the comments there is a link to the proposal.
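
For readers on a newer toolchain: the standard library has since gained the matches! macro (stable since Rust 1.42), which covers this case in one expression. Whether this is the proposal linked in the comments is not something this post confirms, so treat the sketch below as one available option. The Json enum is a hypothetical stand-in for a dynamic JSON value, so the example needs no external crates.

```rust
/// A tiny stand-in for a dynamic JSON value (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum Json {
    Null,
    Number(f64),
    Text(String),
}

impl Json {
    /// The kind of helper that libraries otherwise write by hand.
    fn is_null(&self) -> bool {
        matches!(self, Json::Null)
    }
}

fn count_nulls(values: &[Json]) -> usize {
    values.iter().filter(|v| matches!(v, Json::Null)).count()
}

fn main() {
    let values = vec![Json::Null, Json::Number(1.5), Json::Text("x".into()), Json::Null];
    // `matches!` works for any enum, whether or not it implements PartialEq.
    println!("nulls: {}", count_nulls(&values));
    // It also accepts bindings and guards.
    println!("big number: {}", matches!(values[1], Json::Number(n) if n > 1.0));
    println!("text x: {}", matches!(&values[2], Json::Text(s) if s == "x"));
    // `==` works too, but only because Json derives PartialEq.
    println!("first is null: {}", values[0] == Json::Null);
    println!("second is null: {}", values[1].is_null());
}
```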

Summary

So what exactly is so slow in Swift? Let's break it down by stage:

1) Reading the file, streaming parsing with object extraction
Swift: 7.46 seconds
Rust: 0.75 seconds

2) Parsing JSON into a dynamic object
Swift: 21.8 seconds
that is a million calls of JSONSerialization.jsonObject(with: Data(obj))
Rust: 1.77 seconds
that is a million calls of serde_json::from_slice(&obj)

3) Converting Any into a typed structure
Swift: 16.01 seconds
Rust: 0.88 seconds
- I assume this could be written more optimally, but my Rust code is just as “dumb” as the Swift code

4) Aggregation
Swift: 4.74 seconds
Rust: 0.91 seconds
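
As a sanity check on these numbers, the per-stage figures can be summed and turned into ratios. The snippet below only does arithmetic on the timings quoted above (the stage labels are shortened by hand); the totals come to 50.01 s for Swift and 4.31 s for Rust, which matches the overall measurement.

```rust
/// Per-stage timings in seconds, copied from the measurements above:
/// (stage label, Swift, Rust).
const STAGES: [(&str, f64, f64); 4] = [
    ("read + split", 7.46, 0.75),
    ("parse JSON", 21.8, 1.77),
    ("extract typed record", 16.01, 0.88),
    ("aggregate", 4.74, 0.91),
];

/// Returns (Swift total, Rust total) in seconds.
fn totals() -> (f64, f64) {
    STAGES
        .iter()
        .fold((0.0, 0.0), |(s, r), &(_, swift, rust)| (s + swift, r + rust))
}

fn main() {
    for &(name, swift, rust) in STAGES.iter() {
        println!(
            "{:<22} Swift {:>6.2}s  Rust {:>5.2}s  ratio {:>5.1}x",
            name, swift, rust, swift / rust
        );
    }
    let (swift, rust) = totals();
    println!(
        "{:<22} Swift {:>6.2}s  Rust {:>5.2}s  ratio {:>5.1}x",
        "total", swift, rust, swift / rust
    );
}
```

The ratio column makes it easy to see which stages lose the most: JSON parsing and the Any-to-struct conversion dominate, while aggregation shows the smallest gap.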

That is, we see that everything in Swift is slow, and it should be compared with systems like Node.js or Python, and I am not sure in whose favor that benchmark would turn out. Considering the enormous size of the runtime, you can forget about using it on devices altogether. So it turns out that reference counting is much slower than a garbage collector? Then what, do we all go learn Go and MicroPython?

Rust is impressive, although the task was too simple, and there was no need to plunge into the hell of borrowing and lifetimes. Of course, it would be nice to test how much Rust's Rc<> slows things down, and I would also like to run this test on Node, Go and Java, but I cannot spare the free time (although, by my estimate, JavaScript would be only about 2.5 times slower).
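
For anyone who wants to try the Rc<> experiment, here is roughly what the index could look like with shared, reference-counted handles instead of numeric positions. This is an untested-for-speed sketch: it has not been benchmarked, the add method and the by_phone field name are assumptions made for the example, and RefCell is added because the debtor has to be mutated through a shared handle.

```rust
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
use std::rc::Rc;

#[derive(Default)]
struct Debtor {
    companies: HashSet<String>,
    phones: HashSet<String>,
    debt: f64,
}

#[derive(Default)]
struct Debtors {
    all: Vec<Rc<RefCell<Debtor>>>,
    // phone -> shared handle to the debtor instead of a numeric index
    by_phone: HashMap<String, Rc<RefCell<Debtor>>>,
}

impl Debtors {
    /// Hypothetical helper: merge a record into a known debtor or create one.
    fn add(&mut self, company: &str, phones: &[&str], debt: f64) {
        let found = phones.iter().find_map(|p| self.by_phone.get(*p).cloned());
        let handle = match found {
            Some(h) => h,
            None => {
                let h = Rc::new(RefCell::new(Debtor::default()));
                self.all.push(Rc::clone(&h));
                h
            }
        };
        {
            // Mutable access is checked at run time by RefCell.
            let mut d = handle.borrow_mut();
            d.companies.insert(company.to_string());
            d.debt += debt;
            for p in phones {
                d.phones.insert(p.to_string());
            }
        }
        for p in phones {
            // Each clone only bumps the reference count.
            self.by_phone.insert(p.to_string(), Rc::clone(&handle));
        }
    }
}

fn main() {
    let mut res = Debtors::default();
    res.add("A", &["123", "456"], 800.0);
    res.add("B", &["456", "789"], 1200.0);
    for d in &res.all {
        let d = d.borrow();
        println!("debt {} companies {:?} phones {:?}", d.debt, d.companies, d.phones);
    }
}
```

This is closer to what the Swift version does implicitly with class instances, so timing it against the index-based version would isolate the cost of reference counting in Rust.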

PS
I would be grateful to Rustaceans and Swifters for comments on what is wrong with my code.

Source code

Swift:

main.swift

import Foundation
let FILE_BUFFER_SIZE = 50000 
//source data
class DebtRec {
    var company: String = ""
    var phones: Array<String> = []
    var debt: Double = 0.0
}
//result data
class Debtor {
    var companies: Set<String> = []
    var phones: Set<String> = []
    var debt: Double = 0.0
}
class Debtors {
    var all: Array<Debtor> = []
    var index_by_phone: Dictionary<String, Int> = [:]
}
func main() {
    var res = Debtors()
    var fflag = 0
    for arg in CommandLine.arguments {
        if arg == "-f" {
            fflag = 1
        }
        else if fflag == 1 {
            fflag = 2
            print("\(arg):")
            let tbegin = Date()
            let (count, errcount) = process_file(fname: arg, res: &res)
            print("PROCESSED: \(count) objects in \(DateInterval(start: tbegin, end: Date()).duration)s, \(errcount) errors found")
        }
    }
    for (di, d) in res.all.enumerated() {
        print("-------------------------------")
        print("#\(di): debt: \(d.debt)")
        print("companies: \(d.companies)\nphones: \(d.phones)")
    }
    if fflag < 2 {
        print("USAGE: fastpivot -f \"file 1\" -f \"file 2\" ...")
    }
}
func process_file(fname: String, res: inout Debtors) -> (Int, Int) {
    var count = 0
    var errcount = 0
    if let f = FileHandle(forReadingAtPath: fname) {
        var obj: Array<UInt8> = []
        var braces = 0
        while true {
            let buf = f.readData(ofLength: FILE_BUFFER_SIZE)
            if buf.isEmpty {
                break //EOF
            }
            for b in buf {
                if b == 123 { // {
                    braces += 1
                    obj.append(b)
                }
                else if b == 125 { // }
                    braces -= 1
                    obj.append(b)
                    if braces == 0 { //object formed !
                        do {
                            let o = try JSONSerialization.jsonObject(with: Data(obj))
                            process_object(o: (o as! Dictionary<String, Any>), res: &res)
                        } catch {
                            print("JSON ERROR: \(obj)")
                            errcount += 1
                        }
                        count += 1
                        obj = []
                    }
                }
                else if braces > 0 {
                    obj.append(b)
                }
            }
        }
    } else {
        print("ERROR: Unable to open file")
    }
    return (count, errcount)
}
func process_object(o: Dictionary<String, Any>, res: inout Debtors) {
    let dr = extract_data(o)
    //print("\(dr.company) - \(dr.phones) - \(dr.debt)")
    var di: Optional<Int> = Optional.none //debtor index search result
    for p in dr.phones {
        if let i = res.index_by_phone[p] {
            di = Optional.some(i)
            break
        }
    }
    if let i = di { //existing debtor
        let d = res.all[i]
        d.companies.insert(dr.company)
        for p in dr.phones {
            d.phones.insert(p)
            res.index_by_phone[p] = i
        }
        d.debt += dr.debt
    }
    else { //new debtor
        let d = Debtor()
        let i = res.all.count
        d.companies.insert(dr.company)
        for p in dr.phones {
            d.phones.insert(p)
            res.index_by_phone[p] = i
        }
        d.debt = dr.debt
        res.all.append(d)
    }
}
func extract_data(_ o: Dictionary<String, Any>) -> DebtRec {
    func val2str(_ v: Any) -> String {
        if let vs = (v as? String) {
            return vs
        }
        else if let vi = (v as? Int) {
            return String(vi)
        }
        else {
            return "null"
        }
    }
    let dr = DebtRec()
    let c = o["company"]!
    if let company = (c as? Dictionary<String, Any>) {
        dr.company = val2str(company["name"]!)
    } else {
        dr.company = val2str(c)
    }
    let pp = o["phones"]
    if let pp = (pp as? Array<Any>) {
        for p in pp {
            dr.phones.append(val2str(p))
        }
    } 
    else if pp != nil {
        dr.phones.append(val2str(pp!))
    }       
    let p = o["phone"]
    if p != nil {
        dr.phones.append(val2str(p!))
    }        
    if let d = o["debt"] {
        if let dd = (d as? Double) {
            dr.debt = dd
        }
        else if let ds = (d as? String) {
            dr.debt = Double(ds)!
        }
    }
    return dr
}
main()

Rust:

main.rs

//[dependencies]
//serde_json = "1.0"
use std::collections::{HashMap, HashSet};
use serde_json::Value;
const FILE_BUFFER_SIZE: usize = 50000;
//source data
struct DebtRec {
    company: String,
    phones: Vec<String>,
    debt: f64
}
//result data
struct Debtor {
    companies: HashSet<String>,
    phones: HashSet<String>,
    debt: f64
}
struct Debtors {
    all: Vec<Debtor>,
    index_by_phone: HashMap<String, usize>
}
impl DebtRec {
    fn new() -> DebtRec {
        DebtRec {
            company: String::new(),
            phones: Vec::new(),
            debt: 0.0
        }
    }
}
impl Debtor {
    fn new() -> Debtor {
        Debtor {
            companies: HashSet::new(),
            phones: HashSet::new(),
            debt: 0.0
        }
    }
}
impl Debtors {
    fn new() -> Debtors {
        Debtors {
            all: Vec::new(),
            index_by_phone: HashMap::new()
        }
    }
}
fn main() {
    let mut res = Debtors::new();
    let mut fflag = 0;
    for arg in std::env::args() {
        if arg == "-f" {
            fflag = 1;
        }
        else if fflag == 1 {
            fflag = 2;
            println!("{}:", &arg);
            let tbegin = std::time::SystemTime::now();
            let (count, errcount) = process_file(&arg, &mut res);
            println!("PROCESSED: {} objects in {:?}, {} errors found", count, tbegin.elapsed().unwrap(), errcount);
        }
    }
    for (di, d) in res.all.iter().enumerate() {
        println!("-------------------------------");
        println!("#{}: debt: {}", di, &d.debt);
        println!("companies: {:?}\nphones: {:?}", &d.companies, &d.phones);
    }
    if fflag < 2 {
        println!("USAGE: fastpivot -f \"file 1\" -f \"file 2\" ...");
    }
}
fn process_file(fname: &str, res: &mut Debtors) -> (i32, i32) { 
    use std::io::prelude::*;
    let mut count = 0;
    let mut errcount = 0;
    match std::fs::File::open(fname) {
        Ok(file) => {
            let mut freader = std::io::BufReader::with_capacity(FILE_BUFFER_SIZE, file);
            let mut obj = Vec::new();
            let mut braces = 0;
            loop {
                let buf = freader.fill_buf().unwrap();
                let blen = buf.len();
                if blen == 0 {
                    break; //EOF
                }
                for b in buf {
                    if *b == b'{' {
                        braces += 1;
                        obj.push(*b);
                    }
                    else if *b == b'}' {
                        braces -= 1;
                        obj.push(*b);
                        if braces == 0 { //object formed !
                            match serde_json::from_slice(&obj) {
                                Ok(o) => {
                                    process_object(&o, res);
                                }
                                Err(e) => {
                                    println!("JSON ERROR: {}:\n{:?}", e, &obj);
                                    errcount +=1;
                                }
                            }
                            count += 1;
                            obj = Vec::new();
                        }
                    }
                    else if braces > 0 {
                        obj.push(*b);
                    }
                }
                freader.consume(blen);
            }
        }
        Err(e) => {
            println!("ERROR: {}", e);
        }
    }
    return (count, errcount);
}
fn process_object(o: &Value, res: &mut Debtors) {
    let dr = extract_data(o);
    //println!("{} - {:?} - {}", &dr.company, &dr.phones, &dr.debt,);
    let mut di: Option<usize> = Option::None; //debtor index search result
    for p in &dr.phones {
        if let Some(i) = res.index_by_phone.get(p) {
            di = Some(*i);
            break;
        }
    }
    match di {
        Some(i) => { //existing debtor
            let d = &mut res.all[i];
            d.companies.insert(dr.company);
            for p in &dr.phones {
                d.phones.insert(p.to_string());
                res.index_by_phone.insert(p.to_string(), i);
            }
            d.debt += dr.debt;
        }
        None => { //new debtor
            let mut d = Debtor::new();
            let i = res.all.len();
            d.companies.insert(dr.company);
            for p in &dr.phones {
                d.phones.insert(p.to_string());
                res.index_by_phone.insert(p.to_string(), i);
            }
            d.debt = dr.debt;
            res.all.push(d);
        }
    }
}
fn extract_data(o: &Value) -> DebtRec {
    use std::str::FromStr;
    let mut dr = DebtRec::new();
    let c = &o["company"];
    dr.company =
        match c {
            Value::Object(c1) =>
                match &c1["name"] {
                    Value::String(c2) => c2.to_string(),
                    _ => val2str(c)
                },
            _ => val2str(c)
        };
    let pp =  &o["phones"];
    match pp {
        Value::Null => {}
        Value::Array(pp) => {
            for p in pp {
                dr.phones.push(val2str(&p));
            }
        }
        _ => {dr.phones.push(val2str(&pp))}
    }
    let p = &o["phone"];
    match p {
        Value::Null => {}
        _ => {dr.phones.push(val2str(&p))}
    }
    dr.debt =
        match &o["debt"] {
            Value::Number(d) => d.as_f64().unwrap_or(0.0),
            Value::String(d) => f64::from_str(&d).unwrap_or(0.0),
            _ => 0.0
        };
    return dr;
    fn val2str(v: &Value) -> String {
        match v {
            Value::String(vs) => vs.to_string(), //to avoid additional quotes
            _ => v.to_string()
        }
    }
}

Test file.
