How slow are iostreams?

    The I / O streams in the C ++ standard library are easy to use, type-safe, leak-proof, and allow easy error handling. However, the reputation of "slow" was fixed to them. There are several reasons for this, such as the widespread use of dynamic allocation and virtual functions. In general, streams are one of the oldest parts of the standard library (they began to be used around 1988), and many of their decisions are now perceived as “controversial”. However, they are widely used, especially when you need to write some simple program that works with text data.

    The performance issue of iostreams is not an idle one. In particular, the problem of console I / O performance can be encountered in sports programming systems, where even using a good algorithm, you can’t miss the time just because of I / O. I also encountered this problem when processing scientific data in text format.

    Today in the comments of the post there was a discussion about the iostreams slowness. In particular, freopen writes
    It's funny to look at your optimizations located next to reading through cin :)

    and aesamson makes such a recommendation
    You can replace it with getchar_unlocked () for * nix or getchar () for everyone else.
    getchar_unlocked> getchar> scanf> cin, where ">" means faster.


    In this post I will dispel and confirm some myths and give a couple of recommendations.


    All measurements in this post are for the Ubuntu 14.10 system with the GCC 4.9.1 compiler, compiled with keys
    g++ -Wall -Wextra -std=c++11 -O3
    

    The launch was carried out on a laptop with an Intel Core2 Duo P8600 processor (2.4 GHz).

    Formulation of the problem


    In sports programming, as well as in the UNIX-way, usually the input is fed into the input stream. So, the task:

    A lot of non-negative integers, one per line, enter the input stream (stdin). The program should output the maximum of the input numbers.

    We form the input data
    seq 10000000 > data
    

    We wrote 10 million consecutive integers to the data file, with a total volume of 76 megabytes.
    We will run the program like this
    time ./a.out < data
    

    So, let's get started.

    1. scanf


    Let's solve the problem using the good old scanf.
    int max_scanf()
    {
        int x;
        int max = -1;
        while (scanf("%d", &x) == 1)
        {
            max = std::max(x, max);
        }
        return max;
    }
    

    When using scanf, it is important not to forget to always check the return value - this is the number of arguments actually read and filled (GCC with -Wall will remind you of this). In our case, upon successful reading, the return value should be 1.
    Main function
    int main()
    {
        std::cout << max_scanf() << std::endl;
        return 0;
    }
    

    Operating time: 1.41 c

    2. The naive std::cin


    Now we will solve the problem in the simplest way using iostreams:
    int max_iostream(std::istream & f)
    {
        int x;
        int max = -1;
        while(f >> x)
            max = std::max(x, max);
        return max;
    }
    

    Hours: 4.41 c
    Wow! Threads were 3 times slower than scanf! That is, it turns out that iostream is really worthless in speed?

    3. Fast std::cin


    In fact, to correct the situation, it is enough to add one single line to the program. At the very beginning of the main function, insert:
    std::ios::sync_with_stdio(false);
    

    What does it mean?
    In order for the program to mix iostreams and stdio, this synchronization was introduced. By default, when working with standard streams ( std::cin, std::cout, std::cerr...), the buffer is flushed after each I / O operation so that the data is not mixed. If we intend to use only iostream, then we can turn off this synchronization. Read more at cppreference .
    Hours: 1.33 c
    A completely different matter! Moreover, it is faster than scanf! That is, not everything is so bad. The advantages of iostreams include ease of use, type safety and easier error handling.

    All subsequent options using std :: cin will use this optimization.

    4. The naive std::istringstream


    In addition to input from a file, the standard library also provides classes for input from a string with the same interface. Let's see how slow it is. We will read one line from the input stream, and then parse it with std::istringstream:
    int max_iostream_iss(std::istream & f)
    {
        int x;
        int max = -1;
        std::string line;
        while (std::getline(f, line))
        {
            std::istringstream iss(line);
            if(! (iss >> x))
                break;
            max = std::max(x, max);
        }
        return max;
    }
    

    Operating time: 7.21 c
    Very slowly!

    5. Reuse std::istringstream


    It may seem surprising, but the slowest thing about istringstreamit is its creation. And we re-create for each input line. Let's try to reuse the same object:
    int max_iostream_iss_2(std::istream & f)
    {
        int x;
        int max = -1;
        std::string line;
        std::istringstream iss(line);
        while (std::getline(f, line))
        {
            iss.clear();        // Сбрасываем флаги ошибок
            iss.str(line);      // Задаём новый буфер
            if(! (iss >> x))
                break;
            max = std::max(x, max);
        }
        return max;
    }
    

    Note that you need 2 calls - clear to clear the status flags, and str to set a new buffer from which to read.

    Hours: 2.16 c
    This is another matter. This is expectedly slower than reading directly from std::cin(data passes 2 times through stream classes), but not catastrophic.

    6. We want even faster! (getchar / getchar_unlocked)


    What if performance is still lacking? Use lower level I / O and custom parser. In the comments to the aforementioned topic, aesamson gave an example of code that implements the simplest parser of integers (probably taken from StackOverflow). For reading from the input stream, a thread getchar_unlocked-safe version is used getchar. I added a skip of extra characters and the simplest end of file processing:
    bool read_int_unlocked(int & out)
    {
        int c = getchar_unlocked();
        int x = 0;
        int neg = 0;
        for (; !('0'<=c && c<='9') && c != '-'; c = getchar_unlocked())
        {
            if (c == EOF)
                return false;
        }
        if (c == '-')
        {
            neg = 1;
            c = getchar_unlocked();
        }
        if (c == EOF)
            return false;
        for (; '0' <= c && c <= '9' ; c = getchar_unlocked())
        {
            x = x*10 + c - '0';
        }
        out = neg ? -x : x;
        return true;
    }
    int max_getchar_unlocked()
    {
        int x;
        int max = -1;
        while(read_int_unlocked(x))
            max = std::max(x, max);
        return max;
    }
    

    Operating time: getchar0.82 s, getchar_unlocked0.28 s!
    Very good result! And you can see how big the slowdown is due to locks for thread safety.
    But this approach has its drawbacks - it is necessary to write parsers for all the data types used (and this is not so simple even for floating point numbers), the complexity of error handling, thread safety in the case getchar_unlocked. Alternatively - you can try to use parser generator, for example re2c, boost::Spirit::Qietc. (a lot of them).

    7. C ++ 11: std::stoi


    Thanks to Lol4t0 for recalling in the comments about the functions that appeared in C ++ 11 std::stoi/std::stol/std::stoll. We will read one line at a time using getline, and then parse it using stol. The code will look like this:
    int max_stoi(std::istream & f)
    {
        int max = -1;
        std::string line;
        while (std::getline(f, line))
        {
            int x = std::stoi(line);
            max = std::max(x, max);
        }
        return max;
    }
    

    Hours: 1.04 c
    This is the fastest standard way to read integers. (And for floating-point numbers, there are similar functions stof / stod).

    8. Bonus: Reading in large blocks + Boost :: Spirit


    Let's try to write the fastest option. We will read the input data in large blocks and then parse it using Boost :: Spirit :: Qi , which claims to be a generator of very fast parsers. This is a compile-time generator: we describe a C ++ grammar in a notation similar to BNF, and during compilation using parser metaprogramming magic is generated.
    Code (attention, boost and metaprogramming detected!)
    #include 
    #include 
    #include 
    #include 
    template 
    Iterator max_parser(Iterator first, Iterator last, int& max)
    {
        namespace qi = boost::spirit::qi;
        namespace ascii = boost::spirit::ascii;
        namespace phoenix = boost::phoenix;
        using qi::int_;
        using qi::_1;
        using ascii::space;
        using phoenix::ref;
        using phoenix::if_;
        using qi::eoi;
        using qi::lexeme;
        bool r = qi::phrase_parse(first, last,
            //  Begin grammar
            (
                *lexeme[int_ >> (!eoi)][if_(_1 > ref(max))[ref(max) = _1]]
            )
            ,
            //  End grammar
            space);
        return first;
    }
    int max_spirit(std::istream & f)
    {
        size_t chunk_size = 1 << 20;
        std::unique_ptr buffer(new char[2*chunk_size]);
        char * middle = buffer.get() + chunk_size;
        char * begin = middle;
        int max = -1;
        while(true)
        {
            f.read(middle, chunk_size);
            if (f.gcount() == 0)
                break;
            char * end = middle + f.gcount();
            char * last = max_parser(begin, end, max);
            if (last < middle)
                break;
            // copy the remaining data just before the middle:
            begin = middle - (end - last);
            std::copy(last, end, begin);
        }
        return max;
    }
    


    Operating time: 0.18 c
    This is a record!

    Results and Tips


    Working hours:

    NoMethodGCC 4.9.1clang 3.5.0 + libc ++GCC 100M *
    1scanf1.411.48
    2std :: cin4.4113.30
    3std :: cin and std :: ios :: sync_with_stdio (false)1.3313.24
    4std :: istringstream7.219.16
    5std :: istringstream with reuse2.167.92
    6agetchar0.820.849.14
    6bgetchar_unlocked0.280.262.94
    7std :: getline + std :: stoi1.043.5310.8
    8Big Block + Boost :: Spirit0.181.671.90

    * - Measurements on a file with 100 million numbers (file size 848 megabytes).

    Recommendations:

    • In C ++ 11, the fastest standard way to read numbers from a stream is std::getline+ std::stol(in combination with sync_with_stdio(false)if a standard stream is used std::cin). This method is noticeably faster than scanf and second only to getchar methods.
    • In order to speed up std::cin/std::cout, you can use. std::ios::sync_with_stdio(false);In this case, the speed will become comparable or better than scanf. (Just make sure you don't mix streaming and stdio operations on the same thread)
    • Istringstream has a very slow constructor. Therefore, performance can be seriously increased if you reuse a stream object.
    • More performance can be achieved using getchar_unlocked(or just getcharif thread safety is needed) and a custom parser.
    • You can achieve even greater performance if you read data in large chunks and then work exclusively in memory.

    Attention! The results are valid only on a particular system and can vary greatly on other systems! In particular, I quickly tried clang + libc ++ and got much worse thread performance (whereas using libstdc ++ and clang and gcc gave almost identical results). Be sure to test performance when applying tips! (And, ideally, report the results on your system in the comments so others can take advantage). Full code is available here .

    Update 1. On the advice of Lol4t0 , method number 7 was added.
    Update 2.Added runtimes to the table in clang + libc ++ (version 3.5.0, executed on the same system). It can be seen that the performance of the threads is very poor, and besides, the trick to turn off synchronization does not work. As a result, stoi is 2 times slower than scanf.
    Update 3. Added option number 8: reading in large blocks and parsing using Boost :: Spirit. And this is the champion!

    Also popular now: