Speed up integer reading from file in C++

喜你入骨 submitted on 2019-12-03 21:48:50

This program

#include <iostream>
int main ()
{
    int num;
    while (std::cin >> num) ;
}

needs about 17 seconds to read a file. This code

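#include <iostream>
int main()
{
    int lc = 0;     // line count
    int item = 0;   // number currently being assembled from digits
    char buf[2048];
    do
    {
        std::cin.read(buf, sizeof(buf));   // raw bytes, no formatted extraction
        int k = std::cin.gcount();         // bytes actually read (short on the last chunk)
        for (int i = 0; i < k; ++i)
        {
            switch (buf[i])
            {
                case '\r':
                    break;
                case '\n':
                    item = 0; lc++;         // a number just ended (this benchmark discards it)
                    break;
                case ' ':
                    item = 0;               // a number just ended (this benchmark discards it)
                    break;
                case '0': case '1': case '2': case '3':
                case '4': case '5': case '6': case '7':
                case '8': case '9':
                    // item lives outside the loops, so a number split across
                    // two reads is still assembled correctly
                    item = 10*item + buf[i] - '0';
                    break;
                default:
                    std::cerr << "Bad format\n";
            }
        }
    } while (std::cin);
}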

needs 1.25 seconds for the same file. Make of that what you will...

Streams are slow. If you really want to do this fast, load the entire file into memory and parse it there. If you really can't load it all into memory, load it in chunks, make those chunks as large as possible, and parse each chunk in memory.
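A minimal sketch of the load-everything-then-parse approach, assuming the whole file fits in memory. `load_file` and `parse_ints` are illustrative names, not library functions, and the parsing step uses `std::strtol` (which skips leading whitespace on its own) as one possible choice, not something the answer prescribes.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>
#include <vector>

// Read the whole file into one string with a single read() call.
// Returns an empty string if the file cannot be opened or sized.
std::string load_file(const char* path) {
    std::ifstream f(path, std::ios::binary);
    std::string data;
    if (!f) return data;
    f.seekg(0, std::ios::end);
    std::streamoff size = f.tellg();  // -1 if seeking failed
    if (size <= 0) return data;
    data.resize(static_cast<std::size_t>(size));
    f.seekg(0, std::ios::beg);
    f.read(&data[0], static_cast<std::streamsize>(size));
    data.resize(static_cast<std::size_t>(f.gcount()));  // in case fewer bytes arrived
    return data;
}

// Parse whitespace-separated integers straight out of the in-memory buffer.
std::vector<int> parse_ints(const std::string& data) {
    std::vector<int> out;
    const char* p = data.c_str();  // c_str() guarantees a terminating '\0'
    for (;;) {
        char* end = nullptr;
        long v = std::strtol(p, &end, 10);  // skips leading spaces and newlines itself
        if (end == p) break;                // nothing converted: end of data or a bad token
        out.push_back(static_cast<int>(v));
        p = end;
    }
    return out;
}
```

Usage is simply `parse_ints(load_file("numbers.txt"))`, where the file name is a placeholder.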

When parsing in memory, replace the spaces and line endings with null characters ('\0') so that each number becomes its own C string and you can convert it with atoi as you go.
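A sketch of the null-terminator trick, assuming the input is already in a mutable `std::string`. `parse_with_atoi` is a hypothetical name. Keep in mind that `std::atoi` does no error checking: a malformed token silently turns into 0 or a partial value, so `std::strtol` is the safer choice for untrusted input.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Overwrite every separator with '\0' so each number is its own C string,
// then hand each one to std::atoi. Modifies buf in place.
std::vector<int> parse_with_atoi(std::string& buf) {
    for (char& c : buf)
        if (c == ' ' || c == '\n' || c == '\r' || c == '\t')
            c = '\0';
    std::vector<int> out;
    const char* p = buf.data();  // since C++11, data()[size()] is '\0'
    const char* const end = p + buf.size();
    while (p < end) {
        if (*p == '\0') { ++p; continue; }  // skip runs of separators
        out.push_back(std::atoi(p));         // converts up to the next '\0'
        while (p < end && *p != '\0') ++p;   // jump past this token
    }
    return out;
}
```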

Oh, and you'll run into problems at the end of each chunk, because you don't know whether the chunk boundary cuts a number in half. An easy fix is to stop parsing a small distance (16 bytes should do) before the end of the chunk, copy that tail to the start of the buffer, and load the next chunk right after it.
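A minimal sketch of that chunk-and-carry scheme using `std::fread`. Assumptions: `parse_chunked` is a hypothetical name; the 16-byte tail is enough only because a decimal 32-bit `int` plus one separator is at most 12 characters; and numbers are converted with `std::strtol` instead of `atoi`, because `strtol` reports where each number ended, which this loop needs.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

std::vector<int> parse_chunked(std::FILE* fp, std::size_t chunk_size = 1 << 16) {
    const std::size_t kTail = 16;  // assumed longer than any number plus separator
    assert(chunk_size > kTail);
    std::vector<char> buf(chunk_size + 1);  // +1 so a '\0' always ends the data
    std::vector<int> out;
    std::size_t carry = 0;  // unparsed bytes copied over from the previous chunk
    for (;;) {
        std::size_t got = std::fread(buf.data() + carry, 1, chunk_size - carry, fp);
        std::size_t filled = carry + got;
        bool last = filled < chunk_size;  // short read: end of file (or read error)
        buf[filled] = '\0';
        // Except on the last chunk, stop kTail bytes early: a number that
        // starts in that tail may have been cut off by the chunk boundary.
        const char* limit = buf.data() + (last ? filled : filled - kTail);
        const char* p = buf.data();
        for (;;) {
            while (p < limit && std::isspace(static_cast<unsigned char>(*p))) ++p;
            if (p >= limit) break;
            char* end = nullptr;
            long v = std::strtol(p, &end, 10);
            if (end == p) return out;  // not a number: this sketch just stops
            out.push_back(static_cast<int>(v));
            p = end;
        }
        if (last) return out;
        // Copy the unparsed tail (at most kTail bytes) to the front.
        carry = filled - static_cast<std::size_t>(p - buf.data());
        std::memmove(buf.data(), p, carry);
    }
}
```

The real chunk size would be much larger; a tiny one is only useful for exercising the boundary handling.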

Have you tried input iterators?

They avoid creating the intermediate strings:

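#include <fstream>
#include <iterator>

// infile: an already-opened std::ifstream
std::istream_iterator<int> begin(infile);  // reads the first int immediately
std::istream_iterator<int> end;            // default-constructed: end-of-stream marker
int item = 0;
while (begin != end)
    item = *begin++;                       // current int; ++ extracts the next one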
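The same iterators can also fill a `std::vector<int>` in a single expression through its iterator-range constructor. A minimal sketch, where `read_all` is a hypothetical helper and the path is supplied by the caller; extraction stops at end of file or at the first token that is not an int.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <vector>

// Collect every int in the file; a file that cannot be opened yields an empty vector.
std::vector<int> read_all(const char* path) {
    std::ifstream infile(path);
    return std::vector<int>(std::istream_iterator<int>(infile),
                            std::istream_iterator<int>());  // end-of-stream iterator
}
```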

Why not skip the intermediate string streams and line buffers and read from the file stream directly?

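#include <istream>
#include <vector>

template<class T, class CharT, class CharTraits>
std::vector<T> read(std::basic_istream<CharT, CharTraits> &in) {
    std::vector<T> ret;
    T x;
    // Test the extraction itself: the loop ends on EOF or bad input, and a
    // last value with no trailing newline is kept (checking in.good() after
    // >> would drop it, because that read also sets eofbit).
    while (in >> x)
        ret.push_back(x);
    return ret;
}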

http://ideone.com/FNJKFa

Following up on Jack Aidley's answer (I can't put code in the comments), here's some pseudo-code:

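vector<char> buff( chunk_size );
size_t carry = 0; // bytes of an unfinished line left over from the previous chunk
char* chunk = &buff[0];
while( not done with file )
{
    size_t filled = carry + fread( chunk + carry, 1, chunk_size - carry, fp ); // read a sizable chunk, filling in after the carried-over bytes
    size_t line_end = find_last_eol( chunk, filled ); // one past the last '\n': where the last full line ends
    parse_in_mem( chunk, line_end );                   // process only the complete lines
    carry = filled - line_end;
    move_unprocessed_to_front( chunk, line_end, carry ); // keep the partial line instead of re-reading it
}
// finally, parse the last `carry` bytes (a final line without a trailing '\n')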
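A runnable sketch of the pseudo-code above, assuming the input comes from a `std::FILE*` and holds non-negative integers separated by spaces and line breaks (the format the hand-written parser earlier in the thread handles). `read_ints` is a hypothetical name; `find_last_eol` and `move_unprocessed_to_front` are written inline, and `parse_in_mem` is a small digit loop. It also parses a final line with no trailing newline and grows the buffer when a single line is longer than it.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>

// parse_in_mem: append every non-negative integer in [p, end) to out.
// The range ends on a line boundary, so no number is split.
// No overflow or format checks: any non-digit byte acts as a separator.
void parse_in_mem(const char* p, const char* end, std::vector<int>& out) {
    while (p < end) {
        if (*p >= '0' && *p <= '9') {
            int value = 0;
            while (p < end && *p >= '0' && *p <= '9')
                value = 10 * value + (*p++ - '0');
            out.push_back(value);
        } else {
            ++p;
        }
    }
}

std::vector<int> read_ints(std::FILE* fp, std::size_t chunk_size = 1 << 16) {
    assert(chunk_size > 0);
    std::vector<char> buff(chunk_size);
    std::vector<int> out;
    std::size_t carry = 0;  // bytes of an unfinished line kept at the front of buff
    for (;;) {
        // Read a sizable chunk, filling in after the carried-over bytes.
        std::size_t got = std::fread(buff.data() + carry, 1, buff.size() - carry, fp);
        std::size_t filled = carry + got;
        if (got == 0) {  // end of file (or read error): what is left is the last line
            parse_in_mem(buff.data(), buff.data() + filled, out);
            return out;
        }
        // find_last_eol: one past the last '\n', or 0 if the buffer has none.
        std::size_t line_end = filled;
        while (line_end > 0 && buff[line_end - 1] != '\n')
            --line_end;
        if (line_end == 0) {  // no complete line yet
            carry = filled;
            if (carry == buff.size())
                buff.resize(buff.size() * 2);  // one line is longer than the buffer: grow it
            continue;
        }
        parse_in_mem(buff.data(), buff.data() + line_end, out);  // complete lines only
        // move_unprocessed_to_front: keep the partial last line for the next round.
        carry = filled - line_end;
        std::memmove(buff.data(), buff.data() + line_end, carry);
    }
}
```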