Fastest way to find the number of lines in a text (C++)

臣服心动 2020-12-07 23:12

I need to read the number of lines in a file before doing some operations on that file. When I try to read the file and increment the line_count variable at each iteration u

8 answers
  •  再見小時候
    2020-12-07 23:45

    The only way to find the line count is to read the whole file and count the number of line-end characters. The fastest way to do this is probably to read the whole file into a large buffer with one read operation and then go through the buffer counting the '\n' characters.
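
    As a minimal sketch of that single-read idea (not the code benchmarked in this answer): the helper name count_newlines_whole_file and its -1 error return are assumptions made for illustration, and the sketch assumes the file fits in memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): counts '\n' bytes in a file
// by loading the whole file into one buffer with a single read.
// Returns -1 if the file cannot be opened or its size cannot be determined.
long long count_newlines_whole_file(const std::string& path) {
    // Open in binary mode, positioned at the end, so tellg() gives the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return -1;
    const std::streamsize size = in.tellg();
    if (size < 0) return -1;
    in.seekg(0, std::ios::beg);

    std::vector<char> buf(static_cast<std::size_t>(size));
    in.read(buf.data(), size);
    const std::streamsize got = in.gcount();  // bytes actually read

    // Count only the bytes that were really read.
    return std::count(buf.begin(), buf.begin() + got, '\n');
}
```

    Calling count_newlines_whole_file("lines.dat") counts line terminators only, so for a file whose last line has no trailing '\n' it returns one less than a getline() loop would.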

    As your current file size appears to be about 60 MB, this is not an attractive option. You can get some of the speed without reading the whole file at once by reading it in chunks, say of 1 MB each. You also say that a database is out of the question, but it really does look to be the best long-term solution.

    Edit: I just ran a small benchmark on this and using the buffered approach (buffer size 1024K) seems to be a bit more than twice as fast as reading a line at a time with getline(). Here's the code - my tests were done with g++ using -O2 optimisation level:

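    #include <iostream>
    #include <fstream>
    #include <string>
    #include <vector>
    #include <ctime>
    using namespace std;

    // Fill buff from the stream; returns the number of bytes actually read
    // (0 once the end of the file has been reached).
    unsigned int FileRead( istream & is, vector <char> & buff ) {
        is.read( &buff[0], buff.size() );
        return is.gcount();
    }

    // Count the '\n' characters in the first sz bytes of buff.
    unsigned int CountLines( const vector <char> & buff, int sz ) {
        int newlines = 0;
        const char * p = &buff[0];
        for ( int i = 0; i < sz; i++ ) {
            if ( p[i] == '\n' ) {
                newlines++;
            }
        }
        return newlines;
    }

    int main( int argc, char * argv[] ) {
        time_t now = time(0);
        if ( argc == 1  ) {
            // No arguments: read one line at a time with getline().
            cout << "lines\n";
            ifstream ifs( "lines.dat" );
            int n = 0;
            string s;
            while( getline( ifs, s ) ) {
                n++;
            }
            cout << n << endl;
        }
        else {
            // Any argument: read 1024K chunks and count '\n' in each.
            cout << "buffer\n";
            const int SZ = 1024 * 1024;
            std::vector <char> buff( SZ );
            ifstream ifs( "lines.dat" );
            int n = 0;
            while( int cc = FileRead( ifs, buff ) ) {
                n += CountLines( buff, cc );
            }
            cout << n << endl;
        }
        // Elapsed wall-clock time in whole seconds.
        cout << time(0) - now << endl;
    }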
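
    Because time(0) only has one-second resolution, short runs can be hard to compare. A possible refinement, sketched here as an assumption rather than what the benchmark above used, is to time each variant with std::chrono::steady_clock. The helper name elapsed_ms is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper (not from the answer): runs `work` once and returns
// the elapsed wall-clock time in milliseconds, measured with steady_clock.
template <typename F>
double elapsed_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

    Each branch of main() could be wrapped in a lambda and passed to elapsed_ms, which would report fractions of a millisecond instead of whole seconds.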
    
