How to read a 4GB file on a 32-bit system

予麋鹿 2020-12-19 14:53

In my case I have different files; let's assume that I have a >4GB file with data. I want to read that file line by line and process each line. One of my restrictions is that I have a 32-bit system.

4 Answers
  • 2020-12-19 15:24

    Nice to see you found my benchmark at How to parse space-separated floats in C++ quickly?

    It seems you're really looking for the fastest way to count lines (or to do any linear single-pass analysis). I've done a similar analysis and benchmark of exactly that here:

    • Fast textfile reading in c++

    Interestingly, you'll see that the most performant code there does not need to rely on memory mapping at all.

    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdint>
    #include <cstring>
    #include <cstdio>
    #include <cstdlib>

    #define handle_error(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)

    static uintmax_t wc(char const *fname)
    {
        static const auto BUFFER_SIZE = 16*1024;
        int fd = open(fname, O_RDONLY);
        if(fd == -1)
            handle_error("open");

        /* Advise the kernel of our access pattern.  */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        char buf[BUFFER_SIZE + 1];
        uintmax_t lines = 0;

        /* read() returns 0 at EOF, which ends the loop; -1 shows up as (size_t)-1. */
        while(size_t bytes_read = read(fd, buf, BUFFER_SIZE))
        {
            if(bytes_read == (size_t)-1)
                handle_error("read failed");

            /* Count the '\n' bytes in the chunk just read. */
            for(char *p = buf; (p = (char*) memchr(p, '\n', (buf + bytes_read) - p)); ++p)
                ++lines;
        }

        close(fd);
        return lines;
    }
    
  • 2020-12-19 15:35

    Since this is Windows, you can use the native Windows file functions with the "Ex" suffix:

    Windows file management functions

    specifically functions like GetFileSizeEx(), SetFilePointerEx(), ... . The plain read and write functions are limited to 32-bit byte counts, while the "Ex" read and write functions (ReadFileEx(), WriteFileEx()) are for asynchronous I/O rather than for handling large files. A sketch follows below.
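    Here is a minimal sketch (my own, not from the answer) of how these calls fit together on a 32-bit build: the file size and seek position travel in 64-bit LARGE_INTEGER values, while each individual ReadFile() call stays under the 32-bit count limit. "big.dat" is a placeholder path.

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Placeholder file name, for illustration only.
        HANDLE h = CreateFileA("big.dat", GENERIC_READ, FILE_SHARE_READ,
                               nullptr, OPEN_EXISTING,
                               FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
        if (h == INVALID_HANDLE_VALUE)
            return 1;

        LARGE_INTEGER size;
        GetFileSizeEx(h, &size);          // 64-bit file size, even on a 32-bit build

        LARGE_INTEGER pos; pos.QuadPart = 0;
        SetFilePointerEx(h, pos, nullptr, FILE_BEGIN);  // 64-bit seek works past 4GB

        char buf[1 << 16];                // 64 KiB chunks, well under the 32-bit limit
        DWORD got = 0;
        unsigned long long total = 0;
        while (ReadFile(h, buf, sizeof buf, &got, nullptr) && got > 0)
            total += got;                 // process buf[0..got) line by line here

        std::printf("read %llu of %lld bytes\n", total, size.QuadPart);
        CloseHandle(h);
        return 0;
    }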

  • 2020-12-19 15:41

    On a 64-bit system, mapping a large file should be fine even with little RAM - it's all about address space, not physical memory - although it may well be slower than the "fastest" option; that depends on what else is in memory and how much of the address space is available for mapping the file. On a 32-bit system it won't work, since pointers into the file mapping can't go beyond about 3.5GB at the very most - and typically around 2GB is the limit - again, depending on what memory addresses the OS has available to map the file into.

    However, the benefit of memory-mapping a file is pretty small - the vast majority of the time is spent actually reading the data. The saving from memory mapping comes from not having to copy the data once it's in RAM: with other file-reading mechanisms, the read function copies the data into the buffer you supply, whereas a memory-mapped file is placed at the correct location directly.
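    As a minimal sketch (my own, under stated assumptions) of how to work around the 32-bit address-space limit, you can map the file in fixed-size windows rather than all at once. On 32-bit Linux this needs 64-bit offsets (compile with -D_FILE_OFFSET_BITS=64), and mmap() offsets must be page-aligned, which the 64 MiB windows here are:

    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstring>
    #include <cstdio>
    #include <cstdint>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        int fd = open(argv[1], O_RDONLY);
        if (fd == -1) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }

        const off_t CHUNK = 64 << 20;   // 64 MiB window, a multiple of the page size
        uintmax_t lines = 0;

        for (off_t off = 0; off < st.st_size; off += CHUNK)
        {
            size_t len = (size_t)((st.st_size - off < CHUNK) ? (st.st_size - off) : CHUNK);
            char *p = (char *) mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, off);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }

            // Counting '\n' bytes is safe even when a line straddles two windows.
            for (char *q = p; (q = (char *) memchr(q, '\n', (p + len) - q)); ++q)
                ++lines;

            munmap(p, len);
        }

        std::printf("%ju lines\n", lines);
        close(fd);
        return 0;
    }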

  • 2020-12-19 15:46

    You might want to look at increasing the buffer for the ifstream - the default buffer is often rather small, which leads to lots of expensive reads.

    You should be able to do this using something like:

    std::ifstream file;
    char buffer[1024*1024];
    // pubsetbuf must be called before opening the file to take effect
    // (with some implementations, e.g. libstdc++, a call after open() is ignored).
    file.rdbuf()->pubsetbuf(buffer, sizeof buffer);
    file.open(filename_xml.c_str());

    uintmax_t m_numLines = 0;
    std::string str;
    while (std::getline(file, str))
    {
        m_numLines++;
    }
    

    See this question for more info:

    How to get IOStream to perform better?
