std::ifstream buffer caching

时间秒杀一切 提交于 2019-12-01 16:06:57
davka

Without getting into the discussion on stream buffers, you can get rid of the seek_back and generally make the code much simpler by doing:

using namespace std;
merge(istream_iterator<int>(file1), istream_iterator<int>(),
           istream_iterator<int>(file2), istream_iterator<int>(),
           ostream_iterator<int>(cout));

Edit:

Added binary capability

#include <algorithm>
#include <iterator>
#include <fstream>
#include <iostream>

struct BinInt
{
    int value;
    operator int() const { return value; }
    friend std::istream& operator>>(std::istream& stream, BinInt& data)
    {
        return stream.read(reinterpret_cast<char*>(&data.value),sizeof(int));
    }
};

int main()
{
    std::ifstream   file1("f1.txt");
    std::ifstream   file2("f2.txt");

    std::merge(std::istream_iterator<BinInt>(file1), std::istream_iterator<BinInt>(),
               std::istream_iterator<BinInt>(file2), std::istream_iterator<BinInt>(),
               std::ostream_iterator<int>(std::cout));
}

In decreasing order of performance (best first):

  • memory-mapped I/O
  • OS-specific ReadFile or read calls.
  • fread into a large buffer
  • ifstream.read into a large buffer
  • ifstream and extractors
Mike Dunlavey

A program like this should be I/O bound, meaning it should be spending at least 80% of it's time waiting for completion of reading or writing a buffer, and if the buffers are reasonably big, it should be keeping the disk heads busy. That's what you want.

Don't assume it is I/O bound, without proof. A way to prove it is by taking several stackshots. If it is, most of the samples will show the program waiting for I/O completion.

It is possible that it is not I/O bound, meaning you may find other things going on in some of the samples that you never expected. If so, then you know what to fix to speed it up. I have seen some code like this spending much more time than necessary in the merge loop, testing for end-of-file, getting data to compare, etc. for example.

You can just use the read function of an ifstream to read large blocks.

http://www.cplusplus.com/reference/iostream/istream/read/

The second parameter is the number of bytes. You should make this a multiple of 4 in your case - maybe 4096? :)

Simply read a chunk at a time and work on it.

As martin-york said, this may not have any beneficial effect on your performance, but try it and find out.

I think it is very likely that you can improve performance by reading big chunks.

Try opening the file with ios::binary as an argument, then use istream::read to read the data.

If you need maximum performance, I would actually suggest skipping iostreams altogether, and using cstdio instead. But I guess this is not what you want.

Unless there is something very special about your data it is unlikely that you will improve on the buffering that is built into the std::fstream object.

The std::fstream objects are designed to be very effecient for general purpose file access. It does not sound like you are doing anything special by accessing the data 4 bytes at a time. You can always profile your code to see where the actual time is spent in your code.

Maybe if you share the code with ous we could spot some major inefficiencies.

Edit:

I don't like your algorithm. Seeking back and forward may be hard on the stream especially of the number lies over a buffer boundary. I would only read one number each time through the loop.

Try this:
Note: This is not optimal (and it assumes stream input of numbers (while yours looks binary)) But I am sure you can use it as a starting point.

#include <fstream>
#include <iostream>

// Return the current val (that was the smaller value)
// and replace it with the next value in the stream.
int getNext(int& val, std::istream& str)
{
    int result = val;
    str >> val;

    return result;
}

int main()
{
    std::ifstream   f1("f1.txt");
    std::ifstream   f2("f2.txt");
    std::ofstream   re("result");

    int v1;
    int v2;

    f1 >> v1;
    f2 >> v2;

    // While there are values in both stream
    // Output one value and replace it using getNext()
    while(f1 && f2)
    {
        re << (v1 < v2)? getNext(v1, f1) : getNext(v2, f2);
    }
    // At this point one (or both) stream(s) is(are) empty.
    // So dump the other stream.
    for(;f1;f1 >> v1)
    {
        // Note if the stream is at the end it will
        // never enter the loop
        re << v1;
    }
    for(;f2;f2 >> v2)
    {
        re << v2;
    }
}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!