Question:
#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

using namespace std;

vector<char> f1()
{
    ifstream fin{ "input.txt", ios::binary };
    return {
        istreambuf_iterator<char>(fin),
        istreambuf_iterator<char>()
    };
}

vector<char> f2()
{
    vector<char> coll;
    ifstream fin{ "input.txt", ios::binary };
    char buf[1024];
    while (fin.read(buf, sizeof(buf)))
    {
        copy(begin(buf), end(buf), back_inserter(coll));
    }
    copy(begin(buf), begin(buf) + fin.gcount(), back_inserter(coll));
    return coll;
}

int main()
{
    f1();
    f2();
}
Obviously, f1() is more concise than f2(), so I prefer f1() to f2(). However, I worry that f1() is less efficient than f2().
So, my question is: will the mainstream C++ compilers optimize f1() to make it as fast as f2()?
Update:
I tested with a 130 MB file in release mode (Visual Studio 2015 with Clang 3.8): f1() takes 1614 ms, while f2() takes 616 ms. f2() is faster than f1(). What a sad result!
Answer 1:
I've checked your code on my side using MinGW 4.8.2. Out of curiosity, I added an additional function f3 with the following implementation:
inline vector<char> f3()
{
    ifstream fin{ filepath, ios::binary };
    fin.seekg(0, fin.end);
    size_t len = fin.tellg();
    fin.seekg(0, fin.beg);
    vector<char> coll(len);
    fin.read(coll.data(), len);
    return coll;
}
I tested using a file ~90 MB long. For my platform the results were a bit different from yours:
- f1() ~850 ms
- f2() ~600 ms
- f3() ~70 ms
The results were calculated as the mean of 10 consecutive file reads.
The f3 function takes the least time because, at vector<char> coll(len);, it has all the required memory allocated up front, so no further reallocations need to be done. back_inserter, on the other hand, requires the type to have a push_back member function, which for vector performs a reallocation whenever the capacity is exceeded. As described in the docs for push_back:

This effectively increases the container size by one, which causes an automatic reallocation of the allocated storage space if -and only if- the new vector size surpasses the current vector capacity.

Of the f1 and f2 implementations, the latter is slightly faster, although both use back_inserter. f2 is probably faster because it reads the file in chunks, which allows some buffering to take place.
Answer 2:
If the file is smaller than a few GB, you can read it all at once:
#include <sys/stat.h>
....
char* buf;
FILE* fin;
const char* filename = "myfile.cgt";

#ifdef WIN32
struct _stat st;                          /* _stat is the MSVC variant */
if (_stat(filename, &st) == -1) return 0;
#else
struct stat st;
if (stat(filename, &st) == -1) return 0;
#endif

fin = fopen(filename, "rb");
if (!fin) return 0;
buf = (char*)malloc(st.st_size);
if (!buf) { fclose(fin); return 0; }
fread(buf, st.st_size, 1, fin);
fclose(fin);
Needless to say, in C++ you should use new (or better, a std::vector) rather than malloc().
Source: https://stackoverflow.com/questions/41139764/how-to-read-a-file-into-a-vector-elegantly-and-efficiently