Fastest way to read a text file of strings line by line [duplicate]

最后都变了 - Submitted on 2020-02-04 01:57:13

Question


Possible Duplicate:
What is the Fastest Method for High Performance Sequential File I/O in C++?

I have looked around a little bit and I am still not sure of the answer to this question.

When reading from a text file with an arbitrary word on every line, what would be the absolute fastest way of reading the words from that file? The scope of the project requires the fastest possible file read.

I am using Visual Studio on Windows 7. Cross-platform support is not a consideration.

Edit: Keep in mind that this file read is a one-time operation: the file will not be read again and it will not be written to. The program starts, reads the file, pushes its contents into a data structure, and the loadFile() function is never called again.


Answer 1:


As I understand your question, your objective is to read a file of words and insert each word into some data structure, and you want this read+insertion to be as fast as possible. (I won't debate the rationale for or the wisdom of this; I'll just accept it as a requirement. :-) ) If my understanding is correct, then perhaps an alternative approach would be to write a utility program that reads the file of words, inserts them into the data structure, and then serializes that data structure to a file (say BLOB.dat, for example). Your main program then deserializes BLOB.dat into the data structure that you require. Essentially, you pre-process the words file into an intermediate binary format that can be loaded into your data structure most efficiently. Or would this be cheating in your scenario?
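A minimal sketch of that pre-processing idea follows. Everything specific in it is an assumption made for illustration: the function names `save_blob` and `load_blob`, the length-prefixed layout, and the use of `std::vector<std::string>` as a stand-in for whatever data structure you actually need. The blob is only portable between builds with the same endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical blob layout: [count][len][bytes][len][bytes]...
// with count and len stored as 32-bit unsigned integers.

// Offline utility step: serialize the already-parsed words.
bool save_blob(const std::vector<std::string>& words, const std::string& path) {
    assert(words.size() <= 0xFFFFFFFFu);  // count must fit in 32 bits
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    const std::uint32_t count = static_cast<std::uint32_t>(words.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    for (const std::string& w : words) {
        const std::uint32_t len = static_cast<std::uint32_t>(w.size());
        out.write(reinterpret_cast<const char*>(&len), sizeof len);
        out.write(w.data(), len);
    }
    return static_cast<bool>(out);
}

// Main program step: load the blob back without scanning for newlines.
bool load_blob(const std::string& path, std::vector<std::string>& words) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::uint32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count)) return false;
    words.clear();
    words.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) {
        std::uint32_t len = 0;
        if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
        std::string w(len, '\0');
        if (len > 0 && !in.read(&w[0], len)) return false;
        words.push_back(std::move(w));
    }
    return true;
}
```

Whether this beats parsing the text file depends on the target data structure; the gain is likely largest when the serialized form mirrors the in-memory layout closely, so measure both.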




Answer 2:


The fact that you have this tagged "multithreading" makes me think that you're considering a threaded read on the file. I'd strongly recommend you reconsider, as this will cause very hairy concurrency issues to rear their ugly heads. You'll have to delve deep into the rabbit hole of mutexes, semaphores and inter-process communication, which can make even the best developers weep for the good old days before threads.

You have a .txt file, and you have words in that file to read. You have to open the file, and you have to read every word. There's just no getting around it. Unless you're willing to process the text file into a data structure made for concurrent access (Intel TBB has some good ones), your best bet might be to just do a single-threaded read and pass data to other threads after everything is local.




Answer 3:


Either memory-map the file or read it in large fixed-sized chunks and process the data in memory.
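The chunked variant could look roughly like the sketch below. The function name `read_words_chunked` and the 1 MiB default chunk size are assumptions for illustration, not tuned values. The file is opened in binary mode, so a trailing `\r` from Windows line endings is stripped by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads the file in fixed-size chunks and splits lines in memory.
// read_words_chunked and the default chunk size are hypothetical choices.
std::vector<std::string> read_words_chunked(const std::string& path,
                                            std::size_t chunk_size = 1 << 20) {
    assert(chunk_size > 0);
    std::vector<std::string> words;
    std::ifstream in(path, std::ios::binary);
    if (!in) return words;

    std::vector<char> buffer(chunk_size);
    std::string partial;  // part of a line that straddles two chunks

    while (true) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;  // end of file (or read error)

        std::size_t start = 0;
        for (std::size_t i = 0; i < got; ++i) {
            if (buffer[i] != '\n') continue;
            partial.append(buffer.data() + start, i - start);
            if (!partial.empty() && partial.back() == '\r') partial.pop_back();
            words.push_back(partial);
            partial.clear();
            start = i + 1;
        }
        // Keep the unfinished tail for the next chunk.
        partial.append(buffer.data() + start, got - start);
    }

    // Last line without a terminating newline.
    if (!partial.empty()) {
        if (partial.back() == '\r') partial.pop_back();
        words.push_back(partial);
    }
    return words;
}
```

A call such as `read_words_chunked("words.txt")` (file name hypothetical) returns one string per line; the chunk size is the knob to experiment with under a profiler.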




Answer 4:


Do not memory-map the file. As Raymond Chen explains, that kills the sequential access optimization. Since disks are slow, prefetching keeps the disk busy and therefore keeps throughput higher.




Answer 5:


Your file will probably load about as fast as it is able to. After all, most file operations end up making the same system calls. IOstreams are said to be slower than cstdio, but I suggest you use a profiling tool to find the best set of options. Tweak the buffer size to match your needs. Unfortunately, with large files you will spend most of the time waiting for I/O, and only a minuscule amount of time is used for processing, so tweaking how you load won't buy you much.
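As one way to experiment with the buffer size, here is a small cstdio sketch using `std::setvbuf`. The function name `read_lines_stdio` and the fixed 4096-byte line buffer are assumptions for illustration; a longer line would be split across several entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Reads lines with cstdio using a caller-chosen stream buffer size.
// read_lines_stdio is a hypothetical helper, not a library function.
std::vector<std::string> read_lines_stdio(const char* path,
                                          std::size_t buffer_size) {
    assert(buffer_size > 0);
    std::vector<std::string> lines;
    std::FILE* f = std::fopen(path, "r");
    if (!f) return lines;

    // setvbuf must be called before any other operation on the stream,
    // and the buffer must stay alive until the stream is closed.
    std::vector<char> io_buffer(buffer_size);
    std::setvbuf(f, io_buffer.data(), _IOFBF, io_buffer.size());

    char line[4096];  // assumption: no line is longer than this
    while (std::fgets(line, sizeof line, f)) {
        std::size_t len = std::strlen(line);
        // Strip the trailing newline (and a carriage return, if present).
        while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
            --len;
        }
        lines.emplace_back(line, len);
    }
    std::fclose(f);  // closed before io_buffer is destroyed
    return lines;
}
```

Running this with a few different values of `buffer_size` under a profiler is one way to check whether the buffer matters at all for your file.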

But since you are going to wait, make sure that you use your time wisely.

Spawn a thread to load the file immediately when the application starts, and use that time to do anything else. If you need the data to do anything, pass chunks of the read file to the other thread to process.
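A minimal sketch of the "load in the background" part, using `std::async` from the C++11 standard library. The names `load_file` and `start_loading` are hypothetical stand-ins (the question's own `loadFile()` is not shown), and the result is handed over in one piece rather than chunk by chunk to keep the sketch short.

```cpp
#include <cassert>
#include <fstream>
#include <future>
#include <string>
#include <vector>

// Plain single-threaded loader: one word per line.
std::vector<std::string> load_file(const std::string& path) {
    std::vector<std::string> words;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) words.push_back(line);
    return words;
}

// Starts load_file on another thread and returns immediately.
// The caller collects the result later with future::get().
std::future<std::vector<std::string>> start_loading(const std::string& path) {
    assert(!path.empty());
    return std::async(std::launch::async, load_file, path);
}
```

At startup you would call `start_loading(...)`, carry on with other initialization, and call `get()` on the returned future only at the point where the words are actually needed; `get()` blocks if the load has not finished yet.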



Source: https://stackoverflow.com/questions/9356216/fastest-way-to-read-a-text-file-of-strings-line-by-line
