Modifying a text file without reading into memory

Posted by 浪尽此生 on 2021-02-05 06:54:06

Question


I am trying to find a way to modify a text file (specifically, to delete particular lines) without reading a large part of the file into memory or rewriting the whole file. The files in question are larger than main memory, about 15-50 GB.

P.S. I am using Linux.


Answer 1:


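You aren't going to get around making a new file, so just bite the bullet and do it. Use grep with -v (print only the lines that do not match) and -f (read the patterns from a file), and redirect the result to a second file. If the patterns are literal strings rather than regular expressions, add -F as well:

$ grep -v -f patternsToExcludeFromInput input > output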

Another approach is to load the patterns into a hash table (for example a Perl hash, a Python dictionary, or a C++ unordered_map) and check each line of the input file against it.

If there is no match, print the line to the standard output stream (which you can pipe to a regular file). Your memory usage will be limited mostly to the hash table and the line of input you are querying.
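As a rough illustration of the hash-table approach, here is a minimal sketch in C. It assumes the lines to delete are known exactly (whole-line matches, not regular expressions) and that a POSIX system provides getline. The names line_set, set_add, set_contains and filter_lines, and the NBUCKETS size, are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline and strdup */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define NBUCKETS 4096   /* illustrative; scale to the number of excluded lines */

/* One entry in a bucket's chain. */
struct node {
    char *line;
    struct node *next;
};

/* A fixed-size chained hash set of strings (hypothetical helper type). */
struct line_set {
    struct node *buckets[NBUCKETS];
};

/* FNV-1a-style string hash, reduced to a bucket index. */
static size_t bucket_of(const char *s)
{
    unsigned long h = 2166136261UL;
    for (; *s != '\0'; ++s) {
        h ^= (unsigned char)*s;
        h *= 16777619UL;
    }
    return (size_t)(h % NBUCKETS);
}

static int set_contains(const struct line_set *set, const char *line)
{
    const struct node *n;
    for (n = set->buckets[bucket_of(line)]; n != NULL; n = n->next)
        if (strcmp(n->line, line) == 0)
            return 1;
    return 0;
}

/* Returns 0 on success, -1 if memory runs out. */
static int set_add(struct line_set *set, const char *line)
{
    struct node *n;
    size_t b = bucket_of(line);
    if (set_contains(set, line))
        return 0;
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->line = strdup(line);
    if (n->line == NULL) {
        free(n);
        return -1;
    }
    n->next = set->buckets[b];
    set->buckets[b] = n;
    return 0;
}

/* Copy every line of `in` that is not in `set` to `out`.
   Only one line is held in memory at a time.
   Returns the number of lines dropped, or -1 on a write error. */
static long filter_lines(FILE *in, FILE *out, const struct line_set *set)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    long dropped = 0;

    assert(in != NULL && out != NULL && set != NULL);
    while ((len = getline(&line, &cap, in)) != -1) {
        /* Compare without the trailing newline, then restore it. */
        int had_nl = (len > 0 && line[len - 1] == '\n');
        if (had_nl)
            line[len - 1] = '\0';
        if (set_contains(set, line)) {
            ++dropped;
            continue;
        }
        if (had_nl)
            line[len - 1] = '\n';
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            free(line);
            return -1;
        }
    }
    free(line);
    return dropped;
}
```

Calling filter_lines(stdin, stdout, &set) from a small main gives a filter you can use with shell redirection, just like the grep command. The sketch never frees the set, on the assumption that the process exits once the filter is done.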




Answer 2:


If the file is much larger than memory, sed is your friend. It processes the input as a stream, acting as a filter between your old file and a new file, and at the end you just rename the new file to the old name. The syntax is a bit strange for newcomers, but it is really powerful: it can select lines by number, by regex, or by range, and apply insertions, deletions, or string substitutions.
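To make the filter-then-rename pattern concrete, here is a minimal sketch in C of what a line-range deletion followed by a rename amounts to, roughly the effect of sed '2,3d' input > output and then renaming output to input. It is not how sed is implemented. The function name delete_line_range and the caller-supplied temporary path are assumptions for this example, and getline assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Delete lines first..last (1-based, inclusive) from `path` by streaming
   the kept lines into `tmp_path` and renaming it over the original.
   `tmp_path` should be on the same filesystem as `path` so that rename works.
   Returns 0 on success, -1 on error. */
static int delete_line_range(const char *path, const char *tmp_path,
                             long first, long last)
{
    FILE *in, *out;
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    long lineno = 0;
    int ok = 1;

    assert(first >= 1 && first <= last);
    in = fopen(path, "r");
    if (in == NULL)
        return -1;
    out = fopen(tmp_path, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    /* Only the current line is ever held in memory. */
    while (ok && (len = getline(&line, &cap, in)) != -1) {
        ++lineno;
        if (lineno >= first && lineno <= last)
            continue;                      /* skipping the line deletes it */
        ok = fwrite(line, 1, (size_t)len, out) == (size_t)len;
    }
    free(line);
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;

    /* Replace the original only if the whole copy succeeded. */
    if (ok && rename(tmp_path, path) != 0)
        ok = 0;
    if (!ok)
        remove(tmp_path);
    return ok ? 0 : -1;
}
```

Because the original is untouched until the final rename, a failure partway through leaves the old file intact. The price is temporary disk space for a second copy of the file.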




Answer 3:


You can open the file in "r+" mode (read and update) and use fseek, fread, and fwrite to read and write portions of it. You must take care not to overwrite any part you have not read yet. So to delete a line you read and write forward, shifting the rest of the file toward the beginning, and then truncate the file to its new length; to insert a line you read and write backward (starting from the end of the file).

Example

To remove the first 100 bytes from the beginning of your file you could do something like:

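/* Needs <stdio.h> and, for ftruncate, <unistd.h>. Error checks omitted. */
/* On 32-bit systems, use fseeko/ftello for files larger than 2 GB. */
FILE *fp = fopen(filename, "r+b");   /* read and update; "rw" is not a valid mode */
size_t BLOCK_SIZE = 1024;
char buffer[BLOCK_SIZE];
size_t offset = 100;                 /* bytes to drop from the front */
fseek(fp, 0, SEEK_END);
size_t length = ftell(fp);           /* assumes length >= offset */
for (size_t i = 0; i < (length - offset + BLOCK_SIZE - 1) / BLOCK_SIZE; ++i) {
  fseek(fp, i * BLOCK_SIZE + offset, SEEK_SET);   /* read from the old position */
  size_t count = fread(buffer, sizeof(char), BLOCK_SIZE, fp);
  fseek(fp, i * BLOCK_SIZE, SEEK_SET);            /* write it offset bytes earlier */
  fwrite(buffer, sizeof(char), count, fp);
}
fflush(fp);
ftruncate(fileno(fp), length - offset);   /* cut off the stale tail */
fclose(fp);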
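The example above drops a fixed prefix. For deleting a line in the middle of the file, the same read-forward, write-backward shift can be applied to an arbitrary byte range. The following is a minimal sketch under that assumption; delete_range is a made-up name, and it expects you to already know the byte offsets of the line, which still requires scanning the file once.

```c
#define _POSIX_C_SOURCE 200809L   /* for fseeko, ftello, fileno, ftruncate */
#define _FILE_OFFSET_BITS 64      /* 64-bit off_t even on 32-bit systems */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Remove bytes [start, end) from the file at `path` in place.
   Everything after `end` is shifted down block by block, then the file
   is truncated. Returns 0 on success, -1 on error. */
static int delete_range(const char *path, off_t start, off_t end)
{
    char buffer[65536];
    off_t length, src, dst;
    size_t count;
    FILE *fp = fopen(path, "r+b");

    if (fp == NULL)
        return -1;
    if (fseeko(fp, 0, SEEK_END) != 0 || (length = ftello(fp)) < 0)
        goto fail;
    if (start < 0 || start > end || end > length)
        goto fail;

    src = end;     /* next byte to read  */
    dst = start;   /* next byte to write */
    while (src < length) {
        if (fseeko(fp, src, SEEK_SET) != 0)
            goto fail;
        count = fread(buffer, 1, sizeof buffer, fp);
        if (count == 0)
            goto fail;
        /* dst is always behind src, so unread data is never overwritten. */
        if (fseeko(fp, dst, SEEK_SET) != 0)
            goto fail;
        if (fwrite(buffer, 1, count, fp) != count)
            goto fail;
        src += (off_t)count;
        dst += (off_t)count;
    }
    assert(dst == length - (end - start));

    /* Flush buffered writes before shrinking the file to its new length. */
    if (fflush(fp) != 0 || ftruncate(fileno(fp), dst) != 0)
        goto fail;
    return fclose(fp) == 0 ? 0 : -1;

fail:
    fclose(fp);
    return -1;
}
```

This needs no extra disk space, but it still rewrites everything after the deleted range, and unlike the copy-and-rename approach an interruption partway through leaves the file half-shifted.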


Source: https://stackoverflow.com/questions/24615237/modifying-a-text-file-without-reading-into-memory
