reading a block of lines in a file using php

Submitted by 徘徊边缘 on 2019-11-29 12:13:06

I can't use file_get_contents() because the file is too large. fgets() also reads the text line by line, which will likely take a long time to finish reading the whole file.

I don't see why you shouldn't be able to use fgets():

$fh = fopen('/path/to/large/file', 'r'); // open the file first
$blocksize = 50; // in "number of lines"
while (!feof($fh)) {
  $lines = array();
  $count = 0;
  // collect up to $blocksize lines, stopping early at EOF
  while (!feof($fh) && (++$count <= $blocksize)) {
    if (($line = fgets($fh)) !== false) {
      $lines[] = $line;
    }
  }
  doSomethingWithLines($lines);
}
fclose($fh);

Reading 100GB will take time anyway.

The fread approach sounds like a reasonable solution. You can detect whether you've reached the end of a line by checking whether the final character in the string is a newline character ('\n'). If it isn't, then you can either read some more characters and append them to your existing string, or you can trim characters from your string back to the last newline, and then use fseek to adjust your position in the file.
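A minimal sketch of that trim-and-fseek idea (the function name, chunk size, and return shape are my own, not from the original answer). It reads one chunk, and if the chunk ends mid-line, rewinds the file pointer to just after the last newline so the next call starts on a line boundary:

```php
<?php
// Read one chunk and return only the complete lines in it.
// Assumes $chunkSize is larger than the longest expected line,
// as the advice further down recommends.
function readCompleteLines($fh, int $chunkSize): array
{
    $chunk = fread($fh, $chunkSize);
    if ($chunk === false || $chunk === '') {
        return [];
    }
    // If the chunk doesn't end on a newline and we're not at EOF,
    // trim back to the last newline and rewind the extra bytes.
    if (substr($chunk, -1) !== "\n" && !feof($fh)) {
        $lastNewline = strrpos($chunk, "\n");
        if ($lastNewline !== false) {
            $extra = strlen($chunk) - ($lastNewline + 1);
            fseek($fh, -$extra, SEEK_CUR);
            $chunk = substr($chunk, 0, $lastNewline + 1);
        }
    }
    // Split into lines, dropping the trailing empty element.
    return explode("\n", rtrim($chunk, "\n"));
}
```

Each call hands back whole lines only; the partial tail of the chunk is simply re-read on the next call.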

Side point: Are you aware that reading a 100GB file will take a very long time?

I think you have to use fread($fp, $somesize) and check manually whether you have found the end of the line; otherwise, read another chunk.

Hope this helps.

I would recommend implementing the reading of a single line within a function, hiding the implementation details of that specific step from the rest of your code - the processing function must not care how the line was retrieved. You can then implement your first version using fgets() and then try other methods if you notice that it is too slow. It could very well be that the initial implementation is too slow, but the point is: you won't know until you've benchmarked.
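One way to sketch that abstraction (nextLine() and the callback are illustrative names, not from the original post): the processing code only ever calls one function, so you can swap fgets() for a buffered fread() implementation later without touching anything else.

```php
<?php
// First version: plain fgets() behind a small wrapper. Replace the
// body of nextLine() with a buffered implementation if benchmarks
// show this is too slow.
function nextLine($fh): ?string
{
    $line = fgets($fh);
    return $line === false ? null : $line;
}

// The processing side doesn't know or care how lines are retrieved.
function processFile(string $path, callable $processLine): void
{
    $fh = fopen($path, 'r');
    while (($line = nextLine($fh)) !== null) {
        $processLine($line);
    }
    fclose($fh);
}
```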

I know this is an old question, but I think there is value for a new answer for anyone that finds this question eventually.

I agree that reading 100GB takes time, which is exactly why we should find the most efficient way to read it, so the time is as short as possible, instead of thinking "who cares how long it takes if it already takes a long time". So, let's find the lowest time possible.

Another solution:

Cache a chunk of raw data: use fread to read a chunk of the file into a buffer.

Read line by line from that buffer until the end of the buffer or the end of the data is found.

Read the next chunk and repeat: grab the unprocessed tail of the buffer (the part in which you were still looking for the line delimiter) and move it to the front, then read a chunk of your defined size minus the size of that unprocessed data and append it right after the leftover. There you go: you have a new complete chunk.

Repeat the line-by-line reading and this refill process until the file has been read completely.

You should use a cache chunk bigger than any line you expect to encounter.

The bigger the cache, the faster you read, but the more memory you use.
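The steps above can be sketched as follows (function and variable names are my own; the sketch assumes, per the advice above, that the chunk size exceeds the longest line):

```php
<?php
// Read a file via a chunk-sized cache: fread() fills the buffer, lines
// are handed out from it, and the unprocessed tail is carried over to
// the front before the next refill.
function readLinesInChunks($fh, int $chunkSize, callable $onLine): void
{
    $buffer = '';
    while (!feof($fh)) {
        // Refill: the leftover tail stays at the front; top the buffer
        // back up to one full chunk (at least 1 byte must be requested).
        $want = max(1, $chunkSize - strlen($buffer));
        $buffer .= fread($fh, $want);
        // Hand out every complete line currently in the cache.
        while (($pos = strpos($buffer, "\n")) !== false) {
            $onLine(substr($buffer, 0, $pos));
            $buffer = substr($buffer, $pos + 1);
        }
    }
    // Anything left over is a final line without a trailing newline.
    if ($buffer !== '') {
        $onLine($buffer);
    }
}
```

Tuning $chunkSize trades memory for fewer fread() calls, which is exactly the cache-size trade-off described above.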
