Reading a block of lines in a file using PHP

旧时难觅i 2020-12-20 03:16

Suppose I have a 100GB text file containing millions of lines. How can I read this file in blocks of lines using PHP?

I can't use file_get_

5 answers
  • 2020-12-20 03:49

    I would recommend implementing the reading of a single line within a function, hiding the implementation details of that specific step from the rest of your code - the processing function must not care how the line was retrieved. You can then implement your first version using fgets() and then try other methods if you notice that it is too slow. It could very well be that the initial implementation is too slow, but the point is: you won't know until you've benchmarked.
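    A minimal sketch of that separation, assuming a first version built on fgets(). The names readNextLine() and countNonEmptyLines() are made up for illustration; the point is only that the processing code never touches fgets() directly, so the reader can be swapped out later if benchmarks say so.

```php
<?php
// Hypothetical helper: the only place that knows how a line is fetched.
// Returns the next line without its trailing newline, or null at end of file.
function readNextLine($handle): ?string
{
    $line = fgets($handle);
    return $line === false ? null : rtrim($line, "\r\n");
}

// Hypothetical processing function: it only ever sees lines,
// never the details of how they were read.
function countNonEmptyLines($handle): int
{
    $count = 0;
    while (($line = readNextLine($handle)) !== null) {
        if ($line !== '') {
            $count++;
        }
    }
    return $count;
}
```

    If fgets() turns out to be too slow, only the body of readNextLine() has to change.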

  • 2020-12-20 03:52

    I know this is an old question, but I think there is value for a new answer for anyone that finds this question eventually.

    I agree that reading 100GB takes time. That is why I also think we need to find the most efficient way to read it, so that it takes as little time as possible, instead of thinking "who cares how long it takes if it is already a lot". So, let's find the lowest time we can get.

    Another solution:

    Cache a chunk of raw data

    Use fread to read a cache of that data

    Read line by line

    Read line by line from the cache until end of cache or end of data found

    Read next chunk and repeat

    Take the unprocessed last part of the chunk (the part where you were still looking for the line delimiter) and move it to the front. Then read a chunk of the size you defined minus the size of the unprocessed data and put it right after that unprocessed part. You now have a new complete chunk.
    Repeat the line-by-line read and this refill until the file has been read completely.

    Use a cache chunk bigger than the longest line you expect.

    The bigger the cache size the faster you read, but the more memory you use.
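    The steps above could be sketched roughly like this. It is only a sketch under a few assumptions: lines end with "\n", the function name readLinesChunked() and the 1 MiB default chunk size are made up, and for simplicity it always reads a full chunk and prepends the leftover instead of shrinking the read by the leftover's size.

```php
<?php
// Sketch: yield the lines of $handle one by one, reading $chunkSize bytes at a time.
// Assumes "\n" line endings; a "\r" from CRLF files is left on the line.
function readLinesChunked($handle, int $chunkSize = 1048576): Generator
{
    $carry = ''; // unprocessed tail of the previous chunk (an incomplete line)
    while (!feof($handle)) {
        $data = fread($handle, $chunkSize);
        if ($data === false || $data === '') {
            break; // nothing more to read
        }
        $buffer = $carry . $data;
        $lastNewline = strrpos($buffer, "\n");
        if ($lastNewline === false) {
            // No complete line in the buffer yet: keep it all and read more.
            $carry = $buffer;
            continue;
        }
        // Everything before the last newline consists of complete lines.
        foreach (explode("\n", substr($buffer, 0, $lastNewline)) as $line) {
            yield $line;
        }
        // The rest is the start of the next line; move it to the front.
        $carry = (string) substr($buffer, $lastNewline + 1);
    }
    if ($carry !== '') {
        yield $carry; // final line without a trailing newline
    }
}
```

    Usage would be something like foreach (readLinesChunked($fh) as $line) { ... }, with $fh opened by fopen(). Memory use stays around one chunk plus the longest line.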

  • 2020-12-20 03:58

    The fread approach sounds like a reasonable solution. You can detect whether you've reached the end of a line by checking whether the final character in the string is a newline character ('\n'). If it isn't, then you can either read some more characters and append them to your existing string, or you can trim characters from your string back to the last newline, and then use fseek to adjust your position in the file.
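    Here is a rough sketch combining both options, assuming "\n" line endings and a seekable file handle. The function name readWholeLines() is made up. When the chunk contains a newline it trims back to the last one and uses fseek to rewind; when a single line is longer than the chunk it reads more and appends.

```php
<?php
// Sketch: read up to $size bytes from $handle, but return only whole lines.
// Returns '' once the end of the file has been reached.
function readWholeLines($handle, int $size): string
{
    $data = fread($handle, $size);
    if ($data === false || $data === '') {
        return '';
    }
    if (substr($data, -1) === "\n" || feof($handle)) {
        return $data; // ends on a line boundary, or is the last piece of the file
    }
    $lastNewline = strrpos($data, "\n");
    if ($lastNewline === false) {
        // One line longer than $size: append the rest of that line.
        $rest = fgets($handle);
        return $rest === false ? $data : $data . $rest;
    }
    // Move the file position back to just after the last newline.
    $extra = strlen($data) - ($lastNewline + 1);
    fseek($handle, -$extra, SEEK_CUR);
    return substr($data, 0, $lastNewline + 1);
}
```

    Call it in a loop until it returns an empty string; each result can then be split on "\n" to get the block of lines.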

    Side point: Are you aware that reading a 100GB file will take a very long time?

  • 2020-12-20 04:03

    I think you have to use fread($fp, somesize) and check manually whether you have found the end of the line; if not, read another chunk.

    Hope this helps.

  • 2020-12-20 04:06

    I can't use file_get_contents() because the file is too large. fgets() also reads the text line by line, which will likely take longer to finish reading the whole file.

    I don't see why you shouldn't be able to use fgets():

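    $blocksize = 50; // block size, in number of lines
    $fh = fopen($path, 'r'); // $path is assumed to hold the name of your file
    do {
      $lines = array();
      // fgets() returns false at end of file, so check it before storing the line
      while (count($lines) < $blocksize && ($line = fgets($fh)) !== false) {
        $lines[] = $line;
      }
      if ($lines) {
        doSomethingWithLines($lines); // your own processing function
      }
    } while (count($lines) === $blocksize); // a short block means the file is done
    fclose($fh);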
    

    Reading 100GB will take time anyway.
