Question
I would like to understand how to use the buffer of a read file.
Assume we have a big file with a list of emails, one per line (the delimiter is a classic \n).
Now we want to compare each line with each record of a table in our database, a check along the lines of line_of_file == table_row.
This is a simple task with a normal file, but with a huge file the server usually stops the operation after a few minutes.
So what's the best way of doing this kind of thing with the file buffer?
What I have so far is something like this:
$buffer = file_get_contents('file.txt');
while ($row = mysql_fetch_array($result)) {
    $email = $row['email'];  // assuming the query returns an `email` column
    // search the whole file contents for this address
    if (preg_match('/' . preg_quote($email, '/') . '/im', $buffer)) {
        echo $email;
    }
}
$buffer = file_get_contents('file.txt');
$lines = preg_split('/\n/', $buffer);
// or: $lines = explode("\n", $buffer);  // note the double quotes, '\n' would not be a newline
while ($row = mysql_fetch_array($result)) {
    $email = $row['email'];
    if (in_array($email, $lines)) {
        echo $email;
    }
}
Answer 1:
As already suggested in my close votes on your question (hence CW):
You can use SplFileObject, which implements Iterator, to iterate over a file line by line and save memory. See my answers to
- Least memory intensive way to read a file in PHP and
- How to save memory when reading a file in Php?
for examples.
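A minimal sketch of that approach, reusing the file name from the question (the flags and the loop body are illustrative, not taken from the linked answers):

$file = new SplFileObject('file.txt');
$file->setFlags(SplFileObject::DROP_NEW_LINE | SplFileObject::SKIP_EMPTY);
foreach ($file as $line) {
    // only the current line is held in memory
    // ... compare $line against your database records here ...
}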
Answer 2:
Don't use file_get_contents for large files. It pulls the entire file into memory all at once. You have to read it in pieces instead:
$fp = fopen('file.txt', 'r');
while (!feof($fp)) {
    // read one line
    $buffer = fgets($fp);
    // do your stuff
}
fclose($fp);
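One way to fill in "do your stuff" for the question's task: index the database emails once, then check each file line against that index instead of re-scanning the file for every row. The `email` column name is an assumption, not part of the original answer.

$emails = array();
while ($row = mysql_fetch_array($result)) {
    $emails[trim($row['email'])] = true;   // `email` column assumed
}

$fp = fopen('file.txt', 'r');
while (($line = fgets($fp)) !== false) {
    $line = trim($line);                   // drop the trailing \n
    if (isset($emails[$line])) {
        echo $line . " found in database\n";
    }
}
fclose($fp);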
Answer 3:
Open the file with fopen() and read it incrementally. Probably one line at a time with fgets().
file_get_contents reads the whole file into memory, which is undesirable if the file is larger than a few megabytes.
Depending on how long this takes, you may need to worry about the PHP execution time limit, or the browser timing out if it doesn't receive any output for 2 minutes.
Things you might try:
- set_time_limit(0) to avoid running up against the PHP time limit
- Make sure to output some data every 30 seconds or so, so the browser doesn't time out; make sure to flush(); and possibly ob_flush(); so your output is actually sent over the network (this is a kludge)
- Start a separate process (e.g. via exec()) to run this in the background. Honestly, anything that takes more than a second or two is best run in the background
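A rough sketch of how these suggestions fit together with the incremental read; the progress interval and messages are illustrative, not part of the original answer:

set_time_limit(0);                          // disable the PHP execution time limit

$fp = fopen('file.txt', 'r');
$count = 0;
while (($line = fgets($fp)) !== false) {
    // ... compare trim($line) against the database here ...

    if (++$count % 10000 === 0) {
        echo "processed $count lines\n";
        flush();                            // actually send the output to the browser
        if (ob_get_level() > 0) {
            ob_flush();                     // also flush PHP's output buffer if one is active
        }
    }
}
fclose($fp);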
Source: https://stackoverflow.com/questions/5970632/php-how-to-read-big-remote-files-efficiently-and-use-buffer-in-loop