What's the difference between iterating over a file with foreach or while in Perl?

Submitted by ぃ、小莉子 on 2019-12-17 09:23:10

Question


I have a filehandle FILE in Perl, and I want to iterate over all the lines in the file. Is there a difference between the following?

while (<FILE>) {
    # do something
}

and

foreach (<FILE>) {
    # do something
}

Answer 1:


For most purposes, you probably won't notice a difference. However, foreach reads each line into a list (not an array) before going through it line by line, whereas while reads one line at a time. As foreach will use more memory and require processing time upfront, it is generally recommended to use while to iterate through lines of a file.

EDIT (via Schwern): The foreach loop is equivalent to this:

my @lines = <$fh>;
for my $line (@lines) {
    ...
}

It's unfortunate that Perl doesn't optimize this special case as it does with the range operator (1..10).

For example, if I read /usr/share/dict/words with a for loop and a while loop and have them sleep when they're done I can use ps to see how much memory the process is consuming. As a control I've included a program that opens the file but does nothing with it.

USER       PID %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
schwern  73019   0.0  1.6   625552  33688 s000  S     2:47PM   0:00.24 perl -wle open my $fh, shift; for(<$fh>) { 1 } print "Done";  sleep 999 /usr/share/dict/words
schwern  73018   0.0  0.1   601096   1236 s000  S     2:46PM   0:00.09 perl -wle open my $fh, shift; while(<$fh>) { 1 } print "Done";  sleep 999 /usr/share/dict/words
schwern  73081   0.0  0.1   601096   1168 s000  S     2:55PM   0:00.00 perl -wle open my $fh, shift; print "Done";  sleep 999 /usr/share/dict/words

The for program is consuming about 33 MB of real memory (the RSS column) to store the contents of my 2.4 MB /usr/share/dict/words. The while loop only stores one line at a time, consuming just 70 KB or so more than the control for line buffering.




Answer 2:


In scalar context (i.e. while) <FILE> returns each line in turn.

In list context (i.e. foreach) <FILE> returns a list consisting of each line from the file.

You should use the while construct.

See perlop - I/O Operators for more.
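The two contexts can be seen side by side in a minimal sketch. The sample data and the in-memory filehandle are assumptions made for illustration; they stand in for the FILE handle from the question.

```perl
use strict;
use warnings;

# Hypothetical sample contents; an in-memory filehandle stands in for FILE.
my $data = "a\nb\nc\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $first = <$fh>;    # scalar context: reads exactly one line
my @rest  = <$fh>;    # list context: reads every remaining line at once
close $fh;

print "scalar read: $first";
print "list read returned ", scalar(@rest), " lines\n";
```

The scalar read returns only the first line, and the list read then drains everything that is left.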

Edit: j_random_hacker rightly says that

while (<FILE>) { … }

tramples on $_ while foreach does not (foreach localises $_ first). Surely this is the most important behavioural difference!




Answer 3:


In addition to the previous responses, another benefit of using while is that you can use the $. variable. This is the current line number of the last filehandle accessed (see perldoc perlvar).

while ( my $line = <FILE> ) {
    if ( $line =~ /some_target/ ) {
        print "Found some_target at line $.\n";
    }
}
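To illustrate why $. is only useful with while, here is a small sketch that records $. on every iteration of both loop styles. The sample data and variable names are assumptions for the example, not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical three-line sample; an in-memory filehandle stands in for FILE.
my $data = "alpha\nbeta\ngamma\n";

# while: $. advances with each line as it is read.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @while_numbers;
while (my $line = <$fh>) {
    push @while_numbers, $.;
}
close $fh;    # an explicit close resets $.

# foreach: every line has already been read before the first iteration,
# so $. holds the final line number the whole time.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @foreach_numbers;
foreach my $line (<$fh>) {
    push @foreach_numbers, $.;
}
close $fh;

print "while:   @while_numbers\n";
print "foreach: @foreach_numbers\n";
```

With while the recorded numbers are 1, 2, 3. With foreach they are 3, 3, 3, because the file was fully consumed before the loop body ran.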



Answer 4:


I added an example dealing with this to the next edition of Effective Perl Programming.

With a while, you can stop processing FILE and still get the unprocessed lines:

 while( <FILE> ) {  # scalar context
      last if ...;
      }
 my $line = <FILE>; # still lines left

If you use a foreach, you consume all of the lines in the foreach even if you stop processing them:

 foreach( <FILE> ) { # list context
      last if ...;
      }
 my $line = <FILE>; # no lines left!
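A runnable version of the two snippets above might look like the following. The sample data, the /^one/ stop condition, and the variable names are assumptions chosen for illustration, since the original condition is elided.

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for FILE.
my $data = "one\ntwo\nthree\nfour\n";

# while: stop after the first line, then read again from the same handle.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    last if $line =~ /^one/;
}
my $after_while = <$fh>;      # the next unread line is still available
close $fh;

# foreach: the whole file is read before the first iteration runs.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
foreach my $line (<$fh>) {
    last if $line =~ /^one/;
}
my $after_foreach = <$fh>;    # undef: nothing is left to read
close $fh;

print "after while:   ", defined $after_while   ? $after_while   : "(nothing left)\n";
print "after foreach: ", defined $after_foreach ? $after_foreach : "(nothing left)\n";
```

After the while loop the next read returns "two"; after the foreach loop it returns undef.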



Answer 5:


Update: j random hacker points out in a comment that Perl special cases the falseness test in a while loop when reading from a file handle. I've just verified that reading a false value will not terminate the loop -- at least on modern perls. Sorry for steering you all wrong. After 15 years of writing Perl I'm still a noob. ;)

Everyone above is right: use the while loop because it will be more memory efficient and give you more control.

A funny thing about that while loop though is that it exits when the read is false. Usually that will be end-of-file, but what if it returns an empty string or a 0? Oops! Your program just exited too soon. This can happen on any file handle if the last line in the file doesn't have a newline. It can also happen with custom file objects that have a read method that doesn't treat newlines the same way as regular Perl file objects.

Here's how to fix it. Check for an undefined value read which indicates end-of-file:

while (defined(my $line = <FILE>)) {
    print $line;
}

The foreach loop doesn't have this problem by the way and is correct even though inefficient.
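The special case mentioned in the update can be checked with a short sketch. The sample data is an assumption: its last line is a bare 0 with no trailing newline, which is a false value in Perl.

```perl
use strict;
use warnings;

# Hypothetical data whose final line is "0" with no newline (a false value).
my $data = "first\n0";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @seen;
# Perl compiles this condition as defined(my $line = <$fh>),
# so a false-but-defined line does not end the loop.
while (my $line = <$fh>) {
    push @seen, $line;
}
close $fh;

print scalar(@seen), " lines read\n";
```

Both lines are read, including the final "0". Writing defined() explicitly remains harmless and documents the intent.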




Answer 6:


j_random_hacker mentioned this in the comments to this answer, but didn't actually put it in an answer of its own, even though it's another difference worth mentioning.

The difference is that while (<FILE>) {} overwrites $_, while foreach(<FILE>) {} localizes it. That is:

$_ = 100;
while (<FILE>) {
    # $_ gets each line in turn
    # do something with the file
}
print $_; # yes I know that $_ is unneeded here, but 
          # I'm trying to write clear code for the example

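Will not print 100. The loop overwrote $_ with each line in turn, and the final read at end-of-file leaves it undefined, so the original value is gone.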

However,

$_ = 100;
foreach(<FILE>) {
    # $_ gets each line in turn
    # do something with the file
}
print $_;

Will print out 100. To get the same with a while(<FILE>) {} construct you'd need to do:

$_ = 100;
{
    local $_;
    while (<FILE>) {
        # $_ gets each line in turn
        # do something with the file
    }
}
print $_; # yes I know that $_ is unneeded here, but 
          # I'm trying to write clear code for the example

Now this will print 100.
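The difference can be observed directly with a sketch like this one, which captures $_ after each kind of loop. The sample data and the variable names are assumptions for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for FILE.
my $data = "x\ny\n";

$_ = 100;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
foreach (<$fh>) {
    # $_ is aliased to each line here, then restored when the loop ends
}
my $after_foreach = $_;
close $fh;

open $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (<$fh>) {
    # $_ is assigned each line here, with no localisation
}
my $after_while = $_;    # the read at end-of-file stored undef in $_
close $fh;

print "after foreach: $after_foreach\n";
print "after while:   ", defined $after_while ? $after_while : "(undefined)", "\n";
```

After foreach, $_ still holds 100. After while, it is undefined, because the read that detected end-of-file was also assigned to $_.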




Answer 7:


Here is an example where foreach will not work but while will do the job

while (<FILE>) {
   my $line1 = $_;
   if ($line1 =~ /SOMETHING/) {
      # Read the following line from the same handle inside the loop
      my $line2 = <FILE>;
      if (defined $line2 && $line2 =~ /SOMETHING ELSE/) {
         print "I found SOMETHING and SOMETHING ELSE in consecutive lines\n";
         exit();
      }
   }
}

You simply cannot do this with foreach because it reads the whole file into a list before entering the loop, so you won't be able to read the next line inside the loop. I am sure there are workarounds for this even with foreach (reading into an array comes to mind), but while definitely offers a very straightforward solution.

A second example is when you have to parse a large (say 3GB) file on a machine with only 2GB of RAM. foreach will simply run out of memory and crash. I learnt this the hard way very early in my Perl programming life.




Answer 8:


The foreach loop is faster than while (which is condition-based).



Source: https://stackoverflow.com/questions/585341/whats-the-difference-between-iterating-over-a-file-with-foreach-or-while-in-per
