Perl, disable input buffering

温柔的废话 2020-12-20 21:48

There is a file:

:~$ cat fff
qwerty
asdf
qwerty
zxcvb

There is a script:

:~$ cat 1.pl
#!/usr/bin/perl
print 
<         


        
2 Answers
  •  北海茫月
    2020-12-20 22:42

    I recently had to parse several log files of around 6 gigabytes each. Buffering was a problem: when I assigned STDIN to an array (i.e. read it in list context), Perl would happily try to read all 6 gigabytes into memory, and I simply didn't have the system resources for that. I came up with the following workaround, which reads the file line by line and thus avoids the massive memory black hole that would otherwise commandeer all my system resources.
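    Here is a minimal sketch of the difference (not the author's code; `for_each_line` is a hypothetical helper name). Reading `<$fh>` in list context, as in `my @all = <$fh>;`, pulls every remaining line into memory at once. Reading it in scalar context inside a `while` loop returns one line per call, so the loop only holds the current line. The demo reads an in-memory copy of the question's fff file so it runs without any input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: calls $callback once per line and returns the line count.
# <$fh> in scalar context yields the next line (undef at EOF), so only the
# current line is held in memory, unlike `my @all = <$fh>;` (list context),
# which reads every remaining line at once.
sub for_each_line {
    my ($fh, $callback) = @_;
    my $n = 0;
    while (defined(my $line = <$fh>)) {
        $callback->($line);
        $n++;
    }
    return $n;
}

# Demo on an in-memory file holding the contents of fff from the question.
my $data = "qwerty\nasdf\nqwerty\nzxcvb\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my %seen;
my $total = for_each_line($fh, sub { chomp(my $l = shift); $seen{$l}++ });
close $fh;
print "$total lines, ", scalar(keys %seen), " distinct\n";   # prints: 4 lines, 3 distinct
```

    The loop itself only ever holds the current line; whatever the callback accumulates (here, `%seen`) is up to you. Passing `\*STDIN` instead of the in-memory handle works the same way.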

    Note: all this script does is split that 6-gigabyte file into several smaller ones (the size of each is set by the number of lines it should contain). The interesting bit is the while loop and the assignment of a single line from the log file to a variable: the loop iterates through the entire file, reading one line, doing something with it, and then repeating. The result is no massive buffering. I kept the entire script intact just to show a working example.

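    #!/usr/bin/perl -w
    BEGIN{$ENV{'POSIXLY_CORRECT'} = 1;}
    use v5.14;    # also turns on 'use strict'
    use Getopt::Long qw(:config no_ignore_case);

    my $input = '';
    my $output = '';
    my $lines = 0;
    GetOptions('i=s' => \$input, 'o=s' => \$output, 'l=i' => \$lines)
        or die "usage: $0 -i INPUT -o OUTPUT_PREFIX -l LINES_PER_FILE\n";
    die "-l must be a positive number of lines\n" unless $lines > 0;

    open my $fi, '<', $input or die "Cannot open '$input': $!";

    my $count = 0;
    my $count_file = 1;
    my $fo;    # handle for the current output file
    while (defined(my $line = <$fi>)) {    # assign a single line of input to a variable
        # open each output file once per chunk, not once per line
        unless ($fo) {
            my $name = "${output}_$count_file.log";
            open $fo, '>>', $name or die "Cannot open '$name': $!";
        }
        print $fo $line;
        $count++;
        if ($count == $lines) {    # chunk is full: move on to the next file
            close $fo;
            undef $fo;
            $count = 0;
            $count_file++;
        }
    }
    close $fo if $fo;
    close $fi;
    print " done\n";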
    

    The script is invoked on the command line like this:

    (name of script) -i (input file) -o (output file prefix; files are named <prefix>_1.log, <prefix>_2.log, ...) -l (size of each output file, i.e. number of lines)
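    To make the chunking logic easier to check in isolation, here is a minimal sketch of it as a standalone subroutine. `split_by_lines` and its argument order are hypothetical, not part of the original script. Unlike the script, it opens each chunk with '>' (overwrite) rather than '>>' (append), and it writes into a temporary directory from an in-memory copy of fff, so it runs without any command-line arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: copies lines from $in_fh into "${prefix}_1.log",
# "${prefix}_2.log", ..., at most $lines_per_file lines each, reading one
# line at a time. Returns the number of output files written.
sub split_by_lines {
    my ($in_fh, $prefix, $lines_per_file) = @_;
    die "lines per file must be positive\n" unless $lines_per_file > 0;
    my ($out_fh, $count, $file_no) = (undef, 0, 0);
    while (defined(my $line = <$in_fh>)) {
        if ($count == 0) {                  # first line of a new chunk
            if ($out_fh) { close $out_fh or die "close failed: $!"; }
            $file_no++;
            my $name = "${prefix}_$file_no.log";
            open $out_fh, '>', $name or die "Cannot open $name: $!";
        }
        print {$out_fh} $line;
        $count = ($count + 1) % $lines_per_file;   # wraps to 0 when the chunk is full
    }
    if ($out_fh) { close $out_fh or die "close failed: $!"; }
    return $file_no;
}

my $dir  = tempdir(CLEANUP => 1);            # scratch directory, removed at exit
my $data = "qwerty\nasdf\nqwerty\nzxcvb\n";  # contents of fff from the question
open my $in, '<', \$data or die "Cannot open in-memory file: $!";
my $files = split_by_lines($in, File::Spec->catfile($dir, 'part'), 3);
close $in;
print "$files files written\n";             # prints: 2 files written
```

    With 3 lines per file, the four-line fff content ends up as part_1.log (qwerty, asdf, qwerty) and part_2.log (zxcvb). Opening each file once per chunk, instead of on every line, avoids millions of open calls on a 6 GB input.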

    Even if it's not exactly what you are looking for, I hope it gives you some ideas. :)
