How can I set the file-read buffer size in Perl to optimize it for large files?

暗喜 2021-01-05 06:15

I understand that both Java and Perl try quite hard to find a one-size-fits-all default buffer size when reading in files, but I find their choices to be increasingly antiqu

4 answers

[愿得一人] (original poster)

2021-01-05 07:04

Warning: the following code has only been lightly tested. The code below is a first shot at a function that lets you process a file line by line (hence the function name) with a user-definable buffer size. It takes up to four arguments:

an open filehandle (default is STDIN)
a buffer size (default is 4k)
a reference to a variable to store the line in (default is $_)
an anonymous subroutine to call for each line (the default prints the line).

The arguments are positional with the exception that the last argument may always be the anonymous subroutine. Lines are auto-chomped.
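To make the calling convention concrete, here is a minimal sketch of the argument handling on its own. The name resolve_args is hypothetical and used only for illustration. The sketch assumes a code reference may appear at any position, which also covers the documented case of it coming last. It only returns the resolved values and reads nothing.

```perl
use strict;
use warnings;
use Scalar::Util qw/reftype/;

# Hypothetical stand-in for the argument handling inside line_by_line:
# returns the resolved (fh, bufsize, ref, coderef) so the defaults are visible.
sub resolve_args {
    my $default_line;
    my @slots = \(
        my $fh      = \*STDIN,
        my $bufsize = 4 * 1024,
        my $ref     = \$default_line,
        my $coderef = sub { print "$_\n" },
    );
    for my $arg_val (@_) {
        # reftype returns undef for plain scalars, hence the "|| ''" guard
        if ((reftype($arg_val) || '') eq 'CODE') {
            ${ $slots[-1] } = $arg_val;    # a code ref always fills the last slot
            next;                          # assumption: later arguments still count
        }
        my $slot = shift @slots;           # everything else fills slots in order
        $$slot = $arg_val;
    }
    return ($fh, $bufsize, $ref, $coderef);
}

my $callback = sub { 1 };

# Only a callback: filehandle and buffer size keep their defaults.
my ($fh1, $size1, undef, $cb1) = resolve_args($callback);

# Filehandle, buffer size, then the callback in last position.
my ($fh2, $size2, undef, $cb2) = resolve_args(\*STDIN, 64 * 1024, $callback);

print "$size1 $size2\n";    # prints "4096 65536"
```

The point of taking references to the defaults is that each supplied value overwrites its slot in place, so anything the caller leaves out simply keeps its default.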

Probable bugs:

may not work on systems where line feed is not the end-of-line character
will likely fail when combined with a lexical $_ (introduced in Perl 5.10)
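For the first of these, one possible mitigation is to strip a trailing carriage return after splitting on line feed, so that CRLF files yield the same lines as LF files. The sketch below is not part of the function in this answer. The helper name extract_lines is hypothetical, and the chunks are hard-coded to simulate successive sysread calls. Files that use a bare "\r" as the line ending are still not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: append a freshly read chunk to the carry-over text and
# return every complete line found so far. Whatever follows the last "\n"
# stays in $$pending_ref until the next chunk arrives.
sub extract_lines {
    my ($pending_ref, $chunk) = @_;
    $$pending_ref .= $chunk;
    my @lines;
    while ($$pending_ref =~ s/\A([^\n]*)\n//) {
        my $line = $1;
        $line =~ s/\r\z//;    # drop the CR of a CRLF pair, if present
        push @lines, $line;
    }
    return @lines;
}

my $pending = '';
my @seen;

# Three simulated chunks; the first CRLF pair is deliberately split across two of them.
for my $chunk ("alpha\r", "\nbeta\r\ngam", "ma\n") {
    push @seen, extract_lines(\$pending, $chunk);
}

print join('|', @seen), "\n";    # prints "alpha|beta|gamma"
```

Appending to the carry-over before matching is what lets a "\r\n" that straddles two reads be recognised as a single line ending.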

You can see from an strace that it reads the file with the specified buffer size. If I like how testing goes, you may see this on CPAN soon.

#!/usr/bin/perl

use strict;
use warnings;
use Scalar::Util qw/reftype/;
use Carp;

sub line_by_line {
    local $_;
    my @args = \(
        my $fh      = \*STDIN,
        my $bufsize = 4*1024,
        my $ref     = \$_,
        my $coderef = sub { print "$_\n" },
    );
    croak "bad number of arguments" if @_ > @args;

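    for my $arg_val (@_) {
        # reftype returns undef for non-references, so guard the comparison
        if ((reftype($arg_val) || '') eq "CODE") {
            ${$args[-1]} = $arg_val;
            next;    # keep going so arguments after the callback are honored
        }
        my $arg = shift @args;
        $$arg = $arg_val if defined $arg_val;    # undef keeps the default
    }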

    my $buf;
    my $overflow ='';
    OUTER:
    while(sysread $fh, $buf, $bufsize) {
        my @lines = split /(\n)/, $buf;
        while (@lines) {
            my $line  = $overflow . shift @lines;
            unless (defined $lines[0]) {
                $overflow = $line;
                next OUTER;
            }
            $overflow = shift @lines;
            if ($overflow eq "\n") {
                $overflow = "";
            } else {
                next OUTER;
            }
            $$ref = $line;
            $coderef->();
        }
    }
    if (length $overflow) {
        $$ref = $overflow;
        $coderef->();
    }
}

my $bufsize = shift;

open my $fh, "<", $0
    or die "could not open $0: $!";

my $count = 0;
line_by_line $fh, sub {
    $count++ if /lines/;
}, $bufsize;

print "$count\n";
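
If strace is not available, the read sizes can also be checked from Perl itself. The following is a rough sketch, not part of the function above: chunk_sizes is a hypothetical helper, and the 10,000-byte scratch file exists only to make the numbers easy to check.

```perl
use strict;
use warnings;
use File::Temp qw/tempfile/;

# Hypothetical helper: read a file with sysread and return the size of every chunk.
sub chunk_sizes {
    my ($path, $bufsize) = @_;
    open my $fh, '<:raw', $path or die "could not open $path: $!";
    my @sizes;
    while (1) {
        my $got = sysread $fh, my $buf, $bufsize;
        die "read error on $path: $!" unless defined $got;    # undef signals an error
        last if $got == 0;                                    # 0 signals end of file
        push @sizes, $got;
    }
    close $fh;
    return @sizes;
}

# Build a 10,000-byte scratch file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} 'x' x 10_000;
close $tmp_fh or die "could not close $tmp_path: $!";

my @sizes = chunk_sizes($tmp_path, 4 * 1024);
print scalar(@sizes), " reads: @sizes\n";
```

On a regular local file I would expect two full 4096-byte reads followed by a shorter final one, but sysread is allowed to return fewer bytes than requested, so the sketch only reports what it actually got. It also distinguishes a read error (undef) from end of file (0).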
