I understand that both Java and Perl try quite hard to find a one-size-fits-all default buffer size when reading files, but I find their choices increasingly antiquated.
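Warning: the following code has only been lightly tested. The code below is a first shot at a function that lets you process a file line by line (hence the function name) with a user-definable buffer size. It takes up to four arguments:

- a filehandle to read from (defaults to STDIN)
- the buffer size in bytes (defaults to 4*1024)
- a reference to the scalar that receives each line (defaults to $_)
- an anonymous subroutine to call for each line (defaults to one that prints the line)

The arguments are positional, with the exception that the anonymous subroutine may always be the last argument: you can pass it after any number of the other arguments, and the ones you leave out keep their defaults. Any arguments after the subroutine are ignored. Lines are auto-chomped.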
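To make the positional rule concrete, here is a minimal sketch of the same argument resolution pulled out into a standalone helper. The name `resolve_args` and its return list are hypothetical, made up for illustration; it is not part of line_by_line. It returns the effective filehandle, buffer size, line reference and callback, so you can see which defaults survive a given call:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw/reftype/;

# Hypothetical helper (not part of line_by_line): resolves up to four
# positional arguments against their defaults. A code reference in any
# position is taken as the callback and ends argument parsing.
sub resolve_args {
    my @args = \(
        my $fh      = \*STDIN,
        my $bufsize = 4 * 1024,
        my $ref     = \my $target,
        my $coderef = sub { print "$_\n" },
    );
    die "bad number of arguments\n" if @_ > @args;
    for my $arg_val (@_) {
        my $type = reftype $arg_val;    # undef for plain scalars
        if (defined $type and $type eq 'CODE') {
            ${ $args[-1] } = $arg_val;  # the callback slot is always last
            last;                       # later arguments are ignored
        }
        my $slot = shift @args;         # next positional slot
        $$slot = $arg_val;
    }
    return ($fh, $bufsize, $ref, $coderef);
}

my $cb = sub { };
my (undef, $size_a) = resolve_args(\*STDIN, 64, $cb);   # buffer size 64
my (undef, $size_b) = resolve_args(\*STDIN, $cb, 64);   # 64 is dropped
print "$size_a $size_b\n";                              # prints "64 4096"
```

Note the second call: once a code reference is seen, parsing stops, so a buffer size placed after the subroutine is silently dropped and the 4*1024 default is used instead.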
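Probable bugs:

- it localizes the global $_, so it probably does not work with a lexical $_ (introduced in Perl 5.10): a callback compiled in the scope of a lexical $_ sees that variable, not the global one that line_by_line sets

You can see from an strace that it reads the file with the specified buffer size. If I like how testing goes, you may see this on CPAN soon.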
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw/reftype/;
use Carp;
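sub line_by_line {
    local $_;

    # Defaults, in positional order: filehandle, buffer size, reference
    # to the scalar that receives each line, and the callback.
    # \( ... ) returns a reference to each of these lexicals.
    my @args = \(
        my $fh      = \*STDIN,
        my $bufsize = 4*1024,
        my $ref     = \$_,
        my $coderef = sub { print "$_\n" },
    );
    croak "bad number of arguments" if @_ > @args;

    # Fill the slots in order; a code reference in any position becomes
    # the callback and ends argument parsing.
    for my $arg_val (@_) {
        my $type = reftype $arg_val;    # undef for non-references
        if (defined $type and $type eq "CODE") {
            ${$args[-1]} = $arg_val;
            last;
        }
        my $arg = shift @args;
        $$arg = $arg_val;
    }

    my $buf;
    my $overflow = '';    # partial line left over from the previous read
    my $read;
    OUTER:
    while ($read = sysread($fh, $buf, $bufsize)) {
        # The capturing group keeps each "\n" in the list, so
        # "a\nb" splits into ("a", "\n", "b"); a piece with no "\n"
        # after it is an incomplete line.
        my @lines = split /(\n)/, $buf;
        while (@lines) {
            my $line = $overflow . shift @lines;
            unless (defined $lines[0]) {
                # No newline yet: keep the fragment for the next read.
                $overflow = $line;
                next OUTER;
            }
            shift @lines;    # drop the "\n" (lines are auto-chomped)
            $overflow = '';
            $$ref = $line;
            $coderef->();
        }
    }
    # sysread returns 0 at end of file and undef on error
    croak "could not read: $!" unless defined $read;

    # A last line without a trailing newline is still passed on.
    if (length $overflow) {
        $$ref = $overflow;
        $coderef->();
    }
}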
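# Example: count the lines of this script that contain "lines",
# reading it with the buffer size given on the command line.
my $bufsize = shift || 4*1024;
open my $fh, "<", $0
    or die "could not open $0: $!";
my $count = 0;
# The callback goes last: any argument after it would be ignored.
line_by_line $fh, $bufsize, sub {
    $count++ if /lines/;
};
print "$count\n";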
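The strace shows the size of each read; a separate question is whether the lines that come out are the same for every buffer size, in particular when a line straddles two reads. As a hedged sketch, the script below uses two hypothetical helpers made up for this check: `chunks_of` cuts a string into pieces of at most N bytes, standing in for successive sysread calls, and `lines_from_chunks` applies the same split-with-capture and carry-over steps as line_by_line but returns the lines as a list instead of calling a callback.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: cuts $str into pieces of at most $size bytes,
# standing in for successive sysread calls.
sub chunks_of {
    my ($str, $size) = @_;
    return $str =~ /(.{1,$size})/gs;
}

# Hypothetical helper: the same split-with-capture and carry-over logic
# as line_by_line, but the chomped lines are returned as a list.
sub lines_from_chunks {
    my @out;
    my $overflow = '';    # fragment of a line not yet ended by "\n"
    for my $buf (@_) {
        my @pieces = split /(\n)/, $buf;
        while (@pieces) {
            my $line = $overflow . shift @pieces;
            if (@pieces) {           # a "\n" follows: the line is complete
                shift @pieces;       # drop the "\n"
                push @out, $line;
                $overflow = '';
            }
            else {                   # no newline yet: carry the fragment over
                $overflow = $line;
            }
        }
    }
    push @out, $overflow if length $overflow;    # unterminated last line
    return @out;
}

my $text = "first line\n\nthird line\nno newline at end";
for my $size (1, 3, 7, 4096) {
    # each iteration prints "SIZE: first line||third line|no newline at end"
    print "$size: ", join('|', lines_from_chunks(chunks_of($text, $size))), "\n";
}
```

Every size, from one byte per read up to a single read of the whole string, should produce the same four lines, including the empty second line and the unterminated last one.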