What native Perl code replaces `cut`?

Question

I'm learning Perl as I edit a Perl script, replacing POSIX OS calls with native Perl functions so that it runs cross-platform, including on Windows. This code has me stumped:

if (defined($OPTIONS)) {
    my ($method,$file) = ($1,$2);
    my $count = `cut -d\\  -f 2 $file | sort | uniq | wc -l`;
}

1) Where do $1 and $2 come from? This code is inside a function, but the function doesn't take any arguments. Also, the script itself parses over 70 named arguments, so they're not from the command line.

2) Since I don't know what $2 is, I'm not sure of the content of $file.

3) Whatever the content of $file, the cut command looks at the second field of each line, as delimited by a backslash.

4) It looks like the end result, $count, is the number of unique values that cut found.

Considering that $file could be quite large (a million lines, several hundred megabytes), what is the most efficient native Perl code to replace this external call and get the same $count value? "Efficient" is relative, though: this code is part of a tool chain in which other stages can run for 2 or 3 days, so it's not a problem if it takes 5 or 10 minutes on a large file.

Answer 1:

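$1, $2, etc. are Perl's built-in capture variables: they hold the text matched by the first, second, etc. capture groups of the most recent successful regex match. So some pattern with at least two capture groups matched before this block ran, either earlier in the same function or in the code that called it (these variables are dynamically scoped, so a sub sees its caller's last match).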
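A minimal sketch of how $1 and $2 get their values. The "<method>:<file>" format of $OPTIONS and the pattern below are hypothetical, since the real script isn't shown; the point is only that a successful match with two capture groups sets $1 and $2, and they keep those values until the next successful match (or the end of the enclosing block). A failed match does not reset them, which is why copying them immediately is safer.

```perl
use strict;
use warnings;

# Hypothetical option string "<method>:<file>"; the real script's format is unknown.
my $OPTIONS = 'uniq:/tmp/data.txt';

my ( $method, $file );
if ( $OPTIONS =~ /^(\w+):(.+)$/ ) {
    # The successful match set $1 and $2 to the two capture groups;
    # copy them at once, before another match can replace them.
    ( $method, $file ) = ( $1, $2 );
}

print "method=$method file=$file\n";    # prints "method=uniq file=/tmp/data.txt"
```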

This should do what you want. It uses a hash to record every distinct value in the second column and, once the whole file has been read, sets $count to the number of keys. It is likely to be somewhat faster than the external cut | sort | uniq | wc -l pipeline, since it never sorts the data, and it only needs memory for the distinct values, not for every line. Note that it's untested, as I'm not near a system with Perl at present.

I hope there's something more in the real version of this code, as the only effect this has is to change the values of a couple of local variables which are discarded at the end of the block.

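if ( defined $OPTIONS ) {
    my ($method, $file) = ($1, $2);
    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
    my %count;
    while ( my $line = <$fh> ) {
        chomp $line;    # otherwise "x\n" and "x" count as two different values
        # Perl reduces \\ to \ inside backticks, so cut got -d' ' (a space delimiter)
        my @fields = split / /, $line, 3;
        # Like cut, fall back to the whole line when it has no delimiter
        ++$count{ @fields > 1 ? $fields[1] : $line };
    }
    my $count = keys %count;
}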
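The same hash approach can be wrapped in a self-contained sub and tried on an in-memory string before pointing it at a real file. The sub name count_unique_field2 is hypothetical. It splits on a single space, on the assumption that the pipeline's delimiter is a space (Perl turns the \\ inside the backticks into \, and the shell reads the backslash-space as an escaped space); use /\\/ instead if the data really is backslash-delimited.

```perl
use strict;
use warnings;

# Count the distinct values of field 2 in the lines read from $fh, mimicking
#   cut -d' ' -f 2 FILE | sort | uniq | wc -l
# The space delimiter is an assumption (see the lead-in).
sub count_unique_field2 {
    my ($fh) = @_;
    my %seen;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @fields = split / /, $line, 3;
        # cut prints a line that has no delimiter unchanged
        $seen{ @fields > 1 ? $fields[1] : $line } = 1;
    }
    return scalar keys %seen;
}

# Usage with an in-memory file; a real file works the same way:
#   open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
my $data = "a x 1\nb y 2\nc x 3\nd y\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $n = count_unique_field2($fh);
print "$n\n";    # prints 2
```

The last sample line shows why chomp matters: without it, the "y" from "d y\n" would be stored as "y\n" and counted as a third value. Memory use grows with the number of distinct values, not with the number of lines.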

Answer 2:

$1 and $2 were set earlier, by a successful regex match; there's no telling how, where or why without more code. The command itself can be broken down as follows:

my $count = `cut -d\\  -f 2 $file | sort | uniq | wc -l`;

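-d sets the field delimiter and -f 2 tells cut to extract the second field (the text between the first and second delimiters). Be careful about which delimiter that is, though. Backticks interpolate like a double-quoted Perl string, so Perl first turns \\ into a single \ and the shell receives cut -d\ followed by two spaces. The shell reads the backslash-space as an escaped, literal space, so the delimiter cut actually gets is a single space, not a backslash. (Typed directly at a shell prompt, -d\\ would set the delimiter to a backslash.)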
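The quoting can be checked from Perl itself: backticks interpolate like a double-quoted string, so putting the same text in a double-quoted string shows exactly what is handed to the shell. The file name data.txt is hypothetical.

```perl
use strict;
use warnings;

my $file = 'data.txt';    # hypothetical file name
# Same text as inside the backticks; "..." interpolates exactly like `...`
my $cmd = "cut -d\\  -f 2 $file | sort | uniq | wc -l";
print "$cmd\n";    # prints: cut -d\  -f 2 data.txt | sort | uniq | wc -l
```

Only one backslash survives, followed by two spaces: the first space is escaped (it becomes the delimiter) and the second separates the arguments.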

Example, run directly in bash (there is no Perl layer here, so \\ really does give a backslash delimiter):

cut -d\\ -f 2 <<< 'FIELD 1\FIELD2\THE_REMAINDER'

Result

FIELD2
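For comparison, the same single-line extraction in Perl, using the backslash delimiter from this shell example. Note that Perl's split returns a 0-based list, so cut's field 2 is index [1].

```perl
use strict;
use warnings;

# Single quotes keep each backslash literal, matching the shell example's input
my $line = 'FIELD 1\FIELD2\THE_REMAINDER';
# /\\/ is a regex for one literal backslash; [1] is cut's field 2
my $field2 = ( split /\\/, $line )[1];
print "$field2\n";    # prints FIELD2
```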

The remaining commands that go through the pipes are as follows:

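sort orders the extracted fields (ascending, by default), which puts identical values on adjacent lines.

uniq removes adjacent duplicate lines; because its input is sorted, that removes every duplicate.

wc -l counts the lines that remain, i.e. the number of distinct values. Because the command runs in backticks, $count receives wc's output as text, including a trailing newline (and, on some systems, leading spaces).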

So to replicate this without relying on Unix tools, you need to reproduce each of those steps in Perl. That shouldn't be hard, so I have omitted it. Feel free to update your question with what you have tried; I'm sure you'll get plenty of help, as this is a pretty interesting challenge IMHO.
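A sketch of that omitted part, translating each stage of the pipeline literally into Perl. It assumes the delimiter is a single space (what the shell receives from the Perl backticks; change / / to /\\/ if the data is really backslash-delimited), and the function name pipeline_count is hypothetical. Collecting and sorting every field costs more time and memory than the hash in Answer 1, so this version is mainly useful for seeing how the stages correspond.

```perl
use strict;
use warnings;

# Literal, stage-by-stage translation of:
#   cut -d' ' -f 2 FILE | sort | uniq | wc -l
sub pipeline_count {
    my ($fh) = @_;

    # cut -d' ' -f 2: second space-separated field (whole line if no space)
    my @fields;
    while ( my $line = <$fh> ) {
        chomp $line;
        my @f = split / /, $line, 3;
        push @fields, @f > 1 ? $f[1] : $line;
    }

    # sort: ascending string order, so equal values end up adjacent
    my @sorted = sort @fields;

    # uniq: drop a value when it equals the one just before it
    my @unique;
    for my $value (@sorted) {
        push @unique, $value unless @unique && $unique[-1] eq $value;
    }

    # wc -l: number of remaining lines
    return scalar @unique;
}

my $data = "a x 1\nb y 2\nc x 3\nd y\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $n = pipeline_count($fh);
print "$n\n";    # prints 2
```

Unlike the shell version, the result is already a plain number, with no trailing newline or padding to strip.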

Source: https://stackoverflow.com/questions/28446016/what-native-perl-code-replaces-cut

Tags

perl

cross-platform