Question
I have a txt file like this:
#Genera columnA columnB columnC columnD columnN
x1 1 3 7 0.9 2
x2 5 3 13 7 5
x3 0.1 0.8 7 1 0.4
and I want to extract a given set of columns. Suppose we want columnA, columnC and columnN (the input could be a matrix with 1, 2, 20, 100 or more columns). This is what I want to print out (this example has just 3 columns, but there could be more):
#Genera columnA columnC columnN
x1 1 7 2
x2 5 13 5
x3 0.1 7 0.4
This is what I have tried:
#!/usr/bin/perl
use strict;
use warnings;
my @wanted_fields = qw/columnA columnC columnN/;
open DATA, '<', "columns.txt" or die "cant open file\n";
my @datain = <DATA>;
close DATA;
my (@unit_name, $names, @lines, @conteo, @match_names, @columnas);
foreach (@datain){
if ($_=~ m/^$/g) { next; }
elsif ($_=~ m/#Genera/g) { $names= $_; }
else { push @lines, $_ }
}
@unit_name = split (/\t/, $names);
shift @unit_name;
my $count =0;
foreach (@wanted_fields){
my $unit_wanted =$_;
chomp $unit_wanted;
foreach (@unit_name){
if ($_ =~ m/$unit_wanted/g){
$count++;
push (@conteo, $count);
push (@match_names, $_);
}
}
}
foreach (@lines){
chomp;
@columnas = split (/\t/, $_);
#push @xx, $columnas[0][3];
}
I used the count to determine which column to extract, but in this case the number 2 does not correspond to columnC and 3 does not correspond to columnN. Is there a simple way to select any given columns? In this case I just want 3, but depending on the case it could be 1, 2, 5, 10, 100 or more columns.
Thanks
Answer 1:
You can simplify this by using hash slices.
#!/usr/bin/env perl
use strict;
use warnings;
my @wanted = ( '#Genera' , qw ( columnA columnC columnN ));
open my $input, '<', "file.txt" or die $!;
chomp ( my @header = split ' ', <$input> );
print join( "\t", @wanted ), "\n";
while ( <$input> ) {
my %row;
@row{@header} = split;
print join( "\t", @row{@wanted} ), "\n";
}
Which outputs:
#Genera columnA columnC columnN
x1 1 7 2
x2 5 13 5
x3 0.1 7 0.4
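To see the hash slice in isolation, here is a minimal sketch that applies it to one hard-coded row from the question's sample data. The helper name pick_columns is hypothetical and only used for illustration; it is not part of the program above.

```perl
use strict;
use warnings;

# Header and wanted names copied from the question's sample file.
my @header = split ' ', '#Genera columnA columnB columnC columnD columnN';
my @wanted = ( '#Genera', qw( columnA columnC columnN ) );

# Hypothetical helper: return the wanted fields of one input line.
sub pick_columns {
    my ($line) = @_;
    my %row;
    @row{@header} = split ' ', $line;   # hash slice: store every field under its header name
    return @row{@wanted};               # hash slice: read back only the wanted names, in order
}

# Prints the fields x2, 5, 13 and 5, separated by tabs.
print join( "\t", pick_columns('x2 5 3 13 7 5') ), "\n";
```

Because the values are looked up by name, the order of @wanted decides the order of the output columns, regardless of their position in the file.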
If you want to exactly match your indentation, add sprintf to the mix, e.g.:
print join( "\t", map { sprintf "%8s", $_ } @wanted ), "\n";
while ( <$input> ) {
my %row;
@row{@header} = split;
print join( "\t", map { sprintf "%8s", $_ } @row{@wanted} ), "\n";
}
Which then gives:
#Genera columnA columnC columnN
x1 1 7 2
x2 5 13 5
x3 0.1 7 0.4
Answer 2:
This program does as you ask. It expects the path to the input file as a parameter on the command line, which can then be read using the empty "diamond operator" <> without explicitly opening it.
Each non-blank line of the file is split into fields, and the header line is identified by its first field starting with a hash symbol #.
A call to map converts the @wanted_fields array into a list of indexes into @fields where those column headers appear, and stores it in the array @idx (with index 0 prepended so that the row label is kept).
This array is then used to slice the wanted columns from @fields for every line of input. The fields are printed, separated by tabs.
use strict;
use warnings 'all';
use List::Util 'first';
my @wanted_fields = qw/ columnA columnC columnN /;
my @idx;
while ( <> ) {
next unless /\S/;
my @fields = split;
if ( $fields[0] =~ /^#/ ) {
@idx = ( 0, map {
my $wanted = $_;
first { $fields[$_] eq $wanted } 0 .. $#fields;
} @wanted_fields );
}
print join( "\t", @fields[@idx] ), "\n" if @idx;
}
output
#Genera columnA columnC columnN
x1 1 7 2
x2 5 13 5
x3 0.1 7 0.4
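One case the program above does not check is a wanted name that is absent from the header: first then returns undef, which ends up being used as index 0 (with a warning). A minimal sketch of a stricter lookup follows, assuming that stopping with an error is acceptable. The helper name column_indexes is hypothetical, and the //= operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: map each wanted name to its position in the header
# and die if any name is not present.
sub column_indexes {
    my ( $fields, $wanted ) = @_;
    my %pos;
    $pos{ $fields->[$_] } //= $_ for 0 .. $#$fields;   # first occurrence of a name wins
    my @missing = grep { !exists $pos{$_} } @$wanted;
    die "Missing column(s): @missing\n" if @missing;
    return map { $pos{$_} } @$wanted;
}

my @header = split ' ', '#Genera columnA columnB columnC columnD columnN';
my @idx = ( 0, column_indexes( \@header, [qw( columnA columnC columnN )] ) );
print "@idx\n";   # prints: 0 1 3 5
```

The resulting @idx can be used in the same @fields[@idx] slice as in the program above.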
Answer 3:
There are command line switches that are used for this kind of application:
perl -lnae 'print join "\t", @F[1,3,5]' file.txt
The -a switch automatically creates the variable @F for each line, split on whitespace. So @F[1,3,5] is an array slice of elements 1, 3, and 5 (indexes are zero-based, so element 0 is the row label).
The downside of this, of course, is that you have to use the column numbers instead of the names.
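For readers unfamiliar with these switches, here is a rough hand-written sketch of what -l, -n and -a do for the one-liner. It is an approximation, not the exact code Perl generates; the helper name slice_line is hypothetical, and an in-memory sample stands in for file.txt so the script runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper holding the per-line work of the one-liner.
sub slice_line {
    my ($line) = @_;
    chomp $line;                    # -l: strip the input line ending
    my @F = split ' ', $line;       # -a: autosplit on whitespace into @F
    return join "\t", @F[1,3,5];    # the -e code: zero-based elements 1, 3 and 5
}

# Assumption: two lines of the question's sample data replace file.txt.
my $sample = "#Genera columnA columnB columnC columnD columnN\nx1 1 3 7 0.9 2\n";
open my $in, '<', \$sample or die $!;

{
    local $\ = "\n";                # -l: append a newline to every print
    while ( my $line = <$in> ) {    # -n: loop over the input lines
        print slice_line($line);
    }
}
```

To keep the row label in the output as well, add index 0 to the slice, i.e. @F[0,1,3,5].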
Source: https://stackoverflow.com/questions/47901798/extract-multiples-columns-from-txt-file-perl