Parsing a text file with multiple columns

Question

I am attempting to extract each of the 11 columns in the following file:

http://bioinfo.mc.vanderbilt.edu/TSGene/Human_716_TSGs.txt

...into a list of scalars for a beginner-level college bioinformatics project. My attempt (see below) mostly works, but it is not reliable because the amount of whitespace varies between columns (see the top of the file for details).

use strict;
use warnings;

open FH, '<', 'tsg.txt' or die $!;
my $data = do {local $/; <FH>};
close FH or die $!;

my($id, $sym, $alias, $xref, $chromo, $band, $name, $gene_t, $desc, $nuc_seq,
   $pro_seq) = $data =~ /(\S+)\s+
                         (\S+)\s+
                         (\S+)\s+
                         (\S+)\s+
                         (\S+)\s+
                         (\S+)\s+

                         (\S+)\s+
                         /xms;

print "GeneID: $id", "\n";
print "Gene_symbol: $sym", "\n";
print "Alias: $alias", "\n";
print "XRef: $xref", "\n";
print "Chromosome: $chromo", "\n";
print "Cytoband: $band", "\n";

print "Full_name: $name", "\n";
#print "Gene_type: $gene_t", "\n";
#print "Description: $desc", "\n";
#print "Nucleotide_sequence: $nuc_seq", "\n";
#print "Protein_sequence: $pro_seq", "\n";

Thanks for the help.

Answer 1:

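This file looks like it's tab-separated, so you should be able to store the fields of each line in an array by splitting on \t. Do this one line at a time (here $line is a single chomped line) rather than on the whole slurped file in $data, otherwise the last field of one row runs into the first field of the next:

my @columns = split /\t/, $line;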

You can then access individual columns by index:

my $id = $columns[0];

etc.
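
Here is a minimal sketch of that line-by-line approach, assuming the file really is tab-separated with 11 fields per row. The column names come from the labels in the question's print statements. The parse_line helper and the sample row are hypothetical and only there for illustration. An in-memory filehandle stands in for the real file so the snippet runs on its own.

```perl
use strict;
use warnings;

# Column names taken from the labels printed in the question.
my @FIELDS = qw(GeneID Gene_symbol Alias XRef Chromosome Cytoband
                Full_name Gene_type Description
                Nucleotide_sequence Protein_sequence);

# Hypothetical helper: split one tab-separated line into a hash
# keyed by column name.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # strip an LF or CRLF line ending
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    my @columns = split /\t/, $line, -1;
    my %record;
    @record{@FIELDS} = @columns;           # hash slice: name => value
    return \%record;
}

# Made-up sample row (placeholder values, NOT real data from the file).
my $sample = join("\t", 'ID1', 'SYM1', 'alias one|alias two', 'XREF1',
                  '1', '1p1', 'example full name', 'type',
                  'some description', 'ACGT', 'MKV') . "\n";

# In-memory filehandle used here so the example is self-contained.
# For the downloaded file, use instead:
#   open my $fh, '<', 'tsg.txt' or die $!;
open my $fh, '<', \$sample or die "Cannot open in-memory data: $!";
while (my $line = <$fh>) {
    my $rec = parse_line($line);
    print "$_: $rec->{$_}\n" for @FIELDS;
}
close $fh or die $!;
```

Splitting on the tab alone means fields that contain spaces (a full gene name, for example) stay in one piece, which is where the \S+\s+ regex goes wrong. If the file starts with a header row, you would need to skip that first line before parsing.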

Source: https://stackoverflow.com/questions/17438782/parsing-a-text-file-with-multiple-columns

Tags

perl

bioinformatics
