only taking certain values from a list in perl

血红的双手。 提交于 2019-12-12 13:42:10

问题


First I will describe what I have, then the problem.

I have a text file that is structured as such

----------- Start of file-----
<!-->
name,name2,ignore,name4,jojobjim,name3,name6,name9,pop
-->
<csv counter="1">
1,2,3,1,6,8,2,8,2,
2,6,5,1,5,8,7,7,9,
1,4,3,1,2,8,9,3,4,
4,1,6,1,5,6,5,2,9
</csv>
-------- END OF FILE-----------

I also have a perl program that has a map:

 my %column_mapping = (
"name" => 'name',
"name1" => 'name_1',
"name2" => 'name_2',
"name3" => 'name_3',
"name4" => 'name_4',
"name5" => 'name_5',
"name6" => 'name_6',
"name7" => 'name_7',
"name9" => 'name_9',
)

My dynamic insert statement (assume I connected to database proper, and headers is my array of header names, such as test1, test2, ect)

my $sql = sprintf 'INSERT INTO tablename ( %s ) VALUES ( %s )',
    join( ',', map { $column_mapping{$_} } @headers ),
    join( ',', ('?') x scalar @headers ); 

my $sth = $dbh->prepare($sql);

Now for the problem I am actually having: I need a way to only do an insert on the headers and for the values that are in the map. In the data file given as an exmaple, there are several names that are not in the map, is there a way I can ignore them and the numbers associated with them in the csv section?

basically to make a subset csv, to turn it into:

name,name2,name4,name3,name6,name9,
 1,2,1,8,2,8,
 2,6,1,8,7,7,
 1,4,1,8,9,3,
 4,1,1,6,5,2,

so that my insert statment will only insert the ones in the map. The data file is always different, and are not in same order, and an unknown amount will be in the map.

Ideally a efficient way to do this, since this script will be going through thousands of files, and each files behind millions of lines of the csv with hundreds of columns.

It is just a text file being read though, not a csv, not sure if csv libraries can work in this scenario or not.


回答1:


You would typically put the set of valid indices in a list and use array slices after that.

@valid = grep { defined($column_mapping{ $headers[$_] }) } 0 .. $#headers;

...

my $sql = sprintf 'INSERT INTO tablename ( %s ) VALUES ( %s )',
  join( ',', map { $column_mapping{$_} } @headers[@valid] ),
  join( ',', ('?') x scalar @valid);
my $sth = $dbh->prepare($sql);

...

my @row = split /,/, <INPUT>; 
$sth->execute( @row[@valid] );

...



回答2:


Because this is about four different questions in one, I'm going to take a higher level approach to the broad set of problems and leave the programming details to you (or you can ask new questions about the details).

I would get the data format changed as quickly as possible. Mixing CSV columns into an XML file is bizarre and inefficient, as I'm sure you're aware. Use a CSV file for bulk data. Use an XML file for complicated metadata.

Having the headers be an XML comment is worse, now you're parsing comments; comments are supposed to be ignored. If you must retain the mixed XML/CSV format put the headers into a proper XML tag. Otherwise what's the point of using XML?

Since you're going to be parsing a large file, use an XML SAX parser. Unlike a more traditional DOM parser which must parse the whole document before doing anything, a SAX parser will process it as it reads the file. This will save a lot of memory. I leave SAX processing as an exercise, start with XML::SAX::Intro.

Within the SAX parser, extract the data from the <csv> and use a CSV parser on that. Text::CSV_XS is a good choice. It is efficient and has solved all the problems of parsing CSV data you are likely to run into.

When you finally have it down to a Text::CSV_XS object, call getline_hr in a loop to get the rows as hashes, apply your mapping, and insert into your database. @mob's solution is fine, but I would go with SQL::Abstract to generate the SQL rather than doing it by hand. This will protect against both SQL injection attacks as well as more mundane things like the headers containing SQL meta characters and reserved words.

It's important to separate the processing of the parsed data from the parsing of the data. I'm quite sure that hideous data format will change, either for the worse or the better, and you don't want to tie the code to it.



来源:https://stackoverflow.com/questions/31841529/only-taking-certain-values-from-a-list-in-perl

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!