Split files based on file content and pattern matching

自作多情 提交于 2019-12-18 03:17:22

问题


I need your help with formate a txt file using bash/linux. The file looks like the following, it always has a line called Rate: Sth then it follows with the details in the very specific format. I'd like to split the file up with one rate for each file. In this example, I'd like to have 3 file, and each has the corresponding line says what the Rate value was.

How will you approach this?

line No. Main Text
1    Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated
211  Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated
1002 Rate: USD
1003 21/11/11,-0.004419534,Validated

回答1:


This might work for you:

csplit -z -f 'temp' -b '%02d.txt' file /Rate/ {*}

This will produce files temp00.txt, temp01.txt...

If you only want the Rate line then;

sed -i '/Rate/!d' temp*.txt



回答2:


I'd do this in perl:

#!/usr/bin/perl

use strict;
use warnings;

open (my $out, ">-") or die "oops";

while(<>)
{
    if (m/^Rate: (\w+)/o)
    {
        close $out and open ($out, ">$1") or die "oops";
        next;
    }

    print $out $_
}

Use it like

perl ./test.pl input.txt



回答3:


A one-liner inspired by sehe's answer:

>perl -pwe '
> if (/^Rate: (.+)/) { 
>    open $out, ">", "Rate_$1.txt" or die $!; 
>    select $out; 
> }' gasdata.txt

The -p option will read a line and print it after the code in -e is evaluated. select will choose a default filehandle for print. So, basically, what we are doing is simply juggling the filehandle around, depending on which Rate is currently the active one.

Here's the code deparsed:

>perl -MO=Deparse -pwe 'if (/^Rate: (.+)/) { open $out, ">", "output/Rate_$1.txt" or die $!; select $out; }' gasdata.txt
BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
    if (/^Rate: (.+)/) {
        die $! unless open $out, '>', "output/Rate_$1.txt";
        select $out;
    }
}
continue {
    die "-p destination: $!\n" unless print $_;
}
-e syntax OK



回答4:


(g)awk to the rescue:

awk '/^Rate:/ {output_file_name=$2; getline } 
     { print $0 >> ( output_file_name ) }' INPUT_FILE

The first rule and command executes for the lines that starts with Rate: and only sets the output file name, then gets the next line from the input file. Then this next line is processed and gets written to the output file. After that the next line is processed by only the second command (gets written to the output file), but only if it not matches Rate:.

NOTE: The above solution might fail if there is a section in the input file with two continuous lines of Rate:s, like this:

... DATA ...
Rate: GBP
Rate: CHF
... DATA ...

should do (assuming that the line numbers are not part of the original file).

HTH




回答5:


Another solution: It just makes your input file into a script and then runs it:

sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' input.txt | bash

I assumed the line numbers are not part of the file.




回答6:


You can use something like this in perl -

Perl Script:

#!/usr/bin/perl

undef $/;
$_ = <>;
$n = 0;

for $match (split(/(?=Rate)/)) {
      open(O, '>temp' . ++$n);
      print O $match;
      close(O);
}

Execution:

[jaypal~/temp]$ ./spl.pl temp.file

[jaypal~/temp]$ **cat temp.file**
Line No. Main Text
1    Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated
211  Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated
1002 Rate: USD
1003 21/11/11,-0.004419534,Validated

[jaypal~/temp]$ cat temp1
Line No. Main Text
1    

[jaypal~/temp]$ cat temp2
Rate: GBP
2    12/01/1999,90.5911501,Validated
     .....
     .....
210  18/01/1999,90.954996,Validated

211  

[jaypal~/temp]$ cat temp3
Rate: RMB
212  24/04/2008,132.2542,Validated
     .....
1000 25/04/2008,132.2279,Validated
1001 28/04/2008,131.69915,Validated

1002 [jaypal~/temp]$ cat temp4
Rate: USD
1003 21/11/11,-0.004419534,Validated
[jaypal~/temp]$ 


来源:https://stackoverflow.com/questions/8272017/split-files-based-on-file-content-and-pattern-matching

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!