Split files based on file content and pattern matching

执笔经年 2020-12-15 22:45

I need your help formatting a txt file using bash/Linux. The file looks like the following: it always has a line of the form Rate: Sth, followed by the details in the v

6 answers
  • 2020-12-15 22:47

    I'd do this in perl:

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
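    # Until the first Rate: line appears, write to STDOUT.
    open(my $out, '>&', \*STDOUT) or die "Cannot dup STDOUT: $!";
    
    while (<>)
    {
        if (/^Rate: (\w+)/)
        {
            # Start a new output file named after the rate.
            close($out);
            open($out, '>', $1) or die "Cannot open $1: $!";
            next;
        }
    
        print $out $_;
    }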
    

    Use it like

    perl ./test.pl input.txt
    
  • 2020-12-15 22:49

    A one-liner inspired by sehe's answer:

    perl -pwe '
      if (/^Rate: (.+)/) {
        open $out, ">", "Rate_$1.txt" or die $!;
        select $out;
      }' gasdata.txt
    

    The -p option will read a line and print it after the code in -e is evaluated. select will choose a default filehandle for print. So, basically, what we are doing is simply juggling the filehandle around, depending on which Rate is currently the active one.

    Here's the code deparsed:

    >perl -MO=Deparse -pwe 'if (/^Rate: (.+)/) { open $out, ">", "output/Rate_$1.txt" or die $!; select $out; }' gasdata.txt
    BEGIN { $^W = 1; }
    LINE: while (defined($_ = <ARGV>)) {
        if (/^Rate: (.+)/) {
            die $! unless open $out, '>', "output/Rate_$1.txt";
            select $out;
        }
    }
    continue {
        die "-p destination: $!\n" unless print $_;
    }
    -e syntax OK
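
    The same filehandle juggling can be sketched in plain POSIX shell, which may help if Perl is not available. This is only an illustration: the function name split_by_rate is made up, and the Rate_<name>.txt naming is borrowed from the one-liner above. File descriptor 3 plays the role of the selected handle.

```shell
# Sketch only: split_by_rate is a hypothetical helper, not part of the answer.
split_by_rate() {
    exec 3>&1                          # fd 3 starts out as a copy of stdout
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "Rate: "*)
                # A new section starts: point fd 3 at that section's file.
                exec 3>"Rate_${line#Rate: }.txt"
                ;;
        esac
        printf '%s\n' "$line" >&3      # write to whichever file is current
    done < "$1"
    exec 3>&-                          # close the last output file
}
```

    Calling split_by_rate gasdata.txt should leave one Rate_<name>.txt per section, each starting with its own Rate: line, just as the -p loop prints the matching line after select has switched handles.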
    
  • 2020-12-15 22:52

    This might work for you:

    csplit -z -f temp -b '%02d.txt' file '/^Rate:/' '{*}'
    

    This will produce files temp00.txt, temp01.txt...

    If you only want to keep the Rate line in each file:

    sed -i '/Rate/!d' temp*.txt
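
    csplit numbers its pieces rather than naming them after the rate. If named files are wanted, one possible follow-up is to rename each piece from its first line. The sketch below assumes GNU csplit, run before any sed -i cleanup; the function name name_pieces_by_rate and the Rate_<name>.txt scheme are made up for illustration.

```shell
# Sketch only: name_pieces_by_rate is a hypothetical helper; assumes GNU csplit.
name_pieces_by_rate() {
    # -s silences the byte counts; the pattern and the repeat count are
    # quoted so the shell hands them to csplit untouched.
    csplit -s -z -f temp -b '%02d.txt' "$1" '/^Rate:/' '{*}'
    for piece in temp[0-9]*.txt; do
        [ -e "$piece" ] || continue
        IFS= read -r first < "$piece" || true
        case $first in
            # Pieces that start with a Rate: line are renamed after it.
            "Rate: "*) mv -- "$piece" "Rate_${first#Rate: }.txt" ;;
        esac
    done
}
```

    Any piece that does not begin with a Rate: line (for example, text that precedes the first section) keeps its tempNN.txt name.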
    
  • 2020-12-15 22:56

    (g)awk to the rescue:

    awk '/^Rate:/ {output_file_name=$2; getline } 
         { print $0 >> ( output_file_name ) }' INPUT_FILE
    

    The first rule runs only for lines that start with Rate:. It sets the output file name and then uses getline to fetch the next line from the input file. That next line is processed by the second rule and written to the output file. Every following line is handled by the second rule alone (and written to the output file), as long as it does not match Rate:.

    NOTE: The above solution might fail if the input file contains two consecutive Rate: lines, like this:

    ... DATA ...
    Rate: GBP
    Rate: CHF
    ... DATA ...
    

    The command above should do, assuming that the line numbers are not part of the original file.
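
    A possible way around the consecutive Rate: case, offered as a sketch rather than as this answer's own fix: with getline, the second Rate: line is consumed as data and written to the first rate's file. Using next instead lets every line be tested against the Rate: rule. The function name split_rates and the variable out are illustrative only.

```shell
# Sketch only: split_rates is a hypothetical wrapper around a next-based variant.
split_rates() {
    awk '
        # Header line: remember the new file name and skip to the next record,
        # so a following Rate: line is again treated as a header.
        /^Rate:/   { if (out != "") close(out); out = $2; next }
        # Data line: append to the current file once a header has been seen.
        out != ""  { print >> out }
    ' "$1"
}
```

    With the GBP/CHF input shown in the note, this should write the data lines to CHF and create no GBP file at all, since GBP has no data of its own. Lines before the first Rate: line are dropped.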

    HTH

  • 2020-12-15 22:56

    You can use something like this in perl -

    Perl Script:

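    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    undef $/;          # slurp mode: read the whole input at once
    $_ = <>;
    my $n = 0;
    
    # Split before every "Rate" so each chunk starts with its Rate line.
    for my $match (split(/(?=Rate)/)) {
        open(my $fh, '>', 'temp' . ++$n) or die "Cannot open temp$n: $!";
        print $fh $match;
        close($fh);
    }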
    

    Execution:

    [jaypal~/temp]$ ./spl.pl temp.file
    
    [jaypal~/temp]$ cat temp.file
    Line No. Main Text
    1    Rate: GBP
    2    12/01/1999,90.5911501,Validated
         .....
         .....
    210  18/01/1999,90.954996,Validated
    211  Rate: RMB
    212  24/04/2008,132.2542,Validated
         .....
    1000 25/04/2008,132.2279,Validated
    1001 28/04/2008,131.69915,Validated
    1002 Rate: USD
    1003 21/11/11,-0.004419534,Validated
    
    [jaypal~/temp]$ cat temp1
    Line No. Main Text
    1    
    
    [jaypal~/temp]$ cat temp2
    Rate: GBP
    2    12/01/1999,90.5911501,Validated
         .....
         .....
    210  18/01/1999,90.954996,Validated
    
    211  
    
    [jaypal~/temp]$ cat temp3
    Rate: RMB
    212  24/04/2008,132.2542,Validated
         .....
    1000 25/04/2008,132.2279,Validated
    1001 28/04/2008,131.69915,Validated
    
    1002 [jaypal~/temp]$ cat temp4
    Rate: USD
    1003 21/11/11,-0.004419534,Validated
    [jaypal~/temp]$ 
    
  • 2020-12-15 23:13

    Another solution: It just makes your input file into a script and then runs it:

    sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' input.txt | bash
    

    I assumed the line numbers are not part of the file.
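
    To see what actually gets executed, it can help to print the generated script before piping it to bash. The sketch below wraps the same sed command (it relies on GNU sed for \n in the replacement and the one-line $a form); the function name rate_script is made up for illustration.

```shell
# Sketch only: rate_script is a hypothetical helper that prints the script
# the sed command would hand to bash, without running it.
rate_script() {
    sed 's/^Rate:/cat <<EOF >/; 1!s/^cat <<EOF/EOF\n&/; $aEOF' "$1"
}
```

    For an input of Rate: GBP, a, Rate: RMB, b (one per line), rate_script input.txt should print cat <<EOF > GBP, a, EOF, cat <<EOF > RMB, b, EOF, and rate_script input.txt | bash then writes the files GBP and RMB. Because the EOF delimiter is unquoted, the shell expands any $ or backticks in the data lines, so this approach is best kept to plain data such as the sample shown in the question.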
