How can I extract a string between matching braces in Perl?

再見小時候 2020-12-06 06:19

My input file is as follows:

HEADER 
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}

{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}

{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{         


        
7 answers
  • 2020-12-06 06:44

    I second ysth's suggestion to use the Text::Balanced module. A few lines will get you on your way.

    use strict;
    use warnings;
    use Text::Balanced qw/extract_multiple extract_bracketed/;
    
    my $file;
    open my $fileHandle, '<', 'file.txt' or die "Cannot open file.txt: $!";
    
    { 
      local $/ = undef; # or use File::Slurp
      $file = <$fileHandle>;
    }
    
    close $fileHandle;
    
    my @array = extract_multiple(
                                   $file,
                                   [ sub{extract_bracketed($_[0], '{}')},],
                                   undef,
                                   1
                                );
    
    print $_,"\n" foreach @array;
    

    OUTPUT

    {ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
    {ABC|*|DEF {GHI 0 2 0} {{Points {}}}}
    {ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}
    {ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}}
    {ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}}
    {
        ABC|*|XYZ:abc:pqr {GHI 0 68 0}
            {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}}
            }
    
  • 2020-12-06 06:45

    This can certainly be done with a regex, at least in modern versions of Perl:

    my @array = $str =~ /( \{ (?: [^{}]* | (?0) )* \} )/xg;
    
    print join "\n" => @array;
    

    The regex matches a curly-brace block whose contents are either non-brace characters or a recursion into the whole pattern, which is what matches the nested braces.

    Edit: the above code works in Perl 5.10+. For earlier versions, the recursion is a bit more verbose:

    my $re; $re = qr/ \{ (?: [^{}]* | (??{$re}) )* \} /x;
    
    my @array = $str =~ /$re/xg;
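
    For anyone who wants to try the recursive approach in isolation, here is a minimal sketch assuming Perl 5.10 or later. The sample text, the group name `block`, and the variable names are illustrative and not part of the original answer. The possessive `[^{}]++` is a swapped-in variation on `[^{}]*`, intended to limit backtracking when the input is unbalanced.

```perl
use strict;
use warnings;

# Illustrative sample modelled on the question's input (shortened).
my $str = <<'END';
HEADER
{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}

{ABC|*|XYZ:abc:def {GHI 0 22 0}
    {{Points {{F1 1.1} {F2 1.2}}}}}
END

# (?<block> ...) names the group and (?&block) recurses into it,
# so each nested {...} is matched by the same sub-pattern.
my $block_re = qr/
    (?<block>
        \{
        (?: [^{}]++ | (?&block) )*
        \}
    )
/x;

# In list context with /g, the match returns the captured group once per
# top-level block; text outside the braces (such as HEADER) is skipped.
my @blocks = $str =~ /$block_re/g;

print scalar(@blocks), " top-level blocks\n";
print "$_\n" for @blocks;
```

    Naming the group makes the recursion target explicit, which may be easier to maintain than `(?0)` if the pattern is later embedded in a larger regex.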
    
  • 2020-12-06 06:45

    You're much better off using a state machine than a regex for this type of parsing.
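
    As a rough sketch of what such a state machine might look like: the two states (`OUTSIDE`, `INSIDE`), the helper name `extract_blocks`, and the sample input are all hypothetical, chosen for illustration. It assumes braces are never quoted or escaped in the input.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the top-level {...} blocks found in $text.
sub extract_blocks {
    my ($text) = @_;
    my ($state, $depth, $current, @blocks) = ('OUTSIDE', 0, '');

    for my $char (split //, $text) {
        if ($state eq 'OUTSIDE') {
            next unless $char eq '{';      # ignore text between blocks
            ($state, $depth, $current) = ('INSIDE', 1, '{');
        }
        else {                             # state is INSIDE
            $current .= $char;
            $depth++ if $char eq '{';
            $depth-- if $char eq '}';
            if ($depth == 0) {             # outermost brace just closed
                push @blocks, $current;
                $state = 'OUTSIDE';
            }
        }
    }

    # Ending in the INSIDE state means at least one brace was never closed.
    die "Unbalanced braces: input ended at depth $depth\n"
        if $state eq 'INSIDE';

    return @blocks;
}

my @blocks = extract_blocks("HEADER\n{A {B 0 1} {{C {}}}}\n\n{D {E}}\n");
print "$_\n" for @blocks;
```

    Unlike a regex, this version can report where parsing stopped, and further states (for example, quoted strings) can be added without rewriting the rest.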

  • 2020-12-06 06:51

    Use Text::Balanced
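
    A minimal sketch of one way to use it, calling `extract_bracketed` in a loop. The sample text and variable names are made up for illustration. The third argument is a prefix pattern, used here to skip anything before the next opening brace; in list context the function returns the extracted block followed by the remaining text.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Illustrative input; not taken from the question's file.
my $rest = "HEADER\n{A {B}}\n{C}\ntrailer\n";
my @blocks;

while (1) {
    # Skip non-brace text, then extract one balanced {...} block.
    my ($block, $remainder) = extract_bracketed($rest, '{}', qr/[^{]*/);

    # On failure nothing is extracted, so stop.
    last unless defined $block && length $block;

    push @blocks, $block;
    $rest = $remainder;
}

print "$_\n" for @blocks;
```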

  • 2020-12-06 06:59

    Regular expressions are actually pretty bad at matching braces. Depending on how deep you want to go, you could write a full grammar for Parse::RecDescent (which is a lot easier than it sounds!). Or, if you just want to get the blocks, scan for opening '{' and closing '}' marks and keep count of how many are open at any given time.

  • 2020-12-06 07:03

    You can always count braces:

    my $depth = 0;
    my $out = "";
    my @list=();
    foreach my $fr (split(/([{}])/,$data)) {
        $out .= $fr;
        if($fr eq '{') {
            $depth ++;
        }
        elsif($fr eq '}') {
            $depth --;
            if($depth ==0) {
                $out =~ s/^.*?(\{.*\}).*$/$1/s; # trim text outside the outermost braces
                push @list, $out;
                $out = "";
            }
        }
    }
    print join("\n==================\n",@list);
    

    This is old, plain Perl style (and ugly, probably).
