How do I access the captures within a match?

梦谈多话 2021-01-20 11:35

I am trying to parse a csv file, and I am trying to access a named regex in proto regex in Perl6. It turns out to be Nil. What is the proper way to do it?

gra


      
      
        
攒了一身酷 2021-01-20 12:14

Detailed discussion complementing Christoph's answer

I am trying to parse a csv file

Perhaps you are focused on learning Perl 6 parsing and are writing some throwaway code. But if you want industrial strength CSV parsing out of the box, please be aware of the Text::CSV modules^[1].

I am trying to access a named regex

If you are learning Perl 6 parsing, please be aware of jnthn's grammar tracer and debugger^[2].

in proto regex in Perl6

Your issue is unrelated to it being a proto regex.
Instead the issue is that, while the match object corresponding to your named capture is stored in the overall match object you stored in $m1, it is not stored precisely where you are looking for it.

Where do match objects corresponding to captures appear?

To see what's going on, I'll start by simulating what you were trying to do. I'll use a regex that declares just one capture, a "named" (aka "Associative") capture that matches the string ab. The name named-capture is a placeholder chosen for this example.

given 'ab'
{
    my $m1 = m/ $<named-capture> = ( ab ) /;

    say $m1<named-capture>;
    # ｢ab｣
}

The match object corresponding to the named capture is stored where you'd presumably expect it to appear within $m1, at $m1<named-capture>.
But you were getting Nil when you looked up your own named capture directly on $m1 in the same way. What gives?

Why your named lookup on $m1 did not work

There are two types of capture: named (aka "Associative") and numbered (aka "Positional"). The parens you wrote in your regex that surrounded your named regex introduced a numbered capture:

given 'ab'
{
    my $m1 = m/ ( $<named-capture> = ( ab ) ) /; # extra parens added

    say $m1[0]<named-capture>;
    # ｢ab｣
}

The parens in / ( ... ) / declare a single top level numbered capture. If it matches, then the corresponding match object is stored in $m1[0]. (If your regex looked like / ... ( ... ) ... ( ... ) ... ( ... ) ... / then the match object for the second pair of parentheses would be stored in $m1[1], the one for the third pair in $m1[2], and so on.)
The match result for $<named-capture> = ( ab ) is then stored inside $m1[0]. That's why say $m1[0]<named-capture> works.
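
To illustrate the numbered-capture indexing described above, here is a minimal sketch. The input string abc and the variable $m are made up for illustration, and the regex is applied with ~~ instead of given so that $m stays in scope:

```raku
# Three top-level pairs of parentheses declare three numbered captures.
# 'abc' and $m are illustrative and not taken from the original question.
my $m = 'abc' ~~ m/ (a) (b) (c) /;

say $m[0];            # ｢a｣
say $m[1];            # ｢b｣
say $m[2];            # ｢c｣
say $m.list.elems;    # 3
```

Each pair of parentheses gets the next index, counted from zero in the order the opening parens appear at that nesting level.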
So far so good. But this is only half the story...

Why a named lookup on $m1[0] in your code would not work either

While $m1[0]<named-capture> works in the code immediately above, the equivalent lookup on $m1[0] would still not give you a match object in your original code. This is because you also applied a * quantifier to the zeroth capture, which asks for multiple matches of it:

given 'ab'
{
    my $m1 = m/ ( $<named-capture> = ( ab ) )* /; # * is a quantifier

    say $m1[0][0]<named-capture>;
    # ｢ab｣
}

Because the * quantifier asks for multiple matches, Perl 6 writes a list of match objects into $m1[0]. In this case there's only one such match, so you end up with a list of length 1: there is a $m1[0][0], but no $m1[0][1], $m1[0][2], etc.
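
To see the list grow, the same regex can be run against a longer input. This is a small sketch under assumptions: the input abab and the capture name named-capture are illustrative, and the match is made with ~~ instead of given:

```raku
# 'abab' is an illustrative input; the quantified group now matches twice,
# so $m1[0] holds a list of two match objects.
my $m1 = 'abab' ~~ m/ ( $<named-capture> = ( ab ) )* /;

say $m1[0].elems;                    # 2
say $m1[0][0]<named-capture>;        # ｢ab｣
say $m1[0][1]<named-capture>;        # ｢ab｣
say $m1[0][1]<named-capture>.from;   # 2 (start offset of the second match)
```

The first index selects the numbered capture, the second selects one repetition of it, and only then does the named lookup apply.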
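
Summary

captures nest;

a capture quantified by either * or + corresponds to two levels of nesting, not just one.

In your original code, you'd have to index twice before looking up the named capture, i.e. write say $m1[0][0]<name>; (with the name of your named regex in place of name) to get to the match object you're looking for.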
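
Putting the pieces together in a grammar, here is a rough sketch. The grammar Demo, its field regex and the input a,b, are hypothetical and made up for illustration; this is not the grammar from the question:

```raku
# Hypothetical grammar for illustration only, not the asker's grammar.
grammar Demo {
    # The parens add a numbered capture, and * turns it into a list.
    regex TOP   { ( <field> ',' )* }
    regex field { <-[,]>+ }
}

my $m = Demo.parse('a,b,');

say $m<field>;         # Nil (the named capture is not at the top level)
say $m[0][0]<field>;   # ｢a｣
say $m[0][1]<field>;   # ｢b｣
```

If the numbered capture is not needed, square brackets group without capturing. With [ <field> ',' ]* in place of the parenthesized group, the quantified matches should be collected as a list directly under $m<field>.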



^[1] Install relevant modules and write use Text::CSV; (for a pure Perl 6 implementation) or use Text::CSV:from<Perl5>; (for a Perl 5 plus XS implementation) at the start of your code. (talk slides (click on top word, e.g. "csv", to advance through slides), video, Perl 6 module, Perl 5 XS module.)
^[2] Install relevant modules and write use Grammar::Tracer; or use Grammar::Debugger; at the start of your code. (talk slides, video, modules.)
    
             
                                                        
            
            
              
                