Parse single-quoted string using Marpa::R2 (Perl)

安稳与你 submitted on 2020-01-04 06:03:10

Question


How do I parse a single-quoted string using Marpa::R2? In the code below, a single-quoted string comes back with '\' added when it is parsed.

Code:

use strict;
use Marpa::R2;
use Data::Dumper;


my $grammar = Marpa::R2::Scanless::G->new(
   {  default_action => '[values]',
      source         => \(<<'END_OF_SOURCE'),
  lexeme default = latm => 1

:start ::= Expression

# include begin

Expression ::= Param
Param ::= Unquoted                                         
        | ('"') Quoted ('"') 
        | (') Quoted (')

:discard      ~ whitespace 
whitespace    ~ [\s]+

Unquoted      ~ [^\s\/\(\),&:\"~]+
Quoted        ~ [^\s&:\"~]+

END_OF_SOURCE
   });

my $input1 = 'foo';
#my $input2 = '"foo"';
#my $input3 = '\'foo\'';

my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });

print "Trying to parse:\n$input1\n\n";
$recce->read(\$input1);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);

Outputs:

Trying to parse:
foo

Output:
$VAR1 = [
          [
            'foo'
          ]
        ];

Trying to parse:
"foo"

Output:
$VAR1 = [
          [
            'foo'
          ]
        ];

Trying to parse:
'foo'

Output:
$VAR1 = [
          [
            '\'foo\''
          ]
        ]; (don't want it to be parsed like this)

Above are the outputs for all three inputs. I don't want the third one to come back with the '\' and the single quotes added; I want it to be parsed like the second output. Please advise.

Ideally, it should just pick the content between single quotes according to Param ::= (') Quoted (')


Answer 1:


The other answer regarding Data::Dumper output is correct. However, your grammar does not work the way you expect it to.

When you parse the input 'foo', Marpa will consider the three Param alternatives. The predicted lexemes at that position are:

  • Unquoted ~ [^\s\/\(\),&:\"~]+
  • '"'
  • ') Quoted ('

Yes, the last is literally ) Quoted (, not anything containing a single quote.

Even if the rule were ([']) Quoted ([']), longest token matching means the Unquoted lexeme would still match the entire input, including the single quotes.
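The longest-token point can be checked without Marpa. The sketch below is only an approximation: it uses a plain Perl regex built from the question's Unquoted character class rather than Marpa's lexer, and the variable names are illustrative. It shows that the class does not exclude the single quote, so a greedy match from the start of 'foo' consumes the whole input.

```perl
use strict;
use warnings;

# Character class copied from the question's Unquoted lexeme.
# Note that the single quote is not among the excluded characters.
my $unquoted = qr/[^\s\/\(\),&:\"~]+/;

my $input = q{'foo'};

# Greedily match as much as possible from the start of the input,
# which is roughly what longest token matching does for one lexeme.
my ($token) = $input =~ /^($unquoted)/;

printf "Unquoted matched %d of %d characters: %s\n",
    length $token, length $input, $token;
```

Since the match covers all five characters, no alternative that begins with a one-character quote lexeme can be longer.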

What would happen for an input like " foo " (with double quotes)? Now only the '"' lexeme matches, then any whitespace is discarded, then the Quoted lexeme matches, then any whitespace is discarded again, and finally the closing " is matched.

To prevent this whitespace-skipping behaviour and to prevent the Unquoted rule from being preferred due to LATM, it makes sense to describe quoted strings as lexemes. For example:

Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*

These lexemes will then include any quotes and escapes, so you need to post-process the lexeme contents. You can do this either with the event system (which is conceptually clean, but a bit cumbersome to implement) or by adding an action that performs this processing during parse evaluation.

Since lexemes cannot have actions, it is usually best to add a proxy production:

Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ::= Quoted_Lexeme action => process_quoted
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*

The action could then do something like:

sub process_quoted {
  my (undef, $s) = @_;
  # remove delimiters from double-quoted string
  return $1 if $s =~ /^"(.*)"$/s;
  # remove delimiters from single-quoted string
  return $1 if $s =~ /^'(.*)'$/s;
  die "String was not delimited with single or double quotes";
}
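Putting the grammar and the action together, a complete script might look like the sketch below. Several details are assumptions that do not come from the answer: the parse_param helper is hypothetical, the :default ::= action => ::first line is added so that Param passes its single child through, and the action name is resolved by passing semantics_package => 'main' to the recognizer. The intent is that foo, "foo" and 'foo' all yield the bare string foo.

```perl
use strict;
use warnings;
use Marpa::R2;

# Grammar from the answer, one rule per line. The :default line is an
# assumption so that Param returns the value of its only child.
my $grammar = Marpa::R2::Scanless::G->new({
    source => \(<<'END_OF_SOURCE'),
lexeme default = latm => 1
:default ::= action => ::first

:start ::= Param
Param  ::= Unquoted | Quoted
Quoted ::= Quoted_Lexeme action => process_quoted

Unquoted      ~ [^'"]+
Quoted_Lexeme ~ DQ | SQ
DQ            ~ '"' DQ_Body '"'
DQ_Body       ~ [^"]*
SQ            ~ ['] SQ_Body [']
SQ_Body       ~ [^']*
END_OF_SOURCE
});

# Action from the answer: strip the surrounding delimiters.
sub process_quoted {
    my (undef, $s) = @_;
    return $1 if $s =~ /^"(.*)"$/s;
    return $1 if $s =~ /^'(.*)'$/s;
    die "String was not delimited with single or double quotes";
}

# Hypothetical helper: parse one input and return the resulting value.
sub parse_param {
    my ($input) = @_;
    my $recce = Marpa::R2::Scanless::R->new({
        grammar           => $grammar,
        semantics_package => 'main',    # where process_quoted is defined
    });
    $recce->read(\$input);
    my $value_ref = $recce->value;
    die "No parse for input: $input" if not defined $value_ref;
    return ${$value_ref};
}

print parse_param($_), "\n" for q{foo}, q{"foo"}, q{'foo'};
```

Because the quotes are part of the Quoted_Lexeme token itself, whitespace inside them is kept rather than discarded between lexemes.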



Answer 2:


Your result doesn't contain \', it contains '. Dumper merely formats the result like that so it's clear what's inside the string and what isn't.

You can test this behavior for yourself:

use Data::Dumper;

my $tick = chr(39);
my $back = chr(92);

print "Tick Dumper: " . Dumper($tick);
print "Tick Print:  " . $tick . "\n";
print "Backslash Dumper: " . Dumper($back);
print "Backslash Print:  " . $back . "\n";

You can see a demo here: https://ideone.com/d1V8OE

If you don't want the output to contain single quotes, you'll probably need to remove them from the input yourself.
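Another way to confirm that the backslashes come from Dumper's formatting is to inspect the string directly, or to ask Dumper for double-quoted output. The sketch below uses only the documented $Data::Dumper::Terse and $Data::Dumper::Useqq settings; the variable names are illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

my $value = q{'foo'};    # five characters: quote, f, o, o, quote

$Data::Dumper::Terse = 1;    # omit the "$VAR1 = " prefix

# Default style: single-quoted output, so embedded single quotes are escaped.
my $single_quoted = Dumper($value);

# Useqq style: double-quoted output, so single quotes need no escaping.
$Data::Dumper::Useqq = 1;
my $double_quoted = Dumper($value);

print "length:  ", length($value), "\n";
print "default: $single_quoted";
print "Useqq:   $double_quoted";
print "contains a backslash: ", ($value =~ /\\/ ? "yes" : "no"), "\n";
```

The length stays at five and the backslash check reports "no", so the string itself holds only the quotes and foo.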




Answer 3:


I am not very familiar with Marpa::R2, but you could try using an action on the Expression rule:

Expression ::= Param action => strip_quotes

Then, implement a simple quote stripper like:

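sub MyActions::strip_quotes {
    # $_[1] is the array ref that the default '[values]' action built for Param
    # (resolving this name assumes semantics_package => 'MyActions' on the recognizer)
    return $_[1][0] =~ s/^'|'$//gr;
}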
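One thing to be aware of with this substitution is that it removes a leading or a trailing quote independently, so an unbalanced input is also changed. The sketch below compares it with a stricter variant; strip_any_quote and strip_balanced_quotes are hypothetical names, and the /r flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Same substitution as strip_quotes above; /r returns a modified copy.
sub strip_any_quote {
    my ($s) = @_;
    return $s =~ s/^'|'$//gr;
}

# Hypothetical stricter variant: strip only when both quotes are present.
sub strip_balanced_quotes {
    my ($s) = @_;
    return $s =~ /^'(.*)'$/s ? $1 : $s;
}

for my $s (q{'foo'}, q{'foo}, q{it's}) {
    printf "%-6s any: %-5s balanced: %s\n",
        $s, strip_any_quote($s), strip_balanced_quotes($s);
}
```

Which behaviour is preferable depends on whether a lone quote should count as part of the value.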


Source: https://stackoverflow.com/questions/50108574/parse-single-quoted-string-using-marpar2-perl
