Elisp mechanism for converting PCRE regexps to emacs regexps

傲寒 2021-01-31 18:17

I admit significant bias toward liking PCRE regexps much better than Emacs regexps, if for no other reason than that when I type a '(' I pretty much always want a grouping operator. And, o

4 answers
  •  情书的邮戳
    2021-01-31 18:50

    I made a few minor modifications to a Perl script I found on PerlMonks (so that it takes its input from the command line) and saved it as re_pl2el.pl (given below). The following then does a decent job of converting PCRE to elisp regexps, at least for the non-exotic cases that I tested.

    (defun pcre-to-elre (regex)
      (interactive "MPCRE expression: ")
      (shell-command-to-string (concat "re_pl2el.pl -i -n "
                                       (shell-quote-argument regex))))
    
    (pcre-to-elre "__\\w: \\d+") ;-> "__[[:word:]]: [[:digit:]]+"
    

    It doesn't handle a few corner cases, such as Perl's non-greedy {N,M}? quantifiers, and of course not code execution and the like, but it might serve your needs or be a good starting point. Since you like PCRE, I presume you know enough Perl to fix any cases you use often. If not, let me know and we can probably fix them.
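
    As one example of patching in a missing case, the negated classes \W, \D and \S could be handled next to \w, \d and \s. This is only a sketch: the helper name replace_classes and the extended table are assumptions, not part of re_pl2el.pl, and it leaves out the hiding of backslash pairs that the full script performs before this step.

```perl
use strict;
use warnings;

# Assumed extension of the script's %charclass table: each Perl class maps
# to an Emacs bracket expression, negated classes to a negated one.
my %charclass = (
    w => '[[:word:]]',  d => '[[:digit:]]',  s => '[[:space:]]',
    W => '[^[:word:]]', D => '[^[:digit:]]', S => '[^[:space:]]',
);

# Hypothetical helper: replace \w, \d, \s, \W, \D, \S in a regexp string.
sub replace_classes {
    my ($re) = @_;
    my $kc = join "", keys %charclass;    # e.g. "wdsWDS", order is irrelevant
    $re =~ s#\\([$kc])#$charclass{$1}#g;
    return $re;
}

print replace_classes('\S+=\D*'), "\n";   # prints [^[:space:]]+=[^[:digit:]]*
```

    Like the original table, this does not treat classes that appear inside a Perl bracket expression (for example [\d.]) specially, so those would still need to be fixed by hand.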

    I would be happier with a script that parsed the regex into an AST and then spat it back out in elisp format (since it could then emit rx format too), but I couldn't find anything that does that, and it seemed like a lot of work when I should be working on my thesis. :-) I find it hard to believe that no one has done it, though.

    Below is my "improved" version of re_pl2el.pl. -i means don't add the extra escaping needed for an elisp string literal, and -n means don't print a final newline.

    #! /usr/bin/perl
    #
    # File: re_pl2el.pl
    # Modified from http://perlmonks.org/?node_id=796020
    #
    # Description: convert a Perl regexp (first argument) to an
    #              Emacs Lisp regexp and print it
    use strict;
    use warnings;
    
    # version 0.4
    
    
    # TODO
    # * wrap converter to function
    # * testsuite
    
    #--- flags
    my $flag_interactive; # true => no extra escaping of backslashes
    if ( int(@ARGV) >= 1 and $ARGV[0] eq '-i' ) {
        $flag_interactive = 1;
        shift @ARGV;
    }
    
    if ( int(@ARGV) >= 1 and $ARGV[0] eq '-n' ) {
        shift @ARGV;
    } else {
        $\="\n";
    }
    
    if ( int(@ARGV) < 1 ) {
        print "usage: $0 [-i] [-n] REGEX";
        exit;
    }
    
    # sample inputs used during development:
    #   '\w*(a|b|c)\d\('   '\d{2,3}'   '"(.*?)"'
    my $RE=$ARGV[0];
    
    # print "Perlcode:\t $RE";
    
    #--- encode all \0 chars as escape sequence
    $RE=~s#\0#\\0#g;
    
    #--- substitute pairs of backslashes with \0
    $RE=~s#\\\\#\0#g;
    
    #--- hide escape sequences of \t,\n,... with
    #    corresponding ascii code
    my %ascii=(
           t =>"\t",
           n=> "\n"
          );
    my $kascii=join "|",keys %ascii;
    
    $RE=~s#\\($kascii)#$ascii{$1}#g;
    
    
    #--- normalize needless escaping
    # e.g. from /\"/ to /"/, since there is no difference in perl
    # but it might confuse elisp
    
    $RE=~s#\\"#"#g;
    
    #--- toggle escaping of 'backslash constructs'
    my $bsc='(){}|';
    $RE=~s#[$bsc]#\\$&#g;  # escape them once
    $RE=~s#\\\\##g;        # and erase double-escaping
    
    
    
    #--- replace character classes
    my %charclass=(
            w => 'word' ,   # TODO: emacs22 already knows \w ???
            d => 'digit',
            s => 'space'
           );
    
    my $kc=join "|",keys %charclass;
    $RE=~s#\\($kc)#[[:$charclass{$1}:]]#g;
    
    
    
    #--- unhide pairs of backslashes
    $RE=~s#\0#\\\\#g;
    
    #--- escaping for elisp string
    unless ($flag_interactive){
      $RE=~s#\\#\\\\#g; # ... backslashes
      $RE=~s#"#\\"#g;   # ... quotes
    }
    
    #--- unhide escape sequences of \t,\n,...
    my %rascii= reverse %ascii;
    my $vascii=join "|",keys %rascii;
    $RE=~s#($vascii)#\\$rascii{$1}#g;
    
    # print "Elispcode:\t $RE";
    print "$RE";
    #TODO whats the elisp syntax for \0 ???
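
    The two TODO items at the top of the script (wrap the converter in a function, add a test suite) could be approached roughly as follows. The function name pcre_to_elre and its second argument are assumptions made for this sketch; the substitution steps are taken from the script above. A true second argument means "escape for an elisp string literal", i.e. the behaviour you get without -i.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution steps of re_pl2el.pl.
sub pcre_to_elre {
    my ($re, $for_string) = @_;

    $re =~ s#\0#\\0#g;                      # encode literal NUL chars
    $re =~ s#\\\\#\0#g;                     # hide pairs of backslashes as NUL

    my %ascii  = (t => "\t", n => "\n");    # hide \t and \n as real chars
    my $kascii = join "|", keys %ascii;
    $re =~ s#\\($kascii)#$ascii{$1}#g;

    $re =~ s#\\"#"#g;                       # \" and " mean the same in perl

    $re =~ s#([(){}|])#\\$1#g;              # toggle escaping: escape once ...
    $re =~ s#\\\\##g;                       # ... then erase double escapes

    my %charclass = (w => 'word', d => 'digit', s => 'space');
    my $kc = join "|", keys %charclass;
    $re =~ s#\\($kc)#[[:$charclass{$1}:]]#g;

    $re =~ s#\0#\\\\#g;                     # unhide pairs of backslashes

    if ($for_string) {                      # escaping for an elisp string
        $re =~ s#\\#\\\\#g;
        $re =~ s#"#\\"#g;
    }

    my %rascii = reverse %ascii;            # unhide \t and \n
    my $vascii = join "|", keys %rascii;
    $re =~ s#($vascii)#\\$rascii{$1}#g;

    return $re;
}

print pcre_to_elre('__\w: \d+'), "\n";      # __[[:word:]]: [[:digit:]]+
print pcre_to_elre('(a|b)\d'), "\n";        # \(a\|b\)[[:digit:]]
print pcre_to_elre('(a|b)\d', 1), "\n";     # \\(a\\|b\\)[[:digit:]]
```

    With the logic in a function, each regexp you care about can become a one-line check (input, expected output), which is about as small as a test suite for this gets.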
    
