Elisp mechanism for converting PCRE regexps to emacs regexps

前端未结

关注

 4  1870

傲寒 2021-01-31 18:17

I admit significant bias toward liking PCRE regexps much better than emacs, if no no other reason that when I type a \'(\' I pretty much always want a grouping operator. And, o

4条回答

情书的邮戳 (楼主)

2021-01-31 18:50

I made a few minor modifications to a perl script I found on perlmonks (to take values from the command line) and saved it as re_pl2el.pl (given below). Then the following does a decent job of converting PCRE to elisp regexps, at least for non-exotic the cases that I tested.

(defun pcre-to-elre (regex)
  (interactive "MPCRE expression: ")
  (shell-command-to-string (concat "re_pl2el.pl -i -n "
                                   (shell-quote-argument regex))))

(pcre-to-elre "__\\w: \\d+") ;-> "__[[:word:]]: [[:digit:]]+"

It doesn't handle a few "corner" cases like perl's shy {N,M}? constructs, and of course not code execution etc. but it might serve your needs or be a good starting place for such. Since you like PCRE I presume you know enough perl to fix any cases you use often. If not let me know and we can probably fix them.

I would be happier with a script that parsed the regex into an AST and then spit it back out in elisp format (since then it could spit it out in rx format too), but I couldn't find anything doing that and it seemed like a lot of work when I should be working on my thesis. :-) I find it hard to believe that noone has done it though.

Below is my "improved" version of re_pl2el.pl. -i means don't double escape for strings, and -n means don't print a final newline.

#! /usr/bin/perl
#
# File: re_pl2el.pl
# Modified from http://perlmonks.org/?node_id=796020
#
# Description:
#
use strict;
use warnings;

# version 0.4


# TODO
# * wrap converter to function
# * testsuite

#--- flags
my $flag_interactive; # true => no extra escaping of backslashes
if ( int(@ARGV) >= 1 and $ARGV[0] eq '-i' ) {
    $flag_interactive = 1;
    shift @ARGV;
}

if ( int(@ARGV) >= 1 and $ARGV[0] eq '-n' ) {
    shift @ARGV;
} else {
    $\="\n";
}

if ( int(@ARGV) < 1 ) {
    print "usage: $0 [-i] [-n] REGEX";
    exit;
}

my $RE='\w*(a|b|c)\d\(';
$RE='\d{2,3}';
$RE='"(.*?)"';
$RE="\0".'\"\t(.*?)"';
$RE=$ARGV[0];

# print "Perlcode:\t $RE";

#--- encode all \0 chars as escape sequence
$RE=~s#\0#\\0#g;

#--- substitute pairs of backslashes with \0
$RE=~s#\\\\#\0#g;

#--- hide escape sequences of \t,\n,... with
#    corresponding ascii code
my %ascii=(
       t =>"\t",
       n=> "\n"
      );
my $kascii=join "|",keys %ascii;

$RE=~s#\\($kascii)#$ascii{$1}#g;


#---  normalize needless escaping
# e.g.  from /\"/ to /"/, since it's no difference in perl
# but might confuse elisp

$RE=~s#\\"#"#g;

#--- toggle escaping of 'backslash constructs'
my $bsc='(){}|';
$RE=~s#[$bsc]#\\$&#g;  # escape them once
$RE=~s#\\\\##g;        # and erase double-escaping



#--- replace character classes
my %charclass=(
        w => 'word' ,   # TODO: emacs22 already knows \w ???
        d => 'digit',
        s => 'space'
       );

my $kc=join "|",keys %charclass;
$RE=~s#\\($kc)#[[:$charclass{$1}:]]#g;



#--- unhide pairs of backslashes
$RE=~s#\0#\\\\#g;

#--- escaping for elisp string
unless ($flag_interactive){
  $RE=~s#\\#\\\\#g; # ... backslashes
  $RE=~s#"#\\"#g;   # ... quotes
}

#--- unhide escape sequences of \t,\n,...
my %rascii= reverse %ascii;
my $vascii=join "|",keys %rascii;
$RE=~s#($vascii)#\\$rascii{$1}#g;

# print "Elispcode:\t $RE";
print "$RE";
#TODO whats the elisp syntax for \0 ???

0 讨论(0)

查看其它4个回答