I admit significant bias toward liking PCRE regexps much better than emacs, if for no other reason than that when I type a '(' I pretty much always want a grouping operator. And, o
I made a few minor modifications to a Perl script I found on perlmonks (to take values from the command line) and saved it as re_pl2el.pl (given below). Then the following does a decent job of converting PCRE to elisp regexps, at least for the non-exotic cases that I tested.
(defun pcre-to-elre (regex)
  (interactive "MPCRE expression: ")
  (shell-command-to-string (concat "re_pl2el.pl -i -n "
                                   (shell-quote-argument regex))))
(pcre-to-elre "__\\w: \\d+") ;-> "__[[:word:]]: [[:digit:]]+"
It doesn't handle a few corner cases like Perl's non-greedy {N,M}? constructs, and of course not code execution etc., but it might serve your needs or be a good starting place. Since you like PCRE, I presume you know enough Perl to fix any cases you use often. If not, let me know and we can probably fix them.
I would be happier with a script that parsed the regex into an AST and then spat it back out in elisp format (since then it could spit it out in rx format too), but I couldn't find anything doing that and it seemed like a lot of work when I should be working on my thesis. :-) I find it hard to believe that no one has done it though.
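For what it's worth, here is a rough sketch of what the AST route might look like for a very small subset of PCRE. Everything in it is hypothetical: the function names (parse_pcre, to_elre, to_rx) and the nested-array node layout are made up for illustration. It only understands single-character literals, '.', \d \w \s, ( ) groups, | and the * + ? quantifiers. Anything else (bracket classes, anchors, {N,M}, non-greedy quantifiers, \t-style escapes) makes it die rather than guess.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sketch only: tiny PCRE subset -> AST -> elisp regexp or rx form.
my %CLASS = (d => 'digit', w => 'word', s => 'space');

# alternation: seq ('|' seq)*
sub parse_alt {
    my ($toks) = @_;
    my @branches = (parse_seq($toks));
    while (@$toks and $toks->[0] eq '|') {
        shift @$toks;
        push @branches, parse_seq($toks);
    }
    return @branches == 1 ? $branches[0] : ['or', @branches];
}

# sequence: (atom quantifier?)*
sub parse_seq {
    my ($toks) = @_;
    my @items;
    while (@$toks and $toks->[0] ne '|' and $toks->[0] ne ')') {
        my $t = shift @$toks;
        my $node;
        if ($t eq '(') {
            $node = ['group', parse_alt($toks)];
            die "missing ')'\n" unless @$toks and shift(@$toks) eq ')';
        }
        elsif ($t eq '.')                 { $node = ['any'] }
        elsif ($t =~ /^\\([dws])$/)       { $node = ['class', $CLASS{$1}] }
        elsif ($t =~ /^\\(\W)$/)          { $node = ['lit', $1] }  # escaped punctuation
        elsif ($t =~ /^[\\\[\]{}^\$*+?]/) { die "unsupported construct '$t'\n" }
        else                              { $node = ['lit', $t] }
        $node = ['rep', shift(@$toks), $node]
            if @$toks and $toks->[0] =~ /^[*+?]$/;
        push @items, $node;
    }
    return @items == 1 ? $items[0] : ['seq', @items];
}

sub parse_pcre {
    my ($re) = @_;
    my @toks = $re =~ /\\.|./gs;    # a token is an escape pair or one character
    my $ast = parse_alt(\@toks);
    die "unexpected ')'\n" if @toks;
    return $ast;
}

# Escape only the characters that are special in an Emacs regexp.
sub el_quote {
    my ($c) = @_;
    return $c =~ /^[.*+?\[^\$\\]$/ ? "\\$c" : $c;
}

sub to_elre {
    my ($n) = @_;
    my ($type, @kids) = @$n;
    return el_quote($kids[0])                      if $type eq 'lit';
    return '[[:' . $kids[0] . ':]]'                if $type eq 'class';
    return '.'                                     if $type eq 'any';
    return join('', map { to_elre($_) } @kids)     if $type eq 'seq';
    return join('\\|', map { to_elre($_) } @kids)  if $type eq 'or';
    return '\\(' . to_elre($kids[0]) . '\\)'       if $type eq 'group';
    return to_elre($kids[1]) . $kids[0]            if $type eq 'rep';
    die "unknown node type '$type'\n";
}

# Quote a literal as an elisp string for use inside an rx form.
sub rx_string {
    my ($s) = @_;
    $s =~ s/(["\\])/\\$1/g;
    return qq("$s");
}

sub to_rx {
    my ($n) = @_;
    my ($type, @kids) = @$n;
    return rx_string($kids[0])                               if $type eq 'lit';
    return $kids[0]                                          if $type eq 'class';
    return 'nonl'                                            if $type eq 'any';
    return '(seq ' . join(' ', map { to_rx($_) } @kids) . ')' if $type eq 'seq';
    return '(or '  . join(' ', map { to_rx($_) } @kids) . ')' if $type eq 'or';
    return '(group ' . to_rx($kids[0]) . ')'                 if $type eq 'group';
    return "($kids[0] " . to_rx($kids[1]) . ')'              if $type eq 'rep';
    die "unknown node type '$type'\n";
}

my $ast = parse_pcre('(a|b)\d+');
print to_elre($ast), "\n";   # \(a\|b\)[[:digit:]]+
print to_rx($ast), "\n";     # (seq (group (or "a" "b")) (+ digit))
```

The point of the intermediate tree is just that the two emitters share one parser. A real converter would need many more node types, and the rx output here is deliberately naive (one string per character).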
Below is my "improved" version of re_pl2el.pl. -i means don't double escape for strings, and -n means don't print a final newline.
#! /usr/bin/perl
#
# File: re_pl2el.pl
# Modified from http://perlmonks.org/?node_id=796020
#
# Description:
#
use strict;
use warnings;
# version 0.4
# TODO
# * wrap converter to function
# * testsuite
#--- flags
my $flag_interactive; # true => no extra escaping of backslashes
if ( int(@ARGV) >= 1 and $ARGV[0] eq '-i' ) {
    $flag_interactive = 1;
    shift @ARGV;
}
if ( int(@ARGV) >= 1 and $ARGV[0] eq '-n' ) {
    shift @ARGV;
} else {
    $\="\n";
}
if ( int(@ARGV) < 1 ) {
    print "usage: $0 [-i] [-n] REGEX";
    exit;
}
my $RE='\w*(a|b|c)\d\(';
$RE='\d{2,3}';
$RE='"(.*?)"';
$RE="\0".'\"\t(.*?)"';
$RE=$ARGV[0];
# print "Perlcode:\t $RE";
#--- encode all \0 chars as escape sequence
$RE=~s#\0#\\0#g;
#--- substitute pairs of backslashes with \0
$RE=~s#\\\\#\0#g;
#--- hide escape sequences of \t,\n,... with
# corresponding ascii code
my %ascii=(
    t => "\t",
    n => "\n"
);
my $kascii=join "|",keys %ascii;
$RE=~s#\\($kascii)#$ascii{$1}#g;
#--- normalize needless escaping
# e.g. from /\"/ to /"/, since it's no difference in perl
# but might confuse elisp
$RE=~s#\\"#"#g;
#--- toggle escaping of 'backslash constructs'
my $bsc='(){}|';
$RE=~s#[$bsc]#\\$&#g; # escape them once
$RE=~s#\\\\##g; # and erase double-escaping
#--- replace character classes
my %charclass=(
    w => 'word' , # TODO: emacs22 already knows \w ???
    d => 'digit',
    s => 'space'
);
my $kc=join "|",keys %charclass;
$RE=~s#\\($kc)#[[:$charclass{$1}:]]#g;
#--- unhide pairs of backslashes
$RE=~s#\0#\\\\#g;
#--- escaping for elisp string
unless ($flag_interactive){
    $RE=~s#\\#\\\\#g; # ... backslashes
    $RE=~s#"#\\"#g; # ... quotes
}
#--- unhide escape sequences of \t,\n,...
my %rascii= reverse %ascii;
my $vascii=join "|",keys %rascii;
$RE=~s#($vascii)#\\$rascii{$1}#g;
# print "Elispcode:\t $RE";
print "$RE";
#TODO whats the elisp syntax for \0 ???
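The TODO list at the top of the script mentions wrapping the converter in a function and adding a test suite. A minimal sketch of how that could look is below. The function name pcre_to_elre and its second argument are my own invention here, not part of the perlmonks script. The substitution steps are meant to mirror the ones above, with the second argument playing the role of -i.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper: same substitution steps as the script, as a function.
# $interactive true => no extra escaping for elisp string syntax (like -i).
sub pcre_to_elre {
    my ($re, $interactive) = @_;
    my %ascii     = (t => "\t", n => "\n");
    my %rascii    = reverse %ascii;
    my %charclass = (w => 'word', d => 'digit', s => 'space');

    $re =~ s#\0#\\0#g;                          # encode NUL chars as \0
    $re =~ s#\\\\#\0#g;                         # hide pairs of backslashes
    $re =~ s#\\([tn])#$ascii{$1}#g;             # hide \t and \n
    $re =~ s#\\"#"#g;                           # drop needless escaping of quotes
    $re =~ s#([(){}|])#\\$1#g;                  # escape grouping constructs once
    $re =~ s#\\\\##g;                           # ... and erase double escaping
    $re =~ s#\\([wds])#[[:$charclass{$1}:]]#g;  # replace character classes
    $re =~ s#\0#\\\\#g;                         # unhide pairs of backslashes
    unless ($interactive) {
        $re =~ s#\\#\\\\#g;                     # escape backslashes for a string
        $re =~ s#"#\\"#g;                       # escape quotes for a string
    }
    $re =~ s#([\t\n])#\\$rascii{$1}#g;          # unhide \t and \n
    return $re;
}

# A few checks, including the example from the elisp snippet.
my @cases = (
    [ '__\w: \d+',       1, '__[[:word:]]: [[:digit:]]+' ],
    [ '\w*(a|b|c)\d\(',  1, '[[:word:]]*\(a\|b\|c\)[[:digit:]](' ],
    [ '\d{2,3}',         0, '[[:digit:]]\\\\{2,3\\\\}' ],
);
for my $case (@cases) {
    my ($in, $interactive, $want) = @$case;
    my $got = pcre_to_elre($in, $interactive);
    print(($got eq $want ? "ok" : "NOT ok"), "  $in  =>  $got\n");
}
```

Note that the expected value in the last case is written with four backslashes because Perl's single-quoted strings collapse each \\ into one. The converter's actual output there contains two backslashes in front of each brace.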