Edit: tchrist has informed me that my original accusations about Perl's insecurity are unfounded. However, the question still stands.
I know that i
User-supplied regex, or in general, user input, should never be treated as safe - regardless of the programming language. If your program fails to do so, it is vulnerable to attacks by deliberately crafted inputs.
In the case of regex, the attack can be ReDoS: Regular expression Denial of Service. Basically, it is a regex that consumes an excessive amount of CPU and memory to process.
For example, if you try to evaluate this regex
^(([a-z])+.)+[A-Z]([a-z])+$
on this input:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
you'll notice it may hang. This is called catastrophic backtracking. See it for yourself here: https://regex101.com/r/Qhn3Vb/1
Read more about Regex DoS: https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
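To see the growth for yourself without a browser, here is a minimal sketch in Python that times the same regex against progressively longer inputs of the same shape. The function name time_failed_match and the chosen lengths are my own, picked only for illustration; the timings depend on your machine, so none are shown here.

```python
import re
import time

# The regex from this answer, compiled once.
EVIL = re.compile(r"^(([a-z])+.)+[A-Z]([a-z])+$")


def time_failed_match(n):
    """Return (match, seconds) for EVIL against n 'a's followed by '!'."""
    text = "a" * n + "!"
    start = time.perf_counter()
    match = EVIL.match(text)  # never matches: the input has no upper-case letter
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # Lengths kept small on purpose; larger values get slow very quickly.
    for n in (10, 15, 20, 25):
        match, elapsed = time_failed_match(n)
        print(f"n={n:2d} match={match} time={elapsed:.4f}s")
```

The engine has to try every way of splitting the run of a's between the nested repetitions before it can conclude there is no match, which is why each extra character makes the failure noticeably slower.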
Bottom line: never assume user input is safe!
It's generally dynamic languages with an eval facility that tend to have the ability to execute code from regular expressions. In static languages (i.e. those requiring a separate compilation step) there is generally no way to execute code that wasn't compiled, so evaluating code from within a regex is impossible.
Without a way to embed code in a regex, the worst a user can do is write a regex that takes a long time to evaluate.
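As one illustration of an engine with no way to embed code, here is a small sketch using Python's re module, where a user-supplied pattern is only ever compiled as data. The helper name compile_user_pattern is hypothetical. Note that this guards against invalid patterns only; it does nothing about slow ones.

```python
import re


def compile_user_pattern(source):
    """Compile a user-supplied pattern; return None if it is not valid."""
    try:
        return re.compile(source)
    except re.error:
        # Syntax Python's re does not know, such as a Perl-style
        # (?{ ... }) code block, is rejected here rather than executed.
        return None
```

For example, compile_user_pattern("(?{ die 'naughty' })") returns None because Python's re treats (?{ as an unknown extension, while compile_user_pattern("a+b") returns a usable compiled pattern.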
This is not true: you cannot execute code callbacks in Perl by sneaking them in an evaluated regex. This is forbidden. You have to specifically override that with a lexically scoped
use re "eval";
if you expect to have both interpolation and code escapes happening in the same pattern.
Watch:
% perl -le '$x = "(?{ die 'naughty' })"; "aaa" =~ /$x/'
Eval-group not allowed at runtime, use re 'eval' in regex m/(?{ die naughty })/ at -e line 1.
Exit 255
% perl -Mre=eval -le '$x = "(?{ die 'naughty' })"; "aaa" =~ /$x/'
naughty at (re_eval 1) line 1.
Exit 255
In most languages, allowing users to supply regular expressions means that you allow for a denial-of-service attack.
Some types of regular expressions are extremely CPU-intensive to execute. So in general it's a bad idea to allow users to enter regular expressions that will be executed on a remote system.
For more info, read this page: http://www.regular-expressions.info/catastrophic.html
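If you do have to run user-supplied patterns, one common mitigation is to put a hard time limit on each evaluation. Python's re module has no timeout parameter, so the sketch below runs the match in a separate interpreter process and kills it when the limit is reached. The function name search_with_timeout and the default limit are assumptions of mine, and starting a process per match is expensive, so treat this as an illustration of the idea rather than a production design.

```python
import subprocess
import sys

# Child script: reads the pattern and the text from its arguments and
# exits with status 0 on a match, 1 otherwise.
_CHILD = (
    "import re, sys\n"
    "sys.exit(0 if re.search(sys.argv[1], sys.argv[2]) else 1)\n"
)


def search_with_timeout(pattern, text, seconds=2.0):
    """Return True or False for match or no match, None if the limit is hit.

    An invalid pattern makes the child exit with a non-zero status too,
    so it is reported as False here.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            timeout=seconds,  # the child is killed when this expires
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode == 0
```

A caller can then treat None as "pattern rejected" and move on, instead of leaving a worker pinned at 100% CPU.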
Regular expressions are a programming language. I don't think they're quite Turing-complete, but they're close enough that allowing your users to enter them into your web site IS allowing other people to run code on your server. QED, yes, it's a security hole.
You might be able to get away with allowing a subset of whatever regexp language you want to use, whitelist a particular set of constructs to make it a not-big-enough-to-sweat-over hole... other people have already mentioned the possible dooms of nesting and * . How much you're willing to let people load down your server is up to you. Personally, I'd be comfortable with letting 'em have one SQL "CONTAINS" statement and maybe a "BETWEEN()". :)
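To make the "allow only a subset" idea concrete, here is a rough Python sketch in which users get exactly one construct, a * wildcard, and everything else they type is matched literally. The function name compile_user_query and both limits are hypothetical values chosen for illustration. Capping the number of wildcards keeps the worst case small, but it is not a proof that every query is cheap.

```python
import re

MAX_QUERY_LENGTH = 64  # hypothetical limit
MAX_WILDCARDS = 2      # hypothetical limit


def compile_user_query(query):
    """Turn a restricted wildcard query into a compiled regex."""
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("query too long")
    parts = query.split("*")
    if len(parts) - 1 > MAX_WILDCARDS:
        raise ValueError("too many wildcards")
    # re.escape neutralises every regex metacharacter in the literal
    # parts, so groups and quantifiers cannot be smuggled in.
    return re.compile(".*?".join(re.escape(p) for p in parts), re.DOTALL)
```

With this, compile_user_query("foo*bar") finds "foo", then anything, then "bar", while a query such as "(a+)+$" is searched for as that literal string of characters instead of being interpreted as a nested repetition.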
I suspect Ruby would allow /#{system("rm -rf really_important_directory")}/ - is that the kind of thing you're worried about?