Sanitization of User-Supplied Regular Expressions in PHP

Posted by 雨燕双飞 on 2019-11-29 13:56:37

I think PHP itself will check the regex. Here's a sample script I made:

// check for input, and set max size of input
if(!empty($_POST['regex'])
    && !empty($_POST['text'])
    && strlen($_POST['regex'])<1000
    && strlen($_POST['text'])<2000
    ){
    // set script timeout in case something goes wrong
    // (on PHP older than 5.4, SAFE MODE must be OFF for this to work)
    $old_time=ini_get('max_execution_time');
    if(!set_time_limit(1)) die('Could not set the time limit'); // 1 sec is more than enough

    // trim input, it's up to you to do more checks
    $regex=trim($_POST['regex']);
    // don't trim the text, it can be needed
    $input=$_POST['text'];
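    // escape unescaped slashes so they cannot end the pattern early
    // (a slash preceded by an odd number of backslashes is already escaped)
    $regex=preg_replace('~(?<!\\\\)((?:\\\\\\\\)*)/~', '$1\\\\/', $regex);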

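    // go for the regex; preg_match() returns false only when the regex fails
    if(false!==($matched=@preg_match('/'.$regex.'/', $input, $matches))){
        // regex was tested, show results
        echo 'Matches: '.$matched.'<br />';
        if($matched>0){
            echo 'matches: <br />';
            foreach($matches as $i => $match){
                // escape the output, since the matched text comes from the user
                echo $i.' = '.htmlspecialchars($match).'<br />';
            }
        }
    }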
    // set back original execution time
    set_time_limit($old_time);
}

Anyway, NEVER EVER use eval() with user-submitted strings.

Additionally, you can do some simple minimalistic sanitizing, but that's up to you. ;)

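If you allow user-submitted values for preg_replace, make sure you disallow the e flag! With that flag the replacement string is evaluated as PHP code, so a malicious user could delete your entire site, or worse. (The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this applies to older installations.)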
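One way to keep unwanted flags out is to accept the pattern body and the flags as separate inputs and whitelist the flags. The sketch below assumes exactly that setup; the function name buildUserPattern and the allowed flag set (i, m, s, x, u) are illustrative choices, not something from the answers above.

```php
<?php
// Hypothetical helper: builds a delimited pattern from user input and
// accepts only whitelisted modifiers, so "e" can never get through.
function buildUserPattern(string $body, string $flags = ''): ?string
{
    // Reject any modifier outside the whitelist; "e" is deliberately absent.
    // The D modifier stops "$" from matching before a trailing newline.
    if (preg_match('/^[imsxu]*$/D', $flags) !== 1) {
        return null;
    }
    // Reject NUL bytes defensively; they have no place in a pattern body here.
    if (strpos($body, "\0") !== false) {
        return null;
    }
    // Escape every "/" that is not already escaped, so the body cannot
    // close the delimiter early and append its own modifiers.
    $body = preg_replace('~(?<!\\\\)((?:\\\\\\\\)*)/~', '$1\\\\/', $body);

    return '/' . $body . '/' . $flags;
}

var_dump(buildUserPattern('a/b', 'i')); // the pattern /a\/b/i
var_dump(buildUserPattern('abc', 'e')); // NULL, because "e" is not allowed
```

The returned string can still be an invalid regex (for example when the body ends in a lone backslash), so it should be passed through preg_match() and checked for a false return before use.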

Otherwise, the worst thing that can happen is what the other answers already point out. Set a low script timeout, and maybe you should even make sure that the page can only be called X times per minute.
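The "X times per minute" idea can be sketched as a sliding window over request timestamps. The function name checkRateLimit, the default limit of 10 calls and the 60-second window are assumptions for illustration; where the timestamps are stored (session, cache, database) is left open.

```php
<?php
// Hypothetical helper: sliding-window limiter over a list of Unix timestamps.
// Returns [allowed, updatedTimestamps]; the caller stores the updated list.
function checkRateLimit(array $timestamps, int $now, int $maxCalls = 10, int $window = 60): array
{
    // Keep only the calls that fall inside the current window.
    $recent = array_values(array_filter(
        $timestamps,
        function ($t) use ($now, $window) {
            return $t > $now - $window;
        }
    ));

    if (count($recent) >= $maxCalls) {
        return [false, $recent]; // over the limit: reject, do not record
    }

    $recent[] = $now;            // under the limit: record this call
    return [true, $recent];
}

// Usage sketch with a session as storage (one possible choice):
// session_start();
// [$ok, $_SESSION['calls']] = checkRateLimit($_SESSION['calls'] ?? [], time());
// if (!$ok) { http_response_code(429); exit; }
```

A session-based store is easy to bypass by dropping the cookie, so keying the list by client address in a shared store may be the more robust variant.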

The only problem I can think of is that someone can DoS you by entering a bad regex (one whose matching time is O(2^n) or O(n!) in the input length, or whatever), and the easiest way to prevent this might be to set a short page timeout.
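Besides the page timeout, PCRE has its own brake: the pcre.backtrack_limit ini setting. The sketch below lowers it for a single match and treats any failure as a rejection. The helper name limitedMatch and the limit of 10000 are assumptions, and the JIT is switched off here only to keep the behaviour predictable.

```php
<?php
// Hypothetical helper: runs preg_match() under a reduced backtrack limit.
// Returns 0 or 1 like preg_match(), or null if the match could not complete.
function limitedMatch(string $pattern, string $subject, int $backtrackLimit = 10000): ?int
{
    // ini_set() returns the previous value, which is restored afterwards.
    $oldLimit = ini_set('pcre.backtrack_limit', (string) $backtrackLimit);
    $oldJit   = ini_set('pcre.jit', '0'); // assumption: interpreter mode is easier to reason about

    $result = @preg_match($pattern, $subject);

    if ($oldLimit !== false) {
        ini_set('pcre.backtrack_limit', $oldLimit);
    }
    if ($oldJit !== false) {
        ini_set('pcre.jit', $oldJit);
    }

    return $result === false ? null : $result;
}

// A pattern with nested alternatives that backtracks exponentially on this input.
$result = limitedMatch('/(?:\D+|<\d+>)*[!?]/', 'foobar foobar foobar');

if ($result === null) {
    // preg_last_error() distinguishes a runtime failure from other errors.
    echo preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR
        ? "Rejected: backtrack limit exceeded\n"
        : "Rejected: regex error\n";
} else {
    echo "Matches: $result\n";
}
```

Unlike a script timeout, this fails the single match and lets the rest of the request continue, so an error message can still be shown to the user.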

If the regex is being stored in a database, you should use whatever method you would normally use to escape the data, such as prepared statements.
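A minimal sketch of the prepared-statement approach, assuming PDO with the SQLite driver is available; the in-memory database and the patterns table are made up for the example.

```php
<?php
// Sketch only: an in-memory SQLite database and a hypothetical "patterns" table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE patterns (id INTEGER PRIMARY KEY, regex TEXT NOT NULL)');

// A pattern containing a quote, which would break naive string concatenation.
$userRegex = '^O\'Reilly\s+\d+$';

// The value is bound as a parameter, so it is never parsed as SQL.
$insert = $pdo->prepare('INSERT INTO patterns (regex) VALUES (:regex)');
$insert->execute([':regex' => $userRegex]);

// The stored value comes back byte for byte.
$stored = $pdo->query('SELECT regex FROM patterns')->fetchColumn();
var_dump($stored === $userRegex);
```

This only protects the database layer; the stored pattern is still untrusted when it is later handed to a preg function.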

Otherwise, my only concern is that the user could supply a malicious regex, in the sense of a mischievously complex one, and I'm not sure there is a way to check for that.

One thought is that you could make your regex evaluator entirely client-side by doing it in JS, but there are inconsistencies between PHP's preg functions and JavaScript's regex functions.

AFAIK there are no "vulnerabilities" when trying to evaluate user-supplied regexps. The worst thing that could possibly happen is, like erik points out, a DoS attack or a fatal error within your script.

I'm afraid you won't be able (even theoretically) to "sanitize" every possible regexp out there. The best you can do is to check for lexical and/or syntactic errors.
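A syntax check of that kind can be done by letting PCRE try to compile the pattern against an empty subject. The helper name isCompilableRegex is hypothetical; the sketch relies on preg_match() returning false when a pattern fails to compile.

```php
<?php
// Hypothetical helper: returns true if PCRE can compile the full pattern
// (delimiters and modifiers included).
function isCompilableRegex(string $pattern): bool
{
    // preg_match() returns false (and raises a warning, silenced here)
    // when the pattern does not compile; 0 or 1 means it compiled.
    return @preg_match($pattern, '') !== false;
}

var_dump(isCompilableRegex('/^[a-z]+$/'));  // bool(true)
var_dump(isCompilableRegex('/(unclosed/')); // bool(false)
```

This catches malformed patterns only; a pattern that compiles can still be pathologically slow on some inputs.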
