Python eval: is it still dangerous if I disable builtins and attribute access?

甜味超标 2020-12-02 12:38

We all know that eval is dangerous, even if you hide dangerous functions, because you can use Python's introspection features to dig down into things and re-extract them. F

6 answers
  • 2020-12-02 13:03

    Here is a safe_eval example which ensures that the evaluated expression does not contain unsafe nodes. It does not take the literal_eval approach of interpreting the AST; instead it whitelists the AST node types and calls the real eval if the expression passes the check.

    # license: MIT (C) tardyp
    import ast
    import unittest
    
    
    def safe_eval(expr, variables):
        """
        Safely evaluate a string containing a Python expression.
        The string may only consist of the following Python literal
        structures: strings, numbers, tuples, lists, dicts, booleans, and None,
        plus names found in `variables`. Safe operators are allowed
        (and, or, ==, !=, not, +, -, ^, %, in, is).
        """
        _safe_names = {'None': None, 'True': True, 'False': False}
        # 'Constant' covers literals on Python 3.8+, where ast.parse no longer
        # emits 'Num' and 'Str' nodes.
        _safe_nodes = [
            'Add', 'And', 'BinOp', 'BitAnd', 'BitOr', 'BitXor', 'BoolOp',
            'Compare', 'Constant', 'Dict', 'Eq', 'Expr', 'Expression', 'For',
            'Gt', 'GtE', 'Is', 'In', 'IsNot', 'LShift', 'List',
            'Load', 'Lt', 'LtE', 'Mod', 'Name', 'Not', 'NotEq', 'NotIn',
            'Num', 'Or', 'RShift', 'Set', 'Slice', 'Str', 'Sub',
            'Tuple', 'UAdd', 'USub', 'UnaryOp', 'boolop', 'cmpop',
            'expr', 'expr_context', 'operator', 'slice', 'unaryop']
        node = ast.parse(expr, mode='eval')
        for subnode in ast.walk(node):
            subnode_name = type(subnode).__name__
            if isinstance(subnode, ast.Name):
                if subnode.id not in _safe_names and subnode.id not in variables:
                    raise ValueError("Unsafe expression {}. contains {}".format(expr, subnode.id))
            if subnode_name not in _safe_nodes:
                raise ValueError("Unsafe expression {}. contains {}".format(expr, subnode_name))
    
        return eval(expr, variables)
    
    
    
    class SafeEvalTests(unittest.TestCase):
    
        def test_basic(self):
            self.assertEqual(safe_eval("1", {}), 1)
    
        def test_local(self):
            self.assertEqual(safe_eval("a", {'a': 2}), 2)
    
        def test_local_bool(self):
            self.assertEqual(safe_eval("a==2", {'a': 2}), True)
    
        def test_lambda(self):
            self.assertRaises(ValueError, safe_eval, "lambda : None", {'a': 2})
    
        def test_bad_name(self):
            self.assertRaises(ValueError, safe_eval, "a == None2", {'a': 2})
    
        def test_attr(self):
            self.assertRaises(ValueError, safe_eval, "a.__dict__", {'a': 2})
    
        def test_eval(self):
            self.assertRaises(ValueError, safe_eval, "eval('os.exit()')", {})
    
        def test_exec(self):
            self.assertRaises(SyntaxError, safe_eval, "exec 'import os'", {})
    
        def test_multiply(self):
            self.assertRaises(ValueError, safe_eval, "'s' * 3", {})
    
        def test_power(self):
            self.assertRaises(ValueError, safe_eval, "3 ** 3", {})
    
        def test_comprehensions(self):
            self.assertRaises(ValueError, safe_eval, "[i for i in [1,2]]", {'i': 1})
    
  • 2020-12-02 13:04

    I'm going to mention one of the new features of Python 3.6 - f-strings.

    They can evaluate expressions,

    >>> eval('f"{().__class__.__base__}"', {'__builtins__': None}, {})
    "<class 'object'>"
    

    but the attribute access won't be detected by Python's tokenizer:

    0,0-0,0:            ENCODING       'utf-8'        
    1,0-1,1:            ERRORTOKEN     "'"            
    1,1-1,27:           STRING         'f"{().__class__.__base__}"'
    2,0-2,0:            ENDMARKER      '' 
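
    A check based on the AST rather than on tokens does see inside the f-string. The sketch below is only an illustration (the helper name has_attribute_access is made up here and does not come from any answer): it parses the expression with ast.parse and reports whether any ast.Attribute node is present, including those nested in an f-string's replacement fields.

```python
import ast


def has_attribute_access(expr):
    """Return True if the expression contains any attribute access.

    Hypothetical helper for illustration; it only inspects the AST and
    never evaluates the expression.
    """
    tree = ast.parse(expr, mode='eval')
    return any(isinstance(node, ast.Attribute) for node in ast.walk(tree))


# The f-string is parsed into JoinedStr/FormattedValue nodes, so the
# attribute access inside the braces is visible to ast.walk.
print(has_attribute_access('f"{().__class__.__base__}"'))  # True
print(has_attribute_access('f"{1 + 2}"'))                  # False
```

    This only shows that the attribute access is detectable; it says nothing about whether the rest of the expression is safe to evaluate.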
    
  • 2020-12-02 13:06

    Users can still DoS you by inputting an expression that evaluates to a huge number, which would fill your memory and crash the Python process, for example

    '10**10**100'
    

    I am definitely still curious if more traditional attacks, like recovering builtins or creating a segfault, are possible here.

    EDIT:

    It turns out that even compiling such an expression, without ever running it, has this issue.

    lambda: 10**10**100
    

    will hang, because it tries to precompute the constant.
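
    One partial mitigation, sketched under the assumption that the expression is inspected before it is compiled or evaluated, is to parse it with ast.parse and refuse operators that can blow up the size of a result. The name reject_expensive_operators and the choice of operators are illustrative assumptions; this does not bound memory use in general.

```python
import ast

# Operators that can turn a short expression into a huge value
# (illustrative choice, not an exhaustive list).
_EXPENSIVE_OPS = (ast.Pow, ast.LShift, ast.Mult)


def reject_expensive_operators(expr):
    """Raise ValueError if expr uses **, << or *; otherwise return the AST."""
    tree = ast.parse(expr, mode='eval')
    for node in ast.walk(tree):
        if isinstance(node, _EXPENSIVE_OPS):
            raise ValueError(
                "operator {} is not allowed".format(type(node).__name__))
    return tree


try:
    reject_expensive_operators('10**10**100')
except ValueError as exc:
    print(exc)  # operator Pow is not allowed

# The check passes for cheap expressions, which can then be compiled.
tree = reject_expensive_operators('1 + 2')
print(eval(compile(tree, '<expr>', 'eval'), {'__builtins__': {}}))  # 3
```

    Parsing to an AST does not evaluate the constant, so the check itself returns quickly even for the expression above.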

  • 2020-12-02 13:07

    It is possible to construct a return value from eval that throws an exception outside eval as soon as you try to print, log, or repr it:

    eval('''((lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args))))
            (lambda f: lambda n: (1,(1,(1,(1,f(n-1))))) if n else 1)(300))''')
    

    This creates a nested tuple of the form (1,(1,(1,(1...; that value cannot be printed (on Python 3) or passed to str or repr; all attempts to debug it lead to

    RuntimeError: maximum recursion depth exceeded while getting the repr of a tuple
    

    pprint and saferepr fail too:

    ...
      File "/usr/lib/python3.4/pprint.py", line 390, in _safe_repr
        orepr, oreadable, orecur = _safe_repr(o, context, maxlevels, level)
      File "/usr/lib/python3.4/pprint.py", line 340, in _safe_repr
        if issubclass(typ, dict) and r is dict.__repr__:
    RuntimeError: maximum recursion depth exceeded while calling a Python object
    

    Thus there is no safe built-in function to stringify this. The following helper could be of use:

    def excsafe_repr(obj):
        try:
            return repr(obj)
        except Exception:
            return object.__repr__(obj).replace('>', ' [exception raised]>')
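
    As a usage sketch, the helper can be tried on a nested tuple built in a plain loop instead of with the lambda expression above. The depth of 100000 is an arbitrary assumption, chosen to be far beyond the interpreter's recursion limits, and the helper is repeated here (catching Exception rather than using a bare except) so that the block runs on its own.

```python
def excsafe_repr(obj):
    """repr() that falls back to the default object repr on failure."""
    try:
        return repr(obj)
    except Exception:
        return object.__repr__(obj).replace('>', ' [exception raised]>')


# Build a deeply nested tuple iteratively; no recursion is needed to
# create it, only to print it.
nested = 1
for _ in range(100000):
    nested = (1, nested)

# Prints something like: <tuple object at 0x... [exception raised]>
print(excsafe_repr(nested))

# Ordinary values are unaffected.
print(excsafe_repr((1, 2)))  # (1, 2)
```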
    

    And then there is the problem that print in Python 2 does not actually use str/repr, so it lacks their recursion checks and offers no such safety. That is, you cannot call str or repr on the return value of the lambda monster above, but the ordinary print statement (not print_function!) prints it nicely. However, this can be exploited to generate a SIGSEGV on Python 2 if you know the value will be printed using the print statement:

    print eval('(lambda i: [i for i in ((i, 1) for j in range(1000000))][-1])(1)')
    

    This crashes Python 2 with SIGSEGV, and the issue is WONTFIX in the bug tracker. Thus never use print-the-statement if you want to be safe; use from __future__ import print_function!


    This is not a crash, but

    eval('(1,' * 100 + ')' * 100)
    

    when run, outputs

    s_push: parser stack overflow
    Traceback (most recent call last):
      File "yyy.py", line 1, in <module>
        eval('(1,' * 100 + ')' * 100)
    MemoryError
    

    The MemoryError can be caught, since it is a subclass of Exception. The parser has some really conservative limits to avoid crashes from stack overflows (pun intended). However, s_push: parser stack overflow is written to stderr by C code and cannot be suppressed.


    And just yesterday I asked why Python 3.4 would not be fixed for a crash from:

    % python3  
    Python 3.4.3 (default, Mar 26 2015, 22:03:40) 
    [GCC 4.9.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class A:
    ...     def f(self):
    ...         nonlocal __x
    ... 
    [4]    19173 segmentation fault (core dumped)  python3
    

    and Serhiy Storchaka's answer confirmed that Python core devs do not consider SIGSEGV on seemingly well-formed code a security issue:

    Only security fixes are accepted for 3.4.

    Thus it can be concluded that it can never be considered safe to execute any third-party code in Python, sanitized or not.

    And Nick Coghlan then added:

    And as some additional background as to why segmentation faults provoked by Python code aren't currently considered a security bug: since CPython doesn't include a security sandbox, we're already relying entirely on the OS to provide process isolation. That OS level security boundary isn't affected by whether the code is running "normally", or in a modified state following a deliberately triggered segmentation fault.

  • 2020-12-02 13:07

    Controlling the locals and globals dictionaries is extremely important. Otherwise, someone could just pass in eval or exec, and call it recursively:

    safe_eval('''e("""[c for c in ().__class__.__base__.__subclasses__() 
        if c.__name__ == 'catch_warnings'][0]()._module.__builtins__""")''',
        globals={'e': eval})
    

    The expression in the recursive eval is just a string.

    You also need to set the eval and exec names in the global namespace to something that isn't the real eval or exec. The global namespace is important. If you use a local namespace, anything that creates a separate namespace, such as comprehensions and lambdas, will work around it:

    safe_eval('''[eval("""[c for c in ().__class__.__base__.__subclasses__()
        if c.__name__ == 'catch_warnings'][0]()._module.__builtins__""") for i in [1]][0]''', locals={'eval': None})
    
    safe_eval('''(lambda: eval("""[c for c in ().__class__.__base__.__subclasses__()
        if c.__name__ == 'catch_warnings'][0]()._module.__builtins__"""))()''',
        locals={'eval': None})
    

    Again, here, safe_eval only sees a string and a function call, not attribute accesses.
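
    The point can be checked without evaluating anything. In the sketch below, node_types is a hypothetical helper (not part of any safe_eval implementation) that lists the AST node types of an expression; e stands for whatever name the caller happened to bind to eval.

```python
import ast


def node_types(expr):
    """Return the set of AST node type names used by an expression."""
    tree = ast.parse(expr, mode='eval')
    return {type(node).__name__ for node in ast.walk(tree)}


# A direct attribute access is visible in the AST...
print('Attribute' in node_types('().__class__'))        # True

# ...but the same code passed as a string argument is just a call with a
# constant argument, so an AST check finds no attribute access in it.
print('Attribute' in node_types('e("().__class__")'))   # False
print('Call' in node_types('e("().__class__")'))        # True
```

    A checker that allows calls therefore has to control which callables are reachable through the namespaces, since it cannot see what a string argument will do once it is evaluated.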

    You also need to clear out the safe_eval function itself, if it has a flag to disable safe parsing. Otherwise you could simply do

    safe_eval('safe_eval("<dangerous code>", safe=False)')
    
  • 2020-12-02 13:12

    I don't believe Python is designed to have any security against untrusted code. Here's an easy way to induce a segfault via stack overflow (on the C stack) in the official Python 2 interpreter:

    eval('()' * 98765)
    

    From my answer to the "Shortest code that returns SIGSEGV" Code Golf question.
