Importing a python module without actually executing it

后端 未结 6 2014
抹茶落季
抹茶落季 2020-12-10 01:48

In the context of a complex application, I need to import user-supplied \'scripts\'. Ideally, a script would have

def init():
    blah

def execute():
    mo         


        
相关标签:
6条回答
  • 2020-12-10 02:14

    Not sure if you'll consider this elegant, but it is somewhat intelligent in the sense that it recognizes when def init are tokens and not just part of a tricky multi-line string:

    '''
    def init does not define init...
    '''
    

    It will not recognize when init is defined in tricky alternate ways such as

    init = lambda ...
    

    or

    codestr='def  i'+'nit ...'
    exec(codestr)
    

    The only way to handle all such cases is to run the code (e.g. in a sandbox or by importing) and inspect the result.


    import tokenize
    import token
    import io
    import collections
    
    userscript = '''\
    def init():
        blah
    
    """
    def execute():
        more blah
    """
    
    yadda
    '''
    
    class Token(object):
        def __init__(self, tok):
            toknum, tokval, (srow, scol), (erow, ecol), line = tok
            self.toknum = toknum
            self.tokname = token.tok_name[toknum]
            self.tokval = tokval
            self.srow = srow
            self.scol = scol
            self.erow = erow
            self.ecol = ecol
            self.line = line    
    
    class Validator(object):
        def __init__(self, codestr):
            self.codestr = codestr
            self.toks = collections.deque(maxlen = 2)
            self.names = set()
        def validate(self):
            tokens = tokenize.generate_tokens(io.StringIO(self.codestr).readline)
            self.toks.append(Token(next(tokens)))
            for tok in tokens:
                self.toks.append(Token(tok))            
                if (self.toks[0].tokname == 'NAME'     # First token is a name
                    and self.toks[0].scol == 0         # First token starts at col 0
                    and self.toks[0].tokval == 'def'   # First token is 'def'
                    and self.toks[1].tokname == 'NAME' # Next token is a name
                    ):
                    self.names.add(self.toks[1].tokval)
            delta = set(['init', 'cleanup', 'execute']) - self.names
            if delta:
                raise ValueError('{n} not defined'.format(n = ' and '.join(delta)))
    
    v = Validator(userscript)
    v.validate()
    

    yields

    ValueError: execute and cleanup not defined
    
    0 讨论(0)
  • 2020-12-10 02:20

    A solution to 1 to 3, ( not the yadda part ) is to hand out "generic_class.py" with all the methods that you need. So,

    class Generic(object):
    
        def __init__(self):
            return
    
        def execute(self):
            return
    
        # etc
    

    You can then check for the existence of "generic" in what you've imported. If it doesn't exist you can ignore it and if it does then you know exactly what's there. Anything extra will never be called unless it's called from within one of your pre-defined methods.

    0 讨论(0)
  • 2020-12-10 02:21

    I'd first of all not require some functions, but a class that conforms to a specified interface, using either the abc module, or zope.interface. This forces the maker of the module to supply the functions you want.

    Secondly, I would not bother looking for module-level code. It's the module-makers problem if he does this. It's too much work with no actual benefit.

    If you are worried about security issues, you need to sandbox the code somehow anyway.

    0 讨论(0)
  • 2020-12-10 02:22

    My attempt using the ast module:

    import ast
    
    # which syntax elements are allowed at module level?
    whitelist = [
      # docstring
      lambda x: isinstance(x, ast.Expr) \
                 and isinstance(x.value, ast.Str),
      # import
      lambda x: isinstance(x, ast.Import),
      # class
      lambda x: isinstance(x, ast.ClassDef),
      # function
      lambda x: isinstance(x, ast.FunctionDef),
    ]
    
    def validate(source, required_functions):
      tree = ast.parse(source)
    
      functions = set()
      required_functions = set(required_functions)
    
      for item in tree.body:
        if isinstance(item, ast.FunctionDef):
          functions.add(item.name)
          continue
    
        if all(not checker(item) for checker in whitelist):
          return False
    
      # at least the required functions must be there
      return len(required_functions - functions) == 0
    
    
    if __name__ == "__main__":
      required_funcs = [ "init", "execute", "cleanup" ]
      with open("/tmp/test.py", "rb") as f:
        print("yay!" if validate(f.read(), required_funcs) else "d'oh!")
    
    0 讨论(0)
  • 2020-12-10 02:26

    One very simple solution could be to check the first characters of every line of code: The only permitted should be:

    • def init():
    • def execute():
    • def cleanup():
    • lines starting with 4 spaces
    • [optionally]: lines starting with #

    This is very primitive but it fulfills your requirements...

    Update: After a second though about it I realized that it isn't so easy after all. Consider for example this piece of code:

    def init():
        v = """abc
    def
    ghi"""
        print(v)
    

    This means that you'd need a more complex code parsing algorithm... so forget about my solution...

    0 讨论(0)
  • 2020-12-10 02:33

    Here's a simpler (and more naive) alternative to the AST approach:

    import sys
    from imp import find_module, new_module, PY_SOURCE
    
    
    EXPECTED = ("init", "execute", "cleanup")
    
    def import_script(name):
        fileobj, path, description = find_module(name)
    
        if description[2] != PY_SOURCE:
            raise ImportError("no source file found")
    
        code = compile(fileobj.read(), path, "exec")
    
        expected = list(EXPECTED)
        for const in code.co_consts:
            if isinstance(const, type(code)) and const.co_name in expected:
                expected.remove(const.co_name)
        if expected:
            raise ImportError("missing expected function: {}".format(expected))
    
        module = new_module(name)
        exec(code, module.__dict__)
        sys.modules[name] = module
        return module
    

    Keep in mind, this is a very direct way of doing it and circumvents extensions to Python's import machinery.

    0 讨论(0)
提交回复
热议问题