I am writing a program that categorizes a list of Python files by which modules they import. As such, I need to scan the collection of .py files and return a list of which mod
I was looking for something similar and I found a gem in a package called PyScons. The Scanner does just what you want (in 7 lines), using an import_hook. Here is an abbreviated example:
import modulefinder, sys
class SingleFileModuleFinder(modulefinder.ModuleFinder):
    def import_hook(self, name, caller, *arg, **kwarg):
        if caller.__file__ == self.name:
            # Only call the parent at the top level.
            return modulefinder.ModuleFinder.import_hook(self, name, caller, *arg, **kwarg)
    def __call__(self, node):
        self.name = str(node)
        self.run_script(self.name)
if __name__ == '__main__':
    # Example entry, run with './script.py filename'
    print('looking for includes in %s' % sys.argv[1])
    mf = SingleFileModuleFinder()
    mf(sys.argv[1])
    print('\n'.join(mf.modules.keys()))
I recently needed all the dependencies for a given Python script, and I took a different approach than the other answers. I only cared about top-level module names (e.g., I wanted foo from import foo.bar).
This is the code using the ast module:
import ast
modules = set()
def visit_Import(node):
    for name in node.names:
        modules.add(name.name.split(".")[0])
def visit_ImportFrom(node):
    # if node.module is missing it's a "from . import ..." statement
    # if level > 0 it's a "from .submodule import ..." statement
    if node.module is not None and node.level == 0:
        modules.add(node.module.split(".")[0])
node_iter = ast.NodeVisitor()
node_iter.visit_Import = visit_Import
node_iter.visit_ImportFrom = visit_ImportFrom
Testing with a Python file foo.py that contains:
# foo.py
import sys, os
import foo1
from foo2 import bar
from foo3 import bar as che
import foo4 as boo
import foo5.zoo
from foo6 import *
from . import foo7, foo8
from .foo12 import foo13
from foo9 import foo10, foo11
def do():
    import bar1
    from bar2 import foo
    from bar3 import che as baz
I could get all the modules in foo.py by doing something like this:
with open("foo.py") as f:
    node_iter.visit(ast.parse(f.read()))
print(modules)
which would give me this output:
set(['bar1', 'bar3', 'bar2', 'sys', 'foo9', 'foo4', 'foo5', 'foo6', 'os', 'foo1', 'foo2', 'foo3'])
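Since the question is about categorizing many files, the same ast idea can be applied per file and the results inverted into a module-to-files mapping. This is only a sketch: `top_level_imports`, `categorize`, and the in-memory `example` dict of file names and source text are hypothetical names introduced here, not part of the answer above.

```python
import ast
from collections import defaultdict

def top_level_imports(source):
    """Return the set of top-level module names imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import x".
            if node.module is not None and node.level == 0:
                found.add(node.module.split(".")[0])
    return found

def categorize(sources):
    """Map each module name to the sorted list of file names importing it.

    `sources` is a hypothetical dict of {file name: source text}.
    """
    by_module = defaultdict(list)
    for filename, text in sources.items():
        for module in top_level_imports(text):
            by_module[module].append(filename)
    return {module: sorted(files) for module, files in by_module.items()}

example = {
    "a.py": "import os\nfrom foo.bar import baz\n",
    "b.py": "import os.path\nfrom . import sibling\n",
}
result = categorize(example)
print(result)
```

For real files, the dict could be built by reading each path from disk before calling `categorize`.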
It's actually working quite well with
import sys
print([key for key, value in list(locals().items())
       if isinstance(value, type(sys)) and not key.startswith('__')])
It depends how thorough you want to be. Finding the used modules is a Turing-complete problem: some Python code uses lazy importing to import only the things it actually uses on a particular run, and some generates the names to import dynamically (e.g. plugin systems).
python -v will trace import statements; it's arguably the simplest thing to check.
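To illustrate the dynamic-import case, here is a small sketch (the snippet in `source` and the variable names are made up for the example). A static scan of import statements sees only importlib, while the module actually loaded at run time is json:

```python
import ast

source = (
    "import importlib\n"
    "plugin = importlib.import_module('json')\n"
)

# Static view: only names that appear in plain "import x" statements.
static_names = set()
for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.Import):
        static_names.update(alias.name.split(".")[0] for alias in node.names)

# Runtime view: run the snippet and see which module it actually bound.
namespace = {}
exec(compile(source, "<plugin_example>", "exec"), namespace)
loaded = namespace["plugin"].__name__

print(sorted(static_names))  # ['importlib']
print(loaded)                # json
```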
For the majority of scripts which only import modules at the top level, it is quite sufficient to load the file as a module, and scan its members for modules:
import io, types
scriptname = 'myfile.py'
with io.open(scriptname) as scriptfile:
    code = compile(scriptfile.read(), scriptname, 'exec')
# types.ModuleType replaces imp.new_module, which is gone in recent Python versions
newmodule = types.ModuleType('__main__')
exec(code, newmodule.__dict__)
scriptmodules = [name for name in dir(newmodule) if isinstance(newmodule.__dict__[name], types.ModuleType)]
This simulates the module being run as a script, by setting the module's name to '__main__'. It should therefore also capture funky dynamic module loading. The only modules it won't capture are those which are imported only into local scopes.
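The local-scope limitation can be seen with a tiny in-memory script (a sketch; the `source` string and the `helper` function are invented for illustration). Only the module bound at the top level shows up:

```python
import types

source = (
    "import os\n"
    "def helper():\n"
    "    import json\n"  # bound only inside helper(), never at module level
    "    return json\n"
)

module = types.ModuleType("__main__")
exec(compile(source, "<local_import_example>", "exec"), module.__dict__)

# Collect names in the module namespace that are bound to module objects.
found = sorted(
    name for name, value in vars(module).items()
    if isinstance(value, types.ModuleType)
)
print(found)  # ['os']
```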
This works, using importlib to actually import the module and inspect to get the members:
#! /usr/bin/env python
#
# test.py
#
# Find Modules
#
import inspect, importlib as implib
if __name__ == "__main__":
    mod = implib.import_module("example")
    for i in inspect.getmembers(mod, inspect.ismodule):
        print(i[0])
#! /usr/bin/env python
#
# example.py
#
import sys
from os import path
if __name__ == "__main__":
    print("Hello World !!!!")
Output :
tony@laptop .../~:$ ./test.py
path
sys