I am writing a program that categorizes a list of Python files by which modules they import. As such I need to scan the collection of .py files and return a list of which mod
Edit to my original answer: this is doable with a snippet like the one below, though parsing the AST is probably the more robust way to go.
import sys


def iter_imports(fd):
    """ Yield only lines that appear to be imports from an iterable.
        fd can be an open file, a list of lines, etc.
    """
    for line in fd:
        trimmed = line.strip()
        if trimmed.startswith('import '):
            yield trimmed
        elif trimmed.startswith('from ') and ('import ' in trimmed):
            yield trimmed


def main():
    # File name to read.
    filename = '/my/path/myfile.py'
    # Safely open the file, exit on error
    try:
        with open(filename) as f:
            # Iterate over the lines in this file, and generate a list of
            # lines that appear to be imports.
            import_lines = list(iter_imports(f))
    except (IOError, OSError) as exIO:
        print('Error opening file: {}\n{}'.format(filename, exIO))
        return 1
    else:
        # From here, import_lines should be a list of lines like this:
        #     from module import thing
        #     import os, sys
        #     from module import *
        # Do whatever you need to do with the import lines.
        print('\n'.join(import_lines))
    return 0


if __name__ == '__main__':
    # The return value of main() becomes the process exit code.
    sys.exit(main())
Further string parsing is needed to extract just the module names. This approach also cannot tell real imports from lines inside multi-line strings or docstrings that happen to start with 'import ' or 'from X import ', so it will report those as well. That is why I suggested parsing the AST.
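For comparison, here is a minimal sketch of the AST route using the standard library's `ast` module. The function name `iter_imported_modules` and the `sample` source are made up for illustration and are not part of the snippet above. Treat it as one possible starting point rather than a finished solution. It reports relative imports with their leading dots, which may or may not be what you want.

```python
import ast


def iter_imported_modules(source):
    """Yield the module name named by each import statement in `source`.

    `source` is the text of a Python file. Hypothetical helper for
    illustration; raises SyntaxError if the text cannot be parsed.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import os, sys` carries one alias per module.
            for alias in node.names:
                yield alias.name
        elif isinstance(node, ast.ImportFrom):
            # `from . import x` has module None; level counts the dots.
            yield '.' * node.level + (node.module or '')


# Made-up sample: the strings mention imports that must NOT be reported.
sample = '''
"""Docstring mentioning: import fake"""
import os, sys
from collections import OrderedDict
def f():
    from . import sibling
    text = """
import not_real
"""
'''

# ast.walk makes no ordering promise, so sort for a stable result.
print(sorted(set(iter_imported_modules(sample))))
```

Because the parser only produces `Import` and `ImportFrom` nodes for real statements, the text inside the docstring and the multi-line string is ignored, and imports nested inside functions are still found.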