Finding dead code in large python project [closed]


You might want to try out vulture. It can't catch everything due to Python's dynamic nature, but it catches quite a bit, without needing the full test suite that coverage.py and similar tools need to work.
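
As a quick illustration of the kind of thing it flags, consider a throwaway module like this (the file name and function names are made up); running vulture example.py on it should report never_called as unused:

# example.py -- throwaway module illustrating what vulture flags
def used():
    return 42

def never_called():
    return -1

print(used())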

Peter Wood

Try running Ned Batchelder's coverage.py.

Coverage.py is a tool for measuring code coverage of Python programs. It monitors your program, noting which parts of the code have been executed, then analyzes the source to identify code that could have been executed but was not.
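
A minimal sketch of that workflow using coverage.py's Python API (my_module is a placeholder for your own code; the coverage run / coverage report command line achieves the same thing):

import coverage

cov = coverage.Coverage()
cov.start()

import my_module   # placeholder: the code under analysis
my_module.main()   # exercise the program as realistically as you can

cov.stop()
cov.save()
cov.report(show_missing=True)  # lines that never ran are dead-code candidates

Keep in mind that coverage only tells you what was not executed during that particular run, so anything it reports as unexecuted is a candidate for removal, not proof of dead code.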

It is very hard to determine which functions and methods are called without executing the code, even if the code doesn't do any fancy stuff. Plain function invocations are rather easy to detect, but method calls are really hard. Just a simple example:

class A(object):
    def f(self):
        pass

class B(A):
    def f(self):
        pass

a = []
a.append(A())
a.append(B())
a[1].f()   # statically, it is unclear whether A.f or B.f runs here

Nothing fancy is going on here, but any script that tries to determine whether A.f() or B.f() is called will have a rather hard time doing so without actually executing the code.

While the above code doesn't do anything useful, it certainly uses patterns that appear in real code -- namely putting instances in containers. Real code will usually do even more complex things -- pickling and unpickling, hierarchical data structures, conditionals.

As stated before, just detecting plain function invocations of the form

function(...)

or

module.function(...)

will be rather easy. You can use the ast module to parse your source files. You will need to record all imports, and the names used to import other modules. You will also need to track top-level function definitions and the calls inside these functions. This will give you a dependency graph, and you can use NetworkX to detect the connected components of this graph.
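
A minimal sketch of that idea, assuming NetworkX is installed (it only records plain name calls made inside top-level functions, ignores imports and methods, hardcodes the entry point, and uses reachability from the root rather than connected components):

import ast
import networkx as nx

def build_call_graph(source):
    """Map each function definition to the plain name calls inside it."""
    graph = nx.DiGraph()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            graph.add_node(node.name)
            for call in ast.walk(node):
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                    graph.add_edge(node.name, call.func.id)
    return graph

source = """
def helper():
    pass

def main():
    helper()

def dead():
    pass
"""
graph = build_call_graph(source)
# Module-level calls would be the real roots; 'main' is hardcoded for brevity.
reachable = {"main"} | nx.descendants(graph, "main")
print(sorted(set(graph.nodes) - reachable))   # -> ['dead']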

While this might sound rather complex, it can probably be done in less than 100 lines of code. Unfortunately, almost all major Python projects use classes and methods, so this approach will be of little help.

Brian Postow

Here's the solution I'm using at least tentatively:

grep 'def ' *.py > defs
# ...
# edit defs so that it just contains the function names
# ...
for f in $(cat defs); do
    echo "$f" >> defCounts                # record the function name
    cat *.py | grep -c "$f" >> defCounts  # count its occurrences across all files
    echo >> defCounts
done

Then I look at the individual functions that have very few references (fewer than 3, say).

It's ugly and it only gives me approximate answers, but I think it's good enough for a start. What are your thoughts?
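
For what it's worth, the same counting idea fits in a few lines of Python and avoids the manual editing step (a rough equivalent sketched here for illustration, with the same approximations as the grep version):

import re
from pathlib import Path

sources = [p.read_text() for p in Path(".").glob("*.py")]

# Collect every name defined with 'def' in any of the files.
names = set()
for text in sources:
    names.update(re.findall(r"^\s*def\s+(\w+)", text, re.MULTILINE))

# Report names with very few occurrences across the whole code base.
for name in sorted(names):
    count = sum(text.count(name) for text in sources)
    if count < 3:  # the definition itself already counts as one occurrence
        print(name, count)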

With the following line you can list all function definitions that are obviously not used as an attribute, a function call, a decorator, or a return value, so it is approximately what you are looking for. It is not perfect and it is slow, but I never got any false positives. (On some Linux distributions you have to replace ack with ack-grep.)

for f in $(ack --python --ignore-dir tests -h --noheading \
           "def ([^_][^(]*).*\):\s*$" --output '$1' | sort | uniq); do
    c=$(ack --python -ch "^\s*(|[^#].*)(@|return\s+|\S*\.|.*=\s*|)"'(?<!def\s)'"$f\b")
    [ "$c" == 0 ] && (echo -n "$f: "; ack --python --noheading "$f\b")
done

If your code is covered by a lot of tests (which is quite useful in any case), run them with a code-coverage plugin and you can then spot the unused code.

IMO this could be achieved pretty quickly with a simple pylint plugin that:

  • remembers each analysed function / method (/ class?) in a set S1
  • tracks each called function / method (/ class?) in a set S2
  • displays S1 - S2 in a report

Then you would have to run pylint over your whole code base to get something that makes sense. Of course, as said, the results would need to be checked, since inference failures and the like may introduce false positives. Anyway, that would probably greatly reduce the amount of grepping to be done.
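
A rough, untested sketch of what such a plugin could look like with pylint's checker API (the checker name, message id, and the name-matching heuristic are all mine; a real version would need to handle methods, attribute access, and inference far more carefully):

from pylint.checkers import BaseChecker

class DeadCodeChecker(BaseChecker):
    """Report functions that are defined (S1) but never seen called (S2)."""
    name = "dead-code"
    msgs = {
        "W9001": (
            "Function %r appears to be unused",
            "possibly-unused-function",
            "Emitted for functions that are defined but never called.",
        ),
    }

    def __init__(self, linter):
        super().__init__(linter)
        self.defined = {}    # S1: name -> definition node
        self.called = set()  # S2: names seen in call position

    def visit_functiondef(self, node):
        self.defined[node.name] = node

    def visit_call(self, node):
        # Covers f(...) and obj.f(...); anything fancier is missed.
        name = getattr(node.func, "name", None) or getattr(node.func, "attrname", None)
        if name:
            self.called.add(name)

    def close(self):
        # S1 - S2: definitions with no recorded call
        for name, node in self.defined.items():
            if name not in self.called:
                self.add_message("possibly-unused-function", node=node, args=(name,))

def register(linter):
    linter.register_checker(DeadCodeChecker(linter))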

I haven't had much time to do it myself yet, but anyone attempting it can find help on the python-projects@logilab.org mailing list.
