I would like to analyze the dependency tree of Python packages. How can I obtain this data?
Things I already know
- setup.py sometimes contains a requires field that lists package dependencies
- PyPI is an online repository of Python packages
- PyPI has an API
Things that I don't know
- Very few projects (around 10%) on PyPI explicitly list dependencies in the requires field, but pip/easy_install still manage to download the correct packages. What am I missing? For example, pandas, the popular library for statistical computing, doesn't list requires but still manages to install numpy, pytz, etc. Is there a better way to automatically collect the full list of dependencies?
- Is there a pre-existing database somewhere? Am I repeating existing work?
- Do similar, easily accessible databases exist for other languages with distribution systems (R, Clojure, etc.)?
You should be looking at the install_requires field instead; see New and changed setup keywords in the setuptools documentation. requires is deemed too vague a field to rely on for dependency installation. In addition, there are setup_requires and tests_require fields for dependencies required to run setup.py and to run the tests, respectively.
Certainly, the dependency graph has been analyzed before; from this blog article by Olivier Girardot comes this fantastic image:
The image is linked to the interactive version of the graph.
Here is how you can do it programmatically using the pkg_resources module bundled with Python's pip package:

    # Note: pip._vendor is private to pip and may change between releases
    from pip._vendor import pkg_resources

    # Ensure pip conf index-url points to the real PyPI index
    # Get dependencies from pip
    package_name = 'Django'
    try:
        # Raises KeyError if the package is not installed
        package_resources = pkg_resources.working_set.by_key[package_name.lower()]
        # requires() returns the declared requirements; the keys of _dep_map
        # are extras names rather than dependencies, so they are left out
        dependencies = list(set(str(r) for r in package_resources.requires()))
    except KeyError:
        dependencies = []
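Since pip._vendor is internal to pip, a variant that relies only on the standard library may be more robust. The following is a minimal sketch, assuming Python 3.8 or later (where importlib.metadata is available); the helper name installed_requirements is illustrative and not part of any library. Like the approach above, it only sees packages installed in the current environment.

```python
from importlib import metadata


def installed_requirements(package_name):
    """Return the raw requirement strings declared by an installed package.

    Returns an empty list if the package is not installed or declares
    no requirements.
    """
    try:
        # metadata.requires() returns None when nothing is declared
        return metadata.requires(package_name) or []
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    for requirement in installed_requirements("Django"):
        print(requirement)
```

The returned strings keep their version specifiers and environment markers, so they may need further parsing before being used as package names.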
And here is how you can get dependencies from the PyPI API:

    import requests

    package_name = 'Django'
    # Package info URL
    PYPI_API_URL = 'https://pypi.python.org/pypi/{package_name}/json'
    package_details_url = PYPI_API_URL.format(package_name=package_name)

    response = requests.get(package_details_url)
    dependencies = []
    if response.status_code == 200:
        info = response.json()['info']
        # Any of these fields may be missing or null, so fall back to an empty list
        for field in ('requires_dist', 'requires', 'setup_requires',
                      'test_requires', 'install_requires'):
            dependencies.extend(info.get(field) or [])
        dependencies = list(set(dependencies))
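Entries in requires_dist are full requirement strings, which can carry version specifiers and environment markers (for example, a dependency that is only needed for an optional extra). To turn them into plain package names, something like the following sketch could be used. It is a deliberately simplified parser written for illustration: the function names are made up here, and anything whose marker mentions "extra" is treated as optional. For strict parsing, the packaging library's Requirement class is probably a better fit.

```python
import re

# Leading run of characters allowed in a project name
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def split_requirement(requirement):
    """Split 'name spec; marker' into (name, marker), marker may be None."""
    spec, _, marker = requirement.partition(";")
    match = _NAME.match(spec)
    name = match.group(1) if match else None
    return name, (marker.strip() or None)


def core_dependency_names(requires_dist):
    """Return sorted, lower-cased names of non-optional dependencies."""
    names = set()
    for requirement in requires_dist or []:
        name, marker = split_requirement(requirement)
        # Skip entries that are only installed when an extra is requested
        if name and not (marker and "extra" in marker):
            names.add(name.lower())
    return sorted(names)
```

Passing the requires_dist value from the API response (which may be null) yields a clean list of names that can be looked up in turn.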
You can apply this recursively to the dependencies of each dependency to get the full tree. Cheers!
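The recursive step could look roughly like this. It is a sketch under a few assumptions: build_dependency_tree is a hypothetical helper, and get_dependencies stands for whatever lookup you use (installed metadata or the PyPI API) as long as it takes a package name and returns a list of dependency names. A shared set of visited names guards against cycles, so each package is expanded only the first time it is encountered.

```python
def build_dependency_tree(package_name, get_dependencies, _seen=None):
    """Return a nested dict {dependency: {sub-dependency: {...}}}.

    get_dependencies is a callable mapping a package name to a list of
    dependency names. Packages that were already expanded elsewhere in
    the tree (or that form a cycle) appear with an empty dict.
    """
    seen = set() if _seen is None else _seen
    key = package_name.lower()
    if key in seen:
        return {}
    seen.add(key)
    return {
        dep: build_dependency_tree(dep, get_dependencies, seen)
        for dep in get_dependencies(package_name)
    }


if __name__ == "__main__":
    # Made-up index used only to demonstrate the traversal
    fake_index = {"app": ["lib-a", "lib-b"], "lib-a": ["lib-b"], "lib-b": ["app"]}
    print(build_dependency_tree("app", lambda name: fake_index.get(name, [])))
```

Injecting the lookup as a function keeps the traversal independent of the data source and makes it easy to add caching, which matters if every lookup is an HTTP request.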
Source: https://stackoverflow.com/questions/15708723/python-package-dependency-tree