Python - Get path of root project structure

陌清茗 2020-12-04 08:03

I've got a Python project with a configuration file in the project root. The configuration file needs to be accessed from a few different files throughout the project.

16 answers
  • 2020-12-04 08:24

    If you are working with anaconda-project, you can read the project root from the PROJECT_DIR environment variable: os.getenv('PROJECT_DIR'). This works only if the script is executed via anaconda-project run .

    If you do not want your script to be run by anaconda-project, you can take the absolute path of the Python interpreter you are using and keep the part of the path string before the envs directory (exclusive). For example, the Python interpreter of my conda env is located at:

    /home/user/project_root/envs/default/bin/python

    import os
    import sys

    # First try the PROJECT_DIR environment variable.
    # If it is not set, take the Python interpreter location and strip
    # everything from the envs directory onward (inclusive).
    if os.getenv('PROJECT_DIR'):
        PROJECT_DIR = os.getenv('PROJECT_DIR')
    else:
        PYTHON_PATH = sys.executable
        path_rem = os.path.join('envs', 'default', 'bin', 'python')
        PROJECT_DIR = PYTHON_PATH.split(path_rem)[0]
    

    This works only for a project that follows the fixed anaconda-project directory structure.
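
    The fallback above can be wrapped in a small function. This is a minimal sketch, not part of anaconda-project: the name conda_project_dir and its parameters are hypothetical, and it assumes the interpreter lives somewhere under <project root>/envs/<env name>/, as in the example path above. Unlike the snippet, it does not hard-code the env name default.

```python
import os
import sys


def conda_project_dir(environ=None, executable=None):
    """Return the project root, preferring the PROJECT_DIR variable.

    Hypothetical helper: falls back to cutting the interpreter path
    at its last 'envs' directory component.
    """
    environ = os.environ if environ is None else environ
    executable = sys.executable if executable is None else executable

    project_dir = environ.get('PROJECT_DIR')
    if project_dir:
        return project_dir

    # Everything before the last '<sep>envs<sep>' is treated as the root.
    marker = os.sep + 'envs' + os.sep
    head, sep, _tail = executable.rpartition(marker)
    if not sep:
        raise RuntimeError("interpreter is not inside an 'envs' directory")
    return head
```

    Passing the environment and the interpreter path as arguments keeps the function easy to try out without a real conda env.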

  • 2020-12-04 08:25

    At the time of writing, none of the other solutions are very self-contained. They depend either on an environment variable or the position of the module in the package structure. The top answer with the ‘Django’ solution falls victim to the latter by requiring a relative import. It also has the disadvantage of having to modify a module at the top level.

    This should be the correct approach for finding the directory path of the top-level package:

    import sys
    import os
    
    root_name, _, _ = __name__.partition('.')
    root_module = sys.modules[root_name]
    root_dir = os.path.dirname(root_module.__file__)
    
    config_path = os.path.join(root_dir, 'configuration.conf')
    

    It works by taking the first component in the dotted string contained in __name__ and using it as a key in sys.modules which returns the module object of the top-level package. Its __file__ attribute contains the path we want after trimming off /__init__.py using os.path.dirname().

    This solution is self-contained. It works anywhere in any module of the package, including in the top-level __init__.py file.
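
    To see the lookup in action outside a real project, the same three steps can be put in a function and pointed at a standard-library package. This is only an illustrative sketch: the function name top_level_package_dir is hypothetical, and json stands in for your own package. Inside the package you would call it with __name__.

```python
import os
import sys


def top_level_package_dir(module_name):
    """Directory of the top-level package that module_name belongs to."""
    root_name, _, _ = module_name.partition('.')
    # The top-level package is imported before any of its submodules,
    # so it is already present in sys.modules.
    root_module = sys.modules[root_name]
    return os.path.dirname(os.path.abspath(root_module.__file__))


# Demonstration with a standard-library package standing in for a project.
import json.decoder

print(top_level_package_dir(json.decoder.__name__))
```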

  • 2020-12-04 08:26

    I've recently been trying to do something similar and I have found these answers inadequate for my use cases (a distributed library that needs to detect project root). Mainly I've been battling different environments and platforms, and still haven't found something perfectly universal.

    Code local to project

    I've seen this example mentioned and used in a few places, Django, etc.

    import os
    print(os.path.dirname(os.path.abspath(__file__)))
    

    Simple as this is, it only works when the file containing the snippet is actually part of the project. It does not retrieve the project directory, only the snippet's directory.

    Similarly, the sys.modules approach breaks down when called from outside the entry point of the application. Specifically, I've observed that a child thread cannot determine this without a reference back to the 'main' module. I've explicitly put the import inside a function to demonstrate an import from a child thread; moving it to the top level of app.py would fix it.

    app/
    |-- config
    |   |-- __init__.py
    |   `-- settings.py
    `-- app.py
    

    app.py

    #!/usr/bin/env python
    import threading
    
    
    def background_setup():
        # Explicitly importing this from the context of the child thread
        from config import settings
        print(settings.ROOT_DIR)
    
    
    # Spawn a thread to background preparation tasks
    t = threading.Thread(target=background_setup)
    t.start()
    
    # Do other things during initialization
    
    t.join()
    
    # Ready to take traffic
    

    settings.py

    import os
    import sys
    
    
    ROOT_DIR = None
    
    
    def setup():
        global ROOT_DIR
        ROOT_DIR = os.path.dirname(sys.modules['__main__'].__file__)
        # Do something slow
    

    Running this program (here imported from an interactive session) produces an AttributeError:

    >>> import main
    >>> Exception in thread Thread-1:
    Traceback (most recent call last):
      File "C:\Python2714\lib\threading.py", line 801, in __bootstrap_inner
        self.run()
      File "C:\Python2714\lib\threading.py", line 754, in run
        self.__target(*self.__args, **self.__kwargs)
      File "main.py", line 6, in background_setup
        from config import settings
      File "config\settings.py", line 34, in <module>
        ROOT_DIR = get_root()
      File "config\settings.py", line 31, in get_root
        return os.path.dirname(sys.modules['__main__'].__file__)
    AttributeError: 'module' object has no attribute '__file__'
    

    ...hence a threading-based solution

    Location independent

    Using the same application structure as before, but modifying settings.py:

    import os
    import sys
    import inspect
    import platform
    import threading
    
    
    ROOT_DIR = None
    
    
    def setup():
        main_id = None
        for t in threading.enumerate():
            if t.name == 'MainThread':
                main_id = t.ident
                break
    
        if not main_id:
            raise RuntimeError("Main thread exited before execution")
    
        current_main_frame = sys._current_frames()[main_id]
        base_frame = inspect.getouterframes(current_main_frame)[-1]
    
        if platform.system() == 'Windows':
            filename = base_frame.filename
        else:
            filename = base_frame[0].f_code.co_filename
    
        global ROOT_DIR
        ROOT_DIR = os.path.dirname(os.path.abspath(filename))
    

    Breaking this down: first we want to accurately find the thread ID of the main thread. In Python 3.4+ the threading library has threading.main_thread(); however, not everybody uses 3.4+, so we search through all threads looking for the main thread and save its ID. If the main thread has already exited, it won't be listed in threading.enumerate(). We raise a RuntimeError in this case until I find a better solution.

    main_id = None
    for t in threading.enumerate():
        if t.name == 'MainThread':
            main_id = t.ident
            break
    
    if not main_id:
        raise RuntimeError("Main thread exited before execution")
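
    Where both old and new interpreters have to be supported, the two lookups can be combined. This is a minimal sketch under that assumption; the function name main_thread_ident is hypothetical and not part of the settings.py above.

```python
import threading


def main_thread_ident():
    """Return the ident of the main thread on old and new Pythons."""
    if hasattr(threading, 'main_thread'):  # available in Python 3.4+
        return threading.main_thread().ident
    # Fallback for older interpreters: look the thread up by name.
    for t in threading.enumerate():
        if t.name == 'MainThread':
            return t.ident
    raise RuntimeError("Main thread exited before execution")


# The answer is the same when asked from a child thread.
seen = []
worker = threading.Thread(target=lambda: seen.append(main_thread_ident()))
worker.start()
worker.join()
print(seen[0] == main_thread_ident())
```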
    

    Next we find the very first stack frame of the main thread. Using the CPython-specific function sys._current_frames() we get a dictionary of every thread's current stack frame. Then, using inspect.getouterframes(), we can retrieve the entire stack for the main thread and take the very first frame, which is the last element of the returned list.

    current_main_frame = sys._current_frames()[main_id]
    base_frame = inspect.getouterframes(current_main_frame)[-1]

    Finally, the differences between Windows and Linux implementations of inspect.getouterframes() need to be handled. Once we have the filename, os.path.abspath() and os.path.dirname() turn it into an absolute directory path.

    if platform.system() == 'Windows':
        filename = base_frame.filename
    else:
        filename = base_frame[0].f_code.co_filename
    
    global ROOT_DIR
    ROOT_DIR = os.path.dirname(os.path.abspath(filename))
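
    A variation that avoids depending on the shape of the records returned by inspect.getouterframes() is to walk the frame chain directly through f_back. This is a sketch of that alternative, not the code above; the names outermost_filename and root_dir_of are hypothetical, and sys._current_frames() remains CPython-specific.

```python
import os
import sys
import threading


def outermost_filename(thread_id):
    """File name of the first (outermost) frame on a thread's stack."""
    frame = sys._current_frames()[thread_id]  # CPython-specific
    while frame.f_back is not None:
        frame = frame.f_back  # step outward toward the entry point
    return frame.f_code.co_filename


def root_dir_of(thread_id):
    """Absolute directory of the file that started the given thread's stack."""
    return os.path.dirname(os.path.abspath(outermost_filename(thread_id)))


# Here we inspect the current thread; in settings.py the main thread's
# ID would be passed instead.
print(root_dir_of(threading.get_ident()))
```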
    

    So far I've tested this on Python 2.7 and 3.6 on Windows, as well as Python 3.4 on WSL.

  • 2020-12-04 08:28

    I had to implement a custom solution because it's not as simple as you might think. My solution is based on stack trace inspection (inspect.stack()) plus sys.path, and it works fine regardless of the location of the Python module in which the function is invoked and regardless of the interpreter (I tried running it in PyCharm, in a poetry shell, and others). This is the full implementation with comments:

    import inspect
    import os
    import sys
    from inspect import FrameInfo
    from pathlib import Path


    def get_project_root_dir() -> str:
        """
        Returns the name of the project root directory.
    
        :return: Project root directory name
        """
    
        # stack trace history related to the call of this function
        frame_stack: [FrameInfo] = inspect.stack()
    
        # get info about the module that has invoked this function
        # (index=0 is always this very module, index=1 is fine as long as this function is not called by some other
        # function in this module)
        frame_info: FrameInfo = frame_stack[1]
    
        # if there are multiple calls in the stacktrace of this very module, we have to skip those and take the first
        # one which comes from another module
        if frame_info.filename == __file__:
            for frame in frame_stack:
                if frame.filename != __file__:
                    frame_info = frame
                    break
    
        # path of the module that has invoked this function
        caller_path: str = frame_info.filename
    
        # absolute path of the module that has invoked this function
        caller_absolute_path: str = os.path.abspath(caller_path)
    
        # get the top most directory path which contains the invoker module
        paths: [str] = [p for p in sys.path if p in caller_absolute_path]
        paths.sort(key=lambda p: len(p))
        caller_root_path: str = paths[0]
    
        if not os.path.isabs(caller_path):
            # file name of the invoker module (eg: "mymodule.py")
            caller_module_name: str = Path(caller_path).name
    
            # this piece represents a subpath in the project directory
            # (eg. if the root folder is "myproject" and this function has been called from myproject/foo/bar/mymodule.py
            # this will be "foo/bar")
            project_related_folders: str = caller_path.replace(os.sep + caller_module_name, '')
    
            # fix root path by removing the undesired subpath
            caller_root_path = caller_root_path.replace(project_related_folders, '')
    
        dir_name: str = Path(caller_root_path).name
    
        return dir_name
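
    One step worth isolating is the sys.path filter. The substring test p in caller_absolute_path also matches a sibling directory whose name is a prefix of the real one (for example /opt/myproj against /opt/myproject) and matches an empty sys.path entry. The sketch below swaps in os.path.commonpath to compare whole path components instead. It is an illustration only, not the implementation above; the function name topmost_search_path and the example paths are hypothetical.

```python
import os


def topmost_search_path(caller_file, search_paths):
    """Shortest entry of search_paths that is an ancestor of caller_file."""
    caller_file = os.path.abspath(caller_file)
    candidates = []
    for entry in search_paths:
        if not entry:
            continue  # skip '' (the current-directory placeholder)
        entry = os.path.abspath(entry)
        try:
            common = os.path.commonpath([entry, caller_file])
        except ValueError:
            continue  # e.g. paths on different Windows drives
        if common == entry:
            candidates.append(entry)
    # The shortest matching entry is the topmost containing directory.
    return min(candidates, key=len) if candidates else None


paths = ['/opt/myproj', '/opt/myproject/foo', '/opt/myproject', '/usr/lib']
print(topmost_search_path('/opt/myproject/foo/bar/mymodule.py', paths))
```

    In the function above, sys.path would be passed as search_paths.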
    
  • 2020-12-04 08:29

    Other answers advise using a file in the top level of the project. This is not necessary if you use pathlib.Path and parent (Python 3.4 and up). Consider the following directory structure, where all files except README.md and utils.py have been omitted.

    project
    │   README.md
    │
    └───src
    │   │   utils.py
    │   │   ...
    │   ...
    

    In utils.py we define the following function.

    from pathlib import Path
    
    def get_project_root() -> Path:
        return Path(__file__).parent.parent
    

    In any module in the project we can now get the project root as follows.

    from src.utils import get_project_root
    
    root = get_project_root()
    

    Benefits: any module that calls get_project_root can be moved without changing program behavior. Only when the module utils.py is moved do we have to update get_project_root and the imports (refactoring tools can be used to automate this).
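
    A slightly more general form resolves the path first, so a relative __file__ does not matter, and makes the number of levels explicit. This is a sketch assuming Python 3.6+; the function name project_root_from, its levels_up parameter, and the example path are hypothetical, while configuration.conf is the file name used elsewhere in this thread.

```python
import os
from pathlib import Path


def project_root_from(module_file, levels_up=1):
    """Resolve module_file and climb levels_up directories above its folder."""
    # parents[0] is the directory holding the file, parents[1] its parent, ...
    return Path(module_file).resolve().parents[levels_up]


# Hypothetical layout: <somewhere>/project/src/utils.py
example = os.path.join(os.sep, 'work', 'project', 'src', 'utils.py')
root = project_root_from(example)
config_path = root / 'configuration.conf'
print(root.name)
```

    In utils.py itself the call would be project_root_from(__file__), which matches Path(__file__).parent.parent for the layout shown above.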

  • 2020-12-04 08:29

    I struggled with this problem too until I came to this solution. This is the cleanest solution in my opinion.

    In your setup.py, add "packages":

    from setuptools import setup

    setup(
        name='package_name',
        version='0.0.1',
        # ...
        packages=['package_name'],
        # ...
    )
    

    In your python_script.py

    import pkg_resources
    import os
    
    resource_package = pkg_resources.get_distribution(
        'package_name').location
    config_path = os.path.join(resource_package,'configuration.conf')
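
    pkg_resources is deprecated in recent setuptools releases, so on Python 3.9+ a standard-library alternative is importlib.resources.files(). Note the difference: get_distribution(...).location is the directory that contains the installed package, whereas files() points at the package directory itself. The sketch below is an illustration of that alternative, not a drop-in replacement; the function name package_dir is hypothetical, and json stands in for package_name.

```python
from importlib.resources import files  # Python 3.9+


def package_dir(package_name):
    """Return a path-like object for an importable package's directory."""
    return files(package_name)


# Demonstration with a standard-library package; in a real project this
# would be the package named in setup.py.
config_path = package_dir('json') / 'configuration.conf'
print(config_path)
```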
    