Construct a tree from list os file paths (Python) - Performance dependent

后端 未结 3 741
盖世英雄少女心
盖世英雄少女心 2020-12-08 22:49

Hey I am working on a very high performance file-managing/analyzing toolkit written in python. I want to create a function that gives me a list or something like that in a

3条回答
  •  悲哀的现实
    2020-12-08 23:33

    Now that you clarified the question a bit more, I guess the following is what you want:

    from collections import defaultdict
    
    input_ = '''dir/file
    dir/dir2/file2
    dir/file3
    dir2/alpha/beta/gamma/delta
    dir2/alpha/beta/gamma/delta/
    dir3/file4
    dir3/file5'''
    
    FILE_MARKER = ''
    
    def attach(branch, trunk):
        '''
        Insert a branch of directories on its trunk.
        '''
        parts = branch.split('/', 1)
        if len(parts) == 1:  # branch is a file
            trunk[FILE_MARKER].append(parts[0])
        else:
            node, others = parts
            if node not in trunk:
                trunk[node] = defaultdict(dict, ((FILE_MARKER, []),))
            attach(others, trunk[node])
    
    def prettify(d, indent=0):
        '''
        Print the file tree structure with proper indentation.
        '''
        for key, value in d.iteritems():
            if key == FILE_MARKER:
                if value:
                    print '  ' * indent + str(value)
            else:
                print '  ' * indent + str(key)
                if isinstance(value, dict):
                    prettify(value, indent+1)
                else:
                    print '  ' * (indent+1) + str(value)
    
    
    
    main_dict = defaultdict(dict, ((FILE_MARKER, []),))
    for line in input_.split('\n'):
        attach(line, main_dict)
    
    prettify(main_dict)
    

    It outputs:

    dir3
      ['file4', 'file5']
    dir2
      alpha
        beta
          gamma
            ['delta']
            delta
              ['']
    dir
      dir2
        ['file2']
      ['file', 'file3']
    

    A few thing to note:

    • The script make heavy use of defaultdicts, basically this allows to skip checking for the existence of a key and its initialisation if it is not there
    • Directory names are mapped to dictionary keys, I thought this might be a good feature for you, as key are hashed and you will able to retrieve information much faster this way than with lists. You can access the hierarchy in the form main_dict['dir2']['alpha']['beta']...
    • Note the difference between .../delta and .../delta/. I thought this was helpful for you to be able to quickly differenciate between your leaf being a directory or a file.

    I hope this answers your question. If anything is unclear, post a comment.

提交回复
热议问题