Python file parsing: Build tree from text file

慢半拍i 2020-12-13 20:20

I have an indented text file that will be used to build a tree. Each line represents a node, and the indentation gives both the node's depth and which node it is a child of.

3 Answers
  • 2020-12-13 20:48

    If you don't insist on recursion, this works too:

    from itertools import takewhile
    
    is_tab = '\t'.__eq__
    
    def build_tree(lines):
        lines = iter(lines)
        stack = []
        for line in lines:
            indent = len(list(takewhile(is_tab, line)))
            stack[indent:] = [line.lstrip()]
            print(stack)
    
    source = '''ROOT
    \tNode1
    \t\tNode2
    \t\t\tNode3
    \t\t\t\tNode4
    \tNode5
    \tNode6'''
    
    build_tree(source.split('\n'))
    

    Result:

    ['ROOT']
    ['ROOT', 'Node1']
    ['ROOT', 'Node1', 'Node2']
    ['ROOT', 'Node1', 'Node2', 'Node3']
    ['ROOT', 'Node1', 'Node2', 'Node3', 'Node4']
    ['ROOT', 'Node5']
    ['ROOT', 'Node6']
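
    The stack above only prints the path to each node. As a minimal sketch, assuming Python 3 and a nested-dict representation (the `build_nested` name and the dict layout are illustrative, not part of the answer), the same stack idea can keep the children instead of printing them:

```python
def build_nested(lines):
    """Build a nested-dict tree from tab-indented lines (illustrative helper)."""
    root = {"title": None, "children": []}   # synthetic holder above depth 0
    stack = [root]                           # stack[d] is the parent for depth d
    for line in lines:
        if not line.strip():
            continue                         # skip blank lines
        indent = len(line) - len(line.lstrip("\t"))  # leading tabs only
        node = {"title": line.strip(), "children": []}
        # Drop everything deeper than this node's parent, then attach.
        del stack[indent + 1:]
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


source = "ROOT\n\tNode1\n\t\tNode2\n\tNode5\n\tNode6"
tree = build_nested(source.split("\n"))
print(tree[0]["title"], [c["title"] for c in tree[0]["children"]])
```

    If a line is indented more than one level deeper than its predecessor, this sketch simply attaches it to the deepest node available; stricter validation is left out.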
    
  • 2020-12-13 20:49

    The big issue is the "lookahead" that I think caused the ugliness in question. It can be shortened slightly:

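    def _recurse_tree(parent, depth, source):
        last_line = source.readline().rstrip()
        while last_line:
            # Count only the leading tabs; they encode the depth.
            tabs = len(last_line) - len(last_line.lstrip('\t'))
            if tabs < depth:
                # This line belongs to an ancestor; hand it back to the caller.
                break
            node = last_line.strip()
            if parent is not None:
                print("%s: %s" % (parent, node))
            # Consume this node's subtree; the result is the first unconsumed line.
            last_line = _recurse_tree(node, tabs + 1, source)
        return last_line

    with open("test.txt") as in_file:
        _recurse_tree(None, 0, in_file)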
    

    Since we're talking recursion, I took pains to avoid any global variables (source and last_line). It would be more pythonic to make them members on some parser object.
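
    A rough sketch of what such a parser object might look like, assuming Python 3. The `TreeParser` class, its method names, and the choice to collect (parent, child) pairs instead of printing them are all hypothetical; `io.StringIO` stands in for the file:

```python
import io


class TreeParser:
    """Hypothetical parser object holding the state the recursion shares."""

    def __init__(self, source):
        self.source = source      # any object with readline()
        self.last_line = ""       # one-line lookahead
        self.edges = []           # collected (parent, child) pairs

    def _advance(self):
        self.last_line = self.source.readline().rstrip()

    def parse(self):
        self._advance()
        self._recurse(None, 0)
        return self.edges

    def _recurse(self, parent, depth):
        while self.last_line:
            line = self.last_line
            tabs = len(line) - len(line.lstrip("\t"))   # leading tabs only
            if tabs < depth:
                return            # line belongs to an ancestor; leave it buffered
            node = line.strip()
            if parent is not None:
                self.edges.append((parent, node))
            self._advance()
            self._recurse(node, tabs + 1)


text = "ROOT\n\tNode1\n\t\tNode2\n\tNode5\n"
edges = TreeParser(io.StringIO(text)).parse()
print(edges)
```

    Keeping the lookahead line on the object means the recursive call no longer has to return it to the caller.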

  • 2020-12-13 20:53

    I would not use recursion for something like this at all (OK, maybe I would if I were coding this in a language like Scheme, but this is Python). Recursion is great for iterating over data that is shaped like a tree, and in those cases it simplifies the design greatly compared to ordinary loops.

    However, this is not the case here. Your data certainly represents a tree, but it is formatted sequentially, i.e. it is a simple sequence of lines. Such data is most easily processed with a simple loop. If you wish, you can make the design more general by separating it into three layers: the sequential reader (which parses the leading tabs as the depth level), the tree inserter (which inserts a node at a given depth by keeping track of the last node inserted) and the tree itself:

    import re
    
    # *** Tree representation ***
    class Node(object):
        def __init__(self, title):
            self.title = title
            self.parent = None
            self.children = []
    
        def add(self, child):
            self.children.append(child)
            child.parent = self
    
    # *** Node insertion logic ***
    class Inserter(object):
        def __init__(self, node, depth = 0):
            self.node = node
            self.depth = depth
    
        def __call__(self, title, depth):
            newNode = Node(title)
            if (depth > self.depth):
                self.node.add(newNode)
                self.depth = depth
            elif (depth == self.depth):
                self.node.parent.add(newNode)
            else:
                parent = self.node.parent
                for i in range(self.depth - depth):
                    parent = parent.parent
                parent.add(newNode)
                self.depth = depth
    
            self.node = newNode
    
    # *** File iteration logic ***
    with open(r'tree.txt', 'r') as f:    
        tree = Node(f.readline().rstrip('\n'))
        inserter = Inserter(tree)
    
        for line in f:
            line = line.rstrip('\n')
            # note there's a bug with your original tab parsing code:
            # it would count all tabs in the string, not just the ones
            # at the beginning
            tabs = re.match('\t*', line).group(0).count('\t')
            title = line[tabs:]
            inserter(title, tabs)
    

    To test this code before posting it here, I wrote a very simple function that pretty-prints the tree I had read into memory. For this function, recursion was of course the most natural choice, because the tree is now actually represented as tree-shaped data:

    def print_tree(node, depth = 0):
        print('%s%s' % ('  ' * depth, node.title))
        for child in node.children:
            print_tree(child, depth + 1)
    
    print_tree(tree)
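
    The same recursive shape can also write the tree back out in the original tab-indented format. A small sketch, assuming Python 3; `dump_tree` is a hypothetical helper, and the `Node` class here is a cut-down stand-in (title and children only) so that the snippet runs on its own:

```python
class Node:
    """Minimal stand-in for the answer's Node class (title plus children)."""

    def __init__(self, title):
        self.title = title
        self.children = []


def dump_tree(node, depth=0):
    """Return the tab-indented lines for node and all of its descendants."""
    lines = ["\t" * depth + node.title]
    for child in node.children:
        lines.extend(dump_tree(child, depth + 1))
    return lines


root = Node("ROOT")
child = Node("Node1")
child.children.append(Node("Node2"))
root.children.extend([child, Node("Node5")])
print("\n".join(dump_tree(root)))
```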
    