Stanford NLP parse tree format

一生所求 2020-12-15 02:05

This may be a silly question, but how does one iterate through a parse tree output by an NLP parser (like Stanford NLP)? It's all nested brackets, which is neither an …

2 Answers
  • 2020-12-15 02:29

    This particular output format of the Stanford Parser is called the "bracketed parse (tree)". It is meant to be read as a graph with

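    • words as (leaf) nodes (e.g. As, an, accountant)
    • phrase/clause categories as node labels (e.g. S, NP, VP), with part-of-speech tags (e.g. IN, DT, NN) labelling the nodes directly above the words
    • edges linking the nodes hierarchically, and
    • typically, an artificial ("hallucinated") ROOT as the parse's TOP or root node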

    (Here you can read it as a directed acyclic graph (DAG), since every edge points from parent to child and there are no cycles; strictly speaking, it is a rooted tree, which is a special case of a DAG.)
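If you would rather not pull in a library, the structure described above is simple enough to read by hand. Below is a minimal sketch in Python. The names `parse_bracketed`, `iter_nodes` and `leaves` are hypothetical and not part of any parser's API.

The sketch makes two assumptions:
- Every "(" is immediately followed by a label.
- No word contains a literal parenthesis. The Stanford tools normally escape those as -LRB-/-RRB-.

Each constituent becomes a `(label, children)` tuple and each word a plain string. Iterating is then an ordinary recursive depth-first walk.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples.

    Words (leaves) are kept as plain strings. Raises ValueError on malformed
    input; an unbalanced string may also surface as IndexError.
    """
    # Tokens are "(", ")" or any run of characters without whitespace/brackets.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        label = tokens[pos + 1]  # assumption: a label always follows "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # nested constituent
            else:
                children.append(tokens[pos])   # a word, i.e. a leaf
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the root node")
    return tree


def iter_nodes(node, depth=0):
    """Depth-first, left-to-right (pre-order) walk yielding (depth, label or word)."""
    if isinstance(node, str):
        yield depth, node
        return
    label, children = node
    yield depth, label
    for child in children:
        yield from iter_nodes(child, depth + 1)


def leaves(node):
    """Collect the words in sentence order."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


if __name__ == "__main__":
    output = "(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))"
    tree = parse_bracketed(output)
    for depth, item in iter_nodes(tree):
        print("  " * depth + item)  # indentation mirrors tree depth
    print(leaves(tree))  # ['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
```

A hand-rolled reader like this is fine for well-formed parser output. For anything more, a tested library is the safer choice.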

    There are libraries that read bracketed parses, e.g. NLTK's nltk.tree.Tree (http://www.nltk.org/howto/tree.html):

    >>> from nltk.tree import Tree
    >>> output = '(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))'
    >>> parsetree = Tree.fromstring(output)
    >>> print(parsetree)
    (ROOT
      (S
        (PP (IN As) (NP (DT an) (NN accountant)))
        (NP (PRP I))
        (VP
          (VBP want)
          (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))
    >>> parsetree.pretty_print()
                               ROOT                             
                                |                                
                                S                               
          ______________________|________                        
         |                  |            VP                     
         |                  |    ________|____                   
         |                  |   |             S                 
         |                  |   |             |                  
         |                  |   |             VP                
         |                  |   |     ________|___               
         PP                 |   |    |            VP            
      ___|___               |   |    |    ________|___           
     |       NP             NP  |    |   |            NP        
     |    ___|______        |   |    |   |         ___|_____     
     IN  DT         NN     PRP VBP   TO  VB       DT        NN  
     |   |          |       |   |    |   |        |         |    
     As  an     accountant  I  want  to make      a      payment
    
    >>> parsetree.leaves()
    ['As', 'an', 'accountant', 'I', 'want', 'to', 'make', 'a', 'payment']
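Since the question is about iterating, here is a short sketch of the usual ways to walk an `nltk.tree.Tree`. It assumes NLTK 3, where `Tree.label()`, `Tree.subtrees()`, `Tree.leaves()` and `Tree.pos()` are available.

`Tree.subtrees()` yields the tree itself and then every nested subtree in pre-order. Its `filter` argument restricts that to the nodes you care about.

`walk` is a hypothetical helper showing hand-written recursion over the children. Each child is either another `Tree` or a plain string (a word).

```python
from nltk.tree import Tree

output = "(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))"
parsetree = Tree.fromstring(output)

# 1. Every node in pre-order (the root first), with the words it covers.
for subtree in parsetree.subtrees():
    print(subtree.label(), " ".join(subtree.leaves()))

# 2. Only the noun phrases, selected with the filter argument.
noun_phrases = [
    " ".join(t.leaves())
    for t in parsetree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['an accountant', 'I', 'a payment']

# 3. (word, part-of-speech tag) pairs taken from the preterminal nodes.
print(parsetree.pos())  # [('As', 'IN'), ('an', 'DT'), ('accountant', 'NN'), ...]


# 4. Hand-written recursion: a child is either a Tree or a plain string (a word).
def walk(node, depth=0):
    if isinstance(node, Tree):
        print("  " * depth + node.label())
        for child in node:
            walk(child, depth + 1)
    else:
        print("  " * depth + node)


walk(parsetree)
```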
    
  • 2020-12-15 02:43

    Note that if you're interested in specific nodes of the tree, identified by regex-like rules, you can use this very, very handy class to extract all such nodes with a regex-like matcher:

    http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/tregex/TregexPattern.html
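Tregex itself is a Java library (and command-line tool). For instance:
- `NP < NN` matches an NP that immediately dominates an NN.
- `VP << NN` matches a VP that dominates an NN at any depth.

If you are working in Python with NLTK instead, a rough analogue is to pass a predicate to `Tree.subtrees()`. This is not Tregex and covers only a tiny fraction of its pattern language. The helpers `immediately_dominates` and `dominates` are hypothetical names for this sketch.

```python
from nltk.tree import Tree


def immediately_dominates(parent_label, child_label):
    """Predicate for roughly what Tregex writes as `parent_label < child_label`."""
    def predicate(t):
        return t.label() == parent_label and any(
            isinstance(child, Tree) and child.label() == child_label for child in t
        )
    return predicate


def dominates(ancestor_label, descendant_label):
    """Predicate for roughly what Tregex writes as `ancestor_label << descendant_label`."""
    def predicate(t):
        # Look inside each child subtree (excluding t itself) at any depth.
        return t.label() == ancestor_label and any(
            isinstance(child, Tree)
            and any(s.label() == descendant_label for s in child.subtrees())
            for child in t
        )
    return predicate


output = "(ROOT (S (PP (IN As) (NP (DT an) (NN accountant))) (NP (PRP I)) (VP (VBP want) (S (VP (TO to) (VP (VB make) (NP (DT a) (NN payment))))))))"
parsetree = Tree.fromstring(output)

np_over_nn = [" ".join(t.leaves()) for t in parsetree.subtrees(immediately_dominates("NP", "NN"))]
vp_over_nn = [" ".join(t.leaves()) for t in parsetree.subtrees(dominates("VP", "NN"))]
print(np_over_nn)  # ['an accountant', 'a payment']
print(vp_over_nn)  # ['want to make a payment', 'to make a payment', 'make a payment']
```

For real pattern matching, Tregex (or a port of it) is the better tool. The predicate approach only shows the idea of selecting nodes by structural rules.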
