How to read a constituency-based parse tree

Question

I have a corpus of sentences that were preprocessed by Stanford CoreNLP. One of the things it provides is each sentence's constituency-based parse tree. While I can understand a parse tree when it's drawn as a tree, I'm not sure how to read it in this format:

E.g.:

    (ROOT
      (FRAG
        (NP (NN sent28))
        (: :)
        (S
          (NP (NNP Rome))
          (VP (VBZ is)
            (PP (IN in)
              (NP
                (NP (NNP Lazio) (NN province))
                (CC and)
                (NP
                  (NP (NNP Naples))
                  (PP (IN in)
                    (NP (NNP Campania))))))))
        (. .)))

The original sentence is:

sent28: Rome is in Lazio province and Naples in Campania .

How am I supposed to read this tree? Alternatively, is there Python code that does it properly? Thanks.
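Reading the bracketed format: each pair of parentheses is one constituent. The first symbol after "(" is its label (a phrase type such as NP, VP or PP, or a part-of-speech tag such as NNP), and everything after it up to the matching ")" is its children. Innermost pairs like (NNP Rome) are a POS tag over a single word; indentation only mirrors nesting depth. The sketch below is a minimal, dependency-free reader written for illustration (read_tree, leaves and show are hypothetical helper names, not a library API). It assumes every "(" is immediately followed by a label, as in the CoreNLP output above.

```python
import re

# A token is "(", ")", or a run of characters that are neither whitespace nor parentheses
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def read_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples; words stay plain strings."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}, got {tokens[pos]!r}")
        label = tokens[pos + 1]  # first symbol after "(" is the node label
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())  # a nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the matching ")"
        return (label, children)

    return parse_node()


def leaves(node):
    """Collect the words of a subtree from left to right."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def show(node, depth=0):
    """Print one constituent per line, indented by its depth; POS tags show their word."""
    label, children = node
    words = [c for c in children if isinstance(c, str)]
    print("  " * depth + label + (" " + " ".join(words) if words else ""))
    for child in children:
        if not isinstance(child, str):
            show(child, depth + 1)


SAMPLE = """
(ROOT (FRAG (NP (NN sent28)) (: :)
  (S (NP (NNP Rome))
     (VP (VBZ is) (PP (IN in) (NP (NP (NNP Lazio) (NN province)) (CC and)
       (NP (NP (NNP Naples)) (PP (IN in) (NP (NNP Campania))))))))
  (. .)))
"""

tree = read_tree(SAMPLE)
print(" ".join(leaves(tree)))  # → sent28 : Rome is in Lazio province and Naples in Campania .
show(tree)
```

Read this way, the tree says that (NP (NNP Rome)) is the subject, the VP is "is in ...", and the parser groups "Lazio province and Naples in Campania" into a single coordinated NP that is the object of the first "in".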

Answer 1:

NLTK has a class for reading parse trees: nltk.tree.Tree. The relevant method is the class method Tree.fromstring. Once the string is loaded, you can iterate over its subtrees, leaves, and so on.
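A short sketch of that approach, applied to the tree from the question. It assumes NLTK 3.x is installed; PARSE is just an illustrative variable name.

```python
from nltk.tree import Tree

# The CoreNLP output from the question; line breaks and indentation don't matter
PARSE = """
(ROOT (FRAG (NP (NN sent28)) (: :)
  (S (NP (NNP Rome))
     (VP (VBZ is) (PP (IN in) (NP (NP (NNP Lazio) (NN province)) (CC and)
       (NP (NP (NNP Naples)) (PP (IN in) (NP (NNP Campania))))))))
  (. .)))
"""

tree = Tree.fromstring(PARSE)

print(tree.label())   # label of the root node: ROOT
print(tree.leaves())  # the words, left to right
print(tree.pos())     # (word, POS tag) pairs

# Every noun phrase in the sentence, as a string of its words
for phrase in tree.subtrees(filter=lambda t: t.label() == "NP"):
    print(" ".join(phrase.leaves()))

tree.pretty_print()  # ASCII drawing in the terminal; tree.draw() opens a Tk window instead
```

subtrees() walks the tree top-down, so the outer NP covering "Lazio province and Naples in Campania" is printed before the smaller NPs nested inside it.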

As an aside, you might want to remove the "sent28:" prefix, since it confuses the parser (and it is not part of the sentence anyway). Because of it, you are not getting a full sentence parse (S) at the top, just a fragment (FRAG).
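One way to do that before parsing, assuming every line of the corpus starts with an ID of the form sentNN: as in the single example shown (split_sentence_id is a hypothetical helper). The ID is kept so each parse can still be matched back to its sentence.

```python
import re

# Assumed ID format, based on the single example "sent28: ..."
SENT_ID_RE = re.compile(r"^(sent\d+):\s*")


def split_sentence_id(line):
    """Return (sentence_id, text); sentence_id is None if the line has no ID."""
    match = SENT_ID_RE.match(line)
    if match is None:
        return None, line
    return match.group(1), line[match.end():]


sent_id, text = split_sentence_id("sent28: Rome is in Lazio province and Naples in Campania .")
print(sent_id)  # → sent28
print(text)     # → Rome is in Lazio province and Naples in Campania .
```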

Answer 2:

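You can also have the Stanford parser do the parsing and draw the trees for you. For example, through NLTK's CoreNLPParser interface (this assumes a CoreNLP server is already running locally on port 9000):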

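    from nltk.parse.corenlp import CoreNLPParser

    parser = CoreNLPParser(url="http://localhost:9000")  # assumes a running CoreNLP server
    # raw_parse() takes one string; parse_sents() takes already-tokenized sentences
    sentences = parser.raw_parse_sents(["Hello, My name is Melroy.", "What is your name?"])
    for parses in sentences:
        for tree in parses:
            tree.draw()  # opens a Tk window with the drawn tree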

Source: https://stackoverflow.com/questions/28674417/how-to-read-constituency-based-parse-tree

Tags: python, parsing, nlp, parse-tree
