Question:

I have some questions about NLTK's tree functions. I am trying to extract a certain word from a tree structure like the one given below.

from nltk.tree import Tree

test = Tree.parse('(ROOT(SBARQ(WHADVP(WRB How))(SQ(VBP do)(NP (PRP you))(VP(VB ask)(NP(DT a)(JJ total)(NN stranger))(PRT (RP out))(PP (IN on)(NP (DT a)(NN date)))))))')
print "Input tree: ", test
print test.leaves()

Input tree:  (ROOT
  (SBARQ
    (WHADVP (WRB How))
    (SQ
      (VBP do)
      (NP (PRP you))
      (VP
        (VB ask)
        (NP (DT a) (JJ total) (NN stranger))
        (PRT (RP out))
        (PP (IN on) (NP (DT a) (NN date)))))))

['How', 'do', 'you', 'ask', 'a', 'total', 'stranger', 'out', 'on', 'a', 'date']

I can get a list of all the words using the leaves() function. Is there a way to get only a specific leaf? For example, I would like to get only the first/last noun from the NP phrases. The answer would be 'stranger' for the first noun and 'date' for the last noun.

Answer 1:

Although noun phrases can be nested inside other types of phrases, I believe most grammars only produce nouns inside noun phrases. So your question can probably be rephrased as: how do you find the first and last nouns?

You can simply get all (word, POS tag) tuples and filter them like this:

>>> [word for word,pos in test.pos() if pos=='NN']
['stranger', 'date']

In this case there are only two nouns, so you're done. If there were more, you would just index the list at [0] and [-1].
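For readers on NLTK 3 and Python 3, the same filter might look like the sketch below. It assumes Tree.fromstring, which is the NLTK 3 name for the Tree.parse call used in the question, and it guards against an empty list before indexing. The names first_noun and last_noun are only illustrative.

```python
from nltk.tree import Tree

# Assumption: NLTK 3, where Tree.parse from the question is named Tree.fromstring.
test = Tree.fromstring(
    '(ROOT(SBARQ(WHADVP(WRB How))(SQ(VBP do)(NP (PRP you))'
    '(VP(VB ask)(NP(DT a)(JJ total)(NN stranger))(PRT (RP out))'
    '(PP (IN on)(NP (DT a)(NN date)))))))'
)

# pos() yields (word, tag) pairs in sentence order; keep the words tagged 'NN'.
nouns = [word for word, pos in test.pos() if pos == 'NN']

# Index only when the list is non-empty, so a sentence without nouns gives None.
first_noun = nouns[0] if nouns else None
last_noun = nouns[-1] if nouns else None

print(first_noun, last_noun)
```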

If you were looking for another POS that can appear in several kinds of phrases but only wanted its occurrences inside one particular kind, or if you had a strange grammar that allowed nouns outside of NPs, you could do the following.

You can find the 'NP' subtrees like this:

>>> NPs = list(test.subtrees(filter=lambda x: x.node=='NP'))
>>> NPs
[Tree('NP', [Tree('PRP', ['you'])]), Tree('NP', [Tree('DT', ['a']), Tree('JJ', ['total']), Tree('NN', ['stranger'])]), Tree('NP', [Tree('DT', ['a']), Tree('NN', ['date'])])]
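The transcript above reads the label through the .node attribute. In NLTK 3 the label is read with the label() method instead, so on a current install the same step could be sketched as follows (the loop variable phrase is just an illustrative name):

```python
from nltk.tree import Tree

# Assumption: NLTK 3 (Tree.fromstring and label() rather than Tree.parse and .node).
test = Tree.fromstring(
    '(ROOT(SBARQ(WHADVP(WRB How))(SQ(VBP do)(NP (PRP you))'
    '(VP(VB ask)(NP(DT a)(JJ total)(NN stranger))(PRT (RP out))'
    '(PP (IN on)(NP (DT a)(NN date)))))))'
)

# subtrees() walks the tree and yields every subtree the filter accepts.
NPs = list(test.subtrees(filter=lambda t: t.label() == 'NP'))

for phrase in NPs:
    # Show each noun phrase as its label plus the words it covers.
    print(phrase.label(), phrase.leaves())
```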

Continuing to narrow down the subtrees, we can use this result to look for 'NN' words:

>>> NNs_inside_NPs = map(lambda x: list(x.subtrees(filter=lambda x: x.node=='NN')), NPs)
>>> NNs_inside_NPs
[[], [Tree('NN', ['stranger'])], [Tree('NN', ['date'])]]

So this is a list of lists containing all the 'NN's inside each 'NP' phrase. In this case each phrase happens to contain either zero or one noun.

Now we just need to go through the 'NP's and get all the leaves of the individual nouns (which really means we just want to access the 'stranger' part of Tree('NN', ['stranger'])).

>>> [noun.leaves()[0] for nouns in NNs_inside_NPs for noun in nouns]
['stranger', 'date']
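The steps above can be folded into one small helper. This is only a sketch under a few assumptions: it targets NLTK 3 (Tree.fromstring and label()), the function name words_with_tag_in_phrase and its parameters are hypothetical, and it uses pos() on each phrase in place of a second subtrees() pass.

```python
from nltk.tree import Tree


def words_with_tag_in_phrase(tree, phrase_label='NP', pos_tag='NN'):
    """Collect words tagged pos_tag that occur inside phrase_label subtrees.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    words = []
    for phrase in tree.subtrees(filter=lambda t: t.label() == phrase_label):
        # pos() lists the (word, tag) pairs covered by this phrase.
        for word, tag in phrase.pos():
            if tag == pos_tag:
                words.append(word)
    return words


test = Tree.fromstring(
    '(ROOT(SBARQ(WHADVP(WRB How))(SQ(VBP do)(NP (PRP you))'
    '(VP(VB ask)(NP(DT a)(JJ total)(NN stranger))(PRT (RP out))'
    '(PP (IN on)(NP (DT a)(NN date)))))))'
)

nouns_in_nps = words_with_tag_in_phrase(test)
print(nouns_in_nps[0], nouns_in_nps[-1])
```

One caveat: subtrees() also yields phrases nested inside other phrases with the same label, so in a tree where one NP contains another, a noun in the inner NP would be reported twice. The example sentence has no nested NPs, so the issue does not arise here.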

Source: Extracting specific leaf value from nltk tree structure with Python
