Finding head of a noun phrase in NLTK and stanford parse according to the rules of finding head of a NP

ⅰ亾dé卋堺 提交于 2019-12-03 11:36:14

There are built-in string to Tree object in NLTK (http://www.nltk.org/_modules/nltk/tree.html), see https://github.com/nltk/nltk/blob/develop/nltk/tree.py#L541.

>>> from nltk.tree import Tree
>>> parsestr='(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...             print i
... 
(NP
  (NP (DT The) (JJ old) (NN oak) (NN tree))
  (PP (IN from) (NP (NNP India))))
(NP (DT The) (JJ old) (NN oak) (NN tree))
(NP (NNP India))


>>> for i in Tree.fromstring(parsestr).subtrees():
...     if i.label() == 'NP':
...             print i.leaves()
... 
['The', 'old', 'oak', 'tree', 'from', 'India']
['The', 'old', 'oak', 'tree']
['India']

Note that it's not always the case that right most noun is the head noun of an NP, e.g.

>>> s = '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
>>> Tree.fromstring(s)
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NN', ['Carnac']), Tree('DT', ['the']), Tree('NN', ['Magnificent'])]), Tree('VP', [Tree('VBD', ['gave']), Tree('NP', [Tree('', [Tree('DT', ['a']), Tree('NN', ['talk'])])])])])])
>>> for i in Tree.fromstring(s).subtrees():
...     if i.label() == 'NP':
...             print i.leaves()[-1]
... 
Magnificent
talk

Arguably, Magnificent can still be the head noun. Another example is when the NP includes a relative clause:

(NP (NP the person) that gave (NP the talk)) went home

The head noun of the subject is person but the last leave node of the NP the person that gave the talk is talk.

I was looking for a python script using NLTK that does this task and stumbled across this post. Here's the solution I came up with. It's a little bit noisy and arbitrary, and definitely doesn't always pick the right answer (e.g. for compound nouns). But I wanted to post it in case it was helpful for others to have a solution that mostly works.

#!/usr/bin/env python

from nltk.tree import Tree

examples = [
    '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))',
    "(ROOT\n  (S\n    (NP\n      (NP (DT the) (NN person))\n      (SBAR\n        (WHNP (WDT that))\n        (S\n          (VP (VBD gave)\n            (NP (DT the) (NN talk))))))\n    (VP (VBD went)\n      (NP (NN home)))))",
    '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
]

def find_noun_phrases(tree):
    return [subtree for subtree in tree.subtrees(lambda t: t.label()=='NP')]

def find_head_of_np(np):
    noun_tags = ['NN', 'NNS', 'NNP', 'NNPS']
    top_level_trees = [np[i] for i in range(len(np)) if type(np[i]) is Tree]
    ## search for a top-level noun
    top_level_nouns = [t for t in top_level_trees if t.label() in noun_tags]
    if len(top_level_nouns) > 0:
        ## if you find some, pick the rightmost one, just 'cause
        return top_level_nouns[-1][0]
    else:
        ## search for a top-level np
        top_level_nps = [t for t in top_level_trees if t.label()=='NP']
        if len(top_level_nps) > 0:
            ## if you find some, pick the head of the rightmost one, just 'cause
            return find_head_of_np(top_level_nps[-1])
        else:
            ## search for any noun
            nouns = [p[0] for p in np.pos() if p[1] in noun_tags]
            if len(nouns) > 0:
                ## if you find some, pick the rightmost one, just 'cause
                return nouns[-1]
            else:
                ## return the rightmost word, just 'cause
                return np.leaves()[-1]

for example in examples:
    tree = Tree.fromstring(example)
    for np in find_noun_phrases(tree):
        print "noun phrase:",
        print " ".join(np.leaves())
        head = find_head_of_np(np)
        print "head:",
        print head

For the examples discussed in the question and in the other answers, this is the output:

noun phrase: The old oak tree from India
head: tree
noun phrase: The old oak tree
head: tree
noun phrase: India
head: India
noun phrase: the person that gave the talk
head: person
noun phrase: the person
head: person
noun phrase: the talk
head: talk
noun phrase: home
head: home
noun phrase: Carnac the Magnificent
head: Magnificent
noun phrase: a talk
head: talk
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!