Finding the head of a noun phrase in NLTK and a Stanford parse according to the rules for finding the head of an NP

予麋鹿 2021-02-14 03:44

Generally, the head of a noun phrase is the rightmost noun of the NP; as shown below, "tree" is the head of the parent NP. So

            ROOT                                  


        
2 answers
  •  耶瑟儿~
    2021-02-14 04:07

    I was looking for a Python script using NLTK that does this task and stumbled across this post. Here's the solution I came up with. It's a bit noisy and arbitrary, and it definitely doesn't always pick the right answer (e.g. for compound nouns), but I wanted to post it in case a solution that mostly works is helpful to others.

    #!/usr/bin/env python
    
    from nltk.tree import Tree
    
    examples = [
        '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))',
        "(ROOT\n  (S\n    (NP\n      (NP (DT the) (NN person))\n      (SBAR\n        (WHNP (WDT that))\n        (S\n          (VP (VBD gave)\n            (NP (DT the) (NN talk))))))\n    (VP (VBD went)\n      (NP (NN home)))))",
        '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
    ]
    
    def find_noun_phrases(tree):
        return [subtree for subtree in tree.subtrees(lambda t: t.label()=='NP')]
    
    def find_head_of_np(np):
        noun_tags = ['NN', 'NNS', 'NNP', 'NNPS']
        top_level_trees = [child for child in np if isinstance(child, Tree)]
        ## search for a top-level noun
        top_level_nouns = [t for t in top_level_trees if t.label() in noun_tags]
        if len(top_level_nouns) > 0:
            ## if you find some, pick the rightmost one, just 'cause
            return top_level_nouns[-1][0]
        else:
            ## search for a top-level np
            top_level_nps = [t for t in top_level_trees if t.label()=='NP']
            if len(top_level_nps) > 0:
                ## if you find some, pick the head of the rightmost one, just 'cause
                return find_head_of_np(top_level_nps[-1])
            else:
                ## search for any noun
                nouns = [p[0] for p in np.pos() if p[1] in noun_tags]
                if len(nouns) > 0:
                    ## if you find some, pick the rightmost one, just 'cause
                    return nouns[-1]
                else:
                    ## return the rightmost word, just 'cause
                    return np.leaves()[-1]
    
    for example in examples:
        tree = Tree.fromstring(example)
        for np in find_noun_phrases(tree):
            # Python 3 print calls; the output is the same as the original Python 2 statements
            print("noun phrase:", " ".join(np.leaves()))
            head = find_head_of_np(np)
            print("head:", head)
    

    For the examples discussed in the question and in the other answers, this is the output:

    noun phrase: The old oak tree from India
    head: tree
    noun phrase: The old oak tree
    head: tree
    noun phrase: India
    head: India
    noun phrase: the person that gave the talk
    head: person
    noun phrase: the person
    head: person
    noun phrase: the talk
    head: talk
    noun phrase: home
    head: home
    noun phrase: Carnac the Magnificent
    head: Magnificent
    noun phrase: a talk
    head: talk
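
The "oak tree" and "Carnac the Magnificent" results show where the rightmost-noun rule gets arbitrary. One possible refinement, offered only as a rough sketch and not as part of the answer's script, is to return the whole rightmost run of adjacent nouns, so that a compound comes back as a unit. The helper name `rightmost_noun_run` is hypothetical, and the hard-coded `(word, tag)` pairs are assumed to have the shape that `Tree.pos()` returns for a flat NP with no nested phrases.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def rightmost_noun_run(tagged_words):
    """Return the rightmost contiguous run of nouns from (word, tag) pairs.

    Hypothetical helper: it only groups adjacent nouns and does not
    decide which word inside the run is the true head.
    """
    run = []
    # Walk from the right edge of the phrase towards the left.
    for word, tag in reversed(tagged_words):
        if tag in NOUN_TAGS:
            run.append(word)
        elif run:
            # The first non-noun after at least one noun ends the run.
            break
    run.reverse()
    return run

# Hard-coded pairs, assumed to match what Tree.pos() gives for a flat NP.
oak = [("The", "DT"), ("old", "JJ"), ("oak", "NN"), ("tree", "NN")]
carnac = [("Carnac", "NN"), ("the", "DT"), ("Magnificent", "NN")]

print(rightmost_noun_run(oak))
print(rightmost_noun_run(carnac))
```

This still gives "Magnificent" for the Carnac phrase, so it does not fix that case; it only keeps compounds such as "oak tree" together.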
    
