Question
The Penn Treebank format does not annotate the internal structure of a noun phrase, e.g.
(NP (JJ crude) (NN oil) (NNS prices))
or
(NP
(NP (DT the) (JJ big) (JJ blue) (NN house))
(SBAR
(WHNP (WDT that))
(S
(VP (VBD was)
(VP (VBN built)
(PP (IN near)
(NP (DT the) (NN river)))))))
I would like to extract the heads (prices and house). Do you know of any tool that can do this?
Answer 1:
Michael Collins's dissertation (Appendix A) includes head-finding rules for the Penn Treebank that work reasonably well and are not difficult to implement. They're far from perfect, though, since head finding is not an easy task.
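To give an idea of how little code such rules need, here is a minimal sketch in Python. The NP priority list is written from my recollection of Collins's Appendix A, so treat it as an assumption and check it against the thesis before relying on it. The helper names (`parse`, `np_head_child`, `head_word`) are made up for this example, and the sketch assumes bare labels without function tags such as `NP-SBJ`.

```python
import re

def parse(s):
    """Parse one bracketed tree into (label, children) pairs.
    A preterminal looks like ("NN", ["house"])."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # the word under a POS tag
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()

# Assumed NP rules: (scan direction, labels), tried in order.
# "right" means scan the children from right to left.
NP_RULES = [
    ("right", {"NN", "NNP", "NNPS", "NNS", "NX", "POS", "JJR"}),
    ("left", {"NP"}),
    ("right", {"$", "ADJP", "PRN"}),
    ("right", {"CD"}),
    ("right", {"JJ", "JJS", "RB", "QP"}),
]

def is_preterminal(tree):
    children = tree[1]
    return len(children) == 1 and isinstance(children[0], str)

def np_head_child(children):
    """Pick the head child of an NP from its list of child subtrees."""
    if children[-1][0] == "POS":  # possessive marker in final position
        return children[-1]
    for direction, labels in NP_RULES:
        ordered = children if direction == "left" else reversed(children)
        for child in ordered:
            if child[0] in labels:
                return child
    return children[-1]  # fallback: last child

def head_word(tree):
    """Follow head children down to a word. Only the NP table is
    implemented, so non-NP nodes on the path are also handled by it
    (usually ending in the last-child fallback)."""
    while not is_preterminal(tree):
        tree = np_head_child(tree[1])
    return tree[1][0]

print(head_word(parse("(NP (JJ crude) (NN oil) (NNS prices))")))  # prints "prices"
```

For the second tree in the question, the outer NP has no noun child of its own, so the left-to-right search for an NP child selects `(NP (DT the) (JJ big) (JJ blue) (NN house))`, and the right-to-left noun search inside it gives `house`. A full head finder would need a separate rule table per category (VP, PP, S, and so on) rather than reusing the NP table.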
The work by David Vadas and James Curran on NP structure in the Penn Treebank could also be relevant:
- David Vadas's website with additional NP annotation:
- Papers:
- Adding Noun Phrase Structure to the Penn Treebank
- Parsing Noun Phrases in the Penn Treebank
Answer 2:
As aab suggested, simple deterministic head-finding rules can work quite well (also see references to Magerman or Charniak head-finding rules for similar approaches).
You might also look at extracting dependency structure from the constituent trees. The Stanford tools do this quite well; see http://nlp.stanford.edu/software/stanford-dependencies.shtml
Answer 3:
You can also find head-finding rules for English in Dan Bikel's thesis. If you need source code, it is available with the parser software on his homepage.
Source: https://stackoverflow.com/questions/10297345/head-finding-rules-for-noun-phrases