Question
I have one example where Stanford NLP outputs a weird parse tree for the sentence:
Clean my desk
(ROOT
  (NP
    (NP (JJ Clean))
    (NP (PRP$ my) (NN desk))))
As you can see, it tags the word "Clean" as an adjective dependent on the noun "desk", with the whole phrase tagged as a Noun Phrase, while my expectation is for "Clean" to be tagged as a verb and the phrase as a Verb Phrase.
The JJ-PRP$-NN combination simply doesn't make sense in English to me. Has anyone run into something similar? I know that Stanford NLP results sometimes differ based on the sequence (?) of parsing tools run. How can I make this tag properly?
Answer 1:
CoreNLP is notoriously bad at these sorts of imperative statements. This error is likely from the POS tagger mis-tagging "clean" as an adjective, although it appears the parser is also making the same mistake.
Answer 2:
As it happens, if you feed the sentence "Clean my desk" directly to the parser (actually, the 'tokenize', 'ssplit' and 'parse' tools), it gives the following result:
(ROOT (NP (NP (NNP Clean)) (NP (PRP$ my) (NN desk))))
However, now "Clean" is a Proper Noun - very clever, Stanford. So, if we feed the sentence in with the first word in lowercase - "clean my desk" - we finally get what we are looking for:
(ROOT (S (VP (VB clean) (NP (PRP$ my) (NN desk)))))
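To compare outputs like the ones above without reading brackets by eye, the (tag, word) pairs can be pulled out of the bracketed string. This is only a rough sketch: leaf_tags is a hypothetical helper name, not part of CoreNLP, and it assumes the parser's usual Penn Treebank style output where tokens contain no whitespace or parentheses.

```python
import re

# Hypothetical helper (not a CoreNLP API): extract (POS tag, word) pairs
# from a bracketed parse string such as the ones printed above.
# Assumption: every leaf looks like "(TAG word)" and tokens contain
# no whitespace or parentheses.
LEAF_PATTERN = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def leaf_tags(tree: str) -> list[tuple[str, str]]:
    """Return the leaves of a bracketed parse as (tag, word) pairs."""
    return LEAF_PATTERN.findall(tree)


if __name__ == "__main__":
    capitalized = "(ROOT (NP (NP (NNP Clean)) (NP (PRP$ my) (NN desk))))"
    lowercased = "(ROOT (S (VP (VB clean) (NP (PRP$ my) (NN desk)))))"
    print(leaf_tags(capitalized))
    print(leaf_tags(lowercased))
```

Checking whether the first pair's tag starts with VB is then a quick way to see if the imperative was recognized as a verb.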
Be careful not to convert the full sentence into lowercase. While testing I've noticed that the word "I" turned into lowercase "i" is tagged as FW (Foreign Word), so only convert the first word to lowercase.
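The preprocessing step described above could look roughly like this before the text is handed to the parser. It is a minimal sketch: lowercase_first_word is a hypothetical name, and skipping a leading "I" (or a contraction such as "I'm") is my own assumption, added so the workaround does not trigger the FW problem on sentences that start with that pronoun.

```python
def lowercase_first_word(sentence: str) -> str:
    """Lowercase only the first word of a sentence, leaving the rest as-is.

    Hypothetical helper, not part of CoreNLP. A leading "I" or "I'..."
    contraction is left untouched (an assumption, see the note above).
    """
    stripped = sentence.lstrip()
    if not stripped:
        return sentence  # empty or whitespace-only input: nothing to do

    # Preserve any leading whitespace exactly as it was.
    leading = sentence[: len(sentence) - len(stripped)]
    first = stripped.split(maxsplit=1)[0]
    rest = stripped[len(first):]

    # Keep the pronoun "I" and contractions like "I'm" capitalized.
    if first == "I" or first.startswith("I'"):
        return sentence

    return leading + first.lower() + rest


if __name__ == "__main__":
    print(lowercase_first_word("Clean my desk"))
    print(lowercase_first_word("I clean my desk"))
```

Note that this will also lowercase a genuine proper noun at the start of a sentence, so it is best applied only to input that is expected to be imperative commands.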
Source: https://stackoverflow.com/questions/35872324/stanford-nlp-vp-vs-np