Question
I've been playing with NLTK for a while and am now at the point where I need to define a custom parser grammar for special chunking. I am following the description in http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html, but what I want to do is slightly different from what the chapter describes. For instance, in Example 7.10, instead of using the following rule for the verb phrase: VP: {<VB.*><NP|PP|CLAUSE>+$} I would like to match only sentences that use one particular verb rather than any verb. Something like: VP: {go<NP|PP|CLAUSE>+$}
In other words, I would like to match the actual word rather than its PoS tag, and to mix actual words and PoS tags in the same regular expression.
Is this possible?
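To make the problem concrete, here is a minimal sketch with a cascaded grammar modeled on the one in Example 7.10 (the book's exact listing may differ slightly). It uses hand-tagged sentences so no tagger model has to be downloaded; the sentences and variable names are illustrative assumptions. Since each <...> item is compared with a tag such as VBP, the rule <VB.*> accepts "go" and "walk" alike.

```python
import nltk

# Cascaded chunk grammar modeled on Example 7.10 of the NLTK book.
# Each <...> item is matched against a PoS tag, or against the label of a
# chunk (NP, PP, ...) built by an earlier stage, never against the word.
grammar = r"""
  NP: {<DT|JJ|NN.*>+}          # determiners, adjectives and nouns
  PP: {<IN><NP>}               # preposition followed by an NP
  VP: {<VB.*><NP|PP|CLAUSE>+$} # any verb followed by its arguments
  CLAUSE: {<NP><VP>}           # NP followed by a VP
"""
parser = nltk.RegexpParser(grammar)

# Hand-tagged (word, tag) pairs; illustrative sentences only.
go_sent = [("They", "PRP"), ("go", "VBP"), ("into", "IN"),
           ("the", "DT"), ("house", "NN")]
walk_sent = [("They", "PRP"), ("walk", "VBP"), ("into", "IN"),
             ("the", "DT"), ("house", "NN")]

# Both sentences get the same VP chunk: the grammar cannot tell the verbs apart.
for sent in (go_sent, walk_sent):
    print(parser.parse(sent))
```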
Answer 1:
Not with the standard PoS tags produced by the NLTK PoS tagger. The chunk rules of a RegexpParser grammar are matched against the tags only, not against the words.
If you need grammars for different verbs, a useful hack is to preprocess the tagged text and append the token to the tag of every verb. You could then use a regex string that looks like VP: {+$}
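A minimal sketch of this hack, assuming NLTK 3's RegexpParser and hand-tagged input. The helper names lexicalize_verbs and delexicalize_verbs, the "-" separator between tag and word, and the resulting pattern <VB.*-go> are hypothetical choices for illustration, not part of NLTK. The second helper restores the original tags after chunking so downstream code sees standard PoS tags again.

```python
import nltk


def lexicalize_verbs(tagged, sep="-"):
    """Hypothetical helper: append the lowercased word to every verb tag,
    e.g. ("go", "VBP") -> ("go", "VBP-go"). Non-verb tags are unchanged."""
    return [(w, f"{t}{sep}{w.lower()}" if t.startswith("VB") else t)
            for w, t in tagged]


def delexicalize_verbs(tree, sep="-"):
    """Hypothetical helper: undo lexicalize_verbs in place on the
    (word, tag) leaves of a chunk tree, then return the tree."""
    for pos in tree.treepositions("leaves"):
        word, tag = tree[pos]
        suffix = sep + word.lower()
        if tag.startswith("VB") and tag.endswith(suffix):
            tree[pos] = (word, tag[: -len(suffix)])
    return tree


# Only a verb whose lexicalized tag ends in "-go" can start a VP chunk.
grammar = r"""
  NP: {<DT|JJ|NN.*>+}
  PP: {<IN><NP>}
  VP: {<VB.*-go><NP|PP|CLAUSE>+$}
  CLAUSE: {<NP><VP>}
"""
parser = nltk.RegexpParser(grammar)

go_sent = [("They", "PRP"), ("go", "VBP"), ("into", "IN"),
           ("the", "DT"), ("house", "NN")]
walk_sent = [("They", "PRP"), ("walk", "VBP"), ("into", "IN"),
             ("the", "DT"), ("house", "NN")]

go_tree = delexicalize_verbs(parser.parse(lexicalize_verbs(go_sent)))
walk_tree = delexicalize_verbs(parser.parse(lexicalize_verbs(walk_sent)))
print(go_tree)    # has a VP chunk headed by "go"
print(walk_tree)  # no VP chunk, because "walk" is not "go"
```

Because the suffix is the surface form, "goes" or "went" would not match <VB.*-go>; you would need something like <VB.*-(go|goes|went)> in the rule, or a lemmatization step before appending the word.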
Source: https://stackoverflow.com/questions/12755638/mixing-words-and-pos-tags-in-nltk-parser-grammars