Mixing words and PoS tags in NLTK parser grammars

Question

I've been playing with NLTK for a while and am now at the point of defining a custom parser grammar for special chunking. I am following the description in http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html, but what I want to do is slightly different from what the chapter describes. For instance, in example 7.10, instead of using the following rule for the verb phrase: VP: {<VB.*><NP|PP|CLAUSE>+$} I would like to match only sentences that use one particular verb rather than any verb. Something like: VP: {go<NP|PP|CLAUSE>+$}

In other words, I would like to match the actual word rather than its PoS tag, and to mix actual words and PoS tags in the regular expression.

Is this possible?

Answer 1:

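Not with the standard PoS tags churned out by the NLTK PoS tagger. The patterns in a chunk grammar are matched against the sequence of tags only, so the words themselves are not visible to them.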

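If you need to write grammars for different verbs, a useful hack might be to preprocess the tags and append the token to the tag for all the verbs. For example, if the token is appended after an underscore, so that go/VB becomes go/VB_go, you could use a regex string that looks something like VP: {<VB.*_go><NP|PP|CLAUSE>+$}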
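A minimal sketch of that preprocessing hack, under a few assumptions of mine: the helper names lexicalize_verbs and build_go_chunker, the underscore separator, the hand-tagged sample sentence, and the simplified NP/PP rules are all illustrative and not part of NLTK or of the book's example. Only nltk.RegexpParser is a real NLTK class here.

```python
def lexicalize_verbs(tagged, sep="_"):
    """Append the lowercased token to the tag of every verb.

    Hypothetical helper: ("go", "VBP") becomes ("go", "VBP_go"),
    while non-verb pairs are returned unchanged.
    """
    result = []
    for word, tag in tagged:
        if tag.startswith("VB"):
            tag = tag + sep + word.lower()
        result.append((word, tag))
    return result


def build_go_chunker():
    """Build a chunker whose VP rule only fires for the token 'go'."""
    # Imported here so the helper above can be used without NLTK installed.
    from nltk import RegexpParser

    grammar = r"""
      NP: {<DT>?<JJ>*<NN.*>+}      # simplified noun phrase
      PP: {<IN><NP>}               # preposition followed by a noun phrase
      VP: {<VB.*_go><NP|PP>+$}     # any VB* tag whose appended token is 'go'
    """
    return RegexpParser(grammar)


if __name__ == "__main__":
    # Hand-tagged sentence, so no tagger model has to be downloaded.
    sentence = [("We", "PRP"), ("go", "VBP"), ("into", "IN"),
                ("the", "DT"), ("city", "NN")]
    chunker = build_go_chunker()
    print(chunker.parse(lexicalize_verbs(sentence)))
```

Note that this matches the literal token only: "went" would be tagged VBD_went and would not match. To match every form of a verb you would have to append a lemma instead of the token. The leaves of the resulting tree also keep the rewritten tags, so you may want to strip the suffix again afterwards.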

Source: https://stackoverflow.com/questions/12755638/mixing-words-and-pos-tags-in-nltk-parser-grammars

Tags

python

python-2.7

nlp

nltk
