Ngram model and perplexity in NLTK

You are getting a low perplexity because you are using a 5-gram model. If you used a bigram model instead, your results would fall in the more typical range of about 50 to 1000 (roughly 5 to 10 bits per word).
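To make the unit conversion concrete: perplexity and cross-entropy are two views of the same quantity, with perplexity = 2^(cross-entropy in bits), so a perplexity of P corresponds to log2(P) bits per word. A quick sanity check:

```python
import math

# perplexity = 2 ** (cross-entropy in bits), so the bits-per-word
# figure is just the base-2 log of the perplexity.
for pp in (50, 1000):
    print(f"perplexity {pp:>4} = {math.log2(pp):.1f} bits/word")
# perplexity   50 = 5.6 bits/word
# perplexity 1000 = 10.0 bits/word
```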

Given your comments, are you using NLTK-3.0alpha? You shouldn't, at least not for language modeling:

https://github.com/nltk/nltk/issues?labels=model

As a matter of fact, the whole model module has been dropped from the NLTK-3.0a4 pre-release until the issues are fixed.
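If you just want to train a bigram model and measure perplexity today, the `nltk.lm` module (reintroduced in NLTK 3.4 and later, long after the 3.0 alphas discussed above) covers this. Here is a minimal sketch; the two-sentence toy corpus and the test sentence are made up for illustration:

```python
from nltk.lm import Laplace
from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
from nltk.util import bigrams

# Toy training corpus: a list of pre-tokenized sentences.
train_sents = [["the", "cat", "sat"], ["the", "dog", "sat"]]

# Build padded bigram training data and the vocabulary stream in one step.
train_data, vocab = padded_everygram_pipeline(2, train_sents)

# Laplace (add-one) smoothing avoids the infinite perplexity that a
# plain MLE model assigns to any unseen bigram in the test data.
lm = Laplace(2)
lm.fit(train_data, vocab)

# Score a held-out sentence: pad it, extract its bigrams, and evaluate.
test_sent = ["the", "cat", "sat"]
test_bigrams = list(bigrams(pad_both_ends(test_sent, n=2)))
print(lm.perplexity(test_bigrams))
```

Note that on a realistic corpus the choice of smoothing matters much more than in this toy example; `nltk.lm` also provides interpolated Kneser-Ney (`KneserNeyInterpolated`) if you need something stronger than add-one.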
