What created `maxent_treebank_pos_tagger/english.pickle`?

Submitted by ☆樱花仙子☆ on 2019-12-06 20:19:08

Question


The nltk package's built-in part-of-speech tagger does not seem to be optimized for my use-case (here, for instance). The source code here shows that it's using a saved, pre-trained classifier called maxent_treebank_pos_tagger.

What created maxent_treebank_pos_tagger/english.pickle? I'm guessing that there is a tagged corpus out there somewhere that was used to train this tagger, so I think I'm looking for (a) that tagged corpus and (b) the exact code that trains the tagger based on the tagged corpus.

In addition to lots of googling, so far I have tried inspecting the .pickle object directly for clues, starting like this:

from nltk.data import load

# Load the pickled tagger and poke around its attributes
x = load("nltk_data/taggers/maxent_treebank_pos_tagger/english.pickle")
dir(x)
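
For reference, the loaded object behaves like any other NLTK tagger, so it can be exercised directly as well as inspected. A minimal sketch, assuming the maxent_treebank_pos_tagger resource has already been fetched with nltk.download("maxent_treebank_pos_tagger") and resolves via the nltk_data search path:

from nltk.data import load

# Resolve the resource through NLTK's data search path instead of a hard-coded file path
tagger = load("taggers/maxent_treebank_pos_tagger/english.pickle")

# The pickle is an ordinary NLTK tagger object, so it exposes tag()
print(type(tagger))
print(tagger.tag(["This", "is", "a", "test", "sentence", "."]))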

Answer 1:


The NLTK source is https://github.com/nltk/nltk/blob/develop/nltk/tag/__init__.py#L83

The original source of NLTK's MaxEnt POS tagger is from https://github.com/arne-cl/nltk-maxent-pos-tagger

Training Data: Wall Street Journal subset of the Penn Treebank corpus

Features: Ratnaparkhi (1996)

Algorithm: Maximum Entropy

Accuracy: see the related Stack Overflow question "What is the accuracy of nltk pos_tagger?"
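
To reproduce something close to part (b) of the question, here is a minimal training sketch. It assumes the small treebank sample bundled with NLTK (a WSJ subset) stands in for the full Penn Treebank WSJ data, and it plugs a Maximum Entropy classifier into NLTK's ClassifierBasedPOSTagger; this approximates the Ratnaparkhi-style setup rather than being the exact code that produced english.pickle:

import nltk
from nltk.corpus import treebank
from nltk.tag.sequential import ClassifierBasedPOSTagger
from nltk.classify import MaxentClassifier

# The treebank sample shipped with NLTK is only a small WSJ subset;
# the real english.pickle was trained on far more WSJ data.
nltk.download("treebank")
tagged_sents = treebank.tagged_sents()
split = int(len(tagged_sents) * 0.9)
train_sents, test_sents = tagged_sents[:split], tagged_sents[split:]

# Use a Maximum Entropy classifier as the classifier_builder.
# max_iter is kept small here so the sketch finishes quickly.
tagger = ClassifierBasedPOSTagger(
    train=train_sents,
    classifier_builder=lambda feats: MaxentClassifier.train(feats, max_iter=10),
)

print(tagger.tag("This is a test sentence .".split()))
print(tagger.accuracy(test_sents))  # use tagger.evaluate(test_sents) on older NLTK versions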



Source: https://stackoverflow.com/questions/31386224/what-created-maxent-treebank-pos-tagger-english-pickle
