How do you use the GATE Twitter part-of-speech tagger as a model in the StanfordCoreNLP code?

会有一股神秘感。 提交于 2019-12-04 21:31:57

I just tried the model from the recent release (2014 April 11 for v3.3.1) and things worked great with this command:

./corenlp.sh -file tweets.txt -pos.model gate-EN-twitter.model -ssplit.newlineIsSentenceBreak always

Just tested against Stanford 3.3.1

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
props.put("pos.model", "gate-EN-twitter.model");
props.put("dcoref.score", true);
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Sample tweet:

ikr smh he asked fir yo last name so he can add u on fb lololol

Results without gate-EN-twitter.model

word: ikr :: pos: NN :: ne:O
word: smh :: pos: NN :: ne:O
word: he :: pos: PRP :: ne:O
word: asked :: pos: VBD :: ne:O
word: fir :: pos: NNP :: ne:O
word: yo :: pos: NNP :: ne:O
word: last :: pos: JJ :: ne:O
word: name :: pos: NN :: ne:O
word: so :: pos: IN :: ne:O
word: he :: pos: PRP :: ne:O
word: can :: pos: MD :: ne:O
word: add :: pos: VB :: ne:O
word: u :: pos: NN :: ne:O
word: on :: pos: IN :: ne:O
word: fb :: pos: NN :: ne:O
word: lololol :: pos: NN :: ne:O

Results with gate-EN-twitter.model

word: ikr :: pos: UH :: ne:O
word: smh :: pos: UH :: ne:O
word: he :: pos: PRP :: ne:O
word: asked :: pos: VBD :: ne:O
word: fir :: pos: IN :: ne:O
word: yo :: pos: PRP$ :: ne:O
word: last :: pos: JJ :: ne:O
word: name :: pos: NN :: ne:O
word: so :: pos: IN :: ne:O
word: he :: pos: PRP :: ne:O
word: can :: pos: MD :: ne:O
word: add :: pos: VB :: ne:O
word: u :: pos: PRP :: ne:O
word: on :: pos: IN :: ne:O
word: fb :: pos: NNP :: ne:O
word: lololol :: pos: UH :: ne:O

I've also just tested this against v3.6.0 (Apr 2016) of StanfordCoreNLP and the POS tagger worked fine straightaway (from .NET code).

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!