Stanford NLP core 4.0.0 no longer splitting verbs and pronouns in Spanish

此生再无相见时, submitted on 2021-01-29 16:14:40

Question


Very helpfully, Stanford NLP core 3.9.2 used to split rolled-together Spanish verbs and pronouns.

This is the 4.0.0 output:

The previous version had more .tagger files. These have not been included with the 4.0.0 distribution.

Is that the cause? Will they be added back?


Answer 1:


There are some documentation updates that still need to be made for Stanford CoreNLP 4.0.0.

A major change is that a new multi-word-token (mwt) annotator has been added, which makes tokenization conform to the UD standard. So the new default Spanish pipeline should run tokenize,ssplit,mwt,pos,depparse,ner. It may not be possible to run such a pipeline from the server demo at this time, as some modifications will need to be made. I can try to send you those modifications soon. We will try to make a new release in early summer to handle issues like this that we missed.
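
As a rough sketch of how the annotator list above might be configured from Java: the code below only builds a java.util.Properties object, so it runs without CoreNLP installed. The class name SpanishPipelineConfig is made up for illustration, and the resource name StanfordCoreNLP-spanish.properties is an assumption about what the Spanish models jar provides. If that file is not on the classpath, only the annotators key is set, and the Spanish model paths would still have to be supplied some other way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of CoreNLP.
class SpanishPipelineConfig {

    // Assumption: the Spanish models jar ships its defaults under this name.
    static final String SPANISH_DEFAULTS = "StanfordCoreNLP-spanish.properties";

    // Loads the Spanish defaults if they are on the classpath, then sets the
    // annotator list quoted in the answer above.
    static Properties build() {
        Properties props = new Properties();
        ClassLoader loader = SpanishPipelineConfig.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(SPANISH_DEFAULTS)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        props.setProperty("annotators", "tokenize,ssplit,mwt,pos,depparse,ner");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build();
        System.out.println(props.getProperty("annotators"));
        // With the CoreNLP 4.0.0 jar and the Spanish models jar on the classpath,
        // the properties would then be handed to the pipeline constructor:
        //   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    }
}
```

Loading the defaults first and overriding only the annotators key keeps whatever model paths the jar defines, rather than retyping them by hand.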

It won't split the word in your example, unfortunately, but I think in many cases it will do the correct thing. The Spanish mwt model is just based on a large dictionary of terms and was tuned to optimize performance on the Spanish training data.
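
To illustrate why a dictionary-based approach splits some forms and not others, here is a toy sketch. It is not CoreNLP's mwt annotator, and the class ToyMwtSplitter and its three dictionary entries are invented for this example. A token is expanded only when it appears in the dictionary; anything else passes through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy illustration only: not CoreNLP's implementation or its real dictionary.
class ToyMwtSplitter {

    // Hypothetical dictionary: surface token -> component words.
    private final Map<String, List<String>> dictionary = new HashMap<>();

    public ToyMwtSplitter() {
        dictionary.put("del", Arrays.asList("de", "el"));
        dictionary.put("al", Arrays.asList("a", "el"));
        dictionary.put("dime", Arrays.asList("di", "me"));
    }

    // Expands tokens found in the dictionary; unknown tokens are kept as-is.
    public List<String> split(List<String> tokens) {
        List<String> words = new ArrayList<>();
        for (String token : tokens) {
            List<String> parts = dictionary.get(token.toLowerCase(Locale.ROOT));
            if (parts != null) {
                words.addAll(parts);
            } else {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        ToyMwtSplitter splitter = new ToyMwtSplitter();
        // "dime" is in the toy dictionary, "hazlo" is not.
        System.out.println(splitter.split(Arrays.asList("dime", "hazlo")));
        // prints [di, me, hazlo]
    }
}
```

Here "hazlo" (haz + lo) stays whole simply because it has no entry, which mirrors how a verb form missing from the dictionary would go unsplit.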



Source: https://stackoverflow.com/questions/61540771/stanford-nlp-core-4-0-0-no-longer-splitting-verbs-and-pronouns-in-spanish
