How to work around 100K character limit for the StanfordNLP server?

Submitted by 泄露秘密 on 2019-12-04 01:42:36

Question


I am trying to parse book-length blocks of text with StanfordNLP. The HTTP requests work great, but there is a 100,000-character limit on the text length, MAX_CHAR_LENGTH in StanfordCoreNLPServer.java, which does not appear to be configurable.

For now, I am chopping up the text before I send it to the server, but even if I try to split between sentences and paragraphs, there is some useful coreference information that gets lost between these chunks. Presumably, I could parse chunks with large overlap and link them together, but that seems (1) inelegant and (2) like quite a bit of maintenance.
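For reference, here is a minimal sketch of the chunking I am doing now (the paragraph-boundary heuristic and the exact character budget are my own choices, not anything from CoreNLP):

    # Split text into chunks that stay under the server's limit, breaking
    # at paragraph boundaries. A single paragraph longer than the budget
    # would still need sentence-level splitting; this sketch ignores that.
    MAX_CHARS = 100000  # mirrors the server's MAX_CHAR_LENGTH

    def chunk_text(text, max_chars=MAX_CHARS):
        chunks, current, current_len = [], [], 0
        for para in text.split("\n\n"):
            if current and current_len + len(para) + 2 > max_chars:
                chunks.append("\n\n".join(current))
                current, current_len = [], 0
            current.append(para)
            current_len += len(para) + 2  # +2 for the dropped separator
        if current:
            chunks.append("\n\n".join(current))
        return chunks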

Is there a better way to configure the server or the requests to either remove the manual chunking or preserve the information across chunks?

BTW, I am POSTing with the Python requests module, but I doubt that makes a difference unless one of the CoreNLP Python wrappers handles this problem somehow.
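Concretely, each chunk goes out like this (the port, annotator list, and the chunk_text helper above reflect my setup, not server defaults):

    import json
    import requests

    # Properties travel as a JSON-encoded URL parameter; the raw text is
    # the POST body. coref is the expensive annotator I care about here.
    props = {
        "annotators": "tokenize,ssplit,pos,lemma,ner,parse,coref",
        "outputFormat": "json",
    }
    for chunk in chunk_text(book_text):  # book_text: full document, loaded elsewhere
        resp = requests.post(
            "http://localhost:9000/",
            params={"properties": json.dumps(props)},
            data=chunk.encode("utf-8"),
        )
        annotations = resp.json()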


Answer 1:


You should be able to start the server with the flag -maxCharLength -1, and that will get rid of the character length limit. Note that this is inadvisable in production: arbitrarily large documents can consume arbitrarily large amounts of memory (and time), especially with things like coref.
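For example, a typical launch command (heap size, port, and timeout here are illustrative; only -maxCharLength -1 matters for this question):

    java -mx6g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
         -port 9000 -timeout 60000 -maxCharLength -1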

The full list of server options should be printed when you call the server with -help, and they are also documented in the StanfordCoreNLPServer source code.



Source: https://stackoverflow.com/questions/46678204/how-to-work-around-100k-character-limit-for-the-stanfordnlp-server
