Chunking some text with Stanford NLP


Question


I'm using Stanford CoreNLP, and I use this line to load some modules to process my text:

props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
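For context, a minimal sketch of how a properties line like this is typically wired into a full pipeline (the sample text here is an assumption, not part of the question):

import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import java.util.Properties;

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");

// Build the pipeline from the properties and run it over a document
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation doc = new Annotation("Anarchism is a political philosophy.");  // sample text (assumption)
pipeline.annotate(doc);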

Is there a module that I can load to chunk the text?

Or any suggestion for an alternative way to use Stanford CoreNLP to chunk some text?

Thank you


Answer 1:


I think the parser output can be used to obtain NP chunks. Take a look at the context-free representation on the Stanford Parser website which provides example output.
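For example, here is a rough sketch of pulling NP chunks out of the constituency parse tree. It uses standard CoreNLP classes (StanfordCoreNLP, Annotation, Tree), but it is an illustration rather than an official "chunker" API, and the sample sentence is an assumption:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class NPChunks {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("Anarchism is a political philosophy that advocates self-governed societies.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // The constituency parse tree produced by the "parse" annotator
            Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            // A Tree is iterable over all of its subtrees
            for (Tree subtree : tree) {
                if ("NP".equals(subtree.label().value())) {
                    StringBuilder chunk = new StringBuilder();
                    for (Tree leaf : subtree.getLeaves()) {
                        chunk.append(leaf.value()).append(" ");
                    }
                    System.out.println(chunk.toString().trim());
                }
            }
        }
    }
}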




Answer 2:


For chunking alongside Stanford NLP, you can use the following packages:

  • YamCha: SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)
  • Mark Greenwood's Noun Phrase Chunker: A Java reimplementation of Ramshaw and Marcus (1995).
  • fnTBL: A fast and flexible implementation of Transformation-Based Learning in C++. Includes a POS tagger, but also NP chunking and general chunking models.

Source: http://www-nlp.stanford.edu/links/statnlp.html#NPchunk




Answer 3:


What you need is the output of constituency parsing in CoreNLP, which gives you chunk information such as noun phrases (NPs) and verb phrases (VPs). To the best of my knowledge, though, there is no method in CoreNLP that returns a list of chunks directly, which means you have to process the output of the constituency parse yourself to extract the chunks.

For example, this is the output of CoreNLP's constituency parser for a sample sentence:

(ROOT (S ("" "") (NP (NNP Anarchism)) (VP (VBZ is) (NP (NP (DT a) (JJ political) (NN philosophy)) (SBAR (WHNP (WDT that)) (S (VP (VBZ advocates) (NP (NP (JJ self-governed) (NNS societies)) (VP (VBN based) (PP (IN on) (NP (JJ voluntary) (, ,) (JJ cooperative) (NNS institutions))))))))) (, ,) (S (VP (VBG rejecting) (NP (JJ unjust) (NN hierarchy))))) (. .)))

As you can see, there are NP and VP tags in the string; you now have to extract the actual text of the chunks by parsing this string (or by walking the Tree object directly). Let me know if you find a method that returns the list of chunks directly.
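For what it's worth, one way to avoid parsing the bracketed string by hand is to match subtrees with Tregex, which ships with CoreNLP. The following is only a sketch under that assumption; "tree" stands for the constituency parse Tree of one sentence, obtained as in the earlier example:

import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

// "tree" is the constituency parse Tree for one sentence
TregexPattern pattern = TregexPattern.compile("/^(NP|VP)$/");  // match NP or VP nodes
TregexMatcher matcher = pattern.matcher(tree);
while (matcher.find()) {
    Tree chunk = matcher.getMatch();
    StringBuilder text = new StringBuilder();
    for (Tree leaf : chunk.getLeaves()) {
        text.append(leaf.value()).append(" ");
    }
    System.out.println(chunk.label().value() + ": " + text.toString().trim());
}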



Source: https://stackoverflow.com/questions/8299897/chunking-some-text-with-the-stanford-nlp
