Chunking some text with Stanford NLP


Question


I'm using Stanford CoreNLP, and I use this line to load some modules to process my text:

props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
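For context, a minimal sketch of how a properties line like this is typically wired into a full pipeline (the sample text here is an assumption, not part of the question):

import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import java.util.Properties;

Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");

// Build the pipeline from the properties and run it over a document
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation doc = new Annotation("Anarchism is a political philosophy.");  // sample text (assumption)
pipeline.annotate(doc);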

Is there a module that I can load to chunk the text?

Or any suggestion for an alternative way to use Stanford CoreNLP to chunk some text?

Thank you


Answer 1:


I think the parser output can be used to obtain NP chunks. Take a look at the context-free representation on the Stanford Parser website which provides example output.
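For example, here is a rough sketch of pulling NP chunks out of the constituency parse tree. It uses standard CoreNLP classes (StanfordCoreNLP, Annotation, Tree), but it is an illustration rather than an official "chunker" API, and the sample sentence is an assumption:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class NPChunks {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation doc = new Annotation("Anarchism is a political philosophy that advocates self-governed societies.");
        pipeline.annotate(doc);

        for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
            // The constituency parse tree produced by the "parse" annotator
            Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            // A Tree is iterable over all of its subtrees
            for (Tree subtree : tree) {
                if ("NP".equals(subtree.label().value())) {
                    StringBuilder chunk = new StringBuilder();
                    for (Tree leaf : subtree.getLeaves()) {
                        chunk.append(leaf.value()).append(" ");
                    }
                    System.out.println(chunk.toString().trim());
                }
            }
        }
    }
}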




Answer 2:


For chunking alongside Stanford NLP, you can use the following packages:

  • YamCha: SVM-based NP-chunker, also usable for POS tagging, NER, etc. C/C++ open source. Won CoNLL 2000 shared task. (Less automatic than a specialized POS tagger for an end user.)
  • Mark Greenwood's Noun Phrase Chunker: A Java reimplementation of Ramshaw and Marcus (1995).
  • fnTBL: A fast and flexible implementation of Transformation-Based Learning in C++. Includes a POS tagger, but also NP chunking and general chunking models.

Source: http://www-nlp.stanford.edu/links/statnlp.html#NPchunk




Answer 3:


What you need is the output of constituency parsing in CoreNLP, which gives you chunk information such as noun phrases (NPs) and verb phrases (VPs). To the best of my knowledge, though, there is no method in CoreNLP that returns a list of chunks directly, which means you have to process the output of the constituency parse yourself to extract the chunks.

For example, this is the output of CoreNLP's constituency parser for a sample sentence:

(ROOT (S ("" "") (NP (NNP Anarchism)) (VP (VBZ is) (NP (NP (DT a) (JJ political) (NN philosophy)) (SBAR (WHNP (WDT that)) (S (VP (VBZ advocates) (NP (NP (JJ self-governed) (NNS societies)) (VP (VBN based) (PP (IN on) (NP (JJ voluntary) (, ,) (JJ cooperative) (NNS institutions))))))))) (, ,) (S (VP (VBG rejecting) (NP (JJ unjust) (NN hierarchy))))) (. .)))

As you can see, there are NP and VP tags in the string; you now have to extract the actual text of the chunks by parsing this string (or by walking the Tree object directly). Let me know if you find a method that returns the list of chunks directly.
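For what it's worth, one way to avoid parsing the bracketed string by hand is to match subtrees with Tregex, which ships with CoreNLP. The following is only a sketch under that assumption; "tree" stands for the constituency parse Tree of one sentence, obtained as in the earlier example:

import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;

// "tree" is the constituency parse Tree for one sentence
TregexPattern pattern = TregexPattern.compile("/^(NP|VP)$/");  // match NP or VP nodes
TregexMatcher matcher = pattern.matcher(tree);
while (matcher.find()) {
    Tree chunk = matcher.getMatch();
    StringBuilder text = new StringBuilder();
    for (Tree leaf : chunk.getLeaves()) {
        text.append(leaf.value()).append(" ");
    }
    System.out.println(chunk.label().value() + ": " + text.toString().trim());
}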



Source: https://stackoverflow.com/questions/8299897/chunking-some-text-with-the-stanford-nlp
