stanford-nlp

Chunking some text with Stanford NLP

老子叫甜甜 posted on 2019-12-09 09:56:53
Question: I'm using Stanford CoreNLP and I use this line to load some modules to process my text: props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref"); Is there a module that I can load to chunk the text? Or any suggestion for an alternative way to use Stanford CoreNLP to chunk some text? Thank you.

Answer 1: I think the parser output can be used to obtain NP chunks. Take a look at the context-free representation on the Stanford Parser website, which provides example output.

Answer 2:
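Following up on Answer 1, here is a minimal Java sketch of pulling NP chunks out of the parse tree produced by the parse annotator (the input sentence is only a placeholder; every subtree labelled NP is printed as a chunk):

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;
    import java.util.Properties;

    public class NpChunks {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("annotators", "tokenize, ssplit, pos, lemma, parse");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = new Annotation("The quick brown fox jumped over the lazy dog.");
            pipeline.annotate(doc);

            for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
                Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
                // Iterating over a Tree walks all of its subtrees; keep the ones labelled NP.
                for (Tree subtree : tree) {
                    if (subtree.label().value().equals("NP")) {
                        System.out.println(subtree.yieldWords());
                    }
                }
            }
        }
    }

The same loop works for VP or PP chunks by changing the label check.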

Why does Stanford CoreNLP NER-annotator load 3 models by default?

∥☆過路亽.° posted on 2019-12-09 07:09:38
Question: When I add the "ner" annotator to my StanfordCoreNLP object pipeline, I can see that it loads 3 models, which takes a lot of time:

Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [10.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [10.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [6.5 sec].
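If only one of those models is actually needed, the ner annotator can be pointed at a single classifier via the ner.model property, which roughly cuts the load time to a third. A sketch, assuming the 3-class model is enough for your use case:

    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
    // Load only the 3-class classifier instead of the default 3class + muc.7class + conll.4class chain.
    props.put("ner.model", "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Note that with a single model you only get the entity types that model was trained on (PERSON, LOCATION, ORGANIZATION for the 3-class one).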

Encoding a corpus with CWB

早过忘川 posted on 2019-12-09 06:40:29
According to the Corpus Workbench documentation, to encode a corpus I need to use the cwb-encode tool: "encode the corpus, i.e. convert the verticalized text to CWB binary format with the cwb-encode tool. Note that the command below has to be entered on a single line." (http://cogsci.uni-osnabrueck.de/~korpora/ws/CWBdoc/CWB_Encoding_Tutorial/node3.html)

$ cwb-encode -d /corpora/data/example -f example.vrt -R /usr/local/share/cwb/registry/example -P pos -S s

When I tried it, it said the file was missing, but I'm sure the file is in $HOME/corpora/data/example. The error was: $ cwb-encode -d /corpora/data
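One thing worth checking, purely as a guess from the paths quoted above: the command points -d at the root-level directory /corpora/data/example while the data reportedly lives under $HOME/corpora/data/example, and -f example.vrt is resolved relative to the current working directory. Making the paths explicit, something like:

    $ cwb-encode -d $HOME/corpora/data/example -f $HOME/corpora/data/example/example.vrt -R /usr/local/share/cwb/registry/example -P pos -S s

(the exact location of example.vrt is an assumption; point -f at wherever the verticalized file really is).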

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

≯℡__Kan透↙ posted on 2019-12-09 06:05:38
Question: I'm using Eclipse to run a Java program, and when I run it I get this error: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space. Then I changed the VM settings from Properties > Run > VM Options, and when I ran the program again I got a new error: Error occurred during initialization of VM: Incompatible initial and maximum heap sizes specified. I'm trying to use the Stanford libraries in my program; any idea how to solve this error?

Answer 1: to change the VM for Eclipse you can change the
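For reference, the second error ("Incompatible initial and maximum heap sizes specified") appears when the initial heap (-Xms) is set larger than the maximum heap (-Xmx). In Eclipse the flags go into the run configuration's VM arguments; the sizes below are only examples, not recommendations:

    -Xms512m -Xmx4g

or, when launching from the command line:

    java -Xms512m -Xmx4g MyMainClass

where MyMainClass is a placeholder for the actual main class.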

Determining whether a word is a noun or not

淺唱寂寞╮ posted on 2019-12-09 04:57:12
Question: Given an input word, I want to determine whether it is a noun or not (in case of ambiguity, for instance "cook" can be a noun or a verb, the word must be identified as a noun). Currently I use the POS tagger from the Stanford Parser (I give it a single word as input, and I extract only the POS tag from the result). The results are quite good, but it takes a very long time. Is there a way (in Python, please :) to perform this task quicker than what I do currently?

Answer 1: If you simply want to check
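The asker wants Python, but most of the time here goes into running the full parser; tagging with the standalone Stanford POS tagger is much faster, and loading it once and reusing it matters more than the language. A rough Java sketch (the model path is where the left3words model sits in the 3.x models jar; adjust it to your distribution):

    import edu.stanford.nlp.tagger.maxent.MaxentTagger;

    public class IsNoun {
        public static void main(String[] args) {
            // Load the tagger once; reusing this object is what makes repeated lookups fast.
            MaxentTagger tagger = new MaxentTagger(
                "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
            String tagged = tagger.tagString("cook").trim();   // e.g. "cook_NN"
            String tag = tagged.substring(tagged.lastIndexOf('_') + 1);
            // NN, NNS, NNP and NNPS are the Penn Treebank noun tags.
            System.out.println(tag.startsWith("NN"));
        }
    }

Keep in mind that tagging a word in isolation, as the question describes, gives the tagger no context, so an ambiguous word like "cook" may come back as either NN or VB.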

Stanford classifier cross validation averaged or aggregate metrics

|▌冷眼眸甩不掉的悲伤 posted on 2019-12-09 03:43:26
With the Stanford Classifier it is possible to use cross validation by setting the options in the properties file, such as this for 10-fold cross validation:

crossValidationFolds=10
printCrossValidationDecisions=true
shuffleTrainingData=true
shuffleSeed=1

Running this will output, per fold, the various metrics, such as precision, recall, Accuracy/micro-averaged F1 and Macro-averaged F1. Is there an option to get an averaged or otherwise aggregated score of all 10 Accuracy/micro-averaged F1 or all 10 Macro-averaged F1 as part of the output? In Weka, by default the output after 10-fold cross

How can I differentiate between a person's name and other names that are derived from verbs [closed]

岁酱吖の posted on 2019-12-08 14:07:02
Question (closed as unclear 7 years ago): How can I extract person names from the text? I have applied an NLP toolkit for this; specifically, I used the Stanford NER toolkit to extract names from text. With that, I can extract person names from the text
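For reference, a minimal Java sketch of the Stanford NER route the question describes, collecting the tokens tagged PERSON (the sample sentence is a placeholder):

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.util.Properties;

    public class PersonNames {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = new Annotation("Baker said that Carter will visit Smith in May.");
            pipeline.annotate(doc);

            for (CoreLabel token : doc.get(CoreAnnotations.TokensAnnotation.class)) {
                // Keep only tokens the NER classifier labelled PERSON.
                if ("PERSON".equals(token.get(CoreAnnotations.NamedEntityTagAnnotation.class))) {
                    System.out.println(token.word());
                }
            }
        }
    }

Because the NER model uses context (capitalisation, surrounding words), it is this classifier, rather than any hard rule, that decides whether a name derived from a verb or occupation ("Baker", "Carter") is being used as a person name.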

NER is overwriting the custom NER in Stanford NLP

坚强是说给别人听的谎言 posted on 2019-12-08 13:02:45
Question: In Stanford NLP, I used a pattern in regexner to match phone numbers, but the ner annotator is overwriting it as NUMBER. If I remove the ner annotator, it shows up as PHONE_NUMBER. Can anyone please help me? Thanks in advance. Here is my regexner line:

^(?:(?:\+|0{0,2})91(\s*[\-]\s*)?|[0]?)?[789]\d{9}$ PHONENUMBER

Answer 1: java command: java -Xmx10g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file phone-number-example.txt -outputFormat text
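What usually fixes this kind of clash (offered as a sketch, not a confirmed diagnosis of this exact setup): run regexner after ner in the annotator list, and use the optional third column of the tab-separated mapping file to list the NER tags the rule is allowed to overwrite. For example, with a hypothetical mapping file phone-number-rules.txt containing one line (columns separated by tabs):

    ^(?:(?:\+|0{0,2})91(\s*[\-]\s*)?|[0]?)?[789]\d{9}$	PHONENUMBER	NUMBER

and the pipeline configured as:

    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
    props.put("regexner.mapping", "phone-number-rules.txt");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

the NUMBER label assigned by ner can then be replaced with PHONENUMBER. Note that the java command in Answer 1 does not include regexner among its -annotators at all, so the custom tag never gets a chance to apply there.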

Checking if a sentence is grammatically correct using the Stanford parser [duplicate]

巧了我就是萌 posted on 2019-12-08 12:56:21
Question: This question already has answers here: How to check whether a sentence is correct (simple grammar check in Python)? (2 answers). Closed 6 years ago. Is there any method to check whether a sentence is grammatically correct or not using the Stanford parser? As of now, I am able to get the parse tree of a sentence using the Stanford parser. I got stuck here and don't know how to proceed further.

Answer 1: larsmans is right that those parsers are not designed for that, but here is a hack: You can try using the
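Not necessarily the hack that Answer 1 goes on to describe, but one common heuristic once you have the parse tree: look at the label directly under ROOT. A clause label such as S (or SINV/SQ) usually means the parser found a complete sentence, while FRAG or X-style labels suggest it could only salvage a fragment. A rough Java sketch (englishPCFG.ser.gz is the standard model path inside the models jar):

    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.Tree;

    public class RoughGrammarCheck {
        public static void main(String[] args) {
            LexicalizedParser parser =
                LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
            Tree tree = parser.parse("The cat sat on the mat.");
            // The top node is ROOT; its first child carries the clause-level label.
            String topLabel = tree.firstChild().label().value();
            boolean looksLikeSentence = topLabel.equals("S") || topLabel.equals("SINV") || topLabel.equals("SQ");
            System.out.println(topLabel + " -> " + looksLikeSentence);
        }
    }

This is only a heuristic: the parser is deliberately robust, so plenty of ungrammatical input still comes back with an S at the top.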

Program for tagger and sentiment analysis in Stanford NLP

◇◆丶佛笑我妖孽 posted on 2019-12-08 12:02:09
Question: I have some C# code (though copied) and I'm getting an error at this statement: var pipeline = new StanfordCoreNLP(props); (An unhandled exception of type 'java.lang.RuntimeException' occurred in stanford-corenlp-3.7.0.dll. Additional information: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file).) My models and CoreNLP are of the same version: stanford-corenlp-3.7.0-models.jar, stanford-corenlp-3.7.0.jar. Any help would be greatly appreciated!

Answer 1: Many