stanford-nlp

Chunking some text with Stanford NLP

老子叫甜甜 posted on 2019-12-09 09:56:53
Question: I'm using Stanford CoreNLP and I use this line to load some modules to process my text: props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref"); Is there a module that I can load to chunk the text? Or any suggestion for an alternative way to use Stanford CoreNLP to chunk some text? Thank you.

Answer 1: I think the parser output can be used to obtain NP chunks. Take a look at the context-free representation on the Stanford Parser website, which provides example output.

Answer 2:
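Following up on Answer 1, here is a minimal Java sketch of pulling NP chunks out of the parse tree produced by the parse annotator (the input sentence is only a placeholder; every subtree labelled NP is printed as a chunk):

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.trees.Tree;
    import edu.stanford.nlp.trees.TreeCoreAnnotations;
    import edu.stanford.nlp.util.CoreMap;
    import java.util.Properties;

    public class NpChunks {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("annotators", "tokenize, ssplit, pos, lemma, parse");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = new Annotation("The quick brown fox jumped over the lazy dog.");
            pipeline.annotate(doc);

            for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
                Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
                // Iterating over a Tree walks all of its subtrees; keep the ones labelled NP.
                for (Tree subtree : tree) {
                    if (subtree.label().value().equals("NP")) {
                        System.out.println(subtree.yieldWords());
                    }
                }
            }
        }
    }

The same loop works for VP or PP chunks by changing the label check.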

Why does Stanford CoreNLP NER-annotator load 3 models by default?

∥☆過路亽.° posted on 2019-12-09 07:09:38
Question: When I add the "ner" annotator to my StanfordCoreNLP object pipeline, I can see that it loads 3 models, which takes a lot of time:

Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [10.3 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [10.1 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [6.5 sec].
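If only one of those models is actually needed, the ner annotator can be pointed at a single classifier via the ner.model property, which roughly cuts the load time to a third. A sketch, assuming the 3-class model is enough for your use case:

    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
    // Load only the 3-class classifier instead of the default 3class + muc.7class + conll.4class chain.
    props.put("ner.model", "edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Note that with a single model you only get the entity types that model was trained on (PERSON, LOCATION, ORGANIZATION for the 3-class one).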

Encoding a corpus with CWB

早过忘川 posted on 2019-12-09 06:40:29
According to the Corpus Workbench documentation, to encode a corpus I need to use the cwb-encode tool: "encode the corpus, i.e. convert the verticalized text to CWB binary format with the cwb-encode tool. Note that the command below has to be entered on a single line." (http://cogsci.uni-osnabrueck.de/~korpora/ws/CWBdoc/CWB_Encoding_Tutorial/node3.html)

$ cwb-encode -d /corpora/data/example -f example.vrt -R /usr/local/share/cwb/registry/example -P pos -S s

When I tried it, it said the file was missing, but I'm sure the file is in $HOME/corpora/data/example. The error was: $ cwb-encode -d /corpora/data
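One thing worth checking, purely as a guess from the paths quoted above: the command points -d at the root-level directory /corpora/data/example while the data reportedly lives under $HOME/corpora/data/example, and -f example.vrt is resolved relative to the current working directory. Making the paths explicit, something like:

    $ cwb-encode -d $HOME/corpora/data/example -f $HOME/corpora/data/example/example.vrt -R /usr/local/share/cwb/registry/example -P pos -S s

(the exact location of example.vrt is an assumption; point -f at wherever the verticalized file really is).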

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

≯℡__Kan透↙ posted on 2019-12-09 06:05:38
Question: I'm using Eclipse to run a Java program, and when I run it I get this error: Exception in thread "main" java.lang.OutOfMemoryError: Java heap space. Then I changed the VM settings from Properties > Run > VM Options, and when I ran the program again I got a new error: Error occurred during initialization of VM: Incompatible initial and maximum heap sizes specified. I'm trying to use the Stanford libraries in my program; any idea how to solve this error?

Answer 1: to change the VM for Eclipse you can change the
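For reference, the second error ("Incompatible initial and maximum heap sizes specified") appears when the initial heap (-Xms) is set larger than the maximum heap (-Xmx). In Eclipse the flags go into the run configuration's VM arguments; the sizes below are only examples, not recommendations:

    -Xms512m -Xmx4g

or, when launching from the command line:

    java -Xms512m -Xmx4g MyMainClass

where MyMainClass is a placeholder for the actual main class.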

Determining whether a word is a noun or not

淺唱寂寞╮ posted on 2019-12-09 04:57:12
Question: Given an input word, I want to determine whether it is a noun or not (in case of ambiguity, for instance "cook" can be a noun or a verb, the word must be identified as a noun). Currently I use the POS tagger from the Stanford Parser (I give it a single word as input, and I extract only the POS tag from the result). The results are quite good, but it takes a very long time. Is there a way (in Python, please :) to perform this task quicker than what I do currently?

Answer 1: If you simply want to check
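The asker wants Python, but most of the time here goes into running the full parser; tagging with the standalone Stanford POS tagger is much faster, and loading it once and reusing it matters more than the language. A rough Java sketch (the model path is where the left3words model sits in the 3.x models jar; adjust it to your distribution):

    import edu.stanford.nlp.tagger.maxent.MaxentTagger;

    public class IsNoun {
        public static void main(String[] args) {
            // Load the tagger once; reusing this object is what makes repeated lookups fast.
            MaxentTagger tagger = new MaxentTagger(
                "edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger");
            String tagged = tagger.tagString("cook").trim();   // e.g. "cook_NN"
            String tag = tagged.substring(tagged.lastIndexOf('_') + 1);
            // NN, NNS, NNP and NNPS are the Penn Treebank noun tags.
            System.out.println(tag.startsWith("NN"));
        }
    }

Keep in mind that tagging a word in isolation, as the question describes, gives the tagger no context, so an ambiguous word like "cook" may come back as either NN or VB.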

Stanford classifier cross validation averaged or aggregate metrics

|▌冷眼眸甩不掉的悲伤 posted on 2019-12-09 03:43:26
With the Stanford Classifier it is possible to use cross validation by setting the options in the properties file, such as this for 10-fold cross validation:

crossValidationFolds=10
printCrossValidationDecisions=true
shuffleTrainingData=true
shuffleSeed=1

Running this will output, per fold, the various metrics, such as precision, recall, Accuracy/micro-averaged F1 and Macro-averaged F1. Is there an option to get an averaged or otherwise aggregated score of all 10 Accuracy/micro-averaged F1 or all 10 Macro-averaged F1 as part of the output? In Weka, by default the output after 10-fold cross

How can I differentiate between a person's name and other names that are derived from verbs [closed]

岁酱吖の posted on 2019-12-08 14:07:02
Question (closed as unclear 7 years ago): How can I extract person names from the text? I have applied an NLP toolkit for this; specifically, I used the Stanford NER toolkit to extract names from text. With that, I can extract person names from the text
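For reference, a minimal Java sketch of the Stanford NER route the question describes, collecting the tokens tagged PERSON (the sample sentence is a placeholder):

    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import java.util.Properties;

    public class PersonNames {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            Annotation doc = new Annotation("Baker said that Carter will visit Smith in May.");
            pipeline.annotate(doc);

            for (CoreLabel token : doc.get(CoreAnnotations.TokensAnnotation.class)) {
                // Keep only tokens the NER classifier labelled PERSON.
                if ("PERSON".equals(token.get(CoreAnnotations.NamedEntityTagAnnotation.class))) {
                    System.out.println(token.word());
                }
            }
        }
    }

Because the NER model uses context (capitalisation, surrounding words), it is this classifier, rather than any hard rule, that decides whether a name derived from a verb or occupation ("Baker", "Carter") is being used as a person name.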

NER is overwriting the custom NER in Stanford NLP

坚强是说给别人听的谎言 posted on 2019-12-08 13:02:45
Question: In Stanford NLP, I used a pattern in regexner to match phone numbers, but the ner annotator is overwriting it as NUMBER. If I remove the ner annotator, it shows up as PHONE_NUMBER. Can anyone please help me? Thanks in advance. Here is my regexner line:

^(?:(?:\+|0{0,2})91(\s*[\-]\s*)?|[0]?)?[789]\d{9}$ PHONENUMBER

Answer 1: java command: java -Xmx10g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -file phone-number-example.txt -outputFormat text
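What usually fixes this kind of clash (offered as a sketch, not a confirmed diagnosis of this exact setup): run regexner after ner in the annotator list, and use the optional third column of the tab-separated mapping file to list the NER tags the rule is allowed to overwrite. For example, with a hypothetical mapping file phone-number-rules.txt containing one line (columns separated by tabs):

    ^(?:(?:\+|0{0,2})91(\s*[\-]\s*)?|[0]?)?[789]\d{9}$	PHONENUMBER	NUMBER

and the pipeline configured as:

    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
    props.put("regexner.mapping", "phone-number-rules.txt");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

the NUMBER label assigned by ner can then be replaced with PHONENUMBER. Note that the java command in Answer 1 does not include regexner among its -annotators at all, so the custom tag never gets a chance to apply there.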

Checking if a sentence is grammatically correct using the Stanford parser [duplicate]

巧了我就是萌 posted on 2019-12-08 12:56:21
Question: This question already has answers here: How to check whether a sentence is correct (simple grammar check in Python)? (2 answers). Closed 6 years ago. Is there any method to check whether a sentence is grammatically correct or not using the Stanford parser? As of now, I am able to get the parse tree of a sentence using the Stanford parser. I got stuck here and don't know how to proceed further.

Answer 1: larsmans is right that those parsers are not designed for that, but here is a hack: You can try using the
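Not necessarily the hack that Answer 1 goes on to describe, but one common heuristic once you have the parse tree: look at the label directly under ROOT. A clause label such as S (or SINV/SQ) usually means the parser found a complete sentence, while FRAG or X-style labels suggest it could only salvage a fragment. A rough Java sketch (englishPCFG.ser.gz is the standard model path inside the models jar):

    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
    import edu.stanford.nlp.trees.Tree;

    public class RoughGrammarCheck {
        public static void main(String[] args) {
            LexicalizedParser parser =
                LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
            Tree tree = parser.parse("The cat sat on the mat.");
            // The top node is ROOT; its first child carries the clause-level label.
            String topLabel = tree.firstChild().label().value();
            boolean looksLikeSentence = topLabel.equals("S") || topLabel.equals("SINV") || topLabel.equals("SQ");
            System.out.println(topLabel + " -> " + looksLikeSentence);
        }
    }

This is only a heuristic: the parser is deliberately robust, so plenty of ungrammatical input still comes back with an S at the top.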

Program for tagger and sentiment analysis in Stanford NLP

◇◆丶佛笑我妖孽 posted on 2019-12-08 12:02:09
Question: I have some C# code (though copied) and I'm getting an error at this statement: var pipeline = new StanfordCoreNLP(props); (An unhandled exception of type 'java.lang.RuntimeException' occurred in stanford-corenlp-3.7.0.dll. Additional information: edu.stanford.nlp.io.RuntimeIOException: Error while loading a tagger model (probably missing model file).) My models and CoreNLP are of the same version: stanford-corenlp-3.7.0-models.jar, stanford-corenlp-3.7.0.jar. Any help would be greatly appreciated!

Answer 1: Many