stanford-nlp

Running Stanford POS tagger in NLTK leads to “not a valid Win32 application” on Windows

Submitted by 允我心安 on 2019-12-11 21:49:08
Question: I am trying to use the Stanford POS tagger in NLTK with the following code:

    import nltk
    from nltk.tag.stanford import POSTagger
    st = POSTagger('E:\Assistant\models\english-bidirectional-distsim.tagger',
                   'E:\Assistant\stanford-postagger.jar')
    st.tag('What is the airspeed of an unladen swallow?'.split())

and here is the output:

    Traceback (most recent call last):
      File "E:\J2EE\eclipse\WSNLP\nlp\src\tagger.py", line 5, in <module>
        st.tag('What is the airspeed of an unladen swallow?'.split())
      File "C:
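On Windows, "not a valid Win32 application" usually means the wrapper was handed something it cannot execute (e.g. Java is missing from PATH, or a path string got corrupted). One easy way to corrupt a Windows path in Python is backslash escapes: sequences like \t or \n inside an ordinary string literal are control characters. A minimal sketch (pure Python, no Stanford jars needed; the paths are illustrative) of why raw strings are safer here:

```python
# In an ordinary string literal, "\t" is a TAB character, silently
# corrupting the path before it ever reaches the tagger.
plain = 'E:\tagger\models\english.tagger'   # '\t' becomes a tab here
raw = r'E:\tagger\models\english.tagger'    # raw string keeps the backslashes

print('\t' in plain)  # True  -> the path no longer points anywhere valid
print('\t' in raw)    # False -> the path is intact
```

Separately, the NLTK Stanford wrappers shell out to Java, so the java executable must be discoverable (commonly via the JAVAHOME environment variable or nltk.internals.config_java).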

How do I reach the leaves of the tree generated by the Stanford parser in Python?

Submitted by 淺唱寂寞╮ on 2019-12-11 21:11:36
Question: I am using the Stanford parser in Python as follows:

    import os
    sentence = "Did Matt win the men slalom?"
    os.popen("echo '"+sentence+"' > ~/stanfordtemp.txt")
    parser_out = os.popen("~/stanford-parser-2012-11-12/lexparser.sh ~/stanfordtemp.txt").readlines()
    for tree in parser_out:
        print tree

However, I don't know how I can access the leaves of the tree returned by the parser. Can you help me with this? I also have to write code which will be able to generate SQL queries from
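Once the parser's bracketed output is captured as a string, a small pure-Python reader can recover the leaves (with NLTK available, nltk.Tree.fromstring(s).leaves() does the same). A sketch, not part of the Stanford distribution, assuming standard Penn-style bracketing:

```python
import re

def parse(tokens):
    """Recursive-descent reader for a Penn-style bracketed parse."""
    tok = tokens.pop(0)
    if tok == '(':
        label = tokens.pop(0)
        children = []
        while tokens[0] != ')':
            children.append(parse(tokens))
        tokens.pop(0)  # consume ')'
        return (label, children)
    return tok  # a leaf (word)

def leaves(node):
    # collect leaf strings left to right
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]

toks = re.findall(r'\(|\)|[^\s()]+', '(ROOT (SQ (VBD Did) (NP (NNP Matt)) (VP (VB win))))')
tree = parse(toks)
print(leaves(tree))  # ['Did', 'Matt', 'win']
```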

Stanford CRFClassifier performance evaluation output

Submitted by 做~自己de王妃 on 2019-12-11 18:33:43
Question: I'm following this FAQ https://nlp.stanford.edu/software/crf-faq.shtml to train my own classifier, and I noticed that the performance-evaluation output does not match the results (or at least not in the way I expect). Specifically, this section:

    CRFClassifier tagged 16119 words in 1 documents at 13824.19 words per second.
    Entity     P        R        F1      TP   FP   FN
    MYLABEL    1.0000   0.9961   0.9980  255   0    1
    Totals     1.0000   0.9961   0.9980  255   0    1

I expect TP to be all instances where the predicted label matched the golden
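The numbers above are internally consistent with the usual entity-level reading: the classifier counts whole entity spans (not individual words), so TP + FN = 256 golden entities out of the 16119 tagged words, and the metrics follow from the standard formulas P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R). A quick check:

```python
def prf(tp, fp, fn):
    # entity-level precision / recall / F1, as in the CRFClassifier output
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return round(p, 4), round(r, 4), round(f1, 4)

print(prf(255, 0, 1))  # (1.0, 0.9961, 0.998) -- matches the MYLABEL row
```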

How to get a parse in a bracketed format (without POS tags)?

Submitted by 非 Y 不嫁゛ on 2019-12-11 16:24:05
Question: I want to parse a sentence into a binary parse of this form (the format used in the SNLI corpus):

    sentence: "A person on a horse jumps over a broken down airplane."
    parse: ( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )

I'm unable to find a parser which does this. Note: this question has been asked earlier (How to get a binary parse in Python), but the answers are not helpful, and I was unable to comment because I do not have the required reputation
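This is not a full solution (the SNLI parses come from binarizing a constituency parser's output, which still requires the parser), but the label-stripping half is mechanical: drop the node label after each opening bracket and collapse single-word brackets. A sketch, assuming standard PTB bracketing; the function name is mine:

```python
import re

def strip_labels(parse):
    # drop the label token that follows each '('
    toks = re.findall(r'\(|\)|[^\s()]+', parse)
    out, skip = [], False
    for t in toks:
        if t == '(':
            out.append(t)
            skip = True          # the next token is a node label
        elif skip:
            skip = False         # discard the label itself
        else:
            out.append(t)
    s = ' '.join(out)
    # collapse single-word brackets "( word )" -> "word"
    prev = None
    while s != prev:
        prev, s = s, re.sub(r'\( ([^\s()]+) \)', r'\1', s)
    return s

print(strip_labels('(NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse))))'))
# ( ( A person ) ( on ( a horse ) ) )
```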

Stanford NER with Python NLTK fails on strings containing multiple “!!”s?

Submitted by 大兔子大兔子 on 2019-12-11 13:02:26
Question: Suppose this is my filecontent:

    When they are over 45 years old!! It would definitely help Michael Jordan.

Below is my code for tagging sentences:

    st = NERTagger('stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz',
                   'stanford-ner/stanford-ner.jar')
    tokenized_sents = [word_tokenize(sent) for sent in sent_tokenize(filecontent)]
    taggedsents = st.tag_sents(tokenized_sents)

I would expect both tokenized_sents and taggedsents to contain the same number of sentences. But here is what they
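The mismatch typically arises because the Stanford tool re-splits sentences internally (runs of "!!" can trigger an extra boundary), so its output no longer lines up with NLTK's sent_tokenize. A commonly suggested workaround is to keep your own sentence boundaries and redistribute the flat tagged output back onto them. A minimal sketch (function name is mine; it assumes the tagger preserves the token count even when it re-splits sentences):

```python
def rechunk(tokenized_sents, flat_tags):
    """Reassign a flat list of (token, tag) pairs back onto the original
    sentence boundaries, however the tagger re-split them."""
    out, i = [], 0
    for sent in tokenized_sents:
        out.append(flat_tags[i:i + len(sent)])
        i += len(sent)
    return out

sents = [['Hello', '!!'], ['Bye', '.']]
flat = [('Hello', 'O'), ('!!', 'O'), ('Bye', 'O'), ('.', 'O')]
print(rechunk(sents, flat))
```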

Preventing tokens from containing a space in Stanford CoreNLP

Submitted by 微笑、不失礼 on 2019-12-11 11:58:35
Question: Is there an option in Stanford CoreNLP's tokenizer to prevent tokens from containing a space? E.g. if the sentence is "my phone is 617 1555-6644", the substring "617 1555" should be split into two different tokens. I am aware of the option normalizeSpace:

    normalizeSpace: Whether any spaces in tokens (phone numbers, fractions) get turned into
    U+00A0 (non-breaking space). It's dangerous to turn this off for most of our Stanford
    NLP software, which assumes no spaces in tokens.

but I don't want tokens
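One workaround (post-processing rather than a tokenizer option; the function name is mine) is to split any token that contains a regular space or the U+00A0 that normalizeSpace inserts:

```python
def split_spaced_tokens(tokens):
    # break apart any token that contains a regular or non-breaking space
    out = []
    for t in tokens:
        out.extend(t.replace('\u00a0', ' ').split())
    return out

print(split_spaced_tokens(['my', 'phone', 'is', '617\u00a01555', '-6644']))
# ['my', 'phone', 'is', '617', '1555', '-6644']
```

Note that any downstream annotators (POS, parse) would then see tokens the tokenizer did not produce, so this is only safe at the end of a pipeline.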

Cast from GrammaticalStructure to Tree

Submitted by 我们两清 on 2019-12-11 11:54:10
Question: I am trying out the new NN dependency parser from Stanford. According to the demo they have provided, this is how the parsing is done:

    import edu.stanford.nlp.process.DocumentPreprocessor;
    import edu.stanford.nlp.trees.GrammaticalStructure;
    import edu.stanford.nlp.parser.nndep.DependencyParser;
    ...
    GrammaticalStructure gs = null;
    DocumentPreprocessor tokenizer = new DocumentPreprocessor(new StringReader(sentence));
    for (List<HasWord> sent : tokenizer) {
        List<TaggedWord> tagged = tagger

Stanford-NLP: Could not find main class error

Submitted by 早过忘川 on 2019-12-11 11:37:11
Question: This question seems to have been answered a few times (What does "Could not find or load main class" mean? and https://stackoverflow.com/a/16208709/2771315), but for some reason none of the shared methods are working. What I've done so far:

1) Navigated to the directory containing the CoreNLP source files in a terminal: ~/Downloads/CoreNLP-master/src
2) Selected one of the packages as a test case, e.g. executed the command java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -file foo
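A likely cause in step 1 is running from the src/ tree, which holds .java source files rather than compiled classes or jars; the -cp "*" wildcard only helps when it expands over the distribution jars. A hedged sketch (jar_dir and the argument values are placeholders) that builds such an invocation with the wildcard passed as a single argv element, so the shell does not expand it before the JVM can:

```python
import os

def corenlp_command(jar_dir, main_class, *args):
    # -cp must point at the directory of distribution jars, not at src/;
    # the '*' classpath wildcard is expanded by the JVM itself, so keep it
    # as one argv element instead of letting the shell glob it.
    classpath = os.path.join(jar_dir, '*')
    return ['java', '-cp', classpath, '-mx5g', main_class, *args]

cmd = corenlp_command('/path/to/stanford-corenlp',
                      'edu.stanford.nlp.sentiment.SentimentPipeline',
                      '-file', 'foo')
print(cmd)
```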

NLP: Sentiment processing for junk data takes time

Submitted by 柔情痞子 on 2019-12-11 11:24:15
Question: I am trying to find the sentiment for input text. This test input is a junk sentence, and when I tried to find its sentiment, the annotation step that parses the sentence took around 30 seconds. For normal text it takes less than a second. If I need to process around millions of records, this will add up the processing time. Any solution to this?

    String text = "Nm n n 4 n n bkj nun4hmnun Onn njnb hm5bn nm55m nbbh n mnrrnut but n rym4n nbn 4nn65 m nun m n nn nun 4nm 5 gm n my b bb b b rtmrt55tmmm5tttn b b
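The parser's runtime blows up on long, unparseable token sequences, so two common mitigations are capping sentence length via CoreNLP's parse.maxlen property and screening junk before annotating at all. A rough pre-filter sketch (the vowel heuristic and threshold are my own, purely illustrative, and will misfire on numbers or acronyms):

```python
def looks_like_junk(text, threshold=0.5):
    # flag text where fewer than `threshold` of the tokens contain a vowel
    words = text.split()
    if not words:
        return True
    with_vowel = sum(1 for w in words if any(c in 'aeiouAEIOU' for c in w))
    return with_vowel / len(words) < threshold

print(looks_like_junk('Nm n n 4 n n bkj nnn njnb'))   # True
print(looks_like_junk('This is a normal sentence.'))  # False
```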

How to keep punctuation in Stanford dependency parser

Submitted by 拜拜、爱过 on 2019-12-11 11:05:57
Question: I am using Stanford CoreNLP (01.2016 version) and I would like to keep the punctuation in the dependency relations. I have found some ways to do that when running it from the command line, but I didn't find anything regarding the Java code which extracts the dependency relations. Here is my current code. It works, but no punctuation is included:

    Annotation document = new Annotation(text);
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse
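If changing the extraction call is not an option, one fallback is post-processing: reattach each punctuation token the converter dropped as a punct relation, the way dependency treebanks conventionally do. A heuristic sketch (data layout, function name, and the attach-to-left-neighbour rule are all my own, purely illustrative):

```python
def attach_punct(tokens, deps):
    """deps: list of (head, rel, dependent) with 1-based token indices,
    0 as the root head. Attach any punctuation token missing from the
    dependency list to its left neighbour with a 'punct' relation."""
    covered = {d for _, _, d in deps}
    out = list(deps)
    for i, tok in enumerate(tokens, start=1):
        if i not in covered and not any(c.isalnum() for c in tok):
            out.append((max(i - 1, 1), 'punct', i))
    return out

tokens = ['It', 'works', '.']
deps = [(0, 'root', 2), (2, 'nsubj', 1)]
print(attach_punct(tokens, deps))
# [(0, 'root', 2), (2, 'nsubj', 1), (2, 'punct', 3)]
```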