stanford-nlp

Running Stanford POS tagger in NLTK leads to “not a valid Win32 application” on Windows

Submitted by 允我心安 on 2019-12-11 21:49:08
Question: I am trying to use the Stanford POS tagger in NLTK with the following code:

    import nltk
    from nltk.tag.stanford import POSTagger
    st = POSTagger('E:\Assistant\models\english-bidirectional-distsim.tagger',
                   'E:\Assistant\stanford-postagger.jar')
    st.tag('What is the airspeed of an unladen swallow?'.split())

and here is the output:

    Traceback (most recent call last):
      File "E:\J2EE\eclipse\WSNLP\nlp\src\tagger.py", line 5, in <module>
        st.tag('What is the airspeed of an unladen swallow?'.split())
      File "C:
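On Windows, "not a valid Win32 application" usually means the wrapper was handed something it cannot execute (e.g. Java is missing from PATH, or a path string got corrupted). One easy way to corrupt a Windows path in Python is backslash escapes: sequences like \t or \n inside an ordinary string literal are control characters. A minimal sketch (pure Python, no Stanford jars needed; the paths are illustrative) of why raw strings are safer here:

```python
# In an ordinary string literal, "\t" is a TAB character, silently
# corrupting the path before it ever reaches the tagger.
plain = 'E:\tagger\models\english.tagger'   # '\t' becomes a tab here
raw = r'E:\tagger\models\english.tagger'    # raw string keeps the backslashes

print('\t' in plain)  # True  -> the path no longer points anywhere valid
print('\t' in raw)    # False -> the path is intact
```

Separately, the NLTK Stanford wrappers shell out to Java, so the java executable must be discoverable (commonly via the JAVAHOME environment variable or nltk.internals.config_java).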

How do I reach the leaves of the tree generated by the Stanford parser in Python?

Submitted by 淺唱寂寞╮ on 2019-12-11 21:11:36
Question: I am using the Stanford parser in Python as follows:

    import os
    sentence = "Did Matt win the men slalom?"
    os.popen("echo '"+sentence+"' > ~/stanfordtemp.txt")
    parser_out = os.popen("~/stanford-parser-2012-11-12/lexparser.sh ~/stanfordtemp.txt").readlines()
    for tree in parser_out:
        print tree

However, I don't know how I can access the leaves of the tree returned by the parser. Can you help me with this? I also have to write code which will be able to generate SQL queries from
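Once the parser's bracketed output is captured as a string, a small pure-Python reader can recover the leaves (with NLTK available, nltk.Tree.fromstring(s).leaves() does the same). A sketch, not part of the Stanford distribution, assuming standard Penn-style bracketing:

```python
import re

def parse(tokens):
    """Recursive-descent reader for a Penn-style bracketed parse."""
    tok = tokens.pop(0)
    if tok == '(':
        label = tokens.pop(0)
        children = []
        while tokens[0] != ')':
            children.append(parse(tokens))
        tokens.pop(0)  # consume ')'
        return (label, children)
    return tok  # a leaf (word)

def leaves(node):
    # collect leaf strings left to right
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]

toks = re.findall(r'\(|\)|[^\s()]+', '(ROOT (SQ (VBD Did) (NP (NNP Matt)) (VP (VB win))))')
tree = parse(toks)
print(leaves(tree))  # ['Did', 'Matt', 'win']
```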

Stanford CRFClassifier performance evaluation output

Submitted by 做~自己de王妃 on 2019-12-11 18:33:43
Question: I'm following this FAQ https://nlp.stanford.edu/software/crf-faq.shtml to train my own classifier, and I noticed that the performance-evaluation output does not match the results (or at least not in the way I expect). Specifically, this section:

    CRFClassifier tagged 16119 words in 1 documents at 13824.19 words per second.
    Entity     P        R        F1      TP   FP   FN
    MYLABEL    1.0000   0.9961   0.9980  255   0    1
    Totals     1.0000   0.9961   0.9980  255   0    1

I expect TP to be all instances where the predicted label matched the golden
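The numbers above are internally consistent with the usual entity-level reading: the classifier counts whole entity spans (not individual words), so TP + FN = 256 golden entities out of the 16119 tagged words, and the metrics follow from the standard formulas P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R). A quick check:

```python
def prf(tp, fp, fn):
    # entity-level precision / recall / F1, as in the CRFClassifier output
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return round(p, 4), round(r, 4), round(f1, 4)

print(prf(255, 0, 1))  # (1.0, 0.9961, 0.998) -- matches the MYLABEL row
```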

How to get a parse in a bracketed format (without POS tags)?

Submitted by 非 Y 不嫁゛ on 2019-12-11 16:24:05
Question: I want to parse a sentence into a binary parse of this form (the format used in the SNLI corpus):

    sentence: "A person on a horse jumps over a broken down airplane."
    parse: ( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )

I'm unable to find a parser which does this. Note: this question has been asked earlier (How to get a binary parse in Python), but the answers are not helpful, and I was unable to comment because I do not have the required reputation
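This is not a full solution (the SNLI parses come from binarizing a constituency parser's output, which still requires the parser), but the label-stripping half is mechanical: drop the node label after each opening bracket and collapse single-word brackets. A sketch, assuming standard PTB bracketing; the function name is mine:

```python
import re

def strip_labels(parse):
    # drop the label token that follows each '('
    toks = re.findall(r'\(|\)|[^\s()]+', parse)
    out, skip = [], False
    for t in toks:
        if t == '(':
            out.append(t)
            skip = True          # the next token is a node label
        elif skip:
            skip = False         # discard the label itself
        else:
            out.append(t)
    s = ' '.join(out)
    # collapse single-word brackets "( word )" -> "word"
    prev = None
    while s != prev:
        prev, s = s, re.sub(r'\( ([^\s()]+) \)', r'\1', s)
    return s

print(strip_labels('(NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse))))'))
# ( ( A person ) ( on ( a horse ) ) )
```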

Stanford NER with Python NLTK fails on strings containing multiple “!!”s?

Submitted by 大兔子大兔子 on 2019-12-11 13:02:26
Question: Suppose this is my filecontent:

    When they are over 45 years old!! It would definitely help Michael Jordan.

Below is my code for tagging sentences:

    st = NERTagger('stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz',
                   'stanford-ner/stanford-ner.jar')
    tokenized_sents = [word_tokenize(sent) for sent in sent_tokenize(filecontent)]
    taggedsents = st.tag_sents(tokenized_sents)

I would expect both tokenized_sents and taggedsents to contain the same number of sentences. But here is what they
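The mismatch typically arises because the Stanford tool re-splits sentences internally (runs of "!!" can trigger an extra boundary), so its output no longer lines up with NLTK's sent_tokenize. A commonly suggested workaround is to keep your own sentence boundaries and redistribute the flat tagged output back onto them. A minimal sketch (function name is mine; it assumes the tagger preserves the token count even when it re-splits sentences):

```python
def rechunk(tokenized_sents, flat_tags):
    """Reassign a flat list of (token, tag) pairs back onto the original
    sentence boundaries, however the tagger re-split them."""
    out, i = [], 0
    for sent in tokenized_sents:
        out.append(flat_tags[i:i + len(sent)])
        i += len(sent)
    return out

sents = [['Hello', '!!'], ['Bye', '.']]
flat = [('Hello', 'O'), ('!!', 'O'), ('Bye', 'O'), ('.', 'O')]
print(rechunk(sents, flat))
```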

Preventing tokens from containing a space in Stanford CoreNLP

Submitted by 微笑、不失礼 on 2019-12-11 11:58:35
Question: Is there an option in Stanford CoreNLP's tokenizer to prevent tokens from containing a space? E.g. if the sentence is "my phone is 617 1555-6644", the substring "617 1555" should be split into two different tokens. I am aware of the option normalizeSpace:

    normalizeSpace: Whether any spaces in tokens (phone numbers, fractions) get turned into
    U+00A0 (non-breaking space). It's dangerous to turn this off for most of our Stanford
    NLP software, which assumes no spaces in tokens.

but I don't want tokens
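One workaround (post-processing rather than a tokenizer option; the function name is mine) is to split any token that contains a regular space or the U+00A0 that normalizeSpace inserts:

```python
def split_spaced_tokens(tokens):
    # break apart any token that contains a regular or non-breaking space
    out = []
    for t in tokens:
        out.extend(t.replace('\u00a0', ' ').split())
    return out

print(split_spaced_tokens(['my', 'phone', 'is', '617\u00a01555', '-6644']))
# ['my', 'phone', 'is', '617', '1555', '-6644']
```

Note that any downstream annotators (POS, parse) would then see tokens the tokenizer did not produce, so this is only safe at the end of a pipeline.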

Cast from GrammaticalStructure to Tree

Submitted by 我们两清 on 2019-12-11 11:54:10
Question: I am trying out the new NN dependency parser from Stanford. According to the demo they have provided, this is how the parsing is done:

    import edu.stanford.nlp.process.DocumentPreprocessor;
    import edu.stanford.nlp.trees.GrammaticalStructure;
    import edu.stanford.nlp.parser.nndep.DependencyParser;
    ...
    GrammaticalStructure gs = null;
    DocumentPreprocessor tokenizer = new DocumentPreprocessor(new StringReader(sentence));
    for (List<HasWord> sent : tokenizer) {
        List<TaggedWord> tagged = tagger

Stanford-NLP: Could not find main class error

Submitted by 早过忘川 on 2019-12-11 11:37:11
Question: This question seems to have been answered a few times (What does "Could not find or load main class" mean? and https://stackoverflow.com/a/16208709/2771315), but for some reason none of the shared methods are working. What I've done so far:

1) Navigated to the directory containing the CoreNLP source files in a terminal: ~/Downloads/CoreNLP-master/src
2) Selected one of the packages as a test case, e.g. executed the command java -cp "*" -mx5g edu.stanford.nlp.sentiment.SentimentPipeline -file foo
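A likely cause in step 1 is running from the src/ tree, which holds .java source files rather than compiled classes or jars; the -cp "*" wildcard only helps when it expands over the distribution jars. A hedged sketch (jar_dir and the argument values are placeholders) that builds such an invocation with the wildcard passed as a single argv element, so the shell does not expand it before the JVM can:

```python
import os

def corenlp_command(jar_dir, main_class, *args):
    # -cp must point at the directory of distribution jars, not at src/;
    # the '*' classpath wildcard is expanded by the JVM itself, so keep it
    # as one argv element instead of letting the shell glob it.
    classpath = os.path.join(jar_dir, '*')
    return ['java', '-cp', classpath, '-mx5g', main_class, *args]

cmd = corenlp_command('/path/to/stanford-corenlp',
                      'edu.stanford.nlp.sentiment.SentimentPipeline',
                      '-file', 'foo')
print(cmd)
```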

NLP: Sentiment processing for junk data takes time

Submitted by 柔情痞子 on 2019-12-11 11:24:15
Question: I am trying to find the sentiment for input text. This test input is a junk sentence, and when I tried to find its sentiment, the annotation step that parses the sentence took around 30 seconds. For normal text it takes less than a second. If I need to process around millions of records, this will add up the processing time. Any solution to this?

    String text = "Nm n n 4 n n bkj nun4hmnun Onn njnb hm5bn nm55m nbbh n mnrrnut but n rym4n nbn 4nn65 m nun m n nn nun 4nm 5 gm n my b bb b b rtmrt55tmmm5tttn b b
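The parser's runtime blows up on long, unparseable token sequences, so two common mitigations are capping sentence length via CoreNLP's parse.maxlen property and screening junk before annotating at all. A rough pre-filter sketch (the vowel heuristic and threshold are my own, purely illustrative, and will misfire on numbers or acronyms):

```python
def looks_like_junk(text, threshold=0.5):
    # flag text where fewer than `threshold` of the tokens contain a vowel
    words = text.split()
    if not words:
        return True
    with_vowel = sum(1 for w in words if any(c in 'aeiouAEIOU' for c in w))
    return with_vowel / len(words) < threshold

print(looks_like_junk('Nm n n 4 n n bkj nnn njnb'))   # True
print(looks_like_junk('This is a normal sentence.'))  # False
```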

How to keep punctuation in Stanford dependency parser

Submitted by 拜拜、爱过 on 2019-12-11 11:05:57
Question: I am using Stanford CoreNLP (01.2016 version) and I would like to keep the punctuation in the dependency relations. I have found some ways to do that when running it from the command line, but I didn't find anything regarding the Java code which extracts the dependency relations. Here is my current code. It works, but no punctuation is included:

    Annotation document = new Annotation(text);
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse
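If changing the extraction call is not an option, one fallback is post-processing: reattach each punctuation token the converter dropped as a punct relation, the way dependency treebanks conventionally do. A heuristic sketch (data layout, function name, and the attach-to-left-neighbour rule are all my own, purely illustrative):

```python
def attach_punct(tokens, deps):
    """deps: list of (head, rel, dependent) with 1-based token indices,
    0 as the root head. Attach any punctuation token missing from the
    dependency list to its left neighbour with a 'punct' relation."""
    covered = {d for _, _, d in deps}
    out = list(deps)
    for i, tok in enumerate(tokens, start=1):
        if i not in covered and not any(c.isalnum() for c in tok):
            out.append((max(i - 1, 1), 'punct', i))
    return out

tokens = ['It', 'works', '.']
deps = [(0, 'root', 2), (2, 'nsubj', 1)]
print(attach_punct(tokens, deps))
# [(0, 'root', 2), (2, 'nsubj', 1), (2, 'punct', 3)]
```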