stanford-nlp

How to speed up NE recognition with Stanford NER with Python NLTK

社会主义新天地, submitted on 2019-11-29 13:29:05
Question: First I tokenize the file content into sentences and then call Stanford NER on each sentence. But this process is really slow. I know that calling it on the whole file content would be faster, but I call it per sentence because I want to index each sentence before and after NE recognition.
    st = NERTagger('stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz', 'stanford-ner/stanford-ner.jar')
    for filename in filelist:
        sentences = sent_tokenize(filecontent) #break file
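One likely cause of the slowdown is that each separate tagging call from NLTK starts a new Java process. A minimal sketch of the usual workaround follows: collect every tokenized sentence, tag them in a single batch call, then map the results back to their file and sentence index. The helper `tag_files_in_one_batch` and the `StubTagger` class are hypothetical names introduced here for illustration. The stub stands in for a real tagger so the sketch runs without Java; the assumption is that the real tagger offers a batch method with the same shape (NLTK's Stanford tagger wrappers have exposed `tag_sents` for this, but check the version you have installed).

```python
from typing import Callable, Dict, List, Tuple

Sentence = List[str]
TaggedSentence = List[Tuple[str, str]]


def tag_files_in_one_batch(
    files: Dict[str, List[Sentence]],
    tag_sents: Callable[[List[Sentence]], List[TaggedSentence]],
) -> Dict[str, List[TaggedSentence]]:
    """Tag all sentences of all files with one batch call, keeping per-sentence results."""
    keys = []   # (filename, sentence index) for every sentence in the batch
    batch = []  # the sentences themselves, in the same order as keys
    for name, sentences in files.items():
        for index, sentence in enumerate(sentences):
            keys.append((name, index))
            batch.append(sentence)

    # Single call: with a Java-backed tagger this is where the time is saved.
    tagged_batch = tag_sents(batch)

    # Put each tagged sentence back at its original file and position.
    result: Dict[str, List[TaggedSentence]] = {
        name: [[] for _ in sentences] for name, sentences in files.items()
    }
    for (name, index), tagged in zip(keys, tagged_batch):
        result[name][index] = tagged
    return result


class StubTagger:
    """Hypothetical stand-in for a real NER tagger; counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def tag_sents(self, sentences: List[Sentence]) -> List[TaggedSentence]:
        self.calls += 1
        return [
            [(tok, "ORGANIZATION" if tok == "Stanford" else "O") for tok in sent]
            for sent in sentences
        ]


files = {
    "a.txt": [["Stanford", "is", "in", "California"], ["It", "is", "sunny"]],
    "b.txt": [["I", "study", "at", "Stanford"]],
}
stub = StubTagger()
tagged = tag_files_in_one_batch(files, stub.tag_sents)
```

Because the results are keyed by file name and sentence index, each sentence can still be indexed both before and after tagging, which is the requirement stated in the question.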

Stanford NER tagger generates 'file not found' exception with provided models

守給你的承諾、, submitted on 2019-11-29 12:58:10
I downloaded Stanford NER 3.4.1, unpacked it, and tried to run named entity recognition on a local file using the default (provided) trained model. I got this: `java.io.FileNotFoundException: /u/nlp/data/pos_tags_are_useless/egw4-reut.512.clusters (No such file or directory) at edu.stanford.nlp.io.IOUtils.inputStreamFromFile(IOUtils.java:481)` What's wrong and how can I fix it? It turns out that the provided models use "distributional similarity features" that require a .clusters file at a location specified in the compressed model file (tricky to change). If you're on the Stanford network,

PTB treebank from CoNLL-X

廉价感情., submitted on 2019-11-29 11:52:27
I have a CoNLL-X format treebank and the corresponding binary parse tree for each sentence, and I want to convert it into PTB format. Are there any converters, or can anyone shed light on the PTB format?
dmcc: There have been a number of efforts to convert from dependencies (representable in CoNLL-X format) to constituents (representable in Penn Treebank, or PTB, format). Two recent papers and their code: Transforming Dependencies into Phrase Structures (Kong, Rush, and Smith, NAACL 2015). Code. Parsing as Reduction (Fernandez-Gonzalez and Martins, ACL 2015). Code. Source: https://stackoverflow.com

Executing and testing Stanford CoreNLP example

谁说胖子不能爱, submitted on 2019-11-29 07:49:11
Question: I downloaded the Stanford CoreNLP packages and tried to test them on my machine. Using the command:
    java -cp "*" -mx1g edu.stanford.nlp.sentiment.SentimentPipeline -file input.txt
I got a sentiment result in the form of positive or negative. input.txt contains the sentence to be tested. One more command:
    java -cp stanford-corenlp-3.3.0.jar;stanford-corenlp-3.3.0-models.jar;xom.jar;joda-time.jar -Xmx600m edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,parse -file input.txt

Error using Stanford POS Tagger in NLTK Python

老子叫甜甜, submitted on 2019-11-29 06:17:09
I am trying to use the Stanford POS Tagger in NLTK, but I am not able to run the example code given here: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford
    import nltk
    from nltk.tag.stanford import POSTagger
    st = POSTagger(r'english-bidirectional-distim.tagger',r'D:/stanford-postagger/stanford-postagger.jar')
    st.tag('What is the airspeed of an unladen swallow?'.split())
I have already added these environment variables:
    CLASSPATH = D:/stanford-postagger/stanford-postagger.jar
    STANFORD_MODELS = D:/stanford-postagger/models/
Here is the error I keep getting:
    Traceback (most recent call last):

Coreference resolution in python nltk using Stanford coreNLP

自闭症网瘾萝莉.ら, submitted on 2019-11-29 04:17:52
Stanford CoreNLP provides coreference resolution as mentioned here, and this thread and this one provide some insights about its implementation in Java. However, I am using Python and NLTK, and I am not sure how I can use the coreference resolution functionality of CoreNLP in my Python code. I have been able to set up StanfordParser in NLTK; this is my code so far.
    from nltk.parse.stanford import StanfordDependencyParser
    stanford_parser_dir = 'stanford-parser/'
    eng_model_path = stanford_parser_dir + "stanford-parser-models/edu/stanford/nlp/models/lexparser/englishRNN.ser.gz"
    my_path_to_models_jar =

Stanford NLP parse tree format

我的梦境, submitted on 2019-11-29 02:41:44
This may be a silly question, but how does one iterate through a parse tree produced by an NLP parser (like Stanford NLP)? It's all nested brackets, which is neither an array nor a dictionary nor any other collection type I've used.
    (ROOT\n (S\n (PP (IN As)\n (NP (DT an) (NN accountant)))\n (NP (PRP I))\n (VP (VBP want)\n (S\n (VP (TO to)\n (VP (VB make)\n (NP (DT a) (NN payment))))))))
This particular output format of the Stanford Parser is called the "bracketed parse (tree)". It is supposed to be read as a graph with words as nodes (e.g. As, an, accountant) and phrase/clause as labels (e.g. S,
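To make the bracketed format concrete, here is a minimal sketch of a reader for it, using only the standard library. The functions `parse_bracketed` and `leaves` are hypothetical helpers written for this illustration, not part of the Stanford Parser or NLTK. The sketch assumes the `\n` sequences in the output above are real newlines and that labels and words contain no parentheses. Each node becomes a `(label, children)` tuple, where a child is either another node or a plain word.

```python
import re

TREE = """(ROOT
  (S
    (PP (IN As)
      (NP (DT an) (NN accountant)))
    (NP (PRP I))
    (VP (VBP want)
      (S
        (VP (TO to)
          (VP (VB make)
            (NP (DT a) (NN payment))))))))"""


def parse_bracketed(text):
    """Parse a bracketed parse string into nested (label, children) tuples."""
    # Tokens are "(", ")", or any run of characters without spaces/parentheses.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]  # first token after "(" is the phrase or POS label
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested phrase
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Yield (word, tag) pairs in sentence order by walking the tree depth-first."""
    label, children = node
    for child in children:
        if isinstance(child, str):
            yield (child, label)
        else:
            yield from leaves(child)


tree = parse_bracketed(TREE)
for word, tag in leaves(tree):
    print(word, tag)
```

Once the string is a nested structure, iterating over it is an ordinary recursive walk. If NLTK is already in use, `nltk.Tree.fromstring` reads the same bracketed format and provides traversal methods, which is likely preferable to a hand-written reader.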

Stanford Dependency Parser Setup and NLTK

▼魔方 西西, submitted on 2019-11-29 02:31:02
So I got the "standard" Stanford Parser to work thanks to danger89's answers to this previous post, Stanford Parser and NLTK. However, I am now trying to get the dependency parser to work, and it seems the method highlighted in the previous link no longer works. Here is my code:
    import nltk
    import os
    java_path = "C:\\Program Files\\Java\\jre1.8.0_51\\bin\\java.exe"
    os.environ['JAVAHOME'] = java_path
    from nltk.parse import stanford
    os.environ['STANFORD_PARSER'] = 'path/jar'
    os.environ['STANFORD_MODELS'] = 'path/jar'
    parser = stanford.StanfordDependencyParser(model_path="path/jar/englishPCFG.ser

nltk StanfordNERTagger : NoClassDefFoundError: org/slf4j/LoggerFactory (In Windows)

偶尔善良, submitted on 2019-11-28 23:41:08
NOTE: I am using Python 2.7 as part of the Anaconda distribution. I hope this is not a problem for nltk 3.1. I am trying to use nltk for NER as follows:
    import nltk
    from nltk.tag.stanford import StanfordNERTagger
    #st = StanfordNERTagger('stanford-ner/all.3class.distsim.crf.ser.gz', 'stanford-ner/stanford-ner.jar')
    st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
    print st.tag(str)
but I get:
    Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
    at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:41)
    at edu.stanford.nlp.ie.AbstractSequenceClassifier

How to detect that two sentences are similar?

折月煮酒, submitted on 2019-11-28 21:54:07
Question: I want to compute how similar two arbitrary sentences are to each other. For example:
    A mathematician found a solution to the problem.
    The problem was solved by a young mathematician.
I can use a tagger, a stemmer, and a parser, but I don’t know how to detect that these sentences are similar.
Answer 1: These two sentences are not just similar, they are almost paraphrases, i.e., two alternative ways of expressing the same meaning. It is also a very simple case of paraphrase, in which both utterances
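As a point of comparison, here is a crude word-overlap baseline applied to the two example sentences from the question. It is not the method described in the answer, only a sketch of the simplest thing one could try: Jaccard similarity over lowercased content words. The stopword list and the function names are assumptions made for this illustration.

```python
import re

# Small hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "to", "by", "was", "is", "of"}


def content_words(sentence):
    """Lowercase the sentence, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {token for token in tokens if token not in STOPWORDS}


def jaccard_similarity(first, second):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    a, b = content_words(first), content_words(second)
    if not a and not b:
        return 1.0  # two sentences with no content words are treated as identical
    return len(a & b) / len(a | b)


s1 = "A mathematician found a solution to the problem."
s2 = "The problem was solved by a young mathematician."
score = jaccard_similarity(s1, s2)
print(round(score, 3))
```

The two sentences share only "mathematician" and "problem", so the overlap score stays low even though they are near paraphrases. Surface overlap does not connect "found a solution" with "was solved" or the active with the passive form, which is the gap that stemming and parsing would have to close.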