stanford-nlp

Parse out phrasal verbs

Submitted by 二次信任 on 2019-11-30 15:10:13
Has anyone ever tried parsing out phrasal verbs with Stanford NLP? The problem is with separable phrasal verbs, e.g. climb up, do over:

We climbed that hill up. I have to do this job over.

The first phrase looks like this in the parse tree:

(VP (VBD climbed) (ADVP (IN that) (NP (NN hill) ) ) (ADVP (RB up) ) )

the second phrase:

(VB do) (NP (DT this) (NN job) ) (PP (IN over) )

So it seems like reading the parse tree would be the right way, but how do you know that a verb is going to be phrasal?

Dependency parsing, dude. Look at the prt (phrasal verb particle) dependency in both sentences. See the
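A sketch of the answer's approach: run the dependency parser and scan for prt edges. The triples below are hand-written stand-ins for real parser output (an assumption, not actual CoreNLP output), but the extraction logic is what you would apply to the basic dependencies:

```python
# Extract phrasal verbs from dependency triples of the form
# (relation, governor, dependent); the prt relation marks the particle.
def phrasal_verbs(triples):
    return [f"{gov} {dep}" for rel, gov, dep in triples if rel == "prt"]

climbed = [("nsubj", "climbed", "We"),
           ("dobj", "climbed", "hill"),
           ("prt", "climbed", "up")]
do_over = [("nsubj", "do", "I"),
           ("dobj", "do", "job"),
           ("prt", "do", "over")]

print(phrasal_verbs(climbed))  # ['climbed up']
print(phrasal_verbs(do_over))  # ['do over']
```

The same scan works no matter how the particle is attached in the constituency tree, which is why dependencies are the easier route here.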

Stanford Parser multithread usage

Submitted by 久未见 on 2019-11-30 15:07:36
Question: Stanford Parser is now 'thread-safe' as of version 2.0 (02.03.2012). I am currently running the command-line tools and cannot figure out how to make use of my multiple cores by threading the program. In the past, this question has been answered with "Stanford Parser is not thread-safe", as the FAQ still says. I am hoping to find someone who has had success threading the latest version. I have tried using the -t flag (-t10 and -tLLP), since that was all I could find in my searches, but both throw
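Until an in-process threading flag works from the command line, one workaround is process-level parallelism: split the input into chunks and run one parser invocation per core. A sketch with the parser call stubbed out (run_parser is a placeholder, not a real Stanford command):

```python
from concurrent.futures import ThreadPoolExecutor

def run_parser(chunk_path):
    # Placeholder for the real work: in practice, shell out to the
    # Stanford Parser via subprocess, passing chunk_path as the input file.
    return f"parsed:{chunk_path}"

def parse_all(chunk_paths, workers=4):
    # One parser invocation per chunk, up to `workers` in flight at once;
    # threads suffice here because each worker would just wait on its
    # java subprocess rather than do CPU work in Python.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_parser, chunk_paths))

print(parse_all(["chunk0.txt", "chunk1.txt"]))
```

This sidesteps the thread-safety question entirely, at the cost of loading the grammar once per process.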

Can't make Stanford POS tagger work in nltk

Submitted by 半城伤御伤魂 on 2019-11-30 14:53:21
I'm trying to work with the Stanford POS tagger within NLTK, using the example shown here: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford I'm able to load everything smoothly:

>>> import os
>>> from nltk.tag import StanfordPOSTagger
>>> os.environ['STANFORD_MODELS'] = '/path/to/stanford/folder/models'
>>> st = StanfordPOSTagger('english-bidirectional-distsim.tagger', path_to_jar='/path/to/stanford/folder/stanford-postagger.jar')

but at the first execution:

>>> st.tag('What is the airspeed of an unladen swallow ?'.split())

it gives me the following error: Loading default
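A common cause of a load failure at tagging time is nltk not finding the jar or the model directory. A minimal sanity-check sketch (the install directory is a placeholder; STANFORD_MODELS and CLASSPATH are the environment variables nltk's Stanford wrappers consult when locating files):

```python
import os

# Hypothetical install location -- adjust to your machine
# (an assumption, not a path from the question).
STANFORD_DIR = "/path/to/stanford/folder"

# nltk looks in STANFORD_MODELS for .tagger files and in CLASSPATH for the jar.
os.environ["STANFORD_MODELS"] = os.path.join(STANFORD_DIR, "models")
os.environ["CLASSPATH"] = os.path.join(STANFORD_DIR, "stanford-postagger.jar")

# Quick check that the paths actually exist before constructing the tagger.
for var in ("STANFORD_MODELS", "CLASSPATH"):
    print(var, "->", os.environ[var], "exists:", os.path.exists(os.environ[var]))
```

If either path prints exists: False, the tagger will fail at runtime even though the constructor succeeds, which matches the symptom in the question.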

how to get a dependency tree with Stanford NLP parser

Submitted by 时光毁灭记忆、已成空白 on 2019-11-30 14:48:52
Question: How can I get the dependency tree as in the figure below? I can get the dependency relations as pure text, and also the dependency graph with the help of the dependencysee tool. But how about a dependency tree which has words as nodes and dependencies as edges? Thanks very much!

Answer 1: These graphs are produced using GraphViz, an open-source graph drawing package originally from AT&T Research. You can find a method toDotFormat() in edu.stanford.nlp.trees.semgraph.SemanticGraph that will convert a
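The toDotFormat() route boils down to emitting GraphViz DOT text: words become nodes, dependencies become labelled edges. A self-contained sketch of that conversion (the triples are made-up examples, not real SemanticGraph output):

```python
def to_dot(triples):
    # triples: (governor, relation, dependent). Words become nodes and
    # relations become edge labels, mirroring what toDotFormat() emits.
    lines = ["digraph deps {"]
    for gov, rel, dep in triples:
        lines.append(f'  "{gov}" -> "{dep}" [label="{rel}"];')
    lines.append("}")
    return "\n".join(lines)

dot = to_dot([("climbed", "nsubj", "We"), ("climbed", "prt", "up")])
print(dot)
```

Feeding the resulting string to the dot command (e.g. dot -Tpng deps.dot -o deps.png) renders the tree with words as nodes and dependencies as edges, as asked.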

Stanford CoreNLP OpenIE annotator

Submitted by 笑着哭i on 2019-11-30 14:26:02
Question: I have a question regarding the Stanford CoreNLP OpenIE annotator. I am using Stanford CoreNLP version stanford-corenlp-full-2015-12-09 in order to extract relations using OpenIE. I don't know much Java, which is why I am using the pycorenlp wrapper for Python 3.4. I want to extract the relations between all words of a sentence; below is the code I used. I am also interested in showing the confidence of each triple:

import nltk
from pycorenlp import *
import collections

nlp = StanfordCoreNLP("http:/
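With the server running and the openie annotator enabled, the triples arrive in the JSON response. A sketch of pulling subject/relation/object out of a response shaped like the server's output (the sample dict below is hand-written to mimic that shape, not a real server response; whether a confidence field is present in the JSON depends on the CoreNLP version):

```python
def extract_triples(response):
    # `response` mimics the JSON the CoreNLP server returns when the
    # "openie" annotator is enabled; field names follow that output.
    triples = []
    for sentence in response["sentences"]:
        for t in sentence.get("openie", []):
            triples.append((t["subject"], t["relation"], t["object"]))
    return triples

sample = {"sentences": [{"openie": [
    {"subject": "Obama", "relation": "was born in", "object": "Hawaii"}]}]}
print(extract_triples(sample))  # [('Obama', 'was born in', 'Hawaii')]
```

In practice the response would come from something like nlp.annotate(text, properties={"annotators": "openie", "outputFormat": "json"}) via pycorenlp.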

Stanford-NER customization to classify software programming keywords

Submitted by 浪尽此生 on 2019-11-30 14:25:19
Question: I am new to NLP and I used the Stanford NER tool to classify some random text, to extract special keywords used in software programming. The problem is, I don't know how to change the classifiers and text annotators in Stanford NER so that they recognize software programming keywords. For example, given:

today Java used in different operating systems (Windows, Linux, ..)

the classification results should be something like:

Java "Programming_Language"
Windows "Operating_System"
Linux "Operating_system"

Would you please
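Retargeting the CRF at programming terms means training a new model on tab-separated token/label pairs: one token per line, a blank line between sentences. A sketch that produces a tiny training file in that format, with made-up labels matching the question's scheme:

```python
# Hand-labelled example sentences; labels are the question's own scheme.
sentences = [
    [("today", "O"), ("Java", "Programming_Language"), ("used", "O"),
     ("in", "O"), ("Windows", "Operating_System"), ("and", "O"),
     ("Linux", "Operating_System")],
]

def to_tsv(sentences):
    # Stanford NER's CRFClassifier trains on this two-column
    # token<TAB>label format, one sentence per blank-separated block.
    blocks = []
    for sent in sentences:
        blocks.append("\n".join(f"{tok}\t{label}" for tok, label in sent))
    return "\n\n".join(blocks) + "\n"

print(to_tsv(sentences))
```

The resulting file is what the trainFile property of a CRFClassifier properties file would point at; the hard part in practice is collecting enough labelled sentences for the new entity types.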

how to train a French NER based on the stanford-nlp Conditional Random Fields model?

Submitted by 孤者浪人 on 2019-11-30 14:20:20
I discovered the stanford-NLP tools and found them really interesting. I'm a French dataminer / data scientist, fond of text analysis, and would love to use your tools, but the NER being unavailable in French is quite puzzling to me. I would love to make my own French NER, and perhaps even provide it as a contribution to the package if it is considered worthy, so... could you brief me on the requirements to train a CRF for French NER based on Stanford CoreNLP? Thank you.

NB: I am not a developer of the Stanford tools, nor an NLP expert. Just an ordinary user who also needed such information at
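Concretely, training a CRFClassifier needs a token-per-line training file and a properties file. A sketch of such a properties file; the property names follow the example in the Stanford NER FAQ, while the file paths are placeholders:

```properties
# Placeholder paths -- point these at your own French training data.
trainFile = french-train.tsv
serializeTo = french-ner-model.ser.gz
# Column 0 is the token, column 1 the gold label.
map = word=0,answer=1

# A reasonable default feature set, per the Stanford NER FAQ example.
useClassFeature = true
useWord = true
useNGrams = true
noMidNGrams = true
maxNGramLeng = 6
usePrev = true
useNext = true
useSequences = true
usePrevSequences = true
maxLeft = 1
useTypeSeqs = true
useTypeSeqs2 = true
useTypeySequences = true
wordShape = chris2useLC
useDisjunctive = true
```

Training is then roughly: java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop french.prop. The model itself is language-agnostic; what makes it French is the training data.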

Maven dependency:get does not download Stanford NLP model files

Submitted by 回眸只為那壹抹淺笑 on 2019-11-30 11:33:11
The core component of the Stanford Natural Language Processing Toolkit has Java code in a stanford-corenlp-1.3.4.jar file, and has (very large) model files in a separate stanford-corenlp-1.3.4-models.jar file. Maven does not download the model files automatically, but only if you add a <classifier>models</classifier> line to the .pom. Here is a .pom snippet that fetches both the code and the models:

<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>1.3.4</version>
  <classifier>models</classifier>
</dependency>

I'm trying to figure out how to do
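For the dependency:get goal, the classifier travels inside the artifact coordinate, whose general form is groupId:artifactId:version[:packaging[:classifier]]. A sketch of fetching both jars that way (the coordinate syntax is standard Maven; worth verifying against your Maven version):

```shell
# Code jar
mvn dependency:get -Dartifact=edu.stanford.nlp:stanford-corenlp:1.3.4
# Models jar: packaging and classifier appended to the coordinate
mvn dependency:get -Dartifact=edu.stanford.nlp:stanford-corenlp:1.3.4:jar:models
```

The second invocation is the command-line counterpart of the <classifier>models</classifier> element in the .pom snippet above.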

Load Custom NER Model Stanford CoreNLP

Submitted by 浪子不回头ぞ on 2019-11-30 10:22:35
I have created my own NER model with Stanford's "Stanford-NER" software, by following these directions. I am aware that CoreNLP loads three NER models out of the box, in the following order:

edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz
edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz

I now want to include my NER model in the list above, and have the text tagged by my NER model first. I have found two previous StackOverflow questions regarding this topic, and they are 'Stanford OpenIE using
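One route (worth checking against your CoreNLP version) is the ner.model property, which takes a comma-separated list of serialized classifiers applied in order, so listing the custom model first makes it tag first. A sketch, with the custom model path as a placeholder:

```properties
ner.model = /path/to/my-custom-model.ser.gz,\
            edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz,\
            edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz,\
            edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz
```

The same value can equally be passed programmatically via props.setProperty("ner.model", ...) when building the StanfordCoreNLP pipeline.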

nltk StanfordNERTagger : How to get proper nouns without capitalization

Submitted by 为君一笑 on 2019-11-30 10:04:23
I am trying to use the StanfordNERTagger and nltk to extract keywords from a piece of text.

docText = "John Donk works for POI. Brian Jones wants to meet with Xyz Corp. for measuring POI's Short Term performance Metrics."
words = re.split("\W+", docText)
stops = set(stopwords.words("english"))
# remove stop words from the list
words = [w for w in words if w not in stops and len(w) > 2]
str = " ".join(words)
print str
stn = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
stp = StanfordPOSTagger('english-bidirectional-distsim.tagger')
stanfordPosTagList = [word for word, pos in stp.tag(str
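One pitfall in the snippet is filtering stopwords before tagging: the NER model relies on full sentence context (and capitalization; Stanford also ships caseless model variants for lowercased input), so it is safer to tag the original text first and filter afterwards. A sketch with the Stanford tagger replaced by a stub lookup (an assumption: the real StanfordNERTagger.tag() likewise returns (word, tag) pairs):

```python
STOPWORDS = {"for", "with", "to", "the"}

def stub_ner(tokens):
    # Stand-in for StanfordNERTagger.tag(): returns (word, tag) pairs.
    people = {"John", "Donk", "Brian", "Jones"}
    return [(t, "PERSON" if t in people else "O") for t in tokens]

tokens = "John Donk works for POI".split()
tagged = stub_ner(tokens)  # tag with full sentence context first
# ...and only then drop stopwords and non-entities.
keywords = [w for w, tag in tagged if tag != "O" and w.lower() not in STOPWORDS]
print(keywords)  # ['John', 'Donk']
```

The order of operations is the point: stopword removal and joining with " ".join() destroy the context the sequence model was trained on, which is one reason proper nouns get missed.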