stanford-nlp | 易学教程

Training n-gram NER with Stanford NLP

Question: Recently I have been trying to train n-gram entities with Stanford CoreNLP. I followed this tutorial: http://nlp.stanford.edu/software/crf-faq.shtml#b With it, I can only specify unigram tokens and the class each one belongs to. Can anyone guide me on how to extend this to n-grams? I am trying to extract known entities, such as movie names, from a chat data set. Please guide me in case I have misinterpreted the Stanford tutorials and the same can be used for
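As the question describes it, the training format pairs one token with one class per line. A common way to cover multi-word entities in a format like that is to give every token of the phrase the same class, so that adjacent tokens sharing a class can be read back as one entity. The sketch below only illustrates that idea and is not taken from the FAQ. The tab separator, the `MOVIE` label, the background label `O`, and the class and method names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: names, labels, and the tab-separated layout are assumptions.
class NerTrainingLinesSketch {

    // Labels every token of each known phrase with `label` and all other tokens with "O",
    // then returns one "token<TAB>label" line per token.
    static List<String> toTrainingLines(List<String> tokens, List<List<String>> phrases, String label) {
        String[] labels = new String[tokens.size()];
        Arrays.fill(labels, "O");
        for (List<String> phrase : phrases) {
            if (phrase.isEmpty()) {
                continue;
            }
            // Slide a window over the sentence and mark every exact match of the phrase.
            for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
                if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                    for (int j = i; j < i + phrase.size(); j++) {
                        labels[j] = label;
                    }
                }
            }
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            lines.add(tokens.get(i) + "\t" + labels[i]);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical chat sentence and hypothetical known movie title.
        List<String> tokens = Arrays.asList("I", "loved", "The", "Dark", "Knight", "yesterday");
        List<List<String>> movies = new ArrayList<>();
        movies.add(Arrays.asList("The", "Dark", "Knight"));
        for (String line : toTrainingLines(tokens, movies, "MOVIE")) {
            System.out.println(line);
        }
    }
}
```

Each of the three title tokens gets its own line with the same class, which keeps the file in the one-token-per-line shape the question mentions.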

How to get the root node in Stanford Parse-Tree?

Question: I have this parse tree here: What I want is to get all words under a common parent, given one word among the children of a subtree. For example, if you take the word "bottles", then I want to get "the Voss bottles" or maybe even "the Voss bottles of water", but I don't know how to do that. Annotation document = new Annotation(sentenceText); this.pipeline.annotate(document); List<CoreMap> sentences = document.get(SentencesAnnotation.class); for (CoreMap sentence : sentences) { Tree tree =

Identify prepositions and individual POS

Question: I am trying to find the correct part of speech for each word in a paragraph. I am using the Stanford POS Tagger. However, I am stuck at one point. I want to identify prepositions in the paragraph. The Penn Treebank tagset says: IN: Preposition or subordinating conjunction. How can I be sure whether the current word is a preposition or a subordinating conjunction? How can I extract only prepositions from the paragraph in this case?

Answer 1: You can't be sure. The reason for this somewhat strange PoS is that it's really

Spark Scala - java.util.NoSuchElementException & Data Cleaning

Question: I have had a similar problem before, but I am looking for a generalizable answer. I am using spark-corenlp to get sentiment scores on e-mails. Sometimes, sentiment() crashes on some input (maybe it's too long, maybe it had an unexpected character). It does not tell me that it crashed on some instances, and just returns the Column sentiment('email). Thus, when I try to show() beyond a certain point or save() my data frame, I get a java.util.NoSuchElementException because sentiment() must have

Questions about creating Stanford CoreNLP training models

Question: I've been working with Stanford's CoreNLP to perform sentiment analysis on some data I have, and I'm working on creating a training model. I know we can create a training model with the following command: java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz I know what goes in the train.txt file. You score sentences and put them in train.txt, something like this: (0 (2 Today) (0 (0 (2 is) (0 (2 a) (0 (0 bad) (2 day)

How to parse languages other than English with Stanford Parser? In Java, not the command line

Question: I have been trying to use Stanford Parser in my Java program to parse some sentences in Chinese. Since I am quite new to both Java and Stanford Parser, I used 'ParserDemo.java' to practice. The code works fine with sentences in English and outputs the right result. However, when I changed the model to 'chinesePCFG.ser.gz' and tried to parse some segmented Chinese sentences, things went wrong. Here's my code in Java: class ParserDemo { public static void main(String[] args) {

TokensRegex rules to get correct output for Named Entities

Question: I was finally able to get my TokensRegex code to give some kind of output for named entities. But the output is not exactly what I want. I believe the rules need some tweaking. Here's the code: public static void main(String[] args) { String rulesFile = "D:\\Workspace\\resource\\NERRulesFile.rules.txt"; String dataFile = "D:\\Workspace\\data\\GoldSetSentences.txt"; Properties props = new Properties(); props.put("annotators", "tokenize, ssplit, pos, lemma, ner"); props.setProperty("ner

NER model to recognize Indian names

Question: I am planning to use Named Entity Recognition (NER) to identify person names (most of which are Indian names) in a given text. I have already explored the CRF-based NER model from Stanford NLP; however, it is not very accurate at recognizing Indian names. Hence I decided to create my own custom NER model via supervised training. I have a fair idea of how to create my own NER model using the Stanford NER CRF, but creating a large training corpus with manual annotation is something I

Stanford Parser - Traversing the typed dependencies graph

Question: Basically, I want to find a path between two NP tokens in the dependencies graph. However, I can't seem to find a good way to do this in the Stanford Parser. Any help? Thank you very much.

Answer 1: The Stanford Parser just returns a list of dependencies between word tokens. (We do this to avoid external library dependencies.) But if you want to manipulate the dependencies, you'll almost certainly want to put them in a graph data structure. We usually use jgrapht: http://jgrapht.sourceforge.net/
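The idea in the answer, loading the dependency list into a graph and then searching it, can be sketched without any library. The example below is a minimal illustration that swaps in a plain adjacency map and breadth-first search in place of jgrapht, and it does not call any Stanford Parser class. The edge list, the `word-index` token naming, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: a plain-Java stand-in for a graph library such as jgrapht.
class DependencyPathSketch {

    // Builds an undirected adjacency map from (governor, dependent) pairs,
    // so a path may travel both up and down dependency edges.
    static Map<String, List<String>> buildGraph(String[][] edges) {
        Map<String, List<String>> graph = new HashMap<>();
        for (String[] edge : edges) {
            graph.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
            graph.computeIfAbsent(edge[1], k -> new ArrayList<>()).add(edge[0]);
        }
        return graph;
    }

    // Breadth-first search for a shortest path; returns an empty list when no path exists.
    static List<String> shortestPath(Map<String, List<String>> graph, String from, String to) {
        Map<String, String> parent = new HashMap<>();
        ArrayDeque<String> queue = new ArrayDeque<>();
        parent.put(from, null); // the start node has no predecessor
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(to)) {
                // Walk the predecessor links back to the start, then reverse.
                List<String> path = new ArrayList<>();
                for (String node = to; node != null; node = parent.get(node)) {
                    path.add(node);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : graph.getOrDefault(current, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Hypothetical dependencies for "the dog chased the cat", written as governor, dependent.
        String[][] edges = {
            {"chased-3", "dog-2"},
            {"chased-3", "cat-5"},
            {"dog-2", "the-1"},
            {"cat-5", "the-4"}
        };
        Map<String, List<String>> graph = buildGraph(edges);
        System.out.println(shortestPath(graph, "dog-2", "cat-5"));
    }
}
```

Treating the edges as undirected is a design choice for this sketch: two noun tokens are usually connected through a shared governor, so a directed search from one to the other would often find nothing.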