stanford-nlp

Training n-gram NER with Stanford NLP

﹥>﹥吖頭↗ posted on 2019-12-20 08:01:24
Question: Recently I have been trying to train n-gram entities with Stanford CoreNLP. I have followed the tutorial at http://nlp.stanford.edu/software/crf-faq.shtml#b but with it I can only specify unigram tokens and the class each one belongs to. Can anyone guide me on how to extend this to n-grams? I am trying to extract known entities, such as movie names, from a chat data set. Please guide me in case I have misinterpreted the Stanford tutorials and the same can be used for
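On the n-gram question: the training file described in the CRF FAQ has one token per line with a tab-separated class, so one common way to express a multi-word entity is to give each of its tokens the same class. The sketch below only illustrates preparing such lines. The class name `NgramNerTrainingData`, the `MOVIE` label, the example sentence, and the assumption that the text is already tokenized are all made up for illustration and are not part of Stanford NER.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns tokens plus known multi-word phrases into "token<TAB>label" lines.
class NgramNerTrainingData {

    // Every token inside a matched phrase gets `label`; all other tokens get the background class "O".
    static List<String> toTsvLines(List<String> tokens, List<List<String>> phrases, String label) {
        String[] tags = new String[tokens.size()];
        Arrays.fill(tags, "O");
        for (List<String> phrase : phrases) {
            if (phrase.isEmpty()) continue;
            // Slide a window over the sentence and compare it with the phrase (case-sensitive).
            for (int start = 0; start + phrase.size() <= tokens.size(); start++) {
                if (tokens.subList(start, start + phrase.size()).equals(phrase)) {
                    for (int i = start; i < start + phrase.size(); i++) tags[i] = label;
                }
            }
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            lines.add(tokens.get(i) + "\t" + tags[i]);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("I", "watched", "The", "Dark", "Knight", "yesterday");
        List<List<String>> phrases = Arrays.asList(Arrays.asList("The", "Dark", "Knight"));
        for (String line : toTsvLines(tokens, phrases, "MOVIE")) System.out.println(line);
    }
}
```

Exact matching against a fixed phrase list is the simplest possible choice here; chat data with typos or lowercase titles would likely need looser matching.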

How to get the root node in Stanford Parse-Tree?

自闭症网瘾萝莉.ら posted on 2019-12-20 05:35:09
Question: I have a parse tree. Given a word among the children of a subtree, I want to get all the words under a common parent. For example, given the word "bottles", I want to get "the Voss bottles" or maybe even "the Voss bottles of water", but I don't know how to do that.
    Annotation document = new Annotation(sentenceText);
    this.pipeline.annotate(document);
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        Tree tree =
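The idea in this question (climb from a word to an enclosing phrase, then read off all the words below it) can be sketched without the parser. The `Node` class below is a hypothetical stand-in for a parse tree, and the bracketing used in `main` is an assumed parse of the phrase, not the tree from the question. In CoreNLP the analogous calls would be along the lines of `Tree.parent(Tree root)` and `Tree.getLeaves()`, but check the Javadoc for the version in use.

```java
import java.util.ArrayList;
import java.util.List;

class PhraseLookup {

    // Hypothetical minimal tree node with a parent pointer; not the CoreNLP Tree class.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) {
                k.parent = this;
                children.add(k);
            }
        }

        boolean isLeaf() {
            return children.isEmpty();
        }
    }

    // Collects the words (leaf labels) under a node, left to right.
    static void collect(Node n, List<String> out) {
        if (n.isLeaf()) {
            out.add(n.label);
            return;
        }
        for (Node c : n.children) collect(c, out);
    }

    // Climbs from a leaf to its level-th ancestor labeled `category` and returns that ancestor's words.
    // Returns null when there is no such ancestor.
    static String phraseAround(Node leaf, String category, int level) {
        int seen = 0;
        for (Node n = leaf.parent; n != null; n = n.parent) {
            if (n.label.equals(category) && ++seen == level) {
                List<String> words = new ArrayList<>();
                collect(n, words);
                return String.join(" ", words);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed parse: (NP (NP (DT the) (NNP Voss) (NNS bottles)) (PP (IN of) (NP (NN water))))
        Node bottles = new Node("bottles");
        Node inner = new Node("NP",
                new Node("DT", new Node("the")),
                new Node("NNP", new Node("Voss")),
                new Node("NNS", bottles));
        new Node("NP", inner,
                new Node("PP", new Node("IN", new Node("of")),
                        new Node("NP", new Node("NN", new Node("water")))));
        System.out.println(phraseAround(bottles, "NP", 1));
        System.out.println(phraseAround(bottles, "NP", 2));
    }
}
```

The `level` parameter is what separates the smaller phrase from the larger one: level 1 stops at the nearest enclosing NP, level 2 at the next one up.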

Identify prepositions and individual POS

匆匆过客 posted on 2019-12-19 19:52:10
Question: I am trying to find the correct part of speech for each word in a paragraph, using the Stanford POS Tagger. However, I am stuck at one point: I want to identify the prepositions in the paragraph. The Penn Treebank tagset defines IN as "Preposition or subordinating conjunction". How can I be sure whether the current word is a preposition or a subordinating conjunction? How can I extract only the prepositions from the paragraph in this case?
Answer 1: You can't be sure. The reason for this somewhat strange PoS is that it's really
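Since the IN tag alone cannot separate the two classes, the most a tag-based filter can do is collect candidates. The sketch below assumes tagger output in the `word_TAG` form and drops a few words that usually act as subordinating conjunctions. The stoplist and the class name `PrepositionCandidates` are assumptions made for illustration; words such as "after", "before" or "since" stay ambiguous under this heuristic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PrepositionCandidates {

    // Hypothetical stoplist of words that usually introduce a clause rather than a noun phrase.
    static final Set<String> SUBORDINATORS = new HashSet<>(Arrays.asList(
            "because", "although", "though", "unless", "whether", "if", "while", "that"));

    // Returns words tagged IN in a "word_TAG word_TAG ..." string, minus the stoplist above.
    static List<String> fromTagged(String tagged) {
        List<String> out = new ArrayList<>();
        for (String item : tagged.trim().split("\\s+")) {
            int sep = item.lastIndexOf('_');
            if (sep <= 0) continue; // skip items without a word and a tag
            String word = item.substring(0, sep);
            String tag = item.substring(sep + 1);
            if (tag.equals("IN") && !SUBORDINATORS.contains(word.toLowerCase())) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String tagged = "She_PRP sat_VBD on_IN the_DT bench_NN because_IN it_PRP rained_VBD";
        System.out.println(fromTagged(tagged));
    }
}
```

A more reliable split would look at what follows the word in a parse (a noun phrase versus a clause), which needs a parser rather than the tagger alone.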

Spark Scala - java.util.NoSuchElementException & Data Cleaning

白昼怎懂夜的黑 posted on 2019-12-19 18:36:57
Question: I have had a similar problem before, but I am looking for a generalizable answer. I am using spark-corenlp to get sentiment scores on e-mails. Sometimes sentiment() crashes on some input (maybe it's too long, maybe it contains an unexpected character). It does not tell me that it crashed on those instances; it just returns the Column sentiment('email). Thus, when I try to show() beyond a certain point or save() my data frame, I get a java.util.NoSuchElementException because sentiment() must have

Questions about creating Stanford CoreNLP training models

狂风中的少年 posted on 2019-12-19 11:33:33
Question: I've been working with Stanford's CoreNLP to perform sentiment analysis on some data I have, and I'm working on creating a training model. I know we can create a training model with the following command:
    java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz
I know what goes in the train.txt file. You score sentences and put them in train.txt, something like this:
    (0 (2 Today) (0 (0 (2 is) (0 (2 a) (0 (0 bad) (2 day)
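Because each line of train.txt is one bracketed tree whose nodes all start with a score, a quick sanity check before training can catch truncated or mislabeled lines. The sketch below assumes the usual five-class scheme (scores 0 to 4) and one tree per line; the class name `SentimentTreeCheck` and the sample trees are hypothetical, and the check is a rough format test, not the parser that SentimentTraining itself uses.

```java
class SentimentTreeCheck {

    // True if the line is a single tree with balanced parentheses
    // and every "(" is followed by a score 0-4 and a space.
    static boolean isWellFormed(String line) {
        String tree = line.trim();
        if (!tree.startsWith("(")) return false;
        int depth = 0;
        for (int i = 0; i < tree.length(); i++) {
            char c = tree.charAt(i);
            if (c == '(') {
                if (i + 2 >= tree.length()) return false;
                char label = tree.charAt(i + 1);
                if (label < '0' || label > '4' || tree.charAt(i + 2) != ' ') return false;
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) return false;
                // The root may only close at the very end of the line.
                if (depth == 0 && i != tree.length() - 1) return false;
            }
        }
        return depth == 0;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("(3 (2 a) (3 (3 good) (2 day)))")); // complete tree
        System.out.println(isWellFormed("(3 (2 a) (3 (3 good)"));           // cut off mid-tree
    }
}
```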

How to parse languages other than English with Stanford Parser in Java, not from the command line?

时间秒杀一切 posted on 2019-12-19 11:19:38
Question: I have been trying to use Stanford Parser in my Java program to parse some sentences in Chinese. Since I am quite new to both Java and Stanford Parser, I used 'ParseDemo.java' to practice. The code works fine with English sentences and outputs the right result. However, when I changed the model to 'chinesePCFG.ser.gz' and tried to parse some segmented Chinese sentences, things went wrong. Here's my Java code:
    class ParserDemo {
        public static void main(String[] args) {

TokensRegex rules to get correct output for Named Entities

只愿长相守 posted on 2019-12-19 10:43:14
Question: I was finally able to get my TokensRegex code to give some kind of output for named entities, but the output is not exactly what I want. I believe the rules need some tweaking. Here's the code:
    public static void main(String[] args) {
        String rulesFile = "D:\\Workspace\\resource\\NERRulesFile.rules.txt";
        String dataFile = "D:\\Workspace\\data\\GoldSetSentences.txt";
        Properties props = new Properties();
        props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
        props.setProperty("ner

NER model to recognize Indian names

主宰稳场 posted on 2019-12-19 04:25:50
Question: I am planning to use Named Entity Recognition (NER) to identify person names (most of which are Indian names) in a given text. I have already explored the CRF-based NER model from Stanford NLP; however, it is not very accurate at recognizing Indian names. Hence I decided to create my own custom NER model via supervised training. I have a fair idea of how to create my own NER model using the Stanford NER CRF, but creating a large training corpus with manual annotation is something I

Stanford Parser - Traversing the typed dependencies graph

别等时光非礼了梦想. posted on 2019-12-19 04:01:35
Question: Basically, I want to find a path between two NP tokens in the dependency graph. However, I can't seem to find a good way to do this in the Stanford Parser. Any help? Thank you very much.
Answer 1: The Stanford Parser just returns a list of dependencies between word tokens. (We do this to avoid external library dependencies.) But if you want to manipulate the dependencies, you'll almost certainly want to put them in a graph data structure. We usually use jgrapht: http://jgrapht.sourceforge.net/
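To make the "put them in a graph data structure" step concrete, here is a minimal sketch that uses a plain adjacency map and breadth-first search from the Java standard library instead of jgrapht. It assumes the dependencies have already been reduced to (governor, dependent) pairs of unique token strings such as "cat-2". The class name `DependencyPath` and the sample edges are hypothetical, and the edges are treated as undirected so that a path can pass through a shared head.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class DependencyPath {

    // Breadth-first search over undirected edges; returns the token sequence from `from` to `to`,
    // or an empty list when the two tokens are not connected.
    static List<String> shortestPath(List<String[]> edges, String from, String to) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        Map<String, String> prev = new HashMap<>(); // visited token -> token it was reached from
        Deque<String> queue = new ArrayDeque<>();
        prev.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(to)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = to; n != null; n = prev.get(n)) path.addFirst(n);
                return path;
            }
            for (String next : adj.getOrDefault(cur, Collections.emptyList())) {
                if (!prev.containsKey(next)) {
                    prev.put(next, cur);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Hypothetical (governor, dependent) pairs for a short sentence.
        List<String[]> edges = Arrays.asList(
                new String[] {"sat-3", "cat-2"},
                new String[] {"sat-3", "mat-6"},
                new String[] {"cat-2", "The-1"});
        System.out.println(shortestPath(edges, "cat-2", "mat-6"));
    }
}
```

A graph library becomes worthwhile once edge labels (the relation names) or weighted paths are needed; for an unweighted path between two tokens, breadth-first search is enough.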