stanford-nlp

What do the abbreviations in POS tagging etc mean?

时光毁灭记忆、已成空白 · Submitted on 2020-01-11 02:15:13

Question: Say I have the following Penn Treebank tree:

    (S (NP-SBJ the steel strike)
       (VP lasted
           (ADVP-TMP (ADVP much longer)
                     (SBAR than
                           (S (NP-SBJ he)
                              (VP anticipated
                                  (SBAR *?*))))))
       .)

What do abbreviations like VP and SBAR mean? Where can I find these definitions? What are these abbreviations called?

Answer 1: Those are Penn Treebank tags; for example, VP means "Verb Phrase". The full list can be found here.

Answer 2: The full list of Penn Treebank POS tags (the so-called tagset), including examples, can be found on
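For quick lookups, the labels in the question's tree can be kept in a small table. A minimal Python sketch (the descriptions follow the Penn Treebank bracketing guidelines; the dictionaries here cover only the labels appearing in the example):

```python
# Lookup tables for the Penn Treebank labels seen in the example tree.
# Phrase-level categories and functional suffixes (after the hyphen) are separate.
PENN_LABELS = {
    "S": "simple declarative clause",
    "NP": "noun phrase",
    "VP": "verb phrase",
    "ADVP": "adverb phrase",
    "SBAR": "clause introduced by a subordinating conjunction",
}

PENN_FUNCTION_TAGS = {
    "SBJ": "surface subject",
    "TMP": "temporal",
}

def describe(label):
    """Split a label like 'NP-SBJ' into its category and function tag."""
    category, _, function = label.partition("-")
    desc = PENN_LABELS.get(category, "unknown")
    if function:
        desc += " (" + PENN_FUNCTION_TAGS.get(function, "unknown") + ")"
    return desc

print(describe("VP"))        # verb phrase
print(describe("NP-SBJ"))    # noun phrase (surface subject)
```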

StanfordNLP - ArrayIndexOutOfBoundsException at TokensRegexNERAnnotator.readEntries(TokensRegexNERAnnotator.java:696)

怎甘沉沦 · Submitted on 2020-01-07 06:17:20

Question: I want to identify the following as SKILL using Stanford NLP's TokensRegexNERAnnotator:

    AREAS OF EXPERTISE
    Areas of Knowledge
    Computer Skills
    Technical Experience
    Technical Skills

There are many more sequences of text like the above. Code:

    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.addAnnotator(new TokensRegexNERAnnotator("./mapping/test_degree.rule", true));
    String[] tests = {
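An ArrayIndexOutOfBoundsException in readEntries usually points at a malformed line in the mapping file: TokensRegexNERAnnotator expects tab-separated columns, so a line with spaces where a tab should be, or an unexpected column count, can break parsing. A hedged Python sketch that pre-checks a mapping file before handing it to the annotator (the exact allowed column range is an assumption here; check it against your CoreNLP version's documentation):

```python
def check_mapping_file(lines, min_cols=2, max_cols=4):
    """Return (line_number, line) for each entry whose tab-separated
    column count falls outside the expected range."""
    bad = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():          # skip blank lines
            continue
        cols = line.rstrip("\n").split("\t")
        if not (min_cols <= len(cols) <= max_cols):
            bad.append((i, line))
    return bad

rules = [
    "AREAS OF EXPERTISE\tSKILL\n",   # ok: two tab-separated columns
    "Technical Skills SKILL\n",      # bad: space instead of tab
]
print(check_mapping_file(rules))     # flags line 2
```

Running this over test_degree.rule narrows the exception down to the offending entry instead of a bare stack trace.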

multiple files input to stanford NER preserving naming for each output

送分小仙女 · Submitted on 2020-01-06 16:01:48

Question: I have many files (the NYTimes corpus for '05, '06, and '07), and I want to run them all through the Stanford NER. "Easy," you might think, "just follow the commands in the README doc." But if you thought that just now, you would be mistaken, because my situation is a bit more complicated. I don't want them all output into some big jumbled mess; I want to preserve the naming structure of each file. So, for example, one file is named 1822873.xml, and I processed it earlier using the following
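One way to keep one output per input is to compute each output path from the input name before invoking the tagger, rather than letting everything go to a single stream. A sketch of the bookkeeping half in Python (the NER invocation itself is left out; a flat corpus layout and an .xml glob are assumptions):

```python
from pathlib import Path

def plan_outputs(input_dir, output_dir, suffix=".ner.txt"):
    """Map each input file to an output file that keeps its base name,
    e.g. 1822873.xml -> 1822873.ner.txt."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return {p: out / (p.stem + suffix)
            for p in sorted(Path(input_dir).glob("*.xml"))}
```

You would then loop over plan_outputs(...).items() and shell out to the Stanford NER command line once per (input, output) pair, so each result file inherits its source's name.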

Using stanford parser to parse Chinese

眉间皱痕 · Submitted on 2020-01-06 08:14:21

Question: Here is my code, mostly from the demo. The program runs without errors, but the result is very wrong: it did not split the words. Thank you.

    public static void main(String[] args) {
        LexicalizedParser lp = LexicalizedParser.loadModel(
            "edu/stanford/nlp/models/lexparser/xinhuaFactored.ser.gz");
        demoAPI(lp);
    }

    public static void demoAPI(LexicalizedParser lp) {
        // This option shows loading and using an explicit tokenizer
        String sent2 = "我爱你";
        TokenizerFactory<CoreLabel> tokenizerFactory = PTBTokenizer
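The factored Chinese grammar expects word-segmented input, but PTBTokenizer does not segment Chinese, so "我爱你" reaches the parser as one token. The usual fix is to run a word segmenter first (for example the Stanford Chinese segmenter, or a pipeline configured with the Chinese properties) and feed space-separated words to the parser. A toy Python sketch of that pre-segmentation step, with a hard-coded word list standing in for a real segmenter:

```python
# Toy greedy longest-match segmenter; a real system would use the Stanford
# Chinese segmenter instead of this hand-written word list.
WORDS = {"我", "爱", "你", "爱你"}

def segment(text):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest match first
            if text[i:j] in WORDS:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # unknown character as its own token
            i += 1
    return tokens

print(" ".join(segment("我爱你")))  # 我 爱你
```

Once the sentence arrives as space-separated words, the parser's whitespace tokenizer can treat each word as one token.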

Stanford CoreNLP very slow

感情迁移 · Submitted on 2020-01-06 07:23:34

Question: I am doing an NLP project on Windows, and the problem is that whenever I run Stanford CoreNLP from my command prompt, it takes about 14-15 seconds to generate the XML output for a given input text file. I think the issue is that the library takes quite a lot of time to load. Can somebody please explain what the problem is and how I can resolve it? This delay is a big issue for my project.

Answer 1: Stanford CoreNLP uses large model files of parameters for its various components.
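Most of those 14-15 seconds are JVM startup and model loading, which are paid once per invocation. Two common mitigations are to process many files in a single invocation (the CoreNLP command line accepts a -filelist argument naming one input file per line) or to keep a long-lived server process and send documents to it. A sketch of building such a file list for batch mode (the directory layout and .txt glob are illustrative):

```python
from pathlib import Path

def write_filelist(input_dir, listfile, pattern="*.txt"):
    """Write one absolute input path per line, the layout -filelist expects."""
    paths = sorted(Path(input_dir).glob(pattern))
    Path(listfile).write_text(
        "\n".join(str(p.resolve()) for p in paths) + "\n")
    return len(paths)
```

With the list written, a single run along the lines of java ... edu.stanford.nlp.pipeline.StanfordCoreNLP -filelist files.list loads the models once for the whole batch instead of once per file.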

Execution time of Stanford CoreNLP on other languages

会有一股神秘感。 · Submitted on 2020-01-06 05:49:08

Question: I need to extract sentences, tokens, POS tags, and lemmas from the English and German text of a large corpus, so I used the Stanford CoreNLP tool. Its output is perfect; however, the problem is the running time. The English model executes quickly, but the German model takes a long time to annotate the text. I initialize the models with this code:

    // To initialize the English model
    propsEN = new Properties();
    propsEN.setProperty("annotators", "tokenize, ssplit, pos, lemma");
    propsEN.setProperty(
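Whatever the per-sentence cost of the German models, it is worth making sure each pipeline is constructed exactly once and reused across all documents, since model loading dwarfs annotation time for short texts. A language-keyed cache guarantees that; sketched here in Python with a stub loader (the real loader would construct a StanfordCoreNLP pipeline from the language-specific properties shown above):

```python
from functools import lru_cache

LOAD_COUNT = {"en": 0, "de": 0}   # instrumentation for the demo only

@lru_cache(maxsize=None)
def get_pipeline(lang):
    """Stub standing in for building a StanfordCoreNLP pipeline with
    language-specific properties. Cached so each model set loads once."""
    LOAD_COUNT[lang] += 1
    return ("pipeline", lang)

for _ in range(3):
    get_pipeline("de")
print(LOAD_COUNT["de"])  # 1 -- the German models were loaded only once
```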

Stanford CoreNLP pipeline coref: parsing some short strings (with few mentions) returns indexoutofbounds exception

倾然丶 夕夏残阳落幕 · Submitted on 2020-01-06 01:54:39

Question: BACKGROUND: I'm importing the Stanford CoreNLP library into my Clojure project. I was using version 3.5.1 but recently jumped directly to version 3.6.0, bypassing 3.5.2. As part of this update, because I was getting coreference information using the dcoref annotator, I needed to make small modifications so that my program used the coref annotator instead. In the past (v3.5.1), when I created a pipeline with the following annotators: "tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref
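Until the underlying exception is fixed upstream, a pragmatic workaround is to guard the coref call: skip degenerate, very short inputs and catch the out-of-bounds error so one bad string does not kill the batch. A sketch of such a guard in Python (annotate here is a stub; in the real program it would be the Clojure/Java call into the pipeline, and the two-token threshold is an assumption):

```python
def safe_coref(annotate, text, min_tokens=2):
    """Run the coref-bearing annotate function, returning None instead of
    propagating index errors on degenerate (very short) inputs."""
    if len(text.split()) < min_tokens:
        return None
    try:
        return annotate(text)
    except IndexError:   # stands in for Java's ArrayIndexOutOfBoundsException
        return None

def fake_annotate(text):
    """Test double that blows up on a trigger word, like the buggy parse."""
    if "boom" in text:
        raise IndexError
    return {"coref": []}

print(safe_coref(fake_annotate, "hello there"))  # {'coref': []}
print(safe_coref(fake_annotate, "boom today"))   # None
```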

Force Stanford CoreNLP Parser to Prioritize 'S' Label at Root Level

旧时模样 · Submitted on 2020-01-06 01:32:23

Question: Greetings, NLP experts. I am using the Stanford CoreNLP software package to produce constituency parses, using the most recent version (3.9.2) of the English language models JAR, downloaded from the CoreNLP download page. I access the parser via the Python interface in the NLTK module nltk.parse.corenlp. Here is a snippet from the top of my main module:

    import nltk
    from nltk.tree import ParentedTree
    from nltk.parse.corenlp import CoreNLPParser

    parser = CoreNLPParser(url='http://localhost
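There is no parser switch that simply forces S at the root, but if you request k-best parses you can post-filter them, preferring the highest-ranked candidate whose root's first child is labeled S and falling back to the best parse otherwise. A sketch of that selection step over bracketed strings (the candidate parses below are made up for illustration; in practice they would come from the parser's k-best list):

```python
def prefer_s_root(parses):
    """Given parses ordered best-first as bracketed strings, return the
    first one rooted in an S clause, else fall back to the best parse."""
    for p in parses:
        if p.replace(" ", "").startswith("(ROOT(S("):
            return p
    return parses[0]

candidates = [
    "(ROOT (FRAG (NP (NN example))))",
    "(ROOT (S (NP (PRP It)) (VP (VBZ works))))",
]
print(prefer_s_root(candidates))  # the S-rooted second candidate
```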

How do we run the Stanford Classifier on an array of Strings?

北慕城南 · Submitted on 2020-01-05 13:12:50

Question: I have an array of strings:

    String strarr[] = { "What a wonderful day", "beautiful beds", "food was awesome" };

I also have a trained dataset:

    Room    What a beautiful room
    Room    Wonderful sea-view
    Room    beds are comfortable
    Room    bed-spreads are good
    Food    The dinner was marvellous
    Food    Tasty foods
    Service people are rude
    Service waitors were not on time
    Service service was horrible

Programmatically, I am unable to get the scores and labels of the strings I want to classify. If, however, I am using
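The Stanford Classifier's column-data workflow reads tab-separated files rather than in-memory arrays, so one simple route is to write the strings to a temporary test file in the same column layout as the training data (with a dummy label in the first column) and classify that file. The file-building half, sketched in Python; the "?" placeholder label and the label-then-text column order are assumptions matching the training set shown above:

```python
def to_classifier_tsv(strings, dummy_label="?"):
    """Render an array of strings as tab-separated 'label<TAB>text' lines,
    mirroring the layout of the training file."""
    return "\n".join(f"{dummy_label}\t{s}" for s in strings) + "\n"

strarr = ["What a wonderful day", "beautiful beds", "food was awesome"]
print(to_classifier_tsv(strarr), end="")
```

Writing this string to, say, test.tsv and pointing the classifier's test-file property at it yields per-line labels and scores without changing the Java training setup.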