stanford-nlp

What do the abbreviations in POS tagging etc mean?

时光毁灭记忆、已成空白 · Submitted on 2020-01-11 02:15:13

Question: Say I have the following Penn Treebank tree:

    (S (NP-SBJ the steel strike)
       (VP lasted
           (ADVP-TMP (ADVP much longer)
                     (SBAR than
                           (S (NP-SBJ he)
                              (VP anticipated
                                  (SBAR *?*))))))
       .)

What do abbreviations like VP and SBAR mean? Where can I find these definitions? What are these abbreviations called?

Answer 1: Those are Penn Treebank tags; for example, VP means "Verb Phrase". The full list can be found here.

Answer 2: The full list of Penn Treebank POS tags (the so-called tagset), including examples, can be found on
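For quick lookups, the labels in the question's tree can be kept in a small table. A minimal Python sketch (the descriptions follow the Penn Treebank bracketing guidelines; the dictionaries here cover only the labels appearing in the example):

```python
# Lookup tables for the Penn Treebank labels seen in the example tree.
# Phrase-level categories and functional suffixes (after the hyphen) are separate.
PENN_LABELS = {
    "S": "simple declarative clause",
    "NP": "noun phrase",
    "VP": "verb phrase",
    "ADVP": "adverb phrase",
    "SBAR": "clause introduced by a subordinating conjunction",
}

PENN_FUNCTION_TAGS = {
    "SBJ": "surface subject",
    "TMP": "temporal",
}

def describe(label):
    """Split a label like 'NP-SBJ' into its category and function tag."""
    category, _, function = label.partition("-")
    desc = PENN_LABELS.get(category, "unknown")
    if function:
        desc += " (" + PENN_FUNCTION_TAGS.get(function, "unknown") + ")"
    return desc

print(describe("VP"))        # verb phrase
print(describe("NP-SBJ"))    # noun phrase (surface subject)
```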

StanfordNLP - ArrayIndexOutOfBoundsException at TokensRegexNERAnnotator.readEntries(TokensRegexNERAnnotator.java:696)

怎甘沉沦 · Submitted on 2020-01-07 06:17:20

Question: I want to identify the following as SKILL using Stanford NLP's TokensRegexNERAnnotator:

    AREAS OF EXPERTISE
    Areas of Knowledge
    Computer Skills
    Technical Experience
    Technical Skills

There are many more sequences of text like the above. Code:

    Properties props = new Properties();
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    pipeline.addAnnotator(new TokensRegexNERAnnotator("./mapping/test_degree.rule", true));
    String[] tests = {
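An ArrayIndexOutOfBoundsException in readEntries usually points at a malformed line in the mapping file: TokensRegexNERAnnotator expects tab-separated columns, so a line with spaces where a tab should be, or an unexpected column count, can break parsing. A hedged Python sketch that pre-checks a mapping file before handing it to the annotator (the exact allowed column range is an assumption here; check it against your CoreNLP version's documentation):

```python
def check_mapping_file(lines, min_cols=2, max_cols=4):
    """Return (line_number, line) for each entry whose tab-separated
    column count falls outside the expected range."""
    bad = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():          # skip blank lines
            continue
        cols = line.rstrip("\n").split("\t")
        if not (min_cols <= len(cols) <= max_cols):
            bad.append((i, line))
    return bad

rules = [
    "AREAS OF EXPERTISE\tSKILL\n",   # ok: two tab-separated columns
    "Technical Skills SKILL\n",      # bad: space instead of tab
]
print(check_mapping_file(rules))     # flags line 2
```

Running this over test_degree.rule narrows the exception down to the offending entry instead of a bare stack trace.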

multiple files input to stanford NER preserving naming for each output

送分小仙女 · Submitted on 2020-01-06 16:01:48

Question: I have many files (the NYTimes corpus for '05, '06, and '07), and I want to run them all through the Stanford NER. "Easy," you might think, "just follow the commands in the README doc." But if you thought that just now, you would be mistaken, because my situation is a bit more complicated. I don't want them all output into some big jumbled mess; I want to preserve the naming structure of each file. So, for example, one file is named 1822873.xml, and I processed it earlier using the following
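One way to keep one output per input is to compute each output path from the input name before invoking the tagger, rather than letting everything go to a single stream. A sketch of the bookkeeping half in Python (the NER invocation itself is left out; a flat corpus layout and an .xml glob are assumptions):

```python
from pathlib import Path

def plan_outputs(input_dir, output_dir, suffix=".ner.txt"):
    """Map each input file to an output file that keeps its base name,
    e.g. 1822873.xml -> 1822873.ner.txt."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return {p: out / (p.stem + suffix)
            for p in sorted(Path(input_dir).glob("*.xml"))}
```

You would then loop over plan_outputs(...).items() and shell out to the Stanford NER command line once per (input, output) pair, so each result file inherits its source's name.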

Using stanford parser to parse Chinese

眉间皱痕 · Submitted on 2020-01-06 08:14:21

Question: Here is my code, mostly from the demo. The program runs without errors, but the result is very wrong: it did not split the words. Thank you.

    public static void main(String[] args) {
        LexicalizedParser lp = LexicalizedParser.loadModel(
            "edu/stanford/nlp/models/lexparser/xinhuaFactored.ser.gz");
        demoAPI(lp);
    }

    public static void demoAPI(LexicalizedParser lp) {
        // This option shows loading and using an explicit tokenizer
        String sent2 = "我爱你";
        TokenizerFactory<CoreLabel> tokenizerFactory = PTBTokenizer
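The factored Chinese grammar expects word-segmented input, but PTBTokenizer does not segment Chinese, so "我爱你" reaches the parser as one token. The usual fix is to run a word segmenter first (for example the Stanford Chinese segmenter, or a pipeline configured with the Chinese properties) and feed space-separated words to the parser. A toy Python sketch of that pre-segmentation step, with a hard-coded word list standing in for a real segmenter:

```python
# Toy greedy longest-match segmenter; a real system would use the Stanford
# Chinese segmenter instead of this hand-written word list.
WORDS = {"我", "爱", "你", "爱你"}

def segment(text):
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest match first
            if text[i:j] in WORDS:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # unknown character as its own token
            i += 1
    return tokens

print(" ".join(segment("我爱你")))  # 我 爱你
```

Once the sentence arrives as space-separated words, the parser's whitespace tokenizer can treat each word as one token.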

Stanford CoreNLP very slow

感情迁移 · Submitted on 2020-01-06 07:23:34

Question: I am doing an NLP project on Windows, and the problem is that whenever I run Stanford CoreNLP from my command prompt, it takes about 14-15 seconds to generate the XML output for a given input text file. I think the issue is that the library takes quite a lot of time to load. Can somebody please explain what the problem is and how I can resolve it? This delay is a big issue for my project.

Answer 1: Stanford CoreNLP uses large model files of parameters for its various components.
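Most of those 14-15 seconds are JVM startup and model loading, which are paid once per invocation. Two common mitigations are to process many files in a single invocation (the CoreNLP command line accepts a -filelist argument naming one input file per line) or to keep a long-lived server process and send documents to it. A sketch of building such a file list for batch mode (the directory layout and .txt glob are illustrative):

```python
from pathlib import Path

def write_filelist(input_dir, listfile, pattern="*.txt"):
    """Write one absolute input path per line, the layout -filelist expects."""
    paths = sorted(Path(input_dir).glob(pattern))
    Path(listfile).write_text(
        "\n".join(str(p.resolve()) for p in paths) + "\n")
    return len(paths)
```

With the list written, a single run along the lines of java ... edu.stanford.nlp.pipeline.StanfordCoreNLP -filelist files.list loads the models once for the whole batch instead of once per file.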

Execution time of Stanford CoreNLP on other languages

会有一股神秘感。 · Submitted on 2020-01-06 05:49:08

Question: I need to extract sentences, tokens, POS tags, and lemmas from the English and German text of a large corpus, so I used the Stanford CoreNLP tool. Its output is perfect; however, the problem is the running time. The English model executes quickly, but the German model takes a long time to annotate the text. I initialize the models with this code:

    // To initialize the English model
    propsEN = new Properties();
    propsEN.setProperty("annotators", "tokenize, ssplit, pos, lemma");
    propsEN.setProperty(
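Whatever the per-sentence cost of the German models, it is worth making sure each pipeline is constructed exactly once and reused across all documents, since model loading dwarfs annotation time for short texts. A language-keyed cache guarantees that; sketched here in Python with a stub loader (the real loader would construct a StanfordCoreNLP pipeline from the language-specific properties shown above):

```python
from functools import lru_cache

LOAD_COUNT = {"en": 0, "de": 0}   # instrumentation for the demo only

@lru_cache(maxsize=None)
def get_pipeline(lang):
    """Stub standing in for building a StanfordCoreNLP pipeline with
    language-specific properties. Cached so each model set loads once."""
    LOAD_COUNT[lang] += 1
    return ("pipeline", lang)

for _ in range(3):
    get_pipeline("de")
print(LOAD_COUNT["de"])  # 1 -- the German models were loaded only once
```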

Stanford CoreNLP pipeline coref: parsing some short strings (with few mentions) returns indexoutofbounds exception

倾然丶 夕夏残阳落幕 · Submitted on 2020-01-06 01:54:39

Question: BACKGROUND: I'm importing the Stanford CoreNLP library into my Clojure project. I was using version 3.5.1 but recently jumped directly to version 3.6.0, bypassing 3.5.2. As part of this update, because I was getting coreference information using the dcoref annotator, I needed to make small modifications so that my program used the coref annotator instead. In the past (v3.5.1), when I created a pipeline with the following annotators: "tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref
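Until the underlying exception is fixed upstream, a pragmatic workaround is to guard the coref call: skip degenerate, very short inputs and catch the out-of-bounds error so one bad string does not kill the batch. A sketch of such a guard in Python (annotate here is a stub; in the real program it would be the Clojure/Java call into the pipeline, and the two-token threshold is an assumption):

```python
def safe_coref(annotate, text, min_tokens=2):
    """Run the coref-bearing annotate function, returning None instead of
    propagating index errors on degenerate (very short) inputs."""
    if len(text.split()) < min_tokens:
        return None
    try:
        return annotate(text)
    except IndexError:   # stands in for Java's ArrayIndexOutOfBoundsException
        return None

def fake_annotate(text):
    """Test double that blows up on a trigger word, like the buggy parse."""
    if "boom" in text:
        raise IndexError
    return {"coref": []}

print(safe_coref(fake_annotate, "hello there"))  # {'coref': []}
print(safe_coref(fake_annotate, "boom today"))   # None
```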

Force Stanford CoreNLP Parser to Prioritize 'S' Label at Root Level

旧时模样 · Submitted on 2020-01-06 01:32:23

Question: Greetings, NLP experts. I am using the Stanford CoreNLP software package to produce constituency parses, using the most recent version (3.9.2) of the English language models JAR, downloaded from the CoreNLP download page. I access the parser via the Python interface in the NLTK module nltk.parse.corenlp. Here is a snippet from the top of my main module:

    import nltk
    from nltk.tree import ParentedTree
    from nltk.parse.corenlp import CoreNLPParser

    parser = CoreNLPParser(url='http://localhost
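There is no parser switch that simply forces S at the root, but if you request k-best parses you can post-filter them, preferring the highest-ranked candidate whose root's first child is labeled S and falling back to the best parse otherwise. A sketch of that selection step over bracketed strings (the candidate parses below are made up for illustration; in practice they would come from the parser's k-best list):

```python
def prefer_s_root(parses):
    """Given parses ordered best-first as bracketed strings, return the
    first one rooted in an S clause, else fall back to the best parse."""
    for p in parses:
        if p.replace(" ", "").startswith("(ROOT(S("):
            return p
    return parses[0]

candidates = [
    "(ROOT (FRAG (NP (NN example))))",
    "(ROOT (S (NP (PRP It)) (VP (VBZ works))))",
]
print(prefer_s_root(candidates))  # the S-rooted second candidate
```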

How do we run the Stanford Classifier on an array of Strings?

北慕城南 · Submitted on 2020-01-05 13:12:50

Question: I have an array of strings:

    String strarr[] = { "What a wonderful day", "beautiful beds", "food was awesome" };

I also have a trained dataset:

    Room    What a beautiful room
    Room    Wonderful sea-view
    Room    beds are comfortable
    Room    bed-spreads are good
    Food    The dinner was marvellous
    Food    Tasty foods
    Service people are rude
    Service waitors were not on time
    Service service was horrible

Programmatically, I am unable to get the scores and labels of the strings I want to classify. If, however, I am using
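The Stanford Classifier's column-data workflow reads tab-separated files rather than in-memory arrays, so one simple route is to write the strings to a temporary test file in the same column layout as the training data (with a dummy label in the first column) and classify that file. The file-building half, sketched in Python; the "?" placeholder label and the label-then-text column order are assumptions matching the training set shown above:

```python
def to_classifier_tsv(strings, dummy_label="?"):
    """Render an array of strings as tab-separated 'label<TAB>text' lines,
    mirroring the layout of the training file."""
    return "\n".join(f"{dummy_label}\t{s}" for s in strings) + "\n"

strarr = ["What a wonderful day", "beautiful beds", "food was awesome"]
print(to_classifier_tsv(strarr), end="")
```

Writing this string to, say, test.tsv and pointing the classifier's test-file property at it yields per-line labels and scores without changing the Java training setup.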