stanford-nlp

CWB encoding Corpus

Submitted by 霸气de小男生 on 2020-01-14 03:48:18
Question: According to the Corpus Workbench documentation, to encode a corpus I need to use the cwb-encode tool: "encode the corpus, i.e. convert the verticalized text to CWB binary format with the cwb-encode tool. Note that the command below has to be entered on a single line." http://cogsci.uni-osnabrueck.de/~korpora/ws/CWBdoc/CWB_Encoding_Tutorial/node3.html $ cwb-encode -d /corpora/data/example -f example.vrt -R /usr/local/share/cwb/registry/example -P pos -S s When I tried it, it said the file was missing.
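A common cause of a "file is missing" error here is that the data directory and the registry directory do not exist yet: cwb-encode will not create them for you. A minimal sketch, assuming throwaway paths under /tmp rather than the tutorial's system paths (adjust to your installation; the guard lets the sketch run even where CWB is not installed):

```shell
# cwb-encode does not create its output directories, so make them first.
mkdir -p /tmp/cwb-demo/data /tmp/cwb-demo/registry

# The tutorial's command, pointed at the directories above.
if command -v cwb-encode >/dev/null 2>&1; then
  cwb-encode -d /tmp/cwb-demo/data -f example.vrt \
             -R /tmp/cwb-demo/registry/example -P pos -S s
else
  echo "cwb-encode not installed; directories prepared at /tmp/cwb-demo"
fi
```

Also double-check that example.vrt is in the directory you run the command from, or pass its full path to -f.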

Does Stanford Core NLP support lemmatization for German?

Submitted by 為{幸葍}努か on 2020-01-14 00:35:54
Question: I found German parse and POS-tag models which are compatible with Stanford CoreNLP. However, I was not able to get German lemmatization working. Is there a way to do so? Answer 1: Since version 3.6, German is also supported. Check it under http://stanfordnlp.github.io/CoreNLP/history.html Answer 2: Sorry, as far as I know, no implementation of German lemmatization exists for Stanford CoreNLP. Source: https://stackoverflow.com/questions/29861925/does-stanford-core-nlp-support-lemmatization-for-german

Neural Network Stanford parser word2vector format error during training

Submitted by 情到浓时终转凉″ on 2020-01-13 05:32:30
Question: I am trying to train a model with the Stanford neural-network dependency parser for English. It does not accept a standard word2vec file with 100 dimensions and generates an error message. I am using the word embeddings from this web page: https://drive.google.com/file/d/0B8nESzOdPhLsdWF2S1Ayb1RkTXc/view?usp=sharing I downloaded the data as a text file on my PC. I am using the parameter -embeddingSize 100, but the parser generates an error message: Embedding File /../.../sskip
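Since the error message is truncated above, one thing worth ruling out is a dimensionality mismatch between the file and -embeddingSize. A hypothetical helper (not part of CoreNLP) that checks a word2vec-style text file, assuming one word per line followed by its values:

```python
# Hypothetical checker: verify every row of a word2vec-style text file has
# the same number of values, matching the dimension passed to -embeddingSize.
def check_embedding_file(path, expected_dim=100):
    bad = []  # (line number, actual dimension) for offending rows
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            parts = line.rstrip("\n").split()
            if not parts:
                continue
            # Some word2vec files start with a "<vocab_size> <dim>" header.
            if lineno == 1 and len(parts) == 2 and parts[1].isdigit():
                continue
            if len(parts) - 1 != expected_dim:  # first field is the word
                bad.append((lineno, len(parts) - 1))
    return bad
```

If this returns a non-empty list, fix those rows (or the -embeddingSize value) before retraining.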

Issue using Stanford CoreNLP parsing models

Submitted by ⅰ亾dé卋堺 on 2020-01-13 05:21:43
Question: I cannot find the Stanford parsing models for German and French: there is no "germanPCFG.ser.gz" or "frenchFactored.ser.gz" in the jar (stanford-corenlp-3.2.0-models.jar), only English. I have searched through the POS tagger jar too. The same issue is described at: How to use Stanford CoreNLP with a Non-English parse model? Answer 1: You can find them in the download for the Stanford Parser. Look in the models.jar file. Answer 2: With Maven you can use <dependency> <groupId>edu.stanford.nlp</groupId>
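The Maven snippet in Answer 2 is cut off above. A sketch of what such a dependency block looks like for this version, assuming the standard stanford-corenlp coordinates with the separate models classifier (note this classifier ships the English models; per Answer 1, the German and French parser models come from the Stanford Parser download instead):

```xml
<!-- Sketch, assuming Maven Central coordinates for CoreNLP 3.2.0. -->
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.2.0</version>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.2.0</version>
  <classifier>models</classifier>
</dependency>
```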

How to shutdown Stanford CoreNLP Redwood logging?

Submitted by 末鹿安然 on 2020-01-12 07:17:16
Question: How can I shut down the Stanford CoreNLP messages (see end of post)? I first tried setting log4j.category.edu.stanford=OFF in log4j.properties, but that didn't help, so I found out that it apparently uses a nonstandard logging framework called "Redwood". According to http://nlp.stanford.edu/nlp/javadoc/javanlp/ there is documentation, but it is password-protected. I tried RedwoodConfiguration.empty().apply(); but that doesn't help either. The logging messages: Adding annotator tokenize Adding

Coreference resolution using Stanford CoreNLP

Submitted by 筅森魡賤 on 2020-01-11 11:26:07
Question: I am new to the Stanford CoreNLP toolkit and am trying to use it in a project to resolve coreferences in news texts. To use the Stanford CoreNLP coreference system, we would usually create a pipeline, which requires tokenization, sentence splitting, part-of-speech tagging, lemmatization, named entity recognition, and parsing. For example: Properties props = new Properties(); props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref"); StanfordCoreNLP pipeline =

Date Extraction from Text

Submitted by Deadly on 2020-01-11 09:21:09
Question: I am trying to use the Stanford NLP tool to extract dates (8/11/2012) from text. Here's a link to the demo of this tool. Can you help me with how to train the classifier to identify dates (8/11/2012)? I tried using training data such as Woodhouse PERS 8/18/2012 Date , O handsome O but it does not work even for the same test data. Answer 1: Using the NLP tool to extract dates from text seems like overkill if this is all you are trying to accomplish. You should consider other options like a simple Java regular
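The regex route the answer suggests can be sketched in a few lines. Shown here in Python for brevity; the same pattern works unchanged with java.util.regex:

```python
import re

# Numeric dates like 8/11/2012 need no NER training at all:
# 1-2 digit month, 1-2 digit day, 4 digit year, word boundaries on each side.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def extract_dates(text):
    """Return every M/D/YYYY-shaped substring in the text."""
    return [m.group(0) for m in DATE_RE.finditer(text)]
```

For free-form dates ("August 11th, 2012", "next Tuesday"), CoreNLP's SUTime annotator is the better fit than training a custom NER classifier.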

Entities on my gazette are not recognized

Submitted by 大兔子大兔子 on 2020-01-11 09:05:10
Question: I would like to create a custom NER model. Here is what I did. TRAINING DATA (stanford-ner.tsv): Hello O ! O My O name O is O Damiano PERSON . O PROPERTIES (stanford-ner.prop): trainFile = stanford-ner.tsv serializeTo = ner-model.ser.gz map = word=0,answer=1 maxLeft=1 useClassFeature=true useWord=true useNGrams=true noMidNGrams=true maxNGramLeng=6 usePrev=true useNext=true useDisjunctive=true useSequences=true usePrevSequences=true useTypeSeqs=true useTypeSeqs2=true useTypeySequences=true
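One thing to note: the prop file enables many features, but at least in the portion visible above, none of the gazette flags. Gazette entries only become features when gazette support is switched on. A hedged sketch of the additions, using flag names from Stanford NER's NERFeatureFactory (the file name and entry are examples):

```properties
# Additions to stanford-ner.prop so the gazette is actually consulted.
useGazettes = true
gazette = gazette.txt     # one entry per line, e.g. "PERSON Damiano Rossi"
cleanGazette = true       # fire only when the entire entry matches
# sloppyGazette = true    # alternative: fire on any token of an entry
```

Also remember that a gazette is a feature, not a lookup table: entries still need supporting evidence in the training data to be tagged reliably.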

Stanford typed dependencies using coreNLP in python

Submitted by [亡魂溺海] on 2020-01-11 03:25:27
Question: In the Stanford Dependency Manual they mention "Stanford typed dependencies", and in particular the type "neg", the negation modifier. It is also available when using the Stanford enhanced++ parser on the website. For example, for the sentence "Barack Obama was not born in Hawaii", the parser indeed finds neg(born, not), but when I use the stanfordnlp python library, the only dependency parser I can get parses the sentence as follows: ('Barack', '5', 'nsubj:pass') ('Obama', '1', 'flat') ('was', '5',
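The two outputs differ because the website shows Stanford Dependencies, which has a dedicated neg relation, while the stanfordnlp python library emits Universal Dependencies, where negation typically surfaces as an advmod edge from "not" to the verb. A plain-Python sketch (no CoreNLP required) that recovers negation pairs from (word, head_index, deprel) triples like those in the question — the full triple list below is an assumption extrapolated from the question's partial output:

```python
# Recover SD-style neg(head, dependent) pairs from UD triples, where
# negation shows up as an advmod edge whose dependent is a negator word.
NEGATORS = {"not", "n't", "never", "no"}

def find_negations(triples):
    """triples: list of (word, head_index_as_str, deprel), 1-based heads.
    Returns (dependent, head_word) pairs that look like negation edges."""
    words = [w for w, _, _ in triples]
    hits = []
    for word, head, deprel in triples:
        if deprel == "advmod" and word.lower() in NEGATORS:
            head_word = words[int(head) - 1] if int(head) > 0 else "ROOT"
            hits.append((word, head_word))
    return hits
```

Alternatively, running the CoreNLP server with its dependency annotators exposes the Stanford Dependencies representations directly, at the cost of a heavier setup.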