stanford-nlp

CWB encoding Corpus

Submitted by 霸气de小男生 on 2020-01-14 03:48:18
Question: According to the Corpus Workbench documentation, to encode a corpus I need to use the cwb-encode tool: "encode the corpus, i.e. convert the verticalized text to CWB binary format with the cwb-encode tool. Note that the command below has to be entered on a single line." http://cogsci.uni-osnabrueck.de/~korpora/ws/CWBdoc/CWB_Encoding_Tutorial/node3.html $ cwb-encode -d /corpora/data/example -f example.vrt -R /usr/local/share/cwb/registry/example -P pos -S s When I tried it, it said the file was missing.
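A common cause of a "file is missing" error here is that the data directory and the registry directory do not exist yet: cwb-encode will not create them for you. A minimal sketch, assuming throwaway paths under /tmp rather than the tutorial's system paths (adjust to your installation; the guard lets the sketch run even where CWB is not installed):

```shell
# cwb-encode does not create its output directories, so make them first.
mkdir -p /tmp/cwb-demo/data /tmp/cwb-demo/registry

# The tutorial's command, pointed at the directories above.
if command -v cwb-encode >/dev/null 2>&1; then
  cwb-encode -d /tmp/cwb-demo/data -f example.vrt \
             -R /tmp/cwb-demo/registry/example -P pos -S s
else
  echo "cwb-encode not installed; directories prepared at /tmp/cwb-demo"
fi
```

Also double-check that example.vrt is in the directory you run the command from, or pass its full path to -f.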

Does Stanford Core NLP support lemmatization for German?

Submitted by 為{幸葍}努か on 2020-01-14 00:35:54
Question: I found German parse and POS-tag models which are compatible with Stanford CoreNLP. However, I was not able to get German lemmatization working. Is there a way to do so? Answer 1: Since version 3.6, German is also supported. Check it under http://stanfordnlp.github.io/CoreNLP/history.html Answer 2: Sorry, as far as I know, no implementation of German lemmatization exists for Stanford CoreNLP. Source: https://stackoverflow.com/questions/29861925/does-stanford-core-nlp-support-lemmatization-for-german

Neural Network Stanford parser word2vector format error during training

Submitted by 情到浓时终转凉″ on 2020-01-13 05:32:30
Question: I am trying to train a model with the Stanford neural-network dependency parser for English. It does not accept a standard word2vec file with 100 dimensions and generates an error message. I am using the word embeddings from this web page: https://drive.google.com/file/d/0B8nESzOdPhLsdWF2S1Ayb1RkTXc/view?usp=sharing I downloaded the data as a text file on my PC. I am using the parameter -embeddingSize 100, but the parser generates an error message: Embedding File /../.../sskip
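Since the error message is truncated above, one thing worth ruling out is a dimensionality mismatch between the file and -embeddingSize. A hypothetical helper (not part of CoreNLP) that checks a word2vec-style text file, assuming one word per line followed by its values:

```python
# Hypothetical checker: verify every row of a word2vec-style text file has
# the same number of values, matching the dimension passed to -embeddingSize.
def check_embedding_file(path, expected_dim=100):
    bad = []  # (line number, actual dimension) for offending rows
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            parts = line.rstrip("\n").split()
            if not parts:
                continue
            # Some word2vec files start with a "<vocab_size> <dim>" header.
            if lineno == 1 and len(parts) == 2 and parts[1].isdigit():
                continue
            if len(parts) - 1 != expected_dim:  # first field is the word
                bad.append((lineno, len(parts) - 1))
    return bad
```

If this returns a non-empty list, fix those rows (or the -embeddingSize value) before retraining.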

Issue using Stanford CoreNLP parsing models

Submitted by ⅰ亾dé卋堺 on 2020-01-13 05:21:43
Question: I cannot find the Stanford parsing models for German and French: there is no "germanPCFG.ser.gz" or "frenchFactored.ser.gz" in the jar (stanford-corenlp-3.2.0-models.jar), only English. I have searched through the POS tagger jar too. The same issue is described at: How to use Stanford CoreNLP with a Non-English parse model? Answer 1: You can find them in the download for the Stanford Parser. Look in the models.jar file. Answer 2: With Maven you can use <dependency> <groupId>edu.stanford.nlp</groupId>
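The Maven snippet in Answer 2 is cut off above. A sketch of what such a dependency block looks like for this version, assuming the standard stanford-corenlp coordinates with the separate models classifier (note this classifier ships the English models; per Answer 1, the German and French parser models come from the Stanford Parser download instead):

```xml
<!-- Sketch, assuming Maven Central coordinates for CoreNLP 3.2.0. -->
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.2.0</version>
</dependency>
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.2.0</version>
  <classifier>models</classifier>
</dependency>
```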

How to shutdown Stanford CoreNLP Redwood logging?

Submitted by 末鹿安然 on 2020-01-12 07:17:16
Question: How can I shut down the Stanford CoreNLP messages (see end of post)? I first tried setting log4j.category.edu.stanford=OFF in log4j.properties, but that didn't help, so I found out that it apparently uses a nonstandard logging framework called "Redwood". According to http://nlp.stanford.edu/nlp/javadoc/javanlp/ there is documentation, but it is password-protected. I tried RedwoodConfiguration.empty().apply(); but that doesn't help either. The logging messages: Adding annotator tokenize Adding

Coreference resolution using Stanford CoreNLP

Submitted by 筅森魡賤 on 2020-01-11 11:26:07
Question: I am new to the Stanford CoreNLP toolkit and am trying to use it in a project to resolve coreferences in news texts. To use the Stanford CoreNLP coreference system, we would usually create a pipeline, which requires tokenization, sentence splitting, part-of-speech tagging, lemmatization, named entity recognition, and parsing. For example: Properties props = new Properties(); props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref"); StanfordCoreNLP pipeline =

Date Extraction from Text

Submitted by Deadly on 2020-01-11 09:21:09
Question: I am trying to use the Stanford NLP tool to extract dates (8/11/2012) from text. Here's a link to the demo of this tool. Can you help me with how to train the classifier to identify dates (8/11/2012)? I tried using training data such as Woodhouse PERS 8/18/2012 Date , O handsome O but it does not work even for the same test data. Answer 1: Using the NLP tool to extract dates from text seems like overkill if this is all you are trying to accomplish. You should consider other options like a simple Java regular
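The regex route the answer suggests can be sketched in a few lines. Shown here in Python for brevity; the same pattern works unchanged with java.util.regex:

```python
import re

# Numeric dates like 8/11/2012 need no NER training at all:
# 1-2 digit month, 1-2 digit day, 4 digit year, word boundaries on each side.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def extract_dates(text):
    """Return every M/D/YYYY-shaped substring in the text."""
    return [m.group(0) for m in DATE_RE.finditer(text)]
```

For free-form dates ("August 11th, 2012", "next Tuesday"), CoreNLP's SUTime annotator is the better fit than training a custom NER classifier.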

Entities on my gazette are not recognized

Submitted by 大兔子大兔子 on 2020-01-11 09:05:10
Question: I would like to create a custom NER model. Here is what I did. TRAINING DATA (stanford-ner.tsv): Hello O ! O My O name O is O Damiano PERSON . O PROPERTIES (stanford-ner.prop): trainFile = stanford-ner.tsv serializeTo = ner-model.ser.gz map = word=0,answer=1 maxLeft=1 useClassFeature=true useWord=true useNGrams=true noMidNGrams=true maxNGramLeng=6 usePrev=true useNext=true useDisjunctive=true useSequences=true usePrevSequences=true useTypeSeqs=true useTypeSeqs2=true useTypeySequences=true
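One thing to note: the prop file enables many features, but at least in the portion visible above, none of the gazette flags. Gazette entries only become features when gazette support is switched on. A hedged sketch of the additions, using flag names from Stanford NER's NERFeatureFactory (the file name and entry are examples):

```properties
# Additions to stanford-ner.prop so the gazette is actually consulted.
useGazettes = true
gazette = gazette.txt     # one entry per line, e.g. "PERSON Damiano Rossi"
cleanGazette = true       # fire only when the entire entry matches
# sloppyGazette = true    # alternative: fire on any token of an entry
```

Also remember that a gazette is a feature, not a lookup table: entries still need supporting evidence in the training data to be tagged reliably.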

Stanford typed dependencies using coreNLP in python

Submitted by [亡魂溺海] on 2020-01-11 03:25:27
Question: In the Stanford Dependency Manual they mention "Stanford typed dependencies", and in particular the type "neg", the negation modifier. It is also available when using the Stanford enhanced++ parser on the website. For example, for the sentence "Barack Obama was not born in Hawaii", the parser indeed finds neg(born, not), but when I use the stanfordnlp python library, the only dependency parser I can get parses the sentence as follows: ('Barack', '5', 'nsubj:pass') ('Obama', '1', 'flat') ('was', '5',
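The two outputs differ because the website shows Stanford Dependencies, which has a dedicated neg relation, while the stanfordnlp python library emits Universal Dependencies, where negation typically surfaces as an advmod edge from "not" to the verb. A plain-Python sketch (no CoreNLP required) that recovers negation pairs from (word, head_index, deprel) triples like those in the question — the full triple list below is an assumption extrapolated from the question's partial output:

```python
# Recover SD-style neg(head, dependent) pairs from UD triples, where
# negation shows up as an advmod edge whose dependent is a negator word.
NEGATORS = {"not", "n't", "never", "no"}

def find_negations(triples):
    """triples: list of (word, head_index_as_str, deprel), 1-based heads.
    Returns (dependent, head_word) pairs that look like negation edges."""
    words = [w for w, _, _ in triples]
    hits = []
    for word, head, deprel in triples:
        if deprel == "advmod" and word.lower() in NEGATORS:
            head_word = words[int(head) - 1] if int(head) > 0 else "ROOT"
            hits.append((word, head_word))
    return hits
```

Alternatively, running the CoreNLP server with its dependency annotators exposes the Stanford Dependencies representations directly, at the cost of a heavier setup.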