Questions about creating Stanford CoreNLP training models

Posted by 狂风中的少年 on 2019-12-19 11:33:33

Question


I've been using Stanford's CoreNLP to perform sentiment analysis on some data I have, and I'm now trying to train my own model. I know we can train a model with the following command:

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz

I know what goes in the train.txt file. You score sentences and put them in train.txt, something like this: (0 (2 Today) (0 (0 (2 is) (0 (2 a) (0 (0 bad) (2 day)))) (..)))

But I don't understand what goes in the dev.txt file. I read through this question several times to try to understand what goes in dev.txt, but it's still unclear to me. Also, scoring these sentences manually has become a pain. Is there a tool available that makes it easier? I'm worried that I've been using the wrong number of parentheses or making some other silly mistake like that.
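For the parenthesis worry, one option is a small sanity check run over each line of train.txt before training. The following is only a sketch and not part of CoreNLP: the class name `TreeLineCheck` and its `check` method are hypothetical, and it assumes one tree per line, with every node starting with a single-digit label from 0 to 4 followed by a space. It checks bracket structure only; it does not verify that the tree is binarized.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class TreeLineCheck {

    /**
     * Returns null if the line looks like one well-formed labeled tree,
     * otherwise a short description of the first problem found.
     * Assumption: labels are the single digits 0-4.
     */
    public static String check(String rawLine) {
        String line = rawLine.trim();
        if (!line.startsWith("(")) {
            return "line does not start with '('";
        }
        int depth = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '(') {
                depth++;
                // Every node must open with a label digit and a space, e.g. "(2 ".
                boolean hasLabel = i + 2 < line.length()
                        && line.charAt(i + 1) >= '0' && line.charAt(i + 1) <= '4'
                        && line.charAt(i + 2) == ' ';
                if (!hasLabel) {
                    return "missing label 0-4 after '(' at column " + i;
                }
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return "unmatched ')' at column " + i;
                }
                if (depth == 0 && i != line.length() - 1) {
                    return "extra text after the root node closes at column " + i;
                }
            }
        }
        return depth == 0 ? null : depth + " unclosed '('";
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TreeLineCheck train.txt");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int n = 0; n < lines.size(); n++) {
            if (lines.get(n).trim().isEmpty()) {
                continue; // skip blank lines
            }
            String problem = check(lines.get(n));
            if (problem != null) {
                System.out.println("line " + (n + 1) + ": " + problem);
            }
        }
    }
}
```

Running it as `java TreeLineCheck train.txt` prints one message per suspicious line and nothing for a clean file. A placeholder such as `(..)` is reported as a missing label, which is the intended behavior.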

Also, any suggestions on how long my train.txt file should be? I'm thinking of scoring 1,000 sentences. Is that number too small or too large?

All your help is appreciated :)


Answer 1:


  1. dev.txt should have the same format as train.txt, just with a different set of sentences. Note that the same sentence should not appear in both dev.txt and train.txt. The development set is used to evaluate the quality of the model you train on the training data.

  2. We don't distribute a tool for tagging sentiment data. This class could be helpful in building data: http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/sentiment/BuildBinarizedDataset.html

  3. Here are the sizes of the train, dev, and test sets used for the sentiment model: train=8544, dev=1101, test=2210
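As a rough illustration of points 1 and 3, here is a minimal sketch of splitting one file of labeled trees into disjoint train and dev sets. It is not part of CoreNLP: the class `TrainDevSplit`, the input file name, and the default dev fraction of 0.1 are assumptions chosen for illustration (the sizes above put dev at roughly 11% of train plus dev). Duplicates are detected only as identical lines, so the same sentence annotated with different labels would not be caught.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

public class TrainDevSplit {

    /**
     * Removes exact duplicate lines, shuffles with a fixed seed, and returns
     * a two-element list: index 0 is the training set, index 1 is the dev set.
     */
    public static List<List<String>> split(List<String> lines, double devFraction, long seed) {
        // LinkedHashSet drops exact duplicates, so no line can land in both sets.
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(lines));
        Collections.shuffle(unique, new Random(seed));
        int devSize = (int) Math.round(unique.size() * devFraction);
        List<String> dev = new ArrayList<>(unique.subList(0, devSize));
        List<String> train = new ArrayList<>(unique.subList(devSize, unique.size()));
        return Arrays.asList(train, dev);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java TrainDevSplit all-trees.txt [devFraction]");
            return;
        }
        double devFraction = args.length > 1 ? Double.parseDouble(args[1]) : 0.1;
        List<String> lines = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            if (!line.trim().isEmpty()) {
                lines.add(line.trim()); // one labeled tree per non-blank line
            }
        }
        List<List<String>> parts = split(lines, devFraction, 42L);
        Files.write(Paths.get("train.txt"), parts.get(0));
        Files.write(Paths.get("dev.txt"), parts.get(1));
    }
}
```

The fixed seed makes the split reproducible, so retraining with different settings is always evaluated against the same dev.txt.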




Answer 2:


Here is some sample code for evaluating a model:

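import java.util.List;
import edu.stanford.nlp.sentiment.Evaluate;
import edu.stanford.nlp.sentiment.SentimentModel;
import edu.stanford.nlp.sentiment.SentimentUtils;
import edu.stanford.nlp.trees.Tree;

// load a model (modelPath is a String, e.g. the file passed to -model during training)
SentimentModel model = SentimentModel.loadSerialized(modelPath);

// load the gold-labeled dev trees (devPath is a String pointing at dev.txt)
List<Tree> devTrees = SentimentUtils.readTreesWithGoldLabels(devPath);

// evaluate on devTrees and print a summary of the results
Evaluate eval = new Evaluate(model);
eval.eval(devTrees);
eval.printSummary();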

You can find what you need to import, and how these calls are used, by looking at:

edu/stanford/nlp/sentiment/SentimentTraining.java



Source: https://stackoverflow.com/questions/33712795/questions-about-creating-stanford-corenlp-training-models
