Questions about creating stanford CoreNLP training models

问题

I've been working with Stanford's coreNLP to perform sentiment analysis on some data I have and I'm working on creating a training model. I know we can create a training model with the following command:

java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath     dev.txt -train -model model.ser.gz

I know what goes in the train.txt file. You score sentences and put them in train.txt, something like this: (0 (2 Today) (0 (0 (2 is) (0 (2 a) (0 (0 bad) (2 day)))) (..)))

But I don't understand what goes in the dev.txt file. I read through this question several times to try to understand what goes in dev.txt, but it's still unclear to me. Also, scoring these sentences manually has become a pain, is there a tool available that makes it easier? I'm worried that I've been using the wrong number of parentheses or some other stupid mistake like that.

Also, any suggestions on how long my train.txt file should be? I'm thinking of scoring a 1000 sentences. Is that number too small, too large?

All your help is appreciated :)

回答1:

dev.txt should be the same as train.txt just with a different set of sentences. Note that the same sentence should not appear in dev.txt and train.txt. The development set is used to evaluate the quality of the model you train on the training data.
We don't distribute a tool for tagging sentiment data. This class could be helpful in building data: http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/sentiment/BuildBinarizedDataset.html
Here are the sizes of the train, dev, and test sets used for the sentiment model: train=8544, dev=1101, test=2210

回答2:

Here is some sample code for evaluating a model

// load a model
SentimentModel model = SentimentModel.loadSerialized(modelPath);

// load devTrees
List<Tree> devTrees;
devTrees = SentimentUtils.readTreesWithGoldLabels(devPath);

// evaluate on devTrees
Evaluate eval = new Evaluate(model);
eval.eval(devTrees);
eval.printSummary();

You can find what you need to import, etc... by looking at:

edu/stanford/nlp/sentiment/SentimentTraining.java

来源：https://stackoverflow.com/questions/33712795/questions-about-creating-stanford-corenlp-training-models

标签

stanford-nlp

sentiment-analysis

training-data

scoring