Stanford NER: AbstractSequenceClassifier vs NamedEntityTagAnnotation

Question

QUESTIONS

1. How do I load a custom properties file using AbstractSequenceClassifier? For example, a tab-separated file containing:

    Master's Degree\tDEGREE
    MBA\tDEGREE

2. What are the benefits and drawbacks of each approach (AbstractSequenceClassifier vs. NamedEntityTagAnnotation)?
3. Is there any accessible documentation or tutorial on the internet? I can play with demo code and read the Javadocs, but a good tutorial would save me and many others a lot of time.

While reading the Stanford NER documentation, I came across two Java examples.

NamedEntityTagAnnotation

The first uses NamedEntityTagAnnotation. This allows me to add my own properties file for training data (using regexner.mapping).

The key code is as follows.

Initialize the pipeline:

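    import java.util.Properties;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    Properties props = new Properties();
    // regexner runs after ner so that it can add labels from the mapping file
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner, depparse, natlog, openie");
    props.setProperty("regexner.mapping", "mypath/mytraineddatacodes.properties");

    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);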
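For reference, the file that regexner.mapping points to is plain tab-separated text, with a token pattern in the first column and the entity label in the second. The sketch below writes the two example entries from the question to such a file using only the Java standard library. RegexNerMappingWriter and its methods are hypothetical helpers, not part of CoreNLP, and the temp-file name is only illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a CoreNLP class): writes a tab-separated mapping file. */
class RegexNerMappingWriter {

    /** Turns phrase -> label pairs into "pattern<TAB>label" lines, keeping map order. */
    static List<String> toLines(Map<String, String> entries) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : entries.entrySet()) {
            lines.add(entry.getKey() + "\t" + entry.getValue());
        }
        return lines;
    }

    /** Writes the lines as UTF-8 text, one entry per line, and returns the target path. */
    static Path write(Path target, Map<String, String> entries) throws IOException {
        return Files.write(target, toLines(entries), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("Master's Degree", "DEGREE"); // entries taken from the question
        entries.put("MBA", "DEGREE");
        Path file = write(Files.createTempFile("mytraineddatacodes", ".properties"), entries);
        System.out.println("Wrote " + file);
    }
}
```

As far as I understand, the first column is matched token by token, so a phrase such as Master's Degree may need to be written the way the tokenizer splits it (for example Master 's Degree). That is an assumption worth checking against the RegexNER documentation.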

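Initialize the document:

    import edu.stanford.nlp.pipeline.Annotation;

    // pass4 is the String holding the text to analyze
    Annotation document = new Annotation(pass4);
    pipeline.annotate(document);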

Then access the NER tokens and any other data needed:

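    import java.util.List;
    import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
    import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
    import edu.stanford.nlp.ling.CoreLabel;
    import edu.stanford.nlp.util.CoreMap;

    List<CoreMap> sentences = document.get(SentencesAnnotation.class);

    for (CoreMap sentence : sentences)
    {
        for (CoreLabel token : sentence.get(TokensAnnotation.class))
        {
            // NER label assigned to this token
            String currNeToken = token.get(NamedEntityTagAnnotation.class);
            // Text of the token itself
            String word = token.get(TextAnnotation.class);
        }
    }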
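Because the loop above yields one label per token, a multi-word entity arrives as several consecutive tokens carrying the same label. The sketch below shows one simple way to merge such runs into whole entities. It works on plain lists of strings rather than CoreLabel objects so that it runs without CoreNLP. EntityGrouper is a hypothetical helper, the sample sentence and labels are made up, and it assumes that tokens outside any entity are labelled "O".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: merges consecutive tokens that share a non-"O" label. */
class EntityGrouper {

    /** words.get(i) carries tags.get(i); returns entries such as "DEGREE: MBA". */
    static List<String> group(List<String> words, List<String> tags) {
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // Label changed: close the entity collected so far, if there is one.
                if (!currentTag.equals("O")) {
                    entities.add(currentTag + ": " + current);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words.get(i));
            }
        }
        // Close an entity that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            entities.add(currentTag + ": " + current);
        }
        return entities;
    }

    public static void main(String[] args) {
        List<String> words = List.of("She", "holds", "an", "MBA", "from", "Stanford", "University");
        List<String> tags = List.of("O", "O", "O", "DEGREE", "O", "ORGANIZATION", "ORGANIZATION");
        // prints [DEGREE: MBA, ORGANIZATION: Stanford University]
        System.out.println(group(words, tags));
    }
}
```

One limitation of this approach: two different entities of the same type that sit directly next to each other are merged into a single entry.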

AbstractSequenceClassifier

This is the method demonstrated in the Stanford NERDemo.java example. It seems to provide much deeper access to the API, but I don't know how to load my customized properties file of trained data.

Initialize the classifier (which bypasses the pipeline):

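    import edu.stanford.nlp.ie.AbstractSequenceClassifier;
    import edu.stanford.nlp.ie.crf.CRFClassifier;

    String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
    AbstractSequenceClassifier<CoreLabel> classifier =
        CRFClassifier.getClassifierNoExceptions(serializedClassifier);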

Load the file to analyze:

    // p is the java.nio.file.Path of the file to analyze; the file is assumed to be UTF-8
    String fileContents = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
    List<List<CoreLabel>> out = classifier.classify(fileContents);
    for (List<CoreLabel> sentence : out)
    {
        for (CoreLabel word : sentence)
        {
            // Print each token as word/LABEL, as NERDemo.java does
            System.out.print(word.word() + '/' + word.get(AnswerAnnotation.class) + ' ');
        }
        // One output line per sentence
        System.out.println();
    }

And you're off to the races, except that it hasn't loaded my custom properties file of trained data.
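To make the gap concrete, here is a rough sketch of the behaviour I am after: a phrase table applied on top of the labels the classifier produced. This is a hypothetical post-processing step written in plain Java, not a CoreNLP API and not how RegexNER itself is implemented. PhraseOverlay, parse and apply are made-up names, matching is exact and case-sensitive, and phrases are assumed to be written as space-separated tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical post-processing step: relabels tokens from a phrase table. */
class PhraseOverlay {

    /** Parses "phrase<TAB>label" lines; lines with fewer than two columns are skipped. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> phrases = new LinkedHashMap<>();
        for (String line : lines) {
            String[] columns = line.split("\t");
            if (columns.length >= 2) {
                phrases.put(columns[0], columns[1]);
            }
        }
        return phrases;
    }

    /**
     * Returns a copy of tags in which every token run that equals a phrase and is
     * still labelled "O" receives that phrase's label. Existing labels are kept.
     */
    static List<String> apply(List<String> words, List<String> tags, Map<String, String> phrases) {
        List<String> result = new ArrayList<>(tags);
        for (Map.Entry<String, String> entry : phrases.entrySet()) {
            List<String> phrase = Arrays.asList(entry.getKey().split(" "));
            for (int start = 0; start + phrase.size() <= words.size(); start++) {
                int end = start + phrase.size();
                boolean wordsMatch = words.subList(start, end).equals(phrase);
                boolean untagged = result.subList(start, end).stream().allMatch("O"::equals);
                if (wordsMatch && untagged) {
                    for (int i = start; i < end; i++) {
                        result.set(i, entry.getValue());
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> phrases = parse(List.of("MBA\tDEGREE"));
        List<String> words = List.of("She", "holds", "an", "MBA");
        List<String> tags = List.of("O", "O", "O", "O");
        // prints [O, O, O, DEGREE]
        System.out.println(apply(words, tags, phrases));
    }
}
```

In real use the words and tags would come from word.word() and word.get(AnswerAnnotation.class) in the loop above, and the lines passed to parse would be read from the custom file.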

Source: https://stackoverflow.com/questions/37803277/stanford-ner-abstractsequenceclassifier-vs-namedentitytagannotation

Tags

java

nlp

stanford-nlp