Stanford NER: AbstractSequenceClassifier vs NamedEntityTagAnnotation

Submitted by 我的未来我决定 on 2019-12-24 00:49:55

Question


QUESTIONS

  1. How do I load a custom properties file using AbstractSequenceClassifier? For example, a tab-separated file containing:

    Master's Degree\tDEGREE

    MBA\tDEGREE

  2. What are the benefits and drawbacks of each approach (AbstractSequenceClassifier vs NamedEntityTagAnnotation)?

  3. Is there any accessible documentation or tutorial on the internet? I can play with demo code and read the Javadocs, but a good tutorial would save me and many others a lot of time.

During my perusal of the Stanford NER documentation, I have encountered two Java examples.

NamedEntityTagAnnotation

The first runs the CoreNLP pipeline and reads each token's NamedEntityTagAnnotation. This allows me to add my own properties file of custom entity labels (using regexner.mapping).

The key code is as follows. Initialize the pipeline:

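   // Requires java.util.Properties and edu.stanford.nlp.pipeline.StanfordCoreNLP
   Properties props = new Properties();
   // regexner is listed after ner and reads the file named by regexner.mapping
   props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner, depparse, natlog, openie");
   props.setProperty("regexner.mapping", "mypath/mytraineddatacodes.properties");

   StanfordCoreNLP pipeline = new StanfordCoreNLP(props);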
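For reference, the two entries from the question are plain text lines of the form phrase, tab, label. Below is a minimal sketch of writing and re-reading such a file using only the JDK. The class MappingFileSketch and its two methods are hypothetical helpers for illustration, not part of CoreNLP, and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of CoreNLP: writes and reads "phrase<TAB>LABEL" lines.
class MappingFileSketch {

    // Write one "phrase<TAB>LABEL" line per entry, in insertion order (UTF-8 assumed).
    static void writeMapping(Path file, Map<String, String> entries) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Read the file back, skipping blank lines and lines that have no tab.
    static Map<String, String> readMapping(Path file) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String[] cols = line.split("\t");
            if (cols.length >= 2 && !cols[0].trim().isEmpty()) {
                entries.put(cols[0].trim(), cols[1].trim());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("Master's Degree", "DEGREE");
        entries.put("MBA", "DEGREE");

        Path file = Files.createTempFile("regexner-mapping", ".txt");
        writeMapping(file, entries);
        System.out.println(readMapping(file));
        Files.delete(file);
    }
}
```

The path of a file written this way is what would go into the regexner.mapping property shown above.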

Initialize document:

   Annotation document = new Annotation(pass4);
   pipeline.annotate(document);

Then access the NER tokens and any other data needed:

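   // The annotation keys below are nested classes of edu.stanford.nlp.ling.CoreAnnotations
   List<CoreMap> sentences = document.get(SentencesAnnotation.class);

   for (CoreMap sentence : sentences)
   {
      for (CoreLabel token : sentence.get(TokensAnnotation.class))
      {
         // NER label for this token, e.g. PERSON, O, or a custom label such as DEGREE
         String currNeToken = token.get(NamedEntityTagAnnotation.class);
         String word = token.get(TextAnnotation.class);
      }
   }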
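Because the loop above yields one label per token, a multi-word entry such as "Master's Degree" comes back as several tokens carrying the same label. The following is a rough sketch of stitching adjacent tokens with the same label back into one entity. Plain string lists stand in for the CoreLabel tokens and their NER tags, EntitySpanSketch and groupEntities are hypothetical names, and the real tokenizer may split words differently than this hand-written sample does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: plain strings stand in for CoreLabel tokens and their NER tags.
class EntitySpanSketch {

    // Join runs of adjacent tokens that share the same non-"O" tag into "text/TAG" strings.
    static List<String> groupEntities(List<String> words, List<String> tags) {
        List<String> spans = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // The tag changed: close the open span, if there is one.
                if (!currentTag.equals("O")) {
                    spans.add(current + "/" + currentTag);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words.get(i));
            }
        }
        // Close a span that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            spans.add(current + "/" + currentTag);
        }
        return spans;
    }

    public static void main(String[] args) {
        // Hand-written sample; tokens and tags are illustrative, not real pipeline output.
        List<String> words = Arrays.asList("She", "holds", "an", "MBA", "and", "a", "Master's", "Degree");
        List<String> tags = Arrays.asList("O", "O", "O", "DEGREE", "O", "O", "DEGREE", "DEGREE");
        System.out.println(groupEntities(words, tags)); // prints [MBA/DEGREE, Master's Degree/DEGREE]
    }
}
```

Note that this simple grouping merges two different entities of the same type if they sit directly next to each other.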

AbstractSequenceClassifier

This is the method demonstrated in the Stanford NERDemo.java example. It seems to provide much deeper access to the API, but I don't know how to load my customized properties file of trained data.

Initialize the classifier (which bypasses the pipeline):

   String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";

   AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(serializedClassifier);

Load the file to analyze:

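   // p is the java.nio.file.Path of the file to analyze
   byte[] encoded = Files.readAllBytes(p);
   // Decodes with the platform default charset; pass a Charset explicitly if the file's encoding is known
   String fileContents = new String(encoded);
   List<List<CoreLabel>> out = classifier.classify(fileContents);
   for (List<CoreLabel> sentence : out)
   {
      for (CoreLabel word : sentence)
      {
         // AnswerAnnotation holds the label the classifier assigned to this token
         Log.getLogger().debug(word.word() + '/' + word.get(AnswerAnnotation.class) + ' ');
      }
      System.out.println();
   }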

And you're off to the races, except it hasn't loaded my custom properties file for trained data.
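To make the gap concrete: the classifier returns only the labels its serialized model knows, so the custom entries would have to be applied in a separate step. One conceivable workaround, offered purely as a sketch and not as a CoreNLP feature, is to post-process the classifier output with a phrase lookup. Plain string lists stand in for the classifier's tokens and labels, PhraseOverlaySketch and overlay are hypothetical names, and matching here is exact and case-sensitive on space-separated tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: relabels tokens that match a phrase from a custom mapping.
class PhraseOverlaySketch {

    // Return a copy of tags in which every exact occurrence of a mapping phrase gets the mapped label.
    static List<String> overlay(List<String> words, List<String> tags, Map<String, String> mapping) {
        List<String> result = new ArrayList<>(tags);
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            // Assumption: a phrase splits into tokens on single spaces.
            List<String> phrase = Arrays.asList(e.getKey().split(" "));
            for (int start = 0; start + phrase.size() <= words.size(); start++) {
                if (words.subList(start, start + phrase.size()).equals(phrase)) {
                    for (int i = start; i < start + phrase.size(); i++) {
                        result.set(i, e.getValue());
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample standing in for classifier output.
        List<String> words = Arrays.asList("He", "earned", "an", "MBA", "at", "Stanford");
        List<String> tags = Arrays.asList("O", "O", "O", "O", "O", "ORGANIZATION");

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("Master's Degree", "DEGREE");
        mapping.put("MBA", "DEGREE");

        System.out.println(overlay(words, tags, mapping)); // prints [O, O, O, DEGREE, O, ORGANIZATION]
    }
}
```

In this sketch a mapping entry always overrides the model's label, and later entries override earlier ones where phrases overlap; whether that is the right precedence depends on the data.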

Source: https://stackoverflow.com/questions/37803277/stanford-ner-abstractsequenceclassifier-vs-namedentitytagannotation
