Stanford NER: Can I use two classifiers at once in my code?

Submitted by 穿精又带淫゛_ on 2019-12-04 15:44:50

Question


In my code, the first classifier recognizes PERSON entities. The second classifier, which I trained myself, has some extra words that should be annotated as ORGANIZATION, but it does not annotate PERSON.

I need the benefit of both classifiers. How can I do that?

I'm using NetBeans, and this is the code:

String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
String serializedClassifier2 = "/Users/ha/stanford-ner-2014-10-26/classifiers/dept-model.ser.gz";

if (args.length > 0) {
  serializedClassifier = args[0];
}

AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(serializedClassifier);
AbstractSequenceClassifier<CoreLabel> classifier2 = CRFClassifier.getClassifier(serializedClassifier2);

String fileContents = IOUtils.slurpFile("/Users/ha/NetBeansProjects/NERtry/src/nertry/input.txt");
List<List<CoreLabel>> out = classifier.classify(fileContents);
List<List<CoreLabel>> out2 = classifier2.classify(fileContents);

// Print the tags assigned by the first (stock 3-class) classifier.
for (List<CoreLabel> sentence : out) {
  System.out.print("\nenglish.all.3class.distsim.crf.ser.gz: ");
  for (CoreLabel word : sentence) {
    System.out.print(word.word() + '/' + word.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
  }
}

// Print the tags assigned by the second (custom) classifier.
for (List<CoreLabel> sentence2 : out2) {
  System.out.print("\ndept-model.ser.gz");
  for (CoreLabel word2 : sentence2) {
    System.out.print(word2.word() + '/' + word2.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
  }
  System.out.println();
}

The problem comes from the result I get:

english.all.3class.distsim.crf.ser.gz: What/O date/O did/O James/PERSON started/O his/O job/O in/O Human/O and/O Finance/O ?/O 
dept-model.ser.gzWhat/O date/O did/O James/ORGANIZATION started/O his/O job/O in/O Human/ORGANIZATION and/O Finance/ORGANIZATION ?/O 

The second classifier tags the name as ORGANIZATION, but I need it to be annotated as PERSON. Any help?


Answer 1:


The class you should use to make this easy is NERClassifierCombiner. Its semantics is that it runs the classifiers in order from left to right as you specify them (any number can be given to it in the constructor), and that later classifiers cannot annotate an entity that overlaps with an entity tagging of an earlier classifier, but are otherwise free to add annotations. So, earlier classifiers are preferred in a simple preference ranking. I give a complete code example below.

(If you are training all your own classifiers, it is generally best to train all the entities together, so they can influence each other in the categories assigned. But this simple preference ordering usually works pretty well, and we use it ourselves.)

import edu.stanford.nlp.ie.NERClassifierCombiner;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreLabel;

import java.io.IOException;
import java.util.List;

public class MultipleNERs {

  public static void main(String[] args) throws IOException {
    String serializedClassifier = "classifiers/english.all.3class.distsim.crf.ser.gz";
    String serializedClassifier2 = "classifiers/english.muc.7class.distsim.crf.ser.gz";

    if (args.length > 0) {
      serializedClassifier = args[0];
    }

    NERClassifierCombiner classifier = new NERClassifierCombiner(false, false, 
            serializedClassifier, serializedClassifier2);

    String fileContents = IOUtils.slurpFile("input.txt");
    List<List<CoreLabel>> out = classifier.classify(fileContents);

    int i = 0;
    for (List<CoreLabel> lcl : out) {
      i++;
      int j = 0;
      for (CoreLabel cl : lcl) {
        j++;
        System.out.printf("%d:%d: %s%n", i, j,
                cl.toShorterString("Text", "CharacterOffsetBegin", "CharacterOffsetEnd", "NamedEntityTag"));
      }
    }
  }

}
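
To make the preference ordering described above more concrete, here is a minimal sketch in plain Java that does not use Stanford NER at all. It is only an illustration of the stated rule (a later classifier may add an entity only where it does not overlap an entity from an earlier classifier), not the library's actual implementation, which may differ in its details. The class name PreferenceMerge, the merge method, and the treatment of a contiguous run of identical non-O labels as one entity are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PreferenceMerge {

  // Background label used by Stanford NER for "not an entity".
  static final String O = "O";

  /**
   * Hypothetical helper: merges two token-aligned label sequences.
   * Labels from the primary sequence always win. An entity from the
   * secondary sequence (assumed here to be a contiguous run of the same
   * non-O label) is copied only if none of its tokens is already labeled
   * by the primary sequence.
   */
  public static List<String> merge(List<String> primary, List<String> secondary) {
    if (primary.size() != secondary.size()) {
      throw new IllegalArgumentException("label sequences must be token-aligned");
    }
    List<String> merged = new ArrayList<>(primary);
    int i = 0;
    while (i < secondary.size()) {
      String label = secondary.get(i);
      if (label.equals(O)) {
        i++;
        continue;
      }
      // Find the end (exclusive) of this secondary entity.
      int end = i;
      while (end < secondary.size() && secondary.get(end).equals(label)) {
        end++;
      }
      // Reject the whole entity if any of its tokens overlaps a primary entity.
      boolean overlaps = false;
      for (int k = i; k < end; k++) {
        if (!primary.get(k).equals(O)) {
          overlaps = true;
          break;
        }
      }
      if (!overlaps) {
        for (int k = i; k < end; k++) {
          merged.set(k, label);
        }
      }
      i = end;
    }
    return merged;
  }

  public static void main(String[] args) {
    // Tokens and labels copied from the output shown in the question.
    List<String> tokens = Arrays.asList("What", "date", "did", "James", "started",
        "his", "job", "in", "Human", "and", "Finance", "?");
    List<String> first = Arrays.asList("O", "O", "O", "PERSON", "O",
        "O", "O", "O", "O", "O", "O", "O");
    List<String> second = Arrays.asList("O", "O", "O", "ORGANIZATION", "O",
        "O", "O", "O", "ORGANIZATION", "O", "ORGANIZATION", "O");

    List<String> merged = merge(first, second);
    for (int i = 0; i < tokens.size(); i++) {
      System.out.print(tokens.get(i) + '/' + merged.get(i) + ' ');
    }
    System.out.println();
  }
}
```

With the labels from the question, James keeps its PERSON tag from the first sequence, while Human and Finance pick up ORGANIZATION from the second. To combine more than two sequences under the same assumption, the merged result can be passed back in as the primary argument for the next one.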



Answer 2:


I'm not quite sure what the question here is. You already have the output of two classifiers. Perhaps this is more of a Java question, i.e. how you can iterate over both sentences at the same time:

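// out and out2 are the results of classifier.classify(...) and classifier2.classify(...)
Iterator<List<CoreLabel>> it1 = out.iterator();
Iterator<List<CoreLabel>> it2 = out2.iterator();
while (it1.hasNext() && it2.hasNext()) {
   List<CoreLabel> sentence1 = it1.next();
   List<CoreLabel> sentence2 = it2.next();  // take the matching sentence from the second classifier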
   Iterator<CoreLabel> sentence1It = sentence1.iterator();
   Iterator<CoreLabel> sentence2It = sentence2.iterator();
   while(sentence1It.hasNext() && sentence2It.hasNext()) {
       CoreLabel word1 = sentence1It.next();
       CoreLabel word2 = sentence2It.next();
       System.out.print("\nenglish.all.3class.distsim.crf.ser.gz: ");
       System.out.print(word1.word() + '/' +
         word1.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
       System.out.print("\ndept-model.ser.gz");
       System.out.print(word2.word() + '/' + 
         word2.get(CoreAnnotations.AnswerAnnotation.class) + ' ');
   }
   System.out.println();
}


Source: https://stackoverflow.com/questions/27257554/stanford-ner-can-i-use-two-classifiers-at-once-in-my-code
