ConllReader (Like RothCONLL04Reader) throws exception while reading relation training data with custom NER and custom relation

我只是一个虾纸丫 submitted on 2019-12-25 14:38:27

Question


This is a follow-up to an earlier question: How to generate custom training data for Stanford relation extraction

Thanks to StanfordNLPHelp, I am able to generate relation data with a custom NER model and, on top of it, RegexNER.

I had to run my custom model last, because otherwise many ORGANIZATION, PERSON, etc. entities are misclassified.
Example custom NER classes:

"DEGREE", "DESG"

Example of relation training data.

0   ELECTEDBODY 0   O   NNP/IN/NNP  BOARD/OF/DIRECTORS  O   O   O
0   ORGANIZATION    1   O   NNP Board   O   O   O
0   O   2   O   NNS committees  O   O   O
0   O   3   O   JJ  key O   O   O
0   ORGANIZATION    4   O   NN/NN/NN/NN/NNP/NN  N/Nomination/committee/A/Audit/committee    O   O   O
0   O   5   O   NN  R   O   O   O
0   MISC    6   O   NN  Remuneration    O   O   O
0   O   7   O   NN  committee   O   O   O
0   O   8   O   NNP EFFECTIVE   O   O   O
0   O   9   O   NNP LEADERSHIP  O   O   O
0   O   10  O   CC  AND O   O   O
0   O   11  O   JJ  STRONG  O   O   O
0   O   12  O   NN  GOVERNANCE  O   O   O
0   O   13  O   NNP George  O   O   O
0   O   14  O   NNP Weston  O   O   O
0   DESG    15  O   NNP/NNP Chief/Executive O   O   O
0   O   16  O   -LRB-   -LRB-   O   O   O
0   O   17  O   NN  age O   O   O
0   NUMBER  18  O   CD  52  O   O   O
0   O   19  O   -RRB-   -RRB-   O   O   O
0   PERSON  20  O   NNP George  O   O   O
0   O   21  O   VBD was O   O   O
0   O   22  O   VBN appointed   O   O   O
0   O   23  O   TO  to  O   O   O
0   O   24  O   DT  the O   O   O
0   ELECTEDBODY 25  O   NN  board   O   O   O
0   DATE    26  O   IN/CD   in/1999 O   O   O
0   O   27  O   CC  and O   O   O
0   O   28  O   VBD took    O   O   O
0   O   29  O   RP  up  O   O   O
0   O   30  O   PRP$    his O   O   O
0   O   31  O   JJ  current O   O   O
0   O   32  O   NN  appointment O   O   O
0   O   33  O   IN  as  O   O   O
0   DESG    34  O   NNP/NNP Chief/Executive O   O   O
0   O   35  O   IN  in  O   O   O
0   DATE    36  O   NNP/CD  April/2005  O   O   O
0   O   37  O   .   .   O   O   O

20  34  cur_desg 
20  36  cur_desg_from
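To make the layout of the sample easier to follow, here is a minimal sketch that classifies a line by its column count. It assumes what the sample itself suggests: a token line has 9 whitespace-separated columns (sentence number, NER tag, token index, POS tag in the fifth column, word in the sixth), and a relation line has 3 (two token indices and the relation name). The class `ConllLineSketch` and its methods are hypothetical names for illustration and are not part of CoreNLP.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ConllLineSketch {
    /** Splits a line on runs of whitespace; a blank line yields an empty list. */
    static List<String> pieces(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Describes a line by its column count: 9 = token, 3 = relation, 0 = blank. */
    static String describe(String line) {
        List<String> p = pieces(line);
        switch (p.size()) {
            case 0:
                return "blank";
            case 3:
                // Assumed layout: first token index, second token index, relation name.
                return "relation " + p.get(2) + " args=" + p.get(0) + "," + p.get(1);
            case 9:
                // Assumed layout: sentence number, NER tag, token index, O, POS, word, O, O, O.
                return "token " + p.get(2) + " word=" + p.get(5)
                        + " pos=" + p.get(4) + " ner=" + p.get(1);
            default:
                return "unexpected column count " + p.size();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("0   DESG    34  O   NNP/NNP Chief/Executive O   O   O"));
        System.out.println(describe("20  34  cur_desg "));
    }
}
```

Under these assumptions, the relation line `20  34  cur_desg` links the PERSON at token 20 to the DESG at token 34 of the same sentence.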

I am trying to train a custom relation model and have added my custom relation classes.

For example: relation class **cur_desg** (current designation) between entities (**PERSON, DESG**)
**Here is the relevant section of my properties file to train the relation classifier.**

datasetReaderClass = com.samrat.nlp.ie.re.CustomConllReader
entityClassifier = com.samrat.nlp.ie.re.CustomConllExtractor
relationResultsPrinters = com.samrat.nlp.ie.re.RelationResultPrinter

serializedTrainingSentencesPath = custom_relation_sentences.ser
serializedEntityExtractorPath = custom_relation_model.ser
serializedRelationExtractorPath = custom-relation-model-pipeline.ser

Relevant section of the CustomConllReader code:

private String getNormalizedNERTag(String ner) {
        ......
        }  else if(ner.equalsIgnoreCase("degree")) {
            return "DEGREE";
        }
        else if(ner.equalsIgnoreCase("electedbody")) {
            return "ELECTEDBODY";
        }
...............

Problem 1: CustomConllReader throws an exception at the following line while reading the training data:

Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());

Relevant portion of CustomConllReader (it is almost the same as RothCONLL04Reader):

case 3: // relation
    System.out.println(currentLine);
    String type = pieces.get(2);
    List<ExtractionObject> args = new ArrayList<>();
    EntityMention entity1 = indexToEntityMention.get(pieces.get(0));
    EntityMention entity2 = indexToEntityMention.get(pieces.get(1));
    args.add(entity1);
    args.add(entity2);
    Span span = new Span(entity1.getExtentTokenStart(), entity2.getExtentTokenEnd());
    // identifier = "relation" + sentenceID + "-" + sentence.getAllRelations().size();
    identifier = RelationMention.makeUniqueId();
    RelationMention relationMention = new RelationMention(identifier,
            sentence, span, type, null, args);
    AnnotationUtils.addRelationMention(sentence, relationMention);
    break;

Exception

    INFO: Reading file: tagged-training-relation-data-conll04.corp
20  34  cur_desg 
20  36  cur_desg_from
0   2   cur_desg
Exception in thread "main" java.io.IOException
    at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:138)
    at com.wipro.nlp.ie.re.CustomConllReader.main(CustomConllReader.java:292)
Caused by: java.lang.NullPointerException
    at com.wipro.nlp.ie.re.CustomConllReader.readSentence(CustomConllReader.java:144)
    at com.wipro.nlp.ie.re.CustomConllReader.read(CustomConllReader.java:55)
    at edu.stanford.nlp.ie.machinereading.GenericDataSetReader.parse(GenericDataSetReader.java:136)
    ... 1 more
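A plausible reading of this trace is that one of the two `indexToEntityMention.get(...)` lookups returned `null`, so the later call on `entity1` or `entity2` failed. The sketch below shows one way to make such a failure easier to diagnose by naming the offending relation line. It is self-contained and uses plain strings in place of `EntityMention`; `EntityLookupSketch` and `requireEntity` are hypothetical names, not CoreNLP API.

```java
import java.util.HashMap;
import java.util.Map;

public class EntityLookupSketch {
    /** Returns the entity stored for a token index, or fails with a message naming the line. */
    static <T> T requireEntity(Map<String, T> indexToEntity, String tokenIndex, String currentLine) {
        T entity = indexToEntity.get(tokenIndex); // Map.get returns null for an unknown key
        if (entity == null) {
            throw new IllegalStateException("No entity at token index " + tokenIndex
                    + " for relation line: '" + currentLine.trim() + "'");
        }
        return entity;
    }

    public static void main(String[] args) {
        // Stand-in for the per-sentence map; strings replace EntityMention objects here.
        Map<String, String> indexToEntity = new HashMap<>();
        indexToEntity.put("20", "PERSON");
        indexToEntity.put("34", "DESG");

        System.out.println(requireEntity(indexToEntity, "20", "20  34  cur_desg "));
        try {
            // Token index 0 was never registered in this map, so the lookup fails loudly.
            requireEntity(indexToEntity, "0", "0   2   cur_desg");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A guard like this does not fix the training data, but it turns a bare NullPointerException into a message that points at the relation line being parsed.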

The exception is thrown on sentence 3 while parsing the relation (0 2 cur_desg):

3   PERSON  0   O   NNP/NNP John/Bason  O   O   O
3   O   1   O   NNP Finance O   O   O
3   ELECTEDBODY 2   O   NNP Director    O   O   O
3   O   3   O   -LRB-   -LRB-   O   O   O
3   O   4   O   NN  age O   O   O
3   NUMBER  5   O   CD  59  O   O   O
3   O   6   O   -RRB-   -RRB-   O   O   O
3   PERSON  7   O   NNP John    O   O   O
3   O   8   O   VBD was O   O   O
3   O   9   O   VBN appointed   O   O   O
3   O   10  O   IN  as  O   O   O
3   O   11  O   NNP Finance O   O   O
3   ELECTEDBODY 12  O   NNP Director    O   O   O
3   O   13  O   IN  in  O   O   O
3   DATE    14  O   NNP/CD  May/1999    O   O   O
3   O   15  O   .   .   O   O   O

0   2   cur_desg
0   14  cur_desg_from

This problem is solved: my training data had an extra line break in between. I am now able to build a custom relation classifier, but when I use it, it does not recognize any custom NER tags or custom relations.

I have posted that as a separate question (on making the custom relation classifier recognize custom NER tags and relations in new sentences): Custom Relation Classifier does not understand any Custom NER tags and does not find any relations


Answer 1:


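The exception was thrown because of an extra line break in the tagged training data. Each sentence block must contain exactly two line breaks, as in the example below: one blank line between the token lines and the relation lines, and one after the relation lines.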

3   PERSON  0   O   NNP/NNP John/Bason  O   O   O
3   O   1   O   NNP Finance O   O   O
3   ELECTEDBODY 2   O   NNP Director    O   O   O
3   O   3   O   -LRB-   -LRB-   O   O   O
3   O   4   O   NN  age O   O   O
3   NUMBER  5   O   CD  59  O   O   O
3   O   6   O   -RRB-   -RRB-   O   O   O
3   PERSON  7   O   NNP John    O   O   O
3   O   8   O   VBD was O   O   O
3   O   9   O   VBN appointed   O   O   O
3   O   10  O   IN  as  O   O   O
3   O   11  O   NNP Finance O   O   O
3   ELECTEDBODY 12  O   NNP Director    O   O   O
3   O   13  O   IN  in  O   O   O
3   DATE    14  O   NNP/CD  May/1999    O   O   O
3   O   15  O   .   .   O   O   O

0   2   cur_desg
0   14  cur_desg_from

5   O   0   O   PRP He  O   O   O
5   O   1   O   VBD was O   O   O
5   O   2   O   RB  previously  O   O   O
5   O   3   O   DT  the O   O   O
5   O   4   O   NN  finance O   O   O
5   DESG    5   O   NN  director    O   O   O
5   O   6   O   IN  of  O   O   O
5   ORGANIZATION    7   O   NNP Bunzl   O   O   O
5   O   8   O   NN  plc O   O   O
5   O   9   O   CC  and O   O   O
5   O   10  O   VBZ is  O   O   O
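One way to catch a stray blank line before training is to scan the file with a small check like the sketch below. It assumes the layout shown in the example (9-column token lines, a blank line, 3-column relation lines, a blank line) and reports the 1-based numbers of lines that do not fit. `BlankLineCheckSketch` and `findLayoutProblems` are hypothetical names; this is a standalone illustration, not the logic of RothCONLL04Reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlankLineCheckSketch {
    /**
     * Walks the lines with two states (reading token lines, then reading relation lines)
     * and returns the 1-based numbers of lines that do not fit the expected layout.
     */
    static List<Integer> findLayoutProblems(List<String> lines) {
        List<Integer> problems = new ArrayList<>();
        boolean inRelations = false; // false: token lines expected, true: relation lines expected
        boolean sawToken = false;    // whether the current block has at least one token line
        for (int i = 0; i < lines.size(); i++) {
            String trimmed = lines.get(i).trim();
            int columns = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            if (columns == 0) {
                if (!inRelations && !sawToken) {
                    problems.add(i + 1);  // blank line before any token line of a block
                } else if (!inRelations) {
                    inRelations = true;   // first blank line: token lines are done
                } else {
                    inRelations = false;  // second blank line: the block is done
                    sawToken = false;
                }
            } else if (columns == 9) {
                if (inRelations) {
                    problems.add(i + 1);  // token line where a relation line was expected
                } else {
                    sawToken = true;
                }
            } else if (columns == 3) {
                if (!inRelations) {
                    problems.add(i + 1);  // relation line before the separating blank line
                }
            } else {
                problems.add(i + 1);      // neither a token line nor a relation line
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "3   PERSON  0   O   NNP/NNP John/Bason  O   O   O",
                "3   ELECTEDBODY 2   O   NNP Director    O   O   O",
                "",
                "0   2   cur_desg",
                "",
                "", // extra blank line
                "5   O   0   O   PRP He  O   O   O");
        System.out.println(findLayoutProblems(lines));
    }
}
```

A sentence without relations still passes this check, because its token lines are followed by two consecutive blank lines. Trailing blank lines at the end of the file are reported as well, which is intentional in this sketch.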


Source: https://stackoverflow.com/questions/43932872/conllreader-like-rothconll04reader-throws-exception-while-reading-relation-tra
