Evaluation in a spaCy NER model

甜味超标 2020-12-24 09:09

I am trying to evaluate a trained NER model created with the spaCy library. Normally for this kind of problem you can use the F1 score (the harmonic mean of precision and recall). I coul
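For reference, the F1 score mentioned in the question is the harmonic mean of precision and recall, both computed from true-positive, false-positive and false-negative counts. The sketch below is a plain-Python illustration of the formula; the function name `precision_recall_f1` and the example counts are hypothetical, and returning 0.0 on a zero denominator is an assumed convention.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts.

    Any ratio whose denominator is zero is reported as 0.0 (assumed convention).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct entities, 2 spurious predictions, 4 missed gold entities
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # → P=0.800 R=0.667 F1=0.727
```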

1 answer
  •  伪装坚强ぢ
    2020-12-24 09:24

    You can find several metrics, including F-score, precision and recall, in spaCy/scorer.py.

    This example shows how you can use it:

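    import spacy
    from spacy.gold import GoldParse  # spaCy v2.x only; removed in spaCy v3
    from spacy.scorer import Scorer
    
    def evaluate(ner_model, examples):
        """Score ner_model against gold (start_char, end_char, label) annotations."""
        scorer = Scorer()
        for input_, annot in examples:
            # Tokenize the text without running the pipeline, then attach the gold entities
            doc_gold_text = ner_model.make_doc(input_)
            gold = GoldParse(doc_gold_text, entities=annot)
            # Run the full pipeline to get the model's predictions
            pred_value = ner_model(input_)
            scorer.score(pred_value, gold)
        return scorer.scores
    
    # example run
    
    examples = [
        ('Who is Shaka Khan?',
         [(7, 17, 'PERSON')]),
        ('I like London and Berlin.',
         [(7, 13, 'LOC'), (18, 24, 'LOC')])
    ]
    
    ner_model_path = 'en_core_web_sm'  # placeholder: replace with the path to your trained model
    ner_model = spacy.load(ner_model_path)
    results = evaluate(ner_model, examples)
    print(results)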
    

    scorer.scores returns a dict with many scores, not only the NER ones. Running the example gives the result below. Note that the entity scores are low because the examples label London and Berlin as 'LOC' while the model labels them as 'GPE'; you can see this in ents_per_type.

    {'uas': 0.0,
     'las': 0.0,
     'las_per_type': {'attr': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'root': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'compound': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'nsubj': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'dobj': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'cc': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                      'conj': {'p': 0.0, 'r': 0.0, 'f': 0.0}},
     'ents_p': 33.33333333333333,
     'ents_r': 33.33333333333333,
     'ents_f': 33.33333333333333,
     'ents_per_type': {'PERSON': {'p': 100.0, 'r': 100.0, 'f': 100.0},
                       'LOC': {'p': 0.0, 'r': 0.0, 'f': 0.0},
                       'GPE': {'p': 0.0, 'r': 0.0, 'f': 0.0}},
     'tags_acc': 0.0,
     'token_acc': 100.0,
     'textcat_score': 0.0,
     'textcats_per_cat': {}}
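To see where the ents_p / ents_r / ents_f and ents_per_type numbers come from, here is a simplified plain-Python re-implementation of exact-match entity scoring. It is not spaCy's code: spaCy compares token-level spans, while this sketch compares (start_char, end_char, label) tuples, and the names `score_entities` and `_prf` are hypothetical. The predicted spans are hard-coded (an assumption) to mirror the model output reported above: PERSON for "Shaka Khan" and GPE for both cities.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def _prf(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall and F1 as percentages; 0/0 is reported as 0.0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": 100 * p, "r": 100 * r, "f": 100 * f}


def score_entities(pairs: Iterable[Tuple[List[Span], List[Span]]]) -> Dict:
    """Exact-match scoring over (predicted, gold) span lists, one pair per document."""
    total = {"tp": 0, "fp": 0, "fn": 0}
    per_type = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for predicted, gold in pairs:
        pred_set, gold_set = set(predicted), set(gold)
        for span in pred_set | gold_set:
            if span in pred_set and span in gold_set:
                key = "tp"  # same offsets and same label
            elif span in pred_set:
                key = "fp"  # predicted but not in gold (e.g. a GPE where gold says LOC)
            else:
                key = "fn"  # in gold but never predicted
            total[key] += 1
            per_type[span[2]][key] += 1
    overall = _prf(**total)
    return {
        "ents_p": overall["p"],
        "ents_r": overall["r"],
        "ents_f": overall["f"],
        "ents_per_type": {label: _prf(**c) for label, c in per_type.items()},
    }


# (predicted, gold) per document; gold spans are taken from the examples above
pairs = [
    ([(7, 17, "PERSON")], [(7, 17, "PERSON")]),
    ([(7, 13, "GPE"), (18, 24, "GPE")], [(7, 13, "LOC"), (18, 24, "LOC")]),
]
scores = score_entities(pairs)
print(round(scores["ents_f"], 2))  # → 33.33
print(scores["ents_per_type"])
```

One correct entity out of three predicted and three gold gives 33.33 for precision, recall and F1, and the label mismatch surfaces as LOC entities that are only missed (false negatives) and GPE entities that are only spurious (false positives), both scoring 0.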
    

    The example is taken from a spaCy example on GitHub (the link no longer works) and was last tested with spaCy 2.2.4. It will not run on spaCy 3.x, which removed spacy.gold and GoldParse.
