Case-sensitive entity recognition

梦毁少年i 2021-02-15 17:37

I have keywords that are all stored in lower case, e.g. "discount nike shoes", that I am trying to perform entity extraction on. The issue I've run into is that spaCy seems t

2 answers
  •  天命终不由人
    2021-02-15 18:37

    In general, non-standardized casing is problematic for pre-trained models.

    You have a few workarounds:

    • Truecasing: correcting the capitalization in a text so you can use a standard NER model.
    • Caseless models: training NER models that ignore capitalization altogether.
    • Mixed-case models: training NER models on a mix of cased and uncased text.

    I would recommend truecasing: there are some decent open-source truecasers with good accuracy, and they let you keep using pre-trained NER solutions such as spaCy.
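
    To make the idea concrete, here is a toy frequency-based truecaser in plain Python. It is only a sketch of the technique, not one of the open-source truecasers mentioned above: the function names `train_truecaser` and `truecase` and the three-sentence corpus are made up for illustration, and tokenization is naive whitespace splitting.

```python
import string
from collections import Counter, defaultdict


def train_truecaser(cased_sentences):
    """Learn the most frequent surface form of each lowercased token."""
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        tokens = [t.strip(string.punctuation) for t in sentence.split()]
        # Skip the first token: a sentence-initial capital says little
        # about how the word is normally cased.
        for token in tokens[1:]:
            if token:
                counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(text, casing):
    """Replace each token with its learned casing; unknown tokens are kept as-is."""
    return " ".join(casing.get(token, token) for token in text.split())


# Hypothetical cased corpus; a real one would be far larger.
corpus = [
    "She bought Nike shoes at a discount.",
    "The new Nike store opened in Berlin.",
    "Are Nike shoes worth the price?",
]
casing = train_truecaser(corpus)
restored = truecase("discount nike shoes", casing)
print(restored)  # discount Nike shoes
```

    The restored string can then be passed to a pre-trained pipeline in the usual way, e.g. `nlp = spacy.load("en_core_web_sm")` followed by `nlp(restored).ents`. Whether the model then tags the brand correctly still depends on the model itself.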

    Caseless and mixed-case models are more time-consuming to set up and won't necessarily give better results.
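
    If you do go the mixed-case route, part of the setup work is building a training set that contains both cased and lowercased versions of each example. A minimal sketch, assuming training data in the `(text, {"entities": [(start, end, label)]})` tuple form often used with spaCy; the helper name `add_lowercased_copies` and the sample data are hypothetical:

```python
def add_lowercased_copies(examples):
    """Return the examples plus a lowercased copy of each one.

    Entity offsets are reused unchanged, so a copy is only added when
    lowercasing keeps the text length the same.
    """
    augmented = list(examples)
    for text, annotations in examples:
        lowered = text.lower()
        # str.lower() can change the length for a few Unicode characters,
        # which would shift the character offsets of the entities.
        if lowered != text and len(lowered) == len(text):
            augmented.append((lowered, annotations))
    return augmented


# Hypothetical training example with one ORG entity at characters 9-13.
train = [("Discount Nike shoes", {"entities": [(9, 13, "ORG")]})]
mixed = add_lowercased_copies(train)

for text, annotations in mixed:
    start, end, label = annotations["entities"][0]
    print(text, "->", text[start:end], label)
```

    The model still has to be trained on the augmented data afterwards, which is where most of the extra time goes.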
