spaCy EN model issue

Submitted by 瘦欲@ on 2019-12-24 07:39:01

Question


I need to know the difference between spaCy's en and en_core_web_sm models.

I am trying to do NER with spaCy (for organization names). Please find below the script I am using:

import spacy

# Load the small English pipeline by its full package name.
nlp = spacy.load("en_core_web_sm")
# Each continued line must end with a backslash and nothing after it.
text = "But Google is starting from behind. The company made a late push \
    into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
    Alexa software, which runs on its Echo and Dot devices, have clear \
    leads in consumer adoption."
doc = nlp(text)
# Print each entity with its character offsets and label.
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

The script above gives me no output. But when I use the "en" model:

import spacy

# Load whichever package the "en" shortcut points to.
nlp = spacy.load("en")
# Each continued line must end with a backslash and nothing after it.
text = "But Google is starting from behind. The company made a late push \
    into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
    Alexa software, which runs on its Echo and Dot devices, have clear \
    leads in consumer adoption."
doc = nlp(text)
# Print each entity with its character offsets and label.
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

it gives me the desired output:

Google 4 10 ORG
Apple’s Siri 92 104 ORG
iPhones 119 126 ORG
Amazon 132 138 ORG
Echo and Dot 182 194 ORG

What is going wrong here? Please help.

Can I use the en_core_web_sm model and get the same output as with the en model? If so, please advise how to do it. A Python 3 script that takes a pandas DataFrame as input would be appreciated. Thanks.


Answer 1:


Each model is a machine learning model trained on a specific corpus (a text dataset). As a result, different models can tag the same text differently, especially because some models were trained on less data than others.

At the time of writing, spaCy offers four models for English, as listed at: https://spacy.io/models/en/

According to https://github.com/explosion/spacy-models, a model can be downloaded in several distinct ways:

# download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm

# out-of-the-box: download best-matching default model
python -m spacy download en

Most likely, when you downloaded the 'en' model, the best-matching default model for your installation was not 'en_core_web_sm'.

Also, keep in mind that these models are updated from time to time, so you may have ended up with two different versions of the same model.
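One way to check this on your own machine is to print which package and version each name actually resolves to. The following is only a sketch under stated assumptions: describe_pipeline and compare_models are hypothetical helper names, the metadata is read from nlp.meta (keys lang, name, version) and the component list from nlp.pipe_names, and the 'en' shortcut is assumed to exist, which only holds for spaCy versions that still support shortcut names.

```python
def describe_pipeline(nlp):
    """Summarise which model package a loaded spaCy pipeline comes from."""
    meta = nlp.meta  # contents of the model's meta.json
    return {
        "package": "{}_{}".format(meta.get("lang"), meta.get("name")),
        "version": meta.get("version"),
        # Without an "ner" component the statistical entity recognizer is not run.
        "has_ner": "ner" in nlp.pipe_names,
    }


def compare_models(names=("en", "en_core_web_sm")):
    """Load each name and print what it resolves to (requires spaCy)."""
    import spacy  # imported here so describe_pipeline can be used on its own

    for name in names:
        try:
            nlp = spacy.load(name)
        except OSError as err:
            # spaCy raises OSError when a model name cannot be resolved.
            print(name, "-> not available:", err)
            continue
        print(name, "->", describe_pipeline(nlp))
```

Calling compare_models() prints one line per name. If the two lines show different packages or versions, that would be consistent with the two scripts returning different entities. On the command line, python -m spacy validate reports similar information for the installed models.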




Answer 2:


On my system the results are the same in both cases.

Code:

import spacy
nlp = spacy.load("en_core_web_sm")
text = """But Google is starting from behind. The company made a late push 
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s  
Alexa software, which runs on its Echo and Dot devices, have clear 
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

import spacy
nlp = spacy.load("en")
text = """But Google is starting from behind. The company made a late push \
into hardware, and Apple’s Siri, available on iPhones, and Amazon’s \
Alexa software, which runs on its Echo and Dot devices, have clear 
leads in consumer adoption."""
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
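
The question also asks for a script that takes a pandas DataFrame as input. A minimal sketch follows, under stated assumptions: the helper names extract_orgs and add_org_column and the column names text and orgs are hypothetical, and nlp is any loaded spaCy pipeline that includes an NER component. It does not make en_core_web_sm behave like en; it only collects the ORG entities that the loaded model finds.

```python
def extract_orgs(texts, nlp):
    """Return one list of ORG entity strings per input text."""
    results = []
    # nlp.pipe processes the texts as a stream and yields one Doc per text.
    for doc in nlp.pipe(texts):
        results.append([ent.text for ent in doc.ents if ent.label_ == "ORG"])
    return results


def add_org_column(df, nlp, column="text"):
    """Return a copy of df with an extra 'orgs' column (names are illustrative)."""
    # Replace missing values with empty strings so every cell is a str.
    texts = df[column].fillna("").astype(str)
    return df.assign(orgs=extract_orgs(texts, nlp))


def main():
    """Example run; needs pandas, spaCy and en_core_web_sm installed."""
    import pandas as pd
    import spacy

    nlp = spacy.load("en_core_web_sm")
    df = pd.DataFrame({"text": ["But Google is starting from behind."]})
    print(add_org_column(df, nlp))
```

Call main() to try it once the dependencies are installed. Which entities end up in the orgs column depends on the model and version that spacy.load returns.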


Source: https://stackoverflow.com/questions/56446478/spacy-en-model-issue
