What is the difference between spacy.load('en_core_web_sm') and spacy.load('en')? This link explains different model sizes. But I am still not clear
First of all, install spaCy using the following command in a Jupyter notebook:
pip install -U spacy
Then write the following code:
import en_core_web_sm
nlp = en_core_web_sm.load()
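The import above raises ImportError if the model package has not been installed yet (for example with python -m spacy download en_core_web_sm). As a minimal sketch, the standard library can check for the package first. The helper name model_is_installed is made up for this illustration and is not part of spaCy.

```python
import importlib.util


def model_is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be imported."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    name = "en_core_web_sm"
    if model_is_installed(name):
        print(f"{name} is installed; en_core_web_sm.load() should work")
    else:
        print(f"{name} is missing; run: python -m spacy download {name}")
```

The check only looks at whether the package can be found; it does not verify that the model version matches the installed spaCy version.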
Using a spaCy language model in Colab requires only the following two steps:
!python -m spacy download en_core_web_lg
Test
import spacy
nlp = spacy.load("en_core_web_lg")
successful!!!
The answer to your misunderstanding is a Unix concept: softlinks (symbolic links), which are similar to shortcuts in Windows. Let me explain.
When you run spacy download en, spaCy tries to find the best small model that matches your spaCy distribution. The small model I am talking about defaults to en_core_web_sm, which exists in different variations corresponding to the different spaCy versions (for example, spacy and spacy-nightly have en_core_web_sm packages of different sizes).
When spaCy finds the best model for you, it downloads it and then links the name en to the package it downloaded, e.g. en_core_web_sm. That basically means that whenever you refer to en you are referring to en_core_web_sm. In other words, en after linking is not a "real" package; it is just a name for en_core_web_sm.
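To make the softlink idea concrete, here is a minimal sketch using only the standard library. It is an analogy, not spaCy's actual linking code: the two directory names are placeholders created in a temporary folder, and the function name demo_shortcut_link is made up. On Windows, os.symlink may need extra privileges.

```python
import os
import tempfile


def demo_shortcut_link() -> bool:
    """Create a directory and a symlink to it, then check both lead to the same place."""
    with tempfile.TemporaryDirectory() as tmp:
        real_pkg = os.path.join(tmp, "en_core_web_sm")  # stands in for the real model package
        shortcut = os.path.join(tmp, "en")  # stands in for the shortcut name
        os.mkdir(real_pkg)
        os.symlink(real_pkg, shortcut, target_is_directory=True)
        # The shortcut is a link, not a directory of its own ...
        is_link = os.path.islink(shortcut)
        # ... and following it leads to the real package directory.
        same_target = os.path.realpath(shortcut) == os.path.realpath(real_pkg)
        return is_link and same_target


if __name__ == "__main__":
    print(demo_shortcut_link())
```

The name en here holds no data of its own; anything read through it comes from the directory it points to.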
However, it doesn't work the other way. You can't refer directly to en_core_web_sm because your system doesn't know you have it installed. When you ran spacy download en you basically did a pip install. So pip knows that you have a package named en installed for your Python distribution, but knows nothing about the package en_core_web_sm. This package just replaces package en when you import it, which means that package en is just a softlink to en_core_web_sm.
Of course, you can download en_core_web_sm directly with the command python -m spacy download en_core_web_sm, or you can even link the name en to other models as well. For example, you could run python -m spacy download en_core_web_lg and then python -m spacy link en_core_web_lg en. That would make en a name for en_core_web_lg, which is a large spaCy model for the English language.
Hope it is clear now :)
Open Anaconda Navigator. Click on any IDE. Run the code:
!pip install -U spacy
!python -m spacy download en_core_web_sm
It will work. If you opened the IDE directly, close it and follow this procedure once.
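In a notebook or IDE it is easy to install into a different Python than the one running your code. A minimal sketch, assuming the goal is to install into the interpreter that is currently running: build both commands around sys.executable. The helper names install_commands and run_install are hypothetical; the commands themselves are the usual pip install -U spacy and python -m spacy download.

```python
import subprocess
import sys


def install_commands(model: str = "en_core_web_sm") -> list:
    """Return the two commands, bound to the currently running interpreter."""
    return [
        [sys.executable, "-m", "pip", "install", "-U", "spacy"],
        [sys.executable, "-m", "spacy", "download", model],
    ]


def run_install(model: str = "en_core_web_sm") -> None:
    """Run the commands in order; raises CalledProcessError if one fails."""
    for cmd in install_commands(model):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_install() to actually install.
    for cmd in install_commands():
        print(" ".join(cmd))
```

After installing, restart the kernel or IDE session so the newly installed packages are picked up.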
Steps to load models for different versions of spaCy
Download the best-matching version of a specific model for your spaCy installation:
python -m spacy download en_core_web_sm
Or pip install a .tar.gz archive from a path or URL:
pip install /Users/you/en_core_web_sm-2.2.0.tar.gz
or
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
Add the model to your requirements file or environment YAML file. Each spaCy version is compatible with a range of model versions; you can see them at https://github.com/explosion/spacy-models/releases
If you're not sure, running the code below
nlp = spacy.load('en_core_web_sm')
will give a warning telling you which model version is compatible with your installed spaCy version.
environment.yml example
name: root
channels:
  - defaults
  - conda-forge
  - anaconda
dependencies:
  - python=3.8.3
  - pip
  - spacy=2.3.2
  - scikit-learn=0.23.2
  - pip:
    - https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz#egg=en_core_web_sm
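The release URLs in the steps above follow a regular pattern, so they can be generated rather than typed by hand. The sketch below infers that pattern only from the two URLs shown in this answer; other releases may use different file names, so treat it as an assumption. The helper names model_archive_url and pip_requirement are made up for illustration.

```python
RELEASES = "https://github.com/explosion/spacy-models/releases/download"


def model_archive_url(name: str, version: str) -> str:
    """Build the archive URL following the pattern of the URLs shown above."""
    tag = f"{name}-{version}"
    return f"{RELEASES}/{tag}/{tag}.tar.gz"


def pip_requirement(name: str, version: str) -> str:
    """Return a line for requirements.txt or the pip: section of environment.yml."""
    return f"{model_archive_url(name, version)}#egg={name}"


if __name__ == "__main__":
    print(model_archive_url("en_core_web_sm", "2.2.0"))
    print(pip_requirement("en_core_web_sm", "2.3.1"))
```

Pinning the model version this way keeps the model and the spaCy version in the same file, which makes a mismatch easier to spot.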
I tried all the above answers but could not succeed. The following worked for me (specific to Windows):
pip install -U --user spacy
python -m spacy download en
import spacy
spacy.load('en')
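If it is unclear which name is available on a given machine, one option is to try several names in order. Note that newer spaCy releases (v3 and later) dropped shortcut names such as en, so the full package name is the safer choice there. A minimal sketch, assuming the loader raises OSError for a name it cannot find (which is how spacy.load reports a missing model). The helper load_first_available is hypothetical and takes the loader as an argument so it can be tried without spaCy installed.

```python
from typing import Callable, Iterable


def load_first_available(names: Iterable[str], loader: Callable[[str], object]):
    """Try each name with loader; return (name, loaded_object) for the first that works."""
    errors = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as exc:  # raised when a model name cannot be found
            errors.append(f"{name}: {exc}")
    raise OSError("none of the model names could be loaded: " + "; ".join(errors))


if __name__ == "__main__":
    try:
        import spacy
    except ImportError:
        print("spaCy is not installed")
    else:
        try:
            name, nlp = load_first_available(["en", "en_core_web_sm"], spacy.load)
            print("loaded", name)
        except OSError as exc:
            print(exc)
```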