spacy

spacy Can't find model 'en_core_web_sm' on Windows 10 and Python 3.5.3 :: Anaconda custom (64-bit)

萝らか妹 submitted on 2019-12-01 04:09:26
What is the difference between spacy.load('en_core_web_sm') and spacy.load('en')? This link explains the different model sizes, but I am still not clear on how spacy.load('en_core_web_sm') and spacy.load('en') differ. spacy.load('en') runs fine for me, but spacy.load('en_core_web_sm') throws an error. I have installed spaCy as below. When I go to a Jupyter notebook and run nlp = spacy.load('en_core_web_sm'), I get the error below: OSError Traceback (most recent call last) <ipython-input-4-b472bef03043> in <module>() 1 #
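In spaCy 2.x the two calls resolve the same kind of installed model package: 'en' is only a shortcut link that the download command creates, pointing at a package such as en_core_web_sm, while spacy.load('en_core_web_sm') needs that package itself to be installed under that full name. A minimal sketch, assuming spaCy 2.x and that the model is fetched with the spacy download CLI:

```python
# Minimal sketch (spaCy 2.x). In a shell / Anaconda prompt:
#   python -m spacy download en_core_web_sm   # installs the model package itself
#   python -m spacy download en               # installs it AND creates the "en" shortcut link
import spacy

nlp_by_package = spacy.load("en_core_web_sm")  # resolves the installed package by its full name
nlp_by_link = spacy.load("en")                 # resolves whatever the "en" shortcut link points to
print(nlp_by_package.meta["name"], nlp_by_link.meta["name"])
```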

NLP: A Collection of Chinese and English Word Segmentation Tools and an Introduction to Their Basic Use

只谈情不闲聊 submitted on 2019-11-30 22:21:21
I. Chinese word segmentation tools: (1) Jieba (2) SnowNLP (3) THULAC (4) PyNLPIR (5) StanfordCoreNLP, e.g. from stanfordcorenlp import StanfordCoreNLP; with StanfordCoreNLP(r'E:\Users\Eternal Sun\PycharmProjects\1\venv\Lib\stanford-corenlp-full-2018-10-05', lang='zh') as nlp: print("StanfordCoreNLP tokenization:\n", nlp.word_tokenize(Chinese)) (6) HanLP. The segmentation results are shown below. II. English tokenization tools: 1. NLTK: the difference between the two approaches is that splitting into sentences first and then tokenizing preserves each sentence's independence, i.e. the result is a two-dimensional list, whereas tokenizing the text directly produces a flat one-dimensional list; results shown below. 2. spaCy 3. StanfordCoreNLP: segmentation results. Source: oschina Link: https://my.oschina.net/u/3793864/blog/3056365
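To make the NLTK remark above concrete, here is a small sketch (hypothetical example text; NLTK's punkt data may need to be downloaded first) showing that sentence-splitting before tokenizing yields a list of lists, while tokenizing the whole text at once yields a flat list:

```python
# Sentence-split-then-tokenize vs. direct tokenization with NLTK.
# nltk.download('punkt') may be required on first use.
from nltk.tokenize import sent_tokenize, word_tokenize

text = "The door is brown. The horse is running."

per_sentence = [word_tokenize(sent) for sent in sent_tokenize(text)]  # list of lists
flat = word_tokenize(text)                                            # one flat list

print(per_sentence)  # [['The', 'door', 'is', 'brown', '.'], ['The', 'horse', 'is', 'running', '.']]
print(flat)          # ['The', 'door', 'is', 'brown', '.', 'The', 'horse', 'is', 'running', '.']
```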

Spacy language model installation in Python returns ImportError from _mklinit (ImportError: DLL load failed: The specified module could not be found.)

…衆ロ難τιáo~ submitted on 2019-11-30 21:15:38
Question: I am currently trying to set up spaCy on my system. When I downloaded the module, no errors were shown. However, upon downloading a language model (specifically, the English one), I got an error. The output is as follows: Traceback (most recent call last): File "C:\ProgramData\Anaconda3\lib\runpy.py", line 183, in _run_module_as_main mod_name, mod_spec, code = _get_module_details(mod_name, _Error) File "C:\ProgramData\Anaconda3\lib\runpy.py", line 142, in _get_module_details return _get
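The _mklinit symbol in that traceback belongs to NumPy's MKL-linked build rather than to spaCy, so a reasonable first check (a hedged diagnostic sketch, not a guaranteed fix) is whether NumPy itself imports cleanly in the same Anaconda environment:

```python
# If this import raises the same "DLL load failed" error, the problem is the
# NumPy/MKL installation in the environment, not the spaCy language model;
# reinstalling numpy (and mkl) in that environment is the usual next step.
import numpy as np

print(np.__version__)
np.show_config()  # shows which BLAS/LAPACK/MKL libraries NumPy was built against
```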

Unable to load the spacy model 'en_core_web_lg' on Google colab

自古美人都是妖i submitted on 2019-11-30 15:57:53
Question: I am using spaCy in Google Colab to build an NER model, for which I downloaded the spaCy 'en_core_web_lg' model using import spacy.cli spacy.cli.download("en_core_web_lg") and I get a message saying ✔ Download and installation successful You can now load the model via spacy.load('en_core_web_lg') However, when I then try to load the model with nlp = spacy.load('en_core_web_lg') the following error is printed: OSError: [E050] Can't find model 'en_core_web_lg'. It doesn't seem to be a shortcut
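A workaround often reported for Colab (sketched below, not guaranteed): after spacy.cli.download finishes, either restart the runtime so spacy.load() can see the newly installed package, or import the model package directly in the same session:

```python
# Sketch of a common Colab workaround: load the freshly downloaded model by
# importing its package directly instead of going through spacy.load().
import importlib

import spacy.cli

spacy.cli.download("en_core_web_lg")
importlib.invalidate_caches()     # make the just-installed package importable

import en_core_web_lg             # the downloaded model is a regular Python package
nlp = en_core_web_lg.load()
print(nlp.meta["name"], nlp.meta["version"])
```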

How to generate bi/tri-grams using spacy/nltk

狂风中的少年 submitted on 2019-11-30 13:26:53
Question: The input text is always a list of dish names, each with 1-3 adjectives and a noun. Inputs: thai iced tea, spicy fried chicken, sweet chili pork, thai chicken curry. Outputs: thai tea, iced tea; spicy chicken, fried chicken; sweet pork, chili pork; thai chicken, chicken curry, thai curry. Basically, I am looking to parse the sentence tree and generate bi-grams by pairing an adjective with the noun, and I would like to achieve this with spaCy or NLTK. Answer 1: I used spaCy 2.0 with the English model.
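A minimal sketch of one way to do this with spaCy (assuming each dish name ends in its head noun, and using the small English model): pair every earlier token with that final noun. This reproduces the modifier+noun pairs listed above; pairs of two modifiers such as "thai chicken" would need an extra combination step.

```python
# Pair each leading token of a dish name with the final (head) noun.
import spacy

nlp = spacy.load("en_core_web_sm")

def dish_bigrams(dish):
    tokens = [t for t in nlp(dish) if not t.is_punct]
    head = tokens[-1]                                    # "tea", "chicken", "pork", "curry"
    return [t.text + " " + head.text for t in tokens[:-1]]

for dish in ["thai iced tea", "spicy fried chicken", "sweet chili pork", "thai chicken curry"]:
    print(dish, "->", dish_bigrams(dish))
# thai iced tea -> ['thai tea', 'iced tea']
# spicy fried chicken -> ['spicy chicken', 'fried chicken']
# ...
```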

How do I create gold data for TextCategorizer training?

老子叫甜甜 submitted on 2019-11-30 04:50:49
Question: I want to train a TextCategorizer model with the following (text, label) pairs. Label COLOR: The door is brown. The barn is red. The flower is yellow. Label ANIMAL: The horse is running. The fish is jumping. The chicken is asleep. I am copying the example code in the documentation for TextCategorizer. textcat = TextCategorizer(nlp.vocab) losses = {} optimizer = nlp.begin_training() textcat.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer) The doc variables will presumably
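Against the spaCy 2.x API shown in the question, the gold objects are GoldParse instances whose cats dict scores every label with 1.0 (positive) or 0.0. A hedged sketch, assuming the labels are added to the TextCategorizer before training:

```python
# Sketch: build (doc, gold) pairs for TextCategorizer training in spaCy 2.x.
import spacy
from spacy.gold import GoldParse
from spacy.pipeline import TextCategorizer

nlp = spacy.blank("en")
textcat = TextCategorizer(nlp.vocab)
textcat.add_label("COLOR")
textcat.add_label("ANIMAL")
nlp.add_pipe(textcat)

doc1 = nlp.make_doc("The door is brown.")
doc2 = nlp.make_doc("The horse is running.")
gold1 = GoldParse(doc1, cats={"COLOR": 1.0, "ANIMAL": 0.0})
gold2 = GoldParse(doc2, cats={"COLOR": 0.0, "ANIMAL": 1.0})

optimizer = nlp.begin_training()
losses = {}
textcat.update([doc1, doc2], [gold1, gold2], losses=losses, sgd=optimizer)
print(losses)
```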

Remove a word in a span from SpaCy?

ぃ、小莉子 submitted on 2019-11-29 19:28:27
Question: I am parsing a sentence with spaCy as follows: import spacy nlp = spacy.load("en") span = nlp("This is some text.") I am wondering if there is a way to delete a word from the span while keeping the remaining words formatted like a sentence, such as del span[3], which would yield a sentence like "This is some." If some other method without spaCy could achieve the same effect, that would be great too. Answer 1: There is a workaround for that. The idea is that you create a numpy array from the doc,
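One workaround that avoids touching spaCy internals (a sketch, assuming spaCy 2.x): rebuild a new Doc from the tokens you want to keep. Note that the whitespace left by the removed token may need manual adjustment.

```python
# Rebuild the sentence without token index 3 ("text").
import spacy
from spacy.tokens import Doc

nlp = spacy.load("en")
doc = nlp("This is some text.")

drop = 3
words = [t.text for i, t in enumerate(doc) if i != drop]
spaces = [bool(t.whitespace_) for i, t in enumerate(doc) if i != drop]
new_doc = Doc(doc.vocab, words=words, spaces=spaces)

print(new_doc.text)  # "This is some ." -- the space before "." is left over from "some"
```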

Why does spaCy not preserve intra-word hyphens during tokenization like Stanford CoreNLP does?

扶醉桌前 submitted on 2019-11-29 16:40:09
spaCy version: 2.0.11. Python version: 3.6.5. OS: Ubuntu 16.04. My sample sentences: Marketing-Representative- won't die in car accident. or Out-of-box implementation Expected tokens: ["Marketing-Representative", "-", "wo", "n't", "die", "in", "car", "accident", "."] ["Out-of-box", "implementation"] spaCy tokens (default tokenizer): ["Marketing", "-", "Representative-", "wo", "n't", "die", "in", "car", "accident", "."] ["Out", "-", "of", "-", "box", "implementation"] I tried creating a custom tokenizer, but it doesn't handle all the edge cases that spaCy handles via tokenizer_exceptions (code below):
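One direction that keeps spaCy's tokenizer_exceptions (so "won't" still becomes "wo" + "n't") is to rebuild the tokenizer with the default rules but with the hyphen rule removed from the infix patterns. This is a hedged sketch: the exact text of the default infix pattern differs between spaCy versions, so the filter below may need adjusting.

```python
import spacy
from spacy.tokenizer import Tokenizer
from spacy.util import compile_prefix_regex, compile_suffix_regex, compile_infix_regex

nlp = spacy.load("en_core_web_sm")

# Keep every default infix pattern except the one that splits on hyphens
# between letters (its expanded form contains the hyphen alternation).
infixes = [p for p in nlp.Defaults.infixes if "-|–|—" not in p]

nlp.tokenizer = Tokenizer(
    nlp.vocab,
    rules=nlp.Defaults.tokenizer_exceptions,          # keeps "won't" -> "wo", "n't"
    prefix_search=compile_prefix_regex(nlp.Defaults.prefixes).search,
    suffix_search=compile_suffix_regex(nlp.Defaults.suffixes).search,
    infix_finditer=compile_infix_regex(infixes).finditer,
    token_match=nlp.tokenizer.token_match,
)

print([t.text for t in nlp("Out-of-box implementation won't die in car accident.")])
```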

Import error with spacy: “No module named en”

冷暖自知 submitted on 2019-11-29 14:17:22
Question: I'm having trouble using the Python spaCy library. It seems to be installed correctly, but at from spacy.en import English I get the following import error: Traceback (most recent call last): File "spacy.py", line 1, in <module> from spacy.en import English File "/home/user/CmdData/spacy.py", line 1, in <module> from spacy.en import English ImportError: No module named en I'm not very familiar with Python, but that's the standard import I saw online, and the library is installed: $ pip list |
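The traceback itself hints at the cause: the import is being resolved from /home/user/CmdData/spacy.py, i.e. a local file named spacy.py is shadowing the installed library. A quick check (note that from spacy.en import English is the spaCy 1.x style import):

```python
# If this prints a path inside your own project instead of site-packages,
# rename your local spacy.py (and delete any stale spacy.pyc), then retry.
import spacy
print(spacy.__file__)
```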