spacy

Heroku Deployment Error: No matching distribution found for en-core-web-sm

Submitted by 不羁的心 on 2019-12-07 17:47:25
Question: I am trying to deploy my Django and spaCy project to Heroku, but I am getting the error: No matching distribution found for en-core-web-sm (it is an ML model downloadable via pip). How can I solve this problem? The model is installed locally in a virtual environment and works fine. I generated the requirements file via pip freeze. I am using Python 3.6.4.

Answer 1: It doesn't look like pip install en-core-web-sm works either, so I'm wondering how you installed it locally? One possible solution is to…
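The answer above is cut off; the commonly used fix for this error, shown here as a sketch (the 2.0.0 version pin is an assumption and should match your spaCy version), is to reference the model as a direct release URL in requirements.txt so pip can resolve it on Heroku:

    # requirements.txt (sketch; pin the model release matching your spaCy version)
    spacy>=2.0.0,<3.0.0
    https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz

The package then installs under the name en_core_web_sm, so spacy.load('en_core_web_sm') works at runtime.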

Does spacy take as input a list of tokens?

Submitted by 自作多情 on 2019-12-07 02:31:56
Question: I would like to use spaCy's POS tagging, NER, and dependency parsing without its word tokenization. My input is a list of tokens representing a sentence, and I would like to respect the user's tokenization. Is this possible at all, either with spaCy or any other NLP package? For now, I am using this spaCy-based function to put a sentence (a unicode string) into CoNLL format:

    import spacy

    nlp = spacy.load('en')

    def toConll(string_doc, nlp):
        doc = nlp(string_doc)
        block = []
        for i, …
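A minimal sketch of one way to do this in spaCy v2 (the word list below is illustrative): construct the Doc directly from the pre-split tokens and apply the pipeline components by hand, bypassing the tokenizer:

    import spacy
    from spacy.tokens import Doc

    nlp = spacy.load('en')
    words = ['I', 'like', 'London', 'and', 'Berlin', '.']

    # Build the Doc from the user's tokens instead of running the tokenizer.
    doc = Doc(nlp.vocab, words=words)

    # Apply the remaining components (tagger, parser, NER) to the Doc.
    for name, proc in nlp.pipeline:
        doc = proc(doc)

    for token in doc:
        print(token.text, token.tag_, token.dep_, token.ent_type_)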

What is difference between en_core_web_sm, en_core_web_md and en_core_web_lg model of spacy?

Submitted by 笑着哭i on 2019-12-06 19:52:02
Question: I installed spaCy on my system and I want to parse/extract person and organization names for English. But I saw here that there are four models for English, and there is model versioning. I couldn't tell which model is the large one, or which I should choose for development.

Answer 1: sm / md / lg refer to the sizes of the models (small, medium, large respectively). As it says on the models page you linked to, model differences are mostly statistical. In general, we do expect larger models to be "better" and more…
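One concrete difference you can check yourself: the md and lg models bundle real word vectors, while sm does not. A quick sketch, assuming all three models are installed:

    import spacy

    # The vector table is empty for the small model and populated for md/lg.
    for name in ('en_core_web_sm', 'en_core_web_md', 'en_core_web_lg'):
        nlp = spacy.load(name)
        print(name, nlp.vocab.vectors.shape)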

Implementing custom POS Tagger in Spacy over existing english model : NLP - Python

Submitted by 懵懂的女人 on 2019-12-06 12:30:20
I am trying to retrain the existing POS tagger in spaCy to produce the proper tags for certain misclassified words, using the code below. But it gives me this error: Warning: Unnamed vectors -- this won't allow multiple vectors models to be loaded. (Shape: (0, 0))

    import spacy
    from spacy.vocab import Vocab
    from spacy.tokens import Doc
    from spacy.gold import GoldParse

    nlp = spacy.load('en_core_web_sm')
    optimizer = nlp.begin_training()
    vocab = Vocab(tag_map={})
    doc = Doc(vocab, words=['ThermostatFailedOpen', 'ThermostatFailedClose', 'BlahDeBlah'])
    gold = GoldParse(doc, tags=['NNP'] * 3)
    nlp…
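The warning stems from building the Doc on a fresh, empty Vocab rather than the loaded model's own vocab. A minimal sketch of an update loop that avoids it, assuming spaCy v2 and tags (here NNP) that already exist in the model's tag map:

    import spacy
    from spacy.tokens import Doc
    from spacy.gold import GoldParse

    nlp = spacy.load('en_core_web_sm')
    optimizer = nlp.begin_training()
    words = ['ThermostatFailedOpen', 'ThermostatFailedClose', 'BlahDeBlah']

    for _ in range(10):  # a few passes over the single example
        doc = Doc(nlp.vocab, words=words)  # reuse the model's vocab, not a new Vocab()
        gold = GoldParse(doc, tags=['NNP'] * 3)
        nlp.update([doc], [gold], sgd=optimizer)

    print([(t.text, t.tag_) for t in nlp(' '.join(words))])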

How to add custom slangs into spaCy's norm_exceptions.py module?

Submitted by 匆匆过客 on 2019-12-06 12:11:15
SpaCy's documentation has some information on adding new slangs here. However, I'd like to know: (1) When should I call the following function?

    lex_attr_getters[NORM] = add_lookups(Language.Defaults.lex_attr_getters[NORM], NORM_EXCEPTIONS, BASE_NORMS)

The typical usage of spaCy, according to the introduction guide here, is something as follows:

    import spacy
    nlp = spacy.load('en')
    # Should I call the function add_lookups(...) here?
    doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')

(2) Where in the processing pipeline are norm exceptions handled? I'm assuming a typical…
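On timing: NORM is a lexical attribute, computed when a lexeme is first created, so the lookup chain has to be installed before the model (and its vocabulary) is loaded, not per document. A minimal sketch of one commonly suggested approach, assuming spaCy v2; the slang entry is hypothetical:

    import spacy
    from spacy.attrs import NORM
    from spacy.lang.en import English
    from spacy.util import add_lookups

    NORM_EXCEPTIONS = {'whattup': 'what is up'}  # hypothetical slang entry

    # Chain the custom lookup in front of the existing English NORM getter
    # before loading, so newly created lexemes pick it up.
    English.Defaults.lex_attr_getters[NORM] = add_lookups(
        English.Defaults.lex_attr_getters[NORM], NORM_EXCEPTIONS)

    nlp = spacy.load('en')
    print([t.norm_ for t in nlp(u'whattup')])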

ValueError with spacy.load('en_core_web_sm')

Submitted by 江枫思渺然 on 2019-12-06 11:13:41
I'm getting ValueError: could not broadcast input array from shape (96) into shape (128) for spacy.load('en_core_web_sm'). I manually downloaded and installed the model, as I'm working on a work computer with download restrictions. I followed the instructions to download and copy from this link: https://github.com/explosion/spaCy/issues/3113

1. Copy the folder Python35\lib\site-packages\en_core_web_sm.
2. Create a folder named en in Python35\Lib\site-packages\spacy\data, paste the copied contents into en, and rename the folder to en_core_web_sm-2.0.0.
3. Copy the __init__.py file in en_core_web_sm and…
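A shape mismatch like (96) vs. (128) usually means the manually copied model was built for a different spaCy version than the one installed. spaCy v2 ships a compatibility check worth running before anything else:

    python -m spacy validate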

Train spaCy's existing POS tagger with my own training examples

Submitted by 你说的曾经没有我的故事 on 2019-12-06 09:18:51
I am trying to train the existing POS tagger on my own lexicon, not starting from scratch (I do not want to create an "empty model"). spaCy's documentation says "Load the model you want to start with", and the next step is "Add the tag map to the tagger using the add_label method". However, when I try to load the English small model and add the tag map, it throws this error: ValueError: [T003] Resizing pre-trained Tagger models is not currently supported. I was wondering how it can be fixed. I have also seen Implementing custom POS Tagger in Spacy over existing english model : NLP -…
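One way to sidestep T003, since a pretrained tagger cannot be resized in spaCy v2.0: first check whether the tags you need are already in the pretrained tag map, in which case you can fine-tune with nlp.update without calling add_label at all. A sketch:

    import spacy

    nlp = spacy.load('en_core_web_sm')
    tagger = nlp.get_pipe('tagger')

    print(sorted(tagger.labels))   # tags the pretrained model already knows
    print('NNP' in tagger.labels)  # True for the English models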

Highlight verb phrases using spacy and html

Submitted by 风格不统一 on 2019-12-06 07:42:20
Question: I have written code to highlight verb phrases in red and output the result as HTML.

    from __future__ import unicode_literals
    import codecs

    import spacy
    import en_core_web_sm
    import textacy

    nlp = en_core_web_sm.load()
    sentence = 'The author is writing a new book. The dog is barking.'
    pattern = r'<VERB>?<ADV>*<VERB>+'
    doc = textacy.Doc(sentence, lang='en_core_web_sm')
    lists = textacy.extract.pos_regex_matches(doc, pattern)
    with open("my.html", "w") as fp:
        for list in lists:
            search_word = list.text
            fp.write(sentence…
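The snippet above stops mid-write; as a sketch of how the highlighting step could look (the replace() approach and the <span> markup are my own illustration, continuing from the names defined in the snippet):

    # Wrap each matched verb phrase in a red span, then write the HTML out.
    highlighted = sentence
    for match in textacy.extract.pos_regex_matches(doc, pattern):
        highlighted = highlighted.replace(
            match.text, '<span style="color: red">{}</span>'.format(match.text))

    with open('my.html', 'w') as fp:
        fp.write(highlighted)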

Is there a bi gram or tri gram feature in Spacy?

Submitted by 扶醉桌前 on 2019-12-06 06:11:38
Question: The code below breaks the sentence into individual tokens, and the output is: "cloud" "computing" "is" "benefiting" "major" "manufacturing" "companies"

    import en_core_web_sm

    nlp = en_core_web_sm.load()
    doc = nlp("Cloud computing is benefiting major manufacturing companies")
    for token in doc:
        print(token.text)

What I would ideally want is to read 'cloud computing' together, as it is technically one word. Basically, I am looking for a bigram. Is there any feature in spaCy that allows bi…
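A minimal sketch of one common workaround, assuming spaCy v2: merge each noun chunk into a single token, so a multi-word unit like "cloud computing" comes out whole (Span.merge was the v2 API; later versions use doc.retokenize()):

    import en_core_web_sm

    nlp = en_core_web_sm.load()
    doc = nlp("Cloud computing is benefiting major manufacturing companies")

    # Materialize the chunks first, since merging mutates the Doc.
    for chunk in list(doc.noun_chunks):
        chunk.merge()

    for token in doc:
        print(token.text)  # "Cloud computing" now prints as one token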

SpaCy model training data: WikiNER

Submitted by 佐手、 on 2019-12-06 06:06:13
For the xx_ent_wiki_sm model in version 2.0 of spaCy, there is a mention of the "WikiNER" dataset, which leads to the article 'Learning multilingual named entity recognition from Wikipedia'. Is there any resource for downloading that dataset to retrain the model? Or a script for processing a Wikipedia dump?

The data server from Joel's (and my) former research group seems to be offline: http://downloads.schwa.org/wikiner

I found a mirror of the wp3 files here, which are the ones I'm using in spaCy: https://github.com/dice-group/FOX/tree/master/input/Wikiner

To retrain the spaCy model, you'll need to…