spacy | 易学教程

what to do with non-pip requirement in requirements.txt

Question: So I recently moved my NLP application over to a new machine. I set up the same Python environment with pyenv as on the old machine and installed all the dependencies with pip. Then there was a 'dependency' of sorts that is not installed by pip; maybe 'model' is a better word for it. The command that installed it is: python -m spacy.en.download Now I want to note that somewhere in my repository, so that if one day I or someone else installs the whole thing on another PC, it's there, noted in
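One possible way to record such a step is a small script kept in the repository next to requirements.txt. This is only a sketch: the names `POST_INSTALL_COMMANDS`, `run_post_install` and the `runner` parameter are hypothetical, and the only command taken from the question is `python -m spacy.en.download`.

```python
import subprocess
import sys

# Setup steps that pip does not cover. The single entry below is the
# command quoted in the question; add further commands as needed.
POST_INSTALL_COMMANDS = [
    [sys.executable, "-m", "spacy.en.download"],
]


def run_post_install(commands=POST_INSTALL_COMMANDS, runner=subprocess.check_call):
    """Run each non-pip setup command in order, stopping at the first failure."""
    for command in commands:
        runner(command)
```

Calling `run_post_install()` after `pip install -r requirements.txt` would then repeat the manual step. The `runner` argument is there so the command list can be checked without actually executing anything.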

Ignore out-of-vocabulary words when averaging vectors in Spacy

Question: I would like to use a pre-trained word2vec model in spaCy to encode titles by (1) mapping words to their vector embeddings and (2) taking the mean of the word embeddings. To do this I use the following code: import spacy nlp = spacy.load('myspacy.bioword2vec.model') sentence = "I love Stack Overflow butitsalsodistractive" avg_vector = nlp(sentence).vector Where nlp(sentence).vector (1) tokenizes my sentence with white-space splitting, (2) vectorizes each word according to the dictionary provided
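A minimal sketch of the averaging step with out-of-vocabulary words skipped, assuming each token exposes `has_vector` and `vector` as spaCy tokens do. The helper name `mean_vector_ignoring_oov` and the `StubToken` stand-ins are made up for illustration so the example runs without a model; with spaCy one would pass `nlp(sentence)` instead.

```python
from collections import namedtuple


def mean_vector_ignoring_oov(tokens):
    """Average the vectors of tokens that have one; return None if none do."""
    vectors = [token.vector for token in tokens if token.has_vector]
    if not vectors:
        return None
    count = len(vectors)
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(components) / count for components in zip(*vectors)]


# Stand-in tokens (hypothetical values) so the sketch runs without spaCy.
StubToken = namedtuple("StubToken", ["text", "has_vector", "vector"])
tokens = [
    StubToken("love", True, [1.0, 3.0]),
    StubToken("Overflow", True, [3.0, 5.0]),
    StubToken("butitsalsodistractive", False, [0.0, 0.0]),
]
avg_vector = mean_vector_ignoring_oov(tokens)
```

Whether `has_vector` is false for a given unknown word depends on the vector table loaded, so it is worth checking on a few known out-of-vocabulary tokens first.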

Custom entity ruler with SpaCy did not return a match

Question: This link shows how to create a custom entity ruler. I basically copied and modified the code for another custom entity ruler and used it to find a match in a doc as follows: nlp = spacy.load('en_core_web_lg') ruler = EntityRuler(nlp) grades = ["Level 1", "Level 2", "Level 3", "Level 4"] for item in grades: ruler.add_patterns([{"label": "LEVEL", "pattern": item}]) nlp.add_pipe(ruler) doc = nlp('Level 2 employee first 12 months 1032.70') with doc.retokenize() as retokenizer: for ent in doc.ents:

Percentage Count Verb, Noun using Spacy?

Question: I want to compute the percentage split of POS tags in a sentence using spaCy, similar to "Count verbs, nouns, and other parts of speech with python's NLTK". I am currently able to detect and count POS tags. How do I find the percentage split? from __future__ import unicode_literals import spacy,en_core_web_sm from collections import Counter nlp = en_core_web_sm.load() print Counter(([token.pos_ for token in nlp('The cat sat on the mat.')])) Current output: Counter({u'NOUN': 2, u'DET': 2, u'VERB': 1, u'ADP': 1, u'PUNCT':
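Going from counts to percentages only needs the total number of tokens. The sketch below works on a plain list of tag strings; the helper name `pos_percentages` is hypothetical, and the tag list is written out by hand to mirror the counts shown in the question rather than produced by a model.

```python
from collections import Counter


def pos_percentages(tags):
    """Map each POS tag to its share of all tokens, in percent."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100.0 * count / total for tag, count in counts.items()}


# With spaCy this list would be [token.pos_ for token in nlp(text)].
tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN", "PUNCT"]
percentages = pos_percentages(tags)
for tag, share in sorted(percentages.items()):
    print(f"{tag}: {share:.1f}%")
```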

Spacy LIKE_NUM cast to its python number equivalent

Question: Does spaCy provide a quick conversion from a LIKE_NUM token to a Python float or decimal? spaCy can match a LIKE_NUM token like “31,2”, “10.9”, “10”, “ten”, etc. Does it provide a quick way to get a Python number as well? I was expecting a method like .get_value() to return the number (not the string), but I couldn't find any. nlp = spacy.load('en_core_web_sm') matcher = Matcher(nlp.vocab) text = "this is just a text and a number 10,2 or 10.2 meaning ten point two" doc = nlp(text) pattern = [{
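If no built-in conversion turns up, a small helper applied to the token text is one option. The sketch below is not a spaCy API: `to_number` and `WORD_VALUES` are hypothetical names, the word table only covers zero to ten, and a comma is assumed to be a decimal separator (so "10,2" means 10.2), which will be wrong for texts that use commas as thousands separators.

```python
from decimal import Decimal, InvalidOperation

# Deliberately tiny word table; extend it for real use.
WORD_VALUES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def to_number(text):
    """Convert number-like text to a Decimal, or return None if it cannot be parsed."""
    cleaned = text.strip().lower()
    if cleaned in WORD_VALUES:
        return Decimal(WORD_VALUES[cleaned])
    try:
        # Assumption: a comma is a decimal separator, as in "10,2".
        return Decimal(cleaned.replace(",", "."))
    except InvalidOperation:
        return None
```

In a pipeline this could be called as `to_number(token.text)` for tokens where `token.like_num` is true.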

How to write code and run Python files using spaCy? (using Windows)

Question: I want to implement a new model language for spaCy. I have installed spaCy (following the guide on the official website) on my Windows OS, but I haven't understood where and how I could write and run my future files. Help me, thanks. Answer 1: I hope I understand your question correctly: If you only want to use spaCy, you can simply create a Python file, import spacy and run it. However, if you want to add things to the spaCy source – for example to add new language data that doesn't yet exist – you

Extracting name as first name last name in python

Question: I have a text file with lines such as: Acosta, Christina, M.D. is a heart doctor Alissa Russo, M.D. is a heart doctor Is there a way to convert the line: Acosta, Christina, M.D. is a heart doctor to Christina Acosta, M.D. is a heart doctor Expected Output: Christina Acosta, M.D. is a heart doctor Alissa Russo, M.D. is a heart doctor Answer 1: You can use the following regex to group the first and last names and substitute them in reverse order without the comma: import re data = '''Acosta, Christina, M
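Since the answer's own code is cut off above, here is an independent sketch of the same idea, not the original regex. It assumes a "Last, First, " prefix only occurs at the start of a line and that names consist of letters, apostrophes and hyphens; `NAME_PATTERN` and `first_name_first` are hypothetical names.

```python
import re

# Matches "Last, First, " at the start of a line and captures both names.
NAME_PATTERN = re.compile(r"^([A-Za-z'-]+), ([A-Za-z'-]+), ", re.MULTILINE)


def first_name_first(text):
    """Rewrite 'Last, First, ...' line prefixes as 'First Last, ...'."""
    return NAME_PATTERN.sub(r"\2 \1, ", text)


data = (
    "Acosta, Christina, M.D. is a heart doctor\n"
    "Alissa Russo, M.D. is a heart doctor"
)
result = first_name_first(data)
print(result)
```

Lines already in "First Last" order are left untouched because their first word is not followed by a comma.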

How to get similar words related to one word?

Question: I am trying to solve an NLP problem where I have a dict of words like: list_1={'phone':'android','chair':'netflit','charger':'macbook','laptop':'sony'} Now if the input is 'phone' I can easily use the 'in' operator to get the description of phone and its data by key, but the problem is when the input is something like 'phones' or 'Phones'. I want it so that if I input 'phone' I get words like 'phone' ==> 'Phones','phones','Phone','Phone's','phone's' I don't know which word2vec I can use and which NLP module can
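For the variants listed (case, plural "s", possessive "'s"), normalizing the input before the lookup may be enough, without any word vectors. The sketch below is a naive string normalization, not a word2vec or lemmatizer approach; `normalize` and `lookup` are hypothetical names, and the last dict pair is assumed to be 'laptop': 'sony'.

```python
list_1 = {"phone": "android", "chair": "netflit", "charger": "macbook", "laptop": "sony"}


def normalize(word):
    """Lower-case the word and drop a trailing possessive 's."""
    word = word.strip().lower()
    if word.endswith("'s"):
        word = word[:-2]
    return word


def lookup(word, table):
    """Look up a word, also trying it without a trailing plural 's'."""
    key = normalize(word)
    if key in table:
        return table[key]
    if key.endswith("s") and key[:-1] in table:
        return table[key[:-1]]
    return None


for variant in ["phone", "Phones", "phones", "Phone", "Phone's", "phone's"]:
    print(variant, "->", lookup(variant, list_1))
```

Irregular plurals or misspellings would need a real lemmatizer or fuzzy matching, which this sketch does not attempt.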

spacy rule matcher on unit of measure before or after digit

Question: I am new to spaCy and I am trying to match some measurements in some text. My problem is that the unit of measure is sometimes before and sometimes after the value. In other cases it has a different name. Here is some code: nlp = spacy.load('en_core_web_sm') # case 1: text = "the surface is 31 sq" # case 2: # text = "the surface is sq 31" # case 3: # text = "the surface is square meters 31" # case 4: # text = "the surface is 31 square meters" # case 5: # text = "the surface is about 31
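To make the "unit before or after the value" idea concrete, here is a sketch that uses a plain regular expression instead of the spaCy Matcher the question asks about. It only covers cases 1 to 4 and only the two unit spellings shown ("sq" and "square meters"); `PATTERN` and `find_measure` are hypothetical names.

```python
import re

UNIT = r"(?:square\s+meters|sq)"
NUMBER = r"\d+(?:[.,]\d+)?"

# Two alternatives: unit then value, or value then unit.
PATTERN = re.compile(
    rf"\b(?:{UNIT}\s+(?P<value_after_unit>{NUMBER})"
    rf"|(?P<value_before_unit>{NUMBER})\s+{UNIT})\b",
    re.IGNORECASE,
)


def find_measure(text):
    """Return the numeric part of the first measurement found, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return match.group("value_after_unit") or match.group("value_before_unit")
```

The same two-alternative structure (unit then number, number then unit) is what a token-based matcher would need as two separate patterns.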

Error while loading english module in spacy

Question: I am working on Ubuntu 16.04, in a Jupyter notebook. I just installed the latest version of spaCy using the following command, because my English module wasn't downloading: conda install -c conda-forge spacy=2.0.11 However, while installing spaCy using the above command it said: The following packages will be REMOVED: anaconda: 5.2.0-py36_3 While loading the English module via: import spacy nlp = spacy.load('en') I get the following: KeyError Traceback (most recent call last) <ipython-input-15