spacy

what to do with non-pip requirement in requirements.txt

混江龙づ霸主 submitted on 2019-12-12 03:27:16
Question: So I recently moved my NLP application over to a new machine. I set up the same Python environment with pyenv as on the old machine and installed all the dependencies with pip. Then there was a 'dependency' of sorts that is not installed by pip; maybe 'model' is a better word for it. The command that installed it is: python -m spacy.en.download Now I want to note that somewhere in my repository, so if one day I or someone else goes to install the whole thing on another PC it's there, noted in
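One possible way to record a non-pip step like this in a repository is a small setup script kept next to requirements.txt. The sketch below is only an illustration under stated assumptions: the script name, the REQUIRED_MODELS list and the model name en_core_web_sm are hypothetical, and it uses the `python -m spacy download <model>` command form of spaCy 2 and later rather than the older spacy.en.download module quoted in the question. By default it only prints what it would run.

```python
"""setup_models.py (hypothetical name): fetch spaCy models that pip install -r does not cover."""
import importlib.util
import subprocess
import sys

# Assumed model list; replace with whatever the project actually needs.
REQUIRED_MODELS = ["en_core_web_sm"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def download_command(model):
    """Build the spaCy download command for one model (spaCy 2+ CLI form)."""
    return [sys.executable, "-m", "spacy", "download", model]


def main(argv):
    # Dry run unless --install is passed, so the script is safe to run by accident.
    install = "--install" in argv
    for model in missing_packages(REQUIRED_MODELS):
        command = download_command(model)
        if install:
            subprocess.check_call(command)
        else:
            print("would run:", " ".join(command))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running `python setup_models.py --install` after `pip install -r requirements.txt` would then perform the download, and a line in the README pointing at the script documents the extra step.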

Ignore out-of-vocabulary words when averaging vectors in Spacy

人盡茶涼 submitted on 2019-12-11 19:09:59
Question: I would like to use a pre-trained word2vec model in spaCy to encode titles by (1) mapping words to their vector embeddings and (2) taking the mean of the word embeddings. To do this I use the following code: import spacy nlp = spacy.load('myspacy.bioword2vec.model') sentence = "I love Stack Overflow butitsalsodistractive" avg_vector = nlp(sentence).vector Where nlp(sentence).vector (1) tokenizes my sentence with white-space splitting, (2) vectorizes each word according to the dictionary provided
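The idea of leaving out-of-vocabulary words out of the average can be sketched without spaCy at all. The example below is a simplified illustration, not the asker's pipeline: the EMBEDDINGS table, its two-dimensional values and the whitespace tokenization are made up for the demo. With a real spaCy Doc, the analogous filter would be on each token's `has_vector` attribute.

```python
# Toy embedding table (hypothetical values, two dimensions for readability).
EMBEDDINGS = {
    "i": [1.0, 0.0],
    "love": [0.0, 1.0],
    "stack": [1.0, 1.0],
    "overflow": [2.0, 2.0],
}


def average_known_vectors(sentence, embeddings):
    """Average the vectors of in-vocabulary words; return None if there are none."""
    words = sentence.lower().split()
    vectors = [embeddings[word] for word in words if word in embeddings]
    if not vectors:
        return None
    dimensions = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dimensions)]


# "butitsalsodistractive" is not in the table, so it does not dilute the mean.
avg = average_known_vectors("I love Stack Overflow butitsalsodistractive", EMBEDDINGS)
print(avg)
```

Dividing by the number of known words, rather than by the number of all tokens, is the design choice that keeps unknown words from pulling the average toward zero.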

Custom entity ruler with SpaCy did not return a match

扶醉桌前 submitted on 2019-12-11 17:12:12
Question: This link shows how to create a custom entity ruler. I basically copied and modified the code for another custom entity ruler and used it to find a match in a doc as follows: nlp = spacy.load('en_core_web_lg') ruler = EntityRuler(nlp) grades = ["Level 1", "Level 2", "Level 3", "Level 4"] for item in grades: ruler.add_patterns([{"label": "LEVEL", "pattern": item}]) nlp.add_pipe(ruler) doc = nlp('Level 2 employee first 12 months 1032.70') with doc.retokenize() as retokenizer: for ent in doc.ents:

Percentage Count Verb, Noun using Spacy?

泪湿孤枕 submitted on 2019-12-11 17:08:47
Question: I want to compute the percentage split of POS tags in a sentence using spaCy, similar to "Count verbs, nouns, and other parts of speech with python's NLTK". I am currently able to detect and count POS tags. How do I find the percentage split? from __future__ import unicode_literals import spacy,en_core_web_sm from collections import Counter nlp = en_core_web_sm.load() print Counter(([token.pos_ for token in nlp('The cat sat on the mat.')])) Current output: Counter({u'NOUN': 2, u'DET': 2, u'VERB': 1, u'ADP': 1, u'PUNCT':
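Turning the counts into percentages only needs the total number of tokens. The following is a minimal sketch in Python 3 (the question's code is Python 2); the tag list is an assumed stand-in for what `[token.pos_ for token in doc]` might return for the example sentence, and `pos_percentages` is a hypothetical helper name.

```python
from collections import Counter


def pos_percentages(tags):
    """Map each POS tag to its share of all tokens, in percent."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: 100.0 * count / total for tag, count in counts.items()}


# Assumed tags for 'The cat sat on the mat.'; a real run would take them from spaCy.
tags = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN", "PUNCT"]
percentages = pos_percentages(tags)
for tag, share in sorted(percentages.items()):
    print(f"{tag}: {share:.1f}%")
```

Whether punctuation should count toward the total is a judgment call; filtering the tag list before calling the helper would exclude it.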

Spacy LIKE_NUM cast to its Python number equivalent

梦想的初衷 submitted on 2019-12-11 16:28:14
Question: Does spaCy provide a quick conversion from a LIKE_NUM token to a Python float or decimal? spaCy can match a LIKE_NUM token like “31,2”, “10.9”, “10”, “ten”, etc. Does it provide a quick way to get a Python number as well? I was expecting a method like .get_value() to return the number (not the string), but I couldn't find any. nlp = spacy.load('en_core_web_sm') matcher = Matcher(nlp.vocab) text = "this is just a text and a number 10,2 or 10.2 meaning ten point two" doc = nlp(text) pattern = [{
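If no built-in conversion is available, a hand-rolled converter over the token text is one option. The sketch below is an assumption-heavy illustration: `to_number` and `WORD_VALUES` are hypothetical names, the word table covers only zero to ten, and a comma is treated as a decimal separator (so "10,2" becomes 10.2), which is wrong for locales that use it as a thousands separator.

```python
import re

# Small, deliberately incomplete table of number words (assumption for the demo).
WORD_VALUES = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Digits with an optional single decimal part written with '.' or ','.
NUMERIC = re.compile(r"[+-]?\d+(?:[.,]\d+)?")


def to_number(text):
    """Convert number-like text to a float, or return None if it is not recognised."""
    cleaned = text.strip().lower()
    if cleaned in WORD_VALUES:
        return float(WORD_VALUES[cleaned])
    if NUMERIC.fullmatch(cleaned):
        # Assumption: ',' is a decimal separator, as in "10,2".
        return float(cleaned.replace(",", "."))
    return None


for sample in ["10,2", "10.2", "10", "ten", "meaning"]:
    print(sample, "->", to_number(sample))
```

With spaCy, the helper could be applied to `token.text` for every token whose `like_num` attribute is true.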

How to write code and run Python files using spaCy? (using Windows)

天涯浪子 submitted on 2019-12-11 15:27:41
Question: I want to implement a new language model for spaCy. I have installed spaCy (following the guide on the official website) on my Windows OS, but I don't understand where and how I should write and run my future files. Please help, thanks. Answer 1: I hope I understand your question correctly: If you only want to use spaCy, you can simply create a Python file, import spacy and run it. However, if you want to add things to the spaCy source – for example to add new language data that doesn't yet exist – you

Extracting name as first name last name in python

可紊 submitted on 2019-12-11 15:15:13
Question: I have a text file with lines such as: Acosta, Christina, M.D. is a heart doctor Alissa Russo, M.D. is a heart doctor Is there a way to convert the line: Acosta, Christina, M.D. is a heart doctor to Christina Acosta, M.D. is a heart doctor Expected Output: Christina Acosta, M.D. is a heart doctor Alissa Russo, M.D. is a heart doctor Answer 1: You can use the following regex to group the first and last names and substitute them in reverse order without the comma: import re data = '''Acosta, Christina, M
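The substitution described in the answer can be sketched as follows. This is not necessarily the answerer's exact pattern, since the excerpt cuts off before it; `LAST_FIRST` and `first_name_first` are hypothetical names, and the pattern assumes single-word names made of word characters only.

```python
import re

DATA = """Acosta, Christina, M.D. is a heart doctor
Alissa Russo, M.D. is a heart doctor"""

# "Last, First, " at the start of a line; lines already in "First Last," form do not match.
LAST_FIRST = re.compile(r"^(\w+), (\w+), ", flags=re.MULTILINE)


def first_name_first(text):
    """Rewrite 'Last, First, rest' as 'First Last, rest' on every matching line."""
    return LAST_FIRST.sub(r"\2 \1, ", text)


print(first_name_first(DATA))
```

Hyphenated or multi-word surnames would need a broader character class than `\w+`.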

How to get similar words related to one word?

橙三吉。 submitted on 2019-12-11 15:03:19
Question: I am trying to solve an NLP problem where I have a dict of words like: list_1={'phone':'android','chair':'netflit','charger':'macbook','laptop':'sony'} Now if the input is 'phone' I can easily use the 'in' operator to get the description of phone and its data by key, but the problem is when the input is something like 'phones' or 'Phones'. If I input 'phone', I want to get words like 'phone' ==> 'Phones','phones','Phone','Phone's','phone's' I don't know which word2vec I can use and which NLP module can
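For the variants listed in the question (case, plural "s", possessive "'s"), normalizing the input before the dictionary lookup may be enough, without any word2vec model. The sketch below is a deliberately naive illustration: `normalize` and `lookup` are hypothetical helpers, and the suffix stripping will mangle words such as "glass"; a lemmatizer (for example spaCy's `token.lemma_`) would be the more robust route.

```python
# Dictionary from the question, with the last pair written as key: value.
list_1 = {"phone": "android", "chair": "netflit", "charger": "macbook", "laptop": "sony"}


def normalize(word):
    """Lowercase and strip a possessive 's and a plural s (naive heuristic)."""
    cleaned = word.strip().lower().replace("\u2019", "'")
    if cleaned.endswith("'s"):
        cleaned = cleaned[:-2]
    if cleaned.endswith("s") and len(cleaned) > 3:
        cleaned = cleaned[:-1]
    return cleaned


def lookup(word, table):
    """Return the entry for the normalized word, or None if it is absent."""
    return table.get(normalize(word))


for variant in ["Phones", "phones", "Phone", "Phone's", "phone's"]:
    print(variant, "->", lookup(variant, list_1))
```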

spacy rule matcher on unit of measure before or after digit

霸气de小男生 submitted on 2019-12-11 14:32:00
Question: I am new to spaCy and I am trying to match some measurements in some text. My problem is that the unit of measure is sometimes before and sometimes after the value. In other cases it has a different name. Here is some code: nlp = spacy.load('en_core_web_sm') # case 1: text = "the surface is 31 sq" # case 2: # text = "the surface is sq 31" # case 3: # text = "the surface is square meters 31" # case 4: # text = "the surface is 31 square meters" # case 5: # text = "the surface is about 31
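The "unit before or after the value" logic for cases 1 to 4 can be sketched with plain regular expressions. Note that this swaps in Python's `re` module instead of spaCy's Matcher, which the question asks about; the two unit spellings are taken from the question, while `UNIT`, `AREA` and `find_area` are hypothetical names.

```python
import re

# Unit spellings seen in cases 1-4; the longer alternative comes first.
UNIT = r"(?:square meters|sq)"

# Either "<value> <unit>" or "<unit> <value>".
AREA = re.compile(
    rf"\b(?:(?P<value_first>\d+)\s+(?P<unit_after>{UNIT})"
    rf"|(?P<unit_before>{UNIT})\s+(?P<value_last>\d+))\b"
)


def find_area(text):
    """Return (value, unit) for the first measurement found, or None."""
    match = AREA.search(text)
    if match is None:
        return None
    value = match.group("value_first") or match.group("value_last")
    unit = match.group("unit_after") or match.group("unit_before")
    return int(value), unit


for text in [
    "the surface is 31 sq",
    "the surface is sq 31",
    "the surface is square meters 31",
    "the surface is 31 square meters",
]:
    print(text, "->", find_area(text))
```

The same two-alternative structure should carry over to token-based rules: one pattern with the number first and one with the unit first, registered under the same label.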

Error while loading english module in spacy

只谈情不闲聊 submitted on 2019-12-11 12:59:08
Question: I am working on Ubuntu 16.04, in a Jupyter notebook. I just installed the latest version of spaCy using the following command, because my English module wasn't downloading: conda install -c conda-forge spacy=2.0.11 However, while installing spaCy using the above command it said: The following packages will be REMOVED: anaconda: 5.2.0-py36_3 While loading the English module via: import spacy nlp = spacy.load('en') I get the following: KeyError Traceback (most recent call last) <ipython-input-15