spacy

Removing punctuation using spaCy; AttributeError

。_饼干妹妹 submitted on 2021-02-19 03:00:24

Question: Currently I'm using the following code to lemmatize and calculate TF-IDF values for some text data using spaCy:

    lemma = []
    for doc in nlp.pipe(df['col'].astype('unicode').values, batch_size=9844, n_threads=3):
        if doc.is_parsed:
            lemma.append([n.lemma_ for n in doc if not n.lemma_.is_punct | n.lemma_ != "-PRON-"])
        else:
            lemma.append(None)
    df['lemma_col'] = lemma
    vect = sklearn.feature_extraction.text.TfidfVectorizer()
    lemmas = df['lemma_col'].apply(lambda x: ' '.join(x))
    vect = sklearn.feature
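The AttributeError in the snippet above arises because token.lemma_ is a plain string, which has no is_punct attribute; punctuation flags live on the Token object itself, and the bitwise | does not express the intended logic either. A minimal sketch of the corrected filter, using a stand-in Token class rather than a real spaCy pipeline:

```python
from dataclasses import dataclass

@dataclass
class Token:
    # stand-in for spacy.tokens.Token: is_punct is a flag on the Token,
    # while lemma_ is a plain string (so lemma_.is_punct raises AttributeError)
    lemma_: str
    is_punct: bool

def clean_lemmas(doc):
    # keep lemmas that are neither punctuation nor the "-PRON-" placeholder
    return [t.lemma_ for t in doc if not t.is_punct and t.lemma_ != "-PRON-"]

doc = [Token("hello", False), Token(",", True), Token("-PRON-", False), Token("world", False)]
print(clean_lemmas(doc))  # ['hello', 'world']
```

With a real pipeline the same condition reads `not n.is_punct and n.lemma_ != "-PRON-"` inside the list comprehension.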

Spacy lemmatizer issue/consistency

徘徊边缘 submitted on 2021-02-11 18:21:09

Question: I'm currently using spaCy for NLP purposes (mainly lemmatization and tokenization). The model used is en-core-web-sm (2.1.0). The following code is run to retrieve a list of words "cleansed" from a query:

    import spacy
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(query)
    list_words = []
    for token in doc:
        if token.text != ' ':
            list_words.append(token.lemma_)

However, I face a major issue when running this code. For example, when the query is "processing of tea leaves". The result stored in list
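spaCy's lemma depends on the part-of-speech tag the tagger assigns, so in this query "leaves" can be tagged as a verb and lemmatized to "leave" rather than "leaf". One hedged workaround is to overlay a small domain-specific exception table on top of whatever lemmas the pipeline returns; the (text, lemma) pairs below are simulated, not produced by spaCy:

```python
# simulated (text, lemma) pairs as a spaCy pipeline might return them
tokens = [("processing", "process"), ("of", "of"), ("tea", "tea"), ("leaves", "leave")]

# domain-specific overrides applied after lemmatization
LEMMA_OVERRIDES = {"leaves": "leaf"}

def lemmatize_with_overrides(pairs, overrides):
    # prefer the override when the surface form is listed, else keep the model's lemma
    return [overrides.get(text.lower(), lemma) for text, lemma in pairs]

print(lemmatize_with_overrides(tokens, LEMMA_OVERRIDES))
# ['process', 'of', 'tea', 'leaf']
```

This trades generality for consistency: every occurrence of "leaves" gets the noun lemma, which may or may not be what a given corpus needs.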

Extract named entities and their corresponding numerical values from a sentence

▼魔方 西西 submitted on 2021-02-11 13:59:26

Question: I want to extract information from sentences. Currently, I am able to do the following using spaCy:

    Amy's monthly payment is $2000. --> (Amy's monthly payment, $2000)

However, I am trying to do the following:

    The monthly payments for Amy, Bob, and Eva are $2000, $3000 and $3500 respectively.
    --> ((Amy's monthly payment, $2000), (Bob's monthly payment, $3000), (Eva's monthly payment, $3500))

Is there any way that I can perform this task using an NLP method through a Python library such as spaCy?
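For the "respectively" pattern, one approach is to extract the person mentions and the money mentions separately and zip them positionally. A sketch of that pairing logic, using regexes as a crude stand-in for spaCy's PERSON and MONEY entities (a real pipeline would read them off doc.ents instead):

```python
import re

def pair_names_with_amounts(sentence):
    # stand-in for spaCy PERSON entities: capitalized words,
    # minus a few sentence-initial function words
    names = [w for w in re.findall(r"\b[A-Z][a-z]+\b", sentence)
             if w not in {"The", "A", "An"}]
    # stand-in for spaCy MONEY entities
    amounts = re.findall(r"\$\d+(?:,\d{3})*", sentence)
    # "respectively" implies positional correspondence
    return list(zip(names, amounts))

s = "The monthly payments for Amy, Bob, and Eva are $2000, $3000 and $3500 respectively."
print(pair_names_with_amounts(s))
# [('Amy', '$2000'), ('Bob', '$3000'), ('Eva', '$3500')]
```

The positional zip is the fragile part: it only holds when the sentence really lists values in the same order as the names, which is what "respectively" signals.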

SPACY custom NER is not returning any entity

主宰稳场 submitted on 2021-02-11 13:24:58

Question: I am trying to train a spaCy model to recognize a few custom NERs. The training data is given below; it is mostly related to recognizing a few server models, dates in the FY format, and types of HDD:

    TRAIN_DATA = [('Send me the number of units shipped in FY21 for A566TY server',
                   {'entities': [(39, 42, 'DateParse'), (48, 53, 'server')]}),
                  ('Send me the number of units shipped in FY-21 for A5890Y server',
                   {'entities': [(39, 43, 'DateParse'), (49, 53, 'server')]}),
                  ('How many systems sold with 3.5 inch
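A common reason a freshly trained pipeline returns no entities is that the character offsets in TRAIN_DATA do not line up with the text, so spaCy drops the misaligned spans. In the first example above, characters 39–42 cover only "FY2", not "FY21". A small checker (pure Python, no spaCy needed) that shows what each annotated span actually selects:

```python
def check_entity_offsets(train_data):
    # report every annotated span alongside the substring it actually covers
    report = []
    for text, ann in train_data:
        for start, end, label in ann["entities"]:
            report.append((label, start, end, text[start:end]))
    return report

TRAIN_DATA = [
    ("Send me the number of units shipped in FY21 for A566TY server",
     {"entities": [(39, 42, "DateParse"), (48, 53, "server")]}),
]
for row in check_entity_offsets(TRAIN_DATA):
    print(row)
# ('DateParse', 39, 42, 'FY2')  <- should be (39, 43) to cover 'FY21'
# ('server', 48, 53, 'A566T')   <- should be (48, 54) to cover 'A566TY'
```

Running this over the whole training set before training makes off-by-one annotations visible immediately.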

How to get phrase count in Spacy phrasematcher

若如初见. submitted on 2021-02-10 22:37:35

Question: I am trying spaCy's PhraseMatcher. I have used an adaptation of the example given on the website, like below:

    color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')]
    product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')]
    material_patterns = [nlp(text) for text in ('bat', 'yellow ball')]
    matcher = PhraseMatcher(nlp.vocab)
    matcher.add('COLOR', None, *color_patterns)
    matcher.add('PRODUCT', None, *product_patterns)
    matcher.add('MATERIAL', None, *material_patterns)
    doc =
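Calling the matcher on a doc returns a list of (match_id, start, end) triples, so counting hits per rule is a matter of mapping each match_id back to its label (via nlp.vocab.strings[match_id] in real spaCy) and tallying with collections.Counter. Sketched here on simulated matcher output with the labels already resolved:

```python
from collections import Counter

# simulated matcher output: (label, start, end); a real PhraseMatcher
# yields hash ids that you resolve with nlp.vocab.strings[match_id]
matches = [("COLOR", 0, 1), ("PRODUCT", 2, 3), ("COLOR", 5, 6), ("MATERIAL", 7, 9)]

counts = Counter(label for label, start, end in matches)
print(counts)  # Counter({'COLOR': 2, 'PRODUCT': 1, 'MATERIAL': 1})
```

Counter also gives per-phrase counts directly if you tally doc[start:end].text instead of the label.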

spaCy: Word in vocabulary

a 夏天 submitted on 2021-02-10 18:31:53

Question: I am trying to do typo correction with spaCy, and for that I need to know whether a word exists in the vocab or not. If not, the idea is to split the word in two until all segments do exist. For example, "ofthe" does not exist, while "of" and "the" do. So I first need to know if a word exists in the vocab. That's where the problems start. I try:

    for token in nlp("apple"):
        print(token.lemma_, token.lemma, token.is_oov, "apple" in nlp.vocab)
    apple 8566208034543834098 True True
    for token in nlp("andshy"):
        print
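Note that `"word" in nlp.vocab` only checks whether the string has been interned, so it can report True for strings that are not real words, and is_oov behaves differently across models (the sm models ship without word vectors). Independently of how "known word" is decided, the splitting idea itself can be sketched against an explicit word set standing in for a reliable vocabulary check:

```python
def split_into_known(word, vocab):
    # split `word` so that every segment is in `vocab`; return None if impossible
    if word in vocab:
        return [word]
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in vocab:
            rest = split_into_known(tail, vocab)
            if rest is not None:
                return [head] + rest
    return None

VOCAB = {"of", "the", "and", "shy", "apple"}
print(split_into_known("ofthe", VOCAB))   # ['of', 'the']
print(split_into_known("andshy", VOCAB))  # ['and', 'shy']
```

In practice the vocab set could be a frequency-filtered word list, since a raw spaCy Vocab is not a dictionary of valid English words.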

Spacy replace token

纵然是瞬间 submitted on 2021-02-10 14:53:23

Question: I am trying to replace a word without destroying the space structure in the sentence. Suppose I have the sentence text = "Hi this is my dog." and I wish to replace dog with Simba. Following the answer from https://stackoverflow.com/a/57206316/2530674 I did:

    import spacy
    nlp = spacy.load("en_core_web_lg")
    from spacy.tokens import Doc
    doc1 = nlp("Hi this is my dog.")
    new_words = [token.text if token.text != "dog" else "Simba" for token in doc1]
    Doc(doc1.vocab, words=new_words)  # Hi this is my
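Building a Doc from words alone loses the original spacing, because the constructor assumes a trailing space after every word unless a matching `spaces` list is passed. One way to preserve the spacing is to rebuild the string from each token's text plus its trailing whitespace; in real spaCy those are token.text and token.whitespace_, simulated here as plain pairs:

```python
def replace_word(token_pairs, target, replacement):
    # token_pairs: (text, trailing_whitespace) pairs, as spaCy exposes
    # via token.text and token.whitespace_
    return "".join(
        (replacement if text == target else text) + ws
        for text, ws in token_pairs
    )

# simulated tokenization of "Hi this is my dog."
pairs = [("Hi", " "), ("this", " "), ("is", " "), ("my", " "), ("dog", ""), (".", "")]
print(replace_word(pairs, "dog", "Simba"))  # Hi this is my Simba.
```

With a real doc the pairs are simply [(t.text, t.whitespace_) for t in doc1], so no space appears before the final period.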