spacy

Removing punctuation using spaCy; AttributeError

。_饼干妹妹 submitted on 2021-02-19 03:00:24

Question: Currently I'm using the following code to lemmatize and calculate TF-IDF values for some text data using spaCy:

    lemma = []
    for doc in nlp.pipe(df['col'].astype('unicode').values, batch_size=9844, n_threads=3):
        if doc.is_parsed:
            lemma.append([n.lemma_ for n in doc if not n.lemma_.is_punct | n.lemma_ != "-PRON-"])
        else:
            lemma.append(None)
    df['lemma_col'] = lemma
    vect = sklearn.feature_extraction.text.TfidfVectorizer()
    lemmas = df['lemma_col'].apply(lambda x: ' '.join(x))
    vect = sklearn.feature
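The AttributeError in the snippet above arises because token.lemma_ is a plain string, which has no is_punct attribute; punctuation flags live on the Token object itself, and the bitwise | does not express the intended logic either. A minimal sketch of the corrected filter, using a stand-in Token class rather than a real spaCy pipeline:

```python
from dataclasses import dataclass

@dataclass
class Token:
    # stand-in for spacy.tokens.Token: is_punct is a flag on the Token,
    # while lemma_ is a plain string (so lemma_.is_punct raises AttributeError)
    lemma_: str
    is_punct: bool

def clean_lemmas(doc):
    # keep lemmas that are neither punctuation nor the "-PRON-" placeholder
    return [t.lemma_ for t in doc if not t.is_punct and t.lemma_ != "-PRON-"]

doc = [Token("hello", False), Token(",", True), Token("-PRON-", False), Token("world", False)]
print(clean_lemmas(doc))  # ['hello', 'world']
```

With a real pipeline the same condition reads `not n.is_punct and n.lemma_ != "-PRON-"` inside the list comprehension.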

Spacy lemmatizer issue/consistency

徘徊边缘 submitted on 2021-02-11 18:21:09

Question: I'm currently using spaCy for NLP purposes (mainly lemmatization and tokenization). The model used is en-core-web-sm (2.1.0). The following code is run to retrieve a list of words "cleansed" from a query:

    import spacy
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(query)
    list_words = []
    for token in doc:
        if token.text != ' ':
            list_words.append(token.lemma_)

However, I face a major issue when running this code. For example, when the query is "processing of tea leaves". The result stored in list
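spaCy's lemma depends on the part-of-speech tag the tagger assigns, so in this query "leaves" can be tagged as a verb and lemmatized to "leave" rather than "leaf". One hedged workaround is to overlay a small domain-specific exception table on top of whatever lemmas the pipeline returns; the (text, lemma) pairs below are simulated, not produced by spaCy:

```python
# simulated (text, lemma) pairs as a spaCy pipeline might return them
tokens = [("processing", "process"), ("of", "of"), ("tea", "tea"), ("leaves", "leave")]

# domain-specific overrides applied after lemmatization
LEMMA_OVERRIDES = {"leaves": "leaf"}

def lemmatize_with_overrides(pairs, overrides):
    # prefer the override when the surface form is listed, else keep the model's lemma
    return [overrides.get(text.lower(), lemma) for text, lemma in pairs]

print(lemmatize_with_overrides(tokens, LEMMA_OVERRIDES))
# ['process', 'of', 'tea', 'leaf']
```

This trades generality for consistency: every occurrence of "leaves" gets the noun lemma, which may or may not be what a given corpus needs.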

Extract named entities and their corresponding numerical values from a sentence

▼魔方 西西 submitted on 2021-02-11 13:59:26

Question: I want to extract information from sentences. Currently, I am able to do the following using spaCy:

    Amy's monthly payment is $2000. --> (Amy's monthly payment, $2000)

However, I am trying to do the following:

    The monthly payments for Amy, Bob, and Eva are $2000, $3000 and $3500 respectively.
    --> ((Amy's monthly payment, $2000), (Bob's monthly payment, $3000), (Eva's monthly payment, $3500))

Is there any way that I can perform this task using an NLP method through a Python library such as spaCy?
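For the "respectively" pattern, one approach is to extract the person mentions and the money mentions separately and zip them positionally. A sketch of that pairing logic, using regexes as a crude stand-in for spaCy's PERSON and MONEY entities (a real pipeline would read them off doc.ents instead):

```python
import re

def pair_names_with_amounts(sentence):
    # stand-in for spaCy PERSON entities: capitalized words,
    # minus a few sentence-initial function words
    names = [w for w in re.findall(r"\b[A-Z][a-z]+\b", sentence)
             if w not in {"The", "A", "An"}]
    # stand-in for spaCy MONEY entities
    amounts = re.findall(r"\$\d+(?:,\d{3})*", sentence)
    # "respectively" implies positional correspondence
    return list(zip(names, amounts))

s = "The monthly payments for Amy, Bob, and Eva are $2000, $3000 and $3500 respectively."
print(pair_names_with_amounts(s))
# [('Amy', '$2000'), ('Bob', '$3000'), ('Eva', '$3500')]
```

The positional zip is the fragile part: it only holds when the sentence really lists values in the same order as the names, which is what "respectively" signals.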

SPACY custom NER is not returning any entity

主宰稳场 submitted on 2021-02-11 13:24:58

Question: I am trying to train a spaCy model to recognize a few custom NERs. The training data is given below; it is mostly related to recognizing a few server models, dates in the FY format, and types of HDD:

    TRAIN_DATA = [('Send me the number of units shipped in FY21 for A566TY server',
                   {'entities': [(39, 42, 'DateParse'), (48, 53, 'server')]}),
                  ('Send me the number of units shipped in FY-21 for A5890Y server',
                   {'entities': [(39, 43, 'DateParse'), (49, 53, 'server')]}),
                  ('How many systems sold with 3.5 inch
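A common reason a freshly trained pipeline returns no entities is that the character offsets in TRAIN_DATA do not line up with the text, so spaCy drops the misaligned spans. In the first example above, characters 39–42 cover only "FY2", not "FY21". A small checker (pure Python, no spaCy needed) that shows what each annotated span actually selects:

```python
def check_entity_offsets(train_data):
    # report every annotated span alongside the substring it actually covers
    report = []
    for text, ann in train_data:
        for start, end, label in ann["entities"]:
            report.append((label, start, end, text[start:end]))
    return report

TRAIN_DATA = [
    ("Send me the number of units shipped in FY21 for A566TY server",
     {"entities": [(39, 42, "DateParse"), (48, 53, "server")]}),
]
for row in check_entity_offsets(TRAIN_DATA):
    print(row)
# ('DateParse', 39, 42, 'FY2')  <- should be (39, 43) to cover 'FY21'
# ('server', 48, 53, 'A566T')   <- should be (48, 54) to cover 'A566TY'
```

Running this over the whole training set before training makes off-by-one annotations visible immediately.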

How to get phrase count in Spacy phrasematcher

若如初见. submitted on 2021-02-10 22:37:35

Question: I am trying spaCy's PhraseMatcher. I have used an adaptation of the example given on the website, like below:

    color_patterns = [nlp(text) for text in ('red', 'green', 'yellow')]
    product_patterns = [nlp(text) for text in ('boots', 'coats', 'bag')]
    material_patterns = [nlp(text) for text in ('bat', 'yellow ball')]
    matcher = PhraseMatcher(nlp.vocab)
    matcher.add('COLOR', None, *color_patterns)
    matcher.add('PRODUCT', None, *product_patterns)
    matcher.add('MATERIAL', None, *material_patterns)
    doc =
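Calling the matcher on a doc returns a list of (match_id, start, end) triples, so counting hits per rule is a matter of mapping each match_id back to its label (via nlp.vocab.strings[match_id] in real spaCy) and tallying with collections.Counter. Sketched here on simulated matcher output with the labels already resolved:

```python
from collections import Counter

# simulated matcher output: (label, start, end); a real PhraseMatcher
# yields hash ids that you resolve with nlp.vocab.strings[match_id]
matches = [("COLOR", 0, 1), ("PRODUCT", 2, 3), ("COLOR", 5, 6), ("MATERIAL", 7, 9)]

counts = Counter(label for label, start, end in matches)
print(counts)  # Counter({'COLOR': 2, 'PRODUCT': 1, 'MATERIAL': 1})
```

Counter also gives per-phrase counts directly if you tally doc[start:end].text instead of the label.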

spaCy: Word in vocabulary

a 夏天 submitted on 2021-02-10 18:31:53

Question: I am trying to do typo correction with spaCy, and for that I need to know whether a word exists in the vocab or not. If not, the idea is to split the word in two until all segments do exist. For example, "ofthe" does not exist, while "of" and "the" do. So I first need to know if a word exists in the vocab. That's where the problems start. I try:

    for token in nlp("apple"):
        print(token.lemma_, token.lemma, token.is_oov, "apple" in nlp.vocab)
    apple 8566208034543834098 True True
    for token in nlp("andshy"):
        print
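Note that `"word" in nlp.vocab` only checks whether the string has been interned, so it can report True for strings that are not real words, and is_oov behaves differently across models (the sm models ship without word vectors). Independently of how "known word" is decided, the splitting idea itself can be sketched against an explicit word set standing in for a reliable vocabulary check:

```python
def split_into_known(word, vocab):
    # split `word` so that every segment is in `vocab`; return None if impossible
    if word in vocab:
        return [word]
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in vocab:
            rest = split_into_known(tail, vocab)
            if rest is not None:
                return [head] + rest
    return None

VOCAB = {"of", "the", "and", "shy", "apple"}
print(split_into_known("ofthe", VOCAB))   # ['of', 'the']
print(split_into_known("andshy", VOCAB))  # ['and', 'shy']
```

In practice the vocab set could be a frequency-filtered word list, since a raw spaCy Vocab is not a dictionary of valid English words.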

Spacy replace token

纵然是瞬间 submitted on 2021-02-10 14:53:23

Question: I am trying to replace a word without destroying the space structure in the sentence. Suppose I have the sentence text = "Hi this is my dog." and I wish to replace dog with Simba. Following the answer from https://stackoverflow.com/a/57206316/2530674 I did:

    import spacy
    nlp = spacy.load("en_core_web_lg")
    from spacy.tokens import Doc
    doc1 = nlp("Hi this is my dog.")
    new_words = [token.text if token.text != "dog" else "Simba" for token in doc1]
    Doc(doc1.vocab, words=new_words)  # Hi this is my
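Building a Doc from words alone loses the original spacing, because the constructor assumes a trailing space after every word unless a matching `spaces` list is passed. One way to preserve the spacing is to rebuild the string from each token's text plus its trailing whitespace; in real spaCy those are token.text and token.whitespace_, simulated here as plain pairs:

```python
def replace_word(token_pairs, target, replacement):
    # token_pairs: (text, trailing_whitespace) pairs, as spaCy exposes
    # via token.text and token.whitespace_
    return "".join(
        (replacement if text == target else text) + ws
        for text, ws in token_pairs
    )

# simulated tokenization of "Hi this is my dog."
pairs = [("Hi", " "), ("this", " "), ("is", " "), ("my", " "), ("dog", ""), (".", "")]
print(replace_word(pairs, "dog", "Simba"))  # Hi this is my Simba.
```

With a real doc the pairs are simply [(t.text, t.whitespace_) for t in doc1], so no space appears before the final period.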