spaCy

How to identify Abbreviations/Acronyms and expand them in spaCy?

Submitted by 折月煮酒 on 2020-07-08 17:10:39
Question: I have a large (~50k) term list, and a number of these key phrases/terms have corresponding acronyms/abbreviations. I need a fast way of finding either the abbreviation or the expanded form (e.g. MS -> Microsoft) and then replacing it with the full expansion plus the abbreviation (e.g. Microsoft -> Microsoft (MS), or MS -> Microsoft (MS)). I am very new to spaCy, so my naive approach was going to be to use spacy_lookup and use both the abbreviation and the expanded
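One fast way to do this is spaCy's `PhraseMatcher`, which matches token sequences against a large term list efficiently. The sketch below is illustrative, not the asker's code: the `replacements` mapping and `expand` helper are invented, and only a blank tokenizer pipeline is needed.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Illustrative mapping: both the abbreviation and its expansion map to
# the canonical "expansion (abbreviation)" form.
replacements = {
    "MS": "Microsoft (MS)",
    "Microsoft": "Microsoft (MS)",
}

nlp = spacy.blank("en")  # only the tokenizer is needed for phrase matching
matcher = PhraseMatcher(nlp.vocab)
matcher.add("TERMS", [nlp.make_doc(term) for term in replacements])

def expand(text):
    doc = nlp(text)
    spans = [doc[start:end] for _, start, end in matcher(doc)]
    # Replace right-to-left so earlier character offsets stay valid.
    for span in sorted(spans, key=lambda s: s.start_char, reverse=True):
        text = text[:span.start_char] + replacements[span.text] + text[span.end_char:]
    return text

print(expand("MS ships a new product."))  # Microsoft (MS) ships a new product.
```

For 50k terms, building the patterns once with `nlp.make_doc` (skipping the full pipeline) keeps setup fast.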

Spacy lemmatization of a single word

Submitted by 99封情书 on 2020-07-07 07:32:15
Question: I am trying to get the lemmatized version of a single word. Is there a way to do this using spaCy (a fantastic Python NLP library)? Below is the code I have tried, but it does not work: from spacy.lemmatizer import Lemmatizer from spacy.lookups import Lookups lookups = Lookups() lemmatizer = Lemmatizer(lookups) word = "ducks" lemmas = lemmatizer.lookup(word) print(lemmas) The result I was hoping for was that the word "ducks" (plural) would result in "duck" (singular). Unfortunately, "ducks"

Python / Pandas / spacy - iterate over a DataFrame and count the number of pos_ tags

Submitted by 时间秒杀一切 on 2020-07-07 05:40:02
Question: I have a pandas DataFrame with some texts from an author and want to do some statistics on the counts of the different word types. DataFrame - my data: >>> data name style text year year_dt number 0001 Demetrius D Demetrius an der russischen Grenze Er ist vo... 1805 1805-01-01 0002 Der versöhnte Menschenfeind D Der versöhnte Menschenfeind -Fragment Gegend... 1790 1790-01-01 0003 Die Braut von Messina D Die Braut von Messina oder die feindlichen B... 1803 1803-01-01 Some months ago I

removing stop words using spacy

Submitted by 二次信任 on 2020-07-05 11:41:05
Question: I am cleaning a column in my DataFrame, Sumcription, and am trying to do three things: tokenize, lemmatize, and remove stop words. import spacy nlp = spacy.load('en_core_web_sm', parser=False, entity=False) df['Tokens'] = df.Sumcription.apply(lambda x: nlp(x)) spacy_stopwords = spacy.lang.en.stop_words.STOP_WORDS spacy_stopwords.add('attach') df['Lema_Token'] = df.Tokens.apply(lambda x: " ".join([token.lemma_ for token in x if token not in spacy_stopwords])) However, when I print, for example: df.Lema

How to do text pre-processing using spaCy?

Submitted by 懵懂的女人 on 2020-07-04 16:47:55
Question: How can I do preprocessing steps like stop word removal, punctuation removal, stemming, and lemmatization in spaCy using Python? I have text data in a CSV file, as paragraphs and sentences, and I want to do text cleaning. Kindly give an example that loads the CSV into a pandas DataFrame. Answer 1: This may help anyone looking for an answer to this question. import spacy #load spacy nlp = spacy.load("en", disable=['parser', 'tagger', 'ner']) stops = stopwords.words("english") def normalize(comment, lowercase, remove

In spaCy NLP, how to extract the agent, action, and patient, as well as cause/effect relations?

Submitted by 倖福魔咒の on 2020-06-29 05:04:14
Question: I would like to use spaCy to extract word-relation information in the form of "agent, action, and patient." For example, "Autonomous cars shift insurance liability toward manufacturers" -> ("autonomous cars", "shift", "liability") or ("autonomous cars", "shift", "liability towards manufacturers"). In other words, "who did what to whom" and "what applied the action to something else." I don't know much about my input data, so I can't make many assumptions. I also want to extract logical

Patterns with ENT_TYPE from manually labelled Span not working

Submitted by 半世苍凉 on 2020-06-28 04:03:51
Question: As an alternative to accomplishing this: Patterns with multi-term entries in the IN attribute, I wrote the following code to match phrases, label them, and then use them in EntityRuler patterns: # %% import spacy from spacy.matcher import PhraseMatcher from spacy.pipeline import EntityRuler from spacy.tokens import Span class PhraseRuler(object): name = 'phrase_ruler' def __init__(self, nlp, terms, label): patterns = [nlp(term) for term in terms] self.matcher = PhraseMatcher(nlp.vocab) self
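The usual pitfall with this setup: `ENT_TYPE` patterns read `token.ent_type_`, which is only populated once the labelled `Span` has actually been written to `doc.ents` by a component that runs before the matcher. A minimal standalone sketch (text and labels are invented, not from the question):

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("I love New York pizza")

# Manually label "New York": find it, wrap it in a Span, write it to doc.ents.
phrase_matcher = PhraseMatcher(nlp.vocab)
phrase_matcher.add("CITY", [nlp.make_doc("New York")])
_, start, end = phrase_matcher(doc)[0]
doc.ents = [Span(doc, start, end, label="CITY")]

# ENT_TYPE now sees the manual label on every token inside the span.
matcher = Matcher(nlp.vocab)
matcher.add("CITY_PIZZA", [[{"ENT_TYPE": "CITY", "OP": "+"}, {"LOWER": "pizza"}]])
matches = [doc[s:e].text for _, s, e in matcher(doc)]
print(matches)  # includes "New York pizza"
```

In a real pipeline the labelling step would live in a custom component added with `nlp.add_pipe` before whatever consumes the `ENT_TYPE` attribute.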