spacy

Spacy replace token

Submitted by 允我心安 on 2021-02-10 14:53:19
Question: I am trying to replace a word without destroying the space structure of the sentence. Suppose I have the sentence text = "Hi this is my dog." and I wish to replace "dog" with "Simba". Following the answer from https://stackoverflow.com/a/57206316/2530674, I did:

import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_lg")
doc1 = nlp("Hi this is my dog.")
new_words = [token.text if token.text != "dog" else "Simba" for token in doc1]
Doc(doc1.vocab, words=new_words)  # Hi this is my
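
The likely cause is that Doc() drops spacing information unless it is also given a `spaces` argument (a list of booleans saying whether each token is followed by a space, mirroring token.whitespace_ — spaCy v2's Doc(vocab, words=..., spaces=...) signature). A spaCy-free sketch of that logic, with tokens stubbed as (text, trailing_whitespace) pairs:

```python
# Tokens modeled as (text, trailing_whitespace) pairs, mirroring
# token.text and token.whitespace_ in spaCy.
tokens = [("Hi", " "), ("this", " "), ("is", " "), ("my", " "), ("dog", ""), (".", "")]

# Swap the word, but keep the per-token spacing flags alongside it.
new_words = [text if text != "dog" else "Simba" for text, _ in tokens]
spaces = [bool(ws) for _, ws in tokens]

# Reconstruct the text the way Doc(words=..., spaces=...) would.
rebuilt = "".join(w + (" " if sp else "") for w, sp in zip(new_words, spaces))
print(rebuilt)  # Hi this is my Simba.
```

Note that without the `spaces` flags, "Simba" and "." would either be glued together or space-separated incorrectly; the flags are what preserve "Simba." at the end.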

Pip install error exit status 1 while installing a pip package

Submitted by 醉酒当歌 on 2021-02-10 11:52:48
Question: I'm having an issue while trying to install the pyresparser Python library. The issue seems to be with its spaCy dependency. How can I solve this and install it successfully? I am a rookie at Python.

C:\Users\User>pip install pyresparser
Collecting pyresparser
  Using cached https://files.pythonhosted.org/packages/ad/8f/5a55cfb269621d3374a6ba4aed390267f65bdf6c4fed8b1c0cbf5a118f0e/pyresparser-1.0.2-py3-none-any.whl
Collecting idna>=2.8 (from pyresparser)
  Downloading https://files.pythonhosted.org

How to select only first entity extracted from spacy entities?

Submitted by 允我心安 on 2021-02-10 05:22:10
Question: I am trying to use the following code to extract entities from text available in a DataFrame:

for i in df['Text'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':

I need to store the text of the first GPE together with its corresponding column of text. For instance, if the following is the text at index 0 in the column df['Text']:

Match between USA and Canada was postponed

then I need only the first location (USA) in another column, such as df['Place'], at the index corresponding to the Text, which is 0. df
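
The core of the question is "take the first entity with a given label, per row". A sketch of that selection logic with stubbed entities (no spaCy model is loaded here; each row's doc.ents is modeled as a list of (text, label) pairs):

```python
# Each row's doc.ents stubbed as (text, label) pairs.
rows = [
    [("USA", "GPE"), ("Canada", "GPE")],      # "Match between USA and Canada was postponed"
    [("Monday", "DATE"), ("Paris", "GPE")],
]

def first_gpe(ents):
    # next() returns the first matching entity and stops, instead of
    # looping over all entities; None if the row has no GPE.
    return next((text for text, label in ents if label == "GPE"), None)

places = [first_gpe(ents) for ents in rows]
print(places)  # ['USA', 'Paris']

# With real spaCy entities and pandas, the same idea would be roughly:
# df['Place'] = df['Text'].apply(
#     lambda t: next((e.text for e in nlp(t).ents if e.label_ == 'GPE'), None))
```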

Attribute Error using NeuralCoref in Colab

Submitted by 情到浓时终转凉″ on 2021-02-10 04:56:54
Question: I'm trying to use the following spaCy module in Colab: https://spacy.io/universe/project/neuralcoref

I install the following packages:

!pip install spacy
import spacy
!pip show spacy
!git clone https://github.com/huggingface/neuralcoref.git
import neuralcoref

I get the following output after installing:

Name: spacy
Version: 2.2.4
Summary: Industrial-strength Natural Language Processing (NLP) in Python
Home-page: https://spacy.io
Author: Explosion
Author-email: contact@explosion.ai
License:

Is there a way to retrieve the whole noun chunk using a root token in spaCy?

Submitted by 不想你离开。 on 2021-02-08 10:40:43
Question: I'm very new to using spaCy. I have been reading the documentation for hours, and I'm still confused about whether what I'm asking is possible. Anyway... as the title says, is there a way to get a given noun chunk using a token contained in it? For example, given the sentence:

"Autonomous cars shift insurance liability toward manufacturers"

would it be possible to get the "Autonomous cars" noun chunk when all I have is the "cars" token? Here is an example snippet of the
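
One approach is to scan the document's noun chunks for the one whose span contains the token's index. A spaCy-free sketch of that lookup, with chunks stubbed as (start, end, text) spans mirroring chunk.start / chunk.end, and the token reduced to its index (token.i in spaCy):

```python
# Noun chunks of "Autonomous cars shift insurance liability toward manufacturers",
# stubbed as (start, end, text) half-open token spans.
chunks = [(0, 2, "Autonomous cars"), (3, 5, "insurance liability"), (6, 7, "manufacturers")]

def chunk_containing(token_i, noun_chunks):
    # In spaCy this would iterate token.doc.noun_chunks and test
    # chunk.start <= token.i < chunk.end.
    for start, end, text in noun_chunks:
        if start <= token_i < end:
            return text
    return None

print(chunk_containing(1, chunks))  # Autonomous cars  ("cars" is token 1)
```

If the token is known to be the chunk's head, comparing against chunk.root instead of the span bounds would also work, but the containment test covers non-root tokens too.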

Custom sentence boundary detection in SpaCy

Submitted by 六眼飞鱼酱① on 2021-02-08 01:51:24
Question: I'm trying to write a custom sentence segmenter in spaCy that returns the whole document as a single sentence. I wrote a custom pipeline component that does this using the code from here. I can't get it to work, though: instead of changing the sentence boundaries so that the whole document is one sentence, it throws two different errors. If I create a blank language instance and only add my custom component to the pipeline, I get this error:

ValueError: Sentence boundary detection
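
A common pitfall with spaCy-v2-style segmenter components is leaving some tokens' is_sent_start as None, which lets the parser insert its own boundaries. A sketch of a component that sets it explicitly for every token — True on the first, False on all the rest — tested here on a stand-in Doc (a plain list of fake tokens), since no model is loaded:

```python
class FakeToken:
    """Stand-in for a spaCy Token: only the attribute we need."""
    def __init__(self):
        self.is_sent_start = None

def single_sentence_segmenter(doc):
    # Mark only the first token as a sentence start and explicitly
    # set False on every other token, so nothing is left undecided.
    for i, token in enumerate(doc):
        token.is_sent_start = i == 0
    return doc

doc = [FakeToken() for _ in range(5)]
single_sentence_segmenter(doc)
print([t.is_sent_start for t in doc])  # [True, False, False, False, False]

# In spaCy v2 the real component would be registered with:
# nlp.add_pipe(single_sentence_segmenter, before="parser")
```

Registering it before the parser matters: components that set sentence boundaries after the parser has run trigger exactly the kind of "sentence boundary detection" ValueError quoted above.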

Tokenizing an HTML document

Submitted by 元气小坏坏 on 2021-02-07 14:23:38
Question: I have an HTML document, and I'd like to tokenize it using spaCy while keeping HTML tags as single tokens. Here's my code:

import spacy
from spacy.symbols import ORTH

nlp = spacy.load('en', vectors=False, parser=False, entity=False)
nlp.tokenizer.add_special_case(u'<i>', [{ORTH: u'<i>'}])
nlp.tokenizer.add_special_case(u'</i>', [{ORTH: u'</i>'}])
doc = nlp('Hello, <i>world</i> !')
print([e.text for e in doc])

The output is:

['Hello', ',', '<', 'i', '>', 'world</i', '>', '!']

If I put spaces
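
The underlying issue is that add_special_case only matches whitespace-delimited substrings, so the tag glued onto "world</i>" never triggers it; in spaCy the usual fix is a token_match (or infix) regex on the Tokenizer that recognizes whole tags. A spaCy-free sketch of the same idea — split out the tags first, then tokenize the text between them (here just by whitespace, so punctuation is not separated the way spaCy would):

```python
import re

# Matches opening and closing tags like <i>, </i>, <a href="...">.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")

def tokenize(text):
    tokens = []
    pos = 0
    for m in TAG_RE.finditer(text):
        tokens.extend(text[pos:m.start()].split())  # plain words between tags
        tokens.append(m.group())                    # the tag, kept whole
        pos = m.end()
    tokens.extend(text[pos:].split())               # trailing text after last tag
    return tokens

print(tokenize("Hello, <i>world</i> !"))  # ['Hello,', '<i>', 'world', '</i>', '!']
```

The key property is that the tag regex runs before any other splitting, so a tag stays a single token even with no surrounding spaces.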