spacy

Spacy replace token

Submitted by 允我心安 on 2021-02-10 14:53:19
Question: I am trying to replace a word without destroying the space structure of the sentence. Suppose I have the sentence text = "Hi this is my dog." and I wish to replace "dog" with "Simba". Following the answer from https://stackoverflow.com/a/57206316/2530674, I did:

import spacy
from spacy.tokens import Doc

nlp = spacy.load("en_core_web_lg")
doc1 = nlp("Hi this is my dog.")
new_words = [token.text if token.text != "dog" else "Simba" for token in doc1]
Doc(doc1.vocab, words=new_words)  # Hi this is my
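
The likely cause is that Doc() drops spacing information unless it is also given a `spaces` argument (a list of booleans saying whether each token is followed by a space, mirroring token.whitespace_ — spaCy v2's Doc(vocab, words=..., spaces=...) signature). A spaCy-free sketch of that logic, with tokens stubbed as (text, trailing_whitespace) pairs:

```python
# Tokens modeled as (text, trailing_whitespace) pairs, mirroring
# token.text and token.whitespace_ in spaCy.
tokens = [("Hi", " "), ("this", " "), ("is", " "), ("my", " "), ("dog", ""), (".", "")]

# Swap the word, but keep the per-token spacing flags alongside it.
new_words = [text if text != "dog" else "Simba" for text, _ in tokens]
spaces = [bool(ws) for _, ws in tokens]

# Reconstruct the text the way Doc(words=..., spaces=...) would.
rebuilt = "".join(w + (" " if sp else "") for w, sp in zip(new_words, spaces))
print(rebuilt)  # Hi this is my Simba.
```

Note that without the `spaces` flags, "Simba" and "." would either be glued together or space-separated incorrectly; the flags are what preserve "Simba." at the end.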

Pip install error exit status 1 while installing a pip package

Submitted by 醉酒当歌 on 2021-02-10 11:52:48
Question: I'm having an issue while trying to install the pyresparser Python library. The issue seems to be with its spaCy dependency. How can I solve this and install it successfully? I am a rookie at Python.

C:\Users\User>pip install pyresparser
Collecting pyresparser
  Using cached https://files.pythonhosted.org/packages/ad/8f/5a55cfb269621d3374a6ba4aed390267f65bdf6c4fed8b1c0cbf5a118f0e/pyresparser-1.0.2-py3-none-any.whl
Collecting idna>=2.8 (from pyresparser)
  Downloading https://files.pythonhosted.org

How to select only first entity extracted from spacy entities?

Submitted by 允我心安 on 2021-02-10 05:22:10
Question: I am trying to use the following code to extract entities from text available in a DataFrame:

for i in df['Text'].to_list():
    doc = nlp(i)
    for entity in doc.ents:
        if entity.label_ == 'GPE':

I need to store the text of the first GPE together with its corresponding column of text. For instance, if the following is the text at index 0 in the column df['Text']:

Match between USA and Canada was postponed

then I need only the first location (USA) in another column, such as df['Place'], at the index corresponding to the Text, which is 0. df
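
The core of the question is "take the first entity with a given label, per row". A sketch of that selection logic with stubbed entities (no spaCy model is loaded here; each row's doc.ents is modeled as a list of (text, label) pairs):

```python
# Each row's doc.ents stubbed as (text, label) pairs.
rows = [
    [("USA", "GPE"), ("Canada", "GPE")],      # "Match between USA and Canada was postponed"
    [("Monday", "DATE"), ("Paris", "GPE")],
]

def first_gpe(ents):
    # next() returns the first matching entity and stops, instead of
    # looping over all entities; None if the row has no GPE.
    return next((text for text, label in ents if label == "GPE"), None)

places = [first_gpe(ents) for ents in rows]
print(places)  # ['USA', 'Paris']

# With real spaCy entities and pandas, the same idea would be roughly:
# df['Place'] = df['Text'].apply(
#     lambda t: next((e.text for e in nlp(t).ents if e.label_ == 'GPE'), None))
```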

Attribute Error using NeuralCoref in Colab

Submitted by 情到浓时终转凉″ on 2021-02-10 04:56:54
Question: I'm trying to use the following spaCy module in Colab: https://spacy.io/universe/project/neuralcoref

I install the following packages:

!pip install spacy
import spacy
!pip show spacy
!git clone https://github.com/huggingface/neuralcoref.git
import neuralcoref

I get the following output after installing:

Name: spacy
Version: 2.2.4
Summary: Industrial-strength Natural Language Processing (NLP) in Python
Home-page: https://spacy.io
Author: Explosion
Author-email: contact@explosion.ai
License:

Is there a way to retrieve the whole noun chunk using a root token in spaCy?

Submitted by 不想你离开。 on 2021-02-08 10:40:43
Question: I'm very new to using spaCy. I have been reading the documentation for hours, and I'm still confused about whether what I'm asking is possible. Anyway... as the title says, is there a way to get a given noun chunk using a token contained in it? For example, given the sentence:

"Autonomous cars shift insurance liability toward manufacturers"

would it be possible to get the "Autonomous cars" noun chunk when all I have is the "cars" token? Here is an example snippet of the
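
One approach is to scan the document's noun chunks for the one whose span contains the token's index. A spaCy-free sketch of that lookup, with chunks stubbed as (start, end, text) spans mirroring chunk.start / chunk.end, and the token reduced to its index (token.i in spaCy):

```python
# Noun chunks of "Autonomous cars shift insurance liability toward manufacturers",
# stubbed as (start, end, text) half-open token spans.
chunks = [(0, 2, "Autonomous cars"), (3, 5, "insurance liability"), (6, 7, "manufacturers")]

def chunk_containing(token_i, noun_chunks):
    # In spaCy this would iterate token.doc.noun_chunks and test
    # chunk.start <= token.i < chunk.end.
    for start, end, text in noun_chunks:
        if start <= token_i < end:
            return text
    return None

print(chunk_containing(1, chunks))  # Autonomous cars  ("cars" is token 1)
```

If the token is known to be the chunk's head, comparing against chunk.root instead of the span bounds would also work, but the containment test covers non-root tokens too.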

Custom sentence boundary detection in SpaCy

Submitted by 六眼飞鱼酱① on 2021-02-08 01:51:24
Question: I'm trying to write a custom sentence segmenter in spaCy that returns the whole document as a single sentence. I wrote a custom pipeline component that does this using the code from here. I can't get it to work, though: instead of changing the sentence boundaries so that the whole document is one sentence, it throws two different errors. If I create a blank language instance and only add my custom component to the pipeline, I get this error:

ValueError: Sentence boundary detection
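
A common pitfall with spaCy-v2-style segmenter components is leaving some tokens' is_sent_start as None, which lets the parser insert its own boundaries. A sketch of a component that sets it explicitly for every token — True on the first, False on all the rest — tested here on a stand-in Doc (a plain list of fake tokens), since no model is loaded:

```python
class FakeToken:
    """Stand-in for a spaCy Token: only the attribute we need."""
    def __init__(self):
        self.is_sent_start = None

def single_sentence_segmenter(doc):
    # Mark only the first token as a sentence start and explicitly
    # set False on every other token, so nothing is left undecided.
    for i, token in enumerate(doc):
        token.is_sent_start = i == 0
    return doc

doc = [FakeToken() for _ in range(5)]
single_sentence_segmenter(doc)
print([t.is_sent_start for t in doc])  # [True, False, False, False, False]

# In spaCy v2 the real component would be registered with:
# nlp.add_pipe(single_sentence_segmenter, before="parser")
```

Registering it before the parser matters: components that set sentence boundaries after the parser has run trigger exactly the kind of "sentence boundary detection" ValueError quoted above.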

Tokenizing an HTML document

Submitted by 元气小坏坏 on 2021-02-07 14:23:38
Question: I have an HTML document, and I'd like to tokenize it using spaCy while keeping HTML tags as single tokens. Here's my code:

import spacy
from spacy.symbols import ORTH

nlp = spacy.load('en', vectors=False, parser=False, entity=False)
nlp.tokenizer.add_special_case(u'<i>', [{ORTH: u'<i>'}])
nlp.tokenizer.add_special_case(u'</i>', [{ORTH: u'</i>'}])
doc = nlp('Hello, <i>world</i> !')
print([e.text for e in doc])

The output is:

['Hello', ',', '<', 'i', '>', 'world</i', '>', '!']

If I put spaces
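
The underlying issue is that add_special_case only matches whitespace-delimited substrings, so the tag glued onto "world</i>" never triggers it; in spaCy the usual fix is a token_match (or infix) regex on the Tokenizer that recognizes whole tags. A spaCy-free sketch of the same idea — split out the tags first, then tokenize the text between them (here just by whitespace, so punctuation is not separated the way spaCy would):

```python
import re

# Matches opening and closing tags like <i>, </i>, <a href="...">.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")

def tokenize(text):
    tokens = []
    pos = 0
    for m in TAG_RE.finditer(text):
        tokens.extend(text[pos:m.start()].split())  # plain words between tags
        tokens.append(m.group())                    # the tag, kept whole
        pos = m.end()
    tokens.extend(text[pos:].split())               # trailing text after last tag
    return tokens

print(tokenize("Hello, <i>world</i> !"))  # ['Hello,', '<i>', 'world', '</i>', '!']
```

The key property is that the tag regex runs before any other splitting, so a tag stays a single token even with no surrounding spaces.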