Remove a word in a span from SpaCy?

Posted by ぃ、小莉子 on 2019-11-29 19:28:27

Question


I am parsing a sentence with spaCy as follows:

import spacy
nlp = spacy.load("en")
span = nlp("This is some text.")

I am wondering whether there is a way to delete a word from the span while keeping the remaining words formatted as a sentence, for example:

del span[3]

which would yield a sentence like

This is some.

If a method that does not use spaCy could achieve the same effect, that would be great too.


Answer 1:


There is a workaround for that.

The idea is to create a numpy array from the doc, delete the row for the token you don't want, and then create a new doc from the remaining words and the trimmed array.

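import spacy
from spacy.attrs import LOWER, POS, ENT_TYPE, IS_ALPHA
from spacy.tokens import Doc
import numpy

def remove_span(doc, index):
    # Export the chosen per-token attributes as an array with one row per token.
    np_array = doc.to_array([LOWER, POS, ENT_TYPE, IS_ALPHA])
    # Drop the row that belongs to the unwanted token.
    np_array_2 = numpy.delete(np_array, index, axis=0)
    # Build a new Doc from the remaining words, then restore their attributes.
    doc2 = Doc(doc.vocab, words=[t.text for i, t in enumerate(doc) if i != index])
    doc2.from_array([LOWER, POS, ENT_TYPE, IS_ALPHA], np_array_2)
    return doc2

# Load the English model. 'en' is a spaCy 2.x shortcut; spaCy 3 and later
# need a full package name such as 'en_core_web_sm'.
nlp = spacy.load('en')
doc = nlp("This is some text")
# Index 3 is the token "text".
new_doc = remove_span(doc, 3)
print(new_doc)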

Hope it helps!
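
For the non-spaCy route the question asks about, here is a minimal sketch using only the standard library. It assumes a simple regex tokenizer is good enough (runs of word characters, or single punctuation marks), which matches spaCy's token indices for the example sentence but will not in general. The names remove_word and TOKEN_RE are made up for this illustration.

```python
import re

# Assumption: a token is a run of word characters (optionally with one
# internal apostrophe, as in "don't") or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def remove_word(sentence, index):
    """Return `sentence` without the token at position `index`."""
    matches = list(TOKEN_RE.finditer(sentence))
    if not 0 <= index < len(matches):
        raise IndexError("token index out of range")

    parts = []
    for i, m in enumerate(matches):
        if i == index:
            continue  # skip the token being removed
        # Re-insert a single space only where the original text had
        # whitespace directly before this token.
        had_space_before = m.start() > 0 and sentence[m.start() - 1].isspace()
        if parts and had_space_before:
            parts.append(" ")
        parts.append(m.group())
    return "".join(parts)

print(remove_word("This is some text.", 3))
```

Because spacing is taken from what preceded each surviving token, the final period attaches directly to "some" rather than leaving a stray space before it.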



Source: https://stackoverflow.com/questions/52193581/remove-a-word-in-a-span-from-spacy
