Replace entity with its label in SpaCy

二次信任 提交于 2021-01-21 05:14:24

问题


Is there anyway by SpaCy to replace entity detected by SpaCy NER with its label? For example: I am eating an apple while playing with my Apple Macbook.

I have trained NER model with SpaCy to detect "FRUITS" entity and the model successfully detects the first "apple" as "FRUITS", but not the second "Apple".

I want to do post-processing of my data by replacing each entity with its label, so I want to replace the first "apple" with "FRUITS". The sentence will be "I am eating an FRUITS while playing with my Apple Macbook."

If I simply use regex, it will replace the second "Apple" with "FRUITS" as well, which is incorrect. Is there any smart way to do this?

Thanks!


回答1:


the entity label is an attribute of the token (see here)

import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')

s = "His friend Nicolas is here."
doc = nlp(s)

print([t.text if not t.ent_type_ else t.ent_type_ for t in doc])
# ['His', 'friend', 'PERSON', 'is', 'here', '.']

print(" ".join([t.text if not t.ent_type_ else t.ent_type_ for t in doc]) )
# His friend PERSON is here .

Edit:

In order to handle cases were entities can span several words the following code can be used instead:

s = "His friend Nicolas J. Smith is here with Bart Simpon and Fred."
doc = nlp(s)
newString = s
for e in reversed(doc.ents): #reversed to not modify the offsets of other entities when substituting
    start = e.start_char
    end = start + len(e.text)
    newString = newString[:start] + e.label_ + newString[end:]
print(newString)
#His friend PERSON is here with PERSON and PERSON.


来源:https://stackoverflow.com/questions/58712418/replace-entity-with-its-label-in-spacy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!