Merging tags into my file using named entity annotation

﹥>﹥吖頭↗ submitted on 2021-01-29 07:42:12

Question


While learning the basics of text mining I ran into the following problem: I must use named entity annotation to find and locate named entities. However, once an entity is found, its tag must be included in the document. So for example: "Hello I am Koen" must result in "Hello I am <PERSON>Koen</PERSON>".

I figured out how to find and label the named entities, but I am stuck on getting them into the file in the right way. I've tried checking whether ent.orth_ is in the file and then replacing it with the tag + ent.orth_ + closing tag.

print([(X, X.ent_iob_, X.ent_type_) for X in doc])

I used this for locating where the entities are and where they start.

entities = []
for ent in doc.ents:
    entities.append(ent.orth_ + ", " + ent.label_)

I used this for creating a variable with both the original form and the label.

Right now I have the variable with all original forms and labels, and I know where the entities start and end. However, when trying to do the replacement my knowledge runs short, and I can't find any similar examples.
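
To illustrate why replacing by the entity's text alone can go wrong, here is a minimal sketch in plain Python. The sentence and the PERSON label are made up for the illustration and nothing here calls spaCy. A text-based replace touches every occurrence of the string, including one that is only part of a longer word:

```python
# Hypothetical sentence: "Koen" also appears inside the longer name "Koenraad".
text = "Koen met Koenraad. Koen left."

# Replacing by surface form tags every match, wanted or not.
naive = text.replace("Koen", "<PERSON>Koen</PERSON>")
print(naive)
```

Working from character offsets instead of the entity text avoids this, because each entity is then tied to one exact position in the string.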


Answer 1:


Try this:

import spacy

nlp = spacy.load("en_core_web_sm")
s = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(s)

def replaceSubstring(s, replacement, position, length_of_replaced):
    # Swap the slice of s that starts at `position` for `replacement`.
    return s[:position] + replacement + s[position + length_of_replaced:]

# Walk the entities from last to first, so that inserting tags does not
# shift the character offsets of the entities that are still to be processed.
for ent in reversed(doc.ents):
    # print(ent.text, ent.start_char, ent.end_char, ent.label_)
    replacement = "<{}>{}</{}>".format(ent.label_, ent.text, ent.label_)
    position = ent.start_char
    length_of_replaced = ent.end_char - ent.start_char
    s = replaceSubstring(s, replacement, position, length_of_replaced)

print(s)
#<ORG>Apple</ORG> is looking at buying <GPE>U.K.</GPE> startup for <MONEY>$1 billion</MONEY>
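
The same idea can also be written as a single left-to-right pass that builds a new string instead of editing the old one. The sketch below is only an illustration under a few assumptions: tag_spans is a hypothetical helper, not part of spaCy, the (start, end, label) tuples are hard-coded stand-ins for ent.start_char, ent.end_char and ent.label_, and the spans are assumed not to overlap.

```python
def tag_spans(text, spans):
    """Wrap each (start, end, label) character span in <label>...</label>.

    Hypothetical helper; assumes the spans do not overlap.
    """
    pieces = []
    cursor = 0  # end of the last span that was copied
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append("<{0}>{1}</{0}>".format(label, text[start:end]))
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last span
    return "".join(pieces)


text = "Apple is looking at buying U.K. startup for $1 billion"
# Stand-ins for (ent.start_char, ent.end_char, ent.label_) of each entity.
spans = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]
tagged = tag_spans(text, spans)
print(tagged)
```

With real spaCy output the list could be built as [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]. Since the original string is never modified, the offsets stay valid and no reversing is needed.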




Source: https://stackoverflow.com/questions/58077806/merging-tags-into-my-file-using-named-entity-annotation
