Question
I have a DataFrame with the following structure:
ID text
0 Language processing in python th is great
1 Relace the string
and a dictionary named custom_fix:
{'Relace': 'Replace', 'th' : 'three'}
I tried the code below, but the current output is:
ID text
0 Language processing in pythirdon three is great
1 Replace threee string
Code:
import re

def multiple_replace(dict, text):
    # Create a regular expression from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))
    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

df['col1'] = df.apply(lambda row: multiple_replace(custom_fix, row['text']), axis=1)
Expected output:
ID text
0 Language processing in python three is great
1 Replace the string
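Answer 1:
I'm not a regex expert, and this may not be the best solution, but adding word boundaries (\b) around each key in your regex should fix the problem. Without them, the key 'th' also matches inside longer words such as 'python' and 'the'. Here is the fixed function: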
import re

def multiple_replace(d, text):
    # Create a regular expression from the dictionary keys, escaping each key
    # and wrapping it in word boundaries so that only whole words match
    regex = re.compile("(%s)" % "|".join(r"\b" + re.escape(x) + r"\b" for x in d.keys()))
    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: d[mo.group(0)], text)
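To check the word-boundary approach against the data in the question, here is a minimal self-contained sketch. The name `replace_whole_words` and the list `rows` are illustrative, not part of the original answer, and the sketch assumes a non-empty dictionary whose keys start and end with word characters. Sorting the keys by length is an extra precaution added here, so that a short key cannot pre-empt a longer one.

```python
import re

custom_fix = {'Relace': 'Replace', 'th': 'three'}
rows = ['Language processing in python th is great', 'Relace the string']

def replace_whole_words(mapping, text):
    # Try longer keys first so a short key cannot pre-empt a longer one
    keys = sorted(mapping, key=len, reverse=True)
    # \b on both sides means a key only matches as a complete word
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda mo: mapping[mo.group(0)], text)

fixed = [replace_whole_words(custom_fix, row) for row in rows]
print(fixed)
# ['Language processing in python three is great', 'Replace the string']

# With pandas, the same function could be applied to the column, e.g.:
# df['text'] = df['text'].map(lambda s: replace_whole_words(custom_fix, s))
```

Writing the result back to 'text' rather than a new column is one way to get the layout shown in the expected output; whether to overwrite or add a column is up to you.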
Answer 2:
You can also split the string into words and iterate through the list, replacing each word that is a key in the dictionary.
def multiple_replace(d, text):
    splitText = text.split()
    disc = len(set(splitText).intersection(set(d.keys())))
    if disc == 0:
        return ' '.join(splitText)
    else:
        for k in range(len(splitText)):
            try:
                splitText[k] = d[splitText[k]]
            except KeyError:
                pass
        return ' '.join(splitText)
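The same split-and-replace idea can be sketched more compactly with dict.get, which falls back to the original word when there is no replacement. The name `replace_tokens` is illustrative. Two caveats apply to any whitespace-splitting approach: runs of whitespace are collapsed to single spaces, and a word with attached punctuation (such as 'th,') will not match a key.

```python
def replace_tokens(mapping, text):
    # dict.get returns the word unchanged when it is not a key in the mapping
    return ' '.join(mapping.get(word, word) for word in text.split())

custom_fix = {'Relace': 'Replace', 'th': 'three'}

print(replace_tokens(custom_fix, 'Language processing in python th is great'))
# Language processing in python three is great
print(replace_tokens(custom_fix, 'Relace the string'))
# Replace the string
print(replace_tokens(custom_fix, 'th, then'))
# th, then   (the token 'th,' includes the comma, so it is left alone)
```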
Hope it helps.
Source: https://stackoverflow.com/questions/55641550/text-data-replacement-using-dictionary