How to replace a string using a dictionary containing multiple values for a key in Python

Anonymous (unverified), submitted 2019-12-03 02:38:01

Question:

I have a dictionary that maps each word to its closest related words.

I want to replace the related words in a string with the original word. Currently I can replace words when a key has only one value, but not when a key has multiple values. How can this be done?

Example Input

North Indian Restaurant
South India  Hotel
Mexican Restrant
Italian  Hotpot
Cafe Bar
Irish Pub
Maggiee Baar
Jacky Craft Beer
Bristo 1889
Bristo 188
Bristo 188.

How the dictionary is made

    y = list(word)
    words = y
    similar = [[item[0] for item in model.wv.most_similar(word) if item[1] > 0.7] for word in words]
    similarity_matrix = pd.DataFrame({'Orginal_Word': words, 'Related_Words': similar})
    similarity_matrix = similarity_matrix[['Orginal_Word', 'Related_Words']]

The result is a DataFrame with two columns, each holding lists:

    Orginal_Word    Related_Words
    [Indian]        [India,Ind,ind.]
    [Restaurant]    [Hotel,Restrant,Hotpot]
    [Pub]           [Bar,Baar, Beer]
    [1888]          [188, 188., 18]

Dictionary

    similarity_matrix.set_index('Orginal_Word')['Related_Words'].to_dict()

    {'Indian ': 'India, Ind, ind.',
     'Restaurant': 'Hotel, Restrant, Hotpot',
     'Pub': 'Bar, Baar, Beer',
     '1888': '188, 188., 18'}
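Because the question describes `Related_Words` as holding lists, one way to look at the problem is as a reverse lookup: every related word points back to its original word. The following is only a sketch under that assumption. The names `related`, `reverse_map` and `normalize` are hypothetical, and the data is retyped from the table above rather than taken from the real DataFrame.

```python
# Hypothetical data mirroring the question's table: each original word maps
# to a list of related words (lists, not comma-joined strings).
related = {
    "Indian": ["India", "Ind", "ind."],
    "Restaurant": ["Hotel", "Restrant", "Hotpot"],
    "Pub": ["Bar", "Baar", "Beer"],
    "1888": ["188", "188.", "18"],
}

# Invert the mapping so that every related word points to its original word.
reverse_map = {
    variant.strip(): original.strip()
    for original, variants in related.items()
    for variant in variants
}


def normalize(text, mapping):
    """Replace each whitespace-delimited token that appears in `mapping`."""
    return " ".join(mapping.get(token, token) for token in text.split())


print(normalize("Maggiee Baar", reverse_map))       # Maggiee Pub
print(normalize("South India Hotel", reverse_map))  # South Indian Restaurant
```

This token-based version collapses runs of whitespace and only handles single-word related terms, so it is a simplification compared with a regex-based replacement.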

Expected Output

North Indian Restaurant
South India  Restaurant
Mexican Restaurant
Italian  Restaurant
Cafe Pub
Irish Pub
Maggiee Pub
Jacky Craft Pub
Bristo 1888
Bristo 1888
Bristo 1888

Any help is appreciated

Answer 1:

I think you can build a new dictionary of regex patterns, as in this answer, and pass it to replace:

    d = {'Indian': 'India, Ind, ind.',
         'Restaurant': 'Hotel, Restrant, Hotpot',
         'Pub': 'Bar, Baar, Beer',
         '1888': '188, 188., 18'}

    d1 = {r'(?<!\S)'+ k.strip() + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}

    df['col'] = df['col'].replace(d1, regex=True)
    print(df)
                            col
    0   North Indian Restaurant
    1   South Indian Restaurant
    2        Mexican Restaurant
    3       Italian  Restaurant
    4                  Cafe Pub
    5                 Irish Pub
    6               Maggiee Pub
    7           Jacky Craft Pub
    8               Bristo 1888
    9               Bristo 1888
    10              Bristo 1888
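The lookarounds `(?<!\S)` and `(?!\S)` in the pattern above mean "not preceded by a non-whitespace character" and "not followed by a non-whitespace character", so a related word is only replaced when it stands alone between spaces or at the start or end of the string. A small standard-library sketch of that behaviour, using made-up sample strings:

```python
import re

# Same pattern shape as in the answer: the word must not be preceded or
# followed by a non-whitespace character.
pattern = re.compile(r'(?<!\S)' + 'Bar' + r'(?!\S)')

print(pattern.sub('Pub', 'Cafe Bar'))        # Cafe Pub
print(pattern.sub('Pub', 'Barcelona Cafe'))  # Barcelona Cafe (unchanged)
print(pattern.sub('Pub', 'Cafe Bar.'))       # Cafe Bar. (unchanged, "." follows)
```

Unlike `\b`, these lookarounds treat punctuation as part of the token, which is why `Bar.` is left alone here.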

EDIT (Function for the above code):

    def replace_words(d, col):
        d1={r'(?<!\S)'+ k.strip() + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}
        df[col] = df[col].replace(d1, regex=True)
        return df[col]

    df['col'] = replace_words(d, 'col')

EDIT1:

If you get errors like:

regex error- missing ), unterminated subpattern at position 7

then the related words contain regex special characters, and they must be escaped with re.escape before being used as patterns:

    import re

    def replace_words(d, col):
        d1={r'(?<!\S)'+ re.escape(k.strip()) + r'(?!\S)':k1 for k1, v1 in d.items() for k in v1.split(',')}
        df[col] = df[col].replace(d1, regex=True)
        return df[col]

    df['col'] = replace_words(d, 'col')
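To see what escaping changes, the sketch below builds the same pattern with and without `re.escape`, using only the standard library. The helper `token_pattern` and the related word `'Bar (old'` are hypothetical and exist only for illustration.

```python
import re


def token_pattern(word, escape):
    """Build the answer's whitespace-delimited pattern, optionally escaped."""
    body = re.escape(word) if escape else word
    return r'(?<!\S)' + body + r'(?!\S)'


# Unescaped, "." is a wildcard, so the key "188." also matches "1889".
print(re.sub(token_pattern('188.', escape=False), '1888', 'Bristo 1889'))  # Bristo 1888
# Escaped, "188." only matches the literal text "188.".
print(re.sub(token_pattern('188.', escape=True), '1888', 'Bristo 1889'))   # Bristo 1889
print(re.sub(token_pattern('188.', escape=True), '1888', 'Bristo 188.'))   # Bristo 1888

# A hypothetical related word containing "(" is not a valid regex on its own...
try:
    re.compile(token_pattern('Bar (old', escape=False))
except re.error as exc:
    print('re.error:', exc)

# ...but it compiles once escaped.
re.compile(token_pattern('Bar (old', escape=True))
```

Note the side effect: with escaping, the sample row `Bristo 1889` is no longer rewritten, because the unescaped version only matched it through the wildcard `.` in `188.`. If that row should still become `Bristo 1888`, `1889` would have to be listed as a related word explicitly.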

