Is there any way to use the mapping function, or something better, to replace values in an entire dataframe?
I only know how to perform the mapping on series.
You can build a dictionary from the column values themselves and apply it like below:
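# Distinct values of the column, most frequent first
x = df['Item_Type'].value_counts()
item_type_mapping = {}
item_list = x.index
# Give each distinct value its position in the list as an integer code
for i in range(0, len(item_list)):
    item_type_mapping[item_list[i]] = i
# Replace every value in the column with its code
df['Item_Type'] = df['Item_Type'].map(lambda x: item_type_mapping[x])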
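The same idea can be written more compactly with enumerate. This is only a sketch: the sample data and the Item_Type_Code column name are made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical sample data standing in for the original df.
df = pd.DataFrame(
    {"Item_Type": ["Dairy", "Snacks", "Dairy", "Meat", "Dairy", "Snacks"]}
)

# value_counts() sorts by frequency, so the most common value gets code 0.
counts = df["Item_Type"].value_counts()
item_type_mapping = {item: code for code, item in enumerate(counts.index)}

# Series.map accepts the dictionary directly; no lambda is needed.
# The codes go into a new (hypothetical) column so the original is kept.
df["Item_Type_Code"] = df["Item_Type"].map(item_type_mapping)
print(df)
```

Writing the codes to a separate column keeps the original strings around, which makes it easier to check the mapping.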
I know this is old, but I'm adding this for those searching as I was. Start with a pandas dataframe (df in this code):
ip_addresses = df.source_ip.unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))
That will give you a dictionary map of the ip addresses without having to write it out.
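To actually replace the values, the dictionary still has to be applied to the column. A minimal sketch, assuming a source_ip column as above; the sample addresses and the source_ip_id column name are invented for illustration. pd.factorize is shown as an alternative that produces the same codes in one call.

```python
import pandas as pd

# Hypothetical sample data; only the source_ip column name comes from the post.
df = pd.DataFrame(
    {"source_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3"]}
)

# unique() keeps first-appearance order, so codes follow that order.
ip_addresses = df["source_ip"].unique()
ip_dict = dict(zip(ip_addresses, range(len(ip_addresses))))

# Apply the dictionary to get an integer id per row.
df["source_ip_id"] = df["source_ip"].map(ip_dict)

# Alternative: factorize returns the integer codes and the unique values.
codes, uniques = pd.factorize(df["source_ip"])
print(df)
print(list(codes), list(uniques))
```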
To convert strings like 'volvo' and 'bmw' into integers, first load them into a dataframe, then pass it to pandas.get_dummies(), which turns each distinct string into its own 0/1 indicator column:
import pandas as pd
df = pd.read_csv("myFile.csv")  # DataFrame.from_csv was removed in pandas 1.0
df_transform = pd.get_dummies( df )
print( df_transform )
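For a sense of what get_dummies returns, here is a small sketch using an in-memory frame instead of myFile.csv. The brand and doors columns are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical data in place of myFile.csv.
df = pd.DataFrame(
    {"brand": ["volvo", "bmw", "audi", "bmw"], "doors": [4, 2, 4, 4]}
)

# String columns are expanded into one indicator column per distinct value;
# numeric columns such as doors pass through unchanged.
df_transform = pd.get_dummies(df)
print(df_transform.columns.tolist())

# Indicator dtype varies by pandas version (bool or uint8), so cast to int.
print(df_transform["brand_bmw"].astype(int).tolist())
```

Note that this yields several columns per original column rather than a single integer code, which is the main difference from the map() approach.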
Better alternative: pass a dictionary to map() on a pandas Series (here the brand column, df.brand):
df.brand = df.brand.map( {'volvo':0 , 'bmw':1, 'audi':2} )
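One thing to watch with map() and a dictionary: values that are missing from the dictionary become NaN. A short sketch with made-up data ('saab' is deliberately left out of the mapping, and -1 is an arbitrary placeholder code):

```python
import pandas as pd

# Hypothetical data; 'saab' has no entry in the mapping below.
df = pd.DataFrame({"brand": ["volvo", "bmw", "audi", "saab"]})
brand_codes = {"volvo": 0, "bmw": 1, "audi": 2}

# Unmapped values come back as NaN, which also turns the column into floats.
mapped = df["brand"].map(brand_codes)
print(mapped.isna().tolist())

# One option: fill the gaps with a placeholder code and restore integers.
with_default = mapped.fillna(-1).astype(int)
print(with_default.tolist())
```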
You can also do this with pandas rename_categories. You would first need to define the column as dtype="category", e.g.:
In [66]: s = pd.Series(["a","b","c","a"], dtype="category")
In [67]: s
Out[67]:
0 a
1 b
2 c
3 a
dtype: category
Categories (3, object): [a, b, c]
and then rename them:
In [70]: s.cat.rename_categories([1,2,3])
Out[70]:
0 1
1 2
2 3
3 1
dtype: category
Categories (3, int64): [1, 2, 3]
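You can also pass a dict-like object mapping old categories to new ones. The keys must be existing categories, so the call below assumes the renamed result above was assigned back to s, e.g.: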
In [72]: s.cat.rename_categories({1: 'x', 2: 'y', 3: 'z'})
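As a self-contained sketch of the dict form, starting again from the original string categories. The .cat.codes accessor is shown as well, since it gives integer codes directly when the particular numbers do not matter.

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Keys are the existing categories, values are the new names.
renamed = s.cat.rename_categories({"a": 1, "b": 2, "c": 3})
print(renamed.tolist())

# .cat.codes returns each value's position in the (sorted) category list.
codes = s.cat.codes
print(codes.tolist())
```

rename_categories returns a new Series, so the result has to be assigned if it should persist.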
You can use the DataFrame function applymap to do this (since pandas 2.1 it is deprecated in favor of DataFrame.map):
In [26]: df = DataFrame({"A": [1,2,3,4,5], "B": ['a','b','c','d','e'],
                         "C": ['b','a','c','c','d'], "D": ['a','c',7,9,2]})
In [27]: df
Out[27]:
   A  B  C  D
0  1  a  b  a
1  2  b  a  c
2  3  c  c  7
3  4  d  c  9
4  5  e  d  2
In [28]: mymap = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
In [29]: df.applymap(lambda s: mymap.get(s) if s in mymap else s)
Out[29]:
   A  B  C  D
0  1  1  2  1
1  2  2  1  3
2  3  3  3  7
3  4  4  3  9
4  5  5  4  2
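Because the name of the element-wise method differs between pandas versions, a version-neutral way to get the same result is to map each column in turn. This is a sketch of that alternative, not the answer's own method; it reuses the same df and mymap.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": ["a", "b", "c", "d", "e"],
                   "C": ["b", "a", "c", "c", "d"],
                   "D": ["a", "c", 7, 9, 2]})
mymap = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

# dict.get(value, value) returns the mapped value, or the original value
# when there is no entry for it, so numbers pass through untouched.
result = df.apply(lambda col: col.map(lambda v: mymap.get(v, v)))
print(result)
```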
df.replace(to_replace=['set', 'test'], value=[1, 2])
This is from @Ishnark's comment on the accepted answer.
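DataFrame.replace also accepts a dictionary, which keeps each old value next to its replacement. A minimal sketch with invented column names and data:

```python
import pandas as pd

# Hypothetical frame containing the 'set' and 'test' strings in two columns.
df = pd.DataFrame({"phase": ["set", "test", "set"],
                   "label": ["test", "other", "set"]})

# The dictionary form is equivalent to the two parallel lists.
# Values without an entry ('other') are left as they are.
replaced = df.replace({"set": 1, "test": 2})
print(replaced)
```

replace returns a new dataframe and applies to every column at once, so it covers the whole-dataframe case from the question without a loop.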