I'm new to pandas and trying to figure out how to convert multiple columns that are formatted as strings to float64. The values are percentages stored as strings such as '15.5%'. Currently I'm doing the following, but it seems like apply() or applymap() should be able to do this more efficiently. Unfortunately I'm too much of a rookie to figure out how:
for column in ['field1', 'field2', 'field3']:
    data[column] = data[column].str.rstrip('%').astype('float64') / 100
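For comparison, the same per-column conversion can be written as a single apply call over the selected columns. DataFrame.apply passes each column to the function as a Series, so the vectorized .str accessor still does the work. This is a minimal sketch that assumes a small hypothetical frame named data with string columns field1, field2 and field3 plus an unrelated label column. It is shorter than the loop, not necessarily faster.

```python
import pandas as pd

# Hypothetical sample data: percentages stored as strings such as '15.5%'.
data = pd.DataFrame({
    "field1": ["15.5%", "20%"],
    "field2": ["0.5%", "100%"],
    "field3": ["7.25%", "3%"],
    "label": ["a", "b"],  # a non-percentage column that should stay untouched
})

cols = ["field1", "field2", "field3"]

# apply() hands each selected column to the lambda as a Series, so
# .str.rstrip('%') strips the trailing percent sign from every cell at once.
data[cols] = data[cols].apply(lambda s: s.str.rstrip("%").astype("float64") / 100)

print(data.dtypes)
```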
Starting in pandas 0.11.1 (coming out this week), DataFrame.replace has a new option to replace using a regular expression, so this becomes possible:
In [14]: df = DataFrame('10.0%',index=range(100),columns=range(10))

In [15]: df.replace('%','',regex=True).astype('float')/100
Out[15]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 10 columns):
0    100  non-null values
1    100  non-null values
2    100  non-null values
3    100  non-null values
4    100  non-null values
5    100  non-null values
6    100  non-null values
7    100  non-null values
8    100  non-null values
9    100  non-null values
dtypes: float64(10)
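A minimal, self-contained version of the session above that should run on a current pandas release, where the printed summary no longer looks like the old output shown. The frame is smaller than the original, and anchoring the pattern as r'%$' is an extra assumption of this sketch: it removes only a trailing percent sign, whereas the plain '%' pattern removes every occurrence inside a string.

```python
import pandas as pd

# A small frame where every cell is the string '10.0%'.
df = pd.DataFrame("10.0%", index=range(3), columns=range(2))

# regex=True treats the pattern as a regular expression applied inside
# each string; r"%$" matches only a '%' at the end of the value.
stripped = df.replace(r"%$", "", regex=True)

# Convert the remaining numeric text to floats and scale to fractions.
result = stripped.astype("float") / 100
print(result)
print(result.dtypes)
```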
It is also a bit faster than the equivalent applymap call:
In [16]: %timeit df.replace('%','',regex=True).astype('float')/100
1000 loops, best of 3: 1.16 ms per loop

In [18]: %timeit df.applymap(lambda x: float(x[:-1]))/100
1000 loops, best of 3: 1.67 ms per loop
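Timings like these depend on the machine and the pandas version, so they are worth re-measuring locally. Below is a rough sketch using the standard timeit module instead of the IPython %timeit magic. The frame size and repeat counts are assumptions of this example. So is the fallback from DataFrame.map, the element-wise method in pandas 2.1 and later where applymap is deprecated, to applymap on older versions. It prints timings only and makes no claim about which approach wins.

```python
import timeit

import pandas as pd

df = pd.DataFrame("10.0%", index=range(100), columns=range(10))

# DataFrame.map is the element-wise method in pandas >= 2.1;
# older versions only provide applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap


def via_replace():
    # Vectorized regex replacement, then a single dtype conversion.
    return df.replace("%", "", regex=True).astype("float") / 100


def via_elementwise():
    # Python-level function call per cell, dropping the last character.
    return elementwise(lambda x: float(x[:-1])) / 100


# Both approaches must produce identical frames before comparing speed.
pd.testing.assert_frame_equal(via_replace(), via_elementwise())

for name, func in [("replace", via_replace), ("elementwise", via_elementwise)]:
    # Best of 3 repeats, each running the function 100 times.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print(f"{name}: {best * 1e3:.3f} ms per call")
```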
Another option is applymap, stripping the trailing '%' from every cell:

df.applymap(lambda x: float(x.rstrip('%')) / 100)
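A runnable version of this one-liner on a small hypothetical frame. It assumes the same DataFrame.map / applymap fallback as above, since applymap is deprecated from pandas 2.1 onwards. One difference from the x[:-1] variant in the timing comparison: rstrip('%') removes only trailing percent signs, so a value without one, such as '3', still parses instead of losing its last digit.

```python
import pandas as pd

# Hypothetical sample: note that the last value has no '%' sign.
df = pd.DataFrame({"a": ["15.5%", "20%"], "b": ["0.5%", "3"]})

# Use DataFrame.map on pandas >= 2.1, applymap on older versions.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# rstrip('%') removes only trailing '%' characters, so '3' stays '3'.
result = elementwise(lambda x: float(x.rstrip("%")) / 100)
print(result)
```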
Answering a comment on the accepted answer: to convert only specific columns, select the column and assign the result back, rather than trying to modify it in place:
df['Column1'] = df['Column1'].replace('%','',regex=True).astype('float')/100
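The same pattern extends to a list of columns. This is a sketch on a hypothetical frame whose column names are made up for illustration. The converted block is assigned back into df, so a text column that also contains '%' is left untouched.

```python
import pandas as pd

# Hypothetical frame: two percentage columns and one free-text column.
df = pd.DataFrame({
    "Column1": ["15.5%", "20%"],
    "Column2": ["1%", "2.5%"],
    "note": ["50% off", "n/a"],
})

pct_cols = ["Column1", "Column2"]

# replace() returns a new DataFrame; assigning it back replaces only the
# selected columns, so 'note' keeps its original strings.
df[pct_cols] = df[pct_cols].replace("%", "", regex=True).astype("float") / 100
print(df)
```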