Pandas convert string to int

Anonymous (unverified), submitted on 2019-12-03 02:24:01

Question:

I have a large dataframe with ID numbers:

ID.head()
Out[64]:
0    4806105017087
1    4806105017087
2    4806105017087
3    4901295030089
4    4901295030089

These are all strings at the moment.

I want to convert them to int without using loops; for this I use ID.astype(int).

The problem is that some of my rows contain dirty data which cannot be converted to int, for example:

ID[154382]
Out[58]: 'CN414149'

How can I (without using loops) remove these kinds of occurrences so that I can use astype with peace of mind?

Answer 1:

You need to pass the parameter errors='coerce' to the function to_numeric:

ID = pd.to_numeric(ID, errors='coerce') 

If ID is a column:

df.ID = pd.to_numeric(df.ID, errors='coerce') 

However, non-numeric values are converted to NaN, so all values become float.
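Before deciding what to do with the NaN values, it can help to see which original strings failed to convert. The snippet below is only a sketch: the sample frame and the names converted, is_bad, and bad_values are made up for illustration and mirror the IDs in the question.

```python
import pandas as pd

# Hypothetical sample data mirroring the IDs in the question.
df = pd.DataFrame({'ID': ['4806105017087', '4901295030089', 'CN414149']})

# Unparseable strings become NaN, so the result is float64.
converted = pd.to_numeric(df['ID'], errors='coerce')

# True wherever the conversion produced NaN.
is_bad = converted.isna()

# Original string values that could not be parsed as numbers.
bad_values = df.loc[is_bad, 'ID']
print(bad_values)
```

Note that this mask would also flag rows that were already missing in the original data, not only the unparseable strings.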

To get int, you need to replace NaN with some value, e.g. 0, and then cast to int:

df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64) 

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID':['4806105017087','4806105017087','CN414149']})
print(df)
              ID
0  4806105017087
1  4806105017087
2       CN414149

print(pd.to_numeric(df.ID, errors='coerce'))
0    4.806105e+12
1    4.806105e+12
2             NaN
Name: ID, dtype: float64

df.ID = pd.to_numeric(df.ID, errors='coerce').fillna(0).astype(np.int64)
print(df)
              ID
0  4806105017087
1  4806105017087
2              0
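Since the question asks about removing the dirty rows rather than replacing them with 0, here is a possible variation on the approach above. It is a sketch, not part of the original answer: the names converted, clean, and nullable are illustrative, and the second option assumes a pandas version that supports the nullable 'Int64' dtype.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['4806105017087', '4806105017087', 'CN414149']})

# Unparseable strings become NaN.
converted = pd.to_numeric(df['ID'], errors='coerce')
valid = converted.notna()

# Option 1: drop the rows that failed to convert, then cast to int64.
clean = df[valid].copy()
clean['ID'] = converted[valid].astype('int64')
print(clean)

# Option 2: keep every row and mark unparseable IDs as missing,
# using the nullable integer dtype (capital "I" in 'Int64').
nullable = converted.astype('Int64')
print(nullable)
```

Option 1 shortens the frame, so the index keeps gaps where rows were removed. Option 2 keeps the original length at the cost of a dtype that may behave differently from plain int64 in some operations.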

