Pandas: Column with mixed datatype; how to find the exceptions

好久不见. Submitted on 2019-12-08 05:49:02

Question


I have a large dataframe, and when reading it, it gives me this message: DtypeWarning: Columns (0,8) have mixed types. Specify dtype upon import or set low_memory=False.

It is supposed to be a column of floats, but I suspect a few strings snuck in there. I would like to identify them, and possibly remove them.

I tried df.apply(lambda row: isinstance(row.AnnoyingColumn, (int, float)), 1)

But that gave me an out of memory error.

I assume there must be a better way.


Answer 1:


This will give you True if float:

df.some_column.apply(lambda x: isinstance(x, float))

This will give you True if int or string:

df.some_column.apply(lambda x: isinstance(x, (int,str)))
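
Before deciding what to filter, it can help to see which Python types actually occur in the column. The following is a minimal sketch, not part of the original answer; the sample values and the name some_column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the problematic column.
df = pd.DataFrame({"some_column": [1.5, "oops", 3, 4.25, "n/a"]})

# Map each cell to the name of its Python type, then count each type.
type_counts = df["some_column"].map(lambda x: type(x).__name__).value_counts()
print(type_counts)

# Select the rows whose value is a string, i.e. the likely exceptions.
is_str = df["some_column"].map(lambda x: isinstance(x, str))
exceptions = df[is_str]
print(exceptions)
```

Working on the single column rather than calling df.apply over whole rows avoids building a row object for every record, which is probably what exhausted memory in the question.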

So, to remove strings:

mask = df.some_column.apply(lambda x: isinstance(x, str))
df = df[~mask]

Example that removes floats and strings:

$ df = pd.DataFrame({'a': [1,2.0,'hi',4]})
$ df
    a
0   1
1   2
2   hi
3   4

$ mask = df.a.apply(lambda x: isinstance(x, (float,str)))
$ mask
0    False
1     True
2     True
3    False
Name: a, dtype: bool

$ df = df[~mask]
$ df
    a
0   1
3   4
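
The isinstance filter above also drops strings that hold valid numbers, such as "2.5", which can appear when a CSV column is read with mixed types. As an alternative sketch (an assumption on my part, not something the original answer uses), pd.to_numeric with errors="coerce" turns unparseable values into NaN, so the exceptions are the values that were present but failed to parse. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical column as it might look after read_csv with mixed types:
# real numbers, numeric strings, and genuinely bad entries.
df = pd.DataFrame({"some_column": [1.5, "2.5", "oops", 4, "n/a"]})

# Try to parse every value as a number; unparseable values become NaN.
parsed = pd.to_numeric(df["some_column"], errors="coerce")

# An exception is a value that existed originally but failed to parse.
bad = parsed.isna() & df["some_column"].notna()
print(df.loc[bad, "some_column"])

# Keep only the parseable rows and store the column as float.
clean = df.loc[~bad].copy()
clean["some_column"] = parsed[~bad].astype(float)
print(clean)
```

If the column legitimately contains missing values, they stay NaN in parsed and are not flagged as bad, because the notna() check excludes them.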


Source: https://stackoverflow.com/questions/47660384/pandas-coloumn-with-mixed-datatype-how-to-find-the-exceptions
