问题
My dataframe contains both NaT and NaN values
Date/Time_entry Entry Date/Time_exit Exit
0 2015-11-11 10:52:00 19.9900 2015-11-11 11:30:00 20.350
1 2015-11-11 11:36:00 20.4300 2015-11-11 11:38:00 20.565
2 2015-11-11 11:44:00 21.0000 NaT NaN
3 2009-04-20 10:28:00 13.7788 2009-04-20 10:46:00 13.700
I want to fill NaT with dates and NaN with numbers. Fillna(4) method replaces both NaT and NaN with 4. Is it possible to differentiate between NaT and NaN somehow?
My current workaround is to df[column].fillna()
回答1:
Since NaTs pertain to datetime columns, you can exclude them when applying your filling operation.
u = df.select_dtypes(exclude=['datetime'])
df[u.columns] = u.fillna(4)
df
Date/Time_entry Entry Date/Time_exit Exit
0 2015-11-11 10:52:00 19.9900 2015-11-11 11:30:00 20.350
1 2015-11-11 11:36:00 20.4300 2015-11-11 11:38:00 20.565
2 2015-11-11 11:44:00 21.0000 NaT 4.000
3 2009-04-20 10:28:00 13.7788 2009-04-20 10:46:00 13.700
Similarly, to fill NaT values only, change "exclude" to "include" in the code above.
u = df.select_dtypes(include=['datetime'])
df[u.columns] = u.fillna(pd.to_datetime('today'))
df
Date/Time_entry Entry Date/Time_exit Exit
0 2015-11-11 10:52:00 19.9900 2015-11-11 11:30:00.000000 20.350
1 2015-11-11 11:36:00 20.4300 2015-11-11 11:38:00.000000 20.565
2 2015-11-11 11:44:00 21.0000 2019-02-17 16:11:09.407466 4.000
3 2009-04-20 10:28:00 13.7788 2009-04-20 10:46:00.000000 13.700
回答2:
Try something like this, using pandas.DataFrame.select_dtypes
:
>>> import pandas as pd, datetime, numpy as np
>>> df = pd.DataFrame({'a': [datetime.datetime.now(), np.nan], 'b': [5, np.nan], 'c': [1, 2]})
>>> df
a b c
0 2019-02-17 18:06:15.231557 5.0 1
1 NaT NaN 2
>>> fill_dt = datetime.datetime.now()
>>> fill_value = 4
>>> dt_filled_df = df.select_dtypes('datetime').fillna(fill_dt)
>>> dt_filled_df
a
0 2019-02-17 18:06:15.231557
1 2019-02-17 18:06:36.040404
>>> value_filled_df = df.select_dtypes('int').fillna(fill_value)
>>> value_filled_df
c
0 1
1 2
>>> dt_filled_df.columns = [col + '_notnull' for col in dt_filled_df]
>>> value_filled_df.columns = [col + '_notnull' for col in value_filled_df]
>>> df = df.join(value_filled_df)
>>> df = df.join(dt_filled_df)
>>> df
a b c c_notnull a_notnull
0 2019-02-17 18:06:15.231557 5.0 1 1 2019-02-17 18:06:15.231557
1 NaT NaN 2 2 2019-02-17 18:06:36.040404
来源:https://stackoverflow.com/questions/54738858/how-to-fill-nat-and-nan-values-separately