Numpy Where Changing Timestamps/Datetime to Integers

谁都会走 submitted on 2021-02-08 15:22:22

Question


Not so much a question but something puzzling me.

I have a column of dates that looks something like this:

0              NaT
1       1996-04-01
2       2000-03-01
3              NaT
4              NaT
5              NaT
6              NaT
7              NaT
8              NaT

I'd like to convert the NaTs to a static value. (Assume I imported pandas as pd and numpy as np.)

If I do:

import datetime  # pd.datetime was an alias of datetime.datetime, removed in pandas 2.0
mydata['mynewdate'] = mydata.mydate.replace(
    np.nan, datetime.datetime(1994, 6, 30, 0, 0))

All is well, I get:

0       1994-06-30
1       1996-04-01
2       2000-03-01
3       1994-06-30
4       1994-06-30
5       1994-06-30
6       1994-06-30
7       1994-06-30
8       1994-06-30

But if I do:

import datetime  # pd.datetime was an alias of datetime.datetime, removed in pandas 2.0
mydata['mynewdate'] = np.where(
    mydata['mydate'].isnull(), datetime.datetime(1994, 6, 30, 0, 0), mydata['mydate'])

I get:

0        1994-06-30 00:00:00
1         828316800000000000
2         951868800000000000
3        1994-06-30 00:00:00
4        1994-06-30 00:00:00
5        1994-06-30 00:00:00
6        1994-06-30 00:00:00
7        1994-06-30 00:00:00
8        1994-06-30 00:00:00

This operation converts the original, non-null dates to integers. I thought there might be a mix-up of data types, so I did this:

import datetime  # pd.datetime was an alias of datetime.datetime, removed in pandas 2.0
mydata['mynewdate'] = np.where(
    mydata['mydate'].isnull(), datetime.datetime(1994, 6, 30, 0, 0), pd.to_datetime(mydata['mydate']))

And still get:

0        1994-06-30 00:00:00
1         828316800000000000
2         951868800000000000
3        1994-06-30 00:00:00
4        1994-06-30 00:00:00
5        1994-06-30 00:00:00
6        1994-06-30 00:00:00
7        1994-06-30 00:00:00
8        1994-06-30 00:00:00

Please note (and don't ask): Yes, I have a better solution for replacing nulls. This question is not about replacing nulls (as the title indicates) but about how np.where handles dates. I ask because I will have more complex conditions for selecting dates to replace in the future, and thought np.where would do the job.

Any ideas?


Answer 1:


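It's due to awkward interactions between NumPy's datetime64, pandas' Timestamp, and datetime.datetime. np.where has to pick a single dtype for both inputs; a datetime.datetime scalar paired with a datetime64[ns] array falls back to object, and datetime64[ns] values cast to object come out as integer nanoseconds since the epoch. I fixed it by making the replacement value a numpy.datetime64 from the start.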

static_date = np.datetime64('1994-06-30')
# Alternative: static_date = np.datetime64(datetime.datetime(1994, 6, 30))

mydata.assign(
    mynewdate=np.where(
        mydata.mydate.isnull(),
        static_date,
        mydata.mydate
    )
)

      mydate  mynewdate
0        NaT 1994-06-30
1 1996-04-01 1996-04-01
2 2000-03-01 2000-03-01
3        NaT 1994-06-30
4        NaT 1994-06-30
5        NaT 1994-06-30
6        NaT 1994-06-30
7        NaT 1994-06-30
8        NaT 1994-06-30
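
To see where the integers come from, here is a minimal NumPy-only sketch. The array `dates` is a hypothetical stand-in for the mydate column (same values as in the question), and the names `missing`, `as_object`, `mixed`, and `fixed` are made up for illustration.

```python
import datetime

import numpy as np

# Hypothetical stand-in for the mydate column (values taken from the question).
dates = np.array(["NaT", "1996-04-01", "2000-03-01", "NaT"], dtype="datetime64[ns]")
missing = np.isnat(dates)

# datetime.datetime cannot hold nanosecond precision, so casting
# datetime64[ns] to object produces plain integers (ns since the epoch).
as_object = dates[~missing].astype(object)

# A datetime.datetime replacement forces np.where to an object result,
# which mixes datetime.datetime values with those integers.
mixed = np.where(missing, datetime.datetime(1994, 6, 30), dates)

# A np.datetime64 replacement lets both inputs promote to datetime64[ns].
fixed = np.where(missing, np.datetime64("1994-06-30"), dates)

print(as_object)
print(mixed.dtype, fixed.dtype)
```

Checking the dtype of the np.where result before assigning it back to the DataFrame is a cheap way to catch this kind of silent fallback to object.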



Answer 2:


If you are working in pandas, try using pandas' own mask/where instead:

df.mask(df['Date'].isnull(), pd.to_datetime('1994-06-30'))
Out[824]: 
        Date
0 1994-06-30
1 1996-04-01
2 2000-03-01
3 1994-06-30
4 1994-06-30
5 1994-06-30
6 1994-06-30
7 1994-06-30
8 1994-06-30
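
Since the question mentions more complex conditions later on, here is a rough sketch of how the same idea might extend to a compound condition with Series.mask. The frame, the `NewDate` column, the cutoff date, and the name `needs_replacing` are all assumptions made up for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical frame mirroring the question's column.
df = pd.DataFrame(
    {"Date": pd.to_datetime([None, "1996-04-01", "2000-03-01", None])}
)

static_date = pd.Timestamp("1994-06-30")

# Illustrative compound condition: the date is missing, or it falls
# before an arbitrary cutoff. Comparisons against NaT evaluate to False,
# so the isna() check is still needed for the missing rows.
cutoff = pd.Timestamp("1997-01-01")
needs_replacing = df["Date"].isna() | (df["Date"] < cutoff)

# mask() replaces values where the condition is True and keeps the
# column as a datetime dtype instead of falling back to object.
df["NewDate"] = df["Date"].mask(needs_replacing, static_date)

print(df)
print(df["NewDate"].dtype)
```

Series.where is the mirror image: it keeps values where the condition is True and replaces the rest, so the same result could be written with the negated condition.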


Source: https://stackoverflow.com/questions/52430395/numpy-where-changing-timestamps-datetime-to-integers
