Extracting the first day of month of a datetime type column in pandas

你的背包 2020-12-14 06:11

I have the following dataframe:

user_id    purchase_date
  1        2015-01-23 14:05:21
  2        2015-02-05 05:07:30
  3        2015-02-18 17:08:51
  4        2015-03-21 17:07:30
  5        2015-03-11 18:32:56
  6        2015-03-03 11:02:30

How can I add a column holding the first day of the month of each purchase_date?

8 answers
  • 2020-12-14 06:31

    Try this:

    df['month']=pd.to_datetime(df.purchase_date.astype(str).str[0:7]+'-01')
    
    Out[187]: 
       user_id        purchase_date       month
    0        1  2015-01-23 14:05:21  2015-01-01
    1        2  2015-02-05 05:07:30  2015-02-01
    2        3  2015-02-18 17:08:51  2015-02-01
    3        4  2015-03-21 17:07:30  2015-03-01
    4        5  2015-03-11 18:32:56  2015-03-01
    5        6  2015-03-03 11:02:30  2015-03-01
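    A minimal self-contained version of the string-slicing approach, assuming the column has already been parsed to datetime64. The row for user 7 is a hypothetical purchase on the 1st, added to show that slicing also handles that case:

```python
import pandas as pd

# Sample rows from the question, plus a hypothetical user 7 whose
# purchase falls on the first day of the month.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 7],
    "purchase_date": pd.to_datetime([
        "2015-01-23 14:05:21",
        "2015-02-05 05:07:30",
        "2015-02-18 17:08:51",
        "2015-03-01 09:00:00",
    ]),
})

# astype(str) renders each value as "YYYY-MM-DD HH:MM:SS"; the first seven
# characters are "YYYY-MM", and appending "-01" gives the month's first day.
df["month"] = pd.to_datetime(df["purchase_date"].astype(str).str[0:7] + "-01")
print(df)
```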
    
  • 2020-12-14 06:34

    We can use date offset in conjunction with Series.dt.normalize:

    In [60]: df['month'] = df['purchase_date'].dt.normalize() - pd.offsets.MonthBegin(1)
    
    In [61]: df
    Out[61]:
       user_id       purchase_date      month
    0        1 2015-01-23 14:05:21 2015-01-01
    1        2 2015-02-05 05:07:30 2015-02-01
    2        3 2015-02-18 17:08:51 2015-02-01
    3        4 2015-03-21 17:07:30 2015-03-01
    4        5 2015-03-11 18:32:56 2015-03-01
    5        6 2015-03-03 11:02:30 2015-03-01
    

    Or, a nicer solution from @BradSolomon:

    In [95]: df['month'] = df['purchase_date'] - pd.offsets.MonthBegin(1, normalize=True)
    
    In [96]: df
    Out[96]:
       user_id       purchase_date      month
    0        1 2015-01-23 14:05:21 2015-01-01
    1        2 2015-02-05 05:07:30 2015-02-01
    2        3 2015-02-18 17:08:51 2015-02-01
    3        4 2015-03-21 17:07:30 2015-03-01
    4        5 2015-03-11 18:32:56 2015-03-01
    5        6 2015-03-03 11:02:30 2015-03-01
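    The offset subtraction above only looks right because none of the sample dates falls on the 1st. A minimal check with two hypothetical timestamps shows that subtracting MonthBegin from a date already on the first day of the month moves it to the previous month, which is the case the later answers work around:

```python
import pandas as pd

# Two hypothetical timestamps: one mid-month, one already on the 1st.
mid_month = pd.Timestamp("2015-03-21 17:07:30")
first_day = pd.Timestamp("2015-03-01 09:15:00")

offset = pd.offsets.MonthBegin(1, normalize=True)

# A mid-month date rolls back to the start of its own month.
print(mid_month - offset)  # 2015-03-01 00:00:00

# A date already on the 1st is moved to the start of the previous month.
print(first_day - offset)  # 2015-02-01 00:00:00
```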
    
  • 2020-12-14 06:34

    @Eyal: This is what I did to get the first day of the month with pd.offsets.MonthBegin while handling the case where the date is already the first day of the month.

    import pandas as pd
    
    from_date= pd.to_datetime('2018-12-01')
    
    from_date = from_date - pd.offsets.MonthBegin(1, normalize=True) if not from_date.is_month_start else from_date
    
    from_date
    

    result: Timestamp('2018-12-01 00:00:00')

    from_date= pd.to_datetime('2018-12-05')
    
    from_date = from_date - pd.offsets.MonthBegin(1, normalize=True) if not from_date.is_month_start else from_date
    
    from_date
    

    result: Timestamp('2018-12-01 00:00:00')
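    A vectorized sketch of the same guard for a whole column, assuming a datetime64 Series; the values are hypothetical. Series.where keeps dates that are already on the 1st and replaces the rest with their rolled-back month start. Normalizing first also drops the time of day, which the scalar version above keeps for a timestamp such as 2018-12-01 08:00:

```python
import pandas as pd

# Hypothetical purchase timestamps, including two that fall on the 1st.
s = pd.Series(pd.to_datetime([
    "2018-12-05 13:45:00",
    "2018-12-01 08:00:00",
    "2019-01-01 00:00:00",
]))

# Drop the time of day so every result lands at midnight.
day = s.dt.normalize()

# Keep values already on the 1st; roll every other value back to its month start.
first_of_month = day.where(day.dt.is_month_start, day - pd.offsets.MonthBegin(1))
print(first_of_month)
```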

  • 2020-12-14 06:39

    For me, df['purchase_date'] - pd.offsets.MonthBegin(1) didn't work (it moves dates on the first day of the month to the previous month), so I subtract (day - 1) days instead:

    df['purchase_date'].dt.normalize() - pd.to_timedelta(df['purchase_date'].dt.day - 1, unit='d')
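    A self-contained check of the day-subtraction idea on hypothetical values, with .dt.normalize() applied so the result lands at midnight (without it, the time of day is carried over into the result):

```python
import pandas as pd

# Hypothetical values: one mid-month, one already on the 1st.
s = pd.Series(pd.to_datetime([
    "2015-01-23 14:05:21",
    "2015-03-01 11:02:30",
]))

# Normalize to midnight, then step back (day - 1) whole days.
first_of_month = s.dt.normalize() - pd.to_timedelta(s.dt.day - 1, unit="D")
print(first_of_month)
```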
    
  • 2020-12-14 06:44

    Solutions that subtract pd.offsets.MonthBegin give the wrong month for dates that already fall on the first day of the month.

    The following solution works for any day of the month:

    df['month'] = df['purchase_date'] + pd.offsets.MonthEnd(0) - pd.offsets.MonthBegin(normalize=True)
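    A quick check of this offset arithmetic on hypothetical edge cases (the first, middle, and last day of a month):

```python
import pandas as pd

# Hypothetical edge cases: first, middle and last day of a month.
s = pd.Series(pd.to_datetime([
    "2015-01-01 08:00:00",
    "2015-02-18 17:08:51",
    "2015-03-31 23:59:59",
]))

# MonthEnd(0) moves each date forward to its month end (dates already there
# stay put), so subtracting MonthBegin always lands on the 1st of that same
# month; normalize=True resets the time to midnight.
first_of_month = s + pd.offsets.MonthEnd(0) - pd.offsets.MonthBegin(normalize=True)
print(first_of_month)
```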
    

    [EDIT]

    Another, more readable, solution is:

    from pandas.tseries.offsets import MonthBegin
    df['month'] = df['purchase_date'].dt.normalize().map(MonthBegin().rollback)
    

    Be careful not to use:

    df['month'] = df['purchase_date'].map(MonthBegin(normalize=True).rollback)
    

    because that gives incorrect results for the first day due to a bug: https://github.com/pandas-dev/pandas/issues/32616
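    Why the readable version calls .dt.normalize() before rollback: with a plain MonthBegin(), a timestamp on the 1st already counts as on-offset, so rollback returns it unchanged, time of day included. A minimal scalar check with a hypothetical timestamp:

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

# Hypothetical timestamp on the 1st, with a time of day.
ts = pd.Timestamp("2015-03-01 11:02:30")

# Without normalizing, the timestamp already counts as a month start,
# so rollback returns it unchanged, time of day included.
print(MonthBegin().rollback(ts))              # 2015-03-01 11:02:30

# Normalizing first gives a clean midnight timestamp on the 1st.
print(MonthBegin().rollback(ts.normalize()))  # 2015-03-01 00:00:00
```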

  • 2020-12-14 06:47

    How about this simple solution?
    Since purchase_date is already datetime64[ns], you can use strftime with the format '%Y-%m-01', which fixes the day to 01 and so always gives the first day of the month.

    df['date'] = df['purchase_date'].apply(lambda x: x.strftime('%Y-%m-01'))
    
    print(df)
       user_id       purchase_date       date
    0        1 2015-01-23 14:05:21 2015-01-01
    1        2 2015-02-05 05:07:30 2015-02-01
    2        3 2015-02-18 17:08:51 2015-02-01
    3        4 2015-03-21 17:07:30 2015-03-01
    4        5 2015-03-11 18:32:56 2015-03-01
    5        6 2015-03-03 11:02:30 2015-03-01
    

    Because strftime returns strings, the date column now has the object (string) dtype:

    print(df.dtypes)
    user_id                   int64
    purchase_date    datetime64[ns]
    date                     object
    dtype: object
    

    To convert it to datetime64[ns], use pd.to_datetime():

    df['date'] = pd.to_datetime(df['date'])
    
    print(df.dtypes)
    user_id                   int64
    purchase_date    datetime64[ns]
    date             datetime64[ns]
    dtype: object
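    The apply/lambda call runs Python code once per row; a vectorized sketch of the same idea uses Series.dt.strftime. The second assignment swaps in a different technique not used in this answer: converting to a monthly period with dt.to_period('M') and back with dt.to_timestamp(), which yields datetime64 values directly without going through strings:

```python
import pandas as pd

# First rows of the question's data.
df = pd.DataFrame({
    "user_id": [1, 2, 3],
    "purchase_date": pd.to_datetime([
        "2015-01-23 14:05:21",
        "2015-02-05 05:07:30",
        "2015-02-18 17:08:51",
    ]),
})

# Vectorized formatting: .dt.strftime formats the whole column at once,
# and pd.to_datetime parses the "YYYY-MM-01" strings back to datetimes.
df["date"] = pd.to_datetime(df["purchase_date"].dt.strftime("%Y-%m-01"))

# Alternative without strings: convert to a monthly Period, then take the
# timestamp at the start of that period.
df["date_via_period"] = df["purchase_date"].dt.to_period("M").dt.to_timestamp()

print(df)
print(df.dtypes)
```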
    