I have the following dataframe:
user_id purchase_date
1 2015-01-23 14:05:21
2 2015-02-05 05:07:30
3 2015-02-18 17:08:51
4
Try this ..
df['month']=pd.to_datetime(df.purchase_date.astype(str).str[0:7]+'-01')
Out[187]:
user_id purchase_date month
0 1 2015-01-23 14:05:21 2015-01-01
1 2 2015-02-05 05:07:30 2015-02-01
2 3 2015-02-18 17:08:51 2015-02-01
3 4 2015-03-21 17:07:30 2015-03-01
4 5 2015-03-11 18:32:56 2015-03-01
5 6 2015-03-03 11:02:30 2015-03-01
We can use date offset in conjunction with Series.dt.normalize:
In [60]: df['month'] = df['purchase_date'].dt.normalize() - pd.offsets.MonthBegin(1)
In [61]: df
Out[61]:
user_id purchase_date month
0 1 2015-01-23 14:05:21 2015-01-01
1 2 2015-02-05 05:07:30 2015-02-01
2 3 2015-02-18 17:08:51 2015-02-01
3 4 2015-03-21 17:07:30 2015-03-01
4 5 2015-03-11 18:32:56 2015-03-01
5 6 2015-03-03 11:02:30 2015-03-01
Or much nicer solution from @BradSolomon
In [95]: df['month'] = df['purchase_date'] - pd.offsets.MonthBegin(1, normalize=True)
In [96]: df
Out[96]:
user_id purchase_date month
0 1 2015-01-23 14:05:21 2015-01-01
1 2 2015-02-05 05:07:30 2015-02-01
2 3 2015-02-18 17:08:51 2015-02-01
3 4 2015-03-21 17:07:30 2015-03-01
4 5 2015-03-11 18:32:56 2015-03-01
5 6 2015-03-03 11:02:30 2015-03-01
@Eyal: This is what I did to get the first day of the month using pd.offsets.MonthBegin
and handle the scenario where day is already first day of month.
import datetime
from_date= pd.to_datetime('2018-12-01')
from_date = from_date - pd.offsets.MonthBegin(1, normalize=True) if not from_date.is_month_start else from_date
from_date
result: Timestamp('2018-12-01 00:00:00')
from_date= pd.to_datetime('2018-12-05')
from_date = from_date - pd.offsets.MonthBegin(1, normalize=True) if not rom_date.is_month_start else from_date
from_date
result: Timestamp('2018-12-01 00:00:00')
For me df['purchase_date'] - pd.offsets.MonthBegin(1)
didn't work (it fails for the first day of the month), so I'm subtracting the days of the month like this:
df['purchase_date'] - pd.to_timedelta(df['purchase_date'].dt.day - 1, unit='d')
Most proposed solutions don't work for the first day of the month.
Following solution works for any day of the month:
df['month'] = df['purchase_date'] + pd.offsets.MonthEnd(0) - pd.offsets.MonthBegin(normalize=True)
[EDIT]
Another, more readable, solution is:
from pandas.tseries.offsets import MonthBegin
df['month'] = df['purchase_date'].dt.normalize().map(MonthBegin().rollback)
Be aware not to use:
df['month'] = df['purchase_date'].map(MonthBegin(normalize=True).rollback)
because that gives incorrect results for the first day due to a bug: https://github.com/pandas-dev/pandas/issues/32616
How about this easy solution?
As purchase_date
is already in datetime64[ns]
format, you can use strftime to format the date to always have the first day of month.
df['date'] = df['purchase_date'].apply(lambda x: x.strftime('%Y-%m-01'))
print(df)
user_id purchase_date date
0 1 2015-01-23 14:05:21 2015-01-01
1 2 2015-02-05 05:07:30 2015-02-01
2 3 2015-02-18 17:08:51 2015-02-01
3 4 2015-03-21 17:07:30 2015-03-01
4 5 2015-03-11 18:32:56 2015-03-01
5 6 2015-03-03 11:02:30 2015-03-01
Because we used strftime
, now the date
column is in object
(string) type:
print(df.dtypes)
user_id int64
purchase_date datetime64[ns]
date object
dtype: object
Now if you want it to be in datetime64[ns]
, just use pd.to_datetime():
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)
user_id int64
purchase_date datetime64[ns]
date datetime64[ns]
dtype: object