Python Pandas. Date object split by separate columns.

Submitted by 亡梦爱人 on 2020-01-11 14:42:11

Question


I have dates in Python (pandas) written as strings like "1/31/2010". To apply linear regression I want 3 separate variables: the day, the month, and the year.

How can I split a date column in pandas into 3 columns? A second question: how can I do the same but group the days into 3 groups: 1-10, 11-20, 21-31?


Answer 1:


import pandas as pd

# df is assumed to already hold the date strings in a 'date' column
df['date'] = pd.to_datetime(df['date'])

# Create 3 additional columns
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year

Ideally, you can skip creating the 3 additional columns and pass the Series straight to your function. The same attributes are also available on a single parsed timestamp:

In [2]: pd.to_datetime('01/31/2010').day
Out[2]: 31

In [3]: pd.to_datetime('01/31/2010').month
Out[3]: 1

In [4]: pd.to_datetime('01/31/2010').year
Out[4]: 2010
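To illustrate the point about not adding columns, here is a minimal sketch that builds the regression inputs on the fly. The sample dates and the names `dates` and `X` are made up for this example, and the explicit `format` argument assumes every value follows the month/day/year layout from the question.

```python
import pandas as pd

# Hypothetical sample data in the question's "m/d/yyyy" layout
df = pd.DataFrame({'date': ['1/31/2010', '2/15/2011', '12/1/2012']})

# Parse once; the original column in df is left untouched
dates = pd.to_datetime(df['date'], format='%m/%d/%Y')

# Assemble the features in a separate frame instead of adding columns to df
X = pd.DataFrame({
    'day': dates.dt.day,
    'month': dates.dt.month,
    'year': dates.dt.year,
})

print(X)
```

`X` can then be handed to whatever regression routine you use, while `df` keeps only its original `date` column.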



Answer 2:


This answers only your first question.

One solution is to extract attributes of pd.Timestamp objects using operator.attrgetter.

The benefit of this method is that you can easily expand or change the list of attributes you require. In addition, the logic is not specific to the object type.
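As a small illustration of how attrgetter behaves on a single value, the sketch below applies it to one pd.Timestamp. The extra `dayofweek` attribute and the variable names are only there to show how the list can be extended; they are not part of the question.

```python
from operator import attrgetter
import pandas as pd

ts = pd.Timestamp('2010-01-31')

# attrgetter with several names returns a callable that yields a tuple
get_parts = attrgetter('day', 'month', 'year')
parts = get_parts(ts)
print(parts)

# Extending the attribute list needs no other change to the logic
get_more = attrgetter('day', 'month', 'year', 'dayofweek')
more = get_more(ts)
print(more)
```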

from operator import attrgetter
import pandas as pd

df = pd.DataFrame({'date': ['1/21/2010', '5/5/2015', '4/30/2018']})

df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

attr_list = ['day', 'month', 'year']
attrs = attrgetter(*attr_list)
df[attr_list] = df['date'].apply(attrs).apply(pd.Series)

print(df)

        date  day  month  year
0 2010-01-21   21      1  2010
1 2015-05-05    5      5  2015
2 2018-04-30   30      4  2018
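For the second question (grouping days into 1-10, 11-20, 21-31), neither answer gives code. One possible approach, offered only as a sketch, is `pd.cut` on the day of month. The column name `day_group` and the label strings are assumptions chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'date': ['1/21/2010', '5/5/2015', '4/30/2018']})
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')

# pd.cut uses right-inclusive bins by default: (0, 10], (10, 20], (20, 31]
df['day_group'] = pd.cut(
    df['date'].dt.day,
    bins=[0, 10, 20, 31],
    labels=['1-10', '11-20', '21-31'],
)

print(df)
```

The result is a categorical column. For a linear regression you would probably still need to encode it numerically, for example with `pd.get_dummies`.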


Source: https://stackoverflow.com/questions/50101384/python-pandas-date-object-split-by-separate-columns
