How do I efficiently apply pandas.Timestamp functions to a full dataframe/column?

一个人想着一个人 submitted on 2021-02-10 18:22:03

Question


Pandas is a great tool for many data tasks, and many of its functions are designed to operate efficiently on whole columns rather than on individual cells or rows. One such function is to_datetime(), which I use as an example later in this question. However, a number of pandas functions, as best I can tell from the documentation, do not apply directly to dataframes. The specific function I am interested in is pandas.Timestamp.isocalendar(), but many other functions in the pandas.Timestamp class (and likely in other pandas classes as well) fit this description and have minimal documentation. Is there a way to efficiently broadcast these functions over a full column of data? If so, how would I do that?

Note: I know that I can use the apply() function, but it is demonstrably slower (~5x in my test) than what I have in mind. The apply() function is also not restricted to pandas functions, so I feel there must be a better way to do this (otherwise, why have the pandas.Timestamp class at all, when datetime does the same things for single values?). The code below shows an example in which I compare the pandas.to_datetime() function with applying the datetime.strptime() function to each row.

import pandas as pd
import datetime
from faker import Faker
import time
import copy

# Setting up fake dataframe:
Faker.seed(0)
fake = Faker()

observations=1000

dates=[fake.date_between(start_date=datetime.datetime(2020,1,1),end_date=datetime.datetime(2020,1,31)) for _ in range(observations)]
index=[x for x in range(observations)]

df=pd.DataFrame({'id' : index,'dates' : dates},columns=['id','dates'])

# Converting datetime object to string:
df['dates']=df['dates'].apply(lambda x: x.strftime('%Y-%m-%d'))

# Copy dataframe to run two time tests:
df2=copy.copy(df)

# Speed of the apply() function:
tic = time.perf_counter()
df['dates']=df['dates'].apply(lambda x: datetime.datetime.strptime(x,'%Y-%m-%d'))
toc = time.perf_counter()
print(f'pandas apply(lambda) completed in {toc-tic:0.4f} seconds')

# speed of the to_datetime() function:
tic = time.perf_counter()
df2['dates']=pd.to_datetime(df2['dates'],format='%Y-%m-%d')
toc = time.perf_counter()
print(f'pandas to_datetime() completed in {toc-tic:0.4f} seconds')

#Script returns:
#pandas apply(lambda) completed in 0.0107 seconds
#pandas to_datetime() completed in 0.0021 seconds

Answer 1:


Most datetime functions become available once the column has a datetime64[ns] dtype (which you get when the data is datetime-like, for example when it was created with date_range or converted with to_datetime).

You can then use the dt accessor to apply datetime-like functions to the whole column efficiently:

df['dates'].dt.isocalendar()
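
To make this concrete, here is a minimal sketch, assuming pandas 1.1 or later (where Series.dt.isocalendar() is available). The three sample dates and the iso_week column name are made up for illustration and are not from the question. It converts a string column with to_datetime, checks the dtype, and then compares the dt accessor against the row-wise apply() approach:

```python
import pandas as pd

# Hypothetical sample data: dates stored as strings, as in the question.
df = pd.DataFrame({"dates": ["2020-01-01", "2020-01-06", "2020-01-31"]})

# Convert the whole column at once; the result has a datetime64 dtype,
# which is what makes the .dt accessor available.
df["dates"] = pd.to_datetime(df["dates"], format="%Y-%m-%d")
is_datetime = pd.api.types.is_datetime64_any_dtype(df["dates"])

# Vectorized: returns a DataFrame with the columns year, week and day.
iso = df["dates"].dt.isocalendar()
df["iso_week"] = iso["week"]

# Row-wise equivalent, calling Timestamp.isocalendar() on each value.
iso_week_apply = df["dates"].apply(lambda ts: ts.isocalendar()[1])

print(iso)
```

Both approaches should give the same week numbers; the dt version simply avoids calling a Python function once per row. Other Timestamp attributes and methods (for example year, dayofweek, or day_name()) are exposed through the same accessor.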


Source: https://stackoverflow.com/questions/65347513/how-do-i-efficiently-apply-pandas-timestamp-functions-to-a-full-dataframe-column
