Python: Converting seconds to a datetime format in a dataframe column

Question

I am currently working with a big dataframe (12x47800). One of the twelve columns holds an integer number of seconds, and I want to convert it to a column of datetime.time values. schedule is my dataframe, and the column I am trying to change is named 'depTime'. Since I want datetime.time values and the times can cross midnight, I added the if-statement. This 'works', but it is really slow, as one might imagine. Is there a faster way to do this? My current code, the only version I could get working, is:

for i in range(len(schedule)):
    t_sec = schedule.iloc[i].depTime
    [t_min, t_sec] = divmod(t_sec,60)
    [t_hour,t_min] = divmod(t_min,60)
    if t_hour>23:
        t_hour -= 24  # wrap past midnight: hour 24 becomes 0
    schedule['depTime'].iloc[i] = dt.time(int(t_hour),int(t_min),int(t_sec))

Thanks in advance guys.

PS: I'm pretty new to Python, so if anybody could help me I would be very grateful :)

Answer 1:

I'm adding a new solution that is much faster than the original, since it relies on pandas vectorized operations instead of looping (pandas apply functions are essentially optimized loops over the data).

I tested it with a sample similar in size to yours, and the run time dropped from 778 ms to 21.3 ms, so I definitely recommend the new version.

Both solutions convert your integer seconds to timedeltas and add them to a reference datetime. Then I simply take the time component of the resulting datetimes.

New (Faster) Option:

import datetime as dt
import numpy as np
import pandas as pd

seconds = pd.Series(np.random.rand(50)*100).astype(int) # Generating test data

start = dt.datetime(2019,1,1,0,0) # You need a reference point

datetime_series = seconds.astype('timedelta64[s]') + start

time_series = datetime_series.dt.time

time_series
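Applied to the question's data, the same idea might look like the sketch below. The `schedule` frame and its values are made up for illustration, and `pd.to_timedelta` is used here in place of `astype` for the conversion, which is a substitution and not the code above. Because the timedelta is added to a midnight reference, values of 86400 seconds or more roll over into the next day, so `.dt.time` returns the wrapped time of day and no `if` check should be needed.

```python
import datetime as dt

import pandas as pd

# Hypothetical stand-in for the question's dataframe
schedule = pd.DataFrame({"depTime": [0, 3661, 86399, 86400, 90061]})

start = dt.datetime(2019, 1, 1)  # any midnight works as the reference point

# seconds -> timedelta -> datetime -> time of day
dep_datetime = pd.to_timedelta(schedule["depTime"], unit="s") + start
schedule["depTime"] = dep_datetime.dt.time

print(schedule["depTime"].tolist())
```

The last two values lie past midnight (86400 and 90061 seconds) and come back as times on the following day rather than raising an error.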

Original (slower) Answer:

Not the most elegant solution, but it does the trick.

import datetime as dt
import numpy as np
import pandas as pd

seconds = pd.Series(np.random.rand(50)*100).astype(int) # Generating test data

start = dt.datetime(2019,1,1,0,0) # You need a reference point

time_series = seconds.apply(lambda x: start + pd.Timedelta(seconds=x)).dt.time
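To check on your own machine that the two options agree and to see how they compare in speed, a rough sketch along these lines could be used. The function names `vectorized` and `with_apply` are made up for this example, the test data is random, and the vectorized route uses `pd.to_timedelta` as an assumed equivalent of the `astype` call. Timings will vary with hardware and pandas version, so none are quoted here.

```python
import datetime as dt
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Same row count as in the question; values span up to two days
seconds = pd.Series(rng.integers(0, 2 * 86400, size=47800))
start = dt.datetime(2019, 1, 1)  # reference point


def vectorized(s):
    # One vectorized conversion for the whole column
    return (pd.to_timedelta(s, unit="s") + start).dt.time


def with_apply(s):
    # Builds one Timedelta per row inside a Python-level loop
    return s.apply(lambda x: start + pd.Timedelta(seconds=x)).dt.time


# Both routes should give identical times
same = vectorized(seconds).tolist() == with_apply(seconds).tolist()
print("same result:", same)

for fn in (vectorized, with_apply):
    best = min(timeit.repeat(lambda: fn(seconds), number=1, repeat=3))
    print(f"{fn.__name__}: {best * 1000:.1f} ms")
```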

Answer 2:

You should avoid scanning a dataframe row by row and use vectorized operations instead, because they are normally much more efficient.

Fortunately, pandas has a function that does exactly what you are asking for, to_timedelta:

schedule['depTime'] = pd.to_timedelta(schedule['depTime'], unit='s')

This is not really a datetime format, but it is the pandas equivalent of datetime.timedelta and a convenient type for processing times. You could use to_datetime instead, but you would end up with a full datetime close to 1970-01-01...

If you really need datetime.time objects, you can get them this way, starting from the original integer column:

schedule['depTime'] = pd.to_datetime(schedule['depTime'], unit='s').dt.time

but they are less convenient to use in a pandas dataframe.
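To illustrate the difference in convenience, here is a small sketch on made-up data (the `schedule` values and the five-minute delay are hypothetical). Timedeltas keep a proper pandas dtype, support arithmetic directly, and preserve values past 24 hours, whereas `datetime.time` objects end up in an object-dtype column and wrap around at midnight.

```python
import pandas as pd

# Made-up departure times in seconds; the last one is past midnight
schedule = pd.DataFrame({"depTime": [3600, 86399, 90000]})

as_delta = pd.to_timedelta(schedule["depTime"], unit="s")
as_time = pd.to_datetime(schedule["depTime"], unit="s").dt.time

print(as_delta.dtype)  # a timedelta64 dtype
print(as_time.dtype)   # object

# Arithmetic works directly on the timedelta column
delayed = as_delta + pd.Timedelta(minutes=5)
print(delayed.tolist())

# Values beyond one day are preserved as timedeltas...
print(as_delta.dt.days.tolist())
# ...but wrap to a time of day as datetime.time objects
print(as_time.tolist())
```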

Source: https://stackoverflow.com/questions/55003543/python-converting-a-seconds-to-a-datetime-format-in-a-dataframe-column

Tags

python

pandas

datetime

seconds