Question
Currently I am working with a big dataframe (12x47800). One of the twelve columns contains an integer number of seconds, and I want to convert it into a column of datetime.time values. schedule is my dataframe, and the column I am trying to change is named 'depTime'. Since I want datetime.time values and the times can cross midnight, I added the if-statement. This 'works', but as you can imagine it is really slow. Is there a faster way to do this? My current code, the only version I could get working, is:
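import datetime as dt

for i in range(len(schedule)):
    t_sec = schedule.iloc[i].depTime
    t_min, t_sec = divmod(t_sec, 60)
    t_hour, t_min = divmod(t_min, 60)
    if t_hour > 23:  # the time crossed midnight, so wrap it into the next day
        t_hour -= 24
    # .at avoids the chained assignment schedule['depTime'].iloc[i] = ...
    schedule.at[schedule.index[i], 'depTime'] = dt.time(int(t_hour), int(t_min), int(t_sec))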
Thanks in advance, guys.
P.S. I'm pretty new to Python, so if anybody could help me I would be very grateful :)
Answer 1:
I'm adding a new solution that is much faster than the original because it relies on pandas' vectorized operations instead of looping (pandas' apply is essentially a loop that calls a Python function on each element).
I tested it on a sample similar in size to yours, and the runtime dropped from 778 ms to 21.3 ms, so I definitely recommend the new version.
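Both solutions convert your integer seconds to timedeltas and add them to a reference datetime; then I simply take the time component of the resulting datetimes. Because the reference point is midnight, values of a full day (86400 seconds) or more roll over into the next day, so times that cross midnight wrap around without any if-statement.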
New (Faster) Option:
import datetime as dt

import numpy as np
import pandas as pd

seconds = pd.Series(np.random.rand(50) * 100).astype(int)  # Generating test data
start = dt.datetime(2019, 1, 1, 0, 0)  # You need a reference point
datetime_series = seconds.astype('timedelta64[s]') + start
time_series = datetime_series.dt.time
time_series  # display the result
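To check how the reference-datetime approach handles times past midnight, here is the same conversion run on three hand-picked values (assumed for illustration, not taken from the question): 1 h 1 min 1 s, one second before midnight, and 25 h 1 min 1 s.

```python
import datetime as dt

import pandas as pd

# Hand-picked test values: 01:01:01, 23:59:59 and 25:01:01 (past midnight)
seconds = pd.Series([3661, 86399, 90061])

start = dt.datetime(2019, 1, 1, 0, 0)  # reference point at midnight
datetime_series = seconds.astype('timedelta64[s]') + start
time_series = datetime_series.dt.time  # keep only the clock time

print(time_series.tolist())
# [datetime.time(1, 1, 1), datetime.time(23, 59, 59), datetime.time(1, 1, 1)]
```

The 90061-second value lands on 2019-01-02 01:01:01, and dropping the date leaves 01:01:01.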
Original (slower) Answer:
Not the most elegant solution, but it does the trick.
import datetime as dt

import numpy as np
import pandas as pd

seconds = pd.Series(np.random.rand(50) * 100).astype(int)  # Generating test data
start = dt.datetime(2019, 1, 1, 0, 0)  # You need a reference point
time_series = seconds.apply(lambda x: start + pd.Timedelta(seconds=x)).dt.time
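The speed difference quoted above was measured by the answerer; to check both versions on your own machine, a timing sketch along these lines could be used. It assumes 47,800 random values of up to two days to mimic the question's column, and it first confirms that both versions return the same times. The helper names vectorized and with_apply are hypothetical, and the timings will vary with hardware and pandas version.

```python
import datetime as dt
import timeit

import numpy as np
import pandas as pd

# Test data roughly the size of the question's column (47,800 rows, up to 2 days)
seconds = pd.Series(np.random.randint(0, 2 * 86400, size=47800))
start = dt.datetime(2019, 1, 1, 0, 0)  # reference point at midnight


def vectorized(s):
    # Convert the whole column at once, then keep only the clock time
    return (s.astype('timedelta64[s]') + start).dt.time


def with_apply(s):
    # Build one Timestamp per element through a Python-level function call
    return s.apply(lambda x: start + pd.Timedelta(seconds=x)).dt.time


# Both versions should produce identical datetime.time values
assert vectorized(seconds).equals(with_apply(seconds))

for func in (vectorized, with_apply):
    elapsed = timeit.timeit(lambda: func(seconds), number=3) / 3
    print(f"{func.__name__}: {elapsed * 1000:.1f} ms per run")
```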
Answer 2:
You should avoid looping over a dataframe row by row and use vectorized operations instead, because they are usually much more efficient.
Fortunately, pandas has a function that does what you need, to_timedelta:
schedule['depTime'] = pd.to_timedelta(schedule['depTime'], unit='s')
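One difference from datetime.time worth knowing: timedeltas do not wrap at midnight, so 90061 seconds becomes 1 day 01:01:01. If only the clock time matters, taking the seconds modulo 86400 (the number of seconds in a day) before converting is one way to mimic the question's if-statement. The schedule frame below is a small stand-in with made-up values, not the asker's data.

```python
import pandas as pd

# Small stand-in for the question's `schedule` dataframe (values made up)
schedule = pd.DataFrame({'depTime': [3661, 86399, 90061]})

# Straight conversion: values past midnight keep their extra day
as_timedelta = pd.to_timedelta(schedule['depTime'], unit='s')
print(as_timedelta.tolist())
# [Timedelta('0 days 01:01:01'), Timedelta('0 days 23:59:59'), Timedelta('1 days 01:01:01')]

# Wrap at midnight first so every value falls within a single day
wrapped = pd.to_timedelta(schedule['depTime'] % 86400, unit='s')

# Hours, minutes and seconds are still easy to pull out of a timedelta column
print(wrapped.dt.components[['hours', 'minutes', 'seconds']])
```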
It is not really a datetime format, but it is the pandas equivalent of a datetime.timedelta and a convenient type for processing times. You could use to_datetime instead, but you would end up with full datetimes close to 1970-01-01 (the Unix epoch).
If you really need datetime.time objects, you can get them this way:
schedule['depTime'] = pd.to_datetime(schedule['depTime'], unit='s').dt.time
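Because the date part is discarded, times past midnight wrap around automatically. However, the resulting column has object dtype (plain Python objects), so it is less convenient to work with in a pandas dataframe.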
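To see what the to_datetime route produces, the sketch below runs it on a small made-up stand-in for schedule: the intermediate datetimes fall on 1970-01-01 or the following day, and the final column holds datetime.time objects with object dtype.

```python
import datetime as dt

import pandas as pd

# Small stand-in for the question's `schedule` dataframe (values made up)
schedule = pd.DataFrame({'depTime': [3661, 86399, 90061]})

# Seconds are counted from the Unix epoch, 1970-01-01 00:00:00
as_datetime = pd.to_datetime(schedule['depTime'], unit='s')
print(as_datetime.dt.date.tolist())
# [datetime.date(1970, 1, 1), datetime.date(1970, 1, 1), datetime.date(1970, 1, 2)]

# Keeping only the time component drops the date
schedule['depTime'] = as_datetime.dt.time
print(schedule['depTime'].tolist())
# [datetime.time(1, 1, 1), datetime.time(23, 59, 59), datetime.time(1, 1, 1)]
print(schedule['depTime'].dtype)  # object: plain Python datetime.time values
```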
Source: https://stackoverflow.com/questions/55003543/python-converting-a-seconds-to-a-datetime-format-in-a-dataframe-column