In Pandas, how do I convert a sequence of date strings to datetime objects and put them in a DataFrame?

粉色の甜心 2020-11-29 07:53
import pandas as pd
date_stngs = ('2008-12-20','2008-12-21','2008-12-22','2008-12-23')

a = pd.Series(range(4),index = (range(4)))

for idx, date in enumerat
3 Answers
  • 2020-11-29 08:21
    In [46]: pd.to_datetime(pd.Series(date_stngs))
    Out[46]: 
    0   2008-12-20 00:00:00
    1   2008-12-21 00:00:00
    2   2008-12-22 00:00:00
    3   2008-12-23 00:00:00
    dtype: datetime64[ns]
    

    Update: benchmark

    In [42]: import datetime as dt
    
    In [43]: dates = [(dt.datetime(1960, 1, 1)+dt.timedelta(days=i)).date().isoformat() for i in range(20000)]
    
    In [44]: timeit pd.Series([pd.to_datetime(date) for date in dates])
    1 loops, best of 3: 1.71 s per loop
    
    In [45]: timeit pd.to_datetime(pd.Series(dates))
    100 loops, best of 3: 5.71 ms per loop
    
  • 2020-11-29 08:36

    A simple solution uses the Series constructor: pass the target data type to its dtype parameter. Alternatively, the to_datetime function now accepts a sequence of strings directly.

    Create Data

    date_strings = ('2008-12-20','2008-12-21','2008-12-22','2008-12-23')
    

    All three of the following produce the same result:

    pd.Series(date_strings, dtype='datetime64[ns]')
    pd.Series(pd.to_datetime(date_strings))
    pd.to_datetime(pd.Series(date_strings))
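
    The claim that the three calls agree can be checked directly. The following is a minimal sketch, not part of the original answer; the variable names via_dtype, via_index and via_series are made up for illustration. It compares values rather than dtypes, on the assumption that the stored resolution may differ between the calls depending on the pandas version.

```python
import pandas as pd

date_strings = ('2008-12-20', '2008-12-21', '2008-12-22', '2008-12-23')

# Three ways of building a datetime Series from the same strings.
# (Variable names are illustrative, not from the original answer.)
via_dtype = pd.Series(date_strings, dtype='datetime64[ns]')
via_index = pd.Series(pd.to_datetime(date_strings))
via_series = pd.to_datetime(pd.Series(date_strings))

# Element-wise comparison of the parsed timestamps.
same_values = bool((via_dtype == via_index).all()
                   and (via_index == via_series).all())

# Each result should hold datetimes, whatever resolution pandas picked.
all_datetime = all(pd.api.types.is_datetime64_any_dtype(s)
                   for s in (via_dtype, via_index, via_series))

print(same_values, all_datetime)
```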
    

    Benchmarks

    The benchmarks provided by @waitingkuo are wrong. On the larger input below, passing dtype to the Series constructor is noticeably slower (about 730 ms) than the two to_datetime variants, which perform the same (about 430 ms).

    import datetime as dt
    dates = [(dt.datetime(1960, 1, 1)+dt.timedelta(days=i)).date().isoformat() 
             for i in range(20000)] * 100
    
    %timeit pd.Series(dates, dtype='datetime64[ns]')
    730 ms ± 9.06 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    
    %timeit pd.Series(pd.to_datetime(dates))
    426 ms ± 3.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    %timeit pd.to_datetime(pd.Series(dates))
    430 ms ± 5.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
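
    The %timeit lines above only work inside IPython. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The input here is smaller than in the benchmark above so that it finishes quickly, and the labels and the candidates dictionary are illustrative names. Absolute numbers will vary with the machine and the pandas version.

```python
import datetime as dt
import timeit

import pandas as pd

# Same kind of input as the benchmark above, but only 20,000 strings.
dates = [(dt.date(1960, 1, 1) + dt.timedelta(days=i)).isoformat()
         for i in range(20000)]

# The three conversions being compared (labels are illustrative).
candidates = {
    "Series(dtype=...)": lambda: pd.Series(dates, dtype='datetime64[ns]'),
    "Series(to_datetime(list))": lambda: pd.Series(pd.to_datetime(dates)),
    "to_datetime(Series(list))": lambda: pd.to_datetime(pd.Series(dates)),
}

timings = {}
for label, func in candidates.items():
    # Best of 3 repeats with 5 calls each, reported as seconds per call.
    best = min(timeit.repeat(func, number=5, repeat=3)) / 5
    timings[label] = best
    print(f"{label:28s} {best * 1e3:8.2f} ms per call")
```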
    
  • 2020-11-29 08:40
    >>> import pandas as pd
    >>> date_stngs = ('2008-12-20','2008-12-21','2008-12-22','2008-12-23')
    >>> a = pd.Series([pd.to_datetime(date) for date in date_stngs])
    >>> a
    0    2008-12-20 00:00:00
    1    2008-12-21 00:00:00
    2    2008-12-22 00:00:00
    3    2008-12-23 00:00:00
    

    UPDATE

    Use pandas.to_datetime(pd.Series(..)). It is more concise and much faster than the code above.

    >>> pd.to_datetime(pd.Series(date_stngs))
    0   2008-12-20 00:00:00
    1   2008-12-21 00:00:00
    2   2008-12-22 00:00:00
    3   2008-12-23 00:00:00
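
    The question also asks how to get the converted dates into a DataFrame, which the snippets above stop short of. One possible way, sketched here under the assumption that the dates should become either a column or the index, is shown below. The column names "value" and "date" are hypothetical and chosen only for this example.

```python
import pandas as pd

date_stngs = ('2008-12-20', '2008-12-21', '2008-12-22', '2008-12-23')

# A small frame standing in for the asker's data; "value" is a
# hypothetical column name.
df = pd.DataFrame({"value": range(4)})

# Convert all strings in one vectorized call and assign them as a column.
df["date"] = pd.to_datetime(list(date_stngs))

# Alternatively, use the converted dates as the index of the frame.
indexed = df.set_index("date")

print(df.dtypes)
print(indexed.index)
```

    Once the dates are the index, label-based selection such as indexed.loc['2008-12-21'] becomes available.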
    