Dask dataframe: how to convert a column with to_datetime

后悔当初 2020-12-05 05:27

I am trying to convert one column of my dataframe to datetime. Following the discussion here https://github.com/dask/dask/issues/863 I tried the following code:



        
5 answers
  •  情歌与酒
    2020-12-05 06:09

    If the datetime strings are not in ISO format, map_partitions with pd.to_datetime and an explicit format is much faster than astype (8.78 s versus 1 min 8 s in the timings below):

    import dask
    import pandas as pd
    from dask.distributed import Client
    client = Client()

    # Build a demo frame, then add a swapped "time date" (non-ISO) string column.
    ddf = dask.datasets.timeseries()
    ddf = ddf.assign(datetime=ddf.index.astype(object))
    ddf = ddf.assign(datetime_nonISO=ddf['datetime'].astype(str).str.split(' ')
                     .apply(lambda x: x[1] + ' ' + x[0], meta=('datetime_nonISO', 'object')))

    %%timeit
    # Baseline: cast the ISO-formatted column with astype.
    ddf.datetime = ddf.datetime.astype('M8[s]')
    ddf.compute()
    

    11.3 s ± 719 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    ddf = dask.datasets.timeseries()
    ddf = ddf.assign(datetime=ddf.index.astype(object))
    ddf = ddf.assign(datetime_nonISO=ddf['datetime'].astype(str).str.split(' ')
                     .apply(lambda x: x[1] + ' ' + x[0], meta=('datetime_nonISO', 'object')))

    %%timeit
    # Non-ISO column: parse each partition with pandas and an explicit format.
    ddf.datetime_nonISO = ddf.datetime_nonISO.map_partitions(
        pd.to_datetime, format='%H:%M:%S %Y-%m-%d',
        meta=('datetime_nonISO', 'datetime64[ns]'))
    ddf.compute()
    

    8.78 s ± 599 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    ddf = dask.datasets.timeseries()
    ddf = ddf.assign(datetime=ddf.index.astype(object))
    ddf = ddf.assign(datetime_nonISO=ddf['datetime'].astype(str).str.split(' ')
                     .apply(lambda x: x[1] + ' ' + x[0], meta=('datetime_nonISO', 'object')))

    %%timeit
    # Non-ISO column: cast with astype, which has to infer the format.
    ddf.datetime_nonISO = ddf.datetime_nonISO.astype('M8[s]')
    ddf.compute()
    

    1min 8s ± 3.65 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
