Dask DataFrame: how to convert a column to datetime

后悔当初 2020-12-05 05:27

I am trying to convert one column of my dataframe to datetime. Following the discussion here https://github.com/dask/dask/issues/863 I tried the following code:



        
5 answers
  •  野趣味 (original poster)
    2020-12-05 06:07

    Use astype

    You can use the astype method to convert the dtype of a series to a NumPy datetime dtype:

    df.time.astype('M8[us]')
    

    A pandas-style dtype string such as 'datetime64[ns]' should work here as well (edits welcome).

    Use map_partitions and meta

    When using black-box methods like map_partitions, dask.dataframe needs to know the type and names of the output. There are a few ways to do this listed in the docstring for map_partitions.

    You can supply an empty Pandas object with the right dtype and name:

    meta = pd.Series([], name='time', dtype='datetime64[ns]')
    

    Or you can provide a tuple of (name, dtype) for a Series, or a dict of column names to dtypes for a DataFrame:

    meta = ('time', 'datetime64[ns]')
    

    With meta supplied, the call should work:

    df.time.map_partitions(pd.to_datetime, meta=meta)
    

    If you were calling map_partitions on df instead then you would need to provide the dtypes for everything. That isn't the case in your example though.
