Dask dataframe: how to convert a column to datetime

Asked by 后悔当初 on 2020-12-05 05:27 · Backend · unresolved · 5 answers · 1404 views

I am trying to convert one column of my dataframe to datetime. Following the discussion at https://github.com/dask/dask/issues/863, I tried the following code:



        
5 Answers
  • 2020-12-05 06:07

    Use astype

    You can use the astype method to convert the dtype of a series to a NumPy dtype:

    df.time.astype('M8[us]')
    

    There is probably a way to specify a Pandas-style dtype as well (edits welcome).

    Use map_partitions and meta

    When using black-box methods like map_partitions, dask.dataframe needs to know the type and names of the output. There are a few ways to do this listed in the docstring for map_partitions.

    You can supply an empty Pandas object with the right dtype and name

    meta = pd.Series([], name='time', dtype='datetime64[ns]')
    

    Or you can provide a tuple of (name, dtype) for a Series or a dict for a DataFrame

    meta = ('time', 'datetime64[ns]')
    

    Then everything should be fine

    df.time.map_partitions(pd.to_datetime, meta=meta)
    

    If you were calling map_partitions on df instead then you would need to provide the dtypes for everything. That isn't the case in your example though.

  • 2020-12-05 06:09

    If the datetime is in a non-ISO format, then map_partitions yields better results:

    import dask
    import pandas as pd
    from dask.distributed import Client
    client = Client()
    
    ddf = dask.datasets.timeseries()
    ddf = ddf.assign(datetime=ddf.index.astype(object))
    ddf = ddf.assign(datetime_nonISO=ddf['datetime'].astype(str).str.split(' ')
                     .apply(lambda x: x[1] + ' ' + x[0], meta=('datetime_nonISO', 'object')))
    
    %%timeit
    ddf['datetime'] = ddf['datetime'].astype('M8[s]')
    ddf.compute()
    

    11.3 s ± 719 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    ddf = dask.datasets.timeseries()
    ddf = ddf.assign(datetime=ddf.index.astype(object))
    ddf = ddf.assign(datetime_nonISO=ddf['datetime'].astype(str).str.split(' ')
                     .apply(lambda x: x[1] + ' ' + x[0], meta=('datetime_nonISO', 'object')))
    
    
    %%timeit
    ddf['datetime_nonISO'] = ddf['datetime_nonISO'].map_partitions(
        pd.to_datetime, format='%H:%M:%S %Y-%m-%d', meta=('datetime_nonISO', 'datetime64[s]'))
    ddf.compute()
    

    8.78 s ± 599 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    ddf = dask.datasets.timeseries()
    ddf = ddf.assign(datetime=ddf.index.astype(object))
    ddf = ddf.assign(datetime_nonISO=ddf['datetime'].astype(str).str.split(' ')
                     .apply(lambda x: x[1] + ' ' + x[0], meta=('datetime_nonISO', 'object')))
    
    %%timeit
    ddf['datetime_nonISO'] = ddf['datetime_nonISO'].astype('M8[s]')
    ddf.compute()
    

    1min 8s ± 3.65 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

  • 2020-12-05 06:15

    Dask also comes with dd.to_datetime, so this should work as well:

    df['time'] = dd.to_datetime(df.time, unit='ns')

    The values that unit accepts are the same as for pd.to_datetime in pandas.

  • 2020-12-05 06:16

    This worked for me:

    ddf["Date"] = ddf["Date"].map_partitions(pd.to_datetime, format='%d/%m/%Y', meta=('Date', 'datetime64[ns]'))

  • 2020-12-05 06:17

    I'm not sure if this is the right approach, but mapping the column worked for me:

    df['time'] = df['time'].map(lambda x: pd.to_datetime(x, errors='coerce'))
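    One useful property of this approach is the errors='coerce' flag, which turns unparseable values into NaT instead of raising. A pandas-only sketch with illustrative data:

```python
import pandas as pd

# One valid timestamp and one garbage value.
s = pd.Series(["2020-12-05 05:27", "not a date"])

# errors='coerce' maps unparseable entries to NaT rather than raising.
out = s.map(lambda x: pd.to_datetime(x, errors="coerce"))
print(out.isna().tolist())  # [False, True]
```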
    