dask dataframe apply meta

陌清茗 2021-02-19 01:10

I want to do a frequency count on a single column of a dask dataframe. The code works, but I get a warning complaining that meta is

1 Answer
  •  猫巷女王i
    2021-02-19 01:47

    meta is the prescription of the names/types of the output from the computation. This is required because apply() is flexible enough that it can produce just about anything from a dataframe. As you can see, if you don't provide a meta, then dask actually computes part of the data, to see what the types should be - which is fine, but you should know it is happening. You can avoid this pre-computation (which can be expensive) and be more explicit when you know what the output should look like, by providing a zero-row version of the output (dataframe or series), or just the types.

    The output of your computation is actually a series, so the following is the simplest that works:

    # ('int') is just the string 'int', so only the dtype is given here
    (dask_df.groupby('Column B')
         .apply(len, meta='int')).compute()
    

    but more accurate would be:

    import pandas as pd

    # zero-row Series giving both the dtype and the name of the output
    (dask_df.groupby('Column B')
         .apply(len, meta=pd.Series(dtype='int', name='Column B')))
    
