I want to select rows from a dask dataframe based on a list of indices. How can I do that?
Example: Let's say I have the following dask dataframe.
Using dask version '1.2.0' results in an error due to the mixed index type.
In any case, you can use loc to select the rows by label.
import pandas as pd
import dask.dataframe as dd
# generate example dataframe (all index labels are strings here)
pdf = pd.DataFrame(dict(A=[1, 2, 3, 4, 5], B=[6, 7, 8, 9, 0]), index=['i1', 'i2', 'i3', '4', '5'])
ddf = dd.from_pandas(pdf, npartitions=2)
# list of indices I want to select
l = ['i1', '4', '5']
# generate new dask dataframe containing only the specified indices
# earlier attempt with map_partitions, kept for reference:
# ddf_selected = ddf.map_partitions(lambda x: x[x.index.isin(l)], meta = ddf.dtypes)
ddf_selected = ddf.loc[l]
ddf_selected.head()