Why do pandas and dask perform better when importing from CSV compared to HDF5?
I am working with a system that currently operates with large (>5GB) .csv files. To increase performance, I am testing (A) different methods to create dataframes from disk (pandas vs. dask) as well as (B) different ways to store results to disk (.csv vs. HDF5 files).

In order to benchmark performance, I did the following:

    def dask_read_from_hdf():
        results_dd_hdf = dd.read_hdf('store.h5', key='period1', columns = ['Security'])
        analyzed_stocks_dd_hdf = results_dd_hdf.Security.unique()
        hdf.close()

    def pandas_read_from_hdf():
        results_pd_hdf = pd.read_hdf('store.h5', key='period1', columns = [