dask dataframe read parquet schema difference
I do the following:

```python
import dask.dataframe as dd
from dask.distributed import Client

client = Client()

raw_data_df = dd.read_csv('dataset/nyctaxi/nyctaxi/*.csv', assume_missing=True,
                          parse_dates=['tpep_pickup_datetime', 'tpep_dropoff_datetime'])
```

The dataset comes from a presentation Matthew Rocklin gave, where it was used as a Dask DataFrame demo. I then try to write it to Parquet using pyarrow:

```python
raw_data_df.to_parquet(path='dataset/parquet/2015.parquet/')  # only pyarrow is installed
```

Trying to read it back:

```python
raw_data_df = dd.read_parquet(path='dataset/parquet/2015.parquet/')
```

I get the following
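For reference, both `to_parquet` and `read_parquet` accept an `engine` argument, so the pyarrow backend can be pinned explicitly on both the write and the read. A minimal sketch (paths assumed to match the ones above; this is not output I have shown, just the variant I am describing):

```python
import dask.dataframe as dd

# Write and read with the same engine pinned explicitly (sketch;
# 'dataset/parquet/2015.parquet/' mirrors the path used above).
raw_data_df.to_parquet('dataset/parquet/2015.parquet/', engine='pyarrow')
roundtrip_df = dd.read_parquet('dataset/parquet/2015.parquet/', engine='pyarrow')
```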