Store multi-index pandas dataframe with hdf5 table format

∥☆過路亽.° submitted on 2019-12-11 01:52:39

Question


I ran into this issue after adding a MultiIndex to my pandas dataframe. I use the pandas HDFStore with the option format='table', which I prefer because the saved data frame is easier to understand and load when not using pandas. (For details, see this SO answer: Save pandas DataFrame using h5py for interoperabilty with other hdf5 readers.)

The problem arose because I set the MultiIndex using drop=False when calling set_index, which keeps the index columns as ordinary dataframe columns as well. That was fine until I put the dataframe into the store using format='table'. With format='fixed' everything worked, but format='table' raised an error about duplicate column names. I worked around the error by dropping the redundant columns before putting and restoring them after getting.
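The overlap can be reproduced without touching HDF5 at all. The following is a minimal sketch with made-up data (the column names site, day and value are assumptions for illustration only). It shows which names end up both in the index and in the columns, and that moving the index levels back into columns with reset_index() then fails. As far as I can tell, format='table' has to do something similar in order to write the index levels as table columns, which would explain the duplicate-name error.

```python
import pandas as pd

# Hypothetical example data; the column names are made up for illustration.
df = pd.DataFrame({
    "site": ["a", "a", "b", "b"],
    "day": [1, 2, 1, 2],
    "value": [0.1, 0.2, 0.3, 0.4],
})

# drop=False keeps 'site' and 'day' as ordinary columns as well as index levels.
df = df.set_index(["site", "day"], drop=False)

# Names that appear both as index levels and as columns.
overlap = [name for name in df.index.names if name in df.columns]
print(overlap)

# Turning the index levels back into columns clashes with the existing columns.
try:
    df.reset_index()
    reset_failed = False
except ValueError as err:
    reset_failed = True
    print("reset_index failed:", err)
```

Dropping the names listed in overlap from the columns removes the clash, which is what the write function in this post does.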

Here is the write/read pair of functions that I now use:

import pandas as pd

def write_df_without_index_columns(store, name, df):
    if isinstance(df.index, pd.MultiIndex):
        # find columns that duplicate MultiIndex level names
        redundant_columns = [c for c in df.columns if c in df.index.names]
        if redundant_columns:
            # drop() returns a new frame, so the caller's df is left untouched
            df = df.drop(columns=redundant_columns)

    # data_columns=True stores every column as a queryable data column
    store.put(name, df,
              format='table',
              data_columns=True)

def read_df_add_index_columns(store, name, default_value=None):
    # default_value is not used here; it is kept so existing callers still work
    df = store.get(name)
    if isinstance(df.index, pd.MultiIndex):
        # remember the MultiIndex level names
        index_columns = list(df.index.names)
        # move the MultiIndex levels into ordinary columns
        df = df.reset_index(drop=False)
        # rebuild the MultiIndex while keeping the columns as well
        df = df.set_index(index_columns, drop=False)
    return df

My question: is there a better way to do this? I expect to have a data frame with millions of rows, so I do not want this to be too inefficient.
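One variation that may be worth benchmarking is to copy the level values straight into columns with get_level_values, so the MultiIndex does not have to be taken apart and rebuilt. This is only a sketch and has not been timed on a large frame. The helper name add_index_columns and the sample data are hypothetical, and the small frame below merely stands in for what store.get would return.

```python
import pandas as pd

def add_index_columns(df):
    """Hypothetical helper: copy each MultiIndex level into a column of the same name."""
    if isinstance(df.index, pd.MultiIndex):
        for name in df.index.names:
            # skip unnamed levels and names that already exist as columns
            if name is not None and name not in df.columns:
                # an Index is assigned positionally, so no alignment takes place
                df[name] = df.index.get_level_values(name)
    return df

# Stand-in for a frame as it comes back from the store: levels only in the index.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["site", "day"]
)
loaded = pd.DataFrame({"value": [0.1, 0.2, 0.3]}, index=index)

# Work on a copy so the original frame stays unchanged in this example.
restored = add_index_columns(loaded.copy())
print(restored)
```

Unlike the reset_index/set_index version, this appends the index columns after the existing columns instead of placing them first, so the column order differs.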

Source: https://stackoverflow.com/questions/44121688/store-multi-index-pandas-dataframe-with-hdf5-table-format
