Appending Column to Frame of HDF File in Pandas

遥遥无期 2020-12-15 10:42

I am working with a large dataset in CSV format. I am trying to process the data column-by-column, then append the data to a frame in an HDF file. All of this is done using

1 Answer
  • 2020-12-15 10:59

    The complete docs are here, and some cookbook strategies are here.

    PyTables is row-oriented, so you can only append rows, not columns. Read the CSV chunk by chunk, then append each chunk to the store as you go, something like this:

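    import pandas as pd

    # mode='w' creates the file, overwriting any existing one
    with pd.HDFStore('file.h5', mode='w') as store:
        for chunk in pd.read_csv('file.csv', chunksize=50000):
            # each chunk is a DataFrame; append its rows to the 'df' table
            store.append('df', chunk)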
    

    You must be a tad careful, as the dtypes of the resulting frame can differ between chunks when the file is read chunk by chunk. For example, an integer-like column may have no missing values until, say, the 2nd chunk: the first chunk would then have that column as int64, while the second would have it as float64 (NaN cannot be stored in an int64 column), and appending that second chunk will fail because it no longer matches the table created from the first. You may need to force the dtypes with the dtype keyword of read_csv; see here.
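    A small sketch of the dtype drift described above, assuming a made-up two-column CSV (the column names a and b and the chunk size of 2 are purely illustrative). It only uses read_csv, so no HDF file is involved; it shows the inferred dtype of column a per chunk, with and without a dtype hint.

```python
import io

import pandas as pd

# Hypothetical sample data: column "a" is integer-like, but its first
# missing value only appears in the second chunk (third data row).
CSV_TEXT = "a,b\n1,x\n2,y\n,z\n4,w\n"


def chunk_dtypes(dtype=None):
    """Return the dtype of column 'a' for each chunk read with chunksize=2."""
    reader = pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2, dtype=dtype)
    return [str(chunk["a"].dtype) for chunk in reader]


# Without a hint, pandas infers each chunk's dtypes independently.
inferred = chunk_dtypes()
# Forcing float64 up front gives every chunk the same dtype for "a".
forced = chunk_dtypes(dtype={"a": "float64"})

print(inferred)  # ['int64', 'float64']
print(forced)    # ['float64', 'float64']
```

    float64 is used here simply because it can hold NaN; passing the same dtype mapping to read_csv inside the HDFStore loop keeps every appended chunk consistent with the first one.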

    Here is a similar question as well.
