I am working with a large dataset in CSV format. I am trying to process the data column-by-column, then append the data to a frame in an HDF file. All of this is done using pandas.
Complete docs are here, and some cookbook strategies are here.
PyTables is row-oriented, so you can only append rows. Read the CSV chunk-by-chunk, then append each chunk to the store as you go, something like this:
import pandas as pd

# open the store in write mode (overwrites any existing file)
store = pd.HDFStore('file.h5', mode='w')
# read the CSV in chunks and append each chunk to the same table
for chunk in pd.read_csv('file.csv', chunksize=50000):
    store.append('df', chunk)
store.close()
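Once the store is written, the whole frame can be read back in one call; a minimal sketch, assuming the key 'df' from above:

df = pd.read_hdf('file.h5', 'df')  # loads the appended table as a single DataFrame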
You must be a tad careful, as it is possible for the dtypes of the resultant frame to differ when read chunk-by-chunk: e.g. you have an integer-like column that doesn't have missing values until, say, the 2nd chunk. The first chunk would have that column as int64, while the second as float64. You may need to force dtypes with the dtype keyword to read_csv, see here.
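For example, a minimal sketch of pinning the dtype up front so every chunk matches the table already in the store (the column name 'a' is hypothetical):

import pandas as pd

store = pd.HDFStore('file.h5', mode='w')
# force the sometimes-missing integer column to float64 in every chunk,
# so appends don't fail on a dtype mismatch with earlier chunks
for chunk in pd.read_csv('file.csv', chunksize=50000,
                         dtype={'a': 'float64'}):
    store.append('df', chunk)
store.close()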
Here is a similar question as well.