Pandas: in-memory sorting of HDF5 files
I have the following problem: I have a set of several HDF5 files containing similar data frames, which I want to sort globally based on multiple columns. My input is the list of file names and an ordered list of the columns to sort by. The output should be a single HDF5 file containing all the sorted data. Each file can contain millions of rows. I can afford to load a single file into memory, but not the entire dataset. Naively, I would first copy all the data into a single HDF5 file (which is not difficult) and then find a way to do in-memory sorting of this huge file. Is there a quick way to