HDF is indeed a very good choice; you can also use npy/npz, with some caveats:
Here is a benchmark using a data frame of 25k rows and 1000 columns filled with random floats:
Saving to HDF took 0.49s
Saving to npy took 0.40s
Loading from HDF took 0.10s
Loading from npy took 0.061s
If you don't compress the data, npy is about 20% faster to write and about 40% faster to read.
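For reference, a minimal sketch of what the compressed npz variant would look like (np.savez_compressed is the zlib-compressed counterpart of np.savez; expect slower writes and smaller files, though random floats compress poorly):

```python
import numpy as np

values = np.random.rand(25000, 100)

# Uncompressed: fast to write, larger on disk.
np.savez('frame_raw.npz', values=values)

# zlib-compressed: slower to write and read, smaller on disk.
np.savez_compressed('frame_compressed.npz', values=values)

# Both load the same way; arrays are read lazily by key.
loaded = np.load('frame_compressed.npz')['values']
```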
Code used to generate the output above:
#!/usr/bin/python3
import pandas as pd
import numpy as np
import time
start = time.time()
f = pd.DataFrame()
for i in range(1000):
    f['col_{}'.format(i)] = np.random.rand(25000)
print('Generating data took {}s'.format(time.time() - start))
start = time.time()
f.to_hdf('frame.hdf', key='main', format='fixed')
print('Saving to HDF took {}s'.format(time.time() - start))
start = time.time()
np.savez('frame.npz', f.index, f.values)
print('Saving to npy took {}s'.format(time.time() - start))
start = time.time()
pd.read_hdf('frame.hdf')
print('Loading from HDF took {}s'.format(time.time() - start))
start = time.time()
# np.load on an .npz returns a lazy NpzFile; positional arrays are stored
# under the keys 'arr_0', 'arr_1', ... (unpacking it directly yields key strings)
npz = np.load('frame.npz')
pd.DataFrame(npz['arr_1'], index=npz['arr_0'])
print('Loading from npy took {}s'.format(time.time() - start))
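One caveat with the npz approach above: it only stores the index and the raw values, so the column labels are lost, and it assumes a single homogeneous dtype. A minimal sketch of a round-trip that also keeps the columns, assuming all-float data (allow_pickle is needed because string column labels are stored as an object array):

```python
import numpy as np
import pandas as pd

f = pd.DataFrame(np.random.rand(100, 5),
                 columns=['col_{}'.format(i) for i in range(5)])

# Save index, columns and values under named keys.
np.savez('frame.npz', index=f.index.values,
         columns=f.columns.values, values=f.values)

# Rebuild the frame; the object-dtype column labels require allow_pickle.
npz = np.load('frame.npz', allow_pickle=True)
g = pd.DataFrame(npz['values'], index=npz['index'], columns=npz['columns'])
```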