How can I extract data from an .h5 file and save it as .txt or .csv properly?

扶醉桌前 submitted on 2021-01-29 19:35:46

Question


After a lot of searching, I couldn't find a simple way to extract data from an .h5 file and load it into a DataFrame with NumPy or pandas in order to save it to a .txt or .csv file.

import h5py
import numpy as np
import pandas as pd

filename = r'D:\data.h5'  # raw string so the backslash is not treated as an escape
f = h5py.File(filename, 'r')

# List all groups
print("Keys: %s" % f.keys())
a_group_key = list(f.keys())[0]

# Get the data
data = list(f[a_group_key])
pd.DataFrame(data).to_csv("hi.csv")

Output:

Keys: <KeysViewHDF5 ['dd48']>

When I print data, I see the following:

print(data)
['axis0',
 'axis1',
 'block0_items',
 'block0_values',
 'block1_items',
 'block1_values']

I would appreciate it if someone could explain what these are and how I can extract the complete data and save it to a .csv file. There doesn't seem to be a routine way to do this, and it is still rather challenging. So far I have only been able to see part of the data via:

import numpy as np 
dfm = np.fromfile(r'D:\data.h5', dtype=float)
print (dfm.shape)
print(dfm[5:])

dfm=pd.to_csv('train.csv')
#dfm.to_csv('hi.csv', sep=',', header=None, index=None)

I expect to extract the time_stamps and measurements stored in the .h5 file.


Answer 1:


It looks like that data was written by Pandas, so use pd.read_hdf() to read it.




Answer 2:


h5py will access HDF5 datasets as numpy arrays. Your call to get the keys returns a LIST of the dataset names. Now that you have them, it should be pretty simple to access them as a numpy array and write them. You need to get the dtype to know what is in each column to format correctly.

Updated 5/22/2019 to reflect content of data.h5 posted at link in comment. Default format in np.savetxt() is '%.18e'. Very simple (crude) logic provided to modify format based on dtype for these datasets. This requires more robust dtype checking and formatting for general use. Also, you will need to add logic to decode unicode strings.

import h5py
import numpy as np

filename = r'D:\data.h5'  # raw string so the backslash is not treated as an escape

with h5py.File(filename, 'r') as h5f:
    # Get a list of the dataset names in group 'dd48'
    a_dset_keys = list(h5f['dd48'].keys())

    # Write each dataset to its own CSV file
    for dset in a_dset_keys:
        ds_data = h5f['dd48'][dset]
        print('dataset=', dset)
        print(ds_data.dtype)
        # Pick a savetxt format from the dtype (crude; extend for general use)
        if ds_data.dtype == 'float64':
            csvfmt = '%.18e'
        elif ds_data.dtype == 'int64':
            csvfmt = '%.10d'  # zero-padded to at least 10 digits
        else:
            csvfmt = '%s'
        np.savetxt('output_' + dset + '.csv', ds_data, fmt=csvfmt, delimiter=',')
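The decoding step mentioned above matters because '%s' applied to a bytes value writes text such as b'time' into the CSV. A rough sketch of one way to handle it, using only NumPy; the helper names decode_if_bytes and savetxt_format are made up for illustration, and UTF-8 is an assumption about how the strings were stored:

```python
import numpy as np


def decode_if_bytes(arr, encoding="utf-8"):
    """Return a text array if arr holds byte strings, else arr unchanged."""
    arr = np.asarray(arr)
    if arr.dtype.kind == "S":
        # Fixed-width byte strings
        return np.char.decode(arr, encoding)
    if arr.dtype.kind == "O":
        # Object arrays, e.g. variable-length strings read as bytes
        flat = [x.decode(encoding) if isinstance(x, bytes) else x
                for x in arr.ravel()]
        out = np.empty(len(flat), dtype=object)
        out[:] = flat
        return out.reshape(arr.shape)
    return arr


def savetxt_format(arr):
    """Pick an np.savetxt format string from the dtype kind."""
    kind = np.asarray(arr).dtype.kind
    if kind == "f":
        return "%.18e"  # default float format of np.savetxt
    if kind in ("i", "u"):
        return "%d"
    return "%s"


if __name__ == "__main__":
    labels = decode_if_bytes(np.array([b"time_stamps", b"measurements"]))
    np.savetxt("labels.csv", labels, fmt=savetxt_format(labels), delimiter=",")
```

Inside the loop above, this would amount to reading the dataset into memory (for example with ds_data[()]), passing it through decode_if_bytes, and then calling np.savetxt with the format returned by savetxt_format.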


Source: https://stackoverflow.com/questions/56238200/how-can-extract-data-from-h5-file-and-save-it-in-txt-or-csv-properly
