hdf5

How to close an HDF5 file

Submitted by 最后都变了- on 2020-01-05 02:50:34
Question: I am using the rhdf5 library to create HDF5 files:

h5createFile(myHDF5FileName)
h5createDataset(myHDF5FileName, "myData", storage.mode="double", level=9, dims=length(myData), chunk=10000)
h5write(myData, myHDF5FileName, "myData")

This works fine, except that when I try to delete the physical file, Windows 7 tells me the file is still open in RStudio, i.e. the file handle seems to be open in my RStudio environment. I checked the rhdf5 documents - http://www.bioconductor.org/packages/release/bioc
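In rhdf5 the usual fix is to release dangling handles, e.g. with h5closeAll() (or the lower-level H5close()). For comparison only, a minimal sketch of the same close-your-handles discipline in Python with h5py, using a placeholder file name:

import numpy as np
import h5py

# the context manager guarantees the OS-level handle is released,
# so nothing keeps the file locked after the block ends
with h5py.File('myData.h5', 'w') as f:
    f.create_dataset('myData', data=np.arange(10.0))
# here the handle is closed and the file can be deleted or moved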

C/C++ HDF5 Read string attribute

Submitted by 时光总嘲笑我的痴心妄想 on 2020-01-04 19:04:48
Question: A colleague of mine used LabVIEW to write an ASCII string as an attribute in an HDF5 file. I can see that the attribute exists, and I can read it, but I can't print it. The attribute, as shown in HDF Viewer, is:

Date = 2015\07\09

So "Date" is its name. I'm trying to read the attribute with this code:

hsize_t sz = H5Aget_storage_size(dateAttribHandler);
std::cout << sz << std::endl;       // prints 16
hid_t atype = H5Aget_type(dateAttribHandler);
std::cout << atype << std::endl;    // prints 50331867
std::cout << H5Aread
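The excerpt cuts off mid-call. As a point of comparison (not the original C++ answer), here is how the same attribute could be read from Python with h5py, which handles the string conversion automatically; the file name and the attribute's location on the root group are assumptions:

import h5py

with h5py.File('data.h5', 'r') as f:
    date = f.attrs['Date']
    # fixed-length ASCII attributes arrive as bytes; decode before printing
    if isinstance(date, bytes):
        date = date.decode('ascii')
    print(date)  # e.g. 2015\07\09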

Loading large datasets with dask

Submitted by 倖福魔咒の on 2020-01-04 13:44:57
Question: I am in an HPC environment with clusters, tightly coupled interconnects, and backing Lustre filesystems. We have been exploring how to leverage Dask not only for computation, but also as a distributed cache to speed up our workflows. Our proprietary data format is n-dimensional and regular, and we have coded a lazy reader to pass into the from_array/from_delayed methods. We have had some issues with loading and persisting larger-than-memory datasets across a Dask cluster.
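A minimal sketch of the pattern the question describes; the reader class, path, shape, and chunk sizes are all stand-ins:

import numpy as np
import dask.array as da

class LazyReader:
    # stand-in for the proprietary reader: from_array only needs
    # .shape, .dtype, and numpy-style __getitem__ slicing
    def __init__(self, path, shape, dtype=np.float64):
        self.path = path
        self.shape = shape
        self.dtype = np.dtype(dtype)

    def __getitem__(self, idx):
        # a real reader would seek and read only the requested block
        return np.zeros(self.shape, self.dtype)[idx]

reader = LazyReader('/lustre/path/data.bin', shape=(64, 512, 512))
x = da.from_array(reader, chunks=(16, 512, 512))
# on a distributed client, persist() keeps the chunks in worker memory,
# so later computations hit the cluster-wide cache instead of Lustre
x = x.persist()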

How to install h5py (+numpy+libhdf5+…) as non-root on a Debian Linux system

Submitted by 拟墨画扇 on 2020-01-04 07:35:07
Question: I need to install the h5py Python module, and all of its absent dependencies, on a Debian Linux system. This task is complicated by the following: I don't have any superuser privileges on this system (no sudo, no root password, etc.), and the rest of the code I am using requires version 2.7 of Python, which is not the default version installed on this system (although Python 2.7 is available under /opt/python-2.7.1). The ideal solution would be one that would enable me to use the dependency info in
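A sketch of the usual non-root route, assuming the /opt interpreter can bootstrap pip (via get-pip.py if needed); --user installs into ~/.local, and pip pulls in numpy automatically:

# install pip for the non-default interpreter, then h5py into ~/.local
/opt/python-2.7.1/bin/python get-pip.py --user
/opt/python-2.7.1/bin/python -m pip install --user numpy h5py

# if libhdf5 is missing too, build it into a prefix you own and point
# the h5py build at it with HDF5_DIR
./configure --prefix=$HOME/opt/hdf5 && make && make install
HDF5_DIR=$HOME/opt/hdf5 /opt/python-2.7.1/bin/python -m pip install --user h5py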

Fast reading of specified columns in df using pandas.to_hdf

Submitted by 送分小仙女□ on 2020-01-04 02:35:08
Question: I have a 2 GB dataframe that is written once and read many times. I would like to use the df in pandas, so I was using pd.read_hdf and df.to_hdf in the fixed format, which works pretty well for reading and writing. However, the df is growing, with more columns being added, so I would like to use the table format instead, so I can select the columns I need when reading the data. I thought this would give me a speed advantage, but from testing this doesn't seem to be the case. This example:
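A minimal sketch of the two formats side by side (the file names and the toy frame are mine); note that the table format stores rows contiguously, so reading a column subset can still mean scanning whole rows, which is consistent with the question's observation:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100000, 4), columns=list('abcd'))

# fixed format: fastest, but always reads the whole frame back
df.to_hdf('store_fixed.h5', 'df', format='fixed')

# table format: slower, but supports selecting columns on read
df.to_hdf('store_table.h5', 'df', format='table')
subset = pd.read_hdf('store_table.h5', 'df', columns=['a', 'b'])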

Convert .h5 file to .jpg with Python

Submitted by 吃可爱长大的小学妹 on 2020-01-04 01:59:26
Question: I currently have a .h5 file containing grayscale imagery. I need to convert it to a .jpg. Does anybody have any experience with this? Note: I could possibly convert the h5 file to a numpy array and then use an external library like pypng to convert that to a png. But I am wondering whether there is a more efficient way to convert to an image, preferably a .jpg.

Answer 1: Key ingredients: h5py to read the h5 file. Determine the format of your image and use PIL. Let us suppose it's RGB format (https
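Following that answer's recipe for the grayscale case described in the question (the file and dataset names are placeholders):

import h5py
import numpy as np
from PIL import Image

with h5py.File('image.h5', 'r') as f:
    arr = f['image'][()].astype(np.float64)  # read the dataset into a numpy array

# scale to 0-255 and use PIL's 8-bit grayscale mode 'L'
arr = (255 * (arr - arr.min()) / (arr.max() - arr.min() + 1e-9)).astype(np.uint8)
Image.fromarray(arr, mode='L').save('image.jpg')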

Reading and writing numpy arrays to files

Submitted by 余生长醉 on 2020-01-03 02:25:39
Contents: reading and writing txt or csv files with numpy; reading and writing npy or npz files with numpy (npy files, npz files); reading and writing hdf5 files with h5py (simple reads, assignment via slicing); summary; references.

There are several file types to choose from when saving a numpy array to disk, each with its own read/write methods. Below I cover the three kinds of files used for numpy I/O: txt or csv files; npy or npz files; hdf5 files.

Reading and writing txt or csv files with numpy:

import numpy as np

a = np.array(range(20)).reshape((4, 5))
print(a)

# a .txt suffix works the same way
filename = 'data/a.csv'

# write
np.savetxt(filename, a, fmt='%d', delimiter=',')

# read
b = np.loadtxt(filename, dtype=np.int32, delimiter=',')
print(b)

Drawbacks: only one- and two-dimensional arrays can be saved; a higher-dimensional array a must first be flattened with a.reshape((a.shape[0], -1)) before it can be saved this way. Appending is not supported: every np.savetxt() call overwrites the previous contents.
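The excerpt ends before the h5py section promised in the table of contents. A minimal sketch of that pattern, with a file name of my own choosing:

import h5py
import numpy as np

a = np.arange(20).reshape(4, 5)

# write: hdf5 has no dimensionality limit and files can be updated in place
with h5py.File('data/a.h5', 'w') as f:
    f.create_dataset('a', data=a)

# simple read
with h5py.File('data/a.h5', 'r') as f:
    b = f['a'][:]
    print(b)

# assignment via slicing: modify part of a dataset without rewriting it
with h5py.File('data/a.h5', 'r+') as f:
    f['a'][0, :] = -1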

pandas read_hdf with 'where' condition limitation?

Submitted by 大憨熊 on 2020-01-02 16:18:10
Question: I need to query an HDF5 file with a where clause with 3 conditions, one of which is a list with a length of 30:

myList = list(xrange(30))
h5DF = pd.read_hdf(h5Filename, 'df', where='index=myList & date=dateString & time=timeString')

The query above gives me "ValueError: too many inputs", and the error is reproducible. If I reduce the length of the list to 29 (three conditions):

myList = list(xrange(29))
h5DF = pd.read_hdf(h5Filename, 'df', where='index=myList & date=dateString & time
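One workaround (a sketch, reusing the question's variables): keep the two scalar conditions in the on-disk query, then apply the long list filter in memory, which sidesteps the expression engine's input limit:

import pandas as pd

h5DF = pd.read_hdf(h5Filename, 'df', where='date=dateString & time=timeString')
h5DF = h5DF[h5DF.index.isin(myList)]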

How can one get the name of an HDF5 DataSet through the C or C++ API?

Submitted by 故事扮演 on 2020-01-02 07:49:07
Question: I'm trying to read the name of an HDF5 DataSet using the C++ API. For H5::Attribute objects there is a getName() method; however, I don't see a similar getName() method for H5::DataSet objects. Ideally I want to do this:

void Dump(H5::DataSet& ds)
{
    cout << "Dataset " << ds.getName() << endl;
    // continue to print dataset values
}

I know h5dump can do it, but briefly looking at the code, it only knows the name by walking the tree using H5Giterate, that is, only the parent knows the name of the
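In the plain C API this is H5Iget_name(); newer releases of the C++ wrapper expose it as getObjName(). For comparison, the Python/h5py equivalent (file name and path are placeholders):

import h5py

with h5py.File('data.h5', 'r') as f:
    ds = f['/group/myData']
    print(ds.name)  # full path, e.g. /group/myData; backed by H5Iget_name in C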

How to write a Pandas DataFrame into an HDF5 dataset

Submitted by 社会主义新天地 on 2020-01-02 04:24:45
Question: I'm trying to write data from a Pandas dataframe into a nested HDF5 file, with multiple groups and datasets within each group. I'd like to keep it as a single file which will grow on a daily basis. I've had a go with the following code, which shows the structure of what I'd like to achieve:

import h5py
import numpy as np
import pandas as pd

file = h5py.File('database.h5', 'w')
d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], index=[
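The snippet is cut off; a sketch of one way to finish it (the per-day group layout and names are my own), writing each DataFrame column as its own dataset under a nested group:

import h5py
import numpy as np
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# open in append mode so the same file can grow day by day
with h5py.File('database.h5', 'a') as f:
    grp = f.require_group('2020/01/02')  # creates the nested groups 2020/01/02
    # h5py stores arrays, not DataFrames: write the index and columns separately
    grp.create_dataset('index', data=np.array(df.index, dtype='S'))
    for col in df.columns:
        grp.create_dataset(col, data=df[col].to_numpy())

Note that pandas' own HDFStore (df.to_hdf('database.h5', key='some/nested/key')) handles this serialization automatically, at the cost of a pandas-specific file layout.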