pytables

How should python dictionaries be stored in pytables?

こ雲淡風輕ζ submitted on 2021-02-18 10:40:53
Question: PyTables doesn't natively support Python dictionaries. The way I've approached it is to make a data structure of the form:

    tables_dict = {
        'key': tables.StringCol(itemsize=40),
        'value': tables.Int32Col(),
    }

(note that I ensure the keys are <40 characters long) and then create a table using this structure:

    file_handle.createTable('/', 'dictionary', tables_dict)

and then populate it with:

    file_handle.dictionary.append(dictionary.items())

and retrieve data with:

    dict(file_handle
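A runnable sketch of the round trip described above, using the modern snake_case API (create_table is the current name for createTable); the file path and sample dictionary are mine:

```python
import os
import tempfile

import tables

# Sample mapping to store; the question's scheme is str keys -> int values.
dictionary = {'alpha': 1, 'beta': 2}

tables_dict = {
    'key': tables.StringCol(itemsize=40),  # keys must fit in 40 bytes
    'value': tables.Int32Col(),
}

path = os.path.join(tempfile.mkdtemp(), 'dict_store.h5')
with tables.open_file(path, mode='w') as file_handle:
    table = file_handle.create_table('/', 'dictionary', tables_dict)
    # append() takes a sequence of rows; (key, value) tuples match the
    # alphabetical column order of a dict-based description
    table.append(sorted(dictionary.items()))

with tables.open_file(path, mode='r') as file_handle:
    # StringCol values come back as bytes, so decode when rebuilding the dict
    restored = {row['key'].decode(): int(row['value'])
                for row in file_handle.root.dictionary}
```

One caveat of this scheme: with a dict-based description and no `pos` arguments, PyTables orders columns alphabetically, which happens to match the `(key, value)` tuple order here.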

Pytables duplicates 2.5 giga rows

别说谁变了你拦得住时间么 submitted on 2021-02-11 13:41:00
Question: I currently have a .h5 file with a table consisting of three columns: a text column of 64 chars, a UInt32 column relating to the source of the text, and a UInt32 column which is the xxhash of the text. The table consists of ~2.5e9 rows. I am trying to find and count the duplicates of each text entry in the table, essentially merging them into one entry while counting the instances. I have tried doing so by indexing on the hash column and then looping through table.itersorted(hash),

Python/PyTables: Is it possible to have different data types for different columns of an array?

耗尽温柔 submitted on 2021-02-11 12:33:29
Question: I create an expandable earray of Nx4 columns. Some columns require the float64 datatype; the others can be managed with int32. Is it possible to vary the data types among the columns? Right now I just use one (float64, below) for all, but it takes huge disk space for (>10 GB) files. For example, how can I ensure that column 1-2 elements are int32 and 3-4 elements are float64?

    import tables
    f1 = tables.open_file("table.h5", "w")
    a = f1.create_earray(f1.root, "dataset_1", atom=tables.Float32Atom(),
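An EArray has a single atom for every cell, so per-column dtypes need a Table instead, which is also row-appendable. A sketch with illustrative column names:

```python
import os
import tempfile

import tables

# Each column gets its own dtype; pos fixes the column order.
desc = {
    'c1': tables.Int32Col(pos=0),
    'c2': tables.Int32Col(pos=1),
    'c3': tables.Float64Col(pos=2),
    'c4': tables.Float64Col(pos=3),
}

path = os.path.join(tempfile.mkdtemp(), 'table.h5')
with tables.open_file(path, 'w') as f1:
    t = f1.create_table(f1.root, 'dataset_1', desc)
    # rows can be appended incrementally, like an earray grows
    t.append([(1, 2, 0.5, 1.5), (3, 4, 2.5, 3.5)])
    dtypes = {name: t.coldtypes[name].name for name in t.colnames}
```

With two int32 and two float64 columns, each row costs 24 bytes instead of 32 for four float64s, a 25% saving before compression.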

Python 2.7: Appending Data to Table in Pandas

假如想象 submitted on 2021-02-08 09:29:14
Question: I am reading data from image files and I want to append this data into a single HDF file. Here is my code:

    datafile = pd.HDFStore(os.path.join(path, 'imageData.h5'))
    for file in fileList:
        data = {'X Position' : pd.Series(xpos, index=index1),
                'Y Position' : pd.Series(ypos, index=index1),
                'Major Axis Length' : pd.Series(major, index=index1),
                'Minor Axis Length' : pd.Series(minor, index=index1),
                'X Velocity' : pd.Series(xVelocity, index=index1),
                'Y Velocity' : pd.Series(yVelocity, index=index1) }
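One way to accumulate per-file frames into a single node is HDFStore.append with the table format, which (unlike the default fixed format) allows appending. A sketch with made-up stand-in data for the image measurements:

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'imageData.h5')
datafile = pd.HDFStore(path)
for i in range(3):  # stand-in for looping over fileList
    index1 = [i]
    data = {
        'X Position': pd.Series([1.0 * i], index=index1),
        'Y Position': pd.Series([2.0 * i], index=index1),
    }
    df = pd.DataFrame(data)
    # format='table' makes the node appendable across loop iterations
    datafile.append('images', df, format='table')
datafile.close()

result = pd.read_hdf(path, 'images')
```

Each iteration appends its rows to the same 'images' node, so the final frame holds one row per processed file.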

Pandas as fast data storage for Flask application

女生的网名这么多〃 submitted on 2021-02-06 09:06:35
Question: I'm impressed by the speed of transformations, the ease of loading data, and the general usability of Pandas, and I want to leverage these nice properties (amongst others) to model some large-ish data sets (~100-200k rows, <20 columns). The aim is to work with the data on some computing nodes, but also to provide a view of the data sets in a browser via Flask. I'm currently using a Postgres database to store the data, but the import of the data (coming from csv files) is slow, tedious and error prone, and
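For data sets of this size, one simple pattern is a one-time csv import into an HDF5 node, which a Flask handler can then reload as a whole frame very quickly. A sketch with invented file names and toy data:

```python
import os
import tempfile

import numpy as np
import pandas as pd

tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, 'data.csv')
h5_path = os.path.join(tmp, 'datasets.h5')

# Stand-in for one of the incoming csv files.
pd.DataFrame({'a': np.arange(5), 'b': np.linspace(0.0, 1.0, 5)}
             ).to_csv(csv_path, index=False)

df = pd.read_csv(csv_path)            # one-time import from csv
df.to_hdf(h5_path, key='set1', mode='w')

view = pd.read_hdf(h5_path, 'set1')   # what a request handler would load
```

The default fixed format loads the whole node in one read, which suits serving a complete data-set view; the table format would be the choice if handlers need to query row subsets instead.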

I want to convert very large csv data to hdf5 in python

◇◆丶佛笑我妖孽 submitted on 2021-01-29 17:37:12
Question: I have a very large csv file. It looks like this: [Date, Firm name, value 1, value 2, ..., value 60]. I want to convert it to an hdf5 file. For example, let's say I have two dates (2019-07-01, 2019-07-02), each date has 3 firms (firm 1, firm 2, firm 3), and each firm has [value 1, value 2, ..., value 60]. I want to use date and firm name as groups; specifically, I want this hierarchy: 'Date/Firm name'. For example, 2019-07-01 has firm 1, firm 2, and firm 3. When you look at each firm, there
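A sketch of the 'Date/Firm name' hierarchy using pandas groupby and HDFStore, with toy data in place of the real csv. Note that HDF5 node names containing '-' or spaces trigger pandas' NaturalNameWarning, so they are sanitized to underscores here:

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the csv: two dates x three firms.
df = pd.DataFrame({
    'Date': ['2019-07-01'] * 3 + ['2019-07-02'] * 3,
    'Firm': ['firm 1', 'firm 2', 'firm 3'] * 2,
    'value 1': range(6),
})

path = os.path.join(tempfile.mkdtemp(), 'firms.h5')
with pd.HDFStore(path, 'w') as store:
    # one node per (date, firm) pair: /d<date>/<firm>
    for (date, firm), g in df.groupby(['Date', 'Firm']):
        key = f"d{date}/{firm}".replace('-', '_').replace(' ', '_')
        store.put(key, g.drop(columns=['Date', 'Firm']))

with pd.HDFStore(path, 'r') as store:
    keys = sorted(store.keys())
```

For a genuinely huge csv, the same loop can run per chunk via pd.read_csv(..., chunksize=...), appending to the matching node instead of putting it once.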

PyTables create_array fails to save numpy array

假如想象 submitted on 2021-01-29 17:27:03
Question: Why does the snippet below give "TypeError: Array objects cannot currently deal with void, unicode or object arrays"? Python 3.8.2, tables 3.6.1, numpy 1.19.1

    import numpy as np
    import tables as tb

    TYPE = np.dtype([ ('d', 'f4') ])

    with tb.open_file(r'c:\temp\file.h5', mode="a") as h5file:
        h5file.create_group(h5file.root, 'grp')
        arr = np.array([(1.1)], dtype=TYPE)
        h5file.create_array('/grp', str('arr'), arr)

Answer 1: File.create_array() is for homogeneous dtypes (all ints, or all floats, etc).
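Following that answer: a structured (void) dtype maps to an HDF5 compound type, which is what Table stores, so the fix is create_table() with the array passed as obj. A sketch (note the element is written as the tuple (1.1,), and the file path is a temp-dir stand-in for the original):

```python
import os
import tempfile

import numpy as np
import tables as tb

TYPE = np.dtype([('d', 'f4')])

path = os.path.join(tempfile.mkdtemp(), 'file.h5')
with tb.open_file(path, mode='a') as h5file:
    h5file.create_group(h5file.root, 'grp')
    arr = np.array([(1.1,)], dtype=TYPE)
    # create_table maps each field of the structured dtype to a column
    h5file.create_table('/grp', 'arr', obj=arr)

with tb.open_file(path, mode='r') as h5file:
    val = float(h5file.root.grp.arr[0]['d'])
```

The stored value reads back as float32, so it matches 1.1 only to single precision.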

using H5T_ARRAY in Python

孤人 submitted on 2021-01-29 08:18:47
Question: I am trying to use H5T_ARRAY inside the H5T_COMPOUND structure using Python. Basically, I am writing an hdf5 file, and if you open it using h5dump, the structure looks like this:

    HDF5 "SO_64449277np.h5" {
    GROUP "/" {
       DATASET "Table3" {
          DATATYPE H5T_COMPOUND {
             H5T_COMPOUND {
                H5T_STD_I16LE "id";
                H5T_STD_I16LE "timestamp";
             } "header";
             H5T_COMPOUND {
                H5T_IEEE_F32LE "latency";
                H5T_STD_I16LE "segments_k";
                H5T_COMPOUND {
                   H5T_STD_I16LE "segment_id";
                   H5T_IEEE_F32LE "segment_quality";
                   H5T_IEEE_F32LE
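In PyTables, a Col with a shape argument is stored as an H5T_ARRAY member of the compound row type, and nested IsDescription classes become nested H5T_COMPOUNDs. A sketch loosely following the names in the dump (a simplified layout, not the question's exact type, and an array of scalars rather than an array of compounds, which IsDescription cannot express directly):

```python
import os
import tempfile

import numpy as np
import tables as tb

class Record(tb.IsDescription):
    # nested class -> nested H5T_COMPOUND named "header"
    class header(tb.IsDescription):
        id = tb.Int16Col()
        timestamp = tb.Int16Col()
    latency = tb.Float32Col()
    # shape=(70,) -> H5T_ARRAY of 70 floats inside the compound
    segment_quality = tb.Float32Col(shape=(70,))

path = os.path.join(tempfile.mkdtemp(), 'SO_64449277np.h5')
with tb.open_file(path, 'w') as f:
    t = f.create_table('/', 'Table3', Record)
    row = t.row
    row['header/id'] = 1          # nested fields use '/' paths
    row['header/timestamp'] = 2
    row['latency'] = 0.5
    row['segment_quality'] = np.arange(70, dtype='f4')
    row.append()
    t.flush()
    qual = t[0]['segment_quality']
```

h5dump on the resulting file shows the segment_quality member as H5T_ARRAY { [70] H5T_IEEE_F32LE }.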

pytables add repetitive subclass as column

早过忘川 submitted on 2021-01-29 07:53:57
Question: I am creating an HDF5 file with strict parameters. It has one table consisting of variable columns. At one point the columns become repetitive, with different data being appended. Apparently, I can't add a loop inside an IsDescription class. Currently the class Segments has been added under class Summary_data twice; I need to call segments_k 70 times. What is the best approach to this? Thank you.

    class Header(IsDescription):
        _v_pos = 1
        id = Int16Col(dflt=1, pos = 0)
        timestamp = Int16Col(dflt=1, pos
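Because create_table also accepts a plain (possibly nested) dict as the description, the 70 repetitions can be generated in a loop instead of being pasted into an IsDescription class 70 times. A sketch with assumed field names based on the question:

```python
import os
import tempfile

import tables as tb

def segment_desc():
    # fresh Col instances per call, so each nested compound is independent
    return {
        'segment_id': tb.Int16Col(pos=0),
        'segment_quality': tb.Float32Col(pos=1),
    }

# Build the whole description programmatically: nested dicts become
# nested compounds named segment_0 ... segment_69.
summary = {'latency': tb.Float32Col()}
for k in range(70):
    summary[f'segment_{k}'] = segment_desc()

path = os.path.join(tempfile.mkdtemp(), 'segments.h5')
with tb.open_file(path, 'w') as f:
    t = f.create_table('/', 'summary_data', summary)
    paths = t.colpathnames
```

The resulting table has the same on-disk compound type as the handwritten class would, so the strict file layout is preserved while the Python source stays a few lines long.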