pytables

How should python dictionaries be stored in pytables?

こ雲淡風輕ζ submitted on 2021-02-18 10:40:53
Question: PyTables doesn't natively support Python dictionaries. The way I've approached it is to make a data structure of the form:

    tables_dict = {
        'key': tables.StringCol(itemsize=40),
        'value': tables.Int32Col(),
    }

(note that I ensure the keys are <40 characters long) and then create a table using this structure:

    file_handle.createTable('/', 'dictionary', tables_dict)

and then populate it with:

    file_handle.dictionary.append(dictionary.items())

and retrieve data with:

    dict(file_handle
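A runnable sketch of the round trip described above, using the modern snake_case API (create_table is the current name for createTable); the file path and sample dictionary are mine:

```python
import os
import tempfile

import tables

# Sample mapping to store; the question's scheme is str keys -> int values.
dictionary = {'alpha': 1, 'beta': 2}

tables_dict = {
    'key': tables.StringCol(itemsize=40),  # keys must fit in 40 bytes
    'value': tables.Int32Col(),
}

path = os.path.join(tempfile.mkdtemp(), 'dict_store.h5')
with tables.open_file(path, mode='w') as file_handle:
    table = file_handle.create_table('/', 'dictionary', tables_dict)
    # append() takes a sequence of rows; (key, value) tuples match the
    # alphabetical column order of a dict-based description
    table.append(sorted(dictionary.items()))

with tables.open_file(path, mode='r') as file_handle:
    # StringCol values come back as bytes, so decode when rebuilding the dict
    restored = {row['key'].decode(): int(row['value'])
                for row in file_handle.root.dictionary}
```

One caveat of this scheme: with a dict-based description and no `pos` arguments, PyTables orders columns alphabetically, which happens to match the `(key, value)` tuple order here.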

Pytables duplicates 2.5 giga rows

别说谁变了你拦得住时间么 submitted on 2021-02-11 13:41:00
Question: I currently have a .h5 file with a table consisting of three columns: a text column of 64 chars, a UInt32 column relating to the source of the text, and a UInt32 column which is the xxhash of the text. The table consists of ~2.5e9 rows. I am trying to find and count the duplicates of each text entry in the table, essentially merging them into one entry while counting the instances. I have tried doing so by indexing on the hash column and then looping through table.itersorted(hash),

Python/PyTables: Is it possible to have different data types for different columns of an array?

耗尽温柔 submitted on 2021-02-11 12:33:29
Question: I create an expandable earray of Nx4 columns. Some columns require the float64 datatype; the others can be managed with int32. Is it possible to vary the data types among the columns? Right now I just use one (float64, below) for all, but it takes huge disk space for (>10 GB) files. For example, how can I ensure that column 1-2 elements are int32 and 3-4 elements are float64?

    import tables
    f1 = tables.open_file("table.h5", "w")
    a = f1.create_earray(f1.root, "dataset_1", atom=tables.Float32Atom(),
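An EArray has a single atom for every cell, so per-column dtypes need a Table instead, which is also row-appendable. A sketch with illustrative column names:

```python
import os
import tempfile

import tables

# Each column gets its own dtype; pos fixes the column order.
desc = {
    'c1': tables.Int32Col(pos=0),
    'c2': tables.Int32Col(pos=1),
    'c3': tables.Float64Col(pos=2),
    'c4': tables.Float64Col(pos=3),
}

path = os.path.join(tempfile.mkdtemp(), 'table.h5')
with tables.open_file(path, 'w') as f1:
    t = f1.create_table(f1.root, 'dataset_1', desc)
    # rows can be appended incrementally, like an earray grows
    t.append([(1, 2, 0.5, 1.5), (3, 4, 2.5, 3.5)])
    dtypes = {name: t.coldtypes[name].name for name in t.colnames}
```

With two int32 and two float64 columns, each row costs 24 bytes instead of 32 for four float64s, a 25% saving before compression.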

Python 2.7: Appending Data to Table in Pandas

假如想象 submitted on 2021-02-08 09:29:14
Question: I am reading data from image files and I want to append this data into a single HDF file. Here is my code:

    datafile = pd.HDFStore(os.path.join(path, 'imageData.h5'))
    for file in fileList:
        data = {'X Position' : pd.Series(xpos, index=index1),
                'Y Position' : pd.Series(ypos, index=index1),
                'Major Axis Length' : pd.Series(major, index=index1),
                'Minor Axis Length' : pd.Series(minor, index=index1),
                'X Velocity' : pd.Series(xVelocity, index=index1),
                'Y Velocity' : pd.Series(yVelocity, index=index1) }
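One way to accumulate per-file frames into a single node is HDFStore.append with the table format, which (unlike the default fixed format) allows appending. A sketch with made-up stand-in data for the image measurements:

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'imageData.h5')
datafile = pd.HDFStore(path)
for i in range(3):  # stand-in for looping over fileList
    index1 = [i]
    data = {
        'X Position': pd.Series([1.0 * i], index=index1),
        'Y Position': pd.Series([2.0 * i], index=index1),
    }
    df = pd.DataFrame(data)
    # format='table' makes the node appendable across loop iterations
    datafile.append('images', df, format='table')
datafile.close()

result = pd.read_hdf(path, 'images')
```

Each iteration appends its rows to the same 'images' node, so the final frame holds one row per processed file.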

Pandas as fast data storage for Flask application

女生的网名这么多〃 submitted on 2021-02-06 09:06:35
Question: I'm impressed by the speed of transformations, the ease of loading data, and the general usability of Pandas, and I want to leverage these nice properties (amongst others) to model some large-ish data sets (~100-200k rows, <20 columns). The aim is to work with the data on some computing nodes, but also to provide a view of the data sets in a browser via Flask. I'm currently using a Postgres database to store the data, but the import of the data (coming from csv files) is slow, tedious and error prone, and
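For data sets of this size, one simple pattern is a one-time csv import into an HDF5 node, which a Flask handler can then reload as a whole frame very quickly. A sketch with invented file names and toy data:

```python
import os
import tempfile

import numpy as np
import pandas as pd

tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, 'data.csv')
h5_path = os.path.join(tmp, 'datasets.h5')

# Stand-in for one of the incoming csv files.
pd.DataFrame({'a': np.arange(5), 'b': np.linspace(0.0, 1.0, 5)}
             ).to_csv(csv_path, index=False)

df = pd.read_csv(csv_path)            # one-time import from csv
df.to_hdf(h5_path, key='set1', mode='w')

view = pd.read_hdf(h5_path, 'set1')   # what a request handler would load
```

The default fixed format loads the whole node in one read, which suits serving a complete data-set view; the table format would be the choice if handlers need to query row subsets instead.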

I want to convert very large csv data to hdf5 in python

◇◆丶佛笑我妖孽 submitted on 2021-01-29 17:37:12
Question: I have a very large csv file. It looks like this: [Date, Firm name, value 1, value 2, ..., value 60]. I want to convert it to an hdf5 file. For example, let's say I have two dates (2019-07-01, 2019-07-02), each date has 3 firms (firm 1, firm 2, firm 3), and each firm has [value 1, value 2, ..., value 60]. I want to use date and firm name as groups; specifically, I want this hierarchy: 'Date/Firm name'. For example, 2019-07-01 has firm 1, firm 2, and firm 3. When you look at each firm, there
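A sketch of the 'Date/Firm name' hierarchy using pandas groupby and HDFStore, with toy data in place of the real csv. Note that HDF5 node names containing '-' or spaces trigger pandas' NaturalNameWarning, so they are sanitized to underscores here:

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the csv: two dates x three firms.
df = pd.DataFrame({
    'Date': ['2019-07-01'] * 3 + ['2019-07-02'] * 3,
    'Firm': ['firm 1', 'firm 2', 'firm 3'] * 2,
    'value 1': range(6),
})

path = os.path.join(tempfile.mkdtemp(), 'firms.h5')
with pd.HDFStore(path, 'w') as store:
    # one node per (date, firm) pair: /d<date>/<firm>
    for (date, firm), g in df.groupby(['Date', 'Firm']):
        key = f"d{date}/{firm}".replace('-', '_').replace(' ', '_')
        store.put(key, g.drop(columns=['Date', 'Firm']))

with pd.HDFStore(path, 'r') as store:
    keys = sorted(store.keys())
```

For a genuinely huge csv, the same loop can run per chunk via pd.read_csv(..., chunksize=...), appending to the matching node instead of putting it once.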

PyTables create_array fails to save numpy array

假如想象 submitted on 2021-01-29 17:27:03
Question: Why does the snippet below give "TypeError: Array objects cannot currently deal with void, unicode or object arrays"? Python 3.8.2, tables 3.6.1, numpy 1.19.1

    import numpy as np
    import tables as tb

    TYPE = np.dtype([ ('d', 'f4') ])

    with tb.open_file(r'c:\temp\file.h5', mode="a") as h5file:
        h5file.create_group(h5file.root, 'grp')
        arr = np.array([(1.1)], dtype=TYPE)
        h5file.create_array('/grp', str('arr'), arr)

Answer 1: File.create_array() is for homogeneous dtypes (all ints, or all floats, etc).
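Following that answer: a structured (void) dtype maps to an HDF5 compound type, which is what Table stores, so the fix is create_table() with the array passed as obj. A sketch (note the element is written as the tuple (1.1,), and the file path is a temp-dir stand-in for the original):

```python
import os
import tempfile

import numpy as np
import tables as tb

TYPE = np.dtype([('d', 'f4')])

path = os.path.join(tempfile.mkdtemp(), 'file.h5')
with tb.open_file(path, mode='a') as h5file:
    h5file.create_group(h5file.root, 'grp')
    arr = np.array([(1.1,)], dtype=TYPE)
    # create_table maps each field of the structured dtype to a column
    h5file.create_table('/grp', 'arr', obj=arr)

with tb.open_file(path, mode='r') as h5file:
    val = float(h5file.root.grp.arr[0]['d'])
```

The stored value reads back as float32, so it matches 1.1 only to single precision.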

using H5T_ARRAY in Python

孤人 submitted on 2021-01-29 08:18:47
Question: I am trying to use H5T_ARRAY inside the H5T_COMPOUND structure using Python. Basically, I am writing an hdf5 file, and if you open it using h5dump, the structure looks like this:

    HDF5 "SO_64449277np.h5" {
    GROUP "/" {
       DATASET "Table3" {
          DATATYPE H5T_COMPOUND {
             H5T_COMPOUND {
                H5T_STD_I16LE "id";
                H5T_STD_I16LE "timestamp";
             } "header";
             H5T_COMPOUND {
                H5T_IEEE_F32LE "latency";
                H5T_STD_I16LE "segments_k";
                H5T_COMPOUND {
                   H5T_STD_I16LE "segment_id";
                   H5T_IEEE_F32LE "segment_quality";
                   H5T_IEEE_F32LE
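In PyTables, a Col with a shape argument is stored as an H5T_ARRAY member of the compound row type, and nested IsDescription classes become nested H5T_COMPOUNDs. A sketch loosely following the names in the dump (a simplified layout, not the question's exact type, and an array of scalars rather than an array of compounds, which IsDescription cannot express directly):

```python
import os
import tempfile

import numpy as np
import tables as tb

class Record(tb.IsDescription):
    # nested class -> nested H5T_COMPOUND named "header"
    class header(tb.IsDescription):
        id = tb.Int16Col()
        timestamp = tb.Int16Col()
    latency = tb.Float32Col()
    # shape=(70,) -> H5T_ARRAY of 70 floats inside the compound
    segment_quality = tb.Float32Col(shape=(70,))

path = os.path.join(tempfile.mkdtemp(), 'SO_64449277np.h5')
with tb.open_file(path, 'w') as f:
    t = f.create_table('/', 'Table3', Record)
    row = t.row
    row['header/id'] = 1          # nested fields use '/' paths
    row['header/timestamp'] = 2
    row['latency'] = 0.5
    row['segment_quality'] = np.arange(70, dtype='f4')
    row.append()
    t.flush()
    qual = t[0]['segment_quality']
```

h5dump on the resulting file shows the segment_quality member as H5T_ARRAY { [70] H5T_IEEE_F32LE }.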

pytables add repetitive subclass as column

早过忘川 submitted on 2021-01-29 07:53:57
Question: I am creating an HDF5 file with strict parameters. It has one table consisting of variable columns. At one point the columns become repetitive, with different data being appended. Apparently, I can't add a loop inside an IsDescription class. Currently the class Segments has been added under class Summary_data twice; I need to call segments_k 70 times. What is the best approach to this? Thank you.

    class Header(IsDescription):
        _v_pos = 1
        id = Int16Col(dflt=1, pos = 0)
        timestamp = Int16Col(dflt=1, pos
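Because create_table also accepts a plain (possibly nested) dict as the description, the 70 repetitions can be generated in a loop instead of being pasted into an IsDescription class 70 times. A sketch with assumed field names based on the question:

```python
import os
import tempfile

import tables as tb

def segment_desc():
    # fresh Col instances per call, so each nested compound is independent
    return {
        'segment_id': tb.Int16Col(pos=0),
        'segment_quality': tb.Float32Col(pos=1),
    }

# Build the whole description programmatically: nested dicts become
# nested compounds named segment_0 ... segment_69.
summary = {'latency': tb.Float32Col()}
for k in range(70):
    summary[f'segment_{k}'] = segment_desc()

path = os.path.join(tempfile.mkdtemp(), 'segments.h5')
with tb.open_file(path, 'w') as f:
    t = f.create_table('/', 'summary_data', summary)
    paths = t.colpathnames
```

The resulting table has the same on-disk compound type as the handwritten class would, so the strict file layout is preserved while the Python source stays a few lines long.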