Read data into structured array with multiple dtypes

Question

I'm trying to read some data from SQL (using pyodbc) into a numpy structured array (I believe a structured array is required due to the multiple dtypes).

import pyodbc
import numpy as np
cnxn = pyodbc.connect('DRIVER={SQL Server};Server=SERVER;Database=DB;Trusted_Connection=Yes;')
cursor = cnxn.cursor()
sql_ps = "select a, b from table"
cursor.execute(sql_ps)  # use the variable defined above
p_data = cursor.fetchall()
cnxn.close()  # call the method; a bare cnxn.close does nothing

ndtype = np.dtype([('f1','>f8'),('f2','|S22')])
p_data = np.asarray(p_data, dtype=ndtype)

However this returns:

TypeError: expected a readable buffer object

If I load into the array as a tuple

p_data_tuple = np.asarray([tuple(i) for i in p_data], dtype=ndtype)

It works; however, p_data_tuple is a 1-D array of tuples rather than a 2-D array, so I cannot access elements with p_data_tuple[0,1].

Does anyone know how I can either load the returned data directly into a structured array with multiple dtypes, convert the array of tuples into a 2-D array with multiple dtypes, or solve this some other way?

Thanks

Answer 1:

Your cursor.fetchall returns a list of records. According to the pyodbc documentation (http://mkleehammer.github.io/pyodbc/), 'Row objects are similar to tuples, but they also allow access to columns by name'. That sounds like a namedtuple to me, though the class details may be different.

sql_ps = "select a, b from table"
cursor.execute(sql_ps)  # use the variable defined above
p_data = cursor.fetchall()
cnxn.close()  # call the method; a bare cnxn.close does nothing

Just for fun, let's change the dtype to use the same field names as the SQL:

ndtype = np.dtype([('a','>f8'),('b','|S22')])

The following doesn't work, presumably because the tuple-like record isn't a real tuple:

p_data = np.array(p_data, dtype=ndtype)

So instead we convert each record to a tuple. Structured arrays take their data as a list of tuples.

p_data = np.array([tuple(i) for i in p_data], dtype=ndtype)
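Here is a minimal sketch of that conversion that runs without a database. FakeRow is a hypothetical stand-in for a pyodbc Row (sequence-like, but not a real tuple), and the sample values are made up for illustration:

```python
import numpy as np


class FakeRow:
    """Hypothetical stand-in for a pyodbc Row: sequence-like, not a tuple."""

    def __init__(self, *values):
        self._values = values

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]


# Made-up rows, shaped like the result of "select a, b from table"
rows = [FakeRow(1.5, "abc"), FakeRow(2.5, "defg")]

ndtype = np.dtype([("a", ">f8"), ("b", "S22")])

# Convert each record to a real tuple before handing the list to numpy
p_data = np.array([tuple(r) for r in rows], dtype=ndtype)

print(p_data.shape)   # one entry per row, no second axis
print(p_data["b"])    # strings are stored as bytes because of the S22 dtype
```

The result is a 1-D array whose elements each carry both fields, which is why there is no second axis to index.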

Now you can access the data by field or by row

p_data['a']    # 1d array of floats
p_data['b'][1]  # one string
p_data[10]   # one record

A record from p_data displays as a tuple, though it does actually have a dtype like the parent array.
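To make the access patterns concrete, here is a small sketch with made-up values (the two sample rows are assumptions, not data from the question):

```python
import numpy as np

ndtype = np.dtype([("a", ">f8"), ("b", "S22")])
p_data = np.array([(1.5, b"abc"), (2.5, b"defg")], dtype=ndtype)

col_a = p_data["a"]           # 1-D array of floats
one_string = p_data["b"][1]   # a single bytes value
record = p_data[0]            # one record

# The record prints like a tuple but carries the parent array's dtype
print(record)
print(record.dtype == p_data.dtype)

# Fields of a single record can be reached by name or by position
print(record["b"], record[1])
```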

There's a variant on structured arrays, recarray, that adds the ability to access fields by attribute name, e.g. p_rec.a. That's even closer to the db cursor record, but doesn't add much otherwise.
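One way to get a recarray from an existing structured array is to view it as np.recarray. The sketch below uses made-up sample rows; the name p_rec matches the example above:

```python
import numpy as np

ndtype = np.dtype([("a", ">f8"), ("b", "S22")])
p_data = np.array([(1.5, b"abc"), (2.5, b"defg")], dtype=ndtype)

# View the same data as a recarray; no copy is made
p_rec = p_data.view(np.recarray)

print(p_rec.a)      # whole column via attribute access
print(p_rec[1].b)   # one field of one record
```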

So this structured array is quite similar to your source SQL table, with fields and rows. It's not a 2-D array, but indexing by field name is similar to indexing a 2-D array by column number.
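For the p_data_tuple[0,1] style of access asked about in the question, here is a sketch of two possible substitutes, using made-up sample rows. The object-dtype array is my own suggested alternative, not something the structured array provides; it gives true 2-D indexing but gives up the per-column dtypes:

```python
import numpy as np

rows = [(1.5, "abc"), (2.5, "defg")]  # made-up sample data
ndtype = np.dtype([("a", ">f8"), ("b", "S22")])
p_data = np.array(rows, dtype=ndtype)

# Option 1: keep the structured array and index row first, then field
by_position = p_data[0][1]                   # row 0, second field
second_field = p_data.dtype.names[1]         # look up the field name by column number
by_name = p_data[second_field][0]            # same element, column first

# Option 2 (assumption: per-column dtypes are not needed):
# a 2-D array of Python objects supports [row, col] indexing
obj_2d = np.array(rows, dtype=object)
print(obj_2d.shape)
print(obj_2d[0, 1])
```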

pandas does something similar, though it often resorts to using dtype=object (like the pointers of Python lists). And it keeps track of 'row' labels.

Source: https://stackoverflow.com/questions/35174979/read-data-into-structured-array-with-multiple-dtypes

Tags

python

arrays

numpy

pyodbc