python dict to numpy structured array

Anonymous (unverified), submitted on 2019-12-03 02:45:02

Question:

I have a dictionary that I need to convert to a NumPy structured array. I'm using the arcpy function NumPyArrayToTable, so a NumPy structured array is the only data format that will work.

Based on this thread: Writing to numpy array from dictionary and this thread: How to convert Python dictionary object to numpy array

I've tried this:

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}
names = ['id','data']
formats = ['f8','f8']
dtype = dict(names = names, formats=formats)
array=numpy.array([[key,val] for (key,val) in result.iteritems()],dtype)

But I keep getting the error "expected a readable buffer object".

The method below works, but it is stupid and obviously won't scale to real data. I know there is a more graceful approach; I just can't figure it out.

totable = numpy.array([[key,val] for (key,val) in result.iteritems()])
array=numpy.array([(totable[0,0],totable[0,1]),(totable[1,0],totable[1,1])],dtype)

Answer 1:

You could use np.array(list(result.items()), dtype=dtype):

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}
names = ['id','data']
formats = ['f8','f8']
dtype = dict(names = names, formats=formats)
array = np.array(list(result.items()), dtype=dtype)
print(repr(array))

yields

array([(0.0, 1.1181753789488595), (1.0, 0.5566080288678394),
       (2.0, 0.4718269778030734), (3.0, 0.48716683119447185), (4.0, 1.0),
       (5.0, 0.1395076201641266), (6.0, 0.20941558441558442)],
      dtype=[('id', '<f8'), ('data', '<f8')])

If you don't want to create the intermediate list of tuples, list(result.items()), then you could instead use np.fromiter:

In Python2:

array = np.fromiter(result.iteritems(), dtype=dtype, count=len(result)) 

In Python3:

array = np.fromiter(result.items(), dtype=dtype, count=len(result)) 
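As a quick sanity check, the following Python 3 sketch builds the array both ways and confirms that the two results agree field by field. The variable names via_list and via_iter are illustrative only, and the dictionary is shortened for brevity.

```python
import numpy as np

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734}
dtype = dict(names=['id', 'data'], formats=['f8', 'f8'])

# Approach 1: materialize the (key, value) tuples in a list first.
via_list = np.array(list(result.items()), dtype=dtype)

# Approach 2: stream the tuples straight into the array.
# count lets fromiter allocate the output once instead of growing it.
via_iter = np.fromiter(result.items(), dtype=dtype, count=len(result))

# Compare the two arrays one field at a time.
same_ids = via_list['id'].tolist() == via_iter['id'].tolist()
same_data = via_list['data'].tolist() == via_iter['data'].tolist()
print(same_ids, same_data)
```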

Why using the list [key,val] does not work:

By the way, your attempt,

numpy.array([[key,val] for (key,val) in result.iteritems()],dtype) 

was very close to working. If you had changed the list [key, val] to the tuple (key, val), it would have worked. Of course,

numpy.array([(key,val) for (key,val) in result.iteritems()], dtype) 

is the same thing as

numpy.array(result.items(), dtype) 

in Python2, or

numpy.array(list(result.items()), dtype) 

in Python3.


np.array treats lists differently from tuples. Robert Kern explains:

As a rule, tuples are considered "scalar" records and lists are recursed upon. This rule helps numpy.array() figure out which sequences are records and which are other sequences to be recursed upon; i.e. which sequences create another dimension and which are the atomic elements.

Since (0.0, 1.1181753789488595) is considered one of those atomic elements, it should be a tuple, not a list.
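The rule can be seen in a small experiment. This is only a sketch: what happens in the list case is an assumption that depends on the NumPy version (older releases raise an error like the one in the question, while others may return an array with an extra dimension), so the code records the outcome instead of asserting a specific one.

```python
import numpy as np

dtype = np.dtype([('id', 'f8'), ('data', 'f8')])

# Tuples are treated as records: one array element per tuple.
from_tuples = np.array([(0, 1.5), (1, 2.5), (2, 3.5)], dtype=dtype)
print(from_tuples.shape)

# Lists are recursed upon. Depending on the NumPy version this either
# raises or produces an array with an extra dimension; in neither case
# is the result the 1-D array of records built from tuples.
try:
    from_lists = np.array([[0, 1.5], [1, 2.5], [2, 3.5]], dtype=dtype)
    outcome = from_lists.shape
except (TypeError, ValueError) as exc:
    outcome = type(exc).__name__
print(outcome)
```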



Answer 2:

I would prefer storing the keys and values in separate arrays. This is often more practical: a structure of arrays is a good replacement for an array of structures. Most of the time you only need to process a subset of your data (in this case, the keys or the values), and operating on just one of the two arrays is more efficient than operating on half of the two arrays packed together.

But if that is not possible, I would suggest using an array organized by column instead of by row. That way you get the same benefit as having two arrays, but packed into one.

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}
names = 0   # row index that holds the keys
values = 1  # row index that holds the values
array = np.empty(shape=(2, len(result)), dtype=float)
array[names] = list(result.keys())    # list() is needed on Python 3
array[values] = list(result.values())

But my favorite is this (simpler):

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}
arrays = {'names': np.array(list(result.keys()), dtype=float),
          'values': np.array(list(result.values()), dtype=float)}
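If a structured array is still required in the end (as in the question), separate key and value arrays can be packed into one afterwards. A minimal sketch, assuming the field names 'id' and 'data' from the question; the variable names keys, vals and packed are illustrative:

```python
import numpy as np

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734}

keys = np.array(list(result.keys()), dtype=float)
vals = np.array(list(result.values()), dtype=float)

# Allocate the structured array, then fill it one field (column) at a time.
packed = np.empty(len(result), dtype=[('id', 'f8'), ('data', 'f8')])
packed['id'] = keys
packed['data'] = vals
print(packed)
```

Filling by field keeps the per-column arrays usable on their own while still producing the record layout at the end.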


Answer 3:

Let me propose an improved method for when the values of the dictionary are lists of the same length:

import numpy

def dctToNdarray(dd, szFormat='f8'):
    '''
    Convert a 'rectangular' dictionary to a numpy structured ndarray.
    entry
        dd : dictionary whose values are lists of the same length
    return
        data : numpy structured ndarray with one field per key
    '''
    names = list(dd.keys())  # list() so that indexing also works on Python 3
    firstKey = names[0]
    formats = [szFormat]*len(names)
    dtype = dict(names=names, formats=formats)
    # First record: element 0 of every list
    values = [tuple(dd[k][0] for k in names)]
    data = numpy.array(values, dtype=dtype)
    # Append the remaining records one at a time
    for i in range(1, len(dd[firstKey])):
        values = [tuple(dd[k][i] for k in names)]
        data_tmp = numpy.array(values, dtype=dtype)
        data = numpy.concatenate((data, data_tmp))
    return data

dd = {'a':[1,2.05,25.48],'b':[2,1.07,9],'c':[3,3.01,6.14]}
data = dctToNdarray(dd)
print(data.dtype.names)
print(data)
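A different technique from the loop above, offered only as a sketch: zip(*...) transposes the per-key lists into per-record tuples, so the array can be built in a single call without repeated concatenation. It assumes Python 3.7+ (where dictionaries preserve insertion order) and lists of equal length; the function name dict_to_structured is hypothetical.

```python
import numpy as np

def dict_to_structured(dd, fmt='f8'):
    """Build a structured array with one field per key of dd."""
    names = list(dd.keys())
    dtype = np.dtype({'names': names, 'formats': [fmt] * len(names)})
    # zip(*columns) turns the per-key lists into per-record tuples.
    rows = list(zip(*(dd[k] for k in names)))
    return np.array(rows, dtype=dtype)

dd = {'a': [1, 2.05, 25.48], 'b': [2, 1.07, 9], 'c': [3, 3.01, 6.14]}
data = dict_to_structured(dd)
print(data.dtype.names)
print(data)
```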


Answer 4:

Even simpler, if you are willing to use pandas:

import pandas
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}
df = pandas.DataFrame(result, index=[0])
print(df)

gives:

          0         1         2         3  4         5         6
0  1.118175  0.556608  0.471827  0.487167  1  0.139508  0.209416
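Note that the DataFrame above has one column per key rather than 'id' and 'data' fields. If the goal is still a structured array, one possible route (a sketch, not part of the original answer) is to build a two-column frame and call DataFrame.to_records. The column names are taken from the question; pandas infers an integer type for the id column here rather than 'f8'.

```python
import pandas as pd

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734}

# One row per dictionary item, with the question's field names as columns.
df = pd.DataFrame(list(result.items()), columns=['id', 'data'])

# to_records returns a NumPy record array (a structured-array subclass);
# index=False leaves the DataFrame index out of the fields.
records = df.to_records(index=False)
print(records.dtype.names)
```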

