How to convert a Numpy 2D array with object dtype to a regular 2D array of floats


As part of a broader program I am working on, I ended up with object arrays in which strings, 3D coordinates, etc. are all mixed. I know object arrays might not be very favorite in

6 Answers

春和景丽

2020-12-09 10:28
Based on Jaime's toy example I think you can do this very simply using np.vstack():
```
import numpy as np
arr = np.array([['one', [1, 2, 3]], ['two', [4, 5, 6]]], dtype=object)
# Stack the second column into a 2D array, then cast it to float
float_arr = np.vstack(arr[:, 1]).astype(float)
```
This will work regardless of whether the 'numeric' elements in your object array are 1D numpy arrays, lists or tuples.
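To check the claim about element types, here is a minimal sketch that stores the coordinates as a list, a tuple and a 1D array in the same column and stacks them the same way. The sample data and variable names are made up for illustration:

```python
import numpy as np

# Hypothetical mixed object array: a label plus a 3D coordinate per row.
# The coordinates are deliberately stored as a list, a tuple and an ndarray.
arr = np.empty((3, 2), dtype=object)
arr[0, 0], arr[0, 1] = 'one', [1, 2, 3]
arr[1, 0], arr[1, 1] = 'two', (4, 5, 6)
arr[2, 0], arr[2, 1] = 'three', np.array([7, 8, 9])

# Stack the coordinate column into a regular 2D array, then cast to float.
float_arr = np.vstack(arr[:, 1]).astype(float)

print(float_arr.shape)  # (3, 3)
print(float_arr.dtype)  # float64
```

Note that every row must have the same number of coordinates, otherwise the stacking step cannot build a rectangular array.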

上瘾入骨i

2020-12-09 10:32
You may want to use a structured array, so that you can easily access the names and the values independently when you need to. In this example, there are two data points:
```
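import numpy as np

# Each record holds a 10-byte string name and three float32 values.
x = np.zeros(2, dtype=[('name', 'S10'), ('value', 'f4', (3,))])
x[0][0] = 'item1'
x[1][0] = 'item2'
y1 = x['name']   # all names (displayed as bytes, e.g. b'item1', under Python 3)
y2 = x['value']  # all values as a regular 2D float32 array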
```
The result:
```
>>> y1
array(['item1', 'item2'], 
      dtype='|S10')
>>> y2
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]], dtype=float32)
```
See more details: http://docs.scipy.org/doc/numpy/user/basics.rec.html
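As a rough sketch of how the structured layout could be used in practice, the following fills both fields and then works on the numeric part alone. The 'U10' (unicode) name field is an assumption made here so that names read back as str under Python 3; the data values are made up:

```python
import numpy as np

# A 10-character name and three float32 values per record.
# 'U10' is an assumption for this sketch, not taken from the answer.
dt = np.dtype([('name', 'U10'), ('value', 'f4', (3,))])
x = np.zeros(2, dtype=dt)

# Fill whole fields at once instead of element by element.
x['name'] = ['item1', 'item2']
x['value'] = [[1, 2, 3], [4, 5, 6]]

names = x['name']    # 1D array of strings
values = x['value']  # regular 2D float32 array, shape (2, 3)

print(values.mean(axis=0))
```

Because `values` is an ordinary float array, the usual NumPy arithmetic applies to it without any object-dtype conversion.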

清酒与你

2020-12-09 10:38
This works well on your array arr for converting an object array to an array of floats, and number processing is much easier afterwards. Thanks for that last post! I just modified it to handle a DataFrame of any size, assuming every cell holds a value that can be converted to float:
```
float_arr = np.vstack(arr[:, :]).astype(float)
```
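For an object array whose cells are all plain numbers, a minimal sketch (with made-up sample data) suggests that a direct astype call gives the same result as the vstack form:

```python
import numpy as np

# Hypothetical all-numeric object array, such as one might get from a
# DataFrame whose columns have object dtype.
arr = np.array([[1, 2.5, 3], [4, 5, 6.5]], dtype=object)

via_vstack = np.vstack(arr[:, :]).astype(float)
direct = arr.astype(float)

print(np.array_equal(via_vstack, direct))  # True
```

If any cell holds something that float() cannot convert, such as a label like 'one', both forms raise an error.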

上瘾入骨i

2020-12-09 10:48

It is much faster to convert your object array directly to a NumPy float array: arr = np.array(arr, dtype=[('O', float)]).astype(float). From there, no looping is needed; index it just as you normally would a NumPy array. With mixed datatypes you would have to do it in chunks, though: arr[:, 1], arr[:, 2], etc. I had the same issue with a NumPy tuple object returned from a C++ DLL function; conversion of 17M elements takes <2 s.
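The column-by-column idea can be sketched as follows. Note that this sketch uses a plain astype call on each slice rather than the structured dtype shown above, and the sample array is hypothetical:

```python
import numpy as np

# Hypothetical mixed array: column 0 holds labels, columns 1 and 2 hold numbers.
arr = np.array([['a', 1, 2.5], ['b', 3, 4.5]], dtype=object)

# Convert each numeric column on its own; no Python-level loop over the rows.
col1 = arr[:, 1].astype(float)
col2 = arr[:, 2].astype(float)

# Or convert all numeric columns in one slice.
numeric = arr[:, 1:].astype(float)
print(numeric)
```

This only applies when each cell in the selected columns is a single number; cells that hold sequences need one of the stacking approaches from the other answers.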

一个人的身影

2020-12-09 10:49

This problem usually happens when you have a dataset with mixed types, typically with dates in the first column or so.

What I usually do is store the date column in a separate variable and take the rest, the "X matrix of features", into X. So I end up with dates and X, for instance.

Then I apply the conversion to the X matrix as:

X = np.array(list(X[:, :]), dtype=float)
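A minimal sketch of that split, assuming the dates sit in column 0 and every remaining column is numeric (the array name data and its contents are made up):

```python
import numpy as np

# Hypothetical dataset: a date string in column 0, numeric features after it.
data = np.array([['2020-12-01', 1.0, 2],
                 ['2020-12-02', 3.5, 4]], dtype=object)

dates = data[:, 0]                            # keep the dates separately
X = np.array(list(data[:, 1:]), dtype=float)  # numeric feature matrix

print(X.dtype, X.shape)  # float64 (2, 2)
```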

Hope this helps!


广开言路

2020-12-09 10:51

Nasty little problem... I have been fooling around with this toy example:

>>> arr = np.array([['one', [1, 2, 3]], ['two', [4, 5, 6]]], dtype=object)
>>> arr
array([['one', [1, 2, 3]],
       ['two', [4, 5, 6]]], dtype=object)

My first guess was:

>>> np.array(arr[:, 1])
array([[1, 2, 3], [4, 5, 6]], dtype=object)

But that keeps the object dtype, so perhaps then:

>>> np.array(arr[:, 1], dtype=float)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: setting an array element with a sequence.

You can normally work around this by doing the following:

>>> np.array(arr[:, 1], dtype=[('', float)]*3).view(float).reshape(-1, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: expected a readable buffer object

Not here though, which was kind of puzzling. Apparently it is the fact that the objects in your array are lists that throws this off, as replacing the lists with tuples works:

>>> np.array([tuple(j) for j in arr[:, 1]],
...          dtype=[('', float)]*3).view(float).reshape(-1, 3)
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

Since there doesn't seem to be any entirely satisfactory solution, the easiest is probably to go with:

>>> np.array(list(arr[:, 1]), dtype=float)
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

That will not be very efficient, though, so it is probably better to go with something like:

>>> np.fromiter((tuple(j) for j in arr[:, 1]), dtype=[('', float)]*3,
...             count=len(arr)).view(float).reshape(-1, 3)
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
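
As a quick sanity check, the list-based and fromiter-based conversions can be compared side by side. This is only a sketch on the toy array, written with the builtin float in place of the np.float alias, and it says nothing about relative speed:

```python
import numpy as np

arr = np.array([['one', [1, 2, 3]], ['two', [4, 5, 6]]], dtype=object)
col = arr[:, 1]  # 1D object array whose elements are lists

# Simple route: let NumPy rebuild the array from a list of lists.
via_list = np.array(list(col), dtype=float)

# fromiter route: read each row as a 3-field record, then view it as floats.
via_fromiter = np.fromiter((tuple(j) for j in col),
                           dtype=[('', float)] * 3,
                           count=len(col)).view(float).reshape(-1, 3)

print(np.array_equal(via_list, via_fromiter))  # True
```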
