recarray

Normalize/Standardize a numpy recarray

て烟熏妆下的殇ゞ submitted on 2019-12-04 11:49:03
Question: I wonder what the best way of normalizing/standardizing a numpy recarray is. To be clear, I'm not talking about a mathematical matrix, but a record array that also has, e.g., textual columns (such as labels).

```python
a = np.genfromtxt("iris.csv", delimiter=",", dtype=None)
print(a.shape)
# (150,)
```

As you can see, I cannot, e.g., process a[:,:-1], as the shape is one-dimensional. The best I found is to iterate over all columns:

```python
for nam in a.dtype.names[:-1]:
    col = a[nam]
    a[nam] = (col - col.min()) / (col.max() - col.min())
```

Any more elegant way of doing this? Is there some method such as "normalize" or …
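
A minimal sketch of that column-wise min-max normalization, using a small made-up structured array in place of iris.csv (the field names here are illustrative) and skipping non-numeric fields by dtype rather than by position:

```python
import numpy as np

# Hypothetical stand-in for the iris data: numeric feature columns
# plus a textual label column.
a = np.array(
    [(5.1, 3.5, 'setosa'), (7.0, 3.2, 'versicolor'), (6.3, 3.3, 'virginica')],
    dtype=[('sepal_length', 'f8'), ('sepal_width', 'f8'), ('label', 'U10')],
)

# Min-max normalize every numeric field in place; text fields are skipped.
for name in a.dtype.names:
    col = a[name]
    if np.issubdtype(col.dtype, np.number):
        a[name] = (col - col.min()) / (col.max() - col.min())

print(a['sepal_length'])  # values scaled into [0, 1]
```

Checking each field's dtype is a little more robust than slicing `dtype.names[:-1]`, since it does not assume the label column comes last.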

Numpy Mean Structured Array

£可爱£侵袭症+ submitted on 2019-12-04 03:36:22
Question: Suppose that I have a structured array of students (strings) and test scores (ints), where each entry is the score that a specific student received on a specific test. Each student naturally has multiple entries in this array. Example:

```python
import numpy

grades = numpy.array([('Mary', 96), ('John', 94), ('Mary', 88),
                      ('Edgar', 89), ('John', 84)],
                     dtype=[('student', 'a50'), ('score', 'i')])
print(grades)
# [('Mary', 96) ('John', 94) ('Mary', 88) ('Edgar', 89) ('John', 84)]
```

How do I easily compute the mean score for each student?
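
One way to get per-student means is to mask on each unique name; a sketch (adapted to Python 3 with a unicode field, since the question's 'a50' bytes dtype dates from Python 2):

```python
import numpy as np

grades = np.array([('Mary', 96), ('John', 94), ('Mary', 88),
                   ('Edgar', 89), ('John', 84)],
                  dtype=[('student', 'U50'), ('score', 'i4')])

# Group by the unique student names and average each group's scores.
means = {name: float(grades['score'][grades['student'] == name].mean())
         for name in np.unique(grades['student'])}
print(means)  # {'Edgar': 89.0, 'John': 89.0, 'Mary': 92.0}
```

`np.unique` sorts the names, so the result dictionary comes out in alphabetical order.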

is ndarray faster than recarray access?

荒凉一梦 submitted on 2019-12-02 18:52:22
Question: I was able to copy my recarray data to an ndarray, do some calculations, and return the ndarray with updated values. Then I discovered the append_fields() capability in numpy.lib.recfunctions and thought it would be a lot smarter to simply append two fields to my original recarray to hold my calculated values. When I did this, I found the operation was much, much slower. I didn't even have to time it: the ndarray-based process takes a few seconds, compared to a minute or more with the recarray, and my test arrays are small, under 10,000 rows. Is this typical? Is ndarray access much faster than recarray? I expected …
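
The effect described above is easy to reproduce: whole-column operations on a field run as one vectorized call at C speed, while anything that touches records one at a time pays Python-level overhead per row. A sketch with made-up field names:

```python
import numpy as np
import timeit

# Small hypothetical recarray; the x/y fields are invented for this sketch.
rec = np.recarray(10_000, dtype=[('x', 'f8'), ('y', 'f8')])
rec.x = np.random.rand(10_000)
rec.y = np.random.rand(10_000)

# Whole-column access: two field lookups, then one vectorized add.
t_vec = timeit.timeit(lambda: rec['x'] + rec['y'], number=100)

# Per-row attribute access: every element goes through Python-level
# record creation and attribute lookup -- the usual culprit when
# recarray code feels dramatically slower.
t_row = timeit.timeit(lambda: [r.x + r.y for r in rec], number=1)

print(f"vectorized x100: {t_vec:.4f}s, row loop x1: {t_row:.4f}s")
```

Absolute timings vary by machine, but the per-row loop is typically orders of magnitude slower per pass.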

numpy recarray append_fields: can't append numpy array of datetimes

左心房为你撑大大i submitted on 2019-12-01 21:28:44
I have a recarray containing various fields and I want to append an array of datetime objects onto it. However, it seems like the append_fields function in numpy.lib.recfunctions won't let me add an array of objects. Here's some example code:

```python
import numpy as np
import datetime
import numpy.lib.recfunctions as recfun

dtype = np.dtype([('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
obs = np.array([(0.1, 10.0), (0.2, 11.0), (0.3, 12.0)], dtype=dtype)
dates = np.array([datetime.datetime(2001, 1, 1, 0),
                  datetime.datetime(2001, 1, 1, 0),
                  datetime.datetime(2001, 1, 1, 0)])

# This doesn't work:
recfun.append_fields(...)
```
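
One workaround that sidesteps the object-dtype issue entirely: store the timestamps as numpy's native datetime64 and build the widened array by hand. A sketch (the 'time' field name is made up for this example):

```python
import numpy as np

dtype = np.dtype([('WIND_WAVE_HGHT', '<f4'), ('WIND_WAVE_PERD', '<f4')])
obs = np.array([(0.1, 10.0), (0.2, 11.0), (0.3, 12.0)], dtype=dtype)

# Native datetime64 timestamps instead of Python datetime objects.
dates = np.array(['2001-01-01T00'] * 3, dtype='datetime64[h]')

# Allocate a new structured array with the extra field and copy columns.
new_dtype = np.dtype(dtype.descr + [('time', dates.dtype.str)])
out = np.empty(obs.shape, dtype=new_dtype)
for name in dtype.names:
    out[name] = obs[name]
out['time'] = dates

print(out.dtype.names)  # ('WIND_WAVE_HGHT', 'WIND_WAVE_PERD', 'time')
```

Building the array manually avoids whatever path inside append_fields chokes on object arrays, at the cost of a few extra lines.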

Passing a structured numpy array with strings to a cython function

时光怂恿深爱的人放手 submitted on 2019-11-30 16:53:42
I am attempting to create a function in Cython that accepts a numpy structured array or record array by defining a Cython struct type. Suppose I have the data:

```python
a = np.recarray(3, dtype=[('a', np.float32), ('b', np.int32), ('c', '|S5'), ('d', '|S3')])
a[0] = (1.1, 1, 'this\0', 'to\0')
a[1] = (2.1, 2, 'that\0', 'ta\0')
a[2] = (3.1, 3, 'dogs\0', 'ot\0')
```

(Note: the problem described below occurs with or without the null terminator.) I then have the Cython code:

```cython
import numpy as np
cimport numpy as np

cdef packed struct tstruct:
    np.float32_t a
    np.int32_t b
    char[5] c
    char[3] d

def test_struct(tstruct[
```
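
The Cython side can't be exercised here, but one sanity check worth doing from plain Python is confirming that the record's memory layout really is packed, so it can map one-to-one onto a `packed struct` with no hidden padding:

```python
import numpy as np

a = np.recarray(3, dtype=[('a', np.float32), ('b', np.int32),
                          ('c', '|S5'), ('d', '|S3')])

# For a Cython `packed struct` to overlay this buffer, the record must
# contain no padding bytes: 4 (f4) + 4 (i4) + 5 (S5) + 3 (S3) = 16.
print(a.dtype.itemsize)  # 16
```

numpy's default (non-aligned) structured dtypes are packed, which is why the matching Cython declaration needs `packed struct` rather than a plain `struct`.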

numpy recarray strings of variable length

ε祈祈猫儿з submitted on 2019-11-28 23:22:16
Is it possible to initialise a numpy recarray that will hold strings, without knowing the length of the strings beforehand? As a (contrived) example:

```python
mydf = np.empty((numrows,), dtype=[('file_name', 'STRING'), ('file_size_MB', float)])
```

The problem is that I'm constructing my recarray in advance of populating it with information, and I don't necessarily know the maximum length of file_name in advance. All my attempts result in the string field being truncated:

```python
>>> mydf = np.empty((2,), dtype=[('file_name', str), ('file_size_mb', float)])
>>> mydf['file_name'][0] = 'foobarasdf.tif'
>>> mydf['file
```
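
One common workaround is to declare the string field with dtype=object, so each entry holds an ordinary Python string of arbitrary length (at the cost of object-array storage and the loss of fixed-width string operations); a sketch:

```python
import numpy as np

# An object field stores references to Python strings, so nothing
# is truncated regardless of length.
mydf = np.empty((2,), dtype=[('file_name', object), ('file_size_mb', float)])
mydf['file_name'][0] = 'foobarasdf.tif'
print(mydf['file_name'][0])  # foobarasdf.tif
```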

Subclassing numpy ndarray problem

萝らか妹 submitted on 2019-11-27 23:06:55
I would like to subclass numpy ndarray. However, I cannot change the array. Why does `self = ...` not change the array? Thanks.

```python
import numpy as np

class Data(np.ndarray):
    def __new__(cls, inputarr):
        obj = np.asarray(inputarr).view(cls)
        return obj

    def remove_some(self, t):
        test_cols, test_vals = zip(*t)
        test_cols = self[list(test_cols)]
        test_vals = np.array(test_vals, test_cols.dtype)
        self = self[test_cols != test_vals]  # Is this part correct?
        print(len(self))  # correct result

z = np.array([(1,2,3), (4,5,6), (7,8,9)], dtype=[('a', int), ('b', int), ('c', int)])
d = Data(z)
d.remove_some([('a',4)])
```
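
The reason `self = ...` has no effect outside the method is that assignment only rebinds the local name `self`; it never mutates the caller's array. One way to restructure the method, sketched here, is to return the filtered array and let the caller rebind:

```python
import numpy as np

class Data(np.ndarray):
    def __new__(cls, inputarr):
        return np.asarray(inputarr).view(cls)

    def remove_some(self, t):
        # Rebinding `self` inside a method changes only the local
        # variable, so instead build a boolean mask and return the
        # filtered copy for the caller to keep.
        keep = np.ones(len(self), dtype=bool)
        for col, val in t:
            keep &= self[col] != val
        return self[keep]

z = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)],
             dtype=[('a', int), ('b', int), ('c', int)])
d = Data(z)
d = d.remove_some([('a', 4)])  # caller rebinds to the filtered result
print(len(d))  # 2
```

An ndarray's size is fixed once allocated, so "removing rows in place" is not possible anyway; returning a new (filtered) view or copy is the idiomatic pattern.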