I have two different arrays, one with strings and another with ints. I want to concatenate them into one array where each column keeps its original datatype. My current solution (see below) converts the entire array to dtype = string, which seems very memory-inefficient.
combined_array = np.concatenate((A, B), axis = 1)
Is it possible to have multiple dtypes in combined_array when A.dtype = string and B.dtype = int?
One approach might be to use a record array. The "columns" won't be like the columns of standard numpy arrays, but for most use cases, this is sufficient:
>>> a = numpy.array(['a', 'b', 'c', 'd', 'e'])
>>> b = numpy.arange(5)
>>> records = numpy.rec.fromarrays((a, b), names=('keys', 'data'))
>>> records
rec.array([('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4)], dtype=[('keys', '|S1'), ('data', '<i8')])
>>> records['keys']
rec.array(['a', 'b', 'c', 'd', 'e'], dtype='|S1')
>>> records['data']
array([0, 1, 2, 3, 4])
Note that you can also do something similar with a standard array by specifying the datatype of the array. This is known as a "structured array":
>>> arr = numpy.array([('a', 0), ('b', 1)], dtype=([('keys', '|S1'), ('data', 'i8')]))
>>> arr
array([('a', 0), ('b', 1)], dtype=[('keys', '|S1'), ('data', '<i8')])
The difference is that record arrays also allow attribute access to individual data fields. Standard structured arrays do not.
>>> records.keys
chararray(['a', 'b', 'c', 'd', 'e'], dtype='|S1')
>>> arr.keys
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'keys'
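As a rough sketch of how this applies to the question's two arrays: the snippet below assumes A and B are one-dimensional and of equal length, and the field names 'label' and 'value' are made up for illustration. It fills a structured array from the existing columns, then compares its size with an all-string version.

```python
import numpy as np

# Hypothetical stand-ins for the question's A (strings) and B (ints).
A = np.array(['a', 'b', 'c', 'd', 'e'])
B = np.arange(5)

# One field per "column", each reusing the dtype of its source array.
# The field names 'label' and 'value' are illustrative only.
combined = np.empty(len(A), dtype=[('label', A.dtype), ('value', B.dtype)])
combined['label'] = A
combined['value'] = B

# For comparison: the all-string layout the question wants to avoid.
as_strings = np.column_stack((A, B.astype(str)))

print(combined.dtype)      # per-field dtypes are preserved
print(combined.nbytes)     # bytes used by the structured array
print(as_strings.nbytes)   # bytes used by the all-string array

# Viewing the same memory as a record array adds attribute access.
records = combined.view(np.recarray)
print(records.label)
```

One caveat with attribute access: a field whose name matches an existing ndarray attribute (for example data) is shadowed by that attribute, so such fields are only reachable by index, as in records['data'].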
A simple solution: convert your data to object 'O' type
import numpy as np
z = np.zeros((2,2), dtype='U2')
o = np.ones((2,1), dtype='O')
np.hstack([o, z])
creates the array:
array([[1, '', ''], [1, '', '']], dtype=object)
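Applied to the question's case, this might look like the sketch below. The small A and B arrays are hypothetical stand-ins, assumed to be two-dimensional with the same number of rows. Keep in mind that an object array stores references to ordinary Python objects, so it gives up the compact typed storage of a regular NumPy array.

```python
import numpy as np

# Hypothetical 2-D stand-ins for the question's A (strings) and B (ints).
A = np.array([['a'], ['b']])
B = np.array([[1], [2]])

# Cast both to object dtype before stacking so no value is converted to a string.
mixed = np.hstack([A.astype(object), B.astype(object)])

print(mixed.dtype)          # object
print(type(mixed[0, 0]))    # the string column holds Python str objects
print(type(mixed[0, 1]))    # the int column holds Python int objects
```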