Question
nd2values[:,[1]]=nd2values[:,[1]].astype(int)
nd2values
outputs
array([['021fd159b55773fba8157e2090fe0fe2', '1',
'881f83d2dee3f18c7d1751659406144e',
'012059d397c0b7e5a30a5bb89c0b075e', 'A'],
['021fd159b55773fba8157e2090fe0fe2', '1',
'cec898a1d355dbfbad8c760615fde1af',
'012059d397c0b7e5a30a5bb89c0b075e', 'A'],
['021fd159b55773fba8157e2090fe0fe2', '1',
'a99f44bbff39e352191a870e17f04537',
'012059d397c0b7e5a30a5bb89c0b075e', 'A'],
...,
['fdeb2950c4d5209d449ebd2d6afac11e', '4',
'4f4e47023263931e1445dc97f7dae941',
'3cd0b15957ceb80f5125bef8bd1bbea7', 'A'],
['fdeb2950c4d5209d449ebd2d6afac11e', '4',
'021dabc5d7a1404ec8ad34fe8ca4b5e3',
'3cd0b15957ceb80f5125bef8bd1bbea7', 'A'],
['fdeb2950c4d5209d449ebd2d6afac11e', '4',
'f79a2b5e6190ac3c534645e806f1b611',
'3cd0b15957ceb80f5125bef8bd1bbea7', 'A']], dtype='<U32')
The data type of the second column is still str. Is that because this NumPy array has a dtype restriction? How can I change the second column to int? Thanks.
np.array(nd2values,dtype=[str,int,str,str,str])
gives
TypeError: data type not understood
Answer 1:
The assignment casts your ints back to the dtype of the array (here '<U32'), because a NumPy array has a single dtype shared by all of its elements. To hold objects of different types in one array, set the dtype to object:
nd2values = nd2values.astype(object)
then
nd2values[:,[1]]=nd2values[:,[1]].astype(int)
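To make the difference concrete, here is a minimal sketch on a small stand-in array. The values and the names `arr` and `obj` are made up for illustration and are not the question's data. It shows the ints being cast back to strings in the original array, and surviving once the array has dtype object.

```python
import numpy as np

# Small stand-in for the question's array (hypothetical values).
arr = np.array([["id0", "1", "A"], ["id1", "4", "A"]])

# Assigning ints into a string array silently casts them back to strings.
arr[:, 1] = arr[:, 1].astype(int)
print(type(arr[0, 1]))   # still a NumPy string scalar

# With dtype=object every cell can hold an arbitrary Python object.
obj = arr.astype(object)
obj[:, 1] = obj[:, 1].astype(int)
print(type(obj[0, 1]))   # no longer a string
```

Note that an object array gives up most of NumPy's speed and memory advantages, so this is probably best kept for small or mixed-type tables.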
Answer 2:
A structured array alternative:
Copying and pasting the array from the question gives me a (6, 5) array with U32 dtype:
In [96]: arr.shape
Out[96]: (6, 5)
Define a compound dtype:
In [99]: dt = np.dtype([('f0','U32'),('f1',int),('f2','U32'),('f3','U32'),('f4','U1')])
Input to a structured array should be a list of tuples:
In [100]: arrS = np.array([tuple(x) for x in arr], dt)
In [101]: arrS
Out[101]:
array([('021fd159b55773fba8157e2090fe0fe2', 1, '881f83d2dee3f18c7d1751659406144e', '012059d397c0b7e5a30a5bb89c0b075e', 'A'),
('021fd159b55773fba8157e2090fe0fe2', 1, 'cec898a1d355dbfbad8c760615fde1af', '012059d397c0b7e5a30a5bb89c0b075e', 'A'),
('021fd159b55773fba8157e2090fe0fe2', 1, 'a99f44bbff39e352191a870e17f04537', '012059d397c0b7e5a30a5bb89c0b075e', 'A'),
('fdeb2950c4d5209d449ebd2d6afac11e', 4, '4f4e47023263931e1445dc97f7dae941', '3cd0b15957ceb80f5125bef8bd1bbea7', 'A'),
('fdeb2950c4d5209d449ebd2d6afac11e', 4, '021dabc5d7a1404ec8ad34fe8ca4b5e3', '3cd0b15957ceb80f5125bef8bd1bbea7', 'A'),
('fdeb2950c4d5209d449ebd2d6afac11e', 4, 'f79a2b5e6190ac3c534645e806f1b611', '3cd0b15957ceb80f5125bef8bd1bbea7', 'A')],
dtype=[('f0', '<U32'), ('f1', '<i8'), ('f2', '<U32'), ('f3', '<U32'), ('f4', '<U1')])
One field can be accessed by name:
In [102]: arrS['f1']
Out[102]: array([1, 1, 1, 4, 4, 4])
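The same steps can be sketched as a self-contained script on a smaller stand-in array. The data, the shortened field widths, and the names `rec` and `total` are assumptions for illustration only. It also hints at why the attempt in the question fails: a dtype given as a list needs (name, type) pairs, not bare types.

```python
import numpy as np

# Small stand-in for the question's string array (hypothetical values).
arr = np.array([["id0", "1", "A"], ["id1", "4", "A"]])

# A compound dtype is a list of (field name, type) pairs.
# The field names f0..f2 are placeholders.
dt = np.dtype([("f0", "U3"), ("f1", int), ("f2", "U1")])

# Each row must be passed as a tuple; the string in the second
# position is converted to an integer for the int field.
rec = np.array([tuple(row) for row in arr], dtype=dt)

# The int field behaves like an ordinary numeric array.
total = rec["f1"].sum()
print(rec["f1"], total)
```

Unlike the object-dtype approach, each field here keeps a fixed native type, so numeric operations on `rec["f1"]` stay vectorized.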
Source: https://stackoverflow.com/questions/51291797/why-is-it-that-the-numpy-array-column-data-type-does-not-get-updated