How to change the dtype of certain columns of a numpy recarray?

鱼传尺愫 2020-12-06 01:35

Suppose I have a recarray such as the following:

import numpy as np

# example data from @unutbu's answer
recs = [('Bill', '31', 260.0), ('Fred', 15,


        
3 Answers
  • 2020-12-06 02:14

    Here is an example using astype to perform the conversion:

    import numpy as np
    recs = [('Bill', '31', 260.0), ('Fred', 15, '145.0')]
    r = np.rec.fromrecords(recs, formats = 'S30,i2,f4', names = 'name, age, weight')
    print(r)
    # [('Bill', 31, 260.0) ('Fred', 15, 145.0)]
    

    The age is of dtype <i2:

    print(r.dtype)
    # [('name', '|S30'), ('age', '<i2'), ('weight', '<f4')]
    

    We can change that to <f4 using astype:

    r = r.astype([('name', '|S30'), ('age', '<f4'), ('weight', '<f4')])
    print(r)
    # [('Bill', 31.0, 260.0) ('Fred', 15.0, 145.0)]
    
  • 2020-12-06 02:21

    There are basically two steps: build a modified copy of the dtype, then convert the array to it. My stumbling block was finding out how to modify an existing dtype. This is how I did it:

    import numpy

    # `data` is an existing structured array or recarray.
    # Change the dtype by making a whole new array.
    dt = data.dtype
    dt = dt.descr  # this is now a modifiable list; a numpy.dtype itself can't be modified
    # change the type of the first column:
    dt[0] = (dt[0][0], 'float64')
    dt = numpy.dtype(dt)
    # data = numpy.array(data, dtype=dt)  # option 1
    data = data.astype(dt)  # option 2
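
    The two steps above can be tried end to end with a small array. This is only a sketch: the field names `id` and `value` and the sample values are made up for illustration and do not come from the question.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
data = np.array(
    [(1, 2.5), (3, 4.5)],
    dtype=[("id", "<i4"), ("value", "<f4")],
)

# Step 1: copy the dtype description into a plain list and edit it.
descr = data.dtype.descr             # list of (name, type-string) tuples
descr[0] = (descr[0][0], "float64")  # retype the first column only

# Step 2: convert; astype returns a new array and leaves `data` untouched.
converted = data.astype(np.dtype(descr))

print(converted.dtype)
print(converted["id"])
```

    Because astype returns a new array, the original `data` keeps its integer `id` column unless you rebind the name.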
    
  • 2020-12-06 02:21

    Here is a minor refinement of the existing answers, plus an extension to situations where you want to make a change based on the dtype rather than column name (e.g. change all floats to integers).

    First, you can make the code more concise and readable by using a list comprehension:

    col       = 'age'
    new_dtype = 'float64'
    
    r.astype( [ (col, new_dtype) if d[0] == col else d for d in r.dtype.descr ] )
    
    # rec.array([(b'Bill', 31.0, 260.0), (b'Fred', 15.0, 145.0)], 
    #           dtype=[('name', 'S30'), ('age', '<f8'), ('weight', '<f4')])
    

    Second, you can extend this syntax to handle cases where you want to change all floats to integers (or vice versa). For example, if you wanted to change any 32-bit or 64-bit float into a 64-bit integer, you could do something like:

    old_dtype = ['<f4', '<f8']
    new_dtype = 'int64'
    
    r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ] )
    
    # rec.array([(b'Bill', 31, 260), (b'Fred', 15, 145)], 
    #           dtype=[('name', 'S30'), ('age', '<i2'), ('weight', '<i8')])
    

    Note that astype has an optional casting argument that defaults to 'unsafe', so a float-to-integer cast silently truncates. You may want to specify casting='safe', which makes astype raise a TypeError rather than perform a cast that could lose precision:

    r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ],
              casting='safe' )
    

    Refer to the numpy documentation on astype for more on casting and other options.
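
    To make the difference concrete, here is a small sketch comparing the two casting modes. The sample records (in particular the weight 260.5) are my own, picked so that truncation is visible, and the target dtype is written out by hand.

```python
import numpy as np

# Hypothetical sample records; 260.5 is chosen so truncation is visible.
r = np.rec.fromrecords(
    [(b"Bill", 31, 260.5), (b"Fred", 15, 145.0)],
    formats="S30,i2,f4",
    names="name,age,weight",
)

# Same layout as r, except that the float32 field becomes int64.
target = [("name", "S30"), ("age", "i2"), ("weight", "i8")]

try:
    r.astype(target, casting="safe")
    safe_cast_rejected = False
except TypeError:
    # float32 -> int64 is not a safe cast, so nothing is converted.
    safe_cast_rejected = True

# The default casting='unsafe' converts anyway and drops the fractional part.
truncated = r.astype(target)

print(safe_cast_rejected)
print(truncated["weight"])
```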

    Also note that for general cases of changing floats to integers or vice versa you might prefer to check the general number type with np.issubdtype rather than checking against multiple specific dtypes.
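
    A minimal sketch of that np.issubdtype variant, assuming the same kind of record array as above (the sample data is recreated here so that the snippet runs on its own):

```python
import numpy as np

# Hypothetical sample records, recreated so the snippet is self-contained.
r = np.rec.fromrecords(
    [(b"Bill", 31, 260.0), (b"Fred", 15, 145.0)],
    formats="S30,i2,f4",
    names="name,age,weight",
)

new_dtype = "int64"

# Walk the fields in order; retype every floating-point field, whatever its
# width or byte order, and keep the others as they are.
target = [
    (name, new_dtype)
    if np.issubdtype(r.dtype[name], np.floating)
    else (name, r.dtype[name])
    for name in r.dtype.names
]

converted = r.astype(target)
print(converted.dtype)
```

    Checking against np.floating avoids having to list '<f4', '<f8' and any other float type strings by hand.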
