Prevent pandas from automatically inferring type in read_csv

梦谈多话 2020-12-29 08:36

I have a #-separated file with three columns: the first is an integer, the second looks like a float but isn't, and the third is a string. I attempt to load this directly in

2 answers

轻奢々 (original poster)

2020-12-29 08:49

I think your best bet is to read the data in as a record array first using numpy.
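
For the #-separated file described in the question, that idea might be sketched roughly as follows. This is only an illustration under assumptions: the inline sample data and column names are hypothetical stand-ins for the real file, 'U10' (unicode text of up to 10 characters) is my choice of string type, and the final read_csv call with its dtype parameter is shown for comparison and is not part of this answer. Note that np.loadtxt treats '#' as a comment marker by default, so comments=None is passed to turn that off.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: '#'-separated with a header row.
raw = (
    "int_field#floatlike_field#str_field\n"
    "1#2.310#one\n"
    "2#3.120#two\n"
    "3#1.320#three\n"
)

# Structured dtype: 32-bit integer, then two text fields of up to 10 characters.
datatypes = [
    ("int_field", "i4"),
    ("floatlike_field", "U10"),
    ("str_field", "U10"),
]

# comments=None is needed because '#' is loadtxt's default comment character.
records = np.loadtxt(
    io.StringIO(raw),
    dtype=datatypes,
    delimiter="#",
    comments=None,
    skiprows=1,  # skip the header row
)

# Build a DataFrame from the record array; the float-like column stays text.
frame = pd.DataFrame.from_records(records)

# For comparison (not from the answer): read_csv can be told to keep a column
# as text through its dtype parameter.
direct = pd.read_csv(io.StringIO(raw), sep="#", dtype={"floatlike_field": str})

print(frame["floatlike_field"].tolist())
print(direct["floatlike_field"].tolist())
```

Either way the trailing zeros in the float-like column survive, because the values are never parsed as numbers.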

# what you described:
In [15]: import numpy as np
In [16]: import pandas
In [17]: x = pandas.read_csv('weird.csv')

In [19]: x.dtypes
Out[19]: 
int_field            int64
floatlike_field    float64  # what you don't want?
str_field           object

In [20]: datatypes = [('int_field','i4'),('floatlike','S10'),('strfield','S10')]

In [21]: y_np = np.loadtxt('weird.csv', dtype=datatypes, delimiter=',', skiprows=1)

In [22]: y_np
Out[22]: 
array([(1, '2.31', 'one'), (2, '3.12', 'two'), (3, '1.32', 'three ')], 
      dtype=[('int_field', '