Inconsistent pandas read_csv dtype inference on mostly-integer string column in huge TSV file


悲哀的现实 2020-12-19 10:03

I have a tab-separated file with a column that should be interpreted as a string, but many of the entries are integers. With small files, read_csv correctly interprets the c

2 answers

青春惊慌失措 (original poster)

2020-12-19 10:27
To avoid having Pandas infer your data type, provide a converters argument to read_csv:

converters : dict, optional

Dict of functions for converting values in certain columns. Keys can either be integers or column labels.

For your file this would look like:
```
import pandas as pd
df2 = pd.read_csv('test', sep='\t', converters={'a': str})
```
My reading of the docs is that you do not need to specify converters for every column. Pandas should continue to infer the datatype of unspecified columns.
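
To check that reading on something small, here is a minimal sketch. The in-memory TSV and the column names `a` and `b` are made up for illustration and are not your actual file. It passes a converter for `a` only and then inspects what pandas did with both columns. The last line uses `dtype={'a': str}` instead, which is a different read_csv option rather than the converters approach above, shown only for comparison.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: column 'a' holds
# mostly integer-looking strings, column 'b' holds plain integers.
tsv = "a\tb\n007\t1\n42\t2\nabc\t3\n"

# Convert only column 'a'; leave 'b' to pandas' normal inference.
df = pd.read_csv(io.StringIO(tsv), sep="\t", converters={"a": str})

a_values = df["a"].tolist()
all_strings = all(isinstance(v, str) for v in a_values)
b_is_integer = pd.api.types.is_integer_dtype(df["b"])
print(a_values, all_strings, b_is_integer)

# Alternative (not the converters approach): request the dtype directly.
df_alt = pd.read_csv(io.StringIO(tsv), sep="\t", dtype={"a": str})
```

If the converter is applied as expected, the leading zeros in `007` survive, which would not happen if the column had been parsed as integers.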