In pandas, when using read_csv(), how do I assign NaN to a value that is not of the intended dtype?

抹茶落季 2020-12-15 11:56

Note: Please excuse my poor English; feel free to edit the question's title or the following text to make it easier to understand.

I have t

2 answers

借酒劲吻你 (original poster)

2020-12-15 12:10
A great answer, wordsmith! Just to add a couple of minor things:
- there is a typo in the answer: data.test_column should probably be moto.test_column
- convert_objects is now deprecated in favor of type-specific functions such as pd.to_numeric, applied to one column at a time [why?]
A full working example, including dropping the rows that contain read errors (as opposed to column-count errors, which are covered by read_csv(..., error_bad_lines=False), or on_bad_lines='skip' in pandas 1.3 and later), would be:
```
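import pandas as pd
moto = pd.read_csv('reporte.csv')
# values that cannot be parsed as numbers become NaN
moto.test_column = pd.to_numeric(moto.test_column, errors='coerce')
# drop every row that contains at least one NaN
moto.dropna(axis='index', how='any', inplace=True)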
```
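To illustrate the "type-specific, one column at a time" point, here is a minimal sketch of how errors='coerce' behaves for numeric and datetime columns. The DataFrame and its column names (qty, when) are made up for the illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical raw data: every value starts out as a string
raw = pd.DataFrame({
    "qty": ["1", "2", "three"],
    "when": ["2020-12-15", "not a date", "2020-12-16"],
})

# Each column is converted separately with the function that matches its type;
# entries that cannot be parsed become NaN (numeric) or NaT (datetime)
raw["qty"] = pd.to_numeric(raw["qty"], errors="coerce")
raw["when"] = pd.to_datetime(raw["when"], errors="coerce")

print(raw)
print(raw.dtypes)
```

Because each column gets its own conversion call, it is easy to decide per column whether bad values should be coerced, raised as errors, or left alone.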
I would also like to offer an alternative:
```
from pandas import read_csv
import numpy as np

# if the data is not a valid "number", return a NaN
# note that it must be a float, as NaN is a float:  print(type(np.nan))
def valid_float(y):
  try:
    return float(y)
  except ValueError:
    return np.nan

# assuming the first row of the file contains the column names 'A','B','C'...
data = read_csv('test.csv',header=0,usecols=['A','B','D'],
   converters={'A': valid_float, 'B': valid_float, 'D': valid_float} )

# delete all rows ('index') with an invalid numerical entry
data.dropna(axis='index',how='any',inplace=True)
```
This is fairly compact and readable at the same time. For a true one-liner, it would be great to (1) re-write the validation function as lambda code, and (2) do the dropping of defective rows directly in the call to read_csv, but I could not figure out how to do either of these.
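One possible way to get close to a one-liner is to chain the coercion and the row dropping onto the result of read_csv, rather than doing them inside the call. The following is only a sketch under assumptions: the inline CSV text is invented sample data standing in for test.csv, and DataFrame.apply is used to forward errors='coerce' to pd.to_numeric for every selected column.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for test.csv
csv_text = "A,B,C,D\n1,2,x,3\noops,5,y,6\n7,8,z,bad\n10,11,w,12.5\n"

data = (
    pd.read_csv(io.StringIO(csv_text), header=0, usecols=["A", "B", "D"])
    # convert each column to numbers; unparseable entries become NaN
    .apply(pd.to_numeric, errors="coerce")
    # delete all rows ('index') with at least one invalid numerical entry
    .dropna(axis="index", how="any")
)

print(data)
```

This does not need a separate validation function, but it still performs the dropping after read_csv has returned rather than inside the call itself.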