What is the best way to account for NaN (not a number) values in a pandas DataFrame?
The following code:
import numpy as np
import pandas as pd
dfd =
If you want to count only NaN values in column 'a' of a DataFrame df, use:
len(df) - df['a'].count()
Here count() returns the number of non-NaN values; subtracting it from the total number of rows (given by len(df)) leaves the number of NaN values.
To count NaN values in every column of df, use:
len(df) - df.count()
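As a minimal, self-contained sketch of both expressions: the DataFrame below is hypothetical sample data (the original one is not shown in full), and the column names 'a' and 'b' are assumptions made for illustration. It also shows df.isna().sum(), a commonly used alternative that should give the same per-column counts.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original DataFrame is not shown in full.
df = pd.DataFrame({
    "a": [3, np.nan, 1, 3, 3, np.nan],
    "b": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
})

# NaN count in one column: total rows minus non-NaN rows.
nan_in_a = len(df) - df["a"].count()

# NaN count in every column at once (a Series indexed by column name).
nan_per_column = len(df) - df.count()

# Alternative: flag missing values as booleans, then sum them per column.
nan_per_column_alt = df.isna().sum()

print(nan_in_a)
print(nan_per_column)
```

The subtraction form and the isna().sum() form differ only in style; the latter states the intent more directly, while the former relies on count() skipping missing values.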
If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (this parameter was added in pandas 0.14.1):
dfv = dfd['a'].value_counts(dropna=False)
This allows the missing values in the column to be counted too:
 3     3
NaN    2
 1     1
Name: a, dtype: int64
The rest of your code should then work as you expect (note that it's not necessary to call sum; just print("nan: %d" % dfv[np.nan]) suffices).
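For reference, here is a small runnable sketch of the value_counts approach. The contents of dfd are an assumption, chosen only so that the counts match the output shown above; the original DataFrame definition is not available.

```python
import numpy as np
import pandas as pd

# Hypothetical column, chosen to reproduce the counts shown above.
dfd = pd.DataFrame({"a": [3, np.nan, 1, 3, 3, np.nan]})

# Keep NaN as its own entry instead of silently dropping it.
dfv = dfd["a"].value_counts(dropna=False)

# NaN can be used as a lookup key in the resulting Series.
nan_count = dfv[np.nan]
print("nan: %d" % nan_count)
```

Because NaN is kept as an entry, the counts in dfv add up to len(dfd), which is a quick sanity check that no rows were dropped.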