How to z-score normalize a pandas column with NaNs?

不思量自难忘° 2020-12-06 00:53

I have a pandas DataFrame with a column of real values that I want to z-score normalize:

>> a
array([    nan,  0.0767,  0.4383,  0.7866,  0.8091,  0.195         
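A minimal pandas-only sketch of z-scoring a single column while ignoring its NaNs. It assumes a hypothetical column named "a" built from the values visible in the (truncated) array above. It relies on pandas' `mean()` and `std()` skipping NaN by default, and uses `ddof=0` (population standard deviation, the same default as `scipy.stats.zscore`). pandas' own `std()` default is `ddof=1`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame built from the visible (truncated) values in the question;
# the column name "a" is an assumption for illustration.
df = pd.DataFrame({"a": [np.nan, 0.0767, 0.4383, 0.7866, 0.8091, 0.195]})

# mean() and std() skip NaN by default (skipna=True), so only the observed
# values set the centre and spread; the NaN row simply stays NaN.
col = df["a"]
df["a_z"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

The NaN row propagates through the arithmetic unchanged, so no masking step is needed with this variant.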


        
  •  北荒 (original poster)
    2020-12-06 01:01

    An alternative is to fill the NaNs in the DataFrame with the column means before computing the z-scores. Each filled entry then has a z-score of exactly 0, and those entries can be masked back to NaN using notna() on the original df.
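To see why the filled entries come out as exactly 0, and what the filling does to the standard deviation, here is the arithmetic for one column. It assumes the column has $m$ observed values $x_1, \dots, x_m$ and $n - m$ NaNs. It also assumes the z-score uses the population standard deviation (ddof=0, the default of scipy.stats.zscore). Primes mark statistics of the filled column.

```latex
\begin{aligned}
\mu &= \frac{1}{m}\sum_{i=1}^{m} x_i,
\qquad
\sigma = \sqrt{\frac{1}{m}\sum_{i=1}^{m} (x_i - \mu)^2} \\
\mu' &= \frac{m\mu + (n - m)\mu}{n} = \mu,
\qquad
\sigma' = \sqrt{\frac{1}{n}\Big(\sum_{i=1}^{m} (x_i - \mu)^2 + (n - m)\cdot 0\Big)} = \sigma\sqrt{\frac{m}{n}} \\
z_{\text{filled}} &= \frac{\mu - \mu'}{\sigma'} = 0,
\qquad
z_i = \frac{x_i - \mu'}{\sigma'} = \sqrt{\frac{n}{m}}\,\frac{x_i - \mu}{\sigma}
\end{aligned}
```

So the masked-out entries are exactly 0, as stated. The z-scores of the observed values, however, are scaled up by $\sqrt{n/m}$ compared with a z-score that simply skips the NaNs. The gap shrinks as the fraction of NaNs gets small.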

    The whole approach fits in one line. It produces a DataFrame with the same shape, index, and columns as df, holding the z-scores of the non-missing values and NaN wherever df has NaN (it needs import pandas as pd and import scipy.stats):

    zscore_df = pd.DataFrame(scipy.stats.zscore(df.fillna(df.mean())), index=df.index, columns=df.columns).where(df.notna())
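The same fill, z-score, and mask pipeline can be written as a self-contained sketch using pandas alone. Here the scipy.stats.zscore call is swapped for the equivalent column-wise formula with ddof=0. The two-column frame, its column names, and the values in column "b" are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with NaNs; names and values are illustrative only.
df = pd.DataFrame(
    {
        "a": [np.nan, 0.0767, 0.4383, 0.7866, 0.8091, 0.195],
        "b": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    }
)

# Step 1: replace each NaN with its column mean (mean() skips NaN).
filled = df.fillna(df.mean())

# Step 2: column-wise z-score with the population std (ddof=0). This stands in
# for scipy.stats.zscore(filled), whose defaults are axis=0 and ddof=0.
z = (filled - filled.mean()) / filled.std(ddof=0)

# Step 3: put NaN back wherever the original frame had NaN.
zscore_df = z.where(df.notna())

print(zscore_df)
```

If SciPy is available, the original one-liner gives the same result. Recent SciPy releases also document a nan_policy="omit" option for scipy.stats.zscore, which ignores NaNs when computing the statistics instead of filling them.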
    
