quantile normalization on pandas dataframe

余生分开走 2020-12-14 21:44

Simply put, how can I apply quantile normalization to a large pandas DataFrame (probably 2,000,000 rows) in Python?

PS. I know that there is a package named rpy2 w

7 Answers
  •  再見小時候
    2020-12-14 22:02

    One thing worth noticing is that both ayhan's and shawn's code use the smaller rank mean for ties, whereas the R package preprocessCore's normalize.quantiles() uses the mean of the rank means for ties.
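For comparison, here is a minimal pandas/NumPy sketch of the smaller-rank-mean rule. It is not a copy of either answer's code; the function name `quantile_normalize_min_ties` is hypothetical, and it assumes an all-numeric frame with no missing values. The work is vectorized (one sort and one rank per column), which should keep it workable at around 2,000,000 rows as long as the frame fits in memory.

```python
import numpy as np
import pandas as pd


def quantile_normalize_min_ties(df: pd.DataFrame) -> pd.DataFrame:
    """Quantile-normalize the columns of a numeric DataFrame (hypothetical helper).

    Assumes every column is numeric with no missing values. Tied values
    all receive the rank mean of the smallest rank in their tie group.
    """
    # Sort each column independently, then average across columns:
    # rank_mean[k] is the mean of the (k + 1)-th smallest value of every column.
    rank_mean = np.sort(df.to_numpy(), axis=0).mean(axis=1)
    # 0-based rank of each value within its column; ties share the lowest rank.
    ranks = df.rank(method="min").to_numpy().astype(int) - 1
    # Replace every value by the reference mean for its rank.
    return pd.DataFrame(rank_mean[ranks], index=df.index, columns=df.columns)
```

Applied to the df shown below, the two tied 4s in C2 both map to 4.666667, the smaller of the two rank means they span, instead of the 5.166667 that normalize.quantiles() gives.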

    Using the above example:

    > df
    
       C1  C2  C3
    A   5   4   3
    B   2   1   4
    C   3   4   6
    D   4   2   8
    
    > normalize.quantiles(as.matrix(df))
    
             C1        C2        C3
    A  5.666667  5.166667  2.000000
    B  2.000000  2.000000  3.000000
    C  3.000000  5.166667  4.666667
    D  4.666667  3.000000  5.666667
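To get the mean-of-rank-means behaviour from pandas, one option is sketched below (the function name `quantile_normalize_mean_ties` is hypothetical; it assumes numeric columns without NaNs). Each value first receives the rank mean of its `method="first"` rank, and then values that were equal in the original column share the average of those rank means. It reproduces the two-way tie in C2 above (5.166667); whether it matches preprocessCore for ties of three or more values has not been checked against preprocessCore's source.

```python
import numpy as np
import pandas as pd


def quantile_normalize_mean_ties(df: pd.DataFrame) -> pd.DataFrame:
    """Quantile-normalize columns, averaging rank means over tie groups (hypothetical helper).

    Assumes every column is numeric with no missing values.
    """
    # rank_mean[k]: mean across columns of the (k + 1)-th smallest value.
    rank_mean = np.sort(df.to_numpy(), axis=0).mean(axis=1)
    out = {}
    for col in df.columns:
        s = df[col]
        # Distinct 0-based ranks; ties are broken by order of appearance.
        first = s.rank(method="first").to_numpy().astype(int) - 1
        assigned = pd.Series(rank_mean[first], index=s.index)
        # Equal original values share the average of their assigned rank means.
        out[col] = assigned.groupby(s).transform("mean")
    return pd.DataFrame(out, index=df.index, columns=df.columns)


# Same data as the R session above.
df = pd.DataFrame(
    {"C1": [5, 2, 3, 4], "C2": [4, 1, 4, 2], "C3": [3, 4, 6, 8]},
    index=list("ABCD"),
)
result = quantile_normalize_mean_ties(df)
print(result.round(6))
```

The loop runs once per column, so with only a few columns it stays cheap even for millions of rows; the groupby step is what implements the tie averaging.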
    
