Simply speaking, how to apply quantile normalization on a large Pandas dataframe (probably 2,000,000 rows) in Python?
PS. I know that there is a package named rpy2 w
One thing worth noticing is that both ayhan and shawn's code use the smaller rank mean for ties, but if you use R package processcore's normalize.quantiles() , it would use the mean of rank means for ties.
Using the above example:
> df
C1 C2 C3
A 5 4 3
B 2 1 4
C 3 4 6
D 4 2 8
> normalize.quantiles(as.matrix(df))
C1 C2 C3
A 5.666667 5.166667 2.000000
B 2.000000 2.000000 3.000000
C 3.000000 5.166667 4.666667
D 4.666667 3.000000 5.666667