Simply speaking, how to apply quantile normalization on a large Pandas dataframe (probably 2,000,000 rows) in Python?
PS. I know that there is a package named rpy2 w
As pointed out by @msg, none of the solutions here take ties into account. I made a python package called qnorm which handles ties, and correctly recreates the Wikipedia quantile normalization example:
import pandas as pd
import qnorm
df = pd.DataFrame({'C1': {'A': 5, 'B': 2, 'C': 3, 'D': 4},
'C2': {'A': 4, 'B': 1, 'C': 4, 'D': 2},
'C3': {'A': 3, 'B': 4, 'C': 6, 'D': 8}})
print(qnorm.quantile_normalize(df))
C1 C2 C3
A 5.666667 5.166667 2.000000
B 2.000000 2.000000 3.000000
C 3.000000 5.166667 4.666667
D 4.666667 3.000000 5.666667
Installation can be done with either pip or conda
pip install qnorm
or
conda config --add channels conda-forge
conda install qnorm