quantile normalization on pandas dataframe

余生分开走 2020-12-14 21:44

Simply speaking, how to apply quantile normalization on a large Pandas dataframe (probably 2,000,000 rows) in Python?

PS. I know that there is a package named rpy2 w

7 answers
  •  抹茶落季
    2020-12-14 22:18

    As pointed out by @msg, none of the solutions here takes ties into account. I made a Python package called qnorm which handles ties and correctly reproduces the Wikipedia quantile normalization example:

    import pandas as pd
    import qnorm
    
    df = pd.DataFrame({'C1': {'A': 5, 'B': 2, 'C': 3, 'D': 4},
                       'C2': {'A': 4, 'B': 1, 'C': 4, 'D': 2},
                       'C3': {'A': 3, 'B': 4, 'C': 6, 'D': 8}})
    
    print(qnorm.quantile_normalize(df))
             C1        C2        C3
    A  5.666667  5.166667  2.000000
    B  2.000000  2.000000  3.000000
    C  3.000000  5.166667  4.666667
    D  4.666667  3.000000  5.666667
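
To make "handles ties" concrete, here is a minimal sketch in plain pandas and NumPy. It is not qnorm's actual implementation, and the function name `quantile_normalize_sketch` is made up for illustration. It assumes all columns are numeric and contain no missing values. The idea: average the sorted columns row-wise to get one mean per rank, then give each value the mean for its rank, with tied values sharing the average of the rank means they span.

```python
import numpy as np
import pandas as pd


def quantile_normalize_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative quantile normalization with tie averaging.

    Assumes numeric columns without NaN; not the qnorm implementation.
    """
    # Sort every column independently, then average across columns:
    # rank_means[k] is the mean of the (k+1)-th smallest value of each column.
    rank_means = np.sort(df.to_numpy(dtype=float), axis=0).mean(axis=1)

    out = {}
    for col in df.columns:
        values = df[col]
        # Ordinal ranks 1..n; ties are broken by order of appearance for now.
        ordinal = values.rank(method="first").astype(int).to_numpy()
        # Look up the rank mean that belongs to each ordinal rank.
        mapped = pd.Series(rank_means[ordinal - 1], index=df.index)
        # Tied values receive the average of the rank means they span.
        out[col] = mapped.groupby(values.to_numpy()).transform("mean")

    return pd.DataFrame(out, index=df.index)
```

Called on the `df` above, this sketch should give the same table as `qnorm.quantile_normalize(df)`. It loops over columns only, so the per-column work is vectorized, but I have not benchmarked it on a 2,000,000-row frame.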
    

    Installation can be done with either pip or conda:

    pip install qnorm
    

    or

    conda config --add channels conda-forge
    conda install qnorm
    
