Quantile normalization on a pandas DataFrame

余生分开走 2020-12-14 21:44

Simply put, how can I apply quantile normalization to a large pandas DataFrame (probably 2,000,000 rows) in Python?

PS. I know that there is a package named rpy2 w

7 answers
  •  不知归路
    2020-12-14 21:58

    The code below gives the same result as preprocessCore::normalize.quantiles.use.target, and I find it simpler and clearer than the solutions above. Performance should also stay good for very large arrays.

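    import numpy as np

    def quantile_normalize_using_target(x, target):
        """
        Map the values of `x` onto the distribution of `target`.

        Both `x` and `target` are 1-D numpy arrays of equal length.
        """
        # Sorted reference values: position k holds the k-th smallest target value.
        target_sorted = np.sort(target)

        # argsort().argsort() yields the 0-based rank of each element of x;
        # indexing with it replaces each x value by the target value of equal rank.
        return target_sorted[x.argsort().argsort()]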
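
    To make the double-argsort step concrete, here is a small worked example. The input arrays are made up for illustration, and the function is repeated so the snippet runs on its own. Note that argsort assigns distinct ranks to tied values, so ties in `x` would receive different target values; this sketch assumes there are no ties.

```python
import numpy as np

def quantile_normalize_using_target(x, target):
    """Replace each value of x by the target value of the same rank."""
    target_sorted = np.sort(target)
    return target_sorted[x.argsort().argsort()]

# Illustrative data only (not from the original post).
x = np.array([5.0, 2.0, 9.0])
target = np.array([10.0, 30.0, 20.0])

# 0-based rank of each element of x: 5.0 is the middle value,
# 2.0 the smallest, 9.0 the largest.
ranks = x.argsort().argsort()
print(ranks)   # [1 0 2]

# The sorted target is [10, 20, 30], so each x value is swapped
# for the target value that holds the same rank.
result = quantile_normalize_using_target(x, target)
print(result)  # [20. 10. 30.]
```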
    

    Once you have a pandas.DataFrame, this is easy to apply:

    quantile_normalize_using_target(df[0].to_numpy(),
                                    df[1].to_numpy())
    

    (The example above normalizes the first column to the second, which serves as the reference distribution.)
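
    If there is no single reference column, the same rank trick can be applied to every column at once. The sketch below is not part of the original answer: the helper name `quantile_normalize_columns` is hypothetical, and it assumes the common formulation in which the target is the row-wise mean of the column-sorted values. It also assumes all-numeric columns with no NaNs and no ties.

```python
import numpy as np
import pandas as pd

def quantile_normalize_columns(df):
    """Quantile-normalize all columns of df to a shared target.

    Hypothetical helper: the target distribution is assumed to be
    the mean of the sorted columns, taken rank by rank.
    """
    values = df.to_numpy(dtype=float)

    # Sort each column independently, then average across columns
    # so that target[k] is the mean of the k-th smallest values.
    target = np.sort(values, axis=0).mean(axis=1)

    # 0-based rank of every entry within its own column.
    ranks = values.argsort(axis=0).argsort(axis=0)

    # Look up the target value for each rank; shape matches df.
    return pd.DataFrame(target[ranks], index=df.index, columns=df.columns)

# Illustrative data only.
df = pd.DataFrame({"a": [5.0, 2.0, 9.0], "b": [10.0, 30.0, 20.0]})
normalized = quantile_normalize_columns(df)
print(normalized)
```

    After this step every column holds the same set of values (here 6.0, 12.5 and 19.5), arranged according to each column's original ordering. Everything is vectorized with NumPy, so it should scale to a couple of million rows, although memory use grows with the extra sorted and rank arrays.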
