quantile normalization on pandas dataframe

余生分开走 2020-12-14 21:44

Simply put, how can I apply quantile normalization to a large pandas DataFrame (roughly 2,000,000 rows) in Python?

PS. I know that there is a package named rpy2 w

7 answers

不知归路 (original poster)

2020-12-14 21:58
The code below gives the same result as preprocessCore::normalize.quantiles.use.target, and I find it simpler and clearer than the solutions above. Performance should also stay good for very large arrays, since the work is just one sort and two argsorts.
```
import numpy as np

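def quantile_normalize_using_target(x, target):
    """
    Map each value of `x` to the value of equal rank in `target`.

    Both `x` and `target` are 1-D numpy arrays of equal length.
    """
    # Sorted reference distribution: position k holds its k-th smallest value.
    target_sorted = np.sort(target)

    # Applying argsort twice gives the 0-based rank of every element of `x`.
    return target_sorted[x.argsort().argsort()]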
```
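To see why the double `argsort` works, here is a small worked example. The arrays are made up for illustration; the first `argsort` gives the sorting order of `x`, and the second turns that order into the rank of each element.

```python
import numpy as np

# Illustrative inputs (not from the original post).
x = np.array([5.0, 2.0, 9.0, 1.0])
target = np.array([10.0, 40.0, 20.0, 30.0])

order = x.argsort()      # indices that would sort x
ranks = order.argsort()  # 0-based rank of each element of x

# Each element of x is replaced by the target value of the same rank.
normalized = np.sort(target)[ranks]

print(order)       # [3 1 0 2]
print(ranks)       # [2 1 3 0]
print(normalized)  # [30. 20. 40. 10.]
```

Note that tied values in `x` receive distinct ranks here rather than a shared average rank, so the output for ties depends on the sort order.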
Once you have a pandas.DataFrame, it is easy to apply:
```
quantile_normalize_using_target(df[0].to_numpy(),
                                df[1].to_numpy())
```
(In the example above, the first column is normalized to the second, which serves as the reference distribution.)
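If there is no external reference column, a common variant is to normalize every column to the mean of the sorted columns. The sketch below is one possible way to do that with the same rank trick; `quantile_normalize_frame` and the sample data are hypothetical names and values chosen for illustration, and ties are not averaged, so results on tied data may differ from preprocessCore.

```python
import numpy as np
import pandas as pd

def quantile_normalize_frame(df):
    """Hypothetical helper: normalize all columns to the mean of the sorted columns."""
    values = df.to_numpy(dtype=float)
    # Row k of the column-wise sort holds each column's k-th smallest value;
    # averaging across columns yields an ascending target distribution.
    target = np.sort(values, axis=0).mean(axis=1)
    # Column-wise double argsort: 0-based rank of every entry within its column.
    ranks = values.argsort(axis=0).argsort(axis=0)
    return pd.DataFrame(target[ranks], index=df.index, columns=df.columns)

# Illustrative data (not from the original post).
df = pd.DataFrame({"a": [5.0, 2.0, 3.0, 4.0],
                   "b": [4.0, 1.0, 4.5, 2.0],
                   "c": [3.0, 4.0, 6.0, 8.0]})
result = quantile_normalize_frame(df)
print(result)
```

After normalization every column contains the same set of values, only arranged according to that column's original ranking. Everything is vectorized over NumPy arrays, so it should scale to a couple of million rows as long as the data fits in memory.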