Normalize data in R data.frame column

流过昼夜 submitted on 2019-12-23 03:48:11

Question


Suppose I have the following data:

a <- data.frame(var1=letters,var2=runif(26))

I want to scale every value in var2 so that the var2 column sums to 1 (basically, turn the var2 column into a probability distribution).

I have tried the following:

a$var2 <- lapply(a$var2,function(x) (x-min(a$var2))/(max(a$var2)-min(a$var2)))

This not only gives an overall sum greater than 1, but also turns the var2 column into a list, on which I can't do operations like sum.

Is there any valid way of turning this column into a probability distribution?


Answer 1:


If x is a vector of non-negative values with no NA and a positive sum, you can normalize it with

x / sum(x)

which is a proper probability mass function.
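
Applied to the data frame from the question, this could look like the sketch below. The set.seed call is an assumption added only to make the random numbers reproducible; it is not part of the original question.

```r
# Recreate the example data; set.seed is added here only for reproducibility.
set.seed(1)
a <- data.frame(var1 = letters, var2 = runif(26))

# Divide every value by the column total so the column sums to 1.
a$var2 <- a$var2 / sum(a$var2)

# The column is still a plain numeric vector, so sum() works as usual.
print(sum(a$var2))
```

Because the division is vectorized, var2 stays a numeric column rather than becoming a list.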

The transform you used:

(x - min(x)) / (max(x) - min(x))

only rescales x onto [0, 1], so that the minimum maps to 0 and the maximum maps to 1; it does not make the values sum to 1.
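
A small worked example may make the difference concrete. The vector x below is made up purely for illustration.

```r
# A small illustrative vector (hypothetical values, not from the question).
x <- c(1, 2, 3, 4)

# Min-max rescaling: maps the smallest value to 0 and the largest to 1.
minmax <- (x - min(x)) / (max(x) - min(x))
print(minmax)       # 0, 1/3, 2/3, 1
print(sum(minmax))  # sums to 2, not 1

# Dividing by the total: the values keep their proportions and sum to 1.
prob <- x / sum(x)
print(prob)         # 0.1, 0.2, 0.3, 0.4
print(sum(prob))    # sums to 1
```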


Regarding your code

There is no need to use lapply here:

lapply(a$var2, function(x) (x-min(a$var2)) / (max(a$var2) - min(a$var2)))

Just use a vectorized operation:

a$var2 <- with(a, (var2 - min(var2)) / (max(var2) - min(var2)))

As you said, lapply gives you a list, and that is what "l" in "lapply" refers to. You can use unlist to collapse that list into a vector; or, you can use sapply, where "s" implies "simplification (when possible)".
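
To illustrate the difference between these options, here is a minimal sketch. The vector x and the helper rescale_one are hypothetical names introduced only for this example.

```r
# Hypothetical example values and helper, used only for illustration.
x <- c(0.2, 0.5, 0.9)
rescale_one <- function(v) (v - min(x)) / (max(x) - min(x))

# lapply always returns a list, one element per input value.
as_list <- lapply(x, rescale_one)
print(is.list(as_list))

# unlist collapses that list back into a numeric vector.
from_unlist <- unlist(as_list)

# sapply simplifies the result to a vector when possible.
from_sapply <- sapply(x, rescale_one)

# The vectorized expression gives the same values without any looping.
vectorized <- (x - min(x)) / (max(x) - min(x))
print(all.equal(from_unlist, vectorized))
print(all.equal(from_sapply, vectorized))
```

All three approaches produce the same numbers; the vectorized form is simply the shortest and avoids recomputing min and max for every element.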



Source: https://stackoverflow.com/questions/39323277/normalize-data-in-r-data-frame-column
