R - Assign a value/factor in a data.frame to column conditioned on value(s) of other columns

Submitted by 折月煮酒 on 2020-01-23 16:20:27

Question


set.seed(8)
df <- data.frame(n = rnorm(5,1), m = rnorm(5,0), l = factor(LETTERS[1:5]))

How can I make a new column in df conditioned on the values, or a combination of values, of n, m and l? For instance, make a vector level and assign it low, medium or high based on the values of both n and m (pseudo-code):

df$level <- ifelse(df$n < 1 & df$m < 1, "low", ifelse(df$n > 1 & df$m > 1, "high", "medium")

This should give:

df$level

#low medium low low medium 

Or if I would like to assign a value to level based on the l column and a value in n (again, pseudo-code):

df$level <- ifelse(df$n < 1 & df$l == c("A", "B"), "low A/B", "high")

In this case one should get:

df$level

#"low A/B" "high" "high" "high" "high"

Answer 1:


You could also do:

 c("high", "medium", "low")[rowSums(df[,-3] <1)+1]
#[1] "low"    "medium" "low"    "low"    "medium"

c("high", "low A/B")[(df$n <1 &grepl("A|B", df$l)) +1]
#[1] "low A/B" "high"    "high"    "high"    "high"   

Explanation

  • df[,-3] drops the third column (l), leaving the numeric columns n and m.
  • df[,-3] <1 gives a logical matrix: TRUE where an element is below 1, FALSE otherwise.
  • rowSums on that matrix counts the TRUE values in each row, so it returns 0, 1 or 2: 0 when neither value is below 1, 1 when exactly one is, and 2 when both are.

    rowSums(df[,-3] <1) #in this example, no row has a sum of 0
    #[1] 2 1 2 2 1
    
  • Adding 1 to the above turns the counts into valid index positions (1, 2 or 3):

    rowSums(df[,-3] <1) +1
    #[1] 3 2 3 3 2
    
  • Using the above as numeric index, we can do:

      c("high", "medium", "low")[rowSums(df[,-3] <1)+1]
      #[1] "low"    "medium" "low"    "low"    "medium"
    
  • low is returned wherever the index is 3 and medium wherever it is 2; a row with index 1 (there is none in this example) would get high.
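The steps above can be put together as a short runnable sketch. To keep it reproducible without depending on the random number generator, the data frame is rebuilt from the values printed elsewhere in this thread; the intermediate names below, n_below and labels are only illustrative.

```r
# Values copied from the data frame printed in this thread (the set.seed(8) sample)
df <- data.frame(
  n = c(0.9154139, 1.8404001, 0.5365172, 0.4491650, 1.7360404),
  m = c(-0.1078814, -0.1702891, -1.0883317, -3.0110517, -0.5931743),
  l = factor(LETTERS[1:5])
)

below   <- df[, c("n", "m")] < 1        # logical matrix: TRUE where a value is below 1
n_below <- rowSums(below)               # per row: how many of n and m are below 1 (0, 1 or 2)
labels  <- c("high", "medium", "low")   # position 1 = none below, 2 = one below, 3 = both below

df$level <- labels[n_below + 1]         # shift 0/1/2 to 1/2/3 and use it as an index
print(df$level)
# [1] "low"    "medium" "low"    "low"    "medium"
```

One difference from the nested ifelse in the question: a value exactly equal to 1 is not counted as below 1 here, so a row with n = 1 and m = 1 would be labelled high rather than medium. With continuous data this rarely matters, but it is worth keeping in mind.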




Answer 2:


Here's a solution:

df$level1 <- c("low", "medium", "high")[rowMeans(sign(df[c("n", "m")] - 1)) + 2]

df$level2 <- c("high", "low A/B")[(df$n < 1 & df$l %in% c("A", "B")) + 1]

#           n          m l level1  level2
# 1 0.9154139 -0.1078814 A    low low A/B
# 2 1.8404001 -0.1702891 B medium    high
# 3 0.5365172 -1.0883317 C    low    high
# 4 0.4491650 -3.0110517 D    low    high
# 5 1.7360404 -0.5931743 E medium    high
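Since the two one-liners above come without an explanation, here is a step-by-step sketch of what they appear to do. The data frame is rebuilt from the printed values, and the intermediate names s, idx and is_low_ab are only illustrative.

```r
# Values copied from the table above
df <- data.frame(
  n = c(0.9154139, 1.8404001, 0.5365172, 0.4491650, 1.7360404),
  m = c(-0.1078814, -0.1702891, -1.0883317, -3.0110517, -0.5931743),
  l = factor(LETTERS[1:5])
)

# level1: sign() maps each value to -1 (below 1), 0 (exactly 1) or +1 (above 1)
s   <- sign(df[c("n", "m")] - 1)
idx <- rowMeans(s) + 2                  # both below -> 1, mixed -> 2, both above -> 3
df$level1 <- c("low", "medium", "high")[idx]

# level2: %in% tests membership, so each row is checked against both "A" and "B"
is_low_ab <- df$n < 1 & df$l %in% c("A", "B")
df$level2 <- c("high", "low A/B")[is_low_ab + 1]   # FALSE + 1 = 1, TRUE + 1 = 2

print(df)
```

The switch from == to %in% matters: == against a length-2 vector compares element by element with recycling instead of testing membership. Also note that if a value is exactly 1, the row mean can be fractional, and R truncates fractional indices toward zero, so such rows would need separate handling.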



Answer 3:


I'm probably missing the question, but when I add a missing closing parenthesis, it seems to work just fine:

> df$level <- ifelse(df$n < 1 & df$m < 1, "low", ifelse(df$n > 1 & df$m > 1, "high", "medium"))
> df
          n          m l  level
1 0.9154139 -0.1078814 A    low
2 1.8404001 -0.1702891 B medium
3 0.5365172 -1.0883317 C    low
4 0.4491650 -3.0110517 D    low
5 1.7360404 -0.5931743 E medium
> df$level
[1] "low"    "medium" "low"    "low"    "medium"
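The question title also mentions factors, while ifelse returns a character vector. If an ordered categorical column is wanted, one option is to wrap the result in factor(). This is a minimal sketch; the level order low < medium < high is an assumption about what is intended.

```r
# Values copied from the data frame printed above
df <- data.frame(
  n = c(0.9154139, 1.8404001, 0.5365172, 0.4491650, 1.7360404),
  m = c(-0.1078814, -0.1702891, -1.0883317, -3.0110517, -0.5931743),
  l = factor(LETTERS[1:5])
)

# Nested ifelse from the question, with the closing parenthesis added
df$level <- ifelse(df$n < 1 & df$m < 1, "low",
                   ifelse(df$n > 1 & df$m > 1, "high", "medium"))

# Convert to an ordered factor; "high" stays a level even though no row has it
df$level <- factor(df$level, levels = c("low", "medium", "high"), ordered = TRUE)

print(levels(df$level))
print(table(df$level))
```

Listing the levels explicitly keeps high available as a category and lets comparisons such as df$level >= "medium" work.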



Answer 4:


More of an extended comment than an answer, and perhaps not exactly what you're looking for.

Usually, when I need to capture groups of continuous variables and convert them to a single categorical variable, I use clustering and title the clusters according to the values presented. Here's an example using kmeans:

set.seed(8)
df <- data.frame(n = rnorm(5000,1), m = rnorm(5000,0), l = factor(LETTERS[1:5]))
# Fit once and reuse the result, so the cluster numbers printed below match df$Category
fit <- kmeans(df[1:2],7)
df$Category <- fit$cluster

fit
K-means clustering with 7 clusters of sizes 593, 606, 649, 626, 641, 1219, 666

Cluster means:
           n           m
1 -0.2097451  0.84837728 # Low-High
2  1.0977826  1.44383531 # Mid-Upper
3  2.1682482 -0.70983193 # High-Low
4 -0.3389432 -0.54514302 # Low-Low
5  2.3332772  0.67415808 # High-Mid
6  0.9816709 -0.01549909 # Upper-Mid
7  0.8859904 -1.46126667 # Mid-Low

# The cluster numbers are the levels; the descriptive names go in labels
df$Category <- factor(df$Category, levels = 1:7, labels = c("Low-High","Mid-Upper","High-Low","Low-Low",...))

You would have to look at the mean results of the clusters on your own computer (with seed) to be able to label them appropriately. This will also provide you with groupings based on your data rather than an arbitrary threshold that you believe is correct for your data.
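As a rough sketch of that workflow, the fit can be stored once and the labels derived from its centres instead of typed by hand. The helper rank_label and its rule (splitting the ranked centres into thirds) are hypothetical and only stand in for the manual inspection described above; iter.max and nstart are set here just to make the fit more stable.

```r
set.seed(8)
df <- data.frame(n = rnorm(5000, 1), m = rnorm(5000, 0))

# Fit once and keep the result so cluster ids and centres stay consistent
fit <- kmeans(df[c("n", "m")], centers = 7, iter.max = 50, nstart = 5)
centers <- fit$centers                  # 7 x 2 matrix of cluster means

# Hypothetical labelling rule: rank the centres of one column and split the ranks into thirds
rank_label <- function(x) {
  c("Low", "Mid", "High")[cut(rank(x), breaks = 3, labels = FALSE)]
}
cluster_labels <- paste(rank_label(centers[, "n"]),
                        rank_label(centers[, "m"]), sep = "-")

# Map each row's cluster id to its label, then store the result as a factor
df$Category <- factor(cluster_labels[fit$cluster])

print(cbind(as.data.frame(centers), label = cluster_labels))
print(table(df$Category))
```

Two clusters can end up with the same label under this rule, in which case they are merged into one category; looking at the printed centres and naming them manually, as the answer suggests, avoids that.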



Source: https://stackoverflow.com/questions/25287285/r-assign-a-value-factor-in-a-data-frame-to-column-conditioned-on-values-of-o
