Question
set.seed(8)
df <- data.frame(n = rnorm(5,1), m = rnorm(5,0), l = factor(LETTERS[1:5]))
How can I make a new column in df conditioned on the values, or a combination of the values, of n, m and l? For instance, make a vector level and assign it "low", "medium" or "high" based on the values of both n and m (pseudo-code):
df$level <- ifelse(df$n < 1 & df$m < 1, "low", ifelse(df$n > 1 & df$m > 1, "high", "medium")
This should give:
df$level
#low medium low low medium
Or if I would like to assign a value to level based on the l column and a value in n (again, pseudo-code):
df$level <- ifelse(df$n < 1 & df$l == c("A", "B"), "low A/B", "high")
In this case one should get:
df$level
#"low A/B" "high" "high" "high" "high"
Answer 1:
You could also do:
c("high", "medium", "low")[rowSums(df[,-3] <1)+1]
#[1] "low" "medium" "low" "low" "medium"
c("high", "low A/B")[(df$n <1 &grepl("A|B", df$l)) +1]
#[1] "low A/B" "high" "high" "high" "high"
Explanation
df[,-3] gets the subset of numeric columns, i.e. n and m.
df[,-3] <1 gives a logical matrix that is TRUE where the element is < 1 and FALSE otherwise.
Applying rowSums to that matrix gives three possible values per row, 0, 1 or 2, depending on whether neither value, exactly one value, or both values in the row are < 1:
rowSums(df[,-3] <1) # in this example, there are no values equal to 0
#[1] 2 1 2 2 1
Adding 1 to the above gives:
rowSums(df[,-3] <1) +1
#[1] 3 2 3 3 2
Using the above as a numeric index, we can do:
c("high", "medium", "low")[rowSums(df[,-3] <1)+1]
#[1] "low" "medium" "low" "low" "medium"
"low" occupies the positions where the index is 3, "medium" those where it is 2, and if there were a 1, "high" would occupy that position.
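The sample data never produces the index 1, so the following is a minimal sketch that appends one hypothetical row (n = 1.5, m = 1.2) to the values printed elsewhere on this page, just to exercise all three labels. The data frame name df2 is an assumption made for illustration. Note that a value exactly equal to 1 counts as "not below 1" here, which is slightly different from the strict > 1 test in the question.

```r
# n and m as printed in the other answers, plus one hypothetical sixth row
# (1.5, 1.2) so that the "high" label is also selected.
df2 <- data.frame(
  n = c(0.9154139, 1.8404001, 0.5365172, 0.4491650, 1.7360404, 1.5),
  m = c(-0.1078814, -0.1702891, -1.0883317, -3.0110517, -0.5931743, 1.2)
)

below <- rowSums(df2 < 1)   # how many of n, m are below 1 in each row (0, 1 or 2)
idx   <- below + 1          # shift to positions 1, 2, 3 for indexing
level <- c("high", "medium", "low")[idx]
level
```

The last element comes out as "high" because neither value in the added row is below 1.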
Answer 2:
Here's a solution:
df$level1 <- c("low", "medium", "high")[rowMeans(sign(df[c("n", "m")] - 1)) + 2]
df$level2 <- c("high", "low A/B")[(df$n < 1 & df$l %in% c("A", "B")) + 1]
#           n          m l level1  level2
# 1 0.9154139 -0.1078814 A    low low A/B
# 2 1.8404001 -0.1702891 B medium    high
# 3 0.5365172 -1.0883317 C    low    high
# 4 0.4491650 -3.0110517 D    low    high
# 5 1.7360404 -0.5931743 E medium    high
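The level2 line uses %in% where the question's pseudo-code used ==. A small sketch of why that matters, using the n values printed above. The set is written as c("B", "A") on purpose, because with c("A", "B") the recycling happens to line up with this particular data and hides the problem. The variable names eq_match and in_match are made up for illustration.

```r
n <- c(0.9154139, 1.8404001, 0.5365172, 0.4491650, 1.7360404)
l <- factor(LETTERS[1:5])

# `==` recycles the right-hand side along l, so it compares
# A-B, B-A, C-B, D-A, E-B element by element. R also warns that the
# lengths do not divide evenly, which is suppressed here.
eq_match <- suppressWarnings(l == c("B", "A"))

# %in% asks, for each element of l, whether it occurs anywhere in the set.
in_match <- l %in% c("B", "A")

# TRUE + 1 is 2, FALSE + 1 is 1, so the logical result picks the label.
level2 <- c("high", "low A/B")[(n < 1 & in_match) + 1]
level2
```

Here eq_match is FALSE everywhere, while in_match is TRUE for the first two rows, which is the set-membership test the question intends.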
Answer 3:
I'm probably missing the question, but when I add a missing closing parenthesis, it seems to work just fine:
> df$level <- ifelse(df$n < 1 & df$m < 1, "low", ifelse(df$n > 1 & df$m > 1, "high", "medium"))
> df
          n          m l  level
1 0.9154139 -0.1078814 A    low
2 1.8404001 -0.1702891 B medium
3 0.5365172 -1.0883317 C    low
4 0.4491650 -3.0110517 D    low
5 1.7360404 -0.5931743 E medium
> df$level
[1] "low" "medium" "low" "low" "medium"
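The nested ifelse returns a character vector. If an ordered factor is wanted instead, one possible follow-up is sketched below using the values printed above. The name level_f is an assumption, and whether an ordered factor is the right type depends on what the column is used for later.

```r
n <- c(0.9154139, 1.8404001, 0.5365172, 0.4491650, 1.7360404)
m <- c(-0.1078814, -0.1702891, -1.0883317, -3.0110517, -0.5931743)

# Same nested ifelse as above, applied to plain vectors.
level <- ifelse(n < 1 & m < 1, "low",
                ifelse(n > 1 & m > 1, "high", "medium"))

# Declare all three levels explicitly so that "high" exists as a level
# even though no row in this sample falls into it.
level_f <- factor(level, levels = c("low", "medium", "high"), ordered = TRUE)

table(level_f)          # counts per level, including the empty "high"
level_f >= "medium"     # ordered factors support comparisons
```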
Answer 4:
More of an extended comment than an answer, and perhaps not exactly what you're looking for.
Usually, when I need to capture groups of continuous variables and convert them to a single categorical variable, I use clustering and title the clusters according to the values presented. Here's an example using kmeans:
set.seed(8)
df <- data.frame(n = rnorm(5000,1), m = rnorm(5000,0), l = factor(LETTERS[1:5]))
df$Category <- kmeans(df[1:2],7)$cluster
kmeans(df[1:2],7)
K-means clustering with 7 clusters of sizes 593, 606, 649, 626, 641, 1219, 666
Cluster means:
           n           m
1 -0.2097451  0.84837728  # Low-High
2  1.0977826  1.44383531  # Mid-Upper
3  2.1682482 -0.70983193  # High-Low
4 -0.3389432 -0.54514302  # Low-Low
5  2.3332772  0.67415808  # High-Mid
6  0.9816709 -0.01549909  # Upper-Mid
7  0.8859904 -1.46126667  # Mid-Low
df$Category <- factor(df$Category, levels = 1:7, labels = c("Low-High","Mid-Upper","High-Low","Low-Low",...))
You would have to look at the mean results of the clusters on your own computer (with seed) to be able to label them appropriately. This will also provide you with groupings based on your data rather than an arbitrary threshold that you believe is correct for your data.
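One caveat with the snippet above: kmeans starts from randomly chosen centers, so calling it twice can number the clusters differently, and the printed means may then not correspond to the cluster ids already stored in the column. A minimal sketch that fits once and reuses the result follows. The labels here are hypothetical placeholders built from the rounded cluster means; in practice you would read fit$centers and type descriptive names in the same row order.

```r
set.seed(8)
df <- data.frame(n = rnorm(5000, 1), m = rnorm(5000, 0))

# Fit once and keep the object, so centers and cluster ids stay in sync.
fit <- kmeans(df[c("n", "m")], centers = 7)

# Placeholder labels, one per row of fit$centers (hypothetical naming scheme).
labs <- make.unique(sprintf("n=%.2f, m=%.2f",
                            fit$centers[, "n"], fit$centers[, "m"]))

# Cluster ids 1..7 are the levels; the names are supplied as labels.
df$Category <- factor(fit$cluster, levels = 1:7, labels = labs)
table(df$Category)
```

Passing the names through labels rather than as the levels themselves matters: the cluster ids are the integers 1 to 7, so names supplied as levels would match nothing and every entry would become NA.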
Source: https://stackoverflow.com/questions/25287285/r-assign-a-value-factor-in-a-data-frame-to-column-conditioned-on-values-of-o