Question
x y
2 4
5 8
1 4
9 12
I have four conditions:
- maxx = 3, minx = 1, maxy = 6, miny = 3. (If minx < x < maxx and miny < y < maxy, then z = apple)
- maxx = 6, minx = 4, maxy = 9, miny = 7. (If minx < x < maxx and miny < y < maxy, then z = ball)
- maxx = 2, minx = 0, maxy = 5, miny = 3. (If minx < x < maxx and miny < y < maxy, then z = pine)
- maxx = 12, minx = 7, maxy = 15, miny = 11. (If minx < x < maxx and miny < y < maxy, then z = orange)
Expected outcome:
x y z
2 4 apple
5 8 ball
1 4 pine
9 12 orange
I have thousands of rows, and these four conditions cover all of the values.
How can I do this using the mutate function? I know how to manipulate numbers directly, but I am not sure how to store a character value based on conditional statements.
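With thousands of rows, one option is to keep the bounds as data instead of hard-coding them. The following base-R sketch uses a hypothetical lookup table `bounds` that holds the four conditions above and a hypothetical helper `label_rows()`. It applies the same strict inequalities, lets the first matching condition win, and returns NA for rows that match no condition.

```r
# Data from the question
df <- data.frame(x = c(2, 5, 1, 9), y = c(4, 8, 4, 12))

# Hypothetical lookup table holding the four conditions from the question
bounds <- data.frame(
  z    = c("apple", "ball", "pine", "orange"),
  minx = c(1, 4, 0, 7),
  maxx = c(3, 6, 2, 12),
  miny = c(3, 7, 3, 11),
  maxy = c(6, 9, 5, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each (x, y) pair with the first condition whose
# open box (minx < x < maxx, miny < y < maxy) contains it; no match gives NA
label_rows <- function(x, y, bounds) {
  z <- rep(NA_character_, length(x))
  # Walk the conditions in reverse so earlier rows of `bounds` overwrite
  # later ones, i.e. the first matching condition wins
  for (i in rev(seq_len(nrow(bounds)))) {
    hit <- x > bounds$minx[i] & x < bounds$maxx[i] &
           y > bounds$miny[i] & y < bounds$maxy[i]
    z[hit] <- bounds$z[i]
  }
  z
}

df$z <- label_rows(df$x, df$y, bounds)
df
```

The loop runs once per condition and is vectorized over rows, so it scales with the number of conditions rather than the number of rows. With dplyr loaded, the same helper can be called inside mutate(), for example mutate(df, z = label_rows(x, y, bounds)).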
Answer 1:
Alternative answer:
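library(dplyr)   # mutate()
library(mosaic)  # derivedFactor()
df <- mutate(df, fruit = derivedFactor(
  "apple"  = (x < 3  & x > 1 & y < 6  & y > 3),
  "ball"   = (x < 6  & x > 4 & y < 9  & y > 7),
  "pine"   = (x < 2  & x > 0 & y < 5  & y > 3),
  "orange" = (x < 12 & x > 7 & y < 15 & y > 11),
  .method  = "first",  # if several conditions match, use the first one
  .default = NA        # value for rows that match no condition
))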
Answer 2:
I believe the best option here is to use dplyr::case_when:
library(dplyr)

df %>% mutate(z = case_when(
  x < 3  & x > 1 & y < 6  & y > 3  ~ "apple",
  x < 6  & x > 4 & y < 9  & y > 7  ~ "ball",
  x < 2  & x > 0 & y < 5  & y > 3  ~ "pine",
  x < 12 & x > 7 & y < 15 & y > 11 ~ "orange"
  # rows that match no condition get NA
))
Which gives us:
# A tibble: 4 x 3
x y z
<dbl> <dbl> <chr>
1 2 4 apple
2 5 8 ball
3 1 4 pine
4 9 12 orange
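The repeated `x < … & x > …` pairs can be shortened with a small helper. dplyr::between() includes both endpoints (lo <= v <= hi), so this sketch defines a hypothetical strict version, `in_open()`, and evaluates the four conditions on the sample data.

```r
# Hypothetical helper: TRUE where lo < v < hi (both ends excluded),
# unlike dplyr::between(), which uses lo <= v <= hi
in_open <- function(v, lo, hi) v > lo & v < hi

x <- c(2, 5, 1, 9)
y <- c(4, 8, 4, 12)

# Each condition from the question becomes one call per variable
is_apple  <- in_open(x, 1, 3)  & in_open(y, 3, 6)
is_ball   <- in_open(x, 4, 6)  & in_open(y, 7, 9)
is_pine   <- in_open(x, 0, 2)  & in_open(y, 3, 5)
is_orange <- in_open(x, 7, 12) & in_open(y, 11, 15)

# Boundary values are excluded: 1 and 3 are not inside (1, 3)
in_open(c(1, 2, 3), 1, 3)
```

Inside the case_when() call, the first condition could then read in_open(x, 1, 3) & in_open(y, 3, 6) ~ "apple".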
Answer 3:
Using ifelse, it's:
library(dplyr)

df %>% mutate(z = ifelse(x < 3  & x > 1 & y < 6  & y > 3,  'apple',
                  ifelse(x < 6  & x > 4 & y < 9  & y > 7,  'ball',
                  ifelse(x < 2  & x > 0 & y < 5  & y > 3,  'pine',
                  ifelse(x < 12 & x > 7 & y < 15 & y > 11, 'orange', NA)))))
# x y z
# 1 2 4 apple
# 2 5 8 ball
# 3 1 4 pine
# 4 9 12 orange
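Nested ifelse() stops at the first TRUE condition, so a row that falls inside two boxes gets only the first label. The following base-R check counts how many conditions each row satisfies. It adds a hypothetical fifth row (x = 1.5, y = 4) that meets both the apple and the pine conditions.

```r
df <- data.frame(x = c(2, 5, 1, 9, 1.5), y = c(4, 8, 4, 12, 4))

# One logical column per condition from the question (strict inequalities)
hits <- cbind(
  apple  = df$x > 1 & df$x < 3  & df$y > 3  & df$y < 6,
  ball   = df$x > 4 & df$x < 6  & df$y > 7  & df$y < 9,
  pine   = df$x > 0 & df$x < 2  & df$y > 3  & df$y < 5,
  orange = df$x > 7 & df$x < 12 & df$y > 11 & df$y < 15
)

# Number of conditions each row satisfies: 0 means the row ends up NA,
# more than 1 means the nested ifelse silently keeps only the first label
n_hits <- rowSums(hits)
df[n_hits != 1, ]
```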
Notes:
- If a row matches two conditions (for example x = 1.5, y = 4, which satisfies both the apple and the pine conditions), this will not raise an error. It will silently return only the first matching label ('apple'). dplyr also has a between() helper that can reduce each condition to two calls, but it uses <= and >=, so you would need to adjust your endpoints.
- You could use switch, but all of your conditions would have to go into its first argument. That ends up looking exactly like the ifelse version, and the cases themselves would have nothing to do. switch is also not vectorized, so it would have to be applied row by row.
- If your ranges don't overlap, this is better solved with cut. It is easy to apply to one variable, and the result could then be overwritten based on a second variable.
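For a single variable with non-overlapping ranges, cut() maps each value to an interval label in one call. This sketch uses hypothetical breaks and labels rather than the question's ranges, because the question's x intervals for apple (1, 3) and pine (0, 2) overlap. By default, cut() intervals are closed on the right, unlike the strict inequalities used above.

```r
x <- c(2, 5, 1, 9)

# Hypothetical, non-overlapping x ranges: (0, 3], (3, 6], (6, 12]
# With the default right = TRUE, each interval excludes its left end and
# includes its right end; values outside all intervals become NA
z <- cut(x, breaks = c(0, 3, 6, 12), labels = c("low", "mid", "high"))
as.character(z)
```

Passing right = FALSE switches to intervals that are closed on the left instead.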
Source: https://stackoverflow.com/questions/36003699/how-can-i-create-a-new-column-based-on-conditional-statements-and-dplyr