How can I create a new column based on conditional statements and dplyr?

Submitted by 女生的网名这么多〃 on 2021-01-29 02:54:38

Question


x y
2 4
5 8
1 4
9 12

I have four conditions:

  • maxx = 3, minx = 1, maxy = 6, miny = 3. (If minx < x < maxx and miny < y < maxy, then z = apple)
  • maxx = 6, minx = 4, maxy = 9, miny = 7. (If minx < x < maxx and miny < y < maxy, then z = ball)
  • maxx = 2, minx = 0, maxy = 5, miny = 3. (If minx < x < maxx and miny < y < maxy, then z = pine)
  • maxx = 12, minx = 7, maxy = 15, miny = 11. (If minx < x < maxx and miny < y < maxy, then z = orange)

Expected outcome:

x y z
2 4 apple
5 8 ball
1 4 pine 
9 12 orange

I have thousands of rows, and these four conditions cover all of the values.

How can I do this using the mutate function? I know how to manipulate numbers directly, but I'm not sure how to store a character value based on conditional statements.


Answer 1:


Alternative answer:

library(mosaic)
df <- mutate(df, fruit = derivedFactor(
  "apple" = (x<3 & x>1 & y<6 & y>3),
  "ball" = (x<6 & x>4 & y<9 & y>7),
  "pine" = (x<2 & x>0 & y<5 & y>3),
  "orange" = (x<12 & x>7 & y<15 & y>11),
  .method = "first",  # if several conditions match, use the first one
  .default = NA
))



Answer 2:


I believe the best option here is to use dplyr::case_when

df %>% mutate(z = case_when(
    x < 3  & x > 1 & y < 6  & y > 3  ~ "apple" ,
    x < 6  & x > 4 & y < 9  & y > 7  ~ "ball"  ,
    x < 2  & x > 0 & y < 5  & y > 3  ~ "pine"  ,
    x < 12 & x > 7 & y < 15 & y > 11 ~ "orange"
  )
)

Which gives us:

# A tibble: 4 x 3
      x     y z     
  <dbl> <dbl> <chr> 
1     2     4 apple 
2     5     8 ball  
3     1     4 pine  
4     9    12 orange
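
As a self-contained sketch of the same approach, the snippet below rebuilds the question's data as `df` and adds two hypothetical rows that are not in the original data: (1.5, 4), which satisfies both the apple and pine conditions, and (20, 20), which satisfies none. It is meant to illustrate that case_when uses the first condition that matches and returns NA when nothing matches.

```r
library(dplyr)

# Data from the question, plus two hypothetical rows for illustration:
# (1.5, 4) matches both "apple" and "pine"; (20, 20) matches nothing.
df <- data.frame(
  x = c(2, 5, 1, 9, 1.5, 20),
  y = c(4, 8, 4, 12, 4, 20)
)

result <- df %>%
  mutate(z = case_when(
    x > 1 & x < 3  & y > 3  & y < 6  ~ "apple",
    x > 4 & x < 6  & y > 7  & y < 9  ~ "ball",
    x > 0 & x < 2  & y > 3  & y < 5  ~ "pine",
    x > 7 & x < 12 & y > 11 & y < 15 ~ "orange",
    TRUE ~ NA_character_  # explicit fallback; omitting it also gives NA
  ))

print(result)
```

Because the conditions are checked in order, the (1.5, 4) row is labelled apple rather than pine; reordering the conditions would change that result.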



Answer 3:


With nested ifelse calls, it's:

df %>% mutate(z = ifelse(x<3 & x>1 & y<6 & y>3, 'apple', 
                         ifelse(x<6 & x>4 & y<9 & y>7, 'ball',
                                ifelse(x<2 & x>0 & y<5 & y>3, 'pine',
                                       ifelse(x<12 & x>7 & y<15 & y>11, 'orange', NA))))
)

#   x  y      z
# 1 2  4  apple
# 2 5  8   ball
# 3 1  4   pine
# 4 9 12 orange

Notes:

  • If a row matches two conditions (e.g. x = 1.5, y = 4 matches both apple and pine), only the first matching branch is used, so the result depends on the order of the conditions.
  • dplyr also has a between helper function that can reduce your conditions to two calls each, but it uses <= and >=, so you'd need to reconfigure your endpoints.
  • You could use switch, but it is not vectorized, and all your conditions would have to go in the first argument, so it would end up looking just like the ifelse version with nothing left for the cases to do.
  • If your ranges don't overlap, this is better solved with cut, which is easy to implement for one variable and could be overwritten by a second.
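
To make the between and cut notes concrete, here is a small sketch. The vectors, breaks, and labels are hypothetical values chosen for illustration; the question's own ranges overlap in x, so they could not be expressed as a single cut call.

```r
library(dplyr)

x <- c(1, 2, 3)

# Strict inequalities, as in the question, exclude both endpoints.
strict <- x > 1 & x < 3

# between() is inclusive: equivalent to x >= 1 & x <= 3.
inclusive <- between(x, 1, 3)

print(strict)
print(inclusive)

# cut() labels a single variable by non-overlapping intervals.
# By default the intervals are right-closed: (0,1], (1,5], (5,10].
size <- cut(c(0.5, 2.5, 7),
            breaks = c(0, 1, 5, 10),
            labels = c("small", "medium", "large"))
print(size)
```

With between, the endpoints 1 and 3 count as inside the range, which is why the bounds would have to be adjusted to reproduce the strict comparisons.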


Source: https://stackoverflow.com/questions/36003699/how-can-i-create-a-new-column-based-on-conditional-statements-and-dplyr
