Using multiple columns in dplyr window functions?

我的未来我决定 submitted on 2020-01-02 05:37:05

Question


Coming from SQL, I would expect to be able to do something like the following in dplyr. Is this possible?

# R
tbl %>% mutate(n = dense_rank(Name, Email))

-- SQL
SELECT Name, Email, DENSE_RANK() OVER (ORDER BY Name, Email) AS n FROM tbl

Also, is there an equivalent for PARTITION BY?


Answer 1:


I struggled with this problem too, and here is my solution:

If you can't find a function that supports ordering by multiple variables, I suggest concatenating the variables with paste(), in priority order from left to right, and ranking the combined string.

Below is the code sample:

tbl %>%
  mutate(n = dense_rank(paste(Name, Email))) %>%
  arrange(Name, Email) %>%
  view()
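
One caveat worth checking before relying on paste(): with the default space separator, two different Name/Email pairs can collapse into the same string and so receive the same rank. The sketch below uses made-up data to show the collision, and assumes a separator ("\u001f", the ASCII unit separator) that does not occur in the real values. Note also that the combined key is ranked as text, so numeric columns would be ordered as strings.

```r
library(dplyr)

# Made-up data: two different pairs that paste() maps to the same string "a b c"
tbl <- tibble(
  Name  = c("a b", "a"),
  Email = c("c", "b c")
)

# Default separator: both rows get the same key, hence the same rank
collide <- tbl %>% mutate(n = dense_rank(paste(Name, Email)))

# Assumed-safe separator: the keys stay distinct, so the ranks differ
safe <- tbl %>% mutate(n = dense_rank(paste(Name, Email, sep = "\u001f")))
```
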

Moreover, group_by() is the equivalent of PARTITION BY in SQL: window functions called inside mutate() on a grouped table are computed within each group.
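
As a rough illustration of group_by() standing in for PARTITION BY, the sketch below ranks Name and Email within each group. The Dept column and the sample rows are hypothetical, added only to have something to partition by.

```r
library(dplyr)

# Hypothetical sample data; Dept is an assumed partition column
tbl <- tibble(
  Dept  = c("A", "A", "A", "B", "B"),
  Name  = c("Ann", "Ann", "Bob", "Cid", "Cid"),
  Email = c("a@x.org", "a@y.org", "b@x.org", "c@x.org", "c@x.org")
)

# Roughly: DENSE_RANK() OVER (PARTITION BY Dept ORDER BY Name, Email)
ranked <- tbl %>%
  group_by(Dept) %>%
  mutate(n = dense_rank(paste(Name, Email))) %>%
  ungroup()
```

Calling ungroup() at the end keeps later steps from being computed per group by accident.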

The shortcoming of this solution is that all of the columns must be ordered in the same direction. If you need to order by multiple columns with different directions, say one ascending and one descending, I suggest you try this: Calculate rank with ties based on more than one variable
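
For mixed directions, one possible sketch (not necessarily the approach taken in the linked question) is to sort with arrange() and desc(), then number each distinct Name/Email pair in order of appearance. The sample data is made up for illustration.

```r
library(dplyr)

# Made-up sample data
tbl <- tibble(
  Name  = c("Bob", "Ann", "Ann", "Ann"),
  Email = c("b@x.org", "a@x.org", "a@y.org", "a@y.org")
)

# Roughly: DENSE_RANK() OVER (ORDER BY Name ASC, Email DESC)
ranked <- tbl %>%
  arrange(Name, desc(Email)) %>%
  # After sorting, equal pairs are adjacent, so counting the first occurrence
  # of each pair gives a dense rank with ties
  mutate(n = cumsum(!duplicated(data.frame(Name, Email))))
```

This reorders the rows as a side effect; if the original order matters, a row index can be added beforehand and used to sort back afterwards.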



Source: https://stackoverflow.com/questions/48337259/using-multiple-columns-in-dplyr-window-functions
