dplyr rowwise sum and other functions like max

扶醉桌前 提交于 2019-12-19 09:08:40

问题


If I wanted to sum over some variables in a data-frame using dplyr, I could do:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> select(iris, starts_with('Petal')) %>% rowSums()
  [1] 1.6 1.6 1.5 1.7 1.6 2.1 1.7 1.7 1.6 1.6 1.7 1.8 1.5 1.2 1.4 1.9 1.7 1.7 2.0 1.8 1.9 1.9 1.2 2.2 2.1 1.8 2.0 1.7 1.6 1.8 1.8 1.9 1.6 1.6 1.7 1.4
 [37] 1.5 1.5 1.5 1.7 1.6 1.6 1.5 2.2 2.3 1.7 1.8 1.6 1.7 1.6 6.1 6.0 6.4 5.3 6.1 5.8 6.3 4.3 5.9 5.3 4.5 5.7 5.0 6.1 4.9 5.8 6.0 5.1 6.0 5.0 6.6 5.3
 [73] 6.4 5.9 5.6 5.8 6.2 6.7 6.0 4.5 4.9 4.7 5.1 6.7 6.0 6.1 6.2 5.7 5.4 5.3 5.6 6.0 5.2 4.3 5.5 5.4 5.5 5.6 4.1 5.4 8.5 7.0 8.0 7.4 8.0 8.7 6.2 8.1
[109] 7.6 8.6 7.1 7.2 7.6 7.0 7.5 7.6 7.3 8.9 9.2 6.5 8.0 6.9 8.7 6.7 7.8 7.8 6.6 6.7 7.7 7.4 8.0 8.4 7.8 6.6 7.0 8.4 8.0 7.3 6.6 7.5 8.0 7.4 7.0 8.2
[145] 8.2 7.5 6.9 7.2 7.7 6.9

That's fine, but I would have thought rowwise accomplishes the same thing, but it doesn't,

> select(iris, starts_with('Petal')) %>% rowwise() %>% sum()
[1] 743.6

What I particularly want to do is select a set of columns, and create a new variable each value of which is the maximum value of each row of the selected columns. For example, if I selected the "Petal" columns, by maximum values would be 1.4, 1.4, 1.3 and so on.

I could do it like this:

> select(iris, starts_with('Petal')) %>% apply(1, max)

and that's fine. But I'm just curious as to why the rowwise approach doesn't work. I realize I am using rowwise incorrectly, I'm just not sure why it is wrong.


回答1:


In short: you are expecting the "sum" function to be aware of dplyr data structures like a data frame grouped by row. sum is not aware of it so it just takes the sum of the whole data.frame.

Here is a brief explanation. This:

select(iris, starts_with('Petal')) %>% rowwise() %>% sum()

Can be rewritten without using the pipe operator as the following:

data <- select(iris, starts_with('Petal'))
data <- rowwise(data)
sum(data)

As you can see you were constructing something called a tibble. Then the rowwise call adds additional information on this object and specifies that it should be grouped row-wise.

However only the functions aware of this grouping like summarize and mutate can work like intended. Base R functions like sum are not aware of these objects and treat them as any standard data.frames. And the standard approach for sum() is to sum the entire data frame.

Using mutate works:

select(iris, starts_with('Petal')) %>%
  rowwise() %>%
  mutate(sum = sum(Petal.Width, Petal.Length))

Result:

Source: local data frame [150 x 3]
Groups: <by row>

# A tibble: 150 x 3
   Petal.Length Petal.Width   sum
          <dbl>       <dbl> <dbl>
 1         1.40       0.200  1.60
 2         1.40       0.200  1.60
 3         1.30       0.200  1.50
 ...



回答2:


The problem is that the entire data frame is passed as dot despite the rowwise. To handle this use do which will interpret dot as meaning just the current row. One further problem is that the dot within do will represent the row as a list so convert it appropriately.

iris %>%
  slice(1:6) %>%
  select(starts_with('Petal')) %>% 
  rowwise() %>%
  do( (.) %>% as.data.frame %>% mutate(sum = sum(.)) ) %>%
  ungroup

giving:

# A tibble: 6 x 3
  Petal.Length Petal.Width   sum
*        <dbl>       <dbl> <dbl>
1         1.40       0.200  1.60
2         1.40       0.200  1.60
3         1.30       0.200  1.50
4         1.50       0.200  1.70
5         1.40       0.200  1.60
6         1.70       0.400  2.10


来源:https://stackoverflow.com/questions/49396267/dplyr-rowwise-sum-and-other-functions-like-max

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!