How to use dplyr `rowwise()` column numbers instead of column names

问题

library(tidyverse)
df <- tibble(col1 = c(5, 2), col2 = c(6, 4), col3 = c(9, 9))
df %>% rowwise() %>% mutate(col4 = sd(c(col1, col3)))
# # A tibble: 2 x 4
#    col1  col2  col3  col4
#   <dbl> <dbl> <dbl> <dbl>
# 1     5     6     9  2.83
# 2     2     4     9  4.95

After asking a series of questions I can finally calculate standard deviation across rows. See my code above.

But I can't use column names in my production code, because the database I pull from likes to change the column names periodically. Lucky for me the relative column positions is always the same.

So I'll just use column numbers instead. And let's check to make sure I can just swap things in and out:

identical(df$col1, df[[1]])
# [1] TRUE

Yes, I can just swap df[[1]] in place of df$col1. I think I do it like this.

df %>% rowwise() %>% mutate(col4 = sd(c(.[[1]], .[[3]])))  
# # A tibble: 2 x 4
#    col1  col2  col3  col4
#   <dbl> <dbl> <dbl> <dbl>
# 1     5     6     9  3.40
# 2     2     4     9  3.40

df %>% rowwise() %>% {mutate(col4 = sd(c(.[[1]], .[[3]])))} 
# Error in mutate_(.data, .dots = compat_as_lazy_dots(...)) : 
#   argument ".data" is missing, with no default

Nope, it looks like these don't work because the results are different from my original. And I can't use apply, if you really need to know why I made a separate question.

df %>% mutate(col4 = apply(.[, c(1, 3)], 1, sd))

How do I apply dplyr rowwise() with column numbers instead of names?

回答1:

The issue in using .[[1]] or .[[3]] after doing the rowwise (grouping by row - have only single row per group) is that it breaks the grouping structure and extracts the whole column. Inorder to avoid that, we can create a row_number() column before doing the rowwise and then subset the columns based on that index

library(dplyr)
df %>%
    mutate(rn = row_number()) %>% # create a sequence of row index
    rowwise %>% 
    mutate(col4 = sd(c(.[[1]][rn[1]], .[[3]][rn[1]]))) %>% #extract with index
    select(-rn)
#Source: local data frame [2 x 4]
#Groups: <by row>

# A tibble: 2 x 4
#   col1  col2  col3  col4
#  <dbl> <dbl> <dbl> <dbl>
#1     5     6     9  2.83
#2     2     4     9  4.95

Or another option is map from purrr where we loop over the row_number() and do the subsetting of rows of dataset

library(purrr)
df %>% 
  mutate(col4 = map_dbl(row_number(), ~ sd(c(df[[1]][.x], df[[3]][.x]))))
# A tibble: 2 x 4
#   col1  col2  col3  col4
#   <dbl> <dbl> <dbl> <dbl>
#1     5     6     9  2.83
#2     2     4     9  4.95

Or another option is pmap (if we don't want to use row_number())

df %>%
    mutate(col4 = pmap_dbl(.[c(1, 3)], ~ sd(c(...))))
# A tibble: 2 x 4
#   col1  col2  col3  col4
#  <dbl> <dbl> <dbl> <dbl>
#1     5     6     9  2.83
#2     2     4     9  4.95

Of course, the easiest way would be to use rowSds from matrixStats as described in the dupe tagged post here

NOTE: All of the above methods doesn't require any reshaping

回答2:

Since you don't necessarily know the column names, but know the positions of the columns for which you need standard deviation, etc., I'd reshape into long data and add an ID column. You can gather by position instead of column name, either by giving the numbers of the column that should become the key, or the numbers of the columns to omit from the key. That way, you don't need to specify those values by column because you'll have them all in one column already. Then you can join those summary values back to your original wide-shaped data.

library(dplyr)
library(tidyr)

df <- tibble(col1 = c(5, 2), col2 = c(6, 4), col3 = c(9, 9)) %>%
  mutate(id = row_number())

df %>%
  mutate(id = row_number()) %>%
  gather(key, value, 1, 3) %>%
  group_by(id) %>%
  summarise(sd = sd(value)) %>%
  inner_join(df, by = "id")
#> # A tibble: 2 x 5
#>      id    sd  col1  col2  col3
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1  2.83     5     6     9
#> 2     2  4.95     2     4     9

Rearrange columns by position as you need.

回答3:

An approach transposing data, converting it to matrix, computing the standard deviation, transposing again and transforming into tibble.

df %>%
  t %>%
  rbind(col4 = c(sd(.[c(1, 3),1]), sd(.[c(1, 3),2]))) %>%
  t %>%
  as_tibble()

来源：https://stackoverflow.com/questions/55855570/how-to-use-dplyr-rowwise-column-numbers-instead-of-column-names

标签

dplyr