How to use dplyr `rowwise()` column numbers instead of column names

北慕城南 提交于 2021-02-10 14:39:41

问题


library(tidyverse)
df <- tibble(col1 = c(5, 2), col2 = c(6, 4), col3 = c(9, 9))
df %>% rowwise() %>% mutate(col4 = sd(c(col1, col3)))
# # A tibble: 2 x 4
#    col1  col2  col3  col4
#   <dbl> <dbl> <dbl> <dbl>
# 1     5     6     9  2.83
# 2     2     4     9  4.95

After asking a series of questions I can finally calculate standard deviation across rows. See my code above.

But I can't use column names in my production code, because the database I pull from likes to change the column names periodically. Lucky for me the relative column positions is always the same.

So I'll just use column numbers instead. And let's check to make sure I can just swap things in and out:

identical(df$col1, df[[1]])
# [1] TRUE

Yes, I can just swap df[[1]] in place of df$col1. I think I do it like this.

df %>% rowwise() %>% mutate(col4 = sd(c(.[[1]], .[[3]])))  
# # A tibble: 2 x 4
#    col1  col2  col3  col4
#   <dbl> <dbl> <dbl> <dbl>
# 1     5     6     9  3.40
# 2     2     4     9  3.40

df %>% rowwise() %>% {mutate(col4 = sd(c(.[[1]], .[[3]])))} 
# Error in mutate_(.data, .dots = compat_as_lazy_dots(...)) : 
#   argument ".data" is missing, with no default 

Nope, it looks like these don't work because the results are different from my original. And I can't use apply, if you really need to know why I made a separate question.

df %>% mutate(col4 = apply(.[, c(1, 3)], 1, sd))

How do I apply dplyr rowwise() with column numbers instead of names?


回答1:


The issue in using .[[1]] or .[[3]] after doing the rowwise (grouping by row - have only single row per group) is that it breaks the grouping structure and extracts the whole column. Inorder to avoid that, we can create a row_number() column before doing the rowwise and then subset the columns based on that index

library(dplyr)
df %>%
    mutate(rn = row_number()) %>% # create a sequence of row index
    rowwise %>% 
    mutate(col4 = sd(c(.[[1]][rn[1]], .[[3]][rn[1]]))) %>% #extract with index
    select(-rn)
#Source: local data frame [2 x 4]
#Groups: <by row>

# A tibble: 2 x 4
#   col1  col2  col3  col4
#  <dbl> <dbl> <dbl> <dbl>
#1     5     6     9  2.83
#2     2     4     9  4.95

Or another option is map from purrr where we loop over the row_number() and do the subsetting of rows of dataset

library(purrr)
df %>% 
  mutate(col4 = map_dbl(row_number(), ~ sd(c(df[[1]][.x], df[[3]][.x]))))
# A tibble: 2 x 4
#   col1  col2  col3  col4
#   <dbl> <dbl> <dbl> <dbl>
#1     5     6     9  2.83
#2     2     4     9  4.95

Or another option is pmap (if we don't want to use row_number())

df %>%
    mutate(col4 = pmap_dbl(.[c(1, 3)], ~ sd(c(...))))
# A tibble: 2 x 4
#   col1  col2  col3  col4
#  <dbl> <dbl> <dbl> <dbl>
#1     5     6     9  2.83
#2     2     4     9  4.95

Of course, the easiest way would be to use rowSds from matrixStats as described in the dupe tagged post here

NOTE: All of the above methods doesn't require any reshaping




回答2:


Since you don't necessarily know the column names, but know the positions of the columns for which you need standard deviation, etc., I'd reshape into long data and add an ID column. You can gather by position instead of column name, either by giving the numbers of the column that should become the key, or the numbers of the columns to omit from the key. That way, you don't need to specify those values by column because you'll have them all in one column already. Then you can join those summary values back to your original wide-shaped data.

library(dplyr)
library(tidyr)

df <- tibble(col1 = c(5, 2), col2 = c(6, 4), col3 = c(9, 9)) %>%
  mutate(id = row_number())

df %>%
  mutate(id = row_number()) %>%
  gather(key, value, 1, 3) %>%
  group_by(id) %>%
  summarise(sd = sd(value)) %>%
  inner_join(df, by = "id")
#> # A tibble: 2 x 5
#>      id    sd  col1  col2  col3
#>   <int> <dbl> <dbl> <dbl> <dbl>
#> 1     1  2.83     5     6     9
#> 2     2  4.95     2     4     9

Rearrange columns by position as you need.




回答3:


An approach transposing data, converting it to matrix, computing the standard deviation, transposing again and transforming into tibble.

df %>%
  t %>%
  rbind(col4 = c(sd(.[c(1, 3),1]), sd(.[c(1, 3),2]))) %>%
  t %>%
  as_tibble()


来源:https://stackoverflow.com/questions/55855570/how-to-use-dplyr-rowwise-column-numbers-instead-of-column-names

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!