Add a column to a df that is the output of a function whose vector inputs combine values from different columns

a 夏天 submitted on 2019-12-24 06:35:50

Question


This is a very simplified version of my actual problem.

My real df has many columns, and I need to select the columns for this calculation from a character vector of column names.

library(tidyverse)


df <- data.frame(a1 = c(1:5), 
                 b1 = c(3,1,3,4,6), 
                 c1 = c(10:14), 
                 a2 = c(9:13), 
                 b2 = c(3:7), 
                 c2 = c(15:19))
df
  a1 b1 c1 a2 b2 c2
1  1  3 10  9  3 15
2  2  1 11 10  4 16
3  3  3 12 11  5 17
4  4  4 13 12  6 18
5  5  6 14 13  7 19

Let's say I want to compute the correlation (cor) for each row over selected columns using mutate. I tried:

df %>% 
  mutate(my_cor = cor(x = c(a1, b1, c1), y = c(a2, b2, c2)))

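but this doesn't work, because inside mutate each column name refers to the entire column: the c() calls concatenate whole columns, so cor returns a single value that is recycled to every row.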

The first row of the my_cor column of the output df from above should be the calculation:

cor(x = c(1,3,10), y = c(9,3,15))

And the next row should be:

cor(x = c(2,1,11), y = c(10,4,16))

and so on. The actual function I'm using is more complex, but it takes two vector inputs just like cor does, so I figured this would be a good proxy.
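As a minimal sketch, assuming dplyr 1.0.0 or later (the release that added c_across()), the row-by-row calculation described above can be done with rowwise(), which makes each row its own group so that c_across() collects only that row's values. x_cols and y_cols are hypothetical names for the two sets of column names; because they are character vectors wrapped in all_of(), the same pattern should extend to a long list of columns.

```r
library(dplyr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7,             c2 = 15:19)

# Hypothetical names for the two column sets (x inputs and y inputs)
x_cols <- c("a1", "b1", "c1")
y_cols <- c("a2", "b2", "c2")

df_cor <- df %>%
  rowwise() %>%   # each row becomes its own group
  # c_across() returns just the current row's values for the selected columns
  mutate(my_cor = cor(c_across(all_of(x_cols)), c_across(all_of(y_cols)))) %>%
  ungroup()       # drop the row-wise grouping afterwards

df_cor
```

rowwise() is easy to read, but it still calls cor() once per row, so it may be slow on very large data.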

I have a feeling I should be using purrr for this (similar to this post), but I haven't gotten it to work.

Bonus: The actual problem I'm facing involves a function that uses many different columns, so I'd like to be able to select them from a character vector like my_list_of_cols <- c("a1", "b1", "c1") (my true list is much longer).

I suspect I should be using pmap_dbl like in the post I linked to, but I can't get it to work. I tried something like:

mutate(my_col = pmap_dbl(select(., var = my_list_of_cols), somefunction))

(Note that somefunction above takes two vector inputs, but one of them is static and predefined. You can assume the vector c(a2, b2, c2) is the static, predefined one, like so:

somefunction <- function(a1, b1, c1){
    a2 <- 1
    b2 <- 4
    c2 <- 5
    my_vec <- c(a2, b2, c2)             # the static, predefined vector
    cor(x = c(a1, b1, c1), y = my_vec)  # c() was missing here
}

)
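A sketch of how pmap_dbl could call somefunction, assuming the argument names of somefunction match the selected column names: pmap passes each column of the selected data frame as a named argument, one row at a time. In the attempt above, var = my_list_of_cols most likely renames the selected columns (to var1, var2, ...), so they no longer match a1, b1, c1. The code below is an illustration, not the asker's actual function.

```r
library(dplyr)
library(purrr)

df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7,             c2 = 15:19)
my_list_of_cols <- c("a1", "b1", "c1")

# Argument names must match the selected column names,
# because pmap() passes each column as a named argument.
somefunction <- function(a1, b1, c1) {
  my_vec <- c(1, 4, 5)  # the static, predefined c(a2, b2, c2)
  cor(x = c(a1, b1, c1), y = my_vec)
}

df_out <- df %>%
  # `.` is the piped df; all_of() selects by the character vector without renaming
  mutate(my_col = pmap_dbl(select(., all_of(my_list_of_cols)), somefunction))

df_out
```

With a very long column list, writing every argument out becomes impractical; a function(...) signature that gathers the row with c(...) avoids that, which is the approach the answer below takes.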

I'm still learning how to use purrr, so any help would be greatly appreciated!


Answer 1:


Here is one option: pass both character vectors of column names into select, then let pmap_dbl split each row into the two vectors that cor needs.

library(tidyverse)
my_list_of_cols <- c("a1", "b1", "c1")
another_list_cols <- c("a2", "b2", "c2")

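df %>% 
  mutate(my_cor = pmap_dbl(
    # all_of() selects columns named in external character vectors
    select(., all_of(c(my_list_of_cols, another_list_cols))),
    # c(...) collects one row into a named vector; split it by name
    ~ c(...) %>% 
      {cor(.[my_list_of_cols], .[setdiff(names(.), my_list_of_cols)])}
    ))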
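If the function really is cor() (the question says it is only a proxy, so this may not carry over to the real function), the row-wise Pearson correlation can also be computed in base R without any per-row call, by centering each row and using rowSums(). row_cor is a hypothetical helper name, and this sketch is a swapped-in vectorized technique, not part of the answer above.

```r
df <- data.frame(a1 = 1:5, b1 = c(3, 1, 3, 4, 6), c1 = 10:14,
                 a2 = 9:13, b2 = 3:7,             c2 = 15:19)
my_list_of_cols <- c("a1", "b1", "c1")
another_list_cols <- c("a2", "b2", "c2")

# Hypothetical helper: Pearson correlation between matching rows of X and Y
row_cor <- function(X, Y) {
  # rowMeans() has length nrow(X), so it recycles down each column,
  # subtracting each row's own mean from that row
  Xc <- X - rowMeans(X)
  Yc <- Y - rowMeans(Y)
  rowSums(Xc * Yc) / sqrt(rowSums(Xc^2) * rowSums(Yc^2))
}

df$my_cor <- row_cor(as.matrix(df[my_list_of_cols]),
                     as.matrix(df[another_list_cols]))
df
```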


Source: https://stackoverflow.com/questions/57295004/add-column-to-df-thats-the-output-of-a-function-that-uses-different-column-valu
