R generate all possible interaction variables

Submitted by 送分小仙女 on 2020-06-24 12:53:51

Question


I have a data frame with variables, say a, b, c, d:

dat <- data.frame(a=runif(1e5), b=runif(1e5), c=runif(1e5), d=runif(1e5))

and would like to generate all possible two-way interaction terms between each of the columns, that is: ab, ac, ad, bc, bd, cd. In reality my dataframe has over 100 columns, so I cannot code this manually. What is the most efficient way to do this (noting that I do not want both ab and ba)?


Answer 1:


What do you plan to do with all these interaction terms? There are several options, and which is best depends on what you are trying to do.

If you want to pass the interactions to a modeling function like lm or aov, then it is very simple: just use the .^2 syntax:

fit <- lm( y ~ .^2, data=mydf )

The above will call lm and tell it to fit all the main effects and all two-way interactions for the variables in mydf, excluding y.
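As a minimal sketch of what the .^2 formula expands to, the snippet below fits a model on simulated data and lists the coefficient names. The data frame mydf and its columns a, b, c, y are made up here for illustration and are not from the original question.

```r
# Simulated data; mydf, a, b, c and y are illustrative names only.
set.seed(1)
mydf <- data.frame(a = runif(50), b = runif(50), c = runif(50))
mydf$y <- with(mydf, 1 + a + 2 * b * c + rnorm(50, sd = 0.1))

# y ~ .^2 expands to all main effects plus all two-way interactions
# of the remaining columns (a, b, c).
fit <- lm(y ~ .^2, data = mydf)
coef_names <- names(coef(fit))
print(coef_names)
# "(Intercept)" "a" "b" "c" "a:b" "a:c" "b:c"
```

Each pair appears only once (a:b but not b:a), which matches what the question asks for.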

If for some reason you really want to calculate all the interactions then you can use model.matrix:

tmp <- model.matrix( ~.^2, data=iris)

This will include a column for the intercept and columns for the main effects, but you can drop those if you don't want them.
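One possible way to drop the intercept and main-effect columns is to keep only the columns whose names contain a colon. This is a sketch that assumes none of the original column names contain ":" themselves; it uses the small four-column dat from this thread rather than iris.

```r
set.seed(24)
dat <- data.frame(a = runif(5), b = runif(5), c = runif(5), d = runif(5))

mm <- model.matrix(~ .^2, data = dat)

# Interaction columns are named like "a:b"; the intercept and the main
# effects have no colon in their names (assumes no ":" in the inputs).
inter <- mm[, grepl(":", colnames(mm), fixed = TRUE), drop = FALSE]
print(colnames(inter))
# "a:b" "a:c" "a:d" "b:c" "b:d" "c:d"
```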

If you need something different from the modeling then you can use the combn function as @akrun mentions in the comments.




Answer 2:


Assuming that the expected output is the set of combinations of column names (from the comments, it should be a_b, a_c, etc.), we can use combn on the column names of the dataset with m set to 2.

combn(colnames(dat), 2, FUN=paste, collapse='_')
#[1] "a_b" "a_c" "a_d" "b_c" "b_d" "c_d"

If we need to multiply the combinations of columns in 'dat', we subset the dataset using each pair of column names produced by combn (dat[,x[1]] and dat[,x[2]]), multiply them (*), convert the result to a data.frame (data.frame()), and set the column name (setNames) by pasting the pair of column names together. Each result is wrapped in a list, and the list elements are then bound together with do.call(cbind, ...).

do.call(cbind, combn(colnames(dat), 2, FUN= function(x) 
                list(setNames(data.frame(dat[,x[1]]*dat[,x[2]]), 
                 paste(x, collapse="_")) )))
#         a_b        a_c        a_d        b_c        b_d        c_d
#1 0.26929788 0.17697473 0.26453066 0.55676619 0.83221898 0.54691008
#2 0.06291005 0.08337501 0.04455453 0.10370775 0.05542008 0.07344851
#3 0.53789990 0.47301970 0.03112880 0.51305076 0.03376319 0.02969076
#4 0.41596384 0.34920860 0.25992717 0.53948322 0.40155468 0.33711187
#5 0.16878584 0.21232357 0.09196025 0.08162171 0.03535148 0.04447027
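The same products can also be computed in one vectorised step by indexing a numeric matrix with the column pairs from combn. This is an alternative sketch, not the answerer's code, and it assumes every column of dat is numeric; the names idx, m, prod_mat and inter_df are introduced here for illustration.

```r
set.seed(24)
dat <- data.frame(a = runif(5), b = runif(5), c = runif(5), d = runif(5))

# 2 x choose(p, 2) matrix: each column holds one pair of column indices
idx <- combn(ncol(dat), 2)

# Multiply all "left" columns by all "right" columns element-wise
m <- as.matrix(dat)
prod_mat <- m[, idx[1, ], drop = FALSE] * m[, idx[2, ], drop = FALSE]
colnames(prod_mat) <- paste(colnames(dat)[idx[1, ]],
                            colnames(dat)[idx[2, ]], sep = "_")

inter_df <- as.data.frame(prod_mat)
print(colnames(inter_df))
# "a_b" "a_c" "a_d" "b_c" "b_d" "c_d"
```

Whichever approach is used, note that 100 columns give choose(100, 2) = 4950 interaction columns, so with 1e5 rows the result holds about 4.95e8 doubles (roughly 4 GB).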

Benchmarks

set.seed(494)
dat <- data.frame(a=runif(1e6), b=runif(1e6), c=runif(1e6), d=runif(1e6))

greg <- function()model.matrix( ~.^2, data=dat)
akrun <- function() {do.call(cbind, combn(colnames(dat), 2, FUN= function(x) 
           list(setNames(data.frame(dat[,x[1]]*dat[,x[2]]), 
            paste(x, collapse="_")) )))}

system.time(greg())
#  user  system elapsed 
#  1.159   0.024   1.182 

system.time(akrun())
#  user  system elapsed 
#  0.013   0.000   0.013 

library(microbenchmark)
microbenchmark(greg(), akrun(), times=20L, unit='relative')
# Unit: relative
#   expr      min       lq     mean   median       uq      max neval cld
# greg() 39.63122 38.53662 10.23198 18.81274 6.568741 4.642702    20   b
# akrun()  1.00000  1.00000  1.00000  1.00000 1.000000 1.000000    20  a 

NOTE: The benchmark results vary with the number of columns and the number of rows. Here, I am using the number of columns shown in the OP's post.

data

set.seed(24)
dat <- data.frame(a=runif(5), b=runif(5), c=runif(5), d=runif(5))



Answer 3:


Since model.matrix complains about factors with just one level, you might alternatively want to use stats::terms to get the term labels:

labels(terms(~.^2, data = iris[, 1:3]))
# [1] "Sepal.Length"              "Sepal.Width"               "Petal.Length"             
# [4] "Sepal.Length:Sepal.Width"  "Sepal.Length:Petal.Length" "Sepal.Width:Petal.Length"
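To illustrate the one-level-factor case, here is a small sketch on a made-up data frame (df, x, z and g are hypothetical names). model.matrix stops with a contrasts error, while terms only needs the column names to expand the formula.

```r
# Toy data with a factor that has a single level (illustrative only)
df <- data.frame(x = c(1.5, 2, 3.1),
                 z = c(0.2, 0.4, 0.9),
                 g = factor(c("only", "only", "only")))

# model.matrix cannot set contrasts for a one-level factor, so it errors
mm_result <- tryCatch(model.matrix(~ .^2, data = df),
                      error = function(e) e)
print(inherits(mm_result, "error"))
# TRUE

# terms() just expands the formula, so the labels are still available
term_labels <- labels(terms(~ .^2, data = df))
print(term_labels)
# "x" "z" "g" "x:z" "x:g" "z:g"
```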


Source: https://stackoverflow.com/questions/31905221/r-generate-all-possible-interaction-variables
