Dummy Variable for each year

Anonymous (unverified), submitted 2019-12-03 09:14:57

Question:

If I have the following data.frame, how would I go about creating a dummy variable for each year and attaching it to DF, so that there are additional columns year_2010 and year_2011? I have a fairly large dataset with many different years, and I don't want to use ifelse 50 times. Would ddply help?

Thanks

    DF <- read.table(text="
    year     id     var     ans
    2010      1      1       1
    2010      2      0       0
    2010      1      0       1
    2010      1      0       1
    2011      2      1       1
    2011      2      0       1
    2011      1      0       0
    2011      1      0       0", header=TRUE)

Desired output:

      year id var ans year_2010 year_2011
    1 2010  1   1   1         1         0
    2 2010  2   0   0         1         0
    3 2010  1   0   1         1         0
    4 2010  1   0   1         1         0
    5 2011  2   1   1         0         1
    6 2011  2   0   1         0         1
    7 2011  1   0   0         0         1
    8 2011  1   0   0         0         1

Answer 1:

Just use table, like this:

    cbind(DF, as.data.frame.matrix(table(sequence(nrow(DF)), DF$year)))
      year id var ans 2010 2011
    1 2010  1   1   1    1    0
    2 2010  2   0   0    1    0
    3 2010  1   0   1    1    0
    4 2010  1   0   1    1    0
    5 2011  2   1   1    0    1
    6 2011  2   0   1    0    1
    7 2011  1   0   0    0    1
    8 2011  1   0   0    0    1

You should also be able to do something like this:

    library(data.table)
    cbind(DF,
          dcast.data.table(as.data.table(DF, keep.rownames = TRUE),
                           rn ~ year, value.var = "id", fun.aggregate = length))
    #   year id var ans rn 2010 2011
    # 1 2010  1   1   1  1    1    0
    # 2 2010  2   0   0  2    1    0
    # 3 2010  1   0   1  3    1    0
    # 4 2010  1   0   1  4    1    0
    # 5 2011  2   1   1  5    0    1
    # 6 2011  2   0   1  6    0    1
    # 7 2011  1   0   0  7    0    1
    # 8 2011  1   0   0  8    0    1

If you want the names to be "year_2010" and so on, one workaround is to add a constant column and include it in the formula:

    dcast.data.table(as.data.table(DF, keep.rownames = TRUE)[, yr := "year"],
                     rn ~ yr + year, value.var = "id", fun.aggregate = length)

You can also always write your own function. Here's one I've put together that should be reasonably efficient:

    dummyCreator <- function(invec, prefix = NULL) {
      L <- length(invec)
      ColNames <- sort(unique(invec))
      # Start from an all-zero integer matrix, one column per distinct value
      M <- matrix(0L, ncol = length(ColNames), nrow = L,
                  dimnames = list(NULL, ColNames))
      # Set the cell (row i, column of invec[i]) to 1 for every row
      M[cbind(seq_len(L), match(invec, ColNames))] <- 1L
      if (!is.null(prefix)) colnames(M) <- paste(prefix, colnames(M), sep = "_")
      M
    }

    dummyCreator(DF$year, prefix = "year")
    #      year_2010 year_2011
    # [1,]         1         0
    # [2,]         1         0
    # [3,]         1         0
    # [4,]         1         0
    # [5,]         0         1
    # [6,]         0         1
    # [7,]         0         1
    # [8,]         0         1

Just use cbind as above to get the output you expect.



Answer 2:

Here is my favorite code for creating dummy variables from a categorical variable. The only difference is that this code produces K-1 dummy variables (for a factor with K levels) to avoid collinearity:

    x <- as.factor(rep(1:6, each = 4))
    model.matrix(~x)[, -1]

Substitute x with the year from your data set.
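Applied to the question's data, the same idea might look like the sketch below. It assumes the DF from the question (rebuilt here so the snippet runs on its own). Removing the intercept with `0 +` keeps all K indicator columns instead of K-1, and the renaming step to `year_2010`, `year_2011` is an addition to match the desired output, not part of the answer above.

```r
# Rebuild the example data from the question
DF <- data.frame(
  year = c(2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011),
  id   = c(1, 2, 1, 1, 2, 2, 1, 1),
  var  = c(1, 0, 0, 0, 1, 0, 0, 0),
  ans  = c(1, 0, 1, 1, 1, 1, 0, 0)
)

# One indicator column per year level; "0 +" removes the intercept,
# so no level is dropped
yr <- factor(DF$year)
dummies <- model.matrix(~ 0 + yr)

# model.matrix() names the columns "yr2010", "yr2011"; rename them
colnames(dummies) <- paste("year", levels(yr), sep = "_")

result <- cbind(DF, dummies)
```

If the dummies are meant as regressors in a model with an intercept, the K-1 version from the answer is the safer choice, since the full set of K columns sums to the intercept column.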



Answer 3:

Maybe this?

    library(tidyr)
    DF$row <- 1:nrow(DF)  # to make each row unique
    DF$dummy <- 1

    newdf <- spread(DF, year, dummy, fill = 0)


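Answer 4:

    # Add one 0/1 column per distinct year, named year_2010, year_2011, ...
    for (i in unique(DF$year)) {
      # as.integer() turns the TRUE/FALSE comparison into 1/0
      DF[paste("year", i, sep = "_")] <- as.integer(DF$year == i)
    }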
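The same comparison can also be done for all years at once, without an explicit loop. This is a minimal sketch, assuming a year column like the one in the question; using outer() is a variation of my own, not part of the answer above.

```r
# Only the year column is needed for this illustration
DF <- data.frame(year = c(2010, 2010, 2010, 2010, 2011, 2011, 2011, 2011))

years <- sort(unique(DF$year))

# outer() compares every row's year with every distinct year, giving an
# nrow(DF) x length(years) logical matrix; adding 0L converts it to integer
dummies <- outer(DF$year, years, "==") + 0L
colnames(dummies) <- paste("year", years, sep = "_")

DF <- cbind(DF, dummies)
```

Like the loop, this scales with the number of distinct years, so nothing has to be written out once per year.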


Answer 5:

As Andrey Shabalin mentioned, you want model.matrix. First, convert the year column to a factor. To get exactly the output you want, use contr.ltfr from the caret package, which is a modified version of contr.treatment.

In the formula below, 0 means don't use an intercept and . represents all the columns in the data frame.

    library(caret)  # provides contr.ltfr

    DF$year <- factor(DF$year)
    model.matrix(
      ~ 0 + .,
      DF,
      contrasts.arg = list(year = "contr.ltfr")
    )

