Convert multiple binary columns to single categorical column [duplicate]

痴心易碎 submitted on 2019-12-02 11:38:56


I have a table full of binary variables that I would like to condense down to categorical variables.

Put very simply, what I have is a data frame like this:

data <- data.frame(id     = c(1,2,3,4,5,6,7,8,9),
                   red    = c("1","0","0","0","1","0","0","0","0"),
                   blue   = c("0","1","1","1","0","1","1","1","0"),
                   yellow = c("0","0","0","0","0","0","0","0","1"))
  id   red   blue  yellow
1  1   1    0      0
2  2   0    1      0
3  3   0    1      0
4  4   0    1      0
5  5   1    0      0
6  6   0    1      0
7  7   0    1      0
8  8   0    1      0
9  9   0    0      1

And what I would like to get back would be:

  id   color 
1  1   red    
2  2   blue   
3  3   blue    
4  4   blue    
5  5   red    
6  6   blue    
7  7   blue    
8  8   blue    
9  9   yellow 

I hope there's a really simple answer for this.


You can get the values by making use of the column names and as.logical. However, since your "binary" columns are factors, you need to jump through a few more hoops:

> apply(data[-1], 1, function(x) names(x)[as.logical(as.numeric(as.character(x)))])
[1] "red"    "blue"   "blue"   "blue"   "red"    "blue"   "blue"   "blue"   "yellow"

Bind this back with the first column (data[1]) to get the output you want.

cbind(data[1], 
      color = apply(data[-1], 1, 
                    function(x) names(x)[as.logical(as.numeric(as.character(x)))]))
#   id  color
# 1  1    red
# 2  2   blue
# 3  3   blue
# 4  4   blue
# 5  5    red
# 6  6   blue
# 7  7   blue
# 8  8   blue
# 9  9 yellow
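
The as.character() step in the apply() call matters because as.numeric() on a factor returns the internal level codes, not the labels. Here is a minimal sketch of the difference, using a made-up three-element factor x that is not part of the original data. Note that from R 4.0.0 onward data.frame() no longer creates factors by default, in which case the extra conversion is simply harmless.

```r
# Hypothetical example vector; levels are sorted, so "0" is level 1 and "1" is level 2
x <- factor(c("1", "0", "0"))

codes  <- as.numeric(x)                # internal level codes, not the labels
values <- as.numeric(as.character(x))  # the labels converted to numbers
flags  <- as.logical(values)           # usable as a logical index

print(codes)   # [1] 2 1 1
print(values)  # [1] 1 0 0
print(flags)   # [1]  TRUE FALSE FALSE
```

Skipping as.character() here would turn every element into TRUE, since both level codes are non-zero.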

Alternatively, you can try the following:

data[-1] <- lapply(data[-1], function(x) as.numeric(as.character(x)))
temp <- subset(cbind(data[1], stack(data[-1])), values == 1, c("id", "ind"))
temp[order(temp$id), ]
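
For readers unfamiliar with stack(), here is a sketch of the same reshaping spelled out step by step. It assumes the data frame from the question; the intermediate names long and result, and the final rename to color, are illustrative additions rather than part of the answer above.

```r
data <- data.frame(id     = c(1,2,3,4,5,6,7,8,9),
                   red    = c("1","0","0","0","1","0","0","0","0"),
                   blue   = c("0","1","1","1","0","1","1","1","0"),
                   yellow = c("0","0","0","0","0","0","0","0","1"))

# Convert the indicator columns to numeric (works for factor or character input)
data[-1] <- lapply(data[-1], function(x) as.numeric(as.character(x)))

# stack() puts all values into one column ("values") and records the
# source column name in a factor ("ind"), one source column after another
long <- stack(data[-1])
long$id <- rep(data$id, times = ncol(data) - 1)

# Keep the rows where the indicator is set, then restore the id order
result <- long[long$values == 1, c("id", "ind")]
result <- result[order(result$id), ]
names(result)[2] <- "color"
result$color <- as.character(result$color)
result
```

The explicit rep() replaces the implicit recycling that cbind() performs in the shorter version.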

Or, you can use a combination of "dplyr" and "tidyr", like this:


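library(dplyr)
library(tidyr)

data %>%
  group_by(id) %>%
  # factor -> character -> numeric (mutate_each()/funs() are deprecated in current dplyr)
  mutate_each(funs(an = as.numeric(as.character(.)))) %>%
  gather(color, val, -id) %>%
  filter(val == 1) %>%
  select(-val) %>%
  arrange(id)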
# Source: local data frame [9 x 2]
#   id  color
# 1  1    red
# 2  2   blue
# 3  3   blue
# 4  4   blue
# 5  5    red
# 6  6   blue
# 7  7   blue
# 8  8   blue
# 9  9 yellow
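
mutate_each(), funs() and gather() have since been deprecated or superseded. As a rough sketch, assuming dplyr 1.0 or later and tidyr 1.0 or later are installed, the same pipeline could be written with across() and pivot_longer(). The result name is only for illustration.

```r
library(dplyr)
library(tidyr)

data <- data.frame(id     = c(1,2,3,4,5,6,7,8,9),
                   red    = c("1","0","0","0","1","0","0","0","0"),
                   blue   = c("0","1","1","1","0","1","1","1","0"),
                   yellow = c("0","0","0","0","0","0","0","0","1"))

result <- data %>%
  # convert every column except id to numeric (factor or character input)
  mutate(across(-id, ~ as.numeric(as.character(.x)))) %>%
  # one row per id/color pair, with the 0/1 indicator in val
  pivot_longer(-id, names_to = "color", values_to = "val") %>%
  filter(val == 1) %>%
  select(-val) %>%
  arrange(id)

result
```

Because across(-id, ...) excludes the id column directly, the group_by(id) step is no longer needed.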


Here's a simple, vectorized base R solution using max.col:

cbind(data[1L], color = names(data[-1L])[max.col(data[-1L] == 1L)])
#   id  color
# 1  1    red
# 2  2   blue
# 3  3   blue
# 4  4   blue
# 5  5    red
# 6  6   blue
# 7  7   blue
# 8  8   blue
# 9  9 yellow
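
One caveat: max.col() breaks ties at random by default, so a row with no 1, or with several 1s, would be assigned an arbitrary column. A possible guard, sketched here with a hypothetical helper binary_to_category() and a made-up messy data frame (neither comes from the answer above), is to use ties.method = "first" and mark such rows as NA.

```r
# Hypothetical helper: first column is the id, the rest are 0/1 indicators
binary_to_category <- function(df) {
  m <- df[-1L] == 1L                        # logical matrix of set indicators
  out <- names(df)[-1L][max.col(m, ties.method = "first")]
  out[rowSums(m) != 1] <- NA                # zero or several 1s: ambiguous row
  out
}

# Made-up input: row 2 has no 1, row 3 has two
messy <- data.frame(id     = 1:4,
                    red    = c("1", "0", "1", "0"),
                    blue   = c("0", "0", "1", "0"),
                    yellow = c("0", "0", "0", "1"))

binary_to_category(messy)
# "red" NA NA "yellow"
```

For data like the question's, where every row has exactly one 1, this gives the same colors as the one-liner above.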

