Assign colors to a data frame based on shared values with a character string in R

Submitted by 半腔热情 on 2019-12-11 00:19:16

Question


I'm working in R. I have many different data frames that contain sample names, and I'm trying to assign a color to each row of each data frame according to which sample name appears in that row's ID. Many rows share the same sample name, but my output data is messy, so I can't simply sort by sample name. Here's a small example of what I have:

names          <- c( "TC3", "102", "172", "136", "142", "143", "AC2G" )
colors         <- c( "darkorange", "forestgreen", "darkolivegreen", "darkgreen", "darksalmon", "firebrick3", "firebrick1" )
dataA          <- c( "JR13-101A", "TC3B", "JR12-136C", "AC2GA", "TC3A" )
newcolors      <- rep( NA, length( dataA ) )
dataA          <- as.data.frame( cbind( dataA, newcolors ) )
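Part of the problem lies in this setup. In R versions before 4.0.0, as.data.frame() turns the columns of this character matrix into factors, and newcolors, which holds only NA, becomes a factor with no levels. A small sketch of what then happens on assignment (as_factor and as_chr are illustrative names, not from the question):

```r
# A factor holding only NA has no levels, like newcolors after
# as.data.frame() in R < 4.0.0
as_factor <- factor(rep(NA, 5))
print(levels(as_factor))          # character(0)
as_factor[2] <- "darkorange"      # warns (invalid factor level) and stores NA
print(as_factor)

# A character vector accepts any string
as_chr <- rep(NA_character_, 5)
as_chr[2] <- "darkorange"
print(as_chr)
```

Since "darkorange" is not one of the factor's (nonexistent) levels, the factor can only store NA, which matches the all-NA result described below.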

I've tried the following, using nested loops (I know, but that's all I could think of; I'm trying to stop falling back on loops in R, but I haven't broken the habit yet). It's probably something obvious, but I just get NA for every entry in newcolors:

for( i in 1:nrow( dataA ) ) {
  for( j in 1:length( names ) ) {
    if( grepl( dataA$dataA[ i ], names[ j ] ) ) {
      dataA$newcolors[ i ]  <- colors[ j ]
    }
  }
}
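For comparison, here is a minimal corrected version of the nested loop, offered as a sketch rather than the asker's code. It makes two changes: the arguments to grepl() are swapped so that each short name is the pattern searched for inside the longer sample ID, and the data frame is built with data.frame(..., stringsAsFactors = FALSE) so that newcolors is a character column. Passing fixed = TRUE (match the name as a literal string rather than a regular expression) is an added assumption.

```r
names  <- c("TC3", "102", "172", "136", "142", "143", "AC2G")
colors <- c("darkorange", "forestgreen", "darkolivegreen", "darkgreen",
            "darksalmon", "firebrick3", "firebrick1")

# Build the data frame with character columns instead of cbind() + as.data.frame()
dataA <- data.frame(dataA = c("JR13-101A", "TC3B", "JR12-136C", "AC2GA", "TC3A"),
                    newcolors = NA_character_,
                    stringsAsFactors = FALSE)

for (i in seq_len(nrow(dataA))) {
  for (j in seq_along(names)) {
    # pattern = the short name, x = the full sample ID
    if (grepl(names[j], dataA$dataA[i], fixed = TRUE)) {
      dataA$newcolors[i] <- colors[j]
    }
  }
}
print(dataA)
```

"JR13-101A" contains none of the names (it has "101", not "102"), so that row stays NA; the other rows pick up the color of the name they contain.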

Answer 1:


Here is a solution that eliminates one of the two loops:

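# Convert the factor column to character so any color string can be stored
dataA$newcolors <- as.character(dataA$newcolors)
for( j in seq_along( names ) ) {
    # grep() returns the indices of all sample IDs that contain names[j]
    dataA$newcolors[grep(names[j], dataA$dataA)] <- colors[j]
}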

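Converting the newcolors column from a factor to character is what makes the update work. Because as.data.frame() turned newcolors into a factor with no levels (the default before R 4.0.0), assigning a color string to it only produces NA with an "invalid factor level" warning, whereas a character column accepts any string. Note also the argument order: grep(names[j], dataA$dataA) searches every sample ID for the short name, while the question's grepl(dataA$dataA[i], names[j]) searches the short name for the whole sample ID, which never matches with this data. If the list of names is short, the single remaining loop should have little performance impact.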
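Since the question also asks about getting away from loops, here is one loop-free sketch. It is not part of the original answer; assign_colors is a hypothetical helper name, and it assumes the names contain no regular-expression metacharacters (true for the example data). It combines all names into one alternation pattern, uses regexpr() and regmatches() to pull out the matching name for each sample ID, and then looks up the color with match().

```r
# Hypothetical helper: vectorized color lookup without an explicit loop
assign_colors <- function(ids, names, colors) {
  ids <- as.character(ids)              # works for factor or character columns
  # One alternation pattern, e.g. "TC3|102|172|136|142|143|AC2G";
  # assumes the names contain no regex metacharacters
  pattern <- paste(names, collapse = "|")
  m <- regexpr(pattern, ids)            # start of the match, -1 if none
  found <- rep(NA_character_, length(ids))
  found[m != -1] <- regmatches(ids, m)  # the matched name for each hit
  colors[match(found, names)]           # NA where no name matched
}

names  <- c("TC3", "102", "172", "136", "142", "143", "AC2G")
colors <- c("darkorange", "forestgreen", "darkolivegreen", "darkgreen",
            "darksalmon", "firebrick3", "firebrick1")
dataA <- data.frame(dataA = c("JR13-101A", "TC3B", "JR12-136C", "AC2GA", "TC3A"),
                    stringsAsFactors = FALSE)
dataA$newcolors <- assign_colors(dataA$dataA, names, colors)
print(dataA)
```

One behavioral difference: when a sample ID contains more than one name, regexpr() reports only a single match per ID, which need not be the name the loops would keep (they keep the last matching name in the vector). With the example data each ID contains at most one name, so the results agree.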



Source: https://stackoverflow.com/questions/42457149/assign-colors-to-a-data-frame-based-on-shared-values-with-a-character-string-in
