Question
I'm working in R. I have many data frames that contain sample names, and I'm trying to assign a color to each row in each data frame based on its sample name. Many rows share the same sample name, but my output data is messy, so I can't sort by sample name. Here's a small example of what I have:
names <- c( "TC3", "102", "172", "136", "142", "143", "AC2G" )
colors <- c( "darkorange", "forestgreen", "darkolivegreen", "darkgreen", "darksalmon", "firebrick3", "firebrick1" )
dataA <- c( "JR13-101A", "TC3B", "JR12-136C", "AC2GA", "TC3A" )
newcolors <- rep( NA, length( dataA ) )
dataA <- as.data.frame( cbind( dataA, newcolors ) )
I've tried the following with loops, since that's all I could think to do. I'm trying to get away from falling back on loops in R, but I have yet to break the habit. I'm probably missing something obvious, but I just get NA returned for all the newcolors:
for( i in 1:nrow( dataA ) ) {
  for( j in 1:length( names ) ) {
    if( grepl( dataA$dataA[ i ], names[ j ] ) ) {
      dataA$newcolors[ i ] <- colors[ j ]
    }
  }
}
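Answer 1:
Your attempt returns NA everywhere because the arguments to grepl are swapped: the pattern (names[ j ]) must come first and the string to search (dataA$dataA[ i ]) second. Here is a solution that fixes this and eliminates one loop: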
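# Make sure the column is character, not factor, so new values can be assigned
dataA$newcolors <- as.character( dataA$newcolors )
for( j in seq_along( names ) ) {
  # grep() returns the row indices whose sample name contains names[ j ]
  dataA$newcolors[ grep( names[ j ], dataA$dataA ) ] <- colors[ j ]
}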
Converting the newcolors column to character instead of a factor makes the updating much easier (in R versions before 4.0.0, as.data.frame() creates factor columns from strings by default). If the list of names is short, the single loop should not have much of a performance impact.
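If you want to drop the remaining loop as well, one possible variant is sketched below. It is not the method above, just an illustration under a few assumptions: the names are treated as literal substrings (hence fixed = TRUE), the data frame has more than one row, and when several names match the same row the first one in names wins (the loop above lets the last match win instead). The variables hits and idx are illustrative names only.

```r
names  <- c("TC3", "102", "172", "136", "142", "143", "AC2G")
colors <- c("darkorange", "forestgreen", "darkolivegreen", "darkgreen",
            "darksalmon", "firebrick3", "firebrick1")
dataA  <- data.frame(dataA = c("JR13-101A", "TC3B", "JR12-136C", "AC2GA", "TC3A"),
                     stringsAsFactors = FALSE)

# Logical matrix with one row per sample and one column per name:
# hits[i, j] is TRUE when names[j] occurs in dataA$dataA[i].
# Assumption: names are literal substrings, so fixed = TRUE is used.
hits <- vapply(names, grepl, logical(nrow(dataA)),
               x = dataA$dataA, fixed = TRUE)

# For each row, the index of the first matching name (NA if none match).
idx <- apply(hits, 1, function(row) which(row)[1])

# Indexing colors with NA yields NA, so unmatched rows stay NA.
dataA$newcolors <- colors[idx]
```

Rows whose sample name contains none of the names (here "JR13-101A") keep NA, the same as with the loop version.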
Source: https://stackoverflow.com/questions/42457149/assign-colors-to-a-data-frame-based-on-shared-values-with-a-character-string-in