How to “group” order data in R for a barplot?

问题

I'm doing bioinformatics and I need to output a graph (barplot) with the ancestry results in it. Usually, these graphs are draw by grouping the populations together. The way it's done is that you simply plot the barplot of the Q score (data below) for the different putative population (here 4).

The problem is that I use ord = tbl[order(tbl$V1,tbl$V2,tbl$V3),] to sort my values. That way I see that some bars do not cluster in the right population (see the orange bar in the graph that should cluster with the first group). Thus, I would like to know how it would be possible to cluster the bars by colour (which represents the population).

Is there a way to solve this?

barplotgeno <- function(tbl, # To plot the Q scores (ancestry). 
                        col = c("#FF3030", # nice colors
                                "#9ACD31",
                                "#1D90FF",
                                "#FF8001"), 
                        pdf = TRUE,
                        pdf.path.name = "~/Desktop/Stacked_barplot.pdf") {

  ord = tbl[order(tbl$V1,tbl$V2,tbl$V3),]

  if(pdf) {
    pdf(pdf.path.name, width = 11, height = 8.5)
  bp = barplot(t(as.matrix(ord[,1:dim(ord)[2]-1])), 
               space = c(0.2),#space = c(0),# Space between the bars
               col=col, #rainbow(4),
               xlab="Individual #", 
               ylab="Ancestry", xaxt="n",
               border=NA, main = "Stacked barplot from ADMIXTURE analysis")
  labs <- row.names(ord)
  text(cex=0.5, x=bp+1, y=-0.03, labs, xpd=TRUE, srt=90, pos=2)
    dev.off()
  } else {
    bp = barplot(t(as.matrix(ord[,1:dim(ord)[2]])), 
                 space = c(0.2),
                 col=col, 
                 xlab="Individual #", 
                 ylab="Ancestry", xaxt="n",
                 border=NA, 
                 main = "Stacked barplot from ADMIXTURE analysis")
    labs <- row.names(ord)
    text(cex=0.5, x=bp+1, y=-0.03, labs, xpd=TRUE, srt=90, pos=2)
  }
}

# Example 
barplotgeno(tbl = tbl, pdf = FALSE)

Here are the data:

tbl = structure(list(V1 = c(1e-05, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.430202, 0.99997, 
0.801974, 1e-05, 0.99997, 0.99997, 1e-05, 0.999968, 1e-05, 1e-05, 
1.3e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.99997, 1e-05, 1.8e-05, 
1e-05, 1.2e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1.1e-05, 1e-05, 0.642925, 
1e-05, 0.99997, 0.99997, 1e-05, 1e-05, 0.99997, 1e-05, 0.99997, 
0.99997, 0.287976, 1e-05, 0.99997, 0.99997, 0.99997, 1e-05, 0.533994, 
0.99997, 0.99997, 1e-05, 0.99997, 1e-05, 0.99997, 1e-05, 1e-05, 
0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 0.307669, 0.99997, 
0.604114, 0.604792, 0.29646, 0.514252, 0.99997, 0.798616, 0.516577, 
1e-05, 1e-05, 1e-05, 1e-05, 0.449886, 1e-05, 1e-05, 1e-05, 1e-05, 
0.790272, 1e-05, 0.576786, 0.776731), V2 = c(0.99997, 1e-05, 
1e-05, 1e-05, 0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 0.99997, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1.2e-05, 0.99997, 1e-05, 1e-05, 1e-05, 0.99997, 1e-05, 
0.99997, 0.99997, 1e-05, 1e-05, 0.528138, 0.99997, 1e-05, 1e-05, 
1e-05, 1e-05, 0.99997, 1e-05, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.712004, 0.99997, 
1e-05, 1e-05, 1e-05, 0.99997, 0.465986, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.99997, 0.99997, 
0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.05777, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.591857, 0.99997, 1e-05, 
0.99997, 0.99997, 0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05
), V3 = c(1e-05, 1e-05, 1e-05, 0.541112, 1e-05, 1e-05, 0.329922, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.198006, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.999967, 0.451508, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.749225, 
1e-05, 0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 0.442211, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.188, 
1e-05, 0.248756, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 0.395866, 0.395188, 0.293429, 0.427968, 
1e-05, 0.201364, 0.483403, 1e-05, 1e-05, 0.408123, 1e-05, 0.550094, 
1e-05, 1e-05, 1e-05, 1e-05, 0.209708, 0.533729, 0.423194, 0.223249
), V4 = c(1e-05, 1e-05, 0.99997, 0.458868, 1e-05, 1e-05, 0.670058, 
0.99997, 0.99997, 1e-05, 1e-05, 0.99997, 0.99997, 0.569778, 1e-05, 
1e-05, 0.99997, 1e-05, 1e-05, 0.99997, 1e-05, 1e-05, 0.99997, 
1e-05, 0.548472, 1e-05, 0.99997, 1e-05, 1e-05, 1e-05, 0.99997, 
0.471833, 1e-05, 0.250753, 0.99997, 1e-05, 1e-05, 1e-05, 0.999969, 
1e-05, 0.357055, 0.557769, 1e-05, 1e-05, 0.99997, 0.99997, 1e-05, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 0.81198, 1e-05, 0.751224, 1e-05, 0.99997, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.692311, 
1e-05, 1e-05, 1e-05, 0.410101, 1e-05, 1e-05, 1e-05, 1e-05, 0.99997, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
0.466251, 1e-05, 1e-05)), .Names = c("V1", "V2", "V3", "V4"), class = "data.frame", row.names = c(NA, 
-93L))

If you need a grouping variable here is one (in order of the data):

species = c("fuliginosa", "fuliginosa", "fuliginosa", "fuliginosa", "fuliginosa", 
"fuliginosa", "fuliginosa", "fuliginosa", "fuliginosa", "fortis", 
"fuliginosa", "fuliginosa", "fortis", "fortis", "fortis", "fuliginosa", 
"fuliginosa", "fortis", "scandens", "fortis", "fortis", "scandens", 
"scandens", "fuliginosa", "magnirostris", "scandens", "magnirostris", 
"fortis", "fortis", "fortis", "fortis", "fortis", "scandens", 
"fortis", "scandens", "scandens", "magnirostris", "fortis", "fortis", 
"fortis", "scandens", "magnirostris", "magnirostris", "fortis", 
"fortis", "fortis", "fortis", "magnirostris", "fortis", "magnirostris", 
"magnirostris", "magnirostris", "fortis", "fortis", "fortis", 
"magnirostris", "magnirostris", "fortis", "fortis", "fortis", 
"fortis", "fortis", "fortis", "fortis", "fortis", "fortis", "fortis", 
"magnirostris", "fortis", "fortis", "fortis", "fortis", "fortis", 
"fortis", "fortis", "fortis", "fortis", "fortis", "fuliginosa", 
"fortis", "fortis", "fortis", "fortis", "fuliginosa", "fuliginosa", 
"fuliginosa", "fuliginosa", "fortis", "fortis", "fortis", "fortis", 
"fortis", "fortis")

回答1:

A slightly better way:

library(tidyverse)

plot_data <- tbl %>% 
  mutate(id = row_number()) %>% 
  gather('pop', 'prob', V1:V4) %>% 
  group_by(id) %>% 
  mutate(likely_assignment = pop[which.max(prob)],
         assingment_prob = max(prob)) %>% 
  arrange(likely_assignment, desc(assingment_prob)) %>% 
  ungroup() %>% 
  mutate(id = forcats::fct_inorder(factor(id)))

A plot with ggplot:

ggplot(plot_data, aes(id, prob, fill = pop)) +
  geom_col() +
  theme_classic()

Or with facets:

ggplot(plot_data, aes(id, prob, fill = pop)) +
  geom_col() +
  facet_grid(~likely_assignment, scales = 'free', space = 'free')

来源：https://stackoverflow.com/questions/41679888/how-to-group-order-data-in-r-for-a-barplot

标签

sorting

plot

bar-chart