How to “group” order data in R for a barplot?

最后都变了- 提交于 2019-12-11 07:38:16

问题


I'm doing bioinformatics and I need to output a graph (barplot) with the ancestry results in it. Usually, these graphs are draw by grouping the populations together. The way it's done is that you simply plot the barplot of the Q score (data below) for the different putative population (here 4).

The problem is that I use ord = tbl[order(tbl$V1,tbl$V2,tbl$V3),] to sort my values. That way I see that some bars do not cluster in the right population (see the orange bar in the graph that should cluster with the first group). Thus, I would like to know how it would be possible to cluster the bars by colour (which represents the population).

Is there a way to solve this?

barplotgeno <- function(tbl, # To plot the Q scores (ancestry). 
                        col = c("#FF3030", # nice colors
                                "#9ACD31",
                                "#1D90FF",
                                "#FF8001"), 
                        pdf = TRUE,
                        pdf.path.name = "~/Desktop/Stacked_barplot.pdf") {

  ord = tbl[order(tbl$V1,tbl$V2,tbl$V3),]

  if(pdf) {
    pdf(pdf.path.name, width = 11, height = 8.5)
  bp = barplot(t(as.matrix(ord[,1:dim(ord)[2]-1])), 
               space = c(0.2),#space = c(0),# Space between the bars
               col=col, #rainbow(4),
               xlab="Individual #", 
               ylab="Ancestry", xaxt="n",
               border=NA, main = "Stacked barplot from ADMIXTURE analysis")
  labs <- row.names(ord)
  text(cex=0.5, x=bp+1, y=-0.03, labs, xpd=TRUE, srt=90, pos=2)
    dev.off()
  } else {
    bp = barplot(t(as.matrix(ord[,1:dim(ord)[2]])), 
                 space = c(0.2),
                 col=col, 
                 xlab="Individual #", 
                 ylab="Ancestry", xaxt="n",
                 border=NA, 
                 main = "Stacked barplot from ADMIXTURE analysis")
    labs <- row.names(ord)
    text(cex=0.5, x=bp+1, y=-0.03, labs, xpd=TRUE, srt=90, pos=2)
  }
}

# Example 
barplotgeno(tbl = tbl, pdf = FALSE)

Here are the data:

tbl = structure(list(V1 = c(1e-05, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.430202, 0.99997, 
0.801974, 1e-05, 0.99997, 0.99997, 1e-05, 0.999968, 1e-05, 1e-05, 
1.3e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.99997, 1e-05, 1.8e-05, 
1e-05, 1.2e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1.1e-05, 1e-05, 0.642925, 
1e-05, 0.99997, 0.99997, 1e-05, 1e-05, 0.99997, 1e-05, 0.99997, 
0.99997, 0.287976, 1e-05, 0.99997, 0.99997, 0.99997, 1e-05, 0.533994, 
0.99997, 0.99997, 1e-05, 0.99997, 1e-05, 0.99997, 1e-05, 1e-05, 
0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 0.307669, 0.99997, 
0.604114, 0.604792, 0.29646, 0.514252, 0.99997, 0.798616, 0.516577, 
1e-05, 1e-05, 1e-05, 1e-05, 0.449886, 1e-05, 1e-05, 1e-05, 1e-05, 
0.790272, 1e-05, 0.576786, 0.776731), V2 = c(0.99997, 1e-05, 
1e-05, 1e-05, 0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 0.99997, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1.2e-05, 0.99997, 1e-05, 1e-05, 1e-05, 0.99997, 1e-05, 
0.99997, 0.99997, 1e-05, 1e-05, 0.528138, 0.99997, 1e-05, 1e-05, 
1e-05, 1e-05, 0.99997, 1e-05, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.712004, 0.99997, 
1e-05, 1e-05, 1e-05, 0.99997, 0.465986, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.99997, 0.99997, 
0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.05777, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.591857, 0.99997, 1e-05, 
0.99997, 0.99997, 0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05
), V3 = c(1e-05, 1e-05, 1e-05, 0.541112, 1e-05, 1e-05, 0.329922, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.198006, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.999967, 0.451508, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.749225, 
1e-05, 0.99997, 0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 0.442211, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.188, 
1e-05, 0.248756, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 1e-05, 0.395866, 0.395188, 0.293429, 0.427968, 
1e-05, 0.201364, 0.483403, 1e-05, 1e-05, 0.408123, 1e-05, 0.550094, 
1e-05, 1e-05, 1e-05, 1e-05, 0.209708, 0.533729, 0.423194, 0.223249
), V4 = c(1e-05, 1e-05, 0.99997, 0.458868, 1e-05, 1e-05, 0.670058, 
0.99997, 0.99997, 1e-05, 1e-05, 0.99997, 0.99997, 0.569778, 1e-05, 
1e-05, 0.99997, 1e-05, 1e-05, 0.99997, 1e-05, 1e-05, 0.99997, 
1e-05, 0.548472, 1e-05, 0.99997, 1e-05, 1e-05, 1e-05, 0.99997, 
0.471833, 1e-05, 0.250753, 0.99997, 1e-05, 1e-05, 1e-05, 0.999969, 
1e-05, 0.357055, 0.557769, 1e-05, 1e-05, 0.99997, 0.99997, 1e-05, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
1e-05, 1e-05, 1e-05, 0.81198, 1e-05, 0.751224, 1e-05, 0.99997, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 0.692311, 
1e-05, 1e-05, 1e-05, 0.410101, 1e-05, 1e-05, 1e-05, 1e-05, 0.99997, 
0.99997, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 1e-05, 
0.466251, 1e-05, 1e-05)), .Names = c("V1", "V2", "V3", "V4"), class = "data.frame", row.names = c(NA, 
-93L))

If you need a grouping variable here is one (in order of the data):

species = c("fuliginosa", "fuliginosa", "fuliginosa", "fuliginosa", "fuliginosa", 
"fuliginosa", "fuliginosa", "fuliginosa", "fuliginosa", "fortis", 
"fuliginosa", "fuliginosa", "fortis", "fortis", "fortis", "fuliginosa", 
"fuliginosa", "fortis", "scandens", "fortis", "fortis", "scandens", 
"scandens", "fuliginosa", "magnirostris", "scandens", "magnirostris", 
"fortis", "fortis", "fortis", "fortis", "fortis", "scandens", 
"fortis", "scandens", "scandens", "magnirostris", "fortis", "fortis", 
"fortis", "scandens", "magnirostris", "magnirostris", "fortis", 
"fortis", "fortis", "fortis", "magnirostris", "fortis", "magnirostris", 
"magnirostris", "magnirostris", "fortis", "fortis", "fortis", 
"magnirostris", "magnirostris", "fortis", "fortis", "fortis", 
"fortis", "fortis", "fortis", "fortis", "fortis", "fortis", "fortis", 
"magnirostris", "fortis", "fortis", "fortis", "fortis", "fortis", 
"fortis", "fortis", "fortis", "fortis", "fortis", "fuliginosa", 
"fortis", "fortis", "fortis", "fortis", "fuliginosa", "fuliginosa", 
"fuliginosa", "fuliginosa", "fortis", "fortis", "fortis", "fortis", 
"fortis", "fortis")

回答1:


A slightly better way:

library(tidyverse)

plot_data <- tbl %>% 
  mutate(id = row_number()) %>% 
  gather('pop', 'prob', V1:V4) %>% 
  group_by(id) %>% 
  mutate(likely_assignment = pop[which.max(prob)],
         assingment_prob = max(prob)) %>% 
  arrange(likely_assignment, desc(assingment_prob)) %>% 
  ungroup() %>% 
  mutate(id = forcats::fct_inorder(factor(id)))

A plot with ggplot:

ggplot(plot_data, aes(id, prob, fill = pop)) +
  geom_col() +
  theme_classic()

Or with facets:

ggplot(plot_data, aes(id, prob, fill = pop)) +
  geom_col() +
  facet_grid(~likely_assignment, scales = 'free', space = 'free')



来源:https://stackoverflow.com/questions/41679888/how-to-group-order-data-in-r-for-a-barplot

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!