New complexity to color coding based on percentile and another factor in ggplot

狂风中的少年 posted on 2019-12-13 05:15:13

Question


I would like to add another level of complexity to the color coding scheme I have going on in the below plot. I want to account for whether each of the values being plotted has passed a statistical test. So, the dots will only be color coded based on the percentile if they pass the test, otherwise, I would like the dot to be grey.

Here is my code as I have it after all the helpful suggestions I received on my first post, "Color code points based on percentile in ggplot" (note: this is made-up data; my real data has many more entries):

library(ggplot2)

dat <- data.frame(key = c("a1-a3", "a1-a2"), position = 1:100, fst = rlnorm(200, 0, 1), fet = rnorm(200, 0.24, 0.54))

#Get quantiles
quants <- quantile(dat$fst, c(0.95, 0.99))

dat$quant  <- with(dat, factor(ifelse(fst < quants[1], 0,
                                  ifelse(fst < quants[2], 1, 2))))

dat$fisher <- with(dat, factor(ifelse(fet > 1.30102999566398, 0, 1)))

dat$col <- with(dat, factor(ifelse(fet < 1.30102999566398, 3, quant)))

########theme set
theme_set(theme_bw(base_size = 10))

p1 <- ggplot(dat, aes(x=position, y=fst)) +
  geom_point(aes(colour = col, size=0.2)) +
  facet_wrap(~key, nrow = 1) +
  scale_colour_manual(values = c("black", "blue", "red", "grey"), labels = c("0-95", "95-99", "99-100", "fail")) +
  ylab(expression(F[ST])) +
  xlab("Genomic Position (Mb)") +
  scale_x_continuous(breaks=c(0, 1e+06, 2e+06, 3e+06, 4e+06), labels=c("0", "1", "2", "3", "4")) +
  scale_y_continuous(limits=c(0,1)) +
  theme(plot.background = element_blank(),
    panel.background = element_blank(),
    panel.border = element_blank(),
    legend.position="none",
    legend.title = element_blank()
    )

tiff("Fstvalues_colourcode3.tiff", height=2.5, width=6.5, units="in", res = 300, pointsize=10)
p1
dev.off()

My problem is in the line: dat$col <- with(dat, factor(ifelse(fet < 1.30102999566398, 3, quant))). I want it to use the value from $quant if the row has an $fet value above the threshold listed above (i.e. fisher == 0), and to assign a new level (3) if the $fet value is below it. When I look at the data frame, it is doing something different from this. Any comments/suggestions are much appreciated! (I'm pretty new to coding and am finding that factors are not easy to work with!)


Answer 1:


Yes, you are right: with(dat, factor(ifelse(fet < 1.30102999566398, 3, quant))) gives an 'unexpected' result. The 'no' return value in ifelse, the factor quant, is coerced to the same class as the 'yes' return value (3), a numeric, so the result holds the factor's internal integer codes (1, 2, 3) rather than its labels (0, 1, 2). Have a look at tail(dat[order(dat$fet), c("fet", "quant", "col")]):

#          fet quant col
# 6   1.202582     0   3
# 40  1.318997     0   1
# 74  1.324552     0   1
# 24  1.415189     1   2
# 38  1.418230     0   1
# 123 1.531584     0   1 

For the rows with fet > 1.301 (where the test in ifelse is FALSE, so the 'no' value is used), 'col' became 1, 1, 2, 1, 1 instead of 0, 0, 1, 0, 0. Something like this happened:

# original factor version of quant
quant <- as.factor(0:2)
quant
# [1] 0 1 2
# Levels: 0 1 2

# coerce quant to numeric
as.numeric(quant)
# [1] 1 2 3
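
If the original values need to be recovered from an existing factor, the usual idiom is to go through the level labels rather than the integer codes. A minimal sketch (the variable names codes, values and values2 are only illustrative):

```r
quant <- factor(0:2)

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(quant)
codes
# [1] 1 2 3

# converting via the labels returns the values the factor was built from
values <- as.numeric(as.character(quant))
values
# [1] 0 1 2

# equivalent alternative: index the numeric levels by the factor
values2 <- as.numeric(levels(quant))[quant]
values2
# [1] 0 1 2
```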

Compare these two:

set.seed(1)
df <- data.frame(fet = rnorm(9), quant = factor(0:2))
str(df)
df$col <- with(df, ifelse(fet < 0, 3, quant))
df

set.seed(1)
df <- data.frame(fet = rnorm(9), quant = 0:2)
str(df)
df$col <- with(df, ifelse(fet < 0, 3, quant))
df

Thus, try removing factor() from the ifelse call where you create 'quant', and see if that solves the problem.
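
Applied to the made-up data from the question, that suggestion might look like the sketch below. The set.seed() call and the name threshold are assumptions added here for reproducibility and readability; they are not in the original code. The conversion to a factor is deferred until 'col' has been built from plain numbers.

```r
set.seed(42)  # assumption: added only so the sketch is reproducible
dat <- data.frame(key = c("a1-a3", "a1-a2"), position = 1:100,
                  fst = rlnorm(200, 0, 1), fet = rnorm(200, 0.24, 0.54))

quants <- quantile(dat$fst, c(0.95, 0.99))
threshold <- 1.30102999566398  # hypothetical name for the cutoff used in the question

# keep quant numeric (no factor() here), so ifelse() sees 0/1/2 and not level codes
dat$quant <- with(dat, ifelse(fst < quants[1], 0,
                       ifelse(fst < quants[2], 1, 2)))

# 3 marks points that fail the test; otherwise carry over the percentile class
dat$col <- with(dat, ifelse(fet < threshold, 3, quant))

# convert to a factor only at the end, listing all four classes as levels
dat$col <- factor(dat$col, levels = 0:3)

table(dat$col)
```

With 'col' built this way, the rows that pass the test keep the same 0/1/2 class as 'quant', and every other row is assigned 3.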

See also section 8.2.1 here: http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

PS. As you phrase your question, the single ifelse line is your actual problem (not the plotting part). If so, you may wish to isolate this problem and condense your question.



Source: https://stackoverflow.com/questions/19330546/new-complexity-to-color-coding-based-on-percentile-and-another-factor-in-ggplot
