问题
I would like to add another level of complexity to the color coding scheme I have going on in the below plot. I want to account for whether each of the values being plotted has passed a statistical test. So, the dots will only be color coded based on the percentile if they pass the test, otherwise, I would like the dot to be grey.
Here is my code as I have it after all the helpful suggestions I received from my first post Color code points based on percentile in ggplot (note: this is some made up data, though I have real data which has many more entries:
dat <- data.frame(key = c("a1-a3", "a1-a2"), position = 1:100, fst = rlnorm(200, 0, 1), fet = rnorm(200, 0.24, 0.54))
#Get quantiles
quants <- quantile(dat$fst, c(0.95, 0.99))
dat$quant <- with(dat, factor(ifelse(fst < quants[1], 0,
ifelse(fst < quants[2], 1, 2))))
dat$fisher <- with(dat, factor(ifelse(fet > 1.30102999566398, 0, 1)))
dat$col <- with(dat, factor(ifelse(fet < 1.30102999566398, 3, quant)))
########theme set
theme_set(theme_bw(base_size = 10))
p1 <- ggplot(dat, aes(x=position, y=fst)) +
geom_point(aes(colour = col, size=0.2)) +
facet_wrap(~key, nrow = 1) +
scale_colour_manual(values = c("black", "blue", "red", "grey"), labels = c("0-95", "95-99", "99-100", "fail")) +
ylab(expression(F[ST])) +
xlab("Genomic Position (Mb)") +
scale_x_continuous(breaks=c(0, 1e+06, 2e+06, 3e+06, 4e+06), labels=c("0", "1", "2", "3", "4")) +
scale_y_continuous(limits=c(0,1)) +
theme(plot.background = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
legend.position="none",
legend.title = element_blank()
)
tiff(Fstvalues_colourcode3.tiff", height=2.5, width=6.5, units="in", res = 300, pointsize="10")
p1
dev.off()
My problem is in the line: dat$col <- with(dat, factor(ifelse(fet < 1.30102999566398, 3, quant))). I want it to use the value from the $quant if it has an $fet value above the above listed value (or fisher == 0), and if it has an $fet value below, I would like it to make a new factor (3). When I look at the data frame it is doing something different than this. Any comments/suggestions are much appreciated! (I'm pretty new to coding and am finding factors are not easy to work with!!)
回答1:
Yes, your are right, with(dat, factor(ifelse(fet < 1.30102999566398, 3, quant)))
gives an 'unexpected' result. Your no
return value in ifelse
, the factor
quant, is coerced to the same class as the yes
return value (3), a numeric
. Have a look at tail(dat[order(dat$fet), c("fet", "quant", "col")])
:
# fet quant col
# 6 1.202582 0 3
# 40 1.318997 0 1
# 74 1.324552 0 1
# 24 1.415189 1 2
# 38 1.418230 0 1
# 123 1.531584 0 1
For fet > 1.301 (the test
in ifelse
), 'col' became 1, 1, 2, 1, 1, instead of 0, 0, 1, 0, 0. Something like this happened:
# original factor version of quant
quant <- as.factor(0:2)
quant
# [1] 0 1 2
# Levels: 0 1 2
# coerce quant to numeric
as.numeric(quant)
# [1] 1 2 3
Compare these two:
set.seed(1)
df <- data.frame(fet = rnorm(9), quant = factor(0:2))
str(df)
df$col <- with(df, ifelse(fet < 0, 3, quant))
df
set.seed(1)
df <- data.frame(fet = rnorm(9), quant = 0:2)
str(df)
df$col <- with(df, ifelse(fet < 0, 3, quant))
df
Thus, try to remove factor
from your ifelse
call where you create 'quant' and see if it solves the problem.
See also 8.2.1 here: http://www.burns-stat.com/pages/Tutor/R_inferno.pdf.
PS. As you phrase your question, it is the single ifelse
line is your actual problem (not the plotting part). If so, you may wish to isolate this problem and condense your question.
来源:https://stackoverflow.com/questions/19330546/new-complexity-to-color-coding-based-on-percentile-and-another-factor-in-ggplot