Question
I can't find a way to label data points in stripchart(). Using the text() function, as suggested in this question, breaks down when points are stacked or jittered.
I have numerical data in 4 categories (columns 2-5) and would like to label each data point with the initials (column 1).
This is my data and the code I have tried:
initials,total,interest,slides,presentation
CU,1.6,1.7,1.5,1.6
DS,1.6,1.7,1.5,1.7
VA,1.7,1.5,1.5,2.1
MB,2.3,2.0,2.1,2.9
HS,1.2,1.3,1.4,1.0
LS,1.8,1.8,1.5,2.0
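To reproduce the plots, the data above can be read into a data frame named CTscores, the name all of the code on this page assumes. A minimal sketch, assuming the CSV text is pasted inline rather than read from a file:

```r
# Read the sample data from inline CSV text; initials stay a character column
CTscores <- read.csv(text = "initials,total,interest,slides,presentation
CU,1.6,1.7,1.5,1.6
DS,1.6,1.7,1.5,1.7
VA,1.7,1.5,1.5,2.1
MB,2.3,2.0,2.1,2.9
HS,1.2,1.3,1.4,1.0
LS,1.8,1.8,1.5,2.0", stringsAsFactors = FALSE)
```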
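# Horizontal strip chart of the four score columns (column 1 holds the initials)
stripchart(CTscores[-1], method = "stack", las = 1)
# Label the points of the first strip ("total", drawn at y = 1) with the initials
text(CTscores$total + 0.05, 1, labels = CTscores$initials, cex = 0.5)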
The resulting plot is the best I have managed so far. The data point labels overlap, and the longest y-axis label is cut off.
Can points be labelled in a strip chart? Or do I have to display this with another command to allow for labeling?
Answer 1:
Here's an alternative that allows you to add color to a strip chart in order to identify the initials:
library(ggplot2)
library(reshape2)
library(gtable)
library(gridExtra)
# Gets default ggplot colors
gg_color_hue <- function(n) {
  hues = seq(15, 375, length.out=n+1)
  hcl(h=hues, l=65, c=100)[1:n]
}
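# Transform to long format (one row per initials/variable pair)
CTscores.m = melt(CTscores, id.vars="initials")
# Create a vector of colors with keys for the initials
colvals <- gg_color_hue(nrow(CTscores))
names(colvals) <- sort(CTscores$initials)
# geom_dotplot() bins the values, so its dots come out ordered by value within
# each variable: sort the rows the same way, then look up each row's color by
# its initials so the vector has one matching color per row of the melted data
CTscores.m <- CTscores.m[order(CTscores.m$variable, CTscores.m$value), ]
cols <- colvals[as.character(CTscores.m$initials)]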
# Create a basic plot that will have a legend with the desired attributes
g1 <- ggplot(CTscores.m, aes(x=variable, y=value, fill=initials)) +
  geom_dotplot(color=NA) + theme_bw() + coord_flip() + scale_fill_manual(values=colvals)
# Extract the legend
fill.legend <- gtable_filter(ggplot_gtable(ggplot_build(g1)), "guide-box")
legGrob <- grobTree(fill.legend)
# Create the plot we want without the legend
g2 <- ggplot(CTscores.m, aes(x=variable, y=value)) +
  geom_dotplot(binaxis="y", stackdir="up", binwidth=0.03, fill=cols, color=NA) +
  theme_bw() + coord_flip()
# Create the plot with the legend
grid.arrange(g2, legGrob, ncol=2, widths=c(10, 1))
Answer 2:
What about using the labels as the point markers, rather than having separate labels? Here's an example using ggplot2 rather than base graphics.
In order to avoid overlaps, we directly set the amount of vertical offset for repeated values, rather than leaving it to random jitter. To do that, we need to assign numerical y-values (so that we can add the offset) and then replace the numerical axis labels with the appropriate text labels.
library(ggplot2)
library(reshape2)
library(dplyr)
# Convert data from "wide" to "long" format
CTscores.m = melt(CTscores, id.vars="initials")
# Create an offset that we'll use for vertically separating the repeated values
CTscores.m = CTscores.m %>% group_by(variable, value) %>%
  mutate(repeats = ifelse(n()>1, 1, 0),
         offset = ifelse(repeats==0, 0, seq(-n()/25, n()/25, length.out=n())))
ggplot(CTscores.m, aes(label=initials, x=value, y=as.numeric(variable) + offset,
                       color=initials)) +
  geom_text() +
  # Put the variable names at the integer positions 1, 2, 3, 4
  scale_y_continuous(breaks=seq_along(levels(CTscores.m$variable)),
                     labels=levels(CTscores.m$variable)) +
  theme_bw(base_size=15) +
  labs(y="", x="") +
  guides(color="none")
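Because the question uses base graphics, here is a rough sketch of the same offset idea without ggplot2, reshape2 or dplyr, assuming the data is read from the CSV in the question. The helper spread() and the objects vars and long are illustrative names, not part of the answer: ave() applies the answer's offset rule within each (variable, value) group, and text() then draws the initials themselves as the markers. Widening the left margin with par(mar = ...) is one way to keep the longest axis label from being cut off.

```r
# Sample data from the question
CTscores <- read.csv(text = "initials,total,interest,slides,presentation
CU,1.6,1.7,1.5,1.6
DS,1.6,1.7,1.5,1.7
VA,1.7,1.5,1.5,2.1
MB,2.3,2.0,2.1,2.9
HS,1.2,1.3,1.4,1.0
LS,1.8,1.8,1.5,2.0", stringsAsFactors = FALSE)

# Long format in base R: one row per (initials, variable) pair
vars <- names(CTscores)[-1]
long <- data.frame(
  initials = rep(CTscores$initials, times = length(vars)),
  variable = factor(rep(vars, each = nrow(CTscores)), levels = vars),
  value    = unlist(CTscores[vars], use.names = FALSE)
)

# Offset rule from the answer: 0 for a unique value, otherwise n evenly
# spaced offsets from -n/25 to n/25 for the n points sharing that value
spread <- function(v) {
  n <- length(v)
  if (n <= 1) rep(0, n) else seq(-n / 25, n / 25, length.out = n)
}
long$offset <- ave(long$value, long$variable, long$value, FUN = spread)
y <- as.numeric(long$variable) + long$offset

# Use the initials as markers; widen the left margin for the strip names
op <- par(mar = c(4, 8, 1, 1))
plot(long$value, y, type = "n", yaxt = "n", xlab = "", ylab = "",
     ylim = c(0.5, length(vars) + 0.5))
axis(2, at = seq_along(vars), labels = vars, las = 1)
text(long$value, y, labels = long$initials,
     col = as.integer(factor(long$initials)))
par(op)
```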
For completeness, here's how to create the graph with jitter for the repeated values, rather than with a specific offset:
# Convert data from "wide" to "long" format
CTscores.m = melt(CTscores, id.vars="initials")
# Mark repeated values (so we can selectively jitter them later)
CTscores.m = CTscores.m %>% group_by(variable, value) %>%
  mutate(repeats = ifelse(n()>1, 1, 0))
# Jitter only the points with repeated values
set.seed(13)
ggplot() +
  geom_text(data=CTscores.m[CTscores.m$repeats==1,],
            aes(label=initials, x=value, y=variable, color=initials),
            position=position_jitter(height=0.25, width=0)) +
  geom_text(data=CTscores.m[CTscores.m$repeats==0,],
            aes(label=initials, x=value, y=variable, color=initials)) +
  theme_bw(base_size=15) +
  guides(color="none")
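For comparison, a base R sketch of the selective jitter, assuming the question's CSV is read inline; vars, long, dup and base_y are illustrative names. Base jitter() with amount = 0.25 adds uniform noise in [-0.25, 0.25], which should roughly correspond to position_jitter(height = 0.25, width = 0) above, although the random positions will not match the ggplot2 version even with the same seed.

```r
# Sample data from the question
CTscores <- read.csv(text = "initials,total,interest,slides,presentation
CU,1.6,1.7,1.5,1.6
DS,1.6,1.7,1.5,1.7
VA,1.7,1.5,1.5,2.1
MB,2.3,2.0,2.1,2.9
HS,1.2,1.3,1.4,1.0
LS,1.8,1.8,1.5,2.0", stringsAsFactors = FALSE)

# Long format in base R: one row per (initials, variable) pair
vars <- names(CTscores)[-1]
long <- data.frame(
  initials = rep(CTscores$initials, times = length(vars)),
  variable = factor(rep(vars, each = nrow(CTscores)), levels = vars),
  value    = unlist(CTscores[vars], use.names = FALSE)
)

# Flag values that occur more than once within the same strip
n_same <- ave(long$value, long$variable, long$value, FUN = length)
dup <- n_same > 1
base_y <- as.numeric(long$variable)

# Jitter only the repeated values, vertically by up to +/- 0.25
set.seed(13)
y <- base_y
y[dup] <- jitter(base_y[dup], amount = 0.25)

op <- par(mar = c(4, 8, 1, 1))
plot(long$value, y, type = "n", yaxt = "n", xlab = "", ylab = "",
     ylim = c(0.5, length(vars) + 0.5))
axis(2, at = seq_along(vars), labels = vars, las = 1)
text(long$value, y, labels = long$initials,
     col = as.integer(factor(long$initials)))
par(op)
```

Unlike the fixed offsets, jittered labels can still collide by chance, so a different seed may be needed for a clean result.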
Source: https://stackoverflow.com/questions/34066131/can-data-points-be-labeled-in-stripcharts