Question
Hi guys, I just need a suggestion on how to plot some data properly so that the plot is self-explanatory. I have a matrix of counts that looks like this:
          Condition1  Condition2  Condition3  ......
Patient1          30           4          23  ......
Patient2          22           1           2  ......
Patient3          23          56          13  ......
Patient4           4           3          29  ......
Patient5          12           6           1  ......
Patient6          98           5           0  ......
........        ....         ...         ...  ......
This is a table of counts of how many times each patient had an adverse event under the listed conditions. Total patients: 50. Total conditions: 8. I tried plotting a heatmap, but I suppose it is not the proper way to plot this type of data because of its discrete nature.
Could you please help me by giving me some pointers?
Thank you in advance.
Answer 1:
Lots of different things can be done; it depends on what you want to show. This page lists several examples along with code. Here are a few interesting ones that I'd try out:
- Heatmap is a good option (I personally prefer using ggplot2::geom_tile() with scale_fill_gradient() set to high-contrast colours). Example here.
- Lineplot (suggested by @erocoar) is good for small data samples; for 50 patients, it's quite cumbersome.
- You could show the likelihood of one or more conditions affecting a patient given the occurrence of another condition (e.g. how likely is condition 2 to result in conditions 3, 4 & 5 etc.)
- You could plot the instances for each patient and use facets for each condition (using facet_grid()) - preferred over displaying multiple lines in the same plot. Example here.
- Cumulative histogram
- Marginal histogram (show distribution of a connected condition on the side?) - useful if exploring the relationship b/w different conditions.
- Animated bubble chart - plot instances for each patient and cycle through the conditions in an animation (using gganimate)
- Overlaid density plots (set the transparency alpha to a reasonable value to see all the different conditions). Might be a bit tough for 8 conditions though, unless you're able to/want to group them somehow.
- Facetted heatmap (similar to a calendar heatmap but instead of week/month, you have the patient ID). The first link has a pretty good example for that.
- Plot instances of different conditions against each other (more for exploring different hypotheses on if/how the conditions are linked)
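As a minimal sketch of the geom_tile() heatmap option from the list above: the data frame `counts` and its column names (`patient`, `condition`, `count`) are made up for illustration and use only the six patients and three conditions shown in the question. The colours are an arbitrary choice.

```r
library(ggplot2)

# Hypothetical long-format data: one row per patient/condition pair,
# filled with the values shown in the question's table
counts <- data.frame(
  patient   = rep(paste0("Patient", 1:6), times = 3),
  condition = rep(paste0("Condition", 1:3), each = 6),
  count     = c(30, 22, 23, 4, 12, 98,   # Condition1
                4, 1, 56, 3, 6, 5,       # Condition2
                23, 2, 13, 29, 1, 0)     # Condition3
)

p <- ggplot(counts, aes(x = condition, y = patient, fill = count)) +
  geom_tile(colour = "white") +                  # one tile per patient/condition
  geom_text(aes(label = count)) +                # print the raw count in each tile
  scale_fill_gradient(low = "white", high = "darkred") +
  labs(x = NULL, y = NULL, fill = "Adverse events")
p
```

Printing the count inside each tile keeps the discrete values readable even when the colour scale is dominated by a few large counts.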
I would also recommend normalizing the values for each patient to ensure your plots don't go out of scale.
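"Normalizing per patient" can mean several things; the sketch below shows two possible readings in base R, assuming a small hypothetical count matrix `m` with patients in rows and conditions in columns. Neither is necessarily what was intended here.

```r
# Hypothetical count matrix: rows are patients, columns are conditions
m <- matrix(
  c(30,  4, 23,
    22,  1,  2,
    23, 56, 13),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("Patient", 1:3), paste0("Condition", 1:3))
)

# Reading 1: share of each patient's events per condition (each row sums to 1)
row_share <- prop.table(m, margin = 1)

# Reading 2: scale each row by its own maximum, so the largest value per patient is 1
# (the vector of row maxima is recycled down the columns, i.e. one value per row)
row_scaled <- m / apply(m, 1, max)
```

Note that a patient with no events at all would produce NaN in both versions, so such rows would need separate handling.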
I haven't included any code for brevity and since the first link pretty much covers all of these examples. I generally prefer using ggplot but if you want you can make your plots interactive using plotly. 
Lastly, if you are trying to explore your data (or provide someone with tools to explore it), coding different plots repeatedly can be cumbersome; you may want to take a look at creating a shiny app.
Answer 2:
Although you said you don't want a heatmap, I made one because I think it's a good solution.
library(plotly)
df <- data.frame(
  PATIENT = c('Patient1', 'Patient2', 'Patient3', 'Patient4', 'Patient5', 'Patient6'),
  COND_1 = c(30, 22, 23, 4, 12, 98),
  COND_2 = c(4, 1, 56, 3, 6, 5),
  COND_3 = c(23, 2, 13, 29, 1, 0),
  stringsAsFactors = FALSE
)
p <- plot_ly(
  x = colnames(df[,-1]),
  y = df$PATIENT[nrow(df):1],    # reversing the order of the rows
  z = as.matrix(df[nrow(df):1,-1]),
  type = "heatmap"
) %>%
  layout(
    xaxis = list(side = "top")
  )
p
There are many options to customize this within plotly (the colors, the axes, the margin on the left). If you need any help you can just ask.
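To illustrate the kind of customization mentioned above, here is a hedged sketch that sets a colour ramp, axis titles, and a wider left margin. The specific colours, the title text, and the margin size are arbitrary choices rather than recommendations.

```r
library(plotly)

# Same toy data as in the answer above
df <- data.frame(
  PATIENT = paste0("Patient", 1:6),
  COND_1 = c(30, 22, 23, 4, 12, 98),
  COND_2 = c(4, 1, 56, 3, 6, 5),
  COND_3 = c(23, 2, 13, 29, 1, 0),
  stringsAsFactors = FALSE
)

p <- plot_ly(
  x = colnames(df[, -1]),
  y = df$PATIENT[nrow(df):1],            # reversed so Patient1 ends up at the top
  z = as.matrix(df[nrow(df):1, -1]),
  type = "heatmap",
  colors = c("#FFFFFF", "#8B0000")       # colours to interpolate from low to high
) %>%
  layout(
    xaxis  = list(side = "top", title = "Condition"),
    yaxis  = list(title = ""),
    margin = list(l = 100)               # extra room on the left for patient labels
  )
p
```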
Answer 3:
Other than a heatmap, you could perhaps also display the data this way:
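library(ggplot2)
library(tidyr)  # provides gather(); tidyr also re-exports %>%
# df is assumed to contain only the count columns Condition1 ... Condition8
df_long = df %>% gather()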
ggplot() +
  geom_segment(aes(x = df_long$key[df_long$key != "Condition8"],
                   y = df_long$value[df_long$key != "Condition8"],
                   xend = df_long$key[df_long$key != "Condition1"],
                   yend = df_long$value[df_long$key != "Condition1"]), 
               lwd = 1) +
  geom_vline(aes(xintercept = 1:8), alpha = 0.5, lwd = 2)
where each line represents a patient/row. I'm not sure how it will look for 50 patients, though.
Source: https://stackoverflow.com/questions/49050803/plot-discrete-data