2-y Axes Plot with calculated absolute & relative Change in R

Question

I am trying to calculate two things in R, relative and absolute change, and to plot them in a scatter plot with two y-axes. I am still seeking input on how to create a 2-y axes plot for this type of data.

library(data.table)
set.seed(123)
df <- expand.grid(PatientID = 1:3, time = 1:3, Group = 1:2)
dat <- data.table(df, Outcome = as.integer(runif(9) * 100))

Sample data format (df):

PatientID time Outcome Group
        1    1      87     1
        1    2      32     1
        1    3      76     2
        2    1      21     2
        2    2      23     3
        2    3      23     3

## Continues for 200 PatientIDs (volunteers); there are many outcome measure columns (33:290)

PatientID, time, Outcome and Group denote, respectively, the volunteer's identification number, the time of the hospital visit, the outcome measure of interest, and the group (condition A or B). The data include 3 visits per participant and two groups.

1. Relative change (%), i.e. the absolute change expressed as a percentage of the outcome at the baseline time point, for Groups 1 & 2:

   [(F - B) / B] * 100, where B and F are the baseline and follow-up values of an outcome measure

2. Absolute change, i.e. F - B

3. 2-y axes scatter plots:

The main purpose of this plot is to look at the changes in the outcome measures with respect to baseline (time=1) and to determine whether there are group differences. It would be useful to include the respective relative and absolute change values in the plot as y1 and y2.

I have made several scatterplots in ggplot2 and ggvis to view the trends, but I did not find a direct option to calculate (and plot) relative and absolute change through these packages. I would really recommend them to novice users like myself. In addition, I am planning to show both the relative and the absolute change values in a single scatterplot for one outcome measure, i.e. a 2-y axes plot.

Let me know if you need any clarification. Thanks, and looking forward to your replies!

Answers to questions 1 & 2 (I thought this might help others)

This is how I finally did it:

library(dplyr)
dft1 <- filter(df, time == 1)
dft2 <- filter(df, time == 2)
dft3 <- filter(df, time == 3)

To calculate the absolute change from the first to the second time point and from the first to the third time point:

abs1 <- dft2[33:290] - dft1[33:290]
abs2 <- dft3[33:290] - dft1[33:290]

To calculate the corresponding relative change (%) with respect to the first time point:

rel1 <- abs1 / dft1[33:290] * 100
rel2 <- abs2 / dft1[33:290] * 100
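The subtraction above relies on dft1, dft2 and dft3 listing the patients in the same row order. As a minimal base-R sketch of the same calculation that aligns visits by PatientID instead, the following uses a small toy data frame with a single outcome column; the names `toy`, `base` and `res` are illustrative assumptions, not part of the real data set.

```r
# Toy data: 2 patients x 3 visits, one outcome column (illustrative only)
toy <- data.frame(
  PatientID = c(1, 1, 1, 2, 2, 2),
  time      = c(1, 2, 3, 1, 2, 3),
  Outcome   = c(87, 24, 76, 21, 32, 27)
)

# Baseline (time == 1) per patient, merged back by PatientID so rows align
base <- toy[toy$time == 1, c("PatientID", "Outcome")]
names(base)[2] <- "Baseline"
res <- merge(toy, base, by = "PatientID")

# Absolute change F - B and relative change (F - B) / B * 100
res$abs.change <- res$Outcome - res$Baseline
res$rel.change <- res$abs.change / res$Baseline * 100

res <- res[order(res$PatientID, res$time), ]
print(res)
```

Because the baseline is attached to every row by a merge, the result does not depend on how the rows happen to be sorted. With many outcome columns the same idea would need to be applied per column, which this sketch does not attempt.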

I will put absolute change and relative change on different y-axes. This link was handy to get me started: (How can I plot with 2 different y-axes?).
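One possible way to put the two quantities on separate y-axes in base R graphics is to draw two plots on the same region with par(new = TRUE) and add the second axis on the right with axis(side = 4). The sketch below is only an illustration on made-up toy values for a single outcome; the names `toy` and `baseline` are assumptions, and it takes for granted that rows are sorted by time within each patient.

```r
# Toy data: 2 patients x 3 visits (illustrative values only)
toy <- data.frame(PatientID = rep(1:2, each = 3),
                  time      = rep(1:3, times = 2),
                  Outcome   = c(87, 24, 76, 21, 32, 27))

# Baseline = first visit per patient (assumes rows sorted by time)
baseline   <- ave(toy$Outcome, toy$PatientID, FUN = function(x) x[1])
abs.change <- toy$Outcome - baseline        # F - B
rel.change <- abs.change / baseline * 100   # (F - B) / B * 100

# Leave room on the right-hand side for the second axis
par(mar = c(5, 4, 4, 5) + 0.1)

# Left axis: absolute change
plot(toy$time, abs.change, pch = 16, col = "blue",
     xlab = "time", ylab = "Absolute change", xaxt = "n")
axis(side = 1, at = 1:3)

# Overlay a second plot in the same region, without its own axes
par(new = TRUE)
plot(toy$time, rel.change, pch = 17, col = "red",
     axes = FALSE, xlab = "", ylab = "")

# Right axis: relative change (%)
axis(side = 4)
mtext("Relative change (%)", side = 4, line = 3)
legend("bottomright", legend = c("Absolute", "Relative (%)"),
       pch = c(16, 17), col = c("blue", "red"))
```

Each series is scaled to its own axis, so the vertical positions of the blue and red points are not directly comparable; the colours and legend are what tie each point to its axis.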

Nice resource for learning R: https://stackoverflow.com/tags/r/info

Answer 1:

It is not clear exactly what you mean, but you should be able to modify this code to achieve your purpose:

library(data.table)
dat = data.table(PatientID=c(1,2), time=c(1:3), Outcome=c(87, 32,76,21,24, 27))
# Modified so you can actually compare across time periods.
# Note: your data is already sorted, but to be on the safe side:
setkey(dat,PatientID,time)
dat[, `:=`(rel.change.1 = 100 * (Outcome - shift(Outcome)) / Outcome,
           rel.change.2 = 100 * (Outcome - shift(Outcome, 2)) / Outcome,
           abs.change.1 = Outcome - shift(Outcome),
           abs.change.2 = Outcome - shift(Outcome, 2)),
           by = PatientID]

The key idea is to use shift to get a lagged copy of the Outcome column; the second argument to shift is the number of rows by which to shift it. Grouping by PatientID, together with keying the data.table so that it is sorted by time within each PatientID, ensures the correct comparison. (Note: if your actual data are not complete, this will not produce correct results. For example, if you have observations at times 1 and 4 for PatientID = 1 but at times 2 and 3 for PatientID = 2, then both 1-shifts will compare these observations even though they are not the same number of time units apart. In that case, use CJ on the ID and time columns to build every ID/time combination and join it to your data, so that the missing observations become rows filled with NA; the shifts will then reflect the correct time differences.)
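As a rough sketch of the CJ idea for incomplete data, the example below assumes a hypothetical table `incomplete` in which patient 1 has no visit at time 2. The names `incomplete`, `grid` and `full` and the values are illustrative, not taken from the real data.

```r
library(data.table)

# Hypothetical incomplete data: patient 1 was not seen at time 2
incomplete <- data.table(PatientID = c(1, 1, 2, 2, 2),
                         time      = c(1, 3, 1, 2, 3),
                         Outcome   = c(87, 76, 21, 32, 27))

# Every PatientID/time combination (CJ returns them sorted)
grid <- CJ(PatientID = unique(incomplete$PatientID),
           time      = unique(incomplete$time))

# Join the data onto the grid; combinations without data get NA in Outcome
full <- incomplete[grid, on = c("PatientID", "time")]

# A 1-shift now always compares consecutive time points
full[, abs.change.1 := Outcome - shift(Outcome), by = PatientID]
print(full)
```

With the gap filled, the change for patient 1 at time 3 is NA instead of silently comparing time 3 against time 1.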

This produces:

> dat
   PatientID time Outcome rel.change.1 rel.change.2 abs.change.1 abs.change.2
1:         1    1      87           NA           NA           NA           NA
2:         1    2      24   -262.50000           NA          -63           NA
3:         1    3      76     68.42105    -14.47368           52          -11
4:         2    1      21           NA           NA           NA           NA
5:         2    2      32     34.37500           NA           11           NA
6:         2    3      27    -18.51852     22.22222           -5            6
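
Note that the rel.change columns above divide by the current Outcome, whereas the question defines relative change as (F - B) / B * 100, with the earlier value in the denominator. A sketch of that variant follows; the table name `dat2` and the column names `rel.prev` and `rel.base` are illustrative assumptions, and the data are the same six values as above.

```r
library(data.table)

dat2 <- data.table(PatientID = rep(1:2, each = 3),
                   time      = rep(1:3, times = 2),
                   Outcome   = c(87, 24, 76, 21, 32, 27))
setkey(dat2, PatientID, time)

dat2[, `:=`(
  # change vs. the previous visit, as a percentage of the previous visit
  rel.prev = 100 * (Outcome - shift(Outcome)) / shift(Outcome),
  # change vs. baseline (first row per patient), as a percentage of baseline
  rel.base = 100 * (Outcome - Outcome[1]) / Outcome[1]
), by = PatientID]
print(dat2)
```

Using Outcome[1] as the baseline relies on the key having sorted each patient's rows by time.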

Now we can melt:

melted <- melt(dat, id.vars=c("PatientID","time"), variable.factor=FALSE)

> melted
    PatientID time     variable      value
 1:         1    1      Outcome   87.00000
 2:         1    2      Outcome   24.00000
 3:         1    3      Outcome   76.00000
 4:         2    1      Outcome   21.00000
 5:         2    2      Outcome   32.00000
 6:         2    3      Outcome   27.00000
 7:         1    1 rel.change.1         NA
 8:         1    2 rel.change.1 -262.50000
 9:         1    3 rel.change.1   68.42105
10:         2    1 rel.change.1         NA
11:         2    2 rel.change.1   34.37500
12:         2    3 rel.change.1  -18.51852
13:         1    1 rel.change.2         NA
14:         1    2 rel.change.2         NA
15:         1    3 rel.change.2  -14.47368
16:         2    1 rel.change.2         NA
17:         2    2 rel.change.2         NA
18:         2    3 rel.change.2   22.22222
19:         1    1 abs.change.1         NA
20:         1    2 abs.change.1  -63.00000
21:         1    3 abs.change.1   52.00000
22:         2    1 abs.change.1         NA
23:         2    2 abs.change.1   11.00000
24:         2    3 abs.change.1   -5.00000
25:         1    1 abs.change.2         NA
26:         1    2 abs.change.2         NA
27:         1    3 abs.change.2  -11.00000
28:         2    1 abs.change.2         NA
29:         2    2 abs.change.2         NA
30:         2    3 abs.change.2    6.00000
    PatientID time     variable      value

And plot:

library(ggplot2)
ggplot(melted,aes(x=time,y=value,color=factor(PatientID))) +
    geom_point() +
    facet_wrap(~variable,scales="free") +
    labs(color="PatientID")

Answer 2:

Another approach:

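library(data.table)
library(ggplot2)

set.seed(123)
df = expand.grid(PatientID = 1:3, time = 1:3)
dat <- data.table(df, Outcome = as.integer(runif(9) * 100))

setkeyv(dat, "PatientID")
# Absolute change from baseline (time == 1) within each patient
dat[, abs.change := (Outcome - Outcome[time == 1]), by = PatientID]
# Relative change as a fraction of baseline; multiply by 100 for a percentage
dat[, rel.change := abs.change / Outcome[time == 1], by = PatientID]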

ggplot(melt(dat,c('PatientID','time')), aes(x = time,y = value,color = factor(PatientID))) + 
  geom_line() + 
  facet_wrap( ~ variable,scales = "free")

Which gives one panel per variable, each with its own y-scale (drawing borrowed from @Philip's answer).

You can chain the two steps of adding columns like this (but it's less readable):

dat[, abs.change := (Outcome - Outcome[time == 1]), by = PatientID][, rel.change := abs.change / Outcome[time == 1], by = PatientID]

Source: https://stackoverflow.com/questions/35741538/2-y-axes-plot-with-calculated-absolute-relative-change-in-r

Tags

scatter-plot