Question
I am trying to calculate two things in R, relative and absolute change, and to plot them in a scatter plot with two y-axes. I am still looking for input on creating a 2-y axis plot for this type of data.
library(data.table)
set.seed(123)
df <- expand.grid(PatientID = 1:3, time = 1:3, Group = 1:2)
# Draw one random outcome per row of df
dat <- data.table(df, Outcome = as.integer(runif(nrow(df)) * 100))
Data format (sample of df):
PatientID  time  Outcome  Group
1          1     87       1
1          2     32       1
1          3     76       2
2          1     21       2
2          2     23       3
2          3     23       3
## Continues up to 200 PatientIDs (volunteers); there are many outcome measure columns (33:290)
PatientID, time, Outcome and Group denote the volunteer's identification number, the time of the hospital visit, the outcome measure of interest and the group (whether the volunteer belongs to condition A or B), respectively. The data include 3 visits per participant and two groups.
- Relative change (%), i.e. the absolute change expressed as a percentage of the outcome at the baseline time point, for Groups 1 & 2:
[(F - B) / B] * 100, where B and F are the baseline and follow-up values of an outcome measure
- Absolute change, i.e. F - B
- 2-y axes scatter plots:
The prime purpose of this plot is to look at the changes in outcome measures with respect to baseline (time=1), and also determine if there are group differences. It is prudent to include respective relative/absolute change values in the plot as y1 and y2.
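As a quick illustration of the two definitions above, here is a minimal sketch in base R. The baseline and follow-up values are made up for the example, and the helper names abs_change and rel_change are hypothetical, not part of any package.

```r
# Hypothetical helpers: B = baseline value, F = follow-up value.
abs_change <- function(follow_up, baseline) follow_up - baseline
rel_change <- function(follow_up, baseline) 100 * (follow_up - baseline) / baseline

# Made-up outcome values for one volunteer at time 1 (baseline) and time 2.
baseline  <- 80
follow_up <- 60

abs_change(follow_up, baseline)  # -20
rel_change(follow_up, baseline)  # -25
```

Both helpers are vectorised, so they also accept whole columns. Relative change is undefined when the baseline is 0, so such rows would need separate handling.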
I have made several scatterplots in ggplot2 and ggvis to view the trends, but I did not find a direct option to calculate (and plot) relative and absolute change through the ggplot2 and ggvis packages. I really recommend using them for novice users, like myself. In addition, I am planning to show the relative and absolute change values for one outcome measure in a single scatterplot, i.e. a 2-y axes plot.
Let me know if you require some more clarifications. Thanks, and looking forward!
Answers to questions 1 & 2 (I thought it might help others)
This is how I finally did it:
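library(dplyr)
# Split by visit; arrange() keeps the rows in the same PatientID order so they line up
dft1 <- df %>% filter(time == 1) %>% arrange(PatientID)
dft2 <- df %>% filter(time == 2) %>% arrange(PatientID)
dft3 <- df %>% filter(time == 3) %>% arrange(PatientID)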
To calculate the absolute change from the first to the second time point and from the first to the third time point:
abs1 <- dft2[33:290] - dft1[33:290]
abs2 <- dft3[33:290] - dft1[33:290]
To calculate the relative change (%) from the first to the second time point and from the first to the third time point:
rel1 <- abs1 / dft1[33:290] * 100
rel2 <- abs2 / dft1[33:290] * 100
I will put absolute change and relative change on different y-axes. This link was handy to get me started: (How can I plot with 2 different y-axes?).
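One possible way to draw such a plot in base R is to overlay two plots with par(new = TRUE) and put the second scale on the right with axis(4). The sketch below is only an illustration under stated assumptions: the outcome values are made up, there are three volunteers and one outcome measure, and the plot shows the mean change per visit rather than individual points.

```r
# Made-up data: rows are volunteers, columns are visits (time 1 to 3).
outcome <- rbind(c(80, 60, 100),
                 c(20, 30, 25),
                 c(50, 45, 40))
time <- 1:3

baseline   <- outcome[, 1]                          # value at time 1
abs_change <- sweep(outcome, 1, baseline, "-")      # F - B, per volunteer
rel_change <- 100 * sweep(abs_change, 1, baseline, "/")  # (F - B) / B * 100

mean_abs <- colMeans(abs_change)  # mean absolute change per visit
mean_rel <- colMeans(rel_change)  # mean relative change (%) per visit

# Leave room on the right-hand side for the second axis.
old_par <- par(mar = c(5, 4, 4, 5))

# First layer: absolute change on the left y-axis.
plot(time, mean_abs, pch = 16, col = "blue", xaxt = "n",
     xlab = "time", ylab = "Mean absolute change")
axis(1, at = time)

# Second layer: relative change over the same region, right y-axis.
par(new = TRUE)
plot(time, mean_rel, pch = 17, col = "red",
     axes = FALSE, xlab = "", ylab = "")
axis(4)
mtext("Mean relative change (%)", side = 4, line = 3)
legend("topleft", legend = c("Absolute", "Relative (%)"),
       pch = c(16, 17), col = c("blue", "red"))

par(old_par)
```

Because each layer picks its own y-range, the two point sets are not on a common scale; the colours and the two axis labels are what tie each series to its axis.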
Nice resource for learning R: https://stackoverflow.com/tags/r/info
Answer 1:
It is not exactly clear what you mean, but you should be able to modify this code to achieve your purpose:
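library(data.table)
library(ggplot2)
# Modified so you can actually compare across 2 time periods
dat <- data.table(PatientID = rep(1:2, each = 3), time = rep(1:3, times = 2),
                  Outcome = c(87, 24, 76, 21, 32, 27))
# Note your data is already sorted, but to be on the safe side:
setkey(dat, PatientID, time)
# Note: the relative changes below divide by the current Outcome; to divide by
# the earlier value instead, as in (F - B) / B, use shift(Outcome) or
# shift(Outcome, 2) as the denominator.
dat[, `:=`(rel.change.1 = 100 * (Outcome - shift(Outcome)) / Outcome,
           rel.change.2 = 100 * (Outcome - shift(Outcome, 2)) / Outcome,
           abs.change.1 = Outcome - shift(Outcome),
           abs.change.2 = Outcome - shift(Outcome, 2)),
    by = PatientID]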
The key idea is to use shift() to get a shifted copy of the Outcome column; the second argument to shift() is the number of rows by which to shift it. Combined with grouping by PatientID, and given that we keyed the data.table to ensure it is sorted by time within each PatientID, this ensures the correct comparison. (Note: if your actual data are not complete, this will not produce correct results. For example, if you have observations at times 1 and 4 for PatientID = 1 but at times 2 and 3 for PatientID = 2, then the 1-shift will compare these observations in both cases even though they are not the same number of time units apart. If this is the case, use CJ() on the ID and time columns to add rows filled with NAs for all the missing observations; that will ensure that the shifts reflect the correct time differences.)
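The CJ() step mentioned above could look roughly like the following. This is a sketch under assumptions: the incomplete data are made up to mirror the example in the note (times 1 and 4 for one volunteer, 2 and 3 for the other), and the names grid and full are hypothetical.

```r
library(data.table)

# Made-up incomplete data: volunteer 1 seen at times 1 and 4,
# volunteer 2 seen at times 2 and 3.
dat <- data.table(PatientID = c(1L, 1L, 2L, 2L),
                  time      = c(1L, 4L, 2L, 3L),
                  Outcome   = c(10, 40, 20, 26))

# CJ() builds every PatientID x time combination, sorted by both columns.
grid <- CJ(PatientID = unique(dat$PatientID), time = 1:4)

# Joining dat onto the grid keeps all grid rows; missing visits get NA.
full <- dat[grid, on = c("PatientID", "time")]

# A 1-shift now always compares visits exactly one time unit apart.
full[, abs.change.1 := Outcome - shift(Outcome), by = PatientID]
print(full)
```

Without the fill, the 1-shift for volunteer 1 would subtract the time-1 value from the time-4 value; with it, that comparison becomes NA, and only volunteer 2's change from time 2 to time 3 is reported.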
This produces:
> dat
   PatientID time Outcome rel.change.1 rel.change.2 abs.change.1 abs.change.2
1:         1    1      87           NA           NA           NA           NA
2:         1    2      24   -262.50000           NA          -63           NA
3:         1    3      76     68.42105    -14.47368           52          -11
4:         2    1      21           NA           NA           NA           NA
5:         2    2      32     34.37500           NA           11           NA
6:         2    3      27    -18.51852     22.22222           -5            6
Now, we can melt,
melted <- melt(dat, id.vars = c("PatientID", "time"), variable.factor = FALSE)
> melted
    PatientID time     variable      value
 1:         1    1      Outcome   87.00000
 2:         1    2      Outcome   24.00000
 3:         1    3      Outcome   76.00000
 4:         2    1      Outcome   21.00000
 5:         2    2      Outcome   32.00000
 6:         2    3      Outcome   27.00000
 7:         1    1 rel.change.1         NA
 8:         1    2 rel.change.1 -262.50000
 9:         1    3 rel.change.1   68.42105
10:         2    1 rel.change.1         NA
11:         2    2 rel.change.1   34.37500
12:         2    3 rel.change.1  -18.51852
13:         1    1 rel.change.2         NA
14:         1    2 rel.change.2         NA
15:         1    3 rel.change.2  -14.47368
16:         2    1 rel.change.2         NA
17:         2    2 rel.change.2         NA
18:         2    3 rel.change.2   22.22222
19:         1    1 abs.change.1         NA
20:         1    2 abs.change.1  -63.00000
21:         1    3 abs.change.1   52.00000
22:         2    1 abs.change.1         NA
23:         2    2 abs.change.1   11.00000
24:         2    3 abs.change.1   -5.00000
25:         1    1 abs.change.2         NA
26:         1    2 abs.change.2         NA
27:         1    3 abs.change.2  -11.00000
28:         2    1 abs.change.2         NA
29:         2    2 abs.change.2         NA
30:         2    3 abs.change.2    6.00000
    PatientID time     variable      value
And plot
ggplot(melted, aes(x = time, y = value, color = factor(PatientID))) +
  geom_point() +
  facet_wrap(~ variable, scales = "free") +
  labs(color = "PatientID")
Answer 2:
Another approach:
library(data.table)
library(ggplot2)
set.seed(123)
df <- expand.grid(PatientID = 1:3, time = 1:3)
dat <- data.table(df, Outcome = as.integer(runif(9) * 100))
setkeyv(dat, "PatientID")
# Absolute change from baseline (time == 1) within each patient
dat[, abs.change := (Outcome - Outcome[time == 1]), by = PatientID]
# Relative change as a fraction of baseline; multiply by 100 for a percentage
dat[, rel.change := abs.change / Outcome[time == 1], by = PatientID]
ggplot(melt(dat, c('PatientID', 'time')), aes(x = time, y = value, color = factor(PatientID))) +
  geom_line() +
  facet_wrap(~ variable, scales = "free")
Which gives (drawing borrowed from @Philip's answer):
You can chain the two steps of adding columns like this (but it's less readable):
dat[, abs.change := (Outcome - Outcome[time == 1]), by = PatientID][, rel.change := abs.change / Outcome[time == 1], by = PatientID]
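Since the question mentions many outcome columns (33:290), the same baseline comparison can probably be applied to several columns at once with .SDcols. The following is a sketch under assumptions: the data and the column names OutcomeA and OutcomeB are made up, and in the real data cols might be something like names(dat)[33:290].

```r
library(data.table)

# Made-up data: two volunteers, three visits, two outcome columns.
dat <- data.table(PatientID = rep(1:2, each = 3),
                  time      = rep(1:3, times = 2),
                  OutcomeA  = c(80, 60, 100, 20, 30, 25),
                  OutcomeB  = c(10, 15, 5, 40, 40, 50))

# Hypothetical column selection and names for the new columns.
cols     <- c("OutcomeA", "OutcomeB")
abs_cols <- paste0(cols, ".abs")
rel_cols <- paste0(cols, ".rel")

# Absolute change from baseline (time == 1) for every selected column.
dat[, (abs_cols) := lapply(.SD, function(x) x - x[time == 1]),
    by = PatientID, .SDcols = cols]

# Relative change (%) from baseline for every selected column.
dat[, (rel_cols) := lapply(.SD, function(x) 100 * (x - x[time == 1]) / x[time == 1]),
    by = PatientID, .SDcols = cols]
print(dat)
```

This assumes every volunteer has exactly one row with time == 1; a volunteer without a baseline row would need to be handled separately.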
Source: https://stackoverflow.com/questions/35741538/2-y-axes-plot-with-calculated-absolute-relative-change-in-r