可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I am an inexperienced R user and have been struggling with the By() function and would appreciate your help. The task is simple, I have a longitudinal dataset (How do I declare one?) and need to calculate different metrics by ID. One of the the metrics is a simple percent change (that requires a lag, example below):
ID Date Temp %Change AAA 1/1/2003 0.749881714 NA AAA 1/2/2003 0.666661576 -0.110977687 AAA 1/3/2003 0.773079935 0.159628759 AAA 1/4/2003 0.62902364 -0.186340751 AAA 1/5/2003 0.733312374 0.165794619 BBB 1/1/2003 0.707339766 NA BBB 1/2/2003 0.764986529 0.081497982 BBB 1/3/2003 0.662201467 -0.134361925 BBB 1/4/2003 0.774451765 0.169510798 BBB 1/5/2003 0.50829093 -0.343676453 CCC 1/1/2003 0.836836215 NA CCC 1/2/2003 0.837136823 0.00035922 CCC 1/3/2003 0.809016624 -0.033590924 CCC 1/4/2003 0.690277509 -0.146769685 CCC 1/5/2003 0.796357069 0.153676686
Intuitively I understand the use of By(), but haven't been able to produce the correct result (%Change) using a data.frame that contains $ID, $Date, and $Temp. Any suggestions in how to achieve the desired %Change would be greatly appreciated.
回答1:
You can use a combination of plyr
to handle the group by operation on ID and quantmod
has a function for the percentage change named Delt
.
require(plyr) require(quantmod) > ddply(dat, "ID", transform, DeltaCol = Delt(Temp)) ID Date Temp X.Change Delt.1.arithmetic 1 AAA 1/1/2003 0.7498817 NA NA 2 AAA 1/2/2003 0.6666616 -0.11097769 -0.1109776868 3 AAA 1/3/2003 0.7730799 0.15962876 0.1596287574 4 AAA 1/4/2003 0.6290236 -0.18634075 -0.1863407501 5 AAA 1/5/2003 0.7333124 0.16579462 0.1657946178 6 BBB 1/1/2003 0.7073398 NA NA 7 BBB 1/2/2003 0.7649865 0.08149798 0.0814979813 8 BBB 1/3/2003 0.6622015 -0.13436192 -0.1343619242 9 BBB 1/4/2003 0.7744518 0.16951080 0.1695107963 10 BBB 1/5/2003 0.5082909 -0.34367645 -0.3436764522 11 CCC 1/1/2003 0.8368362 NA NA 12 CCC 1/2/2003 0.8371368 0.00035922 0.0003592196 13 CCC 1/3/2003 0.8090166 -0.03359092 -0.0335909235 14 CCC 1/4/2003 0.6902775 -0.14676969 -0.1467696849 15 CCC 1/5/2003 0.7963571 0.15367669 0.1536766860
Alternatively, you can skip the plyr
bit, calculate the Delta for the entire data.frame and then update the first row for each ID. There are lots of good ideas about selecting the first row of a data.frame based off of an identifier here. Something like this would probably work:
dat$Delta <- Delt(dat$Temp) dat[ diff(c(0,dat$ID)) != 0, 5] <- NA
On a related note, if anyone can explain why Delta doesn't seem to accept my plea to give it a reasonable column name, I'd appreciate it.
回答2:
Since the OP specifically asked about using by()
I thought I'd provide an answer the illustrates it's use.
First you write a function that acts on each 'piece' of the data frame:
myFun <- function(x){ n <- nrow(x) x$Change <- c(NA,diff(x$Temp) / head(x$Temp,n-1)) x }
I've used the base functions diff
to calculate the sequential differences in Temp
and then since the resulting vector has length n-1, we use head
to divide the the differences by all but the last Temp value. (I did this just to work head
in since it's handy; there are lots of other ways to do that).
Then the by
call:
by(dat,dat$ID,FUN=myFun)
If you want to put all the pieces back together again, we can use do.call
and rbind
:
do.call(rbind,by(dat,dat$ID,FUN=myFun))
回答3:
Your suggested output is not "%change" (but rather fractional difference) while this illustrates a method getting "percent of original" using the initial value as the basis for the change:
> dat$pctTemp <- unlist( tapply(dat$Temp, dat$ID, function(x) c(NA, 100*x[-1]/x[1]) ) ) > dat ID Date Temp pctTemp 1 AAA 1/1/2003 0.7498817 NA 2 AAA 1/2/2003 0.6666616 88.90223 3 AAA 1/3/2003 0.7730799 103.09358 4 AAA 1/4/2003 0.6290236 83.88305 5 AAA 1/5/2003 0.7333124 97.79041 6 BBB 1/1/2003 0.7073398 NA 7 BBB 1/2/2003 0.7649865 108.14980 8 BBB 1/3/2003 0.6622015 93.61858 snipped
If you want interval change, you can divide diff(x) by the prceding values
> dat$pctTemp <- unlist(tapply(dat$Temp, dat$ID, function(x) c(NA, 100*diff(x)/x[-length(x)]) ) ) > dat ID Date Temp pctTemp 1 AAA 1/1/2003 0.7498817 NA 2 AAA 1/2/2003 0.6666616 -11.09776868 3 AAA 1/3/2003 0.7730799 15.96287574 4 AAA 1/4/2003 0.6290236 -18.63407501 5 AAA 1/5/2003 0.7333124 16.57946178 6 BBB 1/1/2003 0.7073398 NA 7 BBB 1/2/2003 0.7649865 8.14979813 snipped