I am looking for a function in R similar to the lag1, lag2 and retain functions in SAS which I can use with data.tables.
I know th
You have to be aware that R works very differently from the data step in SAS. The lag function in SAS is used in the data step, within the implicit loop structure of that data step. The same goes for the retain statement, which stops a variable from being reset to missing at each iteration, so a value it holds (such as an initial value) stays constant as the data step loops over the observations.
R, on the other hand, is vectorized: functions operate on whole vectors at once rather than one observation at a time. This means that you have to rethink what you want to do, and adapt accordingly.
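To make the difference concrete, here is a small sketch in R. The first function imitates the SAS data step by walking through the values one at a time and remembering the previous one; the second gets the same result by indexing the whole vector at once. Both function names are made up for this illustration.

```r
# Loop-style lag, imitating how the SAS data step handles one row per iteration
lag_loop <- function(x) {
  out <- rep(NA, length(x))
  prev <- NA                 # value "remembered" from the previous iteration
  for (i in seq_along(x)) {
    out[i] <- prev
    prev <- x[i]
  }
  out
}

# Vectorized lag: shift the whole vector one position using indices
lag_vec <- function(x) c(NA, x[-length(x)])

x <- c(10, 20, 30, 40)
lag_loop(x)  # NA 10 20 30
lag_vec(x)   # NA 10 20 30
```

The vectorized version is shorter and is the style R is built for; the loop only exists here to show the data step mindset translated literally.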
retain is not needed in R for this, as R recycles arguments by default: a single value is repeated to fill the full length. If you want to do this explicitly, you might look at e.g. rep() to construct a vector with constant values and a certain length.
lag is a matter of using indices and shifting the position of all values in a vector. To keep the vector the same length, you need to add some NA values at the start and drop the same number of values at the end. A simple example: this SAS code lags a variable x and adds a variable year that has a constant value:
data one;
retain year 2013;
input x @@;
y=lag1(x);
z=lag2(x);
datalines;
1 2 3 4 5 6
;
In R, you could write your own lag function like this:
mylag <- function(x, k) c(rep(NA, k), head(x, length(x) - k))  # valid for 0 <= k <= length(x)
This single line prepends k NA values to the vector and drops its last k values, so the result has the same length as x. The result is a lagged vector as given by lag1, lag2, etc. in SAS.
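The one-liner is meant for k between 0 and the length of the vector; with a larger k, prepending k NA values makes the result longer than x. A more defensive variant could clamp k first. This is only a sketch, and mylag_safe is a hypothetical name; negative k (a lead) is simply treated as 0 here.

```r
# Hypothetical defensive variant of mylag(): always returns a vector of length(x)
mylag_safe <- function(x, k = 1) {
  n <- length(x)
  k <- min(max(k, 0), n)             # clamp k to the range 0..n
  c(rep(NA, k), x[seq_len(n - k)])   # k leading NAs, then the first n - k values
}

mylag_safe(1:6, 0)  # 1 2 3 4 5 6
mylag_safe(1:6, 2)  # NA NA 1 2 3 4
mylag_safe(1:6, 8)  # six NAs
```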
With mylag(), you can then build the data set like this:
nrs <- 1:6 # equivalent to datalines
one <- data.frame(
x = nrs,
y = mylag(nrs,1),
z = mylag(nrs,2),
year = 2013 # R recycles this single value to every row, so no extra command is needed
)
The result is:
> one
x y z year
1 1 NA NA 2013
2 2 1 NA 2013
3 3 2 1 2013
4 4 3 2 2013
5 5 4 3 2013
6 6 5 4 2013
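If you prefer to spell out the constant column instead of relying on recycling, the rep() approach mentioned for retain builds it with the right length. A small self-contained check that both forms give the same data frame:

```r
nrs <- 1:6

# Constant column via recycling: the single value 2013 is repeated for every row
implicit <- data.frame(x = nrs, year = 2013)

# The same column built explicitly with rep(), one value per row
explicit <- data.frame(x = nrs, year = rep(2013, length(nrs)))

identical(implicit, explicit)  # TRUE
```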
The same approach works with a data.table object, for example by calling data.table() instead of data.frame(). The important point here is to rethink your strategy: instead of thinking loopwise as you do with the DATA step in SAS, you have to start thinking in terms of vectors and indices when using R.
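For the data.table case in the question, one possible sketch, assuming the data.table package is installed, adds the columns by reference with :=. Instead of a hand-written lag it uses data.table's own shift() function, whose default type is "lag" and default fill is NA; this is one idiom among several, not the only way to do it.

```r
library(data.table)

dt <- data.table(x = 1:6)

# shift() with n = 1:2 returns a list holding the lag-1 and lag-2 vectors,
# which := assigns to the new columns y and z
dt[, c("y", "z") := shift(x, n = 1:2)]

# A length-one value is recycled to every row, like SAS retain with a constant
dt[, year := 2013]

dt
```

Printing dt should give the same x, y, z and year values as the data.frame result shown earlier.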