I have a data frame with a large number of variables. I am creating new variables by adding together some of the old ones. The code I am using to do so is:
My first instinct was to suggest sum(), since it lets you use the na.rm argument. However, this doesn't work, because sum() reduces its arguments to a single scalar value, not a vector.
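A quick sketch of why sum() falls short here. The vectors x and y are illustrative values chosen for this example, not taken from the question:

```r
# Illustrative vectors (assumed for this sketch)
x <- c(1, 2, 3, NA)
y <- c(NA, 4, 5, NA)

# sum() pools every value from all of its arguments into one total
total <- sum(x, y, na.rm = TRUE)
total          # 15, a single number

length(total)  # 1, not 4: there is no element-wise result
```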
This means you need to write a parallel sum function. Let's call it psum(), by analogy with the base R functions pmin() and pmax():
psum <- function(..., na.rm=FALSE) {
  # Bind the input vectors as columns of a matrix, then sum across each row
  x <- list(...)
  rowSums(matrix(unlist(x), ncol=length(x)), na.rm=na.rm)
}
Now set up some data and use psum() to get the desired vector:
dat <- data.frame(
  x = c(1, 2, 3, NA),
  y = c(NA, 4, 5, NA))
transform(dat, new=psum(x, y, na.rm=TRUE))
   x  y new
1  1 NA   1
2  2  4   6
3  3  5   8
4 NA NA   0
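Note that row 4, where every input is NA, comes out as 0. If you would rather get NA in that case, here is a minimal sketch of one possible variant. The name psum_na() is made up for this example and is not a base R function:

```r
# psum_na() is a hypothetical variant of psum(), not part of base R
psum_na <- function(..., na.rm = FALSE) {
  x <- list(...)
  m <- matrix(unlist(x), ncol = length(x))
  out <- rowSums(m, na.rm = na.rm)
  if (na.rm) {
    # Rows with no observed value at all become NA instead of 0
    out[rowSums(!is.na(m)) == 0] <- NA
  }
  out
}

res <- psum_na(c(1, 2, 3, NA), c(NA, 4, 5, NA), na.rm = TRUE)
res  # 1 6 8 NA
```

Whether 0 or NA is the right answer for an all-missing row depends on what the new variable is meant to represent.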
Similarly, you can define a parallel product, or pprod(), like this:
pprod <- function(..., na.rm=FALSE) {
  x <- list(...)
  m <- matrix(unlist(x), ncol=length(x))
  # Pass the caller's na.rm through rather than hard-coding TRUE
  apply(m, 1, prod, na.rm=na.rm)
}
transform(dat, new=pprod(x, y, na.rm=TRUE))
   x  y new
1  1 NA   1
2  2  4   8
3  3  5  15
4 NA NA   1
This example of pprod() provides a general template for what you want to do: create a function that uses apply() to summarize a matrix of inputs, row by row, into the desired vector.
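As a rough sketch of that template, the wrapper below takes the summary function as an argument. The names papply() and pmean() are hypothetical, invented for this illustration, and the sketch assumes all inputs have the same length and that the summary function accepts na.rm:

```r
# papply() is a hypothetical helper: apply FUN across parallel vectors, row by row
papply <- function(FUN, ..., na.rm = FALSE) {
  x <- list(...)
  m <- matrix(unlist(x), ncol = length(x))
  apply(m, 1, FUN, na.rm = na.rm)
}

# A parallel mean built on top of it (also a made-up name)
pmean <- function(..., na.rm = FALSE) papply(mean, ..., na.rm = na.rm)

res <- pmean(c(1, 2, 3), c(3, 4, NA), na.rm = TRUE)
res  # 2 3 3
```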
Using rowSums() and prod() could help you out.
set.seed(007) # Generating some data
DF <- data.frame(V1=sample(c(50,NA,36,24,80, NA), 15, replace=TRUE),
                 V2=sample(c(70,40,NA,25,100, NA), 15, replace=TRUE),
                 V3=sample(c(20,26,34,15,78,40), 15, replace=TRUE))
transform(DF, Sum=rowSums(DF, na.rm=TRUE)) # Sum (a vector of values)
transform(DF, Prod=apply(DF, 1, FUN=prod, na.rm=TRUE)) # Prod (a vector of values)
# Defining a function for subtracting (resta, in Spanish :D)
# NAs are dropped first, then the remaining values are subtracted left to right
resta <- function(x) Reduce(function(a,b) a-b, x[!is.na(x)])
transform(DF, Subtracting=apply(DF, 1, resta))
# Defining a function for dividing (same idea, dividing left to right)
div <- function(x) Reduce(function(a,b) a/b, x[!is.na(x)])
transform(DF, Division=apply(DF, 1, div))
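One caveat with the Reduce() approach: for a row where every value is NA, nothing is left after filtering, and Reduce() returns NULL on empty input, so apply() can end up returning a list instead of a vector. The data above avoids this because V3 has no NAs. A minimal sketch of a guarded version follows; the name resta_safe() and the small matrix m are made up for this example:

```r
# resta_safe() is a hypothetical name; it mirrors resta() but guards the empty case
resta_safe <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)  # nothing left to subtract
  Reduce(function(a, b) a - b, x)
}

# Illustrative matrix with one all-NA row
m <- rbind(c(10, NA, 3), c(NA, NA, NA))
res <- apply(m, 1, resta_safe)
res  # 7 NA
```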