Add variables whilst ignoring NAs using the transform function

没有蜡笔的小新 2021-01-03 05:14

I have a data frame with a large number of variables. I am creating new variables by adding together some of the old ones. The code I am using to do so is:

2 Answers
  • 2021-01-03 05:49

    My first instinct was to suggest sum(), since it has an na.rm argument. However, that doesn't work here: sum() reduces all of its arguments to a single scalar value, not a vector with one element per row.
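
    To make the problem concrete, here is a small sketch comparing sum() with an element-wise alternative. The vectors x and y are made up for illustration and are not taken from the question:

    ```r
    # Illustrative vectors (an assumption, not the asker's data)
    x <- c(1, 2, 3, NA)
    y <- c(NA, 4, 5, NA)

    # sum() pools every value from both vectors into one number
    total <- sum(x, y, na.rm = TRUE)
    total
    # [1] 15

    # rowSums() on the column-bound vectors keeps one result per position
    per_row <- rowSums(cbind(x, y), na.rm = TRUE)
    per_row
    # [1] 1 6 8 0
    ```

    The second form is what a new column in transform() needs: one value per row.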

    This means you need to write a parallel sum function. Let's call it psum(), by analogy with the base R functions pmin() and pmax():

    psum <- function(..., na.rm=FALSE) { 
      x <- list(...)
      rowSums(matrix(unlist(x), ncol=length(x)), na.rm=na.rm)
    } 
    

    Now set up some data and use psum() to get the desired vector:

    dat <- data.frame(
      x = c(1,2,3, NA),
      y = c(NA, 4, 5, NA))
    
    transform(dat, new=psum(x, y, na.rm=TRUE))
       x  y new
    1  1 NA   1
    2  2  4   6
    3  3  5   8
    4 NA NA   0
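
    Note that row 4, where every input is NA, comes out as 0, because the sum of no values is 0. If NA is preferable in that case, one possible variant is sketched below. The name psum_na is hypothetical and not a base R function:

    ```r
    # Hypothetical variant of psum(): rows where every input is NA return NA
    psum_na <- function(...) {
      m <- cbind(...)                       # one column per input vector
      out <- rowSums(m, na.rm = TRUE)       # NA-tolerant row sums
      out[rowSums(!is.na(m)) == 0] <- NA    # no observed values in this row
      out
    }

    dat <- data.frame(
      x = c(1, 2, 3, NA),
      y = c(NA, 4, 5, NA))

    res <- transform(dat, new = psum_na(x, y))
    res$new
    # [1]  1  6  8 NA
    ```

    Whether 0 or NA is the right answer for an all-NA row depends on what the new variable is meant to represent.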
    

    Similarly, you can define a parallel product, or pprod() like this:

    pprod <- function(..., na.rm=FALSE) { 
      x <- list(...)
      m <- matrix(unlist(x), ncol=length(x))
      apply(m, 1, prod, na.rm=na.rm)
    } 
    
    transform(dat, new=pprod(x, y, na.rm=TRUE))
       x  y new
    1  1 NA   1
    2  2  4   8
    3  3  5  15
    4 NA NA   1
    

    This example of pprod provides a general template for what you want to do: Create a function that uses apply() to summarize a matrix of input into the desired vector.
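
    As a rough illustration of that template, the summary function itself can be passed in as an argument. The helper name papply is hypothetical, and the sketch assumes the summary function accepts an na.rm argument, as sum(), prod() and mean() do:

    ```r
    # Hypothetical helper: apply any summary function row-wise across vectors
    papply <- function(FUN, ..., na.rm = FALSE) {
      m <- cbind(...)                       # one column per input vector
      apply(m, 1, FUN, na.rm = na.rm)       # FUN must accept an na.rm argument
    }

    dat <- data.frame(
      x = c(1, 2, 3, NA),
      y = c(NA, 4, 5, NA))

    res <- transform(dat,
                     total = papply(sum, x, y, na.rm = TRUE),
                     avg   = papply(mean, x, y, na.rm = TRUE))
    # total is 1, 6, 8, 0; avg is 1, 3, 4 and NaN for the all-NA row
    ```

    For sums specifically, rowSums() is likely to be faster than apply() on large data, so a generic helper like this is mainly useful for functions without a vectorised row-wise equivalent.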

  • 2021-01-03 05:49

    Using rowSums and prod could help you out.

    set.seed(007) # Generating some data
    DF <- data.frame(V1=sample(c(50,NA,36,24,80, NA), 15, replace=TRUE),
                     V2=sample(c(70,40,NA,25,100, NA), 15, replace=TRUE),
                     V3=sample(c(20,26,34,15,78,40), 15, replace=TRUE))
    
    transform(DF, Sum=rowSums(DF, na.rm=TRUE)) # Sum (a vector of values)
    transform(DF, Prod=apply(DF, 1, FUN=prod, na.rm=TRUE)) # Prod (a vector of values)
    
    # Defining a function for subtracting ("resta" in Spanish :D)
    resta <- function(x) Reduce(function(a, b) a - b, x[!is.na(x)])
    transform(DF, Subtracting=apply(DF, 1, resta))
    
    # Defining a function for dividing
    div <- function(x) Reduce(function(a, b) a / b, x[!is.na(x)])
    transform(DF, Division=apply(DF, 1, div))
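
    Since the question is about adding together only some of the variables, the same call can be restricted to a subset of columns. A small sketch, assuming the column names V1 to V3 from the example above but with fixed, made-up values so the result is easy to check:

    ```r
    # Illustrative data with fixed values (an assumption, not the sampled data)
    DF <- data.frame(V1 = c(50, NA, 36),
                     V2 = c(70, 40, NA),
                     V3 = c(20, 26, 34))

    # Sum only V1 and V2, ignoring NAs; V3 is left out of the total
    res <- transform(DF, Sum12 = rowSums(DF[c("V1", "V2")], na.rm = TRUE))
    # The three rows give 120, 40 and 36
    ```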
    