Count occurrences of value in a set of variables in R (per row)

-上瘾入骨i 2021-01-06 01:16

Let's say I have a data frame with 10 numeric variables V1-V10 (columns) and multiple rows (cases).

What I would like R to do is: For each case, give me the number

5 answers
  •  日久生厌
    2021-01-06 01:34

    Here is another straightforward solution that comes closest to what the COUNT command in SPSS does: it creates a new variable that, for each case (i.e., row), counts the occurrences of a given value or list of values across a list of variables.

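    # Let df be a data frame with four variables (V1-V4)
    df <- data.frame(V1 = c(1, 1, 2, 1, NA), V2 = c(1, NA, 2, 2, NA),
                     V3 = c(1, 2, 2, 1, NA), V4 = c(NA, NA, 1, 2, NA))

    # Compute a new variable counting occurrences of the value 1 in V1-V4.
    # which() drops NA comparisons, so missing values are not counted.
    # (df holds only V1-V4 at this point, so all columns are used.)
    df$count.1 <- apply(df, 1, function(x) length(which(x == 1)))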
    

    The updated data frame contains the new variable count.1 exactly as the SPSS COUNT command would do.

    > df
      V1 V2 V3 V4 count.1
    1  1  1  1 NA       3
    2  1 NA  2 NA       1
    3  2  2  2  1       1
    4  1  2  1  2       2
    5 NA NA NA NA       0
    

    You can do the same to count how many times the value "2" occurs per row in V1-V4. Because df now also contains count.1, you need to select the columns (variables) in df to which the function is applied; otherwise count.1 itself would be included in the count.

    df$count.2 <- apply(df[1:4], 1, function(x) length(which(x==2)))
    

    You can also apply a similar logic to count the number of missing values in V1-V4.

    df$count.na <- apply(df[1:4], 1, function(x) sum(is.na(x)))
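
    As a side note that is not part of the original answer: the same counts can probably be obtained without apply(), using base R's rowSums() on a logical matrix. This is only a minimal sketch; the names vars, count_1, count_2 and count_na are made up for illustration.

    ```r
    # Same example data as above.
    df <- data.frame(V1 = c(1, 1, 2, 1, NA), V2 = c(1, NA, 2, 2, NA),
                     V3 = c(1, 2, 2, 1, NA), V4 = c(NA, NA, 1, 2, NA))
    vars <- c("V1", "V2", "V3", "V4")  # hypothetical name: the columns to scan

    # Comparing a data frame with a value yields a logical matrix
    # (NA where the cell is NA); na.rm = TRUE makes rowSums() skip those NAs.
    count_1 <- rowSums(df[vars] == 1, na.rm = TRUE)
    count_2 <- rowSums(df[vars] == 2, na.rm = TRUE)

    # is.na() on a data frame also yields a logical matrix, with no NAs in it.
    count_na <- rowSums(is.na(df[vars]))
    ```

    Selecting the columns by name rather than by position keeps the counts unaffected when new count columns are later appended to df.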
    

    The final result should be exactly what you wanted:

    > df
      V1 V2 V3 V4 count.1 count.2 count.na
    1  1  1  1 NA       3       0        1
    2  1 NA  2 NA       1       1        2
    3  2  2  2  1       1       3        0
    4  1  2  1  2       2       2        0
    5 NA NA NA NA       0       0        4
    

    This solution can easily be generalized to a set of values. Suppose we want to count how many times a value of 1 or 2 occurs in V1-V4 per row (%in% returns FALSE for missing values, so sum() can be used directly):

    df$count.1or2 <- apply(df[1:4], 1, function(x) sum(x %in% c(1,2)))
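
    If you need this for several value sets, the %in% idea could be wrapped in a small helper. Again, this is only a sketch and not part of the original answer; count_values and its arguments (data, cols, values) are hypothetical names.

    ```r
    # Hypothetical helper: per-row count of cells in `cols` matching any of `values`.
    count_values <- function(data, cols, values) {
      # One logical column per variable; %in% never returns NA,
      # so missing cells simply count as "no match".
      hits <- vapply(data[cols], function(col) col %in% values,
                     logical(nrow(data)))
      # vapply() returns a plain vector for a one-row data frame,
      # so force the rows-by-variables matrix shape before summing.
      hits <- matrix(hits, nrow = nrow(data))
      rowSums(hits)
    }

    # Same example data as above.
    df <- data.frame(V1 = c(1, 1, 2, 1, NA), V2 = c(1, NA, 2, 2, NA),
                     V3 = c(1, 2, 2, 1, NA), V4 = c(NA, NA, 1, 2, NA))

    count_values(df, 1:4, c(1, 2))   # 3 2 4 4 0
    count_values(df, 1:4, 2)         # 0 1 3 2 0
    ```

    Passing the columns explicitly (here 1:4) serves the same purpose as df[1:4] in the apply() calls: count columns added to df earlier are left out of the count.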
    
