Nested if else statements over a number of columns

情深已故 2020-12-03 00:07

I have a large data.frame where the first three columns contain information about a marker. The remaining columns are of numeric type for that

4 answers
  •  抹茶落季
    2020-12-03 00:58

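    Here is my approach using the function pmax, which returns the row-wise (parallel) maximum of its arguments. Note that if two or more values for an individual are above 0.8, this gives you the largest of them; if none is above 0.8, the result is NA: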

    df <- read.table(textConnection("                      marker alleleA alleleB   X818 X818.1 X818.2   X345 X345.1 X345.2   X346 X346.1 X346.2
    1   kgp5209280_chr3_21902067       T       A 0.0000 1.0000 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000
    2 chr3_21902130_21902131_A_T       A       T 0.8626 0.1356 0.0018 0.7676 0.2170 0.0154 0.8626 0.1356 0.0018
    3 chr3_21902134_21902135_T_C       T       C 0.6982 0.2854 0.0164 0.5617 0.3749 0.0634 0.6982 0.2854 0.0164"), header=TRUE)
    
    #data.table solution
    library(data.table)
    DT <- as.data.table(df)
    DT[, M818 := ifelse(pmax(X818, X818.1, X818.2) > 0.8, pmax(X818, X818.1, X818.2), NA)]
    DT[, M345 := ifelse(pmax(X345, X345.1, X345.2) > 0.8, pmax(X345, X345.1, X345.2), NA)]
    DT[, M346 := ifelse(pmax(X346, X346.1, X346.2) > 0.8, pmax(X346, X346.1, X346.2), NA)]
    
    #Base R solution
    df$M818 <- ifelse(pmax(df$X818, df$X818.1, df$X818.2) > 0.8, pmax(df$X818, df$X818.1, df$X818.2), NA)
    df$M345 <- ifelse(pmax(df$X345, df$X345.1, df$X345.2) > 0.8, pmax(df$X345, df$X345.1, df$X345.2), NA)
    df$M346 <- ifelse(pmax(df$X346, df$X346.1, df$X346.2) > 0.8, pmax(df$X346, df$X346.1, df$X346.2), NA)
    

    If you want to get rid of the other columns, just type:

    DT[, list(marker, alleleA, alleleB, M818, M345, M346)]
                           marker alleleA alleleB   M818 M345   M346
    1:   kgp5209280_chr3_21902067       T       A 1.0000    1 1.0000
    2: chr3_21902130_21902131_A_T       A       T 0.8626   NA 0.8626
    3: chr3_21902134_21902135_T_C       T       C     NA   NA     NA
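
    The answer above writes one line per individual. With many individuals, the same pmax/ifelse step could be run in a loop over the column groups. The following is only a sketch in base R, assuming every individual has exactly three columns named like X818, X818.1, X818.2; the small stand-in data frame, the regular expression, and the names threshold, ids, cols and row_max are illustrative assumptions, not part of the original answer:

    ```r
    # Stand-in for the data above: same column layout, two individuals only
    df <- data.frame(
      marker = c("m1", "m2", "m3"),
      X818   = c(0.0000, 0.8626, 0.6982),
      X818.1 = c(1.0000, 0.1356, 0.2854),
      X818.2 = c(0.0000, 0.0018, 0.0164),
      X345   = c(1.0000, 0.7676, 0.5617),
      X345.1 = c(0.0000, 0.2170, 0.3749),
      X345.2 = c(0.0000, 0.0154, 0.0634)
    )

    threshold <- 0.8  # cutoff used in the answer

    # Individual IDs: column names without a ".1"/".2" suffix (assumed naming scheme)
    ids <- grep("^X[0-9]+$", names(df), value = TRUE)

    for (id in ids) {
      cols <- paste0(id, c("", ".1", ".2"))   # e.g. X818, X818.1, X818.2
      row_max <- do.call(pmax, df[cols])      # row-wise maximum of the three columns
      # Keep the maximum only where it exceeds the threshold, otherwise NA
      df[[sub("^X", "M", id)]] <- ifelse(row_max > threshold, row_max, NA)
    }

    print(df[c("marker", "M818", "M345")])
    ```

    The loop adds one M column per individual, using the same rule as the hand-written lines, so new individuals are picked up without adding code as long as their columns follow the assumed naming pattern.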
    
