Determine if Value in Final Column exists in respective rows

Submitted by 隐身守侯 on 2019-12-23 04:22:34

Question


I have a dataframe as follows:

df1

ColA     ColB    ColC    ColD     ColE     COlF    ColG      Recs
   1      A-1   A - 3       B        B       NA                 C
   1              B-1     C R        D        E      NA         B
   1       NA       A       B        A                          B

How do I determine whether the value in the last column, Recs, is found elsewhere in its respective row?

I tried the code below, but it doesn't work because my real dataset contains duplicate values within rows:

df1$Exist <- apply(df1, 1, FUN = function(x) 
            c("No", "Yes")[(anyDuplicated(x[!is.na(x) & x != "" ])!=0) +1])

There are also blanks, NAs, and character values that contain spaces and dashes.

Final output should be:

ColA     ColB    ColC    ColD     ColE     COlF    ColG      Recs    Exist?
   1      A-1   A - 3       B        B       NA                 C        No
   1              B-1     C R        D        E      NA         B        No
   1       NA       A       B        A                          B       Yes

Thanks


Answer 1:


If I understood you correctly, this should work:

# Compute column index of reference variable
col_ind <- which(colnames(df1) == "Recs")

# Compute boolean vector of presence
present_bool <- apply(df1, 1, function(row) {
  any(row[col_ind] == row[-col_ind], na.rm = TRUE)
})

# Create the desired column
df1$Exist <- ifelse(present_bool, "Yes", "No")
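The same row-wise comparison can also be written without apply(). The following is only a sketch under stated assumptions: the sample data is rebuilt from the dput() output posted in another answer, and the names `others` and `hits` are introduced here purely for illustration. Comparing a data frame with a vector of length nrow() compares every column element by element against that vector, so each cell is tested against the Recs value of its own row.

```r
# Sample data, rebuilt from the dput() output shown elsewhere in this thread
df1 <- data.frame(
  ColA = c(1L, 1L, 1L),
  ColB = c("A-1", "", NA),
  ColC = c("A-3", "B-1", "A"),
  ColD = c("B", "C R", "B"),
  ColE = c("B", "D", "A"),
  COlF = c(NA, "E", ""),
  ColG = c(NA, NA, NA),
  Recs = c("C", "B", "B"),
  stringsAsFactors = FALSE
)

# All columns except the reference column (illustrative name)
others <- setdiff(names(df1), "Recs")

# Logical matrix: TRUE where a cell equals the Recs value of its row, NA where the cell is NA
hits <- df1[others] == df1$Recs

# A row counts as a match if at least one non-NA cell is equal to Recs
df1$Exist <- ifelse(rowSums(hits, na.rm = TRUE) > 0, "Yes", "No")
print(df1$Exist)
```

On the three sample rows this gives No, No, Yes. One caveat: an empty string in Recs would match any blank cell in the same row, so blanks may need to be filtered out first if that is not wanted.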



Answer 2:


For efficiency, you could use data.table here.

library(data.table)
setDT(df)[, Exist := Recs %chin% unlist(.SD), .SDcols=-"Recs", by=1:nrow(df)]

which gives

   ColA ColB ColC ColD ColE COlF ColG Recs  Exist
1:    1  A-1  A-3    B    B   NA   NA    C  FALSE
2:    1       B-1  C R    D    E   NA    B  FALSE
3:    1   NA    A    B    A        NA    B   TRUE

Original data:

df <-structure(list(ColA = c(1L, 1L, 1L), ColB = c("A-1", "", NA), 
    ColC = c("A-3", "B-1", "A"), ColD = c("B", "C R", "B"), ColE = c("B", 
    "D", "A"), COlF = c(NA, "E", ""), ColG = c(NA, NA, NA), Recs = c("C", 
    "B", "B")), .Names = c("ColA", "ColB", "ColC", "ColD", "ColE", 
"COlF", "ColG", "Recs"), row.names = c(NA, -3L), class = "data.frame")



Answer 3:


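# Pre-allocate the result, then test each row in turn
exist <- logical(nrow(df1))
for (i in seq_len(nrow(df1))) {
  # Convert the first seven cells of row i to character before matching
  row_values <- vapply(df1[i, 1:7], as.character, character(1))
  exist[i] <- df1$Recs[i] %in% row_values
}
df1 <- cbind(df1, exist)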
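The loop produces TRUE/FALSE, whereas the question asks for "Yes"/"No". A minimal sketch of the same idea wrapped in a function follows. The helper name `row_contains` and its `target` argument are hypothetical, and the sketch assumes the columns are character vectors rather than factors, since unlist() would otherwise return factor codes.

```r
# Hypothetical helper: flag rows whose `target` value appears in any other column.
# Assumes character (not factor) columns.
row_contains <- function(data, target = "Recs") {
  others <- setdiff(names(data), target)
  found <- vapply(seq_len(nrow(data)), function(i) {
    # unlist() flattens the one-row data frame into a plain vector before matching
    data[[target]][i] %in% unlist(data[i, others], use.names = FALSE)
  }, logical(1))
  ifelse(found, "Yes", "No")
}

# Sample data, rebuilt from the dput() output shown elsewhere in this thread
df1 <- data.frame(
  ColA = c(1L, 1L, 1L),
  ColB = c("A-1", "", NA),
  ColC = c("A-3", "B-1", "A"),
  ColD = c("B", "C R", "B"),
  ColE = c("B", "D", "A"),
  COlF = c(NA, "E", ""),
  ColG = c(NA, NA, NA),
  Recs = c("C", "B", "B"),
  stringsAsFactors = FALSE
)

df1$Exist <- row_contains(df1)
print(df1$Exist)
```

Note that %in% treats NA as matching NA, so a row whose Recs value is NA would be flagged "Yes" whenever another cell in that row is also NA. If that is not wanted, such rows need to be handled separately.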



Answer 4:


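This should be another way of obtaining the desired result, using pattern matching:

# Anchor the pattern with ^ and $ so that "B" matches only a cell that is exactly "B", not "B-1".
# Assumes the values in column 8 contain no regex metacharacters.
f.checkExist <- function(x) {
  any(grepl(paste0("^", df[x, 8], "$"), vapply(df[x, 1:7], as.character, character(1))))
}

df$exists <- vapply(seq_len(nrow(df)), f.checkExist, logical(1))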
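One thing to watch with grepl() is that it looks for the pattern anywhere inside each string. In the sample data, row 2 has Recs equal to "B" and ColC equal to "B-1", so an unanchored pattern would report a match even though the expected result for that row is "No". A short illustration (the variable names are for illustration only):

```r
# Unanchored pattern: "B" is found inside "B-1"
partial <- grepl("B", "B-1")

# Anchored pattern: the whole string must be exactly "B"
anchored <- grepl("^B$", "B-1")

# Exact comparison with %in% avoids regular expressions altogether
exact <- "B" %in% "B-1"

print(c(partial = partial, anchored = anchored, exact = exact))
```

Here `partial` is TRUE while `anchored` and `exact` are both FALSE. When the values may contain characters that are special in regular expressions, exact comparison with == or %in% is usually the safer choice.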


Source: https://stackoverflow.com/questions/43022095/determine-if-value-in-final-column-exists-in-respective-rows
