Question
I want to create a new column in my data frame that is either TRUE or FALSE depending on whether a term occurs in either of two specified columns. This is some example data:
AB <- c('CHINAS PARTY CONGRESS','JAPAN-US RELATIONS','JAPAN TRIES TO')
TI <- c('AMERICAN FOREIGN POLICY', 'CHINESE ATTEMPTS TO', 'BRITAIN HAS TEA')
AU <- c('AUTHOR 1', 'AUTHOR 2','AUTHOR 3')
M <- data.frame(AB,TI,AU)
I can do it for one column or the other, but I cannot figure out how to do it for both. In other words, I don't know how to combine these two lines so that they do not overwrite each other.
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=M$AB)
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=M$TI)
It is important that I specify the columns; I cannot choose the whole data.frame. I have looked for other similar questions, but none seemed to apply to my case and I haven't been able to adapt any existing examples. This is what would make sense to me:
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=(M$AB|M$TI))
Answer 1:
Using:
M$China <- !!rowSums(sapply(M[1:2], grepl, pattern = "CHINA|CHINESE|SINO"))
gives:
> M
                     AB                      TI       AU China
1 CHINAS PARTY CONGRESS AMERICAN FOREIGN POLICY AUTHOR 1  TRUE
2    JAPAN-US RELATIONS     CHINESE ATTEMPTS TO AUTHOR 2  TRUE
3        JAPAN TRIES TO         BRITAIN HAS TEA AUTHOR 3 FALSE
What this does:
- sapply(M[1:2], grepl, pattern = "CHINA|CHINESE|SINO") loops over the two columns AB and TI and checks whether one of the parts of the pattern ("CHINA|CHINESE|SINO") is present.
- The sapply call returns a matrix of TRUE/FALSE values:
         AB    TI
  [1,]  TRUE FALSE
  [2,] FALSE  TRUE
  [3,] FALSE FALSE
- With rowSums you check how many TRUE values each row has.
- By adding !! in front of rowSums you convert all values from the rowSums call higher than zero to TRUE and all zeros to FALSE.
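The one-liner above can be unpacked into its intermediate steps. This is only a sketch for illustration: the names pattern, hits and n_hits are not part of the original answer, and the columns are selected by name rather than by position.

```r
# Rebuild the example data from the question
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)
pattern <- "CHINA|CHINESE|SINO"  # illustrative variable name

# Step 1: one logical column per searched column (a 3 x 2 matrix)
hits <- sapply(M[c("AB", "TI")], grepl, pattern = pattern)

# Step 2: count how many of the searched columns match in each row
n_hits <- rowSums(hits)

# Step 3: a count above zero means at least one column matched;
# n_hits > 0 gives the same logical values as !!n_hits
M$China <- n_hits > 0
```

Writing n_hits > 0 instead of !! is a readability choice; both yield TRUE for any non-zero count.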
Answer 2:
If we need to collapse the result to a single vector, use Map to loop through the columns and apply the pattern, which gives a list of logical vectors, then Reduce that list to a single logical vector using |
M$China <- Reduce(`|`, Map(grepl, "CHINA|CHINESE|SINO", M))
M
# AB TI AU China
#1 CHINAS PARTY CONGRESS AMERICAN FOREIGN POLICY AUTHOR 1 TRUE
#2 JAPAN-US RELATIONS CHINESE ATTEMPTS TO AUTHOR 2 TRUE
#3 JAPAN TRIES TO BRITAIN HAS TEA AUTHOR 3 FALSE
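Since the question asks for specific columns rather than the whole data frame, the same Map/Reduce idea can be limited to a subset. The sketch below assumes a helper vector cols (a name introduced here for illustration) and checks the result against a direct combination of two grepl calls with |.

```r
# Rebuild the example data from the question
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)
pattern <- "CHINA|CHINESE|SINO"  # illustrative variable name
cols <- c("AB", "TI")            # illustrative: the columns to search

# Map applies grepl to each selected column; Reduce ORs the results together
via_reduce <- Reduce(`|`, Map(grepl, pattern, M[cols]))

# For just two columns, the same thing written out by hand
direct <- grepl(pattern, M$AB) | grepl(pattern, M$TI)

# Both approaches should agree row by row
stopifnot(identical(via_reduce, direct))

M$China <- via_reduce
```

The hand-written form is probably the closest working equivalent of the x=(M$AB|M$TI) attempt in the question: the | has to combine the logical results of grepl, not the text columns themselves.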
Or using the same methodology in tidyverse
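library(tidyverse)
M %>%
  # ~ str_detect(...) is a purrr-style lambda applied to every column
  mutate_all(~ str_detect(., "CHINA|CHINESE|SINO")) %>%
  # OR the resulting logical columns together into one vector
  reduce(`|`) %>%
  # attach that vector to the original data frame as a new column
  mutate(M, China = .)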
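As an alternative that names the columns explicitly, more recent dplyr versions offer if_any(). This is a sketch under the assumption that dplyr 1.0.4 or later is installed; it is not part of the original answer, and pattern is an illustrative variable name.

```r
library(dplyr)

# Rebuild the example data from the question
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)
pattern <- "CHINA|CHINESE|SINO"  # illustrative variable name

# if_any() is TRUE for a row when the predicate holds in at least one
# of the selected columns (assumes dplyr >= 1.0.4)
M <- M %>%
  mutate(China = if_any(c(AB, TI), ~ grepl(pattern, .x)))
```

Here only AB and TI are searched, so a match in AU would not affect the China column.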
Source: https://stackoverflow.com/questions/47941680/grepl-across-multiple-specified-columns