Question
I want to create a new column in my data frame that is either TRUE or FALSE depending on whether a term occurs in two specified columns. This is some example data:
AB <- c('CHINAS PARTY CONGRESS','JAPAN-US RELATIONS','JAPAN TRIES TO')
TI <- c('AMERICAN FOREIGN POLICY', 'CHINESE ATTEMPTS TO', 'BRITAIN HAS TEA')
AU <- c('AUTHOR 1', 'AUTHOR 2','AUTHOR 3')
M <- data.frame(AB,TI,AU)
I can do it for one column or the other, but I cannot figure out how to do it for both. In other words, I don't know how to combine these two lines so that the second does not overwrite the result of the first.
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=M$AB)
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=M$TI)
It is important that I specify the columns; I cannot search the whole data.frame. I have looked at similar questions, but none seemed to apply to my case, and I haven't been able to adapt any existing examples. This is what would make sense to me:
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=(M$AB|M$TI))
Answer 1:
Using:
M$China <- !!rowSums(sapply(M[1:2], grepl, pattern = "CHINA|CHINESE|SINO"))
gives:
> M
                     AB                      TI       AU China
1 CHINAS PARTY CONGRESS AMERICAN FOREIGN POLICY AUTHOR 1  TRUE
2    JAPAN-US RELATIONS     CHINESE ATTEMPTS TO AUTHOR 2  TRUE
3        JAPAN TRIES TO         BRITAIN HAS TEA AUTHOR 3 FALSE
What this does:
- sapply(M[1:2], grepl, pattern = "CHINA|CHINESE|SINO") loops over the two columns AB and TI and checks whether any of the alternatives in the pattern ("CHINA|CHINESE|SINO") is present.
- The sapply call returns a matrix of TRUE/FALSE values:
          AB    TI
  [1,]  TRUE FALSE
  [2,] FALSE  TRUE
  [3,] FALSE FALSE
- With rowSums you count how many TRUE values each row has.
- By adding !! in front of rowSums you convert every value from the rowSums call that is higher than zero to TRUE and every zero to FALSE.
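The steps above can also be written out one at a time. This is only an illustrative sketch: the intermediate names hits and n_hits are not part of the answer, and the columns are selected by name rather than by position, which is an assumption about what reads more clearly when specific columns must be named.

```r
# Example data from the question
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)

# Step 1: one logical column per searched column (a 3 x 2 logical matrix)
hits <- sapply(M[c("AB", "TI")], grepl, pattern = "CHINA|CHINESE|SINO")

# Step 2: number of searched columns that matched in each row
n_hits <- rowSums(hits)

# Step 3: any non-zero count becomes TRUE; n_hits > 0 is equivalent to !!n_hits
M$China <- n_hits > 0
```

Writing the last step as n_hits > 0 instead of !! gives the same result and may be easier to read for people unfamiliar with the double negation idiom.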
Answer 2:
If we need to collapse to a single vector, use Map to loop through the columns and apply the pattern to get a list of logical vectors, then Reduce it to a single logical vector using |:
# Restrict the search to the two specified columns rather than the whole data frame
M$China <- Reduce(`|`, Map(grepl, "CHINA|CHINESE|SINO", M[c("AB", "TI")]))
M
# AB TI AU China
#1 CHINAS PARTY CONGRESS AMERICAN FOREIGN POLICY AUTHOR 1 TRUE
#2 JAPAN-US RELATIONS CHINESE ATTEMPTS TO AUTHOR 2 TRUE
#3 JAPAN TRIES TO BRITAIN HAS TEA AUTHOR 3 FALSE
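Since the question requires naming the columns explicitly, the same Reduce idea could be wrapped in a small function. The sketch below is only one possible arrangement: flag_any is a hypothetical helper name, not something from base R or from the answer, and the "JAPAN" pattern is added purely to illustrate reuse on the question's example data.

```r
# Hypothetical helper: TRUE for rows where `pattern` matches in any of `cols`
flag_any <- function(df, cols, pattern) {
  # One logical vector per selected column
  matches <- lapply(df[cols], grepl, pattern = pattern)
  # Combine the vectors element-wise with OR
  Reduce(`|`, matches)
}

# Example data from the question
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)

M$China <- flag_any(M, c("AB", "TI"), "CHINA|CHINESE|SINO")
M$Japan <- flag_any(M, c("AB", "TI"), "JAPAN")
```

Passing the column names as a character vector keeps the AU column out of the search, so a term appearing in an author name would not affect the result.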
Or using the same methodology in tidyverse
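library(tidyverse)
# funs() and mutate_all() are deprecated in current dplyr; across() (dplyr >= 1.0) replaces them
M %>%
  select(AB, TI) %>%                      # only the two specified columns
  mutate(across(everything(), ~ str_detect(.x, "CHINA|CHINESE|SINO"))) %>%
  reduce(`|`) %>%                         # element-wise OR across the columns
  mutate(M, China = .)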
Source: https://stackoverflow.com/questions/47941680/grepl-across-multiple-specified-columns