grepl across multiple, specified columns

ぐ巨炮叔叔 submitted on 2019-12-06 05:59:29

Question


I want to create a new column in my data frame that is either TRUE or FALSE depending on whether a term occurs in two specified columns. This is some example data:

AB <- c('CHINAS PARTY CONGRESS','JAPAN-US RELATIONS','JAPAN TRIES TO')
TI <- c('AMERICAN FOREIGN POLICY', 'CHINESE ATTEMPTS TO', 'BRITAIN HAS TEA')
AU <- c('AUTHOR 1', 'AUTHOR 2','AUTHOR 3')
M  <- data.frame(AB,TI,AU)

I can do it for one column or for the other, but I cannot figure out how to do it for both. In other words, I don't know how to combine these two lines so that the second does not overwrite the result of the first.

M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=M$AB)
M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=M$TI)

It is important that I specify the columns; I cannot search the whole data.frame. I have looked at other similar questions, but none seemed to apply to my case and I haven't been able to adapt any existing examples. This is what would make sense to me:

M$China <- mapply(grepl, "CHINA|CHINESE|SINO", x=(M$AB|M$TI))

Answer 1:


Using:

M$China <- !!rowSums(sapply(M[1:2], grepl, pattern = "CHINA|CHINESE|SINO"))

gives:

> M
                     AB                      TI       AU China
1 CHINAS PARTY CONGRESS AMERICAN FOREIGN POLICY AUTHOR 1  TRUE
2    JAPAN-US RELATIONS     CHINESE ATTEMPTS TO AUTHOR 2  TRUE
3        JAPAN TRIES TO         BRITAIN HAS TEA AUTHOR 3 FALSE

What this does:

  • sapply(M[1:2], grepl, pattern = "CHINA|CHINESE|SINO") loops over the first two columns (AB and TI) and checks, for each value, whether any alternative in the pattern ("CHINA|CHINESE|SINO") is present.
  • The sapply-call returns a matrix of TRUE/FALSE values:

            AB    TI
    [1,]  TRUE FALSE
    [2,] FALSE  TRUE
    [3,] FALSE FALSE
    
  • With rowSums you check how many TRUE-values each row has.

  • By adding !! in front of rowSums you convert all values from the rowSums-call higher than zero to TRUE and all zeros to FALSE.
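
The steps above can be spelled out one at a time. This is only a minimal sketch using the example data from the question; the intermediate names hits and n_hits are illustrative, and the columns are selected by name rather than by position on the assumption that this is more robust if the column order changes.

```r
# Example data from the question
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)
pattern <- "CHINA|CHINESE|SINO"

# Step 1: one logical column per searched column (a 3 x 2 matrix here)
hits <- sapply(M[c("AB", "TI")], grepl, pattern = pattern)

# Step 2: count how many of the searched columns match in each row
n_hits <- rowSums(hits)

# Step 3: any count above zero becomes TRUE (same result as !!n_hits)
M$China <- n_hits > 0
```

Writing n_hits > 0 gives the same logical vector as !!n_hits and may be easier to read. Note that for a data frame with a single row, sapply would return a vector instead of a matrix, so rowSums would fail in that case.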



Answer 2:


To collapse the result to a single vector, use Map to loop over the columns and apply the pattern to each one, which gives a list of logical vectors, then Reduce that list to a single logical vector using |

M$China <- Reduce(`|`, Map(grepl, "CHINA|CHINESE|SINO", M[c("AB", "TI")]))
M
#                     AB                      TI       AU China
#1 CHINAS PARTY CONGRESS AMERICAN FOREIGN POLICY AUTHOR 1  TRUE
#2    JAPAN-US RELATIONS     CHINESE ATTEMPTS TO AUTHOR 2  TRUE
#3        JAPAN TRIES TO         BRITAIN HAS TEA AUTHOR 3 FALSE
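
For just two columns, the Reduce step amounts to a single element-wise OR of two grepl calls, which may be the most direct way to combine the two lines from the question. The sketch below shows both forms on the example data; the names pattern, cols and China2 are illustrative and not part of the original answer.

```r
# Example data from the question
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)
pattern <- "CHINA|CHINESE|SINO"

# Two columns: combine the two grepl results with element-wise OR
M$China <- grepl(pattern, M$AB) | grepl(pattern, M$TI)

# Any number of named columns: build a list of logical vectors, then OR them
cols <- c("AB", "TI")
M$China2 <- Reduce(`|`, lapply(M[cols], grepl, pattern = pattern))
```

Both columns hold the same values; the second form only pays off when the list of columns to search is long or is built programmatically.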

Or using the same methodology in tidyverse

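library(tidyverse)
M %>%
   select(AB, TI) %>%                               # keep only the specified columns
   map(~ str_detect(.x, "CHINA|CHINESE|SINO")) %>%  # list of logical vectors
   reduce(`|`) %>%                                  # collapse with element-wise OR
   mutate(M, China = .)                             # add the result to M as a new column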
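
As a possible alternative, recent versions of dplyr provide if_any(), which applies a predicate to a selection of columns and ORs the results row by row. The sketch below assumes dplyr 1.0.4 or later is installed (earlier versions do not have if_any()); it is not part of the original answer.

```r
library(dplyr)

# Example data from the question
M <- data.frame(
  AB = c("CHINAS PARTY CONGRESS", "JAPAN-US RELATIONS", "JAPAN TRIES TO"),
  TI = c("AMERICAN FOREIGN POLICY", "CHINESE ATTEMPTS TO", "BRITAIN HAS TEA"),
  AU = c("AUTHOR 1", "AUTHOR 2", "AUTHOR 3")
)

# TRUE when the pattern occurs in AB or in TI; AU is never searched
M <- M %>%
  mutate(China = if_any(c(AB, TI), ~ grepl("CHINA|CHINESE|SINO", .x)))
```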


Source: https://stackoverflow.com/questions/47941680/grepl-across-multiple-specified-columns
