Filling a column in a dataframe based on a column in another dataframe in r

混江龙づ霸主 提交于 2019-12-10 19:00:32

问题


I have a dataframe of comments which looks like this(df1)

Comments
Apple laptops are really good for work,we should buy them
Apple Iphones are too costly,we can resort to some other brands
Google search is the best search engine 
Android phones are great these days
I lost my visa card today

I have another dataframe of merchent names which looks like this(df2):

Merchant_Name
Google
Android
Geoni
Visa
Apple
MC
WallMart

If a merchant_name in df2 appears in a Comment in df 1 ,append that merchant name to the second column in df1 in R.The match need not be an exact match.An approximation is what is required.Also,the df1 contains around 500K rows! My final ooutput df may look like this

Comments                                                        Merchant
Apple laptops are really good for work,we should buy them       Apple
Apple Iphones are too costly,we can resort to some other brands Apple
Google search is the best search engine                         Google
Android phones are great these days                             Android
I lost my visa card today                                       Visa

How may i do this and efficiently in R.?? Thanks


回答1:


This is a job for regex. Check out the grepl command inside the lapply.

comments = c(
   'Apple laptops are really good for work,we should buy them',
   'Apple Iphones are too costly,we can resort to some other brands',
   'Google search is the best search engine ',
   'Android phones are great these days',
   'I lost my visa card today'
)

brands = c(
   'Google',
   'Android',
   'Geoni',
   'Visa',
   'Apple',
   'MC',
   'WallMart'
)

brandinpattern = lapply(
   brands,
   function(brand) {
      commentswithbrand = grepl(x = tolower(comments), pattern = tolower(brand))
      if ( sum(commentswithbrand) > 0) {
         data.frame(
            comment = comments[commentswithbrand],
            brand = brand
         )
      } else {
         data.frame()
      }
   }
)

brandinpattern = do.call(rbind, brandinpattern)


> do.call(rbind, brandinpattern)
                                                          comment   brand
1                        Google search is the best search engine   Google
2                             Android phones are great these days Android
3                                       I lost my visa card today    Visa
4       Apple laptops are really good for work,we should buy them   Apple
5 Apple Iphones are too costly,we can resort to some other brands   Apple



回答2:


Try this

final_df <- data.frame(Comments = character(), Merchant_Name = character(), stringsAsFactors = F)

for(i in df1$Comments){
    for(j in df2$Merchant_Name){ 
        if(grepl(tolower(j),tolower(i))){ 
            final_df[nrow(final_df) + 1,] <- c(i, j)
            break
        }
    }
}


final_df

##                                                        comments  brands
##1       Apple laptops are really good for work,we should buy them   Apple
##2 Apple Iphones are too costly,we can resort to some other brands   Apple
##3                        Google search is the best search engine   Google
##4                             Android phones are great these days Android
##5                                       I lost my visa card today    Visa


来源:https://stackoverflow.com/questions/33688413/filling-a-column-in-a-dataframe-based-on-a-column-in-another-dataframe-in-r

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!