Question
I have a data frame of comments that looks like this (df1):
Comments
Apple laptops are really good for work,we should buy them
Apple Iphones are too costly,we can resort to some other brands
Google search is the best search engine
Android phones are great these days
I lost my visa card today
I have another data frame of merchant names that looks like this (df2):
Merchant_Name
Google
Android
Geoni
Visa
Apple
MC
WallMart
If a Merchant_Name in df2 appears in a comment in df1, I want to add that merchant name as a second column in df1, in R. The match need not be exact; an approximate match is what is required. Also, df1 contains around 500K rows! My final output data frame may look like this:
Comments Merchant
Apple laptops are really good for work,we should buy them Apple
Apple Iphones are too costly,we can resort to some other brands Apple
Google search is the best search engine Google
Android phones are great these days Android
I lost my visa card today Visa
How can I do this efficiently in R? Thanks.
Answer 1:
This is a job for regex. Check out the grepl command inside lapply.
comments = c(
'Apple laptops are really good for work,we should buy them',
'Apple Iphones are too costly,we can resort to some other brands',
'Google search is the best search engine ',
'Android phones are great these days',
'I lost my visa card today'
)
brands = c(
'Google',
'Android',
'Geoni',
'Visa',
'Apple',
'MC',
'WallMart'
)
brandinpattern = lapply(
brands,
function(brand) {
commentswithbrand = grepl(x = tolower(comments), pattern = tolower(brand))
if ( sum(commentswithbrand) > 0) {
data.frame(
comment = comments[commentswithbrand],
brand = brand
)
} else {
data.frame()
}
}
)
brandinpattern = do.call(rbind, brandinpattern)
brandinpattern
comment brand
1 Google search is the best search engine Google
2 Android phones are great these days Android
3 I lost my visa card today Visa
4 Apple laptops are really good for work,we should buy them Apple
5 Apple Iphones are too costly,we can resort to some other brands Apple
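The grepl approach above only finds exact (case-insensitive) substrings. Since the question asks for approximate matching, base R's agrepl(), the fuzzy counterpart of grepl(), may be worth a look. The sketch below is not part of the original answer: the misspelled comments are made-up test data, and the cutoff min_fuzzy_len = 5 and max.distance = 1 are assumptions to tune, because allowing one edit lets very short names such as "MC" match almost any text.

```r
# Hypothetical test data: the first two comments contain misspelled brands.
comments <- c(
  "Aple laptops are really good for work",
  "Gogle search is the best search engine",
  "I lost my visa card today"
)
brands <- c("Google", "Android", "Geoni", "Visa", "Apple", "MC", "WallMart")

# Assumption: only names at least this long are matched fuzzily; shorter
# names such as "MC" fall back to an exact substring test.
min_fuzzy_len <- 5

lower_comments <- tolower(comments)

match_brand <- function(brand) {
  pat <- tolower(brand)
  if (nchar(pat) >= min_fuzzy_len) {
    # Allow at most one insertion, deletion or substitution.
    agrepl(pat, lower_comments, max.distance = 1)
  } else {
    grepl(pat, lower_comments, fixed = TRUE)
  }
}

# Logical matrix: one row per comment, one column per brand.
hits <- vapply(brands, match_brand, logical(length(comments)))

# For each comment, keep the first matching brand (in brands order), else NA.
merchant <- apply(hits, 1, function(h) {
  if (any(h)) brands[which(h)[1]] else NA_character_
})

result <- data.frame(Comments = comments, Merchant = merchant,
                     stringsAsFactors = FALSE)
result
```

With this data, "Aple" should map to Apple and "Gogle" to Google. Fuzzy matching is likely to be slower than grepl on 500K rows, so one option is to run an exact pass first and apply agrepl only to the comments it left unmatched.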
Answer 2:
Try this:
final_df <- data.frame(Comments = character(), Merchant_Name = character(), stringsAsFactors = FALSE)
for(i in df1$Comments){
for(j in df2$Merchant_Name){
if(grepl(tolower(j),tolower(i))){
final_df[nrow(final_df) + 1,] <- c(i, j)
break
}
}
}
final_df
##   Comments                                                         Merchant_Name
## 1 Apple laptops are really good for work,we should buy them        Apple
## 2 Apple Iphones are too costly,we can resort to some other brands  Apple
## 3 Google search is the best search engine                          Google
## 4 Android phones are great these days                              Android
## 5 I lost my visa card today                                        Visa
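Growing final_df one row at a time, as this answer does, typically copies the data frame on every append, which is likely to be slow at 500K rows. A hedged alternative sketch (not taken from either answer) loops over the handful of merchant names instead, so each grepl() call scans all comments at once, and writes the result directly into a Merchant column of df1, as the question asks. It keeps this answer's rule of taking the first match in df2 order; fixed = TRUE treats names as literal text, and comments with no match are left as NA instead of being dropped.

```r
df1 <- data.frame(
  Comments = c(
    "Apple laptops are really good for work,we should buy them",
    "Apple Iphones are too costly,we can resort to some other brands",
    "Google search is the best search engine",
    "Android phones are great these days",
    "I lost my visa card today"
  ),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Merchant_Name = c("Google", "Android", "Geoni", "Visa", "Apple", "MC", "WallMart"),
  stringsAsFactors = FALSE
)

lower_comments <- tolower(df1$Comments)  # lowercase once, not once per merchant
df1$Merchant <- NA_character_

# Loop over the few merchants rather than the many comments; each grepl()
# call is vectorised over every comment.
for (m in df2$Merchant_Name) {
  # Only fill rows that have no merchant yet, so the first match in df2 wins.
  hit <- is.na(df1$Merchant) & grepl(tolower(m), lower_comments, fixed = TRUE)
  df1$Merchant[hit] <- m
}
df1
```

Plain substring matching can still misfire on short names (for example "mc" inside a longer word); switching to a regex pattern with \\b word boundaries around each name is one way to tighten it, at the cost of escaping any special characters in the names.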
Source: https://stackoverflow.com/questions/33688413/filling-a-column-in-a-dataframe-based-on-a-column-in-another-dataframe-in-r