Edited df and dict
I have a data frame containing sentences:
df <- data_frame(text = c("I love pandas
Update: Here's the easiest dplyr method I've found so far, with a stringi function added to speed things up. Provided there are no identical sentences in df$text, we can group by that column and then apply mutate(), so that each sentence is scored on its own.
Note: Package versions are dplyr 0.4.1 and stringi 0.4.1
library(dplyr)
library(stringi)
group_by(df, text) %>%
  mutate(score = sum(dict$score[stri_detect_fixed(text, dict$word)]))
# Source: local data frame [2 x 2]
# Groups: text
#
# text score
# 1 I love pandas 2
# 2 I hate monkeys -2
I removed the do() method I posted last night, but you can find it in the edit history. To me it seems unnecessary, since the method above works just as well and is more in the dplyr style.
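To see why the grouping step matters, here is a minimal sketch of how stri_detect_fixed() recycles its arguments. The data below is assumed, since the question's df and dict are not fully shown here; the words and scores are hypothetical values chosen to reproduce the scores above.

```r
library(stringi)

# Assumed example data (hypothetical, not copied from the question).
text <- c("I love pandas", "I hate monkeys")
dict <- data.frame(word  = c("love", "hate", "pandas", "monkeys"),
                   score = c(1, -1, 1, -1),
                   stringsAsFactors = FALSE)

# Two sentences against four words: both arguments are recycled pairwise,
# so sentence 1 is only tested against words 1 and 3, and sentence 2
# against words 2 and 4. The result has length 4, not one value per sentence.
pairwise <- stri_detect_fixed(text, dict$word)

# One sentence against four words: the sentence is recycled, so every
# dictionary word is tested. This is what each group sees inside mutate().
per_sentence <- stri_detect_fixed(text[1], dict$word)
score_1 <- sum(dict$score[per_sentence])
```

With one sentence per group, `text` has length 1 inside mutate(), which is the second case.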
Additionally, if you're open to a non-dplyr answer, here are two using base functions.
total <- with(dict, {
  # for each sentence, test every dictionary word as a fixed substring
  vapply(df$text, function(X) {
    sum(score[vapply(word, grepl, logical(1L), x = X, fixed = TRUE)])
  }, 1)
})
cbind(df, total)
# text total
# 1 I love pandas 2
# 2 I hate monkeys -2
Alternatively, strsplit() produces the same result for this data:
s <- strsplit(df$text, " ")
total <- vapply(s, function(x) sum(with(dict, score[match(x, word, 0L)])), 1)
cbind(df, total)
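The two base approaches agree on the example, but they are not interchangeable in general: the grepl() version looks for substrings and counts each word at most once per sentence, while the strsplit() version looks up whole space-separated tokens. A small sketch of the difference, using hypothetical data that is not from the question:

```r
# Hypothetical dictionary and sentences, for illustration only.
dict <- data.frame(word  = c("love", "hate", "pandas", "monkeys"),
                   score = c(1, -1, 1, -1),
                   stringsAsFactors = FALSE)
text <- c("I love pandas.", "I love love pandas")

# Substring approach: each dictionary word counts at most once per sentence.
by_substring <- vapply(text, function(s) {
  hit <- vapply(dict$word, grepl, logical(1L), x = s, fixed = TRUE)
  sum(dict$score[hit])
}, numeric(1), USE.NAMES = FALSE)

# Token approach: each token is looked up whole, so "pandas." finds no match,
# while a repeated token is counted every time it appears.
by_token <- vapply(strsplit(text, " ", fixed = TRUE), function(tok) {
  sum(dict$score[match(tok, dict$word, nomatch = 0L)])
}, numeric(1))

by_substring  # 2 2
by_token      # 1 3
```

Which behaviour is right depends on whether punctuation and repeated words should affect the score.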
A bit of double looping via sapply and gregexpr:
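# count the matches of each dictionary word (columns) in each sentence (rows)
res <- sapply(dict$word, function(x) {
  sapply(gregexpr(x, df$text), function(y) length(y[y != -1]))
})
# matrix product: weight each word's column by its score, then sum per sentence
drop(res %*% dict$score)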
#[1] 2 -2
This also accounts for cases where there are multiple matches in a single string:
df <- data.frame(text = c("I love love pandas", "I hate monkeys"))
# run same code as above
#[1] 3 -2
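One thing to watch when weighting a count matrix like this by a score vector: elementwise `*` recycles the vector down the columns, not across them, so it only gives the intended result when the scores happen to line up. A matrix product (or sweep()) applies one score per column. The sketch below uses hypothetical counts and deliberately distinct scores to make the difference visible:

```r
# Hypothetical counts: rows are sentences, columns are dictionary words.
counts <- matrix(c(1, 0,   # column "love":    sentence 1, sentence 2
                   0, 1,   # column "hate"
                   1, 0,   # column "pandas"
                   0, 1),  # column "monkeys"
                 nrow = 2,
                 dimnames = list(NULL, c("love", "hate", "pandas", "monkeys")))
score <- c(1, -2, 3, -4)   # hypothetical scores, all different

# Elementwise `*` recycles score in column-major order, so cells are not
# weighted by their own column's score.
recycled <- rowSums(counts * score)

# A matrix product applies score[j] to column j.
weighted <- drop(counts %*% score)

# Equivalent: scale the columns explicitly, then sum the rows.
swept <- rowSums(sweep(counts, 2, score, `*`))

recycled  # 2 -8
weighted  # 4 -6
```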