Count frequency of dictionary words within a column and generate new “dictfreq” column

Submitted by 試著忘記壹切 on 2021-01-07 02:16:08

Question


This seems like a simple task, but I cannot find a good way to do it in R. Basically, I want to count how often each word in a dictionary, dict, occurs within a column of another data frame, df$wordsgov:

dict <- c("apple", "pineapple", "pear")
df <- data.frame(wordsgov = c("i hate apple", "i hate apple", "i love pear",
                              "i don't like pear", "pear is okay", "i eat pineapple sometimes"))

Desired output: a new frequency ranking that shows every word in dict, ordered by its frequency within df$wordsgov:

dict    freq_gov
"pear" : 3
"apple": 2
"pineapple": 1

I tried the following code, but it gives me the number of dict words that appear in each row of df$wordsgov, which is not what I want:

dictongov <- within(
  df,
  counts <- sapply(
    gregexpr(paste0(dict, collapse = "|"), wordsgov),
    function(x) sum(x > 0)
  )
)

I cannot figure out how to change the function so that it gives me the frequency of each dict word across df$wordsgov instead. I also tried str_detect, but that did not work either. Any help would be really appreciated!
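One way to get per-word totals with the same gregexpr idea is to loop over the dictionary words rather than over the rows. Below is a minimal base R sketch, assuming the example data from the question. count_word is a hypothetical helper name, and the \\b word boundaries are an added assumption so that "apple" is not counted inside "pineapple".

```r
# Example data from the question.
dict <- c("apple", "pineapple", "pear")
df <- data.frame(
  wordsgov = c("i hate apple", "i hate apple", "i love pear",
               "i don't like pear", "pear is okay",
               "i eat pineapple sometimes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total whole-word matches of `word` across all of `text`.
count_word <- function(word, text) {
  hits <- gregexpr(paste0("\\b", word, "\\b"), text, perl = TRUE)
  # gregexpr() returns -1 for elements with no match, so count positions > 0.
  sum(vapply(hits, function(h) sum(h > 0), integer(1)))
}

# One total per dictionary word; words that never occur get 0.
freq <- vapply(dict, count_word, integer(1), text = df$wordsgov)

dictfreq <- data.frame(dict = dict, freq_gov = unname(freq),
                       stringsAsFactors = FALSE)
dictfreq <- dictfreq[order(dictfreq$freq_gov, decreasing = TRUE), ]
print(dictfreq)
```

Because every dictionary word gets its own count, words with a frequency of 0 stay in the result. Dictionary entries that contain regex metacharacters would need escaping first, which this sketch does not handle.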

Edit: I used the following, which worked well:

dictfreq <- df %>% mutate(dict = str_c(str_extract(wordsgov, str_c(dict, collapse = '|')), ':')) %>% 
                   count(dict, name = 'freq_gov') %>% arrange(desc(freq_gov))

However, it dropped all the words that had a frequency of 0. Is there any way to keep the words with a frequency of 0? I tried ".drop = FALSE", but it does not seem to work within this code. Any help would be really appreciated. Thanks!
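A likely reason ".drop = FALSE" has no effect here is that count() can only keep empty groups for factor levels, and the extracted words are plain strings. The sketch below, under that assumption, converts the extracted word to a factor whose levels are the full dictionary. "banana" is a hypothetical extra dictionary word added only so that one word has a frequency of 0, and the column is named word (not dict) to avoid clashing with the dictionary vector.

```r
library(dplyr)
library(stringr)

# Example data from the question; "banana" is a hypothetical addition
# so that one dictionary word never occurs.
dict <- c("apple", "pineapple", "pear", "banana")
df <- data.frame(
  wordsgov = c("i hate apple", "i hate apple", "i love pear",
               "i don't like pear", "pear is okay",
               "i eat pineapple sometimes"),
  stringsAsFactors = FALSE
)

# Whole-word pattern: \b(apple|pineapple|pear|banana)\b
pattern <- str_c("\\b(", str_c(dict, collapse = "|"), ")\\b")

dictfreq <- df %>%
  # A factor carrying all dictionary words as levels lets count() keep empty levels.
  mutate(word = factor(str_extract(wordsgov, pattern), levels = dict)) %>%
  # Drop rows that contain no dictionary word at all.
  filter(!is.na(word)) %>%
  count(word, name = "freq_gov", .drop = FALSE) %>%
  arrange(desc(freq_gov))

print(dictfreq)
```

Note that str_extract() only returns the first match in each row, so a row that mentions two dictionary words contributes to just one of them. If that matters, counting each word separately with str_count() avoids the problem.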


Answer 1:


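We can also do this with str_count, wrapping each dictionary word in word boundaries (\\b) so that "apple" is not counted inside "pineapple":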

library(stringr)
library(purrr)
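# For each dictionary word, sum its whole-word matches across all strings in v1
out <- map_int(str_c("\\b", v2, "\\b"), ~  sum(str_count(v1, .x)))
out
#[1] 2 1 3

# Ascending rank of each word's count (1 = least frequent)
rank(out)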

Data

v1 <- c("i hate apple", "i hate apple", "i love pear", "i don't like pear", 
       "pear is okay", "i eat pineapple sometimes")

v2 <- c("apple", "pineapple", "pear")
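To get from the counts in out to the ranked table the question asks for, the counts can be paired with the dictionary words and sorted. This is a small sketch that reuses v1 and v2 from above; ranked is a hypothetical name. It uses order() rather than rank(), since order() returns the row positions needed for sorting.

```r
library(stringr)
library(purrr)

v1 <- c("i hate apple", "i hate apple", "i love pear", "i don't like pear",
        "pear is okay", "i eat pineapple sometimes")
v2 <- c("apple", "pineapple", "pear")

# Total whole-word matches per dictionary word, as in the answer.
out <- map_int(str_c("\\b", v2, "\\b"), ~ sum(str_count(v1, .x)))

# Pair each word with its count, then sort from most to least frequent.
ranked <- data.frame(dict = v2, freq_gov = out, stringsAsFactors = FALSE)
ranked <- ranked[order(ranked$freq_gov, decreasing = TRUE), ]
print(ranked)
```

Since str_count() returns 0 for strings without a match, a dictionary word that never occurs would still appear in ranked with a freq_gov of 0.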


Source: https://stackoverflow.com/questions/64991696/count-frequency-of-dictionary-words-within-a-column-and-generate-new-dictfreq
