Count the frequency of strings in a dataframe R

Posted by 对着背影说爱祢 on 2019-12-11 04:15:15

Question


I want to count how often certain substrings occur within a data frame.

strings  <- c("pi","pie","piece","pin","pinned","post")
df <- as.data.frame(strings)

I would then like to count how many of those strings contain each of the following patterns:

counts <- c("pi", "in", "pie", "ie")

To give me something like:

string  freq
 pi       5
 in       2
 pie      2
 ie       2

I have experimented with grepl and table, but I don't see how to specify which strings I want to search for.


Answer 1:


You can use sapply() to loop over counts and match each item against the strings column in df using grepl(). This returns a logical vector (TRUE for a match, FALSE otherwise), which you can sum to get the number of matches.

sapply(df, function(x) {
  sapply(counts, function(y) {
    sum(grepl(y, x))
  })
})

This will return:

    strings
pi        5
in        2
pie       2
ie        2
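
If the result should be a data frame with string and freq columns, as in the question, the same idea can be sketched as follows. This is a minimal sketch rather than the answerer's code: it assumes the patterns are meant to be matched literally (hence fixed = TRUE), and the names freq and result are only illustrative.

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")
counts  <- c("pi", "in", "pie", "ie")

# For each pattern, count how many elements of `strings` contain it.
# fixed = TRUE treats the pattern as a literal string rather than a regex
# (an assumption; drop it if regular-expression patterns are intended).
freq <- vapply(counts,
               function(p) sum(grepl(p, strings, fixed = TRUE)),
               integer(1))

# Assemble the two-column layout asked for in the question.
result <- data.frame(string = counts, freq = unname(freq))
print(result)
```

vapply() is used instead of sapply() only so that a non-integer result would raise an error instead of silently changing the output type.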



Answer 2:


You can use adist from base R:

data.frame(counts, freq = rowSums(!adist(counts, strings, partial = TRUE)))
  counts freq
1     pi    5
2     in    2
3    pie    2
4     ie    2
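
To see why this works, it may help to look at the distance matrix itself. With partial = TRUE, adist() compares each pattern against substrings of the target, so an entry is 0 exactly when the pattern occurs somewhere in the string. The sketch below spells that out and cross-checks it against a literal grepl() count; the names d, freq_adist and freq_grepl are illustrative only.

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")
counts  <- c("pi", "in", "pie", "ie")

# Rows correspond to `counts`, columns to `strings`.
# With partial = TRUE, an entry is 0 when the pattern occurs as a substring.
d <- adist(counts, strings, partial = TRUE)

# d == 0 marks the exact substring matches; summing each row counts them.
freq_adist <- rowSums(d == 0)

# Cross-check against a literal grepl() count.
freq_grepl <- sapply(counts, function(p) sum(grepl(p, strings, fixed = TRUE)))
all(freq_adist == freq_grepl)
```

Writing d == 0 instead of !d is equivalent here and makes the intent a little more explicit.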

If you are comfortable with regular expressions then you can do:

a <- sapply(paste0(".*(", counts, ").*|.*"), sub, "\\1", strings)
table(grep("\\w", a, value = TRUE))
 ie  in  pi pie 
  2   2   5   2 
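
The regular-expression version is dense, so here is a sketch of what it does for a single pattern. It assumes the same strings vector as in the question; the names pattern and hits are illustrative. Each pattern either captures the search term or falls back to matching the whole string with an empty capture, so strings without the term collapse to "".

```r
strings <- c("pi", "pie", "piece", "pin", "pinned", "post")

# Pattern built for the single entry "in": either capture "in" somewhere
# in the string, or fall back to matching anything with an empty capture.
pattern <- paste0(".*(", "in", ").*|.*")

# Each string is replaced by the captured text, or by "" when "in" is absent.
hits <- sub(pattern, "\\1", strings)

# Non-empty results are the strings that contained "in".
sum(nzchar(hits))
```

The original one-liner does this for every entry of counts at once and then uses table() to tally the non-empty results.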



Answer 3:


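You can build a frequency table of q-grams with qgrams() from the stringdist package. With q = 2 it counts every two-character substring, so it covers "pi", "in" and "ie" but not the three-character "pie":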

library(stringdist)
strings  <- c("pi","pie","piece","pin","pinned","post")
frequency <- data.frame(t(stringdist::qgrams(freq = strings, q = 2)))

   freq
pi    5
po    1
st    1
ie    2
in    2
nn    1
os    1
ne    1
ec    1
ed    1
ce    1


Source: https://stackoverflow.com/questions/49552174/count-the-frequency-of-strings-in-a-dataframe-r
