Splitting Strings and Generating Frequency Tables in R

Posted by 醉酒当歌 on 2019-12-01 06:44:10

If you wanted to, you could do it in a one-liner:

R> text <- c("ABC Industries", "ABC Enterprises", 
+            "123 and 456 Corporation", "XYZ Company")
R> table(do.call(c, lapply(text, function(x) unlist(strsplit(x, " ")))))

        123         456         ABC         and     Company 
          1           1           2           1           1 
Corporation Enterprises  Industries         XYZ 
          1           1           1           1 
R> 

Here I use strsplit() to break each entry into its components; it returns a list, which unlist() flattens into a character vector, so lapply() yields a list of character vectors. I use do.call(c, ...) to concatenate those vectors into one, which table() then summarises.
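To make the intermediate results easier to see, here is the same idea unrolled into separate steps. This is only a sketch; the variable names pieces, words and freq are mine and not part of the answer above.

```r
text <- c("ABC Industries", "ABC Enterprises",
          "123 and 456 Corporation", "XYZ Company")

# strsplit() is vectorised: it returns a list holding one
# character vector of words per input string
pieces <- strsplit(text, " ")

# unlist() flattens that list into a single character vector
words <- unlist(pieces)

# table() counts how often each distinct word occurs
freq <- table(words)
print(freq)
```

Because strsplit() already accepts a whole character vector, the lapply()/do.call() wrapper is not strictly needed for this input; unlist(strsplit(text, " ")) should give the same words.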

Here is another one-liner. It uses paste() to combine all of the column entries into a single long text string, which it then splits apart and tabulates:

text <- c("ABC Industries", "ABC Enterprises", 
         "123 and 456 Corporation", "XYZ Company")

table(strsplit(paste(text, collapse=" "), " "))
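If you want the most frequent words first, the table can be sorted afterwards. The following is a minimal sketch under a couple of assumptions of my own: it splits on runs of whitespace ("\\s+") rather than a single space, and it uses the illustrative names words, freq and freq_df.

```r
text <- c("ABC Industries", "ABC Enterprises",
          "123 and 456 Corporation", "XYZ Company")

# Join everything into one string, then split on runs of whitespace
# and flatten the result to a plain character vector
words <- unlist(strsplit(paste(text, collapse = " "), "\\s+"))

# Tabulate, then sort so the most frequent words come first
freq <- sort(table(words), decreasing = TRUE)

# Optionally turn the counts into a two-column data frame
freq_df <- data.frame(word = names(freq),
                      n = as.vector(freq),
                      stringsAsFactors = FALSE)
print(freq_df)
```

Splitting on "\\s+" is a small safeguard against doubled spaces, which would otherwise show up as empty strings in the table.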

You can use the tidytext and dplyr packages. Note that unnest_tokens() converts tokens to lowercase by default, so ABC is counted as abc:

set.seed(42)

text <- c("ABC Industries", "ABC Enterprises", 
       "123 and 456 Corporation", "XYZ Company")

data <- data.frame(category = sample(text, 100, replace = TRUE),
                   stringsAsFactors = FALSE)

library(tidytext)
library(dplyr)

data %>%
  unnest_tokens(word, category) %>%
  group_by(word) %>%
  count()

#> # A tibble: 9 x 2
#> # Groups:   word [9]
#>          word     n
#>         <chr> <int>
#> 1         123    29
#> 2         456    29
#> 3         abc    45
#> 4         and    29
#> 5     company    26
#> 6 corporation    29
#> 7 enterprises    21
#> 8  industries    24
#> 9         xyz    26
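For comparison, a rough base-R approximation of the same count, without extra packages, might look like the sketch below. It assumes the entries contain only words separated by single spaces; the real tokenizer behind unnest_tokens() also handles punctuation, which this sketch does not. The names words and freq are illustrative, and the exact counts depend on what sample() draws in your R version.

```r
set.seed(42)

text <- c("ABC Industries", "ABC Enterprises",
          "123 and 456 Corporation", "XYZ Company")

data <- data.frame(category = sample(text, 100, replace = TRUE),
                   stringsAsFactors = FALSE)

# Lowercase first, mimicking the default lowercasing of unnest_tokens(),
# then split every entry on spaces and flatten to one vector
words <- unlist(strsplit(tolower(data$category), " "))

# Tabulate; naming the argument labels the resulting column `word`,
# and as.data.frame() stores the counts in a column called `Freq`
freq <- as.data.frame(table(word = words), stringsAsFactors = FALSE)
print(freq)
```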