Count specific characters from column associated with dual categories of other column. Do it iteratively based on frequency bins

Submitted by 泪湿孤枕 on 2020-06-08 12:37:56

Question


I have a huge dataframe df1, whose oversimplified version consists of 3 columns, "Words", "Frequency" and "Letters":

Words           Frequency   Letters
flower/tree     0.15        a(0.1)
tree            0.67        a(0.4)
planet          0.85        b(0.4)
tree/planet     0.42        c(0.5)
tree            0.89        a(0.6)
flower          0.21        b(0.4)
flower/planet   0.53        b
planet          0.07        a
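
For anyone who wants to reproduce this, the sample table can be rebuilt roughly as follows. This is only a sketch in base R; `stringsAsFactors = FALSE` is an addition of this example, so that the text columns stay character vectors on older R versions too.

```r
# Rebuild the sample data shown in the table above.
df1 <- data.frame(
  Words     = c("flower/tree", "tree", "planet", "tree/planet",
                "tree", "flower", "flower/planet", "planet"),
  Frequency = c(0.15, 0.67, 0.85, 0.42, 0.89, 0.21, 0.53, 0.07),
  Letters   = c("a(0.1)", "a(0.4)", "b(0.4)", "c(0.5)",
                "a(0.6)", "b(0.4)", "b", "a"),
  stringsAsFactors = FALSE  # keep Words and Letters as character vectors
)

str(df1)
```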

Using R (dplyr, apply family functions, etc.) I would like to count the number of times every letter (a, b, c) of the "Letters" column is associated with every single word from the "Words" column (flower, tree, planet), separately for each frequency bin of the "Frequency" column values. There are 4 bins: [0, 0.25], [0.25, 0.5], [0.5, 0.75], [0.75, 1].
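
A note on the bins: written as closed intervals they overlap at 0.25, 0.5 and 0.75, so a value sitting exactly on a boundary needs a rule. A minimal sketch, assuming boundary values should fall into the lower bin (which is what base R's `cut` does by default with right-closed intervals); the variable names `breaks`, `freq` and `bins` are just for illustration:

```r
# Assumption: a boundary value such as 0.5 belongs to the lower bin, (0.25, 0.5].
breaks <- seq(0, 1, by = 0.25)
freq   <- c(0.15, 0.67, 0.85, 0.42, 0.89, 0.21, 0.53, 0.07)

# include.lowest = TRUE closes the first bin on the left,
# so a Frequency of exactly 0 is binned instead of becoming NA.
bins <- cut(freq, breaks = breaks, include.lowest = TRUE)

levels(bins)  # the four bin labels
table(bins)   # how many sample rows fall into each bin
```

Without `include.lowest = TRUE` the first bin is `(0,0.25]`, and a Frequency of exactly 0 would come back as `NA`.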

I expect an output dataframe df2 that looks something like this:

Bin       Word    Letters    count_letters
0-0.25    flower  a          1
0-0.25    flower  b          1
0-0.25    tree    a          1
0-0.25    planet  a          1
0.25-0.5  tree    c          1
0.25-0.5  planet  c          1
0.5-0.75  flower  b          1
0.5-0.75  tree    a          1
0.5-0.75  planet  b          1
0.75-1    tree    a          1
0.75-1    planet  b          1

Answer 1:


You can use cut to bin Frequency, substr to strip the parenthesised suffix from Letters, and tidyr::separate_rows to split Words into one row per word. Aggregate with dplyr::count, and you're set:

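library(tidyverse)

# df1 is the data frame from the question
df1 %>% 
    separate_rows(Words) %>%    # "flower/tree" becomes two rows; the default sep splits on "/"
    count(Frequency = cut(Frequency, breaks = seq(0, 1, .25)),    # bins (0,0.25], (0.25,0.5], ...
          Words, 
          Letters = substr(Letters, 1, 1))    # use regex if more than one letter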

## Source: local data frame [11 x 4]
## Groups: Frequency, Words [?]
## 
##     Frequency  Words Letters     n
##        <fctr>  <chr>   <chr> <int>
## 1    (0,0.25] flower       a     1
## 2    (0,0.25] flower       b     1
## 3    (0,0.25] planet       a     1
## 4    (0,0.25]   tree       a     1
## 5  (0.25,0.5] planet       c     1
## 6  (0.25,0.5]   tree       c     1
## 7  (0.5,0.75] flower       b     1
## 8  (0.5,0.75] planet       b     1
## 9  (0.5,0.75]   tree       a     1
## 10   (0.75,1] planet       b     1
## 11   (0.75,1]   tree       a     1
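
If the labels in Letters can be longer than one character, or if you would rather avoid the tidyverse dependency, the same idea can be sketched in base R. This is not the code above, just an equivalent under a few assumptions: words are separated by "/", everything from the first "(" onwards in Letters is discarded, and `include.lowest = TRUE` is added so that a Frequency of exactly 0 is not dropped. The names `long` and `df2` are only illustrative.

```r
# Sample data from the question.
df1 <- data.frame(
  Words     = c("flower/tree", "tree", "planet", "tree/planet",
                "tree", "flower", "flower/planet", "planet"),
  Frequency = c(0.15, 0.67, 0.85, 0.42, 0.89, 0.21, 0.53, 0.07),
  Letters   = c("a(0.1)", "a(0.4)", "b(0.4)", "c(0.5)",
                "a(0.6)", "b(0.4)", "b", "a"),
  stringsAsFactors = FALSE
)

# One row per word: "flower/tree" contributes a "flower" row and a "tree" row.
words <- strsplit(df1$Words, "/", fixed = TRUE)
long <- data.frame(
  Words     = unlist(words),
  Frequency = rep(df1$Frequency, lengths(words)),
  Letters   = rep(df1$Letters, lengths(words)),
  stringsAsFactors = FALSE
)

# Regex instead of substr: drop everything from the first "(" onwards,
# so a label like "ab(0.1)" would become "ab" rather than "a".
long$Letters <- sub("\\(.*$", "", long$Letters)

# Bin the frequencies; the first bin is closed on the left.
long$Bin <- cut(long$Frequency, breaks = seq(0, 1, 0.25), include.lowest = TRUE)

# Count each Bin/Words/Letters combination that actually occurs.
long$n <- 1L
df2 <- aggregate(n ~ Bin + Words + Letters, data = long, FUN = sum)
df2 <- df2[order(df2$Bin, df2$Words, df2$Letters), ]
rownames(df2) <- NULL

df2
```

On the sample data this should give the same 11 combinations as the tidyverse version, each with a count of 1; only the label of the first bin differs (`[0,0.25]` instead of `(0,0.25]`).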


Source: https://stackoverflow.com/questions/42237800/count-specific-characters-from-column-associated-with-dual-categories-of-other-c
