Count the number of times (frequency) a string occurs

Submitted by 大城市里の小女人 on 2019-12-12 03:46:29

Question


I have a column in my data frame as follows:

   Col1
   ----------------------------------------------------------------------------
   Center for Animal Control, Division of Hypertension, Department of Medicine
   Department of Surgery, Division of Primary Care, Center for Animal Control
   Department of Internal Medicine, Division of Hypertension, Center for Animal Control

How do I count how often each comma-separated string occurs? In other words, I am trying to produce something like this:

    Affiliation                         Freq
    ------------------------------------------
    Center for Animal Control           3
    Division of Hypertension            2
    Department of Medicine              1
    Department of Surgery               1
    Division of Primary Care            1
    Department of Internal Medicine     1  

Could someone help me to figure this out?


Answer 1:


Assumption: each line shown in the question is the value of one row. That is, "Center for Animal Control, Division of Hypertension, Department of Medicine" is the value for row 1, "Department of Surgery, Division of Primary Care, Center for Animal Control" is the value for row 2, and so on.

df is the data frame.

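# Col1 matches the column name in the question (R is case-sensitive);
# as.character() guards against the column being stored as a factor.
aff_val <- trimws(unlist(strsplit(as.character(df$Col1), ",")))

# Count how often each distinct affiliation occurs.
ans <- data.frame(table(aff_val))

colnames(ans)[1] <- 'Affiliation'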
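
For reference, here is a minimal end-to-end sketch of the same idea in base R. It assumes the data frame looks like the one in the question (the construction of df below is my reconstruction, not the asker's actual data), and it additionally sorts by descending frequency, which the desired output suggests but the answer above does not do.

```r
# Reconstructed sample data (an assumption based on the question's listing).
df <- data.frame(
  Col1 = c(
    "Center for Animal Control, Division of Hypertension, Department of Medicine",
    "Department of Surgery, Division of Primary Care, Center for Animal Control",
    "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
  ),
  stringsAsFactors = FALSE
)

# Split every row on commas, flatten the list, and trim surrounding whitespace.
aff_val <- trimws(unlist(strsplit(df$Col1, ",", fixed = TRUE)))

# Tabulate; naming the argument names the first column of the result.
ans <- as.data.frame(table(Affiliation = aff_val), stringsAsFactors = FALSE)

# Sort by descending frequency and reset the row names.
ans <- ans[order(-ans$Freq), ]
rownames(ans) <- NULL
print(ans)
```

Passing fixed = TRUE is optional here; it only tells strsplit to treat the comma literally rather than as a regular expression.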



Answer 2:


Here is one approach. It also replaces each '\n' with a comma, since your text contains some newlines.

df <- data.frame(col1 = rep("Center for Animal Control, Division of Hypertension, Department of Medicine, Department of Surgery, Division of Primary Care, Center for Animal Control, Department of Internal Medicine, Division of Hypertension, Center for Animal Control", 1), stringsAsFactors = FALSE)
df$col1 <- gsub('\\n', ', ', df$col1)
as.data.frame(table(unlist(strsplit(df$col1, ', '))))

Output as follows (on original data):

                             Var1 Freq
1       Center for Animal Control    3
2 Department of Internal Medicine    1
3          Department of Medicine    1
4           Department of Surgery    1
5        Division of Hypertension    2
6        Division of Primary Care    1
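
If the text really is one string with embedded newlines, the replace-then-split steps can probably be folded into a single split on "comma or newline". The sketch below is one way to do that in base R; the variable names x, parts and freq are made up for illustration, and the sample string is reconstructed from the question.

```r
# One string with embedded newlines (reconstructed from the question).
x <- paste(
  "Center for Animal Control, Division of Hypertension, Department of Medicine",
  "Department of Surgery, Division of Primary Care, Center for Animal Control",
  "Department of Internal Medicine, Division of Hypertension, Center for Animal Control",
  sep = "\n"
)

# Split on either a comma or a newline, then trim stray whitespace.
parts <- trimws(unlist(strsplit(x, "[,\n]")))

# Drop empty pieces that could appear with trailing separators.
parts <- parts[nzchar(parts)]

freq <- as.data.frame(table(parts))
print(freq)
```

Trimming after the split, rather than splitting on ', ', means entries are still counted together when the spacing after a comma is inconsistent.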



Answer 3:


I use scan and trimws for these text processing tasks.

inp <- "    Center for Animal Control, Division of Hypertension, Department of Medicine
    Department of Surgery, Division of Primary Care, Center for Animal Control
    Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

> table( trimws(scan(text=inp, what="", sep=",")))
Read 9 items

      Center for Animal Control Department of Internal Medicine 
                              3                               1 
         Department of Medicine           Department of Surgery 
                              1                               1 
       Division of Hypertension        Division of Primary Care 
                              2                               1 

You can also wrap as.data.frame around that result:

> as.data.frame(table(  trimws(scan(text=inp, what="", sep=","))))
Read 9 items
                             Var1 Freq
1       Center for Animal Control    3
2 Department of Internal Medicine    1
3          Department of Medicine    1
4           Department of Surgery    1
5        Division of Hypertension    2
6        Division of Primary Care    1
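
As a possible variation on the scan approach, scan has strip.white and quiet arguments, so the trimws call and the "Read 9 items" message can likely be avoided. The sketch below assumes the same inp string as above (rebuilt here so the block is self-contained) and also sorts the counts in descending order; the names items and counts are illustrative.

```r
inp <- "    Center for Animal Control, Division of Hypertension, Department of Medicine
    Department of Surgery, Division of Primary Care, Center for Animal Control
    Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

# strip.white = TRUE trims each field; quiet = TRUE suppresses "Read 9 items".
items <- scan(text = inp, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)

# Tabulate and put the most frequent affiliation first.
counts <- sort(table(items), decreasing = TRUE)
print(as.data.frame(counts))
```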



Answer 4:


library(stringr)
a<-"Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
con<-textConnection(a)
tbl<-read.table(con,sep=",")
vec<-str_trim(unlist(tbl))
as.data.frame(table(vec))

The result is:

                              vec Freq
1       Center for Animal Control    3
2 Department of Internal Medicine    1
3          Department of Medicine    1
4           Department of Surgery    1
5        Division of Hypertension    2
6        Division of Primary Care    1



Answer 5:


text = "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

library(stringi)
library(dplyr)
library(tidyr)

tibble(text = text) %>%  # tibble() replaces the deprecated data_frame()
  mutate(line = text %>% stri_split_fixed("\n") ) %>%
  unnest(line) %>%
  mutate(phrase = line %>% stri_split_fixed(", ") ) %>%
  unnest(phrase) %>%
  count(phrase)


Source: https://stackoverflow.com/questions/36877940/count-the-number-of-times-frequency-a-string-occurs
