R: How to Count All Character Values Separated By Commas In A Column?

我怕爱的太早我们不能终老 submitted on 2021-02-17 05:49:08

Question


Below are a couple of rows of the test data I am using. I want to count the frequency of all the codes in the ICD10Code column, which are separated by commas. In the code below, I used group_by because each PatientId value appears in several rows, with differing values in the other columns. How can I count the frequency of each code?

PatientId  ReferralSource     NextAppt  Age  InsuranceName           ICD10Code
1584       St Francis         Y         34   SLIDING FEE SCHEDULE    M5136, N809, R51, Z6831
2655       Piedmont Hospital  Y         60   Medicaid-GA (Medicaid)  E119, E782, I10, L729, R809
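To make the example reproducible, the two sample rows can be entered as a data frame. This is only a sketch: the column types (integer PatientId and Age, character for everything else) are assumptions, since the question does not show them.

```r
# Hypothetical reconstruction of the two sample rows shown above.
# Column types are assumed; ICD10Code holds comma-separated codes as one string.
data.1 <- data.frame(
  PatientId      = c(1584L, 2655L),
  ReferralSource = c("St Francis", "Piedmont Hospital"),
  NextAppt       = c("Y", "Y"),
  Age            = c(34L, 60L),
  InsuranceName  = c("SLIDING FEE SCHEDULE", "Medicaid-GA (Medicaid)"),
  ICD10Code      = c("M5136, N809, R51, Z6831", "E119, E782, I10, L729, R809"),
  stringsAsFactors = FALSE
)
str(data.1)
```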

The desired result would look something like this:

M5136=1
N809=1
R51=1

Being fairly new to R, I tried the following code (using sapply), which I found on Stack Overflow:

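library(dplyr)
# Collapse the duplicated rows to one row per patient
data.id <- data.1 %>%
      group_by(PatientId) %>%
      summarise(ReferralSource = first(ReferralSource), NextAppt = first(NextAppt),
                Age = max(Age), InsuranceName = toString(unique(InsuranceName)),
                ICD10Code = toString(unique(ICD10Code)))
# Number of codes per patient (excluding "Null"), not a count per code
sapply(strsplit(data.id$ICD10Code, ","), FUN = function(x) length(x[x != "Null"]))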

That only produced the total count for each row:

  [1] 10 17  5 18  6  5  8  7  2  8  3  8 10 14  5  5  9  8 11  5  6  5  9 16  9  4  3  9 18  9 12 12 12  2 16  6 10
 [38]  2  2  3  4  9  7 12  5 10 16 13  9  1  6  2  7  9  8  5  5  4  3 11 19  6  4  3  7  8  6 10  8  6 16 11  5  9
 [75] 13  5  8  4 10  3  7  5  6  4  3  4  8  7  7  4  5  9  2  6  1 20  3  3  3  4  5  5  7  3 12  7 16  1  7  6  3
[112]  4  2  7  8  4  1  9  3  8  3  8  5  8  2  4  4  8  4  7 10  8  2  4  4  2  9  7  7  5  1  8  6 10  9  3 11 10
[149]  3  6  4  6 13  3  7 11  6  5  4  3  1  4 10 10 10 10 11  2  1  5  4  5  5  5  5  9  5  7  7  2  6  7  7  6  5
[186]  7  8  9
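The sapply call returns one number per row because strsplit produces a list with one element per input string, and the function is applied to each element separately, so length counts the codes within a row rather than across rows. A minimal illustration on the two sample strings (hardcoded here as an assumption) also shows that splitting on "," alone leaves a leading space on every code after the first:

```r
codes <- c("M5136, N809, R51, Z6831", "E119, E782, I10, L729, R809")
# strsplit returns a list: one character vector per input string
pieces <- strsplit(codes, ",")
# sapply applies length to each list element, giving one total per row: 4 5
per_row <- sapply(pieces, length)
per_row
# Codes after the first keep the space that followed the comma, e.g. " N809"
pieces[[1]]
```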

Answer 1:


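To count the frequency of each ICD10Code across the entire column, we can split the strings on commas, unlist the result, trim the space that follows each comma (otherwise "N809" and " N809" would be counted as different codes), and count with table.

table(trimws(unlist(strsplit(as.character(data.1$ICD10Code), ","))))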
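To print the counts in the CODE=count style shown in the question, the names and values of the table can be pasted back together. A base R sketch on the two sample strings, assuming codes are separated by a comma followed by a space:

```r
codes <- c("M5136, N809, R51, Z6831", "E119, E782, I10, L729, R809")
# Split on commas, drop the surrounding spaces, then count each distinct code
freq <- table(trimws(unlist(strsplit(codes, ","))))
# Rebuild "CODE=count" lines, e.g. "M5136=1"
out <- paste0(names(freq), "=", as.vector(freq))
cat(out, sep = "\n")
```

With the full column, replacing codes with data.1$ICD10Code gives the overall frequencies; sort(freq, decreasing = TRUE) can be applied first to list the most common codes at the top.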



Answer 2:


One option is to use separate_rows on the 'ICD10Code' column (assuming it is of character class), use the resulting column as a grouping variable along with 'PatientId', and get the count with n() in summarise, along with the other variables needed in the output as shown in the OP's post.

library(dplyr)
library(tidyr)
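data.1 %>%
      # One row per code; the default sep splits on runs of characters
      # other than letters, digits and '.', so ", " is handled
      separate_rows(ICD10Code) %>%
      group_by(PatientId, ICD10Code) %>%
      summarise(Count = n(),
                ReferralSource = first(ReferralSource),
                NextAppt = first(NextAppt),
                Age = max(Age),
                InsuranceName = toString(unique(InsuranceName)))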

If the other summary output should be based only on grouping by 'PatientId', use 'Count' as a grouping variable instead of 'ICD10Code'.


If we only want the count of each 'ICD10Code' for each 'PatientId', just call count after separate_rows:

data.1 %>%
     select(PatientId, ICD10Code) %>%
     separate_rows(ICD10Code) %>%    # one row per code
     count(PatientId, ICD10Code)     # number of rows per patient/code pair
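For readers without tidyr, the same per-patient, per-code count can be sketched in base R. This swaps in strsplit, rep with lengths, and aggregate for separate_rows and count; it is not the answer's code, and the two-row sample and the comma-plus-optional-space separator are assumptions.

```r
# Hypothetical two-row sample mirroring the question's data
data.1 <- data.frame(
  PatientId = c(1584L, 2655L),
  ICD10Code = c("M5136, N809, R51, Z6831", "E119, E782, I10, L729, R809"),
  stringsAsFactors = FALSE
)
# Split each string on a comma followed by optional whitespace
pieces <- strsplit(data.1$ICD10Code, ",\\s*")
# Long format: repeat each PatientId once per code (like separate_rows)
long <- data.frame(
  PatientId = rep(data.1$PatientId, lengths(pieces)),
  ICD10Code = unlist(pieces),
  stringsAsFactors = FALSE
)
# Sum a column of ones per PatientId/ICD10Code pair (like count())
long$n <- 1L
counts <- aggregate(n ~ PatientId + ICD10Code, data = long, FUN = sum)
counts
```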


Source: https://stackoverflow.com/questions/59298891/r-how-to-count-all-character-values-separated-by-commas-in-a-column
