Question
Below are a couple of rows of the test data I am using. I want to count the frequency of all the codes in the ICD10Code column, which are separated by commas. In the segment of code below, I used group_by because every "PatientId" value had duplicates in that column but unique values in other columns. How can I count the frequency of every code across the whole column?
PatientId  ReferralSource     NextAppt  Age  InsuranceName           ICD10Code
1584       St Francis         Y         34   SLIDING FEE SCHEDULE    M5136, N809, R51, Z6831
2655       Piedmont Hospital  Y         60   Medicaid-GA (Medicaid)  E119, E782, I10, L729, R809
The result would look something like this below.
M5136=1
N809=1
R51=1
Being fairly new to R, I tried this segment of code found on Stack Overflow (sapply), but it only produced a total count of codes for each row.
data.id <- data.1 %>%
  group_by(PatientId) %>%
  summarise(ReferralSource = first(ReferralSource),
            NextAppt = first(NextAppt),
            Age = max(Age),
            InsuranceName = toString(unique(InsuranceName)),
            ICD10Code = toString(unique(ICD10Code)))
sapply(strsplit(data.id$ICD10Code,","),FUN=function(x){length(x[x!="Null"])})
That produced the total count for each row.
  [1] 10 17 5 18 6 5 8 7 2 8 3 8 10 14 5 5 9 8 11 5 6 5 9 16 9 4 3 9 18 9 12 12 12 2 16 6 10
 [38] 2 2 3 4 9 7 12 5 10 16 13 9 1 6 2 7 9 8 5 5 4 3 11 19 6 4 3 7 8 6 10 8 6 16 11 5 9
 [75] 13 5 8 4 10 3 7 5 6 4 3 4 8 7 7 4 5 9 2 6 1 20 3 3 3 4 5 5 7 3 12 7 16 1 7 6 3
[112] 4 2 7 8 4 1 9 3 8 3 8 5 8 2 4 4 8 4 7 10 8 2 4 4 2 9 7 7 5 1 8 6 10 9 3 11 10
[149] 3 6 4 6 13 3 7 11 6 5 4 3 1 4 10 10 10 10 11 2 1 5 4 5 5 5 5 9 5 7 7 2 6 7 7 6 5
[186] 7 8 9
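Answer 1:
To count the frequency of each ICD10Code across the entire column, we can split each string on the comma (together with any whitespace that follows it, so that " N809" and "N809" are not counted separately), unlist the result, and count it with table.
table(unlist(strsplit(as.character(data.1$ICD10Code), ",\\s*")))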
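As a minimal sketch of this approach, the two sample rows from the question can be rebuilt by hand and tabulated. The names `sample_codes`, `code_list`, and `code_freq` are made up for illustration and do not appear in the original post.

```r
# Hypothetical two-row sample rebuilt from the question's data
sample_codes <- data.frame(
  PatientId = c(1584, 2655),
  ICD10Code = c("M5136, N809, R51, Z6831",
                "E119, E782, I10, L729, R809"),
  stringsAsFactors = FALSE
)

# Split each string on a comma followed by optional whitespace,
# so no code keeps a leading space
code_list <- strsplit(sample_codes$ICD10Code, ",\\s*")

# Flatten the list into one character vector and tabulate each code
code_freq <- table(unlist(code_list))

# Show the most frequent codes first
sort(code_freq, decreasing = TRUE)
```

In this tiny sample every code occurs once; on the full data the sorted table puts the most common codes at the top.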
Answer 2:
One option is to use separate_rows on the 'ICD10Code' column (assuming it is of class character), use it as a grouping variable along with 'PatientId', and get the count (n()) in summarise together with the other variables needed in the output, as shown in the OP's post.
library(dplyr)
library(tidyr)
data.1 %>%
  separate_rows(ICD10Code) %>%          # one row per code
  group_by(PatientId, ICD10Code) %>%    # column is 'PatientId' in the question's data
  summarise(Count = n(),
            ReferralSource = first(ReferralSource),
            NextAppt = first(NextAppt),
            Age = max(Age),
            InsuranceName = toString(unique(InsuranceName)))
If the other summary output should be based only on grouping by 'PatientId', use 'Count' as a grouping variable instead of 'ICD10Code'.
If we only want the count of each 'ICD10Code' for each 'PatientId', just do a count after separate_rows.
data.1 %>%
  select(PatientId, ICD10Code) %>%
  separate_rows(ICD10Code) %>%
  count(PatientId, ICD10Code)
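The question asks for the frequency of each code across the whole column rather than per patient, so the same idea can be sketched with the patient dropped from the count. This is only an illustration on a hand-built sample: `sample_codes` and `overall_freq` are hypothetical names, and the distinct step assumes a code repeated on duplicate rows for the same patient should be counted once (remove it to count every occurrence).

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row sample rebuilt from the question's data
sample_codes <- data.frame(
  PatientId = c(1584, 2655),
  ICD10Code = c("M5136, N809, R51, Z6831",
                "E119, E782, I10, L729, R809"),
  stringsAsFactors = FALSE
)

overall_freq <- sample_codes %>%
  select(PatientId, ICD10Code) %>%
  separate_rows(ICD10Code, sep = ",\\s*") %>%  # one row per code
  distinct(PatientId, ICD10Code) %>%           # assumption: count a code once per patient
  count(ICD10Code, sort = TRUE, name = "Count")

overall_freq
```

The result has one row per distinct code with its frequency in the Count column.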
Source: https://stackoverflow.com/questions/59298891/r-how-to-count-all-character-values-separated-by-commas-in-a-column