How to get the mode of a group in summarize in R

Submitted by 被刻印的时光 ゝ on 2019-11-30 04:05:34

Question


I want to compare the costs of CPT codes from two different claims payers. Both have par and non-par priced providers. I am using dplyr and modeest::mlv, but it's not working out as anticipated. Here's some sample data:

source CPTCode ParNonPar Key         net_paid  PaidFreq seq
ABC   100       Y      ABC100Y  -341.00     6   1
ABC   100       Y      ABC100Y     0.00     2   2
ABC   100       Y      ABC100Y   341.00     6   3
XYZ   103       Y      XYZ103Y   740.28     1   1
XYZ   104       N      XYZ104N     0.00     2   1
XYZ   104       N      XYZ104N   401.82     1   2
XYZ   104       N      XYZ104N   726.18     1   3
XYZ   104       N      XYZ104N   893.00     1   4
XYZ   104       N      XYZ104N   928.20     2   5
XYZ   104       N      XYZ104N   940.00     2   6
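
For anyone who wants to reproduce the problem, the sample above can be loaded into a data frame with base R. This is only a minimal sketch: it assumes that PaidFreq is the column holding the frequency counts and expands each row that many times, so that every payment becomes one observation.

```r
# Sample data from the question, read from inline text (base R only).
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
source CPTCode ParNonPar Key     net_paid PaidFreq seq
ABC    100     Y         ABC100Y  -341.00 6        1
ABC    100     Y         ABC100Y     0.00 2        2
ABC    100     Y         ABC100Y   341.00 6        3
XYZ    103     Y         XYZ103Y   740.28 1        1
XYZ    104     N         XYZ104N     0.00 2        1
XYZ    104     N         XYZ104N   401.82 1        2
XYZ    104     N         XYZ104N   726.18 1        3
XYZ    104     N         XYZ104N   893.00 1        4
XYZ    104     N         XYZ104N   928.20 2        5
XYZ    104     N         XYZ104N   940.00 2        6
")

# Assumption: PaidFreq is the frequency column. Repeat each row index
# PaidFreq times so that every payment becomes its own row.
dataObs <- data[rep(seq_len(nrow(data)), data$PaidFreq), ]

# One row per individual payment.
n_obs <- nrow(dataObs)
print(n_obs)
```

The expanded row count should equal sum(data$PaidFreq), which is a quick sanity check before summarising.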

and the code

str(data)
View(data)

## Expand frequency count to individual observations
n.times <- data$PaidFreq
dataObs <- data[rep(seq_len(nrow(data)), n.times),]

## Calculate mean for each CPTCode (for mode use modeest library)
library(dplyr)
library(modeest)
dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid),
            median=median(net_paid),
            mode = mlv(net_paid, method=mfv),
            total = sum(net_paid))
str(dataSummary)                     

I thought I could call mlv from modeest inside summarise() alongside the mean and median, but this formulation errors out with:

Error in as.character(x) : cannot coerce type 'closure' to vector of type 'character'

Without mlv I get a data frame like the one below, but what I want is all the stats for a payer/CPT combination on one line. Once I have what I need on a row, I envision graphing it in boxplots by limiting the x and y segments.

The inadequate result is this (I forgot to include the payer name here!):

ParNonPar   CPTCode mean          median(net_paid)  total
N           0513F   0.000000    0.000           0.00
N           0518F   0.000000    0.000           0.00 
N           10022   0.000000    0.000           0.00
N           10060   73.660000   90.120        294.64
N           10061   324.575000  340.500      1298.30
N           10081   312.000000  312.000       312.00

Thanks very much for your time and effort.

Answer 1:


You need to make a couple of changes to your code for mlv to work.

  1. The method name (mfv) has to be within quotes ('mfv'). Unquoted, mfv refers to the function itself rather than to a character string, and that is what is causing your error.
  2. After you do that, since mlv returns a list, you have to pass a single value to summarise(). Assuming that you want the mode ('M'), you pick that element from the list.
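
The first point can be reproduced without modeest. The sketch below assumes that the method argument is coerced to a character string somewhere inside mlv, which is what the as.character(x) in the error message suggests. The name fake_mfv is a hypothetical stand-in for any function object.

```r
# `fake_mfv` is a hypothetical stand-in for a function object such as mfv.
fake_mfv <- function(x) x

# Coercing a bare function (a "closure") to character fails;
# tryCatch captures the error message instead of stopping the script.
msg <- tryCatch(as.character(fake_mfv),
                error = function(e) conditionMessage(e))
print(msg)

# A quoted name is already a character string, so coercion leaves it unchanged.
ok <- as.character("mfv")
print(ok)
```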

Try:

dataSummary <- dataObs %>%
  group_by(ParNonPar, CPTCode) %>%
  summarise(mean = mean(net_paid), 
            median=median(net_paid), 
            mode = mlv(net_paid, method='mfv')[['M']], 
            total = sum(net_paid))

to get:

> dataSummary
Source: local data frame [3 x 6]
Groups: ParNonPar

  ParNonPar CPTCode     mean  median     mode   total
1         N     104 639.7111  893.00 622.7333 5757.40
2         Y     100   0.0000    0.00   0.0000    0.00
3         Y     103 740.2800  740.28 740.2800  740.28

Hope that helps you move forward.
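
Depending on the installed modeest version, mlv() may return a plain numeric vector rather than a list, in which case the [['M']] step would not apply. As a version-independent alternative, here is a base R sketch of a helper that averages tied most-frequent values, which is how the 622.7333 in the output above appears to arise (0, 928.2 and 940 each occur twice). The name mode_mean_ties is hypothetical and not part of modeest.

```r
# Hypothetical helper (not part of modeest): find the most frequent value(s)
# of a numeric vector and average them when there is a tie.
mode_mean_ties <- function(x) {
  counts <- table(x)                  # frequency of each distinct value
  freq <- as.vector(counts)
  modes <- as.numeric(names(counts)[freq == max(freq)])
  mean(modes)
}

# Expanded net_paid values for CPT code 104 from the question's sample data.
x <- rep(c(0, 401.82, 726.18, 893, 928.2, 940), times = c(2, 1, 1, 1, 2, 2))
result <- mode_mean_ties(x)
print(result)
```

Because it takes a vector and returns a single number, a function like this could be used inside summarise() in place of the mlv() call.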




Answer 2:


I use this approach:

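## nums is numeric so that tabulate() accepts it (since R 4.0, strings are no longer converted to factors by default)
df <- data.frame(groups = c("A", "A", "A", "B", "B", "C", "C", "C", "D"), nums = c(1, 2, 1, 2, 3, 4, 5, 5, 1))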

which looks like:

 groups nums
  A    1
  A    2
  A    1
  B    2
  B    3
  C    4
  C    5
  C    5
  D    1

Then I define:

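## Most frequent value of a vector of positive integers (for a factor, the result is the level index); ties go to the smallest value
## Note: this definition masks base::mode() for the rest of the session
mode <- function(codes){
  which.max(tabulate(codes))
}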

and do the following:

mds <- df %>%
  group_by(groups) %>%
  summarise(mode = mode(nums))

giving:

  groups  mode
 A          1
 B          2
 C          5
 D          1
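
The tabulate() approach above relies on the values being positive integers (or factor codes), so it would not work directly on amounts such as 401.82. A common base R idiom handles any atomic vector by tabulating positions among the unique values. The sketch below uses the hypothetical name first_mode, which also avoids masking base::mode(), and applies it per group with tapply() to the same example data, kept here as character strings.

```r
# Hypothetical helper: the first most frequent value of any atomic vector
# (numeric, character, ...). Ties go to the value that appears first.
first_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

df <- data.frame(groups = c("A", "A", "A", "B", "B", "C", "C", "C", "D"),
                 nums = c("1", "2", "1", "2", "3", "4", "5", "5", "1"),
                 stringsAsFactors = FALSE)

# Apply the helper to nums within each group.
mds <- tapply(df$nums, df$groups, first_mode)
print(mds)
```

The same helper could be passed to summarise(mode = first_mode(nums)) after group_by(groups).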


Source: https://stackoverflow.com/questions/30385626/how-to-get-the-mode-of-a-group-in-summarize-in-r
