Cut() error - 'breaks' are not unique

甜味超标 2020-11-29 04:12

I have the following data frame:

 a         
    ID   a.1    b.1     a.2   b.2
1    1  40.00   100.00  NA    88.89
2    2  100.00  100.00  100   100.00
3    3  5         


        
4 Answers
  • 2020-11-29 04:15

    If by decile, quartile, etc. you mean the 10% or 25% portions of your population rather than the actual numeric values of the decile/quartile boundaries, you can rank your values first and apply the quantile function to the ranks:

    a <- c(1,1,1,2,3,4,5,6,7,7,7,7,99,0.5,100,54,3,100,100,100,11,11,12,11,0)
    a_ranks <- rank(a, ties.method = "first")
    decile <- cut(a_ranks, quantile(a_ranks, probs=0:10/10), include.lowest=TRUE, labels=FALSE)  
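
    To see why ranking helps, here is a minimal sketch that reuses the vector from this answer. Cutting `a` directly at its deciles fails because the 90% and 100% quantiles are both 100, while the ranks (with ties broken by `ties.method = "first"`) are all distinct, so their deciles are distinct too. The names `direct` and `sizes` are only for illustration.

    ```r
    a <- c(1,1,1,2,3,4,5,6,7,7,7,7,99,0.5,100,54,3,100,100,100,11,11,12,11,0)

    # Cutting at the raw deciles stops with an error because two breaks equal 100
    direct <- tryCatch(
      cut(a, quantile(a, probs = 0:10/10), include.lowest = TRUE, labels = FALSE),
      error = function(e) e
    )
    inherits(direct, "error")

    # Ranks are a permutation of 1..25, so their deciles are strictly increasing
    a_ranks <- rank(a, ties.method = "first")
    decile <- cut(a_ranks, quantile(a_ranks, probs = 0:10/10),
                  include.lowest = TRUE, labels = FALSE)
    sizes <- table(decile)   # every group holds 2 or 3 of the 25 values
    ```

    Note that tied values can land in different groups this way, since ties are split by their order in the vector.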
    
  • 2020-11-29 04:21

    If you'd rather keep the number of quantiles, another option is to just add a little bit of jitter, e.g.

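    breaks <- c(-Inf, quantile(a[, paste(i, 1, sep = ".")], na.rm = TRUE), Inf)
    # .Machine$double.eps is absorbed by rounding for values such as 100,
    # so use a step that is small but still representable on a 0-100 scale
    breaks <- breaks + seq_along(breaks) * 1e-8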
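
    One caveat with this kind of jitter, shown as a rough sketch: an offset of a few multiples of `.Machine$double.eps` is lost to rounding once the values are as large as 100, so the breaks can stay tied. The quantiles used here are the b.1 values reported elsewhere in this thread; the `1e-8` step is an arbitrary assumption that suits data on a 0 to 100 scale, and the names `tiny`, `wider` and `codes` are illustrative.

    ```r
    q <- c(59.38, 100, 100, 100, 100)   # b.1 quantiles as reported in this thread
    breaks <- c(-Inf, q, Inf)

    # Offsets of a few double.eps vanish when added to 100
    tiny <- breaks + seq_along(breaks) * .Machine$double.eps
    still_tied <- anyDuplicated(tiny) > 0

    # A larger (assumed) step keeps the breaks strictly increasing
    wider <- breaks + seq_along(breaks) * 1e-8
    codes <- cut(c(100, 88.89, 40), wider, labels = FALSE)
    ```

    The number of intervals is preserved, but the intervals between formerly tied breaks are so narrow that they will almost always be empty.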
    
  • 2020-11-29 04:27

    Instead of cut(), you can use .bincode(), which accepts a non-unique vector of breaks.
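
    A minimal sketch of that call, assuming the tied b.1 quantiles reported in this thread as breaks and a few made-up values for `x`:

    ```r
    breaks <- c(-Inf, 59.38, 100, 100, 100, 100, Inf)   # tied breaks, already sorted
    x <- c(100, 88.89, 40, NA)                          # illustrative values only

    # .bincode() returns integer bin numbers rather than a factor with labels
    codes <- .bincode(x, breaks, right = TRUE, include.lowest = FALSE)
    ```

    The intervals between tied breaks are empty, so no value is ever assigned to them, and you have to build labels yourself if you need a factor.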

  • 2020-11-29 04:36

    You get this error because the quantile values for columns b.1, a.2 and b.2 are identical at some levels, so they can't be used directly as breaks in cut().

    apply(a,2,quantile,na.rm=T)
           ID      a.1    b.1   a.2      b.2
    0%   1.00  37.5000  59.38  75.0  59.3800
    25%  2.25  42.5000 100.00  87.5  91.6675
    50%  3.50  58.3350 100.00 100.0 100.0000
    75%  4.75  91.6675 100.00 100.0 100.0000
    100% 6.00 100.0000 100.00 100.0 100.0000
    

    One way to solve this problem is to wrap quantile() in unique(), which drops the duplicated quantile values. This of course yields fewer break points whenever some quantiles coincide.

    res <- lapply(dup.temp[,1],function(i) {
      breaks <- c(-Inf,unique(quantile(a[,paste(i,1,sep=".")], na.rm=T)),Inf)
      cut(a[,paste(i,2,sep=".")],breaks)
    })
    
    [[1]]
    [1] <NA>        (91.7,100]  (58.3,91.7] <NA>        <NA>        (91.7,100] 
    Levels: (-Inf,37.5] (37.5,42.5] (42.5,58.3] (58.3,91.7] (91.7,100] (100, Inf]
    
    [[2]]
    [1] (59.4,100]  (59.4,100]  (59.4,100]  (-Inf,59.4] (59.4,100]  (59.4,100] 
    Levels: (-Inf,59.4] (59.4,100] (100, Inf]
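
    The same idea as a self-contained sketch that runs without `a` or `dup.temp`. The vector `x` is made up for illustration and is not the data from the question.

    ```r
    x <- c(59.38, 100, 100, 100, 100, 100)   # made-up values with heavy ties

    q <- quantile(x, na.rm = TRUE)           # 59.38 once, then 100 four times
    breaks <- c(-Inf, unique(q), Inf)        # -Inf, 59.38, 100, Inf
    groups <- cut(x, breaks)
    nlevels(groups)                          # 3 levels; five distinct quantiles would give 6
    ```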
    