I have the following data frame:
a
ID a.1 b.1 a.2 b.2
1 1 40.00 100.00 NA 88.89
2 2 100.00 100.00 100 100.00
3 3 5
If by decile or quartile you mean equal-sized 10% or 25% portions of your population, rather than the numeric values of the decile/quartile boundaries, you can rank your values first and apply quantile() to the ranks:
a <- c(1,1,1,2,3,4,5,6,7,7,7,7,99,0.5,100,54,3,100,100,100,11,11,12,11,0)
a_ranks <- rank(a, ties.method = "first")
decile <- cut(a_ranks, quantile(a_ranks, probs=0:10/10), include.lowest=TRUE, labels=FALSE)
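To see why ranking helps, here is a minimal sketch that reuses the vector above (the name `raw_attempt` is only illustrative). It first tries cut() on the raw values, which fails because the 90% and 100% quantiles are both 100, and then bins the ranks, which are always distinct with ties.method = "first".

```r
a <- c(1,1,1,2,3,4,5,6,7,7,7,7,99,0.5,100,54,3,100,100,100,11,11,12,11,0)

# Cutting the raw values fails: two of the decile boundaries coincide at 100
raw_attempt <- tryCatch(
  cut(a, quantile(a, probs = 0:10/10), include.lowest = TRUE, labels = FALSE),
  error = function(e) e
)
inherits(raw_attempt, "error")

# Ranks with ties.method = "first" are a permutation of 1..length(a),
# so their quantiles can never repeat
a_ranks <- rank(a, ties.method = "first")
decile <- cut(a_ranks, quantile(a_ranks, probs = 0:10/10),
              include.lowest = TRUE, labels = FALSE)

# Group sizes per decile; with 25 values each group holds 2 or 3 of them
table(decile)
```

Note that tied values can end up in different deciles this way, since ties are broken by position in the vector.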
If you'd rather keep the number of quantiles, another option is to just add a little bit of jitter, e.g.
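breaks = c(-Inf,quantile(a[,paste(i,1,sep=".")], na.rm=TRUE),Inf)
# Scale the offset by the largest finite break: a bare .Machine$double.eps is absorbed by values well above 1 (e.g. 100)
breaks = breaks + seq_along(breaks) * .Machine$double.eps * max(1, abs(breaks[is.finite(breaks)]))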
Instead of cut(), you can use .bincode(), which accepts a non-unique vector of breaks.
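A minimal sketch of that approach, assuming illustrative values for `x` and a hand-written `breaks` vector with repeated entries (neither is taken from the original loop). Unlike cut(), .bincode() returns plain integer bin codes rather than a factor.

```r
# Illustrative data and breaks; 100 appears four times among the breaks
x <- c(88.89, 100, 59.38, 100)
breaks <- c(-Inf, 59.38, 100, 100, 100, 100, Inf)

# cut() rejects the duplicated breaks
cut_attempt <- tryCatch(cut(x, breaks), error = function(e) e)
inherits(cut_attempt, "error")

# .bincode() only needs the breaks to be sorted; duplicates are allowed
codes <- .bincode(x, breaks, right = TRUE, include.lowest = TRUE)
codes
```

The intervals between the repeated breaks are empty, so no value is ever assigned to them, and the number of bins stays the same as with unique quantiles.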
You get this error because the quantile values for columns b.1, a.2 and b.2 in your data are identical at some levels, so they cannot be used directly as the breaks argument of cut().
apply(a,2,quantile,na.rm=T)
ID a.1 b.1 a.2 b.2
0% 1.00 37.5000 59.38 75.0 59.3800
25% 2.25 42.5000 100.00 87.5 91.6675
50% 3.50 58.3350 100.00 100.0 100.0000
75% 4.75 91.6675 100.00 100.0 100.0000
100% 6.00 100.0000 100.00 100.0 100.0000
One way to solve this problem is to wrap quantile() in unique(), which drops the duplicated quantile values. This of course yields fewer break points when the quantiles are not unique.
res <- lapply(dup.temp[,1],function(i) {
breaks <- c(-Inf,unique(quantile(a[,paste(i,1,sep=".")], na.rm=T)),Inf)
cut(a[,paste(i,2,sep=".")],breaks)
})
[[1]]
[1] <NA> (91.7,100] (58.3,91.7] <NA> <NA> (91.7,100]
Levels: (-Inf,37.5] (37.5,42.5] (42.5,58.3] (58.3,91.7] (91.7,100] (100, Inf]
[[2]]
[1] (59.4,100] (59.4,100] (59.4,100] (-Inf,59.4] (59.4,100] (59.4,100]
Levels: (-Inf,59.4] (59.4,100] (100, Inf]
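For readers who want to try this without the original data, here is a self-contained sketch. The vector `b1` is made up for illustration and chosen so that four of its five quantiles coincide at 100.

```r
# Hypothetical column: one low value and five identical high values
b1 <- c(100, 100, 59.38, 100, 100, 100)

# The raw quantiles repeat, so unique() leaves only two distinct values
q <- quantile(b1, na.rm = TRUE)
breaks <- c(-Inf, unique(q), Inf)

# With unique breaks cut() works, but there are fewer intervals
binned <- cut(b1, breaks)
nlevels(binned)
table(binned)
```

Here the five quantiles collapse to two distinct values, so the result has three intervals instead of six.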