I'm having trouble finding a function in R that performs equal-frequency discretization. I stumbled on the 'infotheo' package, but after some testing I found that the al
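This sort of thing is also quite easily solved by using (abusing?) R's conditioning-plot infrastructure, in particular the function `co.intervals()`. It lives in base R's graphics package, where it computes the conditioning intervals for `coplot()`, so no extra package needs to be loaded: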
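```r
# Equal-frequency binning using co.intervals() from base R's graphics package
cutEqual <- function(x, n, include.lowest = TRUE, ...) {
  # n non-overlapping intervals (overlap = 0) as a matrix: col 1 = lower, col 2 = upper
  ints <- graphics::co.intervals(x, number = n, overlap = 0)
  # breaks = first lower bound followed by every upper bound;
  # unique() guards against repeated bounds when x has many ties
  breaks <- unique(c(ints[1, 1], ints[, 2]))
  cut(x, breaks, include.lowest = include.lowest, ...)
}
```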
This reproduces @Joris' excellent answer:
```
> set.seed(12345)
> x <- rnorm(50)
> table(cutEqual(x, 5))
[-2.38,-0.885] (-0.885,-0.115] (-0.115,0.587] (0.587,0.938] (0.938,2.2]
10 10 10 10 10
> y <- rpois(50, 5)
> table(cutEqual(y, 5))
[0.5,3.5] (3.5,5.5] (5.5,6.5] (6.5,7.5] (7.5,11.5]
10 13 11 6 10
```
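In the latter, discrete case the break points differ from those in @Joris' answer, but they have the same effect: the same observations end up in the same bins. The counts are not all equal because tied values can never be split across bins.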
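To see why the breaks in the Poisson example all end in .5, it helps to look at the matrix `co.intervals()` returns. The sketch below reuses the seed and draws from the session above. The comment about padding reflects my reading of the `co.intervals()` source (each bound is shifted outward by half the smallest gap between distinct values), so treat it as an explanation rather than documented behaviour:

```r
# Inspect the interval matrix returned by co.intervals() (graphics package, base R)
set.seed(12345)
x <- rnorm(50)
y <- rpois(50, 5)
ints <- graphics::co.intervals(y, number = 5, overlap = 0)
print(ints)  # one row per interval: lower bound, upper bound

# The breaks passed to cut(): first lower bound, then all upper bounds
breaks <- c(ints[1, 1], ints[, 2])
print(breaks)

# Bounds are padded by half the smallest gap between distinct values.
# For integer data containing consecutive integers that gap is 1, so every
# break ends in .5 and no break coincides with an observed value,
# which is why tied observations always land in the same bin.
print(breaks %% 1)
```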
Here's another solution, using `bin_data()` from the mltools package:
```r
set.seed(1)
x <- round(rnorm(20), 2)
x.binned <- mltools::bin_data(x, bins = 5, binType = "quantile")
table(x.binned)
```

```
x.binned
[-2.21, -0.622) [-0.622, 0.1) [0.1, 0.526) [0.526, 0.844) [0.844, 1.6]
4 4 4 4 4
```
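For comparison, a similar quantile-based binning can be sketched in base R alone with `quantile()` and `cut()`. `cut_quantile()` is a hypothetical helper name, and it uses `quantile()`'s default type 7, which may not match how mltools computes its cut points, so the breaks need not be identical:

```r
# Hypothetical helper: equal-frequency bins from sample quantiles (base R only)
cut_quantile <- function(x, n, ...) {
  probs <- seq(0, 1, length.out = n + 1)
  # n + 1 cut points; with heavily tied data some coincide, so duplicates
  # are dropped (which yields fewer than n bins rather than an error)
  breaks <- unique(stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE))
  if (length(breaks) < 2) stop("x has too few distinct values to bin")
  # include.lowest = TRUE keeps the extreme value inside the outermost bin
  cut(x, breaks = breaks, include.lowest = TRUE, ...)
}

set.seed(1)
x <- round(rnorm(20), 2)
x.binned <- cut_quantile(x, 5)
print(table(x.binned))
```

By default `cut()` builds right-closed intervals such as `(a,b]`; passing `right = FALSE` through `...` gives left-closed intervals like the ones mltools prints, with the maximum still included in the last bin because of `include.lowest = TRUE`.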