I'm trying to collapse a nominal categorical vector by combining low-frequency levels into an 'Other' category.
The data (column of a dataframe) looks like this, and c
Seems to work, but it's quite ugly. Is there a more elegant solution?
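collapsecatetgory <- function(x, p) {
  # Add 'Other' as a valid level so values can be reassigned to it
  levels_len <- length(levels(x))
  levels(x)[levels_len + 1] <- 'Other'
  # Relative frequency of each level (NA values count toward length(x))
  y <- table(x) / length(x)
  y1 <- as.vector(y)
  y2 <- names(y)
  # Reassign every level whose share is at or below p to 'Other'
  for (i in seq_along(y2)) {
    if (y1[i] <= p) {
      x[x == y2[i]] <- 'Other'
    }
  }
  # Drop the now-empty levels (and 'Other' itself if nothing was collapsed)
  x <- droplevels(x)
  x
}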
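
For comparison, here is a minimal vectorized sketch of the same idea in base R. It relies on the fact that assigning the same label to several factor levels merges them, so no loop is needed. The function name `collapse_rare`, the `other` argument, and the sample data are hypothetical and only for illustration. One difference from the function above: `prop.table(table(x))` divides by the number of non-NA values rather than by `length(x)`, so results can differ when `x` contains NAs.

```r
# Hypothetical vectorized variant (name and 'other' argument are illustrative)
collapse_rare <- function(x, p, other = "Other") {
  x <- as.factor(x)
  # Relative frequency of each level, computed over non-NA values
  freq <- prop.table(table(x))
  # Names of the levels whose share is at or below the threshold p
  rare <- names(freq)[as.vector(freq) <= p]
  # Renaming several levels to one shared label merges them into that level
  levels(x)[levels(x) %in% rare] <- other
  # Drop any level left without observations
  droplevels(x)
}

# Made-up sample data: 'c', 'd' and 'e' each make up 10% of the values
x <- factor(c("a", "a", "a", "a", "b", "b", "b", "c", "d", "e"))
collapsed <- collapse_rare(x, 0.15)
table(collapsed)
```

With this sample, the levels `c`, `d` and `e` fall below the 0.15 threshold and end up in a single `Other` level, while `a` and `b` are left untouched.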