Combining low frequency counts

Backend · Unresolved · 7 answers · 736 views
没有蜡笔的小新 2020-12-03 19:24

I am trying to collapse a nominal categorical vector by combining low-frequency levels into an 'Other' category:

The data (column of a dataframe) looks like this, and c

7 answers
  •  一生所求
    2020-12-03 20:13

    Seems to work, but it's quite ugly. Is there a more elegant solution?

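    collapsecatetgory <- function(x, p) {
        # Add 'Other' as a level so it can be assigned below
        levels_len <- length(levels(x))
        levels(x)[levels_len + 1] <- 'Other'
        # Relative frequency of each level
        y <- table(x) / length(x)
        y1 <- as.vector(y)
        y2 <- names(y)

        # Recode every level whose share is at most p
        for (i in seq_along(y2)) {
            if (y1[i] <= p) {
                x[x == y2[i]] <- 'Other'
            }
        }
        # Drop the levels that are now empty
        droplevels(x)
    }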
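
One possible way to shorten the loop is to rename all rare levels in a single step, since assigning the same label to several factor levels merges them. The sketch below is only an illustration: the name `collapse_rare`, the `other` argument, and the sample vector `v` are hypothetical and do not come from the thread.

```r
# Hypothetical helper: same threshold rule as the loop version (share <= p).
collapse_rare <- function(x, p, other = "Other") {
  x <- as.factor(x)
  # Relative frequency of each level, using length(x) as the denominator
  freq <- table(x) / length(x)
  rare <- names(freq)[freq <= p]
  # Giving several levels the same label merges them into one level
  levels(x)[levels(x) %in% rare] <- other
  # Remove any level that ends up unused
  droplevels(x)
}

# Made-up sample data: shares are a = 0.5, b = 0.3, c = 0.1, d = 0.1
v <- factor(c(rep("a", 5), rep("b", 3), "c", "d"))
res <- collapse_rare(v, 0.15)
table(res)
```

With the sample data, `c` and `d` fall below the 0.15 threshold and end up in a single `Other` level. If the forcats package is an option, `forcats::fct_lump_prop()` appears to cover a similar use case, though its handling of levels exactly at the threshold may differ from this sketch.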
    
