Combining low frequency counts

没有蜡笔的小新 2020-12-03 19:24

Trying to collapse a nominal categorical vector by combining low frequency counts into an 'Other' category:

The data (column of a dataframe) looks like this, and c

7 answers
  •  情话喂你
    2020-12-03 19:54

    Using the package dplyr, and assuming your data frame (let's call it State) has one field called ID for each State name...

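    library(dplyr)
    # Per-ID count and share of rows; flag shares above 0.2
    filtered_data <- State %>% group_by(ID) %>% summarise(n = n(),
                                                           freq = n / nrow(State),
                                                           above_thresh = freq > 0.2)

    # Relabel flagged IDs (as.character avoids factor-level errors)
    filtered_data$ID <- as.character(filtered_data$ID)
    filtered_data$ID[filtered_data$above_thresh] <- "above_0.2"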
    

    In effect, this gives any state whose frequency is above 0.2 the label "above_0.2". To collapse the low-frequency states instead, flip the comparison to freq < 0.2 and use "Other" as the label.
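
For the original goal of folding rare values into 'Other', a minimal base R sketch might look like the following. The vector `ids`, its values, and the 0.2 threshold are made up for illustration and are not taken from the question's data.

```r
# Hypothetical example data: a character vector of state IDs.
ids <- c("CA", "CA", "CA", "CA", "NY", "NY", "NY", "TX", "WA", "OR")

# Relative frequency of each distinct value.
freq <- table(ids) / length(ids)

# Values whose share falls below the (assumed) threshold.
threshold <- 0.2
rare <- names(freq)[as.vector(freq) < threshold]

# Replace rare values with "Other". Working on a character vector
# avoids errors from assigning a level that a factor does not have.
collapsed <- as.character(ids)
collapsed[collapsed %in% rare] <- "Other"

# Show the resulting counts per category.
print(table(collapsed))
```

If the column is needed as a factor afterwards, wrapping the result in `factor(collapsed)` rebuilds the levels with 'Other' included.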
