How to deal with low-frequency examples in classification?
Question: I'm facing a text classification problem in which I need to classify examples into 34 groups. The problem is that the training data is not balanced across the 34 groups: for some groups I have 2000+ examples, while for others I have only 100+. For some of the small groups, the classification accuracy is quite high; I guess those groups have specific keywords that make them easy to recognize and classify. For other small groups, the accuracy is low, and the predictions always go to the large groups. I want to know how to deal