Spark ML Naive Bayes classifier: high probability prediction for one class
Question: I am using Spark ML to optimise a Naive Bayes multi-class classifier. I have about 300 categories and I am classifying text documents. The training set is balanced enough: there are about 300 training examples for each category. All looks good, and the classifier achieves acceptable precision on unseen documents. But what I am noticing is that when classifying a new document, very often the classifier assigns a high probability to one of the categories (the prediction probability is