I am using the function below to check the most frequent words per category, and then observing how some sentences are classified. The results are surprisingly wrong.
The names in the cats variable and in newsgroups_train.target_names are in a different order. The labels in target_names are sorted, see here, so a predicted label indexes into the sorted list, not into cats.
Output of:
print(cats)
['sci.space', 'rec.autos', 'rec.motorcycles']
print(newsgroups_train.target_names)
['rec.autos', 'rec.motorcycles', 'sci.space']
You should change this line:
print(" - Predicted as: '{}'".format(cats[predicted]))
to
print(" - Predicted as: '{}'".format(newsgroups_train.target_names[predicted]))
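
To see why the index lookup goes wrong, here is a minimal sketch that needs no dataset download. It assumes target_names is simply the sorted list of category names, as in the output above. The `target_names` list and the `predicted` value are stand-ins for illustration, not values taken from the actual classifier.

```python
# Category list in the order it was passed to the loader (as in the question).
cats = ['sci.space', 'rec.autos', 'rec.motorcycles']

# Stand-in for newsgroups_train.target_names: the same names, sorted.
target_names = sorted(cats)

# Hypothetical predicted label, standing in for one element of clf.predict(...).
predicted = 0

# Indexing the unsorted list gives the wrong category name.
wrong = cats[predicted]

# Indexing the sorted list gives the name the label actually refers to.
right = target_names[predicted]

print(" - Looked up in cats:         '{}'".format(wrong))
print(" - Looked up in target_names: '{}'".format(right))
```

With these stand-in values, label 0 maps to 'sci.space' through cats but to 'rec.autos' through the sorted list, which is the kind of mismatch that makes correct predictions look wrong.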