I have a classification problem with 5 categories, and the aim is to classify texts into these 5 categories. Several data mining experts suggested trying a one-vs-others comparison in order to reveal which categories have the most influence on the overall accuracy of the classifier.
However, the test data is not evenly distributed over the categories. Even if it were, when I set up the data with two classes, where one is the "testing" class and the other is all remaining classes combined, I get bad results. The reason is simple: the classifier always predicts the "other" category, since it has many more instances than the "testing" one (over 4 times more).
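A minimal stdlib-only sketch of the one-vs-others setup described above, assuming labels are plain strings and using made-up toy data; the helper names `one_vs_rest_labels`, `undersample_rest`, `accuracy`, and `balanced_accuracy` are hypothetical. It illustrates why plain accuracy rewards always predicting "other" when that side is 4 times larger, and shows one common remedy (random undersampling of the "other" side in the training data) together with a metric that a majority-class predictor cannot fool.

```python
import random
from collections import Counter


def one_vs_rest_labels(labels, target):
    """Map multi-class labels to 1 (the class under test) or 0 (all other classes)."""
    return [1 if y == target else 0 for y in labels]


def undersample_rest(items, binary_labels, seed=0):
    """Randomly keep only as many 'rest' (0) examples as there are 'target' (1) examples."""
    pos = [i for i, y in enumerate(binary_labels) if y == 1]
    neg = [i for i, y in enumerate(binary_labels) if y == 0]
    rng = random.Random(seed)
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    keep = sorted(pos + kept_neg)
    return [items[i] for i in keep], [binary_labels[i] for i in keep]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall: a constant predictor scores 1 / number_of_classes."""
    recalls = []
    for cls in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        hits = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy data (hypothetical): 5 categories, 20 documents each.
labels = [c for c in "abcde" for _ in range(20)]
texts = [f"doc{i}" for i in range(len(labels))]

# Class "a" vs. all others: 20 positives against 80 negatives (4 times more).
y = one_vs_rest_labels(labels, "a")
always_other = [0] * len(y)
print(accuracy(y, always_other))           # 0.8, looks good but is useless
print(balanced_accuracy(y, always_other))  # 0.5, i.e. no better than chance

# Balance the training side before fitting a binary classifier on it.
train_x, train_y = undersample_rest(texts, y)
print(Counter(train_y))
```

Undersampling is meant for the training data only; the test data can stay as it is, scored with balanced accuracy or per-class precision/recall instead of plain accuracy. An alternative to discarding data is class weighting, e.g. scikit-learn's `class_weight="balanced"` option on classifiers such as `LogisticRegression` or `LinearSVC`.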
If anyone has suggestions on how to test one class against the others, I would appreciate the help.
Thanks,