Hello all,
I have an unbalanced dataset, so I down-sample the majority class and train JRip on the resulting balanced set. As a result I get rules, each with the number of covered instances and the number of misclassified instances. But those counts come from the balanced dataset, so they are not realistic: the balanced training set contains far fewer majority-class instances than the original data. What should I do? Can I use the rules to describe the dataset and interpret their "accuracy" only with respect to the balanced test set?
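To make the concern concrete, here is a rough sketch of how a rule's counts from the balanced set might be rescaled back to the original class distribution. It is not part of JRip or Weka; the function name `rescaled_precision` and the parameter `keep_fraction` (the share of majority-class instances kept by down-sampling) are made up for illustration, and it assumes the down-sampling was uniformly random and the minority class was left untouched.

```python
def rescaled_precision(covered, misclassified, predicts_minority, keep_fraction):
    """Estimate a rule's precision on the original (unbalanced) distribution.

    covered / misclassified are the counts reported for the rule on the
    balanced set. keep_fraction is the assumed share of majority-class
    instances that survived random down-sampling (0 < keep_fraction <= 1).
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    correct = covered - misclassified
    if predicts_minority:
        # The rule's errors are majority-class instances, which were thinned
        # out by down-sampling, so scale them back up.
        hits = correct
        misses = misclassified / keep_fraction
    else:
        # The rule's correct hits are majority-class instances; scale those up.
        hits = correct / keep_fraction
        misses = misclassified

    total = hits + misses
    return hits / total if total else float("nan")


# A minority-class rule covering 100 instances with 10 errors looks 90% precise
# on the balanced set; if only 10% of the majority class was kept, the estimate
# on the original distribution is 90 / (90 + 100).
balanced = (100 - 10) / 100
original = rescaled_precision(100, 10, predicts_minority=True, keep_fraction=0.1)
print(f"balanced: {balanced:.3f}, rescaled: {original:.3f}")
```

This is only an estimate; applying the learned rules to the full unbalanced data (or to an unbalanced held-out set) and counting covered and misclassified instances there would give the counts directly.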