Channel: Pentaho Community Forums

Using Weka to forecast a time-dependent nominal class with imbalanced data

Hello all,
I have started a data mining research project with Weka (my first) and am struggling with a problem.

First, my data set is imbalanced: class "yes" covers about 96% of the instances and class "no" about 4%.
As a result, many algorithms are not useful. Naive Bayes gives excellent results, or at least that is what I was taught. I divided the data set into two parts:

The first part contains data from 2008 to 2012, and the second contains data from 2013.

I applied naive Bayes to the first data set (2008-2012) using cross-validation and got good results.
I also applied naive Bayes with cross-validation to the second data set and again got good results.
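With a 96%/4% split, overall accuracy on its own can make a model look good: a classifier that always answers "yes" is already 96% accurate. The sketch below uses hypothetical labels, not the actual data, to show why the per-class recall (the TP rate for "no" in Weka's "Detailed Accuracy By Class" output) or the balanced accuracy is a more telling check. The function names are illustrative, not Weka APIs.

```python
def accuracy(y_true, y_pred):
    """Fraction of all instances predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred, labels=("yes", "no")):
    """Fraction of the instances of each class that were predicted correctly."""
    recall = {}
    for label in labels:
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        total = sum(1 for t in y_true if t == label)
        recall[label] = hits / total if total else float("nan")
    return recall


def balanced_accuracy(y_true, y_pred, labels=("yes", "no")):
    """Mean of the per-class recalls; a constant classifier scores 0.5 with two classes."""
    r = per_class_recall(y_true, y_pred, labels)
    return sum(r.values()) / len(r)


# Hypothetical labels mirroring the 96% / 4% split described in the post.
y_true = ["yes"] * 96 + ["no"] * 4
always_yes = ["yes"] * 100  # never predicts the minority class

print(accuracy(y_true, always_yes))
print(per_class_recall(y_true, always_yes))
print(balanced_accuracy(y_true, always_yes))
```

Measured this way, a "good result" requires a high recall on "no" as well as on "yes", not just a high overall accuracy.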

The problem started when I applied the model learned from the first data set to the second one: then I got terrible results.
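Cross-validation inside a single period draws training and test folds from the same years. It therefore does not measure how a model trained on earlier years performs on a later year. The sketch below shows a forward-chaining split (train on all earlier years, test on the next one). It assumes each instance carries its year, and the function name is hypothetical, not a Weka API.

```python
from collections.abc import Iterator, Sequence


def forward_chaining_splits(
    years: Sequence[int], first_test_year: int
) -> Iterator[tuple[list[int], list[int], int]]:
    """Yield (train_indices, test_indices, test_year) for each year >= first_test_year.

    Training always uses only instances from strictly earlier years, mimicking
    the situation of forecasting a future year from past data.
    """
    for test_year in sorted(set(years)):
        if test_year < first_test_year:
            continue
        train = [i for i, y in enumerate(years) if y < test_year]
        test = [i for i, y in enumerate(years) if y == test_year]
        yield train, test, test_year


# Hypothetical year column, one entry per instance.
years = [2008, 2009, 2009, 2010, 2011, 2012, 2013]
for train_idx, test_idx, year in forward_chaining_splits(years, first_test_year=2011):
    print(year, train_idx, test_idx)
```

Each split could be exported as a pair of ARFF files and evaluated in the Weka Explorer with the "Supplied test set" option. This would show whether performance was already dropping year by year before 2013.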

At first I thought the problem was that the attribute distributions differ between the two data sets, but their means and standard deviations differ only by a few tenths of a percent. I think the poor result is caused by the probability of the minority class value decreasing over the years (from 6% to 4% over the whole period). What approach should I use to forecast the 2013 data set from the 2008-2012 data set?
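Suppose the class-conditional attribute distributions p(x | class) are roughly stable and only the class prior changes (a "prior shift" or "label shift"). Then a naive Bayes model can be adjusted without retraining. Its posterior is proportional to prior × likelihood, so the old prior can be divided out and the new one multiplied in. The sketch below assumes the new prior is known. In a real forecast it would have to be estimated. The function name and the single example posterior are hypothetical, and the 6% "no" prior stands in for whatever the pooled 2008-2012 frequency actually is.

```python
def adjust_for_prior_shift(posterior, old_prior, new_prior):
    """Re-weight class posteriors when only the class priors change.

    Assumes p(x | class) is the same in both periods (label shift), so
    p_new(c | x) is proportional to p_old(c | x) * new_prior[c] / old_prior[c].
    """
    weighted = {c: posterior[c] * new_prior[c] / old_prior[c] for c in posterior}
    total = sum(weighted.values())
    # Renormalise so the adjusted posteriors sum to 1.
    return {c: w / total for c, w in weighted.items()}


# Illustrative priors: 6% "no" in the training period, 4% "no" in 2013.
old_prior = {"yes": 0.94, "no": 0.06}
new_prior = {"yes": 0.96, "no": 0.04}
# A made-up posterior for one instance, as output by the 2008-2012 model.
print(adjust_for_prior_shift({"yes": 0.5, "no": 0.5}, old_prior, new_prior))
```

A lower "no" prior pushes predictions further toward "yes". If the goal is to catch the "no" cases, choosing a decision threshold on p(no | x), instead of always taking the most probable class, may matter more than the prior correction itself.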

Thank you all, and sorry about my poor English.
