Channel: Pentaho Community Forums

What does J48 split on?

As far as I understand it, the J48 tree uses information gain to decide which attribute to branch on. For some datasets, however, InfoGainAttributeEval ranks an attribute other than the one at the root node as having the highest information gain. Shouldn't the attribute with the highest information gain always be the first split in the tree (the root)?

For example, here is the unpruned J48 tree for the Iris dataset:

J48 unpruned tree
------------------

petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
| petalwidth <= 1.7
| | petallength <= 4.9: Iris-versicolor (48.0/1.0)
| | petallength > 4.9
| | | petalwidth <= 1.5: Iris-virginica (3.0)
| | | petalwidth > 1.5: Iris-versicolor (3.0/1.0)
| petalwidth > 1.7: Iris-virginica (46.0/1.0)

And here is the ranking of information gain:

=== Attribute Selection on all input data ===

Search Method:
Attribute ranking.

Attribute Evaluator (supervised, Class (nominal): 5 class):
Information Gain Ranking Filter

Ranked attributes:
1.418 3 petallength
1.378 4 petalwidth
0.698 1 sepallength
0.376 2 sepalwidth

So the question is: why isn't petallength the root of the tree?
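
One thing that may be relevant: J48 is Weka's implementation of C4.5, and C4.5 as Quinlan describes it chooses splits by gain ratio (information gain divided by the split information), not by raw information gain. The minimal Python sketch below computes both quantities for a binary threshold split on a numeric attribute. The function names and the six-row toy data are made up for illustration (this is not the Iris data and not Weka code), and whether this difference alone accounts for the Iris ranking above is an assumption I have not verified.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def binary_split_scores(values, labels, threshold):
    """Return (information gain, gain ratio) for the split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)

    # Class entropy remaining after the split, weighted by branch size.
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder

    # Split information: entropy of the branch sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical toy data: one numeric attribute, three classes.
toy_values = [1, 2, 3, 4, 5, 6]
toy_labels = ["a", "a", "b", "b", "c", "c"]

for t in (2, 3):
    gain, ratio = binary_split_scores(toy_values, toy_labels, t)
    print(f"threshold {t}: gain={gain:.3f}, gain ratio={ratio:.3f}")
```

Because the two scores are normalized differently, they do not have to rank candidate splits in the same order. The ranking filter may also treat numeric attributes differently from the tree's binary threshold splits (for example by discretizing them into more than two intervals), which would be another possible source of disagreement.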
