As far as I understand, J48 uses information gain to decide which attribute to branch on. For some datasets, however, InfoGainAttributeEval ranks an attribute other than the one at the root node as having the highest information gain. Shouldn't the attribute with the highest information gain always be the first split in the tree (the root)?
For example, here is the unpruned J48 tree for the Iris dataset:
J48 unpruned tree
------------------
petalwidth <= 0.6: Iris-setosa (50.0)
petalwidth > 0.6
| petalwidth <= 1.7
| | petallength <= 4.9: Iris-versicolor (48.0/1.0)
| | petallength > 4.9
| | | petalwidth <= 1.5: Iris-virginica (3.0)
| | | petalwidth > 1.5: Iris-versicolor (3.0/1.0)
| petalwidth > 1.7: Iris-virginica (46.0/1.0)
And here is the ranking of information gain:
=== Attribute Selection on all input data ===
Search Method:
Attribute ranking.
Attribute Evaluator (supervised, Class (nominal): 5 class):
Information Gain Ranking Filter
Ranked attributes:
1.418 3 petallength
1.378 4 petalwidth
0.698 1 sepallength
0.376 2 sepalwidth
So the question is: why isn't petallength the root of the tree?
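To make the comparison concrete, here is a minimal sketch in plain Python (not WEKA code) that scores the root split from the tree above, petalwidth <= 0.6, in two ways: by information gain and by the gain ratio that C4.5 is described as using. The class counts assume the standard Iris data with 50 instances per class, and the function names `entropy` and `split_scores` are my own, not part of any library.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def split_scores(parent, branches):
    """Return (information gain, gain ratio) for one candidate split.

    parent   -- class counts before the split
    branches -- one list of class counts per branch of the split
    """
    n = sum(parent)
    # Weighted entropy of the class distribution after the split.
    remainder = sum(sum(b) / n * entropy(b) for b in branches)
    gain = entropy(parent) - remainder
    # Split information: entropy of the branch sizes themselves.
    split_info = entropy([sum(b) for b in branches])
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

# Assumed class counts: [setosa, versicolor, virginica], 50 each.
parent = [50, 50, 50]
# petalwidth <= 0.6 isolates the 50 setosa instances (first leaf of the tree).
branches = [[50, 0, 0], [0, 50, 50]]

gain, ratio = split_scores(parent, branches)
print(f"information gain: {gain:.3f}")  # 0.918
print(f"gain ratio:       {ratio:.3f}")  # 1.000
```

The information gain of this single binary split (0.918) is lower than the 1.378 that the ranking reports for petalwidth, so the two outputs do not seem to be measuring the same quantity. Whether that is due to how the evaluator handles numeric attributes, or to J48 ranking splits by something other than raw information gain, is what I would like to understand.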