Data Classification Based on Decision Trees
This method starts by choosing a subset of the training examples and building a decision tree from it.
If the tree does not give the correct answer for all the remaining objects, a selection of these exceptions is added to the subset and the process is repeated until a tree is found that classifies every training object correctly.
The eventual outcome is a tree in which each leaf carries a class name, and each interior node specifies an attribute to test, with one branch corresponding to each possible value of that attribute.
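As a concrete illustration of that structure, here is a minimal Python sketch of one possible representation (the class and field names are illustrative assumptions, not taken from the text): a leaf stores a class name, and an interior node stores the attribute it tests together with one subtree per attribute value.

class Leaf:
    def __init__(self, class_name):
        self.class_name = class_name    # class assigned at this leaf

class Interior:
    def __init__(self, attribute, branches):
        self.attribute = attribute      # attribute tested at this node
        self.branches = branches        # maps attribute value -> subtree

def classify(node, record):
    # Walk from the root, following the branch for the record's value
    # of each tested attribute, until a leaf is reached.
    while isinstance(node, Interior):
        node = node.branches[record[node.attribute]]
    return node.class_name

# Example: classify(Interior("outlook", {"sunny": Leaf("dont_play"),
#                                        "overcast": Leaf("play")}),
#                   {"outlook": "overcast"}) returns "play".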
Most decision-tree classifiers are built in two phases:
- Tree Building
An initial decision tree is grown in this phase by repeatedly partitioning the training data on a chosen attribute, as outlined in the pseudocode below.
This process ends when all the examples in each partition belong to a single class.
MakeTree( Training Data T ) {
    Partition( T );
}

Partition( Data S ) {
    if (all points in S are in the same class) then return;
    Evaluate splits for each attribute A;
    Use best split found to partition S into S1 and S2;
    Partition( S1 );
    Partition( S2 );
}
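To make the recursion concrete, the following is a minimal runnable Python sketch of the same Partition procedure, under two assumptions not fixed by the pseudocode: records are dictionaries of categorical attribute values plus a "class" key, and candidate binary splits of the form attribute == value are scored with the Gini index (one common criterion among several). Leaves come out as plain class names; interior nodes come out as (attribute, value, left, right) tuples.

from collections import Counter

def gini(records):
    # Gini impurity: 1 minus the sum of squared class frequencies.
    n = len(records)
    counts = Counter(r["class"] for r in records)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(records, attributes):
    # Evaluate every (attribute, value) test; keep the one with the
    # lowest weighted impurity, if it improves on not splitting at all.
    n = len(records)
    best, best_score = None, gini(records)
    for a in attributes:
        for v in set(r[a] for r in records):
            s1 = [r for r in records if r[a] == v]
            s2 = [r for r in records if r[a] != v]
            if not s1 or not s2:
                continue
            score = (len(s1) * gini(s1) + len(s2) * gini(s2)) / n
            if score < best_score:
                best, best_score = (a, v, s1, s2), score
    return best

def partition(records, attributes):
    # Direct analogue of Partition(S): stop when all records share one
    # class (or no split helps), otherwise split and recurse on both halves.
    classes = set(r["class"] for r in records)
    if len(classes) == 1:
        return classes.pop()
    split = best_split(records, attributes)
    if split is None:
        return Counter(r["class"] for r in records).most_common(1)[0][0]
    a, v, s1, s2 = split
    return (a, v, partition(s1, attributes), partition(s2, attributes))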
- Tree Pruning
During tree building, branches are created even for spurious “noisy” records and statistical fluctuations in the training data.
These branches can lead to errors when classifying unseen test data.
Tree pruning aims to remove such branches from the decision tree by selecting the subtree with the lowest estimated error rate.
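One common way to carry this out is reduced-error pruning against a held-out set; the text does not commit to a specific pruning algorithm, so this choice is an assumption. Working bottom-up, a subtree is replaced with a majority-class leaf whenever the leaf misclassifies no more held-out records than the subtree does. The sketch below operates on the tuple trees produced by the partition sketch above.

from collections import Counter

def classify_tree(tree, record):
    # Leaves are plain class names; interior nodes are
    # (attribute, value, left, right) tuples.
    while isinstance(tree, tuple):
        attribute, value, left, right = tree
        tree = left if record[attribute] == value else right
    return tree

def errors(tree, records):
    # Number of held-out records the (sub)tree misclassifies.
    return sum(classify_tree(tree, r) != r["class"] for r in records)

def prune(tree, records):
    # Prune the children first, then replace this subtree with a leaf
    # if the leaf does no worse on the held-out records reaching it.
    if not isinstance(tree, tuple) or not records:
        return tree
    attribute, value, left, right = tree
    r1 = [r for r in records if r[attribute] == value]
    r2 = [r for r in records if r[attribute] != value]
    tree = (attribute, value, prune(left, r1), prune(right, r2))
    leaf = Counter(r["class"] for r in records).most_common(1)[0][0]
    return leaf if errors(leaf, records) <= errors(tree, records) else tree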
“Wisely and slow; they stumble that run fast.”
― William Shakespeare, Romeo and Juliet