Classification Error
Yet another way to measure the degree of impurity is the classification error index. In testing, classification error refers to the misclassification of examinees into the pass and fail categories when a passing score is applied.
Classification errors can occur in both directions.
That is, a truly competent examinee might fail the test, while an incompetent examinee might pass the test.
A primary goal of well-designed exam programs is to minimize classification error.
Classification Error = 1 – max{p_j}
where p_j is the probability of class value j.
For example, given that
Prob( Bus ) = 4 / 10 = 0.4 # 4B / 10 rows
Prob( Car ) = 3 / 10 = 0.3 # 3C / 10 rows
Prob( Train ) = 3 / 10 = 0.3 # 3T / 10 rows
we can now compute the classification error as
Classification error = 1 – max{0.4, 0.3, 0.3} = 1 – 0.4 = 0.60
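The same computation can be sketched in Python. This is only an illustration: the function name classification_error and the labels list are assumptions made for this example, and the ten rows are reconstructed from the counts given above (4 Bus, 3 Car, 3 Train).

```python
from collections import Counter


def classification_error(labels):
    """Return 1 - max_j p_j for a non-empty list of class labels.

    Hypothetical helper for illustration; p_j is estimated as the
    fraction of rows that carry class value j.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    # The most frequent class has the largest probability p_j.
    return 1 - max(counts.values()) / total


# Ten rows matching the example: 4 Bus, 3 Car, 3 Train.
labels = ["Bus"] * 4 + ["Car"] * 3 + ["Train"] * 3
print(round(classification_error(labels), 2))
```

Passing a list that holds a single class, such as five "Bus" rows, returns zero, because the largest probability is then 1.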
As with entropy and the Gini index, the classification error index of a pure table (one that consists of a single class) is zero, because the probability of that class is 1 and 1 – max{1} = 0.
The value of the classification error index is always between 0 and 1.
In fact, the maximum Gini index for a given number of classes always equals the maximum classification error index. For n classes, both maxima occur when every class has the same probability p = 1/n. The maximum Gini index is
1 – n×(1/n)^2 = 1 – 1/n
while the maximum classification error index is
1 – max{1/n} = 1 – 1/n
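A quick numerical check of this equality can be sketched in Python. The function names gini_index and classification_error_from_probs are assumptions for this example. The Gini index is taken as 1 minus the sum of squared class probabilities, which matches the expression above.

```python
def gini_index(probs):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1 - sum(p * p for p in probs)


def classification_error_from_probs(probs):
    """Classification error: 1 minus the largest class probability."""
    return 1 - max(probs)


# For n equally likely classes, both indices should equal 1 - 1/n.
for n in range(2, 6):
    uniform = [1 / n] * n
    print(
        n,
        round(gini_index(uniform), 4),
        round(classification_error_from_probs(uniform), 4),
        round(1 - 1 / n, 4),
    )
```

Each printed row shows n followed by three values that should agree up to rounding.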
Now that we know how to compute the degree of impurity, we are ready to proceed with decision tree algorithms.