Slide 13.10: Classification error

Classification Error

Yet another way to measure the degree of impurity is the classification error index. The term refers to the misclassification of examinees into the pass and fail categories when a passing score is applied. Classification errors can occur in both directions: a truly competent examinee might fail the test, while an incompetent examinee might pass it. A primary goal of a well-designed exam program is to minimize classification error. The index is defined as

    Classification Error = 1 – max{p_j}

where p_j is the probability of class value j and the maximum is taken over all classes. For example, given that

    Prob( Bus )   = 4 / 10 = 0.4        # 4B / 10 rows
    Prob( Car )   = 3 / 10 = 0.3        # 3C / 10 rows
    Prob( Train ) = 3 / 10 = 0.3        # 3T / 10 rows

we can compute the classification error as

     Classification error
   = 1 – Max{0.4, 0.3, 0.3}
   = 1 – 0.4 = 0.60
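
The same calculation can be sketched in Python. This is a minimal illustration, not part of the original tutorial: the function name classification_error and the list of labels are assumptions chosen to match the 4 Bus / 3 Car / 3 Train counts above.

```python
from collections import Counter


def classification_error(labels):
    """Return 1 - max_j p_j, where p_j is the relative frequency of class j.

    The name and signature are illustrative assumptions.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    # The largest class frequency is the probability of the majority class.
    return 1 - max(counts.values()) / total


# Hypothetical table rows matching the counts in the example: 4B, 3C, 3T.
labels = ["Bus"] * 4 + ["Car"] * 3 + ["Train"] * 3
error = classification_error(labels)
print(round(error, 2))
```

Working from raw class labels rather than precomputed probabilities keeps the sketch close to how a table column would be read when growing a tree.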

Like entropy and the Gini index, the classification error index of a pure table (one that consists of a single class) is zero, because that class has probability 1 and 1 – max{1} = 0. The value of the classification error index always lies between 0 and 1. In fact, for a given number of classes n, the maximum Gini index always equals the maximum classification error index. Both maxima occur when every class has the same probability p = 1/n, so the maximum Gini index is

     1 – n×(1/n)² = 1 – 1/n

while the maximum classification error index is likewise

     1 – max{1/n} = 1 – 1/n
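
The equality of the two maxima can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names gini_index, classification_error, and uniform are hypothetical, and both measures take a list of class probabilities that sum to 1.

```python
def gini_index(probs):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1 - sum(p * p for p in probs)


def classification_error(probs):
    """Classification error: 1 minus the largest class probability."""
    return 1 - max(probs)


def uniform(n):
    """Uniform distribution over n classes, each with probability 1/n."""
    return [1 / n] * n


# For each number of classes, print both measures next to 1 - 1/n.
for n in range(2, 7):
    probs = uniform(n)
    print(
        n,
        round(gini_index(probs), 4),
        round(classification_error(probs), 4),
        round(1 - 1 / n, 4),
    )
```

The two measures agree at the maximum but not in general: for the probabilities (0.4, 0.3, 0.3) used earlier, the Gini index is 0.66 while the classification error is 0.60.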

Now that we know how to compute the degree of impurity, we are ready to proceed with decision tree algorithms.