An Example of a Decision Tree Using Sklearn


A decision tree is a flow chart that can help you make decisions based on previous experience. In this example, a person tries to decide whether to go to a comedy show, based on the data saved in data.csv.

Now, based on this data set, Python can create a decision tree that can be used to decide whether any new shows are worth attending:
  1. Read the dataset with pandas.
  2. To build a decision tree, all data has to be numerical, so the non-numerical columns “Nationality” and “Go” must be converted into numerical values. Pandas has a map() method that takes a dictionary describing how to convert the values. For example,
       { 'UK': 0, 'USA': 1, 'N': 2 }
    converts the value UK to 0, USA to 1, and N to 2.
  3. Separate the feature columns from the target column. The feature columns are the columns we predict from, and the target column is the column whose values we try to predict.
  4. Create the decision tree and fit it to our data.
  5. Use the decision tree to predict new values. For example: should I go see a show starring a 40-year-old American comedian with 10 years of experience and a comedy ranking of 6?
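
The conversion in step 2 can be sketched on its own. The rows below are hypothetical and are not the contents of data.csv; the extra value 'FR' is an assumption added only to show what map() does with a value that is missing from the dictionary.

```python
import pandas as pd

# Hypothetical rows for illustration only; these are not taken from data.csv.
sample = pd.DataFrame({
    "Nationality": ["UK", "USA", "N", "FR"],
    "Go": ["NO", "YES", "YES", "NO"],
})

nationality_codes = {"UK": 0, "USA": 1, "N": 2}
go_codes = {"YES": 1, "NO": 0}

# map() looks each value up in the dictionary and returns the mapped value.
sample["Nationality"] = sample["Nationality"].map(nationality_codes)
sample["Go"] = sample["Go"].map(go_codes)

# "FR" has no entry in the dictionary, so map() leaves NaN in that row.
unmapped = sample["Nationality"].isna()
unmapped_rows = sample[unmapped].index.tolist()

print(sample)
print("Rows with an unmapped nationality:", unmapped_rows)
```

Checking for NaN after mapping is a cheap way to catch spellings that the dictionary does not cover before the data reaches the tree.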
Below is the Python source code for the decision tree method. The decision tree can give different results from run to run, even when it is fed the same data. This is because scikit-learn randomly permutes the features at each split, so when several splits are equally good, a different one may be chosen each time; fixing the random_state parameter makes the result repeatable. In any case, the tree does not give a 100% certain answer: its prediction is based on the outcomes seen in the training data. The code does not run inside a web page, since Python is not a client-side language. To see a working example, check W3Schools.
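
If repeatable results are wanted, one option is the random_state parameter of DecisionTreeClassifier. The sketch below is a minimal illustration under stated assumptions: the training rows are hypothetical, already-numeric values (not the real data.csv), and the seed 0 is an arbitrary choice.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, already-numeric training rows (not the real data.csv).
X = pd.DataFrame({
    "Age":         [30, 45, 22, 50, 41, 38, 60, 33],
    "Experience":  [ 8, 15,  2,  5, 20, 12,  3, 11],
    "Rank":        [ 9,  4,  6,  3,  8,  5,  7,  9],
    "Nationality": [ 0,  1,  2,  1,  1,  0,  2,  0],
})
y = [0, 0, 0, 0, 1, 0, 1, 1]

# The show we want a decision for, with the same column names as X.
new_show = pd.DataFrame([[40, 10, 6, 1]], columns=X.columns)

# With the same integer seed, every fit builds the same tree,
# so the five predictions below are identical to each other.
predictions = [
    DecisionTreeClassifier(random_state=0).fit(X, y).predict(new_show)[0]
    for _ in range(5)
]
print(predictions)
```

Leaving random_state unset keeps the default behaviour, where ties between equally good splits may be broken differently on each run.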

[Figures omitted: training data (data.csv), the decision tree, and the decision.]
The Python Source Code for a Decision Tree
# Use a non-interactive backend so the tree can be drawn to a file
# without opening a window:
import matplotlib
matplotlib.use( 'Agg' )

import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

# 1. Read the dataset.
df = pandas.read_csv( "data.csv" )

# 2. Convert the non-numerical columns into numerical values.
d = { 'UK': 0, 'USA': 1, 'N': 2 }
df['Nationality'] = df['Nationality'].map( d )
d = { 'YES': 1, 'NO': 0 }
df['Go'] = df['Go'].map( d )

# 3. Separate the feature columns from the target column.
features = [ 'Age', 'Experience', 'Rank', 'Nationality' ]

X = df[ features ]
y = df[ 'Go' ]

# 4. Create the decision tree and fit it to the data.
dtree = DecisionTreeClassifier( )
dtree = dtree.fit( X, y )

# 5. Predict new values.
# Should I go see a show
#  starring a 40-year-old American comedian,
#  with 10 years of experience, and
#  a comedy ranking of 6?
# The new row carries the same column names as the training data,
# which avoids scikit-learn's feature-name warning.
new_show = pandas.DataFrame( [[40, 10, 6, 1]], columns=features )
print( dtree.predict( new_show ) )

print( "[1] means 'GO'" )
print( "[0] means 'NO'" )

tree.plot_tree( dtree, feature_names=features )

# Save the drawing to an image file (the file name is an arbitrary choice):
plt.savefig( "decision_tree.png" )
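
When no image viewer is at hand, the fitted tree can also be printed as text. The sketch below uses scikit-learn's export_text function as one possible alternative to plot_tree; the training rows are hypothetical stand-ins for data.csv, and the seed 0 is an arbitrary choice made only to keep the output repeatable.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["Age", "Experience", "Rank", "Nationality"]

# Hypothetical, already-numeric rows standing in for data.csv.
X = pd.DataFrame(
    [
        [30,  8, 9, 0],
        [45, 15, 4, 1],
        [22,  2, 6, 2],
        [41, 20, 8, 1],
        [60,  3, 7, 2],
        [33, 11, 9, 0],
    ],
    columns=features,
)
y = [0, 0, 0, 1, 1, 1]

dtree = DecisionTreeClassifier(random_state=0).fit(X, y)

# export_text returns the tree's split rules as an indented string,
# one line per test or leaf.
rules = export_text(dtree, feature_names=features)
print(rules)
```

Each indented line is a test on one feature, and each leaf line names the predicted class (1 for 'GO', 0 for 'NO' in the encoding used above).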