Slide 13.21: An example of decision tree using Sklearn


An Example of Decision Tree Using Sklearn

A decision tree is a flowchart-like model that helps you make decisions based on previous experience. In this example, a person decides whether or not to go to a comedy show, based on the data saved in data.csv.

Based on this data set, Python can create a decision tree that can be used to decide whether a new show is worth attending. The steps are:

Read the data set with pandas.
To build a decision tree, all data has to be numerical, so the non-numerical columns “Nationality” and “Go” must be converted into numerical values. Pandas has a map() method that takes a dictionary describing how to convert the values. For example,
```
   { 'UK': 0, 'USA': 1, 'N': 2 }
```
which converts the values UK to 0, USA to 1, and N to 2.
Separate the feature columns from the target column. The feature columns are the columns we predict from, and the target column is the column with the values we try to predict.
Create the decision tree and fit it to our data.
Use the decision tree to predict new values. For example: Should I go see a show starring a 40-year-old American comedian with 10 years of experience and a comedy ranking of 6?
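
The conversion and feature/target steps above can be sketched on a small in-memory table. The rows below are made up for illustration and only mimic the column layout of data.csv; they are not its actual contents.

```python
import pandas as pd

# Hypothetical rows that only mimic the column layout of data.csv.
df = pd.DataFrame({
    "Age": [31, 47, 26],
    "Experience": [7, 20, 3],
    "Rank": [8, 5, 6],
    "Nationality": ["UK", "USA", "N"],
    "Go": ["NO", "NO", "YES"],
})

# Series.map looks each value up in the dictionary.
# A value that is missing from the dictionary would become NaN.
df["Nationality"] = df["Nationality"].map({"UK": 0, "USA": 1, "N": 2})
df["Go"] = df["Go"].map({"YES": 1, "NO": 0})

# Feature columns (what we predict from) and target column (what we predict).
features = ["Age", "Experience", "Rank", "Nationality"]
X = df[features]
y = df["Go"]

print(X)
print(y.tolist())
```

After the conversion every column is numeric, which is what the classifier needs in the fitting step.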

Below is the Python source code for the decision tree method. The decision tree can give different results from run to run, even when it is fed the same data. This happens because scikit-learn considers the features in a random order at each split, so when several candidate splits are equally good, a different one may be chosen on each run. Passing a fixed random_state to DecisionTreeClassifier makes the result repeatable. The code does not run in the browser, since Python is not a client-side language. To see a working example, check W3Schools.
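
The run-to-run variation can be removed by fixing the estimator's random_state parameter, which seeds the random choices scikit-learn makes while building the tree. The following is a minimal sketch under stated assumptions: the training rows are hypothetical (not the contents of data.csv), and the helper name fitted_splits is an illustration, not part of scikit-learn.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, already-numeric training rows (not the contents of data.csv).
X = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45, 50, 55, 60],
    "Experience": [2, 5, 8, 10, 15, 20, 25, 30],
    "Rank": [3, 8, 5, 7, 9, 4, 6, 8],
    "Nationality": [0, 1, 2, 1, 0, 1, 2, 0],
})
y = [0, 1, 0, 1, 1, 0, 0, 1]

def fitted_splits(seed):
    """Fit a tree with the given seed and return its split features and thresholds."""
    model = DecisionTreeClassifier(random_state=seed).fit(X, y)
    return model.tree_.feature.tolist(), model.tree_.threshold.tolist()

first = fitted_splits(0)
second = fitted_splits(0)

# Same data and same seed give the same tree.
print(first == second)  # True
```

Any integer works as the seed; what matters is that it stays the same between runs.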

Figures: Training data (data.csv); The decision tree; The decision

The Python Source Code for a Decision Tree

```
# Select a non-interactive backend so matplotlib can draw without a display.
import sys
import matplotlib
matplotlib.use( 'Agg' )

import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

# Read the training data.
df = pandas.read_csv( "data.csv" )

# Convert the non-numerical columns to numbers.
d = { 'UK': 0, 'USA': 1, 'N': 2 }
df['Nationality'] = df['Nationality'].map( d )
d = { 'YES': 1, 'NO': 0 }
df['Go'] = df['Go'].map( d )

# Separate the feature columns from the target column.
features = [ 'Age', 'Experience', 'Rank', 'Nationality' ]

X = df[ features ]
y = df[ 'Go' ]

# Create the decision tree and fit it to the data.
dtree = DecisionTreeClassifier( )
dtree = dtree.fit( X, y )

# Predict new values.
# Should I go see a show
#  starring a 40-year-old American comedian,
#  with 10 years of experience, and
#  a comedy ranking of 6?
# A DataFrame with the same column names as X avoids a feature-name warning.
new_show = pandas.DataFrame( [[40, 10, 6, 1]], columns=features )
print( dtree.predict( new_show ) )

print( "[1] means 'GO'" )
print( "[0] means 'NO'" )

# Draw the fitted tree.
tree.plot_tree( dtree, feature_names=features )

# Write the plot to standard output as image data.
# Pass a file name to plt.savefig instead to save it to a file.
plt.savefig( sys.stdout.buffer )
sys.stdout.flush( )
```
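
When no image viewer is at hand, the fitted tree can also be inspected as text. The sketch below uses export_text from sklearn.tree on hypothetical, already-numeric rows (not the contents of data.csv); the variable names are illustrative only.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["Age", "Experience", "Rank", "Nationality"]

# Hypothetical, already-numeric rows standing in for data.csv.
X = pd.DataFrame(
    [
        [25, 2, 3, 0],
        [30, 5, 8, 1],
        [35, 8, 5, 2],
        [40, 10, 7, 1],
        [45, 15, 9, 0],
        [50, 20, 4, 1],
    ],
    columns=features,
)
y = [0, 1, 0, 1, 1, 0]

# A fixed random_state keeps the printed tree the same between runs.
dtree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Render the tree as indented rules, one line per test or leaf.
rules = export_text(dtree, feature_names=features)
print(rules)
```

Each indented line is one test on a feature, and each leaf line names the predicted class, so the printout reads like the flow chart in textual form.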
