Slide 17.5: Sklearn (cont.)

Use Sklearn in Python
Working with this library generally starts with splitting the dataset into training and test sets. Here is how you can split your data:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split( x, y, random_state=0 )
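
To make the split concrete, here is a minimal runnable sketch. The toy arrays `x` and `y` are hypothetical stand-ins for a real dataset, and the `test_size` and `stratify` arguments are optional choices for this illustration, not something the slide prescribes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 4 features, balanced binary labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
y = np.repeat([0, 1], 50)

# Hold out 20% of the samples for testing; stratify=y keeps the class
# proportions the same in both subsets, and random_state makes the
# shuffle reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)

print(x_train.shape, x_test.shape)
```

If `test_size` is left out, as in the snippet above, scikit-learn holds out 25% of the samples by default.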

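Next, we preprocess the data so that it suits a machine learning model. This usually means scaling the features, which can be done by standardization or normalization. StandardScaler rescales each feature (column) to zero mean and unit variance, whereas Normalizer rescales each sample (row) to unit norm. Below is scikit-learn's way of doing both: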

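from sklearn.preprocessing import StandardScaler
scaler = StandardScaler( ).fit( x_train )  # learn mean and std from the training set only
x_train_std = scaler.transform( x_train )  # transform returns a new array, so keep the result
x_test_std = scaler.transform( x_test )    # reuse the training statistics on the test set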

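from sklearn.preprocessing import Normalizer
normalizer = Normalizer( ).fit( x_train )  # Normalizer is stateless; fit only validates the input
x_train_norm = normalizer.transform( x_train )  # each row rescaled to unit L2 norm
x_test_norm = normalizer.transform( x_test )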
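
The difference between the two scalers is easy to check numerically. The sketch below uses hypothetical toy data with two features on very different scales; the variable names are assumptions made for this illustration.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Hypothetical toy data: two features on very different scales.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=[10.0, 500.0], scale=[2.0, 100.0], size=(80, 2))
x_test = rng.normal(loc=[10.0, 500.0], scale=[2.0, 100.0], size=(20, 2))

# Standardization: statistics come from the training set only and are
# then reused on the test set.
scaler = StandardScaler().fit(x_train)
x_train_std = scaler.transform(x_train)
x_test_std = scaler.transform(x_test)

# Normalization: each sample (row) is rescaled to unit L2 norm.
x_train_unit = Normalizer().fit_transform(x_train)

col_means = x_train_std.mean(axis=0)               # close to 0 per feature
col_stds = x_train_std.std(axis=0)                 # close to 1 per feature
row_norms = np.linalg.norm(x_train_unit, axis=1)   # close to 1 per sample

print(col_means, col_stds, row_norms[:3])
```

The test set is transformed with the training statistics, so its column means and standard deviations will generally be near, but not exactly, 0 and 1.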

As the next step, we fit the model to the data. Below is how to train some of the most common machine learning algorithms:

from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn import neighbors
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

lr = LinearRegression( )  # the normalize argument was removed in scikit-learn 1.2; scale the features beforehand if needed
lr.fit( x_train, y_train )

knn = neighbors.KNeighborsClassifier( n_neighbors=5 )
knn.fit( x_train, y_train )

svc = SVC( kernel='linear' )
svc.fit( x_train, y_train )

k_means = KMeans( n_clusters=3, random_state=0 )
k_means.fit( x_train )

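pca = PCA( n_components=0.95 )  # keep enough components to explain 95% of the variance
x_train_pca = pca.fit_transform( x_train )  # keep the reduced data for later use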
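
The steps above (split, scale, reduce, fit) can be chained together. The following is a minimal sketch, not a recommended recipe: it assumes the bundled iris dataset as example data and uses `make_pipeline` and `score`, which the slide does not mention, to tie the pieces together.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: the iris dataset that ships with scikit-learn.
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# The pipeline fits the scaler and PCA on the training data only, then
# applies the same transformations before the classifier sees any data.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(x_train, y_train)

accuracy = model.score(x_test, y_test)            # mean accuracy on held-out data
n_kept = model.named_steps["pca"].n_components_   # components PCA actually kept

print(f"components kept: {n_kept}, test accuracy: {accuracy:.2f}")
```

Wrapping the steps in a pipeline means the test set is never used when estimating the scaling or PCA parameters, which is the same discipline the separate `fit` and `transform` calls above follow by hand.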