Sklearn (Cont.)
Use Sklearn in Python (Cont.)
The next step is to make predictions on the test set:
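# Supervised estimator: predicted targets for the test set
y_pred = lr.predict(x_test)
# Clustering: index of the nearest cluster for each test sample
y_pred = k_means.predict(x_test)
# Classification: class-membership probabilities rather than hard labels
y_pred = knn.predict_proba(x_test)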
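The difference between `predict` and `predict_proba` can be sketched as follows. This is a minimal illustration, not the workflow of the original slides: the Iris dataset, the split settings, and the names `y_proba` and `labels_from_proba` are assumptions introduced only for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed toy dataset; the original text does not name one.
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a k-nearest-neighbours classifier on the training split.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)

# Hard class labels, one per test sample: shape (n_samples,)
y_pred = knn.predict(x_test)

# Class-membership probabilities: shape (n_samples, n_classes),
# with each row summing to 1.
y_proba = knn.predict_proba(x_test)

# Probabilities can be turned into labels by picking the most likely class.
labels_from_proba = knn.classes_[y_proba.argmax(axis=1)]
```

Metrics such as `accuracy_score` expect hard labels, so the output of `predict` (or labels derived from the probabilities) is what would normally be passed to them.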
The last step is to determine how well the machine learning model performed on the test set. Below are some of the methods provided by the Scikit-learn library for evaluating the performance of machine learning models on classification, regression, and clustering tasks:
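# Classification: fraction of test labels predicted correctly
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
# Regression: average absolute difference between true and predicted values
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, y_pred)
# Clustering: chance-adjusted agreement between true labels and cluster assignments
from sklearn.metrics import adjusted_rand_score
adjusted_rand_score(y_test, y_pred)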
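To make the three metrics concrete, here is a small worked example on hand-made values. The input lists and the variable names `acc`, `mae`, and `ari` are made up for illustration and do not come from the original material.

```python
from sklearn.metrics import (
    accuracy_score,
    adjusted_rand_score,
    mean_absolute_error,
)

# Classification: 3 of the 4 predicted labels match the true labels.
acc = accuracy_score([0, 1, 1, 0], [0, 1, 0, 0])
print(acc)  # 0.75

# Regression: absolute errors are 0.5, 0.0 and 1.0, so the mean is 0.5.
mae = mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])
print(mae)  # 0.5

# Clustering: the grouping is identical and only the cluster ids are
# swapped, so the adjusted Rand index is 1.0.
ari = adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
print(ari)  # 1.0
```

The clustering case shows why the adjusted Rand index suits clustering: it compares groupings rather than label values, so arbitrary cluster numbering does not affect the score.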
Rather than focusing on loading, manipulating, and summarising data, the Scikit-learn library focuses on modeling data. Some of the most popular groups of models and features provided by Sklearn are as follows:
- Supervised learning algorithms:
Almost all the popular supervised learning algorithms, such as linear regression, support vector machines (SVM), and decision trees, are part of Scikit-learn.
- Unsupervised learning algorithms:
It also provides the popular unsupervised learning algorithms, ranging from clustering, factor analysis, and PCA (principal component analysis) to unsupervised neural networks.
- Clustering,
which is used for grouping unlabeled data
- Cross validation,
which is used to estimate the performance of supervised models on unseen data
- Dimensionality reduction,
which is used for reducing the number of attributes in data for subsequent summarisation, visualisation, and feature selection
- Ensemble methods:
As the name suggests, these are used for combining the predictions of multiple supervised models.
- Feature extraction,
which is used for defining attributes in image and text data
- Feature selection,
which is used to identify useful attributes to create supervised models
- Open source,
which means the library is free to use, including commercially, under the BSD license
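Several of the groups listed above can be combined in a single workflow. The sketch below chains feature selection, dimensionality reduction, and an ensemble model, and scores the result with cross validation. The dataset (Iris), the specific estimators, and every parameter value are assumptions chosen for illustration, not recommendations from the original text.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Assumed toy dataset with 4 numeric attributes and 3 classes.
x, y = load_iris(return_X_y=True)

pipe = Pipeline([
    # Feature selection: keep the 3 attributes most related to the target.
    ("select", SelectKBest(score_func=f_classif, k=3)),
    # Dimensionality reduction: project the selected attributes onto 2 components.
    ("reduce", PCA(n_components=2)),
    # Ensemble method: combine the predictions of many decision trees.
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Cross validation: fit and score the whole pipeline on 5 different
# train/validation splits to estimate accuracy on unseen data.
scores = cross_val_score(pipe, x, y, cv=5)
print(scores.mean())
```

Wrapping the steps in a `Pipeline` means the selection and reduction steps are refitted on each training fold, so no information from the validation fold leaks into them.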