Top 10 Algorithms in Data Mining (Cont.)
- PageRank
-
PageRank is a search ranking algorithm that scores Web pages by the hyperlinks pointing to them.
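As a rough illustration of ranking by hyperlinks, the sketch below runs a simple power iteration over a tiny made-up link graph. The function name `pagerank`, the damping factor 0.85, the fixed iteration count, the handling of pages without outgoing links, and the `toy_web` data are all assumptions chosen for the example, not details given in this slide.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to (hypothetical input format)."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Made-up four-page web: C is linked from A, B, and D; nothing links to D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with the most incoming links (C) ends up ranked highest, and the page nobody links to (D) ends up lowest.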
- AdaBoost (Adaptive Boosting)
-
AdaBoost can be used in conjunction with many other types of learning algorithms to improve performance.
The output of the other learning algorithms (“weak learners”) is combined into a weighted sum that represents the final output of the boosted classifier.
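The weighted sum of weak learners can be sketched as follows. This is a minimal illustration, assuming one-dimensional threshold rules ("decision stumps") as the weak learners and labels in {+1, -1}; the helper names `train_stump`, `adaboost`, and `predict_boosted`, and the toy data, are hypothetical.

```python
import math


def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best threshold rule."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)  # weight of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, decrease the rest.
        preds = [polarity if x >= threshold else -polarity for x in xs]
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict_boosted(ensemble, x):
    # Final output: sign of the weighted sum of the weak learners' votes.
    score = sum(alpha * (polarity if x >= threshold else -polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


# Made-up data that no single threshold rule can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict_boosted(ensemble, x) for x in xs])
```

No single stump fits this data, but the weighted combination of three stumps reproduces every training label.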
- kNN (k-Nearest Neighbor Classification)
-
One of the simplest classifiers memorizes the entire training data and performs classification only if the attributes of the test object match one of the training examples exactly.
An obvious drawback of this approach is that many test records will not be classified because they do not exactly match any of the training records.
A more sophisticated approach, kNN classification, finds a group of k objects in the training set that are closest to the test object, and bases the assignment of a label on the predominance of a particular class in this neighborhood.
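The contrast between exact matching and kNN can be sketched as below. This is an illustrative sketch only: it assumes numeric attributes, Euclidean distance, and a simple majority vote, and the names `rote_classify`, `knn_classify`, and the `training` data are made up for the example.

```python
import math
from collections import Counter


def rote_classify(train, query):
    """Exact-match classifier: returns None when no training record matches."""
    for attributes, label in train:
        if attributes == query:
            return label
    return None


def knn_classify(train, query, k=3):
    """Label the query by majority vote among its k closest training objects."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up training set of (attributes, class label) pairs.
training = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((4.0, 4.2), "b"), ((4.1, 3.9), "b"), ((3.8, 4.0), "b"),
]

print(rote_classify(training, (1.1, 1.0)))  # no exact match, so unclassified
print(knn_classify(training, (1.1, 1.0), k=3))
```

The exact-match classifier leaves the query unclassified, while kNN assigns it the class that dominates its three nearest neighbors.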
- Naive Bayes
-
Given a set of objects, each of which belongs to a known class and has a known vector of variables, the aim is to construct a rule that assigns future objects to a class, given only the vectors of variables describing them. Naive Bayes builds this rule from Bayes' theorem under the assumption that the variables are independent given the class.
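One way such a rule can be built is sketched below, assuming categorical variables, the usual naive Bayes assumption that variables are independent given the class, and add-one (Laplace) smoothing. The function names and the toy weather-style data are hypothetical and serve only to illustrate the idea.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class value frequencies for each variable."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, variable index) -> value counts
    variable_values = defaultdict(set)   # variable index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            variable_values[i].add(value)
    return class_counts, value_counts, variable_values


def predict_naive_bayes(model, row):
    """Assign the class with the highest log prior plus summed log likelihoods."""
    class_counts, value_counts, variable_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product.
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(variable_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up objects described by (outlook, temperature) with a known class.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "hot")))
print(predict_naive_bayes(model, ("rainy", "cool")))
```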
- CART (Classification and Regression Trees)
-
The CART monograph is important for the comprehensiveness of its study of decision trees, the technical innovations it introduces, its sophisticated discussion of tree-structured data analysis, and its authoritative treatment of large-sample theory for trees.
The CART decision tree is a binary recursive partitioning procedure capable of processing continuous and nominal attributes as both targets and predictors.
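Binary recursive partitioning can be sketched as follows. This is a simplified illustration, not the full CART procedure: it assumes continuous predictors and a class-label target only, uses Gini impurity to pick splits, and omits pruning, nominal attributes, and regression targets. The names `gini`, `best_split`, `build_tree`, `predict_tree`, and the data are made up for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, attribute index, threshold) of the best binary split."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send records to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, labels, max_depth=3):
    """Recursively split until a node is pure, unsplittable, or max_depth is reached."""
    split = best_split(rows, labels) if max_depth > 0 and gini(labels) > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    _, f, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    return (f, threshold,
            build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1))


def predict_tree(tree, row):
    """Follow binary splits from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


# Made-up records with two continuous attributes and three classes.
rows = [(2.0, 1.0), (3.0, 1.5), (1.0, 4.0), (2.5, 5.0), (6.0, 1.0), (7.0, 4.5)]
labels = ["a", "a", "b", "b", "c", "c"]

tree = build_tree(rows, labels)
print(tree)
print(predict_tree(tree, (6.5, 3.0)))
```

Each internal node is a tuple of (attribute index, threshold, left subtree, right subtree), so every split divides the records into exactly two groups, which are then partitioned again in the same way.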