Sequential Pattern Mining: Algorithm


Sequential pattern mining evolved from association rule mining. The major difference between them is that association rules ignore the order of items, whereas sequential patterns preserve it. For example, from the items {1} and {2}, sequential pattern mining can generate two candidate sequences, {1} {2} and {2} {1}, but association rule mining generates only the single itemset {1, 2}. Sequential pattern mining forms its patterns in the following five phases:
  1. Sort phase: The transaction database is sorted by user ID and then by transaction time. This phase turns the transaction database into a sequence database in which each user's transactions are ordered by time.

  2. L-itemsets (large itemsets) phase: A large itemset is an itemset that occurs frequently, and minimum support is the criterion for selecting large itemsets. The support of an itemset is the number (or fraction) of users whose sequence contains that itemset within a single transaction, so minimum support is the threshold for picking out large itemsets. For example, minimum support can be set to 20 or 25 percent of all users (sequences), and the itemsets whose support is at least this value are chosen.

  3. Transformation phase: The sequence database produced by the sort phase is transformed into a reduced sequence database. Each transaction is replaced by the set of large itemsets it contains, and transactions or sequences that contain no large itemset are dropped. This transformation lets the later phases focus only on the significant (frequent) parts of each sequence.

  4. Sequence phase: Using the transformed sequence database from the third phase, this phase generates the large sequential patterns, that is, the sequences whose support is at least the minimum support.

  5. Maximal phase: This phase selects the maximal sequential patterns among the large sequential patterns found in the sequence phase. A pattern is maximal if it is not contained in any other large pattern. The maximal sequential patterns are the final output of sequential pattern mining.
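The first two phases can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the transaction log is hypothetical, the function names `sort_phase` and `litemset_phase` are made up for this example, and the 40 percent minimum support (2 of 5 customers) is an assumed threshold. Support is counted per customer, as described in phase 2.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical transaction log: (customer_id, transaction_time, items).
transactions = [
    (2, 15, {30}), (1, 25, {30}), (2, 10, {10, 20}), (1, 30, {90}),
    (3, 10, {30, 50, 70}), (2, 20, {40, 60, 70}), (4, 25, {30}),
    (4, 30, {40, 70}), (4, 40, {90}), (5, 12, {90}),
]
MIN_SUPPORT = 0.4  # hypothetical threshold: fraction of customers


def sort_phase(transactions):
    """Phase 1: sort by customer ID, then time; build one sequence per customer."""
    sequences = defaultdict(list)
    for cid, _time, items in sorted(transactions, key=lambda t: (t[0], t[1])):
        sequences[cid].append(frozenset(items))
    return dict(sequences)


def litemset_phase(sequences, min_support):
    """Phase 2: itemsets found inside one transaction of enough customers.

    Support counts customers, not transactions: a customer who buys the same
    itemset several times is counted once. Enumerating every subset of every
    transaction is a brute-force simplification suited only to tiny data.
    """
    counts = defaultdict(int)
    for seq in sequences.values():
        seen = set()  # itemsets this customer contains, counted once
        for transaction in seq:
            items = sorted(transaction)
            for k in range(1, len(items) + 1):
                seen.update(frozenset(c) for c in combinations(items, k))
        for itemset in seen:
            counts[itemset] += 1
    return {s for s, c in counts.items() if c / len(sequences) >= min_support}


sequences = sort_phase(transactions)
litemsets = litemset_phase(sequences, MIN_SUPPORT)
print([sorted(t) for t in sequences[2]])       # [[10, 20], [30], [40, 60, 70]]
print(sorted(sorted(s) for s in litemsets))    # [[30], [40], [40, 70], [70], [90]]
```

Customer 2's transactions come out in time order, and the large itemsets are {30}, {40}, {70}, {40, 70} and {90}. The brute-force subset enumeration stands in for the level-wise (Apriori-style) search that more scalable implementations typically use.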
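The transformation phase can be sketched as below, assuming the sort and large-itemset phases have already produced the customer sequences and large itemsets hard-coded in the example (hypothetical data; `transform_phase` is an illustrative name). Each large itemset is also mapped to a small integer ID so that later phases compare integers instead of sets; this ID mapping is an implementation choice, not something the phase description requires.

```python
# Hypothetical output of the sort phase: one time-ordered sequence per customer.
sequences = {
    1: [frozenset({30}), frozenset({90})],
    2: [frozenset({10, 20}), frozenset({30}), frozenset({40, 60, 70})],
    3: [frozenset({30, 50, 70})],
    4: [frozenset({30}), frozenset({40, 70}), frozenset({90})],
    5: [frozenset({90})],
}
# Hypothetical output of the large-itemset phase (2 of 5 customers minimum).
litemsets = [frozenset({30}), frozenset({40}), frozenset({70}),
             frozenset({40, 70}), frozenset({90})]


def transform_phase(sequences, litemsets):
    """Phase 3: replace each transaction by the IDs of the large itemsets it contains.

    Transactions containing no large itemset are dropped, and customers whose
    sequence becomes empty are dropped too.
    """
    ids = {lit: i for i, lit in enumerate(litemsets, start=1)}
    transformed = {}
    for cid, seq in sequences.items():
        new_seq = []
        for transaction in seq:
            # A large itemset is kept if it is a subset of this transaction.
            contained = {ids[lit] for lit in litemsets if lit <= transaction}
            if contained:
                new_seq.append(contained)
        if new_seq:
            transformed[cid] = new_seq
    return transformed, ids


transformed, ids = transform_phase(sequences, litemsets)
for cid, seq in transformed.items():
    print(cid, [sorted(t) for t in seq])
# 1 [[1], [5]]
# 2 [[1], [2, 3, 4]]
# 3 [[1, 3]]
# 4 [[1], [2, 3, 4], [5]]
# 5 [[5]]
```

Customer 2's first transaction, {10, 20}, contains no large itemset and disappears, while {40, 60, 70} becomes {2, 3, 4} because it contains {40}, {70} and {40, 70}.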
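The sequence and maximal phases can be sketched as a level-wise (Apriori-style) search followed by a containment filter. This is a minimal sketch under stated assumptions: the transformed database, the ID-to-itemset mapping and the 40 percent threshold are hypothetical, the function names are illustrative, and the join-and-prune candidate generation shown is one common way to implement the sequence phase, not the only one. The join step produces both (1, 2) and (2, 1) from (1,) and (2,), which is the ordered-candidate behaviour that separates sequential patterns from association rules.

```python
# Hypothetical transformed database: each transaction is the set of IDs of
# the large itemsets it contains.
transformed = {
    1: [{1}, {5}],
    2: [{1}, {2, 3, 4}],
    3: [{1, 3}],
    4: [{1}, {2, 3, 4}, {5}],
    5: [{5}],
}
# Hypothetical mapping from ID back to the original large itemset.
litemset_of = {1: frozenset({30}), 2: frozenset({40}), 3: frozenset({70}),
               4: frozenset({40, 70}), 5: frozenset({90})}
N_CUSTOMERS = 5    # support is a fraction of all customers
MIN_SUPPORT = 0.4  # hypothetical threshold: 2 of 5 customers


def contains(customer_seq, pattern, match):
    """True if the pattern's elements match transactions in order (gaps allowed).

    Greedy leftmost matching suffices: taking the earliest fitting transaction
    never rules out a match for the remaining elements.
    """
    i = 0
    for transaction in customer_seq:
        if i < len(pattern) and match(pattern[i], transaction):
            i += 1
    return i == len(pattern)


def support(pattern):
    hits = sum(contains(seq, pattern, lambda lid, t: lid in t)
               for seq in transformed.values())
    return hits / N_CUSTOMERS


def sequence_phase():
    """Phase 4: level-wise search for large sequences of litemset IDs."""
    level = [(lid,) for lid in sorted(litemset_of) if support((lid,)) >= MIN_SUPPORT]
    large = list(level)
    while level:
        prev = set(level)
        # Join step: p and q share all but their last element (p == q allowed),
        # so both (1, 2) and (2, 1) are generated from (1,) and (2,).
        candidates = [p + (q[-1],) for p in level for q in level if p[:-1] == q[:-1]]
        # Prune step: dropping any one element must leave a large sequence.
        candidates = [c for c in candidates
                      if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))]
        level = [c for c in candidates if support(c) >= MIN_SUPPORT]
        large.extend(level)
    return large


def maximal_phase(large):
    """Phase 5: keep large sequences not contained in another large sequence.

    Containment is tested on the original itemsets, so <(30) (40)> counts as
    contained in <(30) (40 70)> even though their ID sequences differ.
    """
    as_itemsets = [[litemset_of[lid] for lid in s] for s in large]
    maximal = []
    for i, seq in enumerate(as_itemsets):
        if not any(j != i and contains(other, seq, lambda e, t: e <= t)
                   for j, other in enumerate(as_itemsets)):
            maximal.append(large[i])
    return maximal


large = sequence_phase()
maximal = maximal_phase(large)
for s in maximal:
    print([sorted(litemset_of[lid]) for lid in s])
# [[30], [40, 70]]
# [[30], [90]]
```

Of the nine large sequences found, only <(30) (40 70)> and <(30) (90)> survive the maximal phase; every other one, such as <(30) (40)>, is contained in one of these two.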