An Example of Sequential Pattern Mining
This second example covers the last two steps of a sequential pattern mining algorithm, which the previous example left out.
The table below gives a database of customer sequences.
The original database is not shown in this example.
The customer sequences are in transformed form: each transaction has been replaced by the set of large itemsets it contains, and the large itemsets have been replaced by integers.
[Table: customer sequences in transformed form, by customer ID]
For example, the customer sequence of ID 2 includes four transactions:
〈{1} {3} {4} {3, 5}〉
It might be interpreted as
〈{milk} {bread} {butter} {bread, cheese}〉
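The transformed form described above can be sketched in a few lines of Python. This is only an illustration, not the original implementation: the mapping `LARGE_ITEMSETS` is hypothetical and is inferred from the interpretation given in the text (1 = milk, 3 = bread, 4 = butter, 5 = cheese), and the function name `transform` is likewise an assumption.

```python
# Hypothetical mapping from large itemsets to integer ids, inferred from the
# interpretation in the text (1 = milk, 3 = bread, 4 = butter, 5 = cheese).
LARGE_ITEMSETS = {
    frozenset({"milk"}): 1,
    frozenset({"bread"}): 3,
    frozenset({"butter"}): 4,
    frozenset({"cheese"}): 5,
}


def transform(customer_sequence):
    """Replace each transaction by the ids of the large itemsets it contains."""
    transformed = []
    for transaction in customer_sequence:
        # A large itemset is "contained" when it is a subset of the transaction.
        ids = {i for itemset, i in LARGE_ITEMSETS.items() if itemset <= transaction}
        if ids:  # assumption: a transaction with no large itemset is dropped
            transformed.append(ids)
    return transformed


# Customer 2 in item form, as interpreted in the text.
raw = [{"milk"}, {"bread"}, {"butter"}, {"bread", "cheese"}]
result = transform(raw)
print(result)
```

Under these assumptions, `result` equals `[{1}, {3}, {4}, {3, 5}]`, which is the transformed sequence of customer 2.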
Two more terms are required for the following discussions:
- The length of a sequence is the number of itemsets in the sequence.
- A sequence of length k is called a k-sequence.
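As a small illustration of these two terms, the sketch below represents a sequence as a Python list of sets (a representation assumed here, not prescribed by the text) and computes its length. The helper name `sequence_length` is hypothetical.

```python
def sequence_length(sequence):
    """Length of a sequence = number of itemsets it contains."""
    return len(sequence)


# Customer 2's sequence from the example: four itemsets.
seq = [{1}, {3}, {4}, {3, 5}]
k = sequence_length(seq)
print(f"{k}-sequence")
```

Note that the length counts itemsets, not items: the sequence above contains five item occurrences but is a 4-sequence.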
Modified Apriori algorithms are used in Step 4, the Sequence Phase, to find all frequent sequences.
Assume the minimum support has been specified to be 40% = 2/5 (i.e., 2 customer sequences).
The first pass over the database yields the large 1-sequences shown below.
For example, the support of the sequence 〈1〉 is 4 because itemset 1 appears in the customer sequences of IDs 1 to 4.
[Table: large 1-sequences]
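The first pass can be sketched as a simple support count. The database below is illustrative only: the sequence of customer 2 is taken from the text, while the other four rows are assumed values chosen so that 〈1〉 has support 4, as stated above. The names `DATABASE`, `MIN_SUPPORT`, and `large_1_sequences` are hypothetical.

```python
from collections import Counter

# Illustrative database: only customer 2's sequence comes from the text;
# the other rows are assumed values for this sketch.
DATABASE = {
    1: [{1, 5}, {2}, {3}, {4}],
    2: [{1}, {3}, {4}, {3, 5}],
    3: [{1}, {2}, {3}, {4}],
    4: [{1}, {3}, {5}],
    5: [{4}, {5}],
}
MIN_SUPPORT = 2  # 40% of 5 customer sequences


def large_1_sequences(database, min_support):
    """Count, per itemset id, how many customer sequences contain it."""
    support = Counter()
    for sequence in database.values():
        # Each customer contributes at most once per itemset id,
        # no matter how many transactions contain it.
        support.update(set().union(*sequence))
    return {i: n for i, n in sorted(support.items()) if n >= min_support}


large = large_1_sequences(DATABASE, MIN_SUPPORT)
print(large)
```

Support is counted per customer rather than per transaction, which is why itemset 3 in customer 2's sequence counts once even though it occurs in two transactions.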