Sequential Pattern Mining: Example I (Cont.)
3. Transformation Phase
The customer sequences are replaced by those large itemsets they contain.
All the large itemsets are mapped into a series of integers to make the mining more efficient.
| Large Itemsets | Mapped to |
|---|---|
| (30) | 1 |
| (40) | 2 |
| (70) | 3 |
| (40, 70) | 4 |
| (90) | 5 |
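The mapping can be sketched in a few lines of Python. This is only an illustration: the names `large_itemsets` and `itemset_to_id` are hypothetical, and the ids are assumed to be assigned in the order the itemsets are listed in the table.

```python
# Large itemsets from the table, in the order they are listed.
large_itemsets = [(30,), (40,), (70,), (40, 70), (90,)]

# Assign consecutive integer ids starting at 1.
# frozenset keys make the lookup independent of item order.
itemset_to_id = {
    frozenset(itemset): i for i, itemset in enumerate(large_itemsets, start=1)
}

print(itemset_to_id[frozenset({40, 70})])  # 4
```

With integer ids, checking whether a transaction contains a large itemset later reduces to an integer membership test instead of a set comparison.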
For example, the customer sequence of Customer 2, ⟨ (10, 20) (30) (40, 60, 70) ⟩, is transformed as follows:
- The transaction (10, 20) is dropped because it does not contain any large itemset.
- The transaction (30) is itself a large itemset, so it is replaced by { (30) }.
- The transaction (40, 60, 70) is replaced by the set of itemsets { (40), (70), (40, 70) } because each of these itemsets is large (frequent).
- After mapping, the transformed sequence becomes ⟨ {1} {2, 3, 4} ⟩.
Note: parentheses ‘(’ and ‘)’ are used for itemsets instead of ‘{’ and ‘}’, so that braces can be reserved for sets of itemsets without ambiguity.
| Customer-id | Customer Sequence | Transformed DB | After Mapping |
|---|---|---|---|
| 1 | ⟨ (30) (90) ⟩ | ⟨ { (30) } { (90) } ⟩ | ⟨ {1} {5} ⟩ |
| 2 | ⟨ (10, 20) (30) (40, 60, 70) ⟩ | ⟨ { (30) } { (40) (70) (40, 70) } ⟩ | ⟨ {1} {2, 3, 4} ⟩ |
| 3 | ⟨ (30, 50, 70) ⟩ | ⟨ { (30) (70) } ⟩ | ⟨ {1, 3} ⟩ |
| 4 | ⟨ (30) (40, 70) (90) ⟩ | ⟨ { (30) } { (40) (70) (40, 70) } { (90) } ⟩ | ⟨ {1} {2, 3, 4} {5} ⟩ |
| 5 | ⟨ (90) ⟩ | ⟨ { (90) } ⟩ | ⟨ {5} ⟩ |
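The transformation of the whole database can be reproduced with a short Python sketch. It is a minimal illustration, not the slides' implementation: `LARGE`, `transform`, and `db` are hypothetical names, and the id assignment is taken from the mapping table.

```python
# Large itemsets and their integer ids, as in the mapping table.
LARGE = {
    frozenset({30}): 1,
    frozenset({40}): 2,
    frozenset({70}): 3,
    frozenset({40, 70}): 4,
    frozenset({90}): 5,
}

def transform(customer_sequence):
    """Replace each transaction by the ids of the large itemsets it contains.

    Transactions that contain no large itemset are dropped.
    """
    result = []
    for transaction in customer_sequence:
        items = set(transaction)
        ids = sorted(i for itemset, i in LARGE.items() if itemset <= items)
        if ids:
            result.append(ids)
    return result

# Customer sequences from the table.
db = {
    1: [(30,), (90,)],
    2: [(10, 20), (30,), (40, 60, 70)],
    3: [(30, 50, 70)],
    4: [(30,), (40, 70), (90,)],
    5: [(90,)],
}

transformed = {cid: transform(seq) for cid, seq in db.items()}
print(transformed[2])  # [[1], [2, 3, 4]]
```

For Customer 2, the transaction (10, 20) disappears and (40, 60, 70) expands to the three ids 2, 3, and 4, matching the "After Mapping" column.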
The last two steps are detailed in Example II; the next slide explains how the sequential patterns are found.
4. Sequence Phase
Use the set of large itemsets to find the desired sequences.
All frequent sequential patterns are generated from the transformed sequential database.
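The core operation of this phase is counting how many customers support a candidate sequence of mapped ids. The sketch below shows only that counting step on the transformed database from the example; candidate generation is omitted, and the names `transformed_db`, `contains`, and `support` are hypothetical.

```python
# Transformed database after mapping, one entry per customer.
transformed_db = [
    [{1}, {5}],
    [{1}, {2, 3, 4}],
    [{1, 3}],
    [{1}, {2, 3, 4}, {5}],
    [{5}],
]

def contains(customer_seq, candidate):
    """True if the ids of `candidate` occur in order in `customer_seq`,
    each one in a strictly later transaction than the previous one."""
    pos = 0
    for item_id in candidate:
        # Advance to the next transaction that holds this id.
        while pos < len(customer_seq) and item_id not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next id must come from a later transaction
    return True

def support(candidate):
    """Number of customers whose transformed sequence contains `candidate`."""
    return sum(contains(seq, candidate) for seq in transformed_db)

print(support([1]))     # 4
print(support([1, 5]))  # 2
print(support([1, 4]))  # 2
```

A candidate is kept as a large sequence when its support reaches the chosen minimum support, which is a parameter not restated on this slide.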
5. Maximal Phase
Find the maximal sequences among the set of large sequences.
Sequential patterns that are contained in some other sequential pattern are pruned, since we are only interested in maximal sequential patterns.
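The pruning can be sketched as follows. This is an illustrative sketch only: `is_contained` and `maximal` are hypothetical helpers, and the input list `large_sequences` assumes a minimum support of 2 customers, a threshold that is not stated on this slide.

```python
def is_contained(a, b):
    """True if sequence `a` is contained in sequence `b`: the itemsets of `a`
    are subsets of distinct itemsets of `b`, in the same order."""
    pos = 0
    for itemset in a:
        while pos < len(b) and not itemset <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True

def maximal(sequences):
    """Keep only sequences that are not contained in any other sequence."""
    return [
        s for s in sequences
        if not any(s != t and is_contained(s, t) for t in sequences)
    ]

fs = frozenset
# Assumed input: large sequences for a minimum support of 2 customers.
large_sequences = [
    (fs({30}),), (fs({40}),), (fs({70}),), (fs({40, 70}),), (fs({90}),),
    (fs({30}), fs({40})), (fs({30}), fs({70})), (fs({30}), fs({90})),
    (fs({30}), fs({40, 70})),
]

for seq in maximal(large_sequences):
    print([sorted(itemset) for itemset in seq])
# [[30], [90]]
# [[30], [40, 70]]
```

Under that assumption, only ⟨ (30) (90) ⟩ and ⟨ (30) (40, 70) ⟩ survive, because every other large sequence is contained in one of the two.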