Sequential Pattern Mining: Example I (Cont.)
Customer-id | Customer Sequence | Transformed DB | After Mapping |
1 | ⟨ (30) (90) ⟩ | ⟨ { (30) } { (90) } ⟩ | ⟨ {1} {5} ⟩ |
2 | ⟨ (10, 20) (30) (40, 60, 70) ⟩ | ⟨ { (30) } { (40) (70) (40, 70) } ⟩ | ⟨ {1} {2, 3, 4} ⟩ |
3 | ⟨ (30, 50, 70) ⟩ | ⟨ { (30) (70) } ⟩ | ⟨ {1, 3} ⟩ |
4 | ⟨ (30) (40, 70) (90) ⟩ | ⟨ { (30) } { (40) (70) (40, 70) } { (90) } ⟩ | ⟨ {1} {2, 3, 4} {5} ⟩ |
5 | ⟨ (90) ⟩ | ⟨ { (90) } ⟩ | ⟨ {5} ⟩ |
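The last two columns of the table can be reproduced with a short sketch. It assumes the large itemsets are (30), (40), (70), (40, 70), and (90), numbered 1 to 5 as the table suggests; `LITEMSET_IDS` and `transform` are hypothetical names used only for illustration.

```python
# Assumed mapping of large itemsets to integers, read off the table above.
LITEMSET_IDS = {
    frozenset({30}): 1,
    frozenset({40}): 2,
    frozenset({70}): 3,
    frozenset({40, 70}): 4,
    frozenset({90}): 5,
}

def transform(sequence):
    """Replace each transaction by the ids of the large itemsets it contains.

    Transactions that contain no large itemset are dropped.
    """
    result = []
    for transaction in sequence:
        items = set(transaction)
        ids = {i for litemset, i in LITEMSET_IDS.items() if litemset <= items}
        if ids:
            result.append(ids)
    return result

customer_sequences = {
    1: [(30,), (90,)],
    2: [(10, 20), (30,), (40, 60, 70)],
    3: [(30, 50, 70)],
    4: [(30,), (40, 70), (90,)],
    5: [(90,)],
}

mapped = {cid: transform(seq) for cid, seq in customer_sequences.items()}
for cid, seq in mapped.items():
    print(cid, seq)
```

For Customer 2, the transaction (10, 20) contains no large itemset and is dropped, which is why the mapped sequence has only two elements.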
Given a database D of customer transactions, the problem of mining sequential patterns is to find the maximal sequences among all sequences that have at least a user-specified minimum support.
Each such maximal sequence represents a sequential pattern.
The table below lists the sequential patterns found in the original customer sequences above, not in the transformed DB.
Sequential Patterns with Support > 25% |
⟨ (30) (90) ⟩ |
⟨ (30) (40, 70) ⟩ |
With the minimum support set to 25% (that is, at least 2 of the 5 customers, since a single customer is only 1/5 = 20% < 25%), two sequences, ⟨{30} {90}⟩ and ⟨{30} {40, 70}⟩, are maximal among those satisfying the support constraint and are the desired sequential patterns.
- The sequential pattern ⟨{30} {90}⟩ is supported by Customers 1 and 4. Customer 4 buys items {40, 70} in between items 30 and 90, but still supports the pattern ⟨{30} {90}⟩, since we are looking for patterns that are not necessarily contiguous.
- The sequential pattern ⟨{30} {40, 70}⟩ is supported by Customers 2 and 4. Customer 2 buys 60 along with 40 and 70, but still supports this pattern, since {40, 70} is a subset of {40, 60, 70}.
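The two support claims above can be checked with a minimal sketch. It assumes the usual containment rule: a customer supports a pattern if each itemset of the pattern is a subset of a distinct transaction, in order, with gaps allowed. The names `contains`, `supporters`, and `DB` are hypothetical.

```python
def contains(customer_sequence, pattern):
    """True if every itemset of `pattern` is a subset of a distinct
    transaction of `customer_sequence`, in the same order (gaps allowed)."""
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest transaction that covers this itemset.
        while pos < len(customer_sequence) and not set(itemset) <= set(customer_sequence[pos]):
            pos += 1
        if pos == len(customer_sequence):
            return False
        pos += 1  # the next itemset must match a strictly later transaction
    return True

def supporters(db, pattern):
    """Ids of the customers whose sequence contains `pattern`."""
    return [cid for cid, seq in db.items() if contains(seq, pattern)]

DB = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

print(supporters(DB, [{30}, {90}]))      # customers supporting <{30} {90}>
print(supporters(DB, [{30}, {40, 70}]))  # customers supporting <{30} {40, 70}>
```

Dividing the number of supporters by the number of customers (5 here) gives the support as a fraction, which can then be compared against the 25% threshold.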
An example of a sequence that does not have minimum support is ⟨{10, 20} {30}⟩, which is supported only by Customer 2.
The sequences ⟨{30}⟩, ⟨{40}⟩, ⟨{70}⟩, ⟨{90}⟩, ⟨{30} {40}⟩, ⟨{30} {70}⟩, and ⟨{40, 70}⟩, though having minimum support, are not in the answer because they are not maximal: each is contained in ⟨{30} {90}⟩ or ⟨{30} {40, 70}⟩.
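The maximality filter can be sketched as follows, assuming the frequent sequences are exactly the ones named in the text and using the same ordered-subset containment test as for support. `is_contained` and `frequent` are hypothetical names.

```python
def is_contained(pattern, other):
    """True if each itemset of `pattern` is a subset of a distinct
    itemset of `other`, in the same order (gaps allowed)."""
    pos = 0
    for itemset in pattern:
        while pos < len(other) and not itemset <= other[pos]:
            pos += 1
        if pos == len(other):
            return False
        pos += 1
    return True

# Sequences with minimum support, as listed in the text.
# [{30}, {70}] means 30 followed later by 70, not 30 and 70 bought together.
frequent = [
    [{30}], [{40}], [{70}], [{90}],
    [{30}, {40}], [{30}, {70}], [{40, 70}],
    [{30}, {90}], [{30}, {40, 70}],
]

# A sequence is maximal if no other frequent sequence contains it.
maximal = [
    p for p in frequent
    if not any(p != q and is_contained(p, q) for q in frequent)
]
print(maximal)
```

Only ⟨{30} {90}⟩ and ⟨{30} {40, 70}⟩ survive the filter, since every other listed sequence is contained in one of them.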