Mining Top-k Quantile-based Cohesive Sequential Patterns
Len Feremans, Boris Cule, Bart Goethals
University of Antwerp, Belgium

Problem Setting

In a single event sequence $S$, a sequence of events (or items), find sequential patterns $X_s$:

• Where cohesive occurrences of $X_s$ have items near each other
• Where the percentage (or quantile) of cohesive occurrences versus all occurrences is high

The quantile-based cohesion of $X_s$ is

$C_{quan}(X_s) = \frac{|\{t \mid t \in \mathit{cover}(X_s) \wedge W_t(X_s) \le \alpha \cdot |X_s|\}|}{|\mathit{cover}(X_s)|}$

where

$\mathit{cover}(X_s) = \{t \mid (i, t) \in S \wedge i \in X_s\}$

$W_t(X_s) = \min\{|S[t_a, t_b]| \mid t_a \le t \le t_b \wedge X_s \prec S[t_a, t_b]\}$

$W_t(X_s) = \infty$ if $\nexists\, S[t_a, t_b]: t_a \le t \le t_b \wedge X_s \prec S[t_a, t_b]$

Problem:
• Given a maximal pattern length $1 < |X_s| \le \mathit{maxsize}$, a minimal item support $\forall i \in X_s: \mathit{support}(i) \ge \theta$, and a cohesion threshold $\alpha$
• Find the exact set of top-$k$ sequential patterns ranked according to $C_{quan}(X_s)$

Difficulties:
• Given $|\Omega|$ items, there are $|\Omega|^{\mathit{maxsize}}$ candidates
• $C_{quan}(X_s)$ is not anti-monotonic

Example: [figure]

Algorithm

We generate candidate sequential patterns using constrained prefix-projected pattern growth.

Upper bound

Key contribution: an upper bound $C_{maxquan}(X_s, Y)$ on the quantile-based cohesion of any super-sequence that can be generated in the current branch of the search tree. If $C_{maxquan}(X_s, Y)$ is smaller than the current minimal value of $C_{quan}$ in the heap, no pattern in the branch can enter the top-$k$ and we can prune the branch.

Experiments

New interesting sequential patterns, which existing methods struggle to find, come out on top:

Top-20 patterns Journal Of Machine Learning Abstracts

Sequential patterns with longer length are discovered in the top-250:

#49 support, vector, machin, svm
#54 reproduce, hilbert, space
#58 chain, monte, carlo
#87 reproduc, kernel, hilbert, space
#132 nearest, neighbor, rule
#139 independ, compon, analysi
#143 induct, logic, program
#144 blind, sourc, separ
#159 leave, cross, valid
#183 markov, chain, mont, carlo
…
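To make the ranking measure concrete, here is a minimal brute-force sketch of quantile-based cohesion on a toy sequence. It assumes one item per timestamp (the list index serves as the timestamp). The function names, the `limit` parameter, and the toy data are invented for this illustration and are not the authors' implementation. Since an occurrence only counts as cohesive when its minimal window is at most alpha times the pattern length, the window search is capped at that length.

```python
from math import inf


def is_subsequence(pattern, window):
    """True if the items of pattern occur in window in order (gaps allowed)."""
    it = iter(window)
    return all(item in it for item in pattern)


def minimal_window(seq, pattern, t, limit):
    """Length of the shortest window seq[a..b] with a <= t <= b that contains
    pattern as a subsequence, searching only windows of length <= limit.
    Returns inf if no such window exists within the limit."""
    best = inf
    for a in range(max(0, t - limit + 1), t + 1):
        for b in range(t, min(len(seq), a + limit)):
            if is_subsequence(pattern, seq[a:b + 1]):
                best = min(best, b - a + 1)
                break  # a longer window from the same start cannot be shorter
    return best


def quantile_cohesion(seq, pattern, alpha):
    """Fraction of occurrences of the pattern's items that lie in a window of
    length <= alpha * len(pattern) containing the whole pattern."""
    limit = int(alpha * len(pattern))
    items = set(pattern)
    cover = [t for t, item in enumerate(seq) if item in items]
    if not cover:
        return 0.0
    cohesive = sum(
        1 for t in cover if minimal_window(seq, pattern, t, limit) <= limit
    )
    return cohesive / len(cover)


# Toy data (assumption): "a" and "b" occur six times in total.
seq = ["a", "b", "c", "a", "c", "c", "b", "a", "b"]
print(quantile_cohesion(seq, ("a", "b"), alpha=1))  # 4 of 6 occurrences cohesive
print(quantile_cohesion(seq, ("a", "b"), alpha=2))  # all 6 occurrences cohesive
```

With alpha = 1 only the adjacent pairs at positions 0-1 and 7-8 count, so the isolated "a" at position 3 and "b" at position 6 lower the score. With alpha = 2 windows of length 4 are allowed and every occurrence becomes cohesive.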

Main Contributions

• New interestingness measure for ranking sequential patterns
• Efficient algorithm for discovering the exact top-k set of sequential patterns ranked by the quantile-based cohesion $C_{quan}(X_s)$, based on constrained prefix-projected pattern growth and a new upper bound for pruning
• Compared to state-of-the-art methods, we discover new interesting patterns that can be less frequent and longer, and for which we know that a high percentage of occurrences is cohesive
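The interplay between a top-k heap and an upper bound for pruning can be sketched as follows. This is a generic illustration, not the authors' algorithm: the class `TopK` and its methods are hypothetical names, and the upper bound is simply passed in as a number, since the poster does not spell out how the bound is computed. A branch is pruned only when the heap already holds k patterns and the bound for the branch is below the lowest score in the heap.

```python
import heapq


class TopK:
    """Keeps the k highest-scoring (score, pattern) pairs in a min-heap."""

    def __init__(self, k):
        self.k = k
        self.heap = []

    def min_score(self):
        # Lowest score currently in the top-k; -inf while the heap is not full,
        # so nothing is pruned before k patterns have been collected.
        if len(self.heap) < self.k:
            return float("-inf")
        return self.heap[0][0]

    def offer(self, score, pattern):
        # Insert while there is room; afterwards replace the current minimum
        # only if the new score is strictly better.
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (score, pattern))
        elif score > self.heap[0][0]:
            heapq.heapreplace(self.heap, (score, pattern))

    def can_prune(self, upper_bound):
        # If no super-sequence in the branch can beat the current minimum,
        # the branch cannot change the top-k.
        return upper_bound < self.min_score()


# Toy scores (assumption), not results from the poster.
topk = TopK(2)
topk.offer(0.5, ("a", "b"))
topk.offer(0.9, ("c", "d"))
topk.offer(0.7, ("e", "f"))      # evicts the pattern with score 0.5
print(sorted(topk.heap, reverse=True))
print(topk.can_prune(0.6))       # bound below current minimum 0.7
print(topk.can_prune(0.8))       # branch could still improve the top-k
```

Using a min-heap keeps the weakest of the current top-k at the root, so both the pruning test and the replacement step only need to look at one element.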

Fragment of the novel Moby Dick with 4 sequential patterns highlighted.

2018 SIAM International Conference on Data Mining
