Mining Top-k Quantile-based Cohesive Sequential Patterns
Len Feremans, Boris Cule, Bart Goethals
University of Antwerp, Belgium
Problem Setting
In a single event sequence $S$ (a sequence of events, or items), find sequential patterns $X_s$:
• where cohesive occurrences of $X_s$ have their items near each other;
• where the percentage (or quantile) of cohesive occurrences among all occurrences is high.
$$C_{quan}(X_s) = \frac{\left|\{\, t \mid t \in cover(X_s) \wedge W_t(X_s) \le \alpha \cdot |X_s| \,\}\right|}{|cover(X_s)|}$$
where
$$cover(X_s) = \{\, t \mid (i, t) \in S \wedge i \in X_s \,\}$$
$$W_t(X_s) = \begin{cases} \min \{\, |S[t_a, t_b]| \;:\; t_a \le t \le t_b \wedge X_s \prec S[t_a, t_b] \,\} & \\ \infty & \text{if } \nexists\, S[t_a, t_b] : t_a \le t \le t_b \wedge X_s \prec S[t_a, t_b] \end{cases}$$
Problem:
• Given a maximal pattern length, $1 < |X_s| \le \mathit{max\_len}$, a minimal support threshold $\theta$ with $\forall i \in X_s : support(i) \ge \theta$, and a cohesion threshold $\alpha$
➢ Find the exact set of top-$k$ sequential patterns ranked according to $C_{quan}(X_s)$
Difficulties:
• Given $|\Omega|$ items, there are $|\Omega|^{\mathit{max\_len}}$ candidates
• $C_{quan}(X_s)$ is not anti-monotonic
Example: the Moby Dick fragment whose caption appears below.
Algorithm
We generate candidate sequential patterns using constrained prefix-projected pattern growth.
Upper bound
Key contribution: an upper bound $C_{maxquan}(X_s, Y)$ on the quantile-based cohesion of any super-sequence that can be generated in the current branch of the search tree. If $C_{maxquan}(X_s, Y)$ is smaller than the current minimal value of $C_{quan}$ in the heap, no pattern in that branch can enter the top-$k$, so we can prune the branch.
Experiments
New interesting sequential patterns that existing methods struggle to find come out on top:
Top-20 patterns Journal Of Machine Learning Abstracts
Sequential patterns with longer length are discovered in the top-250:
#49   support, vector, machin, svm
#54   reproduce, hilbert, space
#58   chain, monte, carlo
#87   reproduc, kernel, hilbert, space
#132  nearest, neighbor, rule
#139  independ, compon, analysi
#143  induct, logic, program
#144  blind, sourc, separ
#159  leave, cross, valid
#183  markov, chain, mont, carlo
…
Main Contributions
• New interestingness measure for ranking sequential patterns
• Efficient algorithm for discovering the exact top-k set of sequential patterns ranked using $C_{quan}(X_s)$, based on constrained prefix-projected pattern growth and a new upper bound for pruning
• Compared to state-of-the-art methods, we discover new interesting patterns that can be less frequent and longer, and for which we know that a high percentage of occurrences is cohesive
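The combination of a top-k heap with an upper bound for pruning can be illustrated on a toy problem. The sketch below is not the poster's algorithm: the score (the average of integer values) and the bound (the largest value in the pattern or among its candidate extensions) are stand-ins chosen only because the average, like the cohesion measure, is not anti-monotonic. The function name `top_k_with_pruning` and all parameters are hypothetical.

```python
import heapq
from fractions import Fraction


def top_k_with_pruning(values, k, max_len):
    """Depth-first pattern growth over index-increasing tuples of `values`.

    Keeps the k best patterns of length 2..max_len in a min-heap and skips a
    branch when its upper bound is below the weakest score in the heap.
    Toy score and bound (assumptions): average value and maximum value.
    """
    heap = []    # min-heap of (score, pattern); heap[0] is the weakest entry
    pruned = 0   # number of branches cut by the bound

    def score(pattern):
        # Exact arithmetic avoids rounding issues when comparing with the bound.
        return Fraction(sum(values[i] for i in pattern), len(pattern))

    def upper_bound(pattern, candidates):
        # No super-sequence can average higher than the largest value involved.
        return max(values[i] for i in list(pattern) + candidates)

    def grow(pattern, candidates):
        nonlocal pruned
        if len(pattern) >= 2:
            entry = (score(pattern), pattern)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)
        if len(pattern) == max_len or not candidates:
            return
        # Prune: nothing below this node can beat the current k-th best score.
        if len(heap) == k and upper_bound(pattern, candidates) < heap[0][0]:
            pruned += 1
            return
        for pos, i in enumerate(candidates):
            grow(pattern + (i,), candidates[pos + 1:])

    grow((), list(range(len(values))))
    return sorted(heap, reverse=True), pruned


best, pruned = top_k_with_pruning([9, 8, 5, 3, 2, 1], k=3, max_len=3)
for s, p in best:
    print(p, float(s))
print("branches pruned:", pruned)
```

Because the bound is checked with a strict inequality, a pruned branch can only contain patterns that score below the weakest heap entry, so the result matches an exhaustive search on this toy input.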
Fragment of the novel Moby Dick with 4 sequential patterns highlighted.
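To make the idea of cohesive occurrences concrete, here is a minimal brute-force sketch of a quantile-based cohesion score. It assumes one item per timestamp and reads the measure as the fraction of occurrences of the pattern's items that lie inside some window of length at most alpha times the pattern length containing the whole pattern in order. The names `is_subsequence`, `bounded_min_window` and `c_quan` are hypothetical, `alpha` plays the role of the cohesion threshold, and the input sequence is made up rather than taken from the Moby Dick fragment.

```python
from math import inf


def is_subsequence(pattern, window):
    """True if the items of `pattern` occur in `window` in order (gaps allowed)."""
    it = iter(window)
    return all(item in it for item in pattern)


def bounded_min_window(seq, pattern, t, limit):
    """Length of the shortest window that covers position t and contains
    `pattern` as a subsequence, or inf if no window of length <= limit does."""
    for length in range(len(pattern), limit + 1):
        first = max(0, t - length + 1)
        last = min(t, len(seq) - length)
        for start in range(first, last + 1):
            if is_subsequence(pattern, seq[start:start + length]):
                return length
    return inf


def c_quan(seq, pattern, alpha):
    """Fraction of occurrences of the pattern's items that are cohesive."""
    items = set(pattern)
    cover = [t for t, item in enumerate(seq) if item in items]
    if not cover:
        return 0.0
    limit = int(alpha * len(pattern))  # largest window length still counted as cohesive
    cohesive = sum(
        1 for t in cover if bounded_min_window(seq, pattern, t, limit) <= limit
    )
    return cohesive / len(cover)


seq = list("abcaccbab")                   # toy sequence, one item per timestamp
print(c_quan(seq, ("a", "b"), alpha=1))   # 4 of the 6 occurrences are cohesive
print(c_quan(seq, ("a", "b"), alpha=2))   # all 6 occurrences are cohesive
```

The window search is capped at the cohesion limit because the score only needs to know whether the minimal window fits within it; an efficient implementation would avoid this brute-force scan.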
2018 SIAM International Conference on Data Mining