This post presents the key papers about episode mining. If you are not familiar with what is episode mining, it is a data mining task, which aims at finding patterns in a sequence of events or symbols . A short introduction to episode mining is posted on my blog. Finding patterns in a sequence can be useful to discover regularities that may provide insights about the data, and even support decision making.
Below, I list the most important episode mining papers as a table with my comments on each paper to allow to get a quick overview of the field. That list is based on my analysis of the papers and is thus subjective and I may have missed some papers that other researchers would have deemed important.
|Author and date||Paper title||Algorithm(s)||Key idea|
|Mannila et al. (1995)||Discovering frequent episodes in sequences||WINEPI,|
|– This paper proposed the problem of episode mining. It is thus a key paper in that field.|
– A first algorithm called WINEPI finds episodes
by performing a breadth-first search and using a sliding window. It counts the support (occurrence frequency) of an episode as the number of windows where the episode appears. However, WINEPIhas the problem that an occurrence may be counted more than once.
– To address this issue, a second algorithm called MINEPI find frequent episodes by only considering the minimal occurrences of each episode.
– This paper also presents a basic algorithm to generate episode rules by combining pairs of episodes. This is done as post-processing after applying WINEPI or MINEPI.
|Huang et a. (2008)||Efficient mining of frequent episodes from complex sequences||EMMA,|
|– It is observed that window-based algorithms for episode mining such as MINEPI and WINEPI sometimes produce some strange results.|
– To address this issue, a novel measure is proposed to count the support (number of occurrences) of an episode called the head support or head frequency.
– Two algorithms are designed to find frequent episodes using the head frequency: EMMA (a depth-first search algorithm) and MINEPI+ (a modified version of MINEPI)
|Fournier-Viger et al. (2019)||TKE: Mining Top-K Frequent Episodes||TKE||– This paper makes the observation that it is difficult for users to set the minsup parameter for frequent episode mining. As a result users often have to spend considerable time fine tuning the parameters and may find too many or too few episodes.|
– As a solution this paper redefine the task of episode mining as top-k frequent episode mining, where the user can directly choose the number of patterns to be discovered (which is more intuitive).
– The TKE algorithm is defined to efficiently find the top-k most frequent episodes.
|Ao et al. (2018)||Online frequent episode mining||– This paper extended the concept of episode mining|
|Zhou et al. (2010)||Mining closed episodes from event sequences efficiently||Clo_episode||– This paper designed an algorithm to find closed episodes. The goal is to reduce the number of episodes presented by only showing some episodes that are said to be closed. This gives sometimes a small set of episodes that summarizes the frequent episodes.|
– An algorithm is presented to find closed episodes called Clo_episode. This algorithm adopts a breadth-first search and use the concept of minimal occurrences.
– A limitation: this paper cannot handle the case of multiple events happening at the same time (parallel episodes). while most algorithms can handle this case.
|Tati and Cule (2011)||Mining closed episodes with simultaneous events||– This paper presents algorithms to find closed episodes where events may be simultaneous.|
– This fixes the main limitation of the paper of Zhou et al. (2010).
|Laxman et al.(2007)||Discovering frequent episodes and learning Hidden Markov Models: A formal connection||– Laxman proposes to only count the non overlapping occurrences of episodes.|
– This counting method is called the non overlapping support or non overlapping frequency, and was then used in many other papers after.
|Laxman et al. (2007)||A Fast Algorithm For Finding Frequent|
Episodes In Event Streams
|Algorithm 1 and Algorithm 2||– This paper introduces an algorithms to find frequent episodes in a potentially infinite sequence of events (a data stream)|
|Oualid et al. (2021)||Mining Episode Rules from Event Sequences Under Non-overlapping Frequency||NONEPI||– This paper presents an algorithm named NONEPI for episode rule mining using the concept of non overlapping frequency.|
– The goal is to find rules that are easier to interpret as occurrences must be non overlapping.
|Fournier-Viger et al (2021)||Mining Partially-Ordered Episode Rules in an Event Sequence||POERM||– This paper makes the observation that traditional episode rules have a very strict ordering between events.|
– This paper defines are more general type of episode rules called partially-ordered episode rules. These rules loosen the ordering constraint between events. As a result, a partially-ordered episode rule can summarize multiple traditional episode rules.
– The POERM algoriths is defined to find partially-ordered episode rules. It finds rules directly without first having to apply a frequent episode mining algorithm.
– Another version of POERM called POERM_H was proposed in a subsequent paper “Mining Partially-Ordered Episode Rules with the Head Support“, where occurrences are counted using the head support of EMMA.
|Fahed et al. (2018)||DEER: Distant and Essential Episode Rules for early prediction||DEER||– This paper presented an algorithm named DEER to find episore rules that can predict distant events rather than events that appear closely after other events. These rules are called essential rules.|
– The algorithm is based on the concept of minimal occurrences.
– Limitation: the algorithm does not handle the case of simultaneous events.
|Wu et al. (2013)||Mining high utility episodes in complex|
|US-SPAN||– A problem with traditional episode mining algorithm is that all event types are considered as equally important.|
– To address this issue, the concept of utility is added to episode mining to define a new problem of high utility episode mining.
– In that problem each event occurrence can have a quantity as well as a weight. This allows for example to model the purchases of customers with quantities and unit prices to find episodes that yield the most money (the highest utility).
– An efficient algorithm called US-SPAN is proposed for this problem, which is based on the concept of minimal occurrences.
|Fournier-Viger et al. (2019)||HUE-Span: Fast High Utility Episode Mining||HUE-SPAN||– This paper makes the important observation that the previous algorithm for high utility episode mining US-SPAN can underestimate the utilitiy of episodes by not taking into account all timestamps of minimal occurrences for utility calculations. As a result, some high utility episodes can be missed.|
– To address this issue, the definition of utility is modified.
– Moreover, a new and more efficient algorithm named HUE-SPAN is proposed for high utility episode mining.
– This algorithm is based on the concept of minimal occurrences.
|Ao et al. (2018)||Large-scale Frequent Episode Mining from Complex|
Event Sequences with Hierarchies
|LA-FEMH||– A big data algorithm for episode mining called LA-FEMH is proposed using the Spark architecture.|
– The algorithm can find closed and maximal episodes and also consider that events are organized as a taxonomy.
– Limitation: This algorithm does not handle the case of simultaneous events. In other words, the algorithm can only find serial episodes.
|Fournier-Viger et al. (2022)||MaxFEM: Mining Maximal Frequent Episodes in Complex Event Sequences||MaxFEM, AFEM||– An algorithm called MaxFEM to find maximal episodes in a complex event sequence (the general case).|
– A version called AFEM to find all frequent episodes.
– Those extends the EMMA algorithm with new optimizations and use the head support definition.
There are very few software programs and source code available online for episode mining. The most complete software, which offers a dozen episode mining algorithms and is open-source is the SPMF data mining software (which I am the founder). It provides implementations of many algorithms such as MINEPI, EMMA, TKE， US-SPAN, POERM, and HUE-SPAN.
Besides episode mining, SPMF also offers algorithms for many other data mining tasks such as sequence prediction, high utility itemset mining, sequential pattern mining, periodic pattern mining, sequential rule mining and subgraph mining.
In this blog post, I have given a list of key papers about episode mining. Of course, making such list is subjective. But I believe that this list can be useful for those who wants to learn quickly about episode mining by having a quick summary.
If you have enjoyed this blog post, you may also check other content of this blog. There are many posts and resources related to pattern mining.