Discovering the Top-K Stable Periodic Patterns in a Sequence of Events

In this blog post, I will give a brief introduction to the TSPIN paper about how to find stable periodic patterns in a sequence of events. This algorithm was presented in this research paper:

Fournier-Viger, P., Wang Y., Yang, P., Lin, J. C.-W., Yun, U. (2021). TSPIN: Mining Top-k Stable Periodic Patterns. Applied Intelligence. [source code & data]

What is a periodic pattern? Periodic patterns are sets of values or symbols that appear repeatedly in a sequence of events at more or less regular intervals. For example, in a customer transaction database, a periodic pattern may indicate that some customers buy milk every week or bread every two days. Discovering periodic patterns can thus help to understand customer behavior, optimize inventory management, forecast sales, and so on. For example, here is an illustration of a periodic pattern {a,c} that appears every two hours in a sequence of events, where a is for apple and c is for cake:

The idea of finding periodic pattern is interesting as it can provide insights about the data. However, a major problem is that traditional algorithms for periodic pattern mining have two important limitations:

First, they use a very strict definition of periodicity that requires a pattern to always appear again within a fixed time limit. As a result, some meaningful patterns may be discarded if they are slightly irregular. For example, if a user wants to find periodic patterns that appear weekly, a traditional algorithm will discard a pattern if it does not appear for a week even though it may appear weekly for the rest of the year.

Second, several algorithms use a minimum support threshold to filter out infrequent patterns. This parameter is useful but generally users don’t know how to set it to find patterns and thus adjust it by trial and error to find a suitable value.

To overcome these limitations, the TSPIN (Top-k Stable Periodic pattern mINer) algorithm was designed. It combines two key ideas:

1) Stability: A measure of stability is used in TSPIN to evaluate how consistent a pattern is in terms of its periodicity. A stable pattern has relatively small variations in its occurrence intervals, while an unstable pattern has large variations. TSPIN finds stable periodic patterns but it is designed to be less strict than traditional algorithms so as to also find patterns that are slightly irregular.

2) Top-k: A way of ranking patterns based on their frequency and selecting only the k most frequent ones. This avoids the need to specify a minimum support threshold.

The TSPIN algorithm is an efficient algorithm that integrates several optimizations. Those are explained in the TSPIN research paper. Also that paper describes several experiments on synthetic and real datasets. The results show that TSPIN has good performance and can reveal interesting patterns in real-life data.

There are several potential applications of TSPIN. For example, example, applying TSPIN on a dataset of web clickstream data may reveal that some users visit certain websites periodically with high stability, such as news portals or social media platforms. This information could then be used for personalized recommendation or advertising.

To try TSPIN, the code and datasets are available in the SPMF pattern mining library, which is a popular Java software for pattern mining, which can also be used from other programming languages. SPMF also offers implementations of a dozen other algorithms for discovering periodic patterns and hundreds of algorithms to find other types of patterns such as sequential rules, sequential patterns, frequent itemsets and high utility itemsets.

Conclusion

In conclusion, TSPIN is an innovative algorithm for mining top-k stable periodic patterns in discrete sequences. It addresses some limitations of traditional algorithms by using a more flexible definition of periodicity and avoiding parameter tuning. It can discover meaningful patterns that reflect regular behaviors or habits in various domains.

Hope this blog post has been interesting. If you like this topic, you may also check my list of key papers on periodic pattern mining on this blog.

==
Philippe Fournier-Viger is a professor, data mining researcher and the founder of the SPMF data mining software, which includes more than 150 algorithms for pattern mining.

This entry was posted in Big data, Data Mining, Pattern Mining, spmf and tagged , , , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *