The future of pattern mining

In this blog post, I will talk about the future of research on pattern mining. I will also discuss some lessons learnt from the decades of research in this field and talk about research opportunities.


What is the state of research on pattern mining?

Over the last decades, many things have been discovered in pattern mining, and the field has become more mature. For example, algorithms for pattern mining generally follow the same few approaches, which were established more than a decade ago. The main types of algorithms in pattern mining are Apriori-based algorithms, pattern-growth algorithms, and vertical algorithms. The proposal of these fundamental approaches has facilitated the development of new algorithms.
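To make the Apriori-based approach more concrete, here is a minimal Python sketch of level-wise frequent itemset mining. It is only an illustration under simple assumptions: the function name `apriori`, the toy transactions, and the absolute `minsup` threshold are hypothetical and are not taken from SPMF or from any particular paper.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: support} for all itemsets with support >= minsup.

    transactions: iterable of sets of items (toy data, assumed to fit in memory).
    minsup: absolute support count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the support of each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Support counting: one scan of the database per level.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= minsup}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy database of four transactions.
transactions = [
    {"bread", "wine"},
    {"bread", "cheese"},
    {"bread", "wine", "cheese"},
    {"wine"},
]
patterns = apriori(transactions, minsup=2)
for itemset, support in sorted(patterns.items(),
                               key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), support)
```

Pattern-growth and vertical algorithms find the same patterns but try to avoid the repeated database scans, for example by using projected databases or lists of transaction identifiers.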

However, although traditional pattern mining problems such as frequent itemset mining have been well studied, novel pattern mining problems are constantly proposed, and these problems often have unique challenges that require new tailored solutions. For example, this is the case for subgraph mining, where an algorithm must deal with the problem of subgraph isomorphism checking, which does not arise in traditional pattern mining problems such as itemset mining. Another example is the design of efficient algorithms for novel architectures such as cloud systems, parallel systems, GPUs, and FPGAs, which requires rethinking traditional algorithms and their data structures.

A second observation about the state of research on pattern mining is that not all research areas of pattern mining have been explored equally. For example, some topics such as frequent itemset mining and association rule mining have received a lot of attention, while other problems such as sequential rule mining and periodic pattern mining have been much less explored. In my opinion, this is not because these latter problems are less useful but perhaps because the problem of frequent itemset mining is simpler.

A third observation is that the field of pattern mining seems to have been less popular over the last decade. This is certainly true, but it is not something to worry about because countless research problems in this field remain unsolved. Besides, all fields of computer science follow trends that are cyclic. This is the case, for example, for research on artificial intelligence, which currently receives a lot of attention but was met with disinterest and a lack of funding during specific periods in the past decades (the “AI winters”). Moreover, although pattern mining may seem to be less studied than before, some subfields of pattern mining are actually becoming more and more popular. For example, this is the case for high utility pattern mining, which has been growing steadily for the last 15 years. Here is a plot of the number of papers per year on utility mining (a figure prepared by Gan et al., 2018):

This figure clearly shows a growing interest in the topic of utility pattern mining. Besides, quality papers in the field of pattern mining are still published in top conferences and journals.

What lessons can we learn?

Several lessons can be learnt. The first one is that, in my opinion, too much research in the last decades has focused on improving the performance of algorithms while neglecting their applications. Don’t get me wrong. Performance is very important, as one does not want to wait several hours to find patterns. However, considering the usefulness of the discovered patterns ensures that these algorithms will actually be used in real applications. If researchers thought more about the usefulness of patterns, I think that this could help grow the field of pattern mining further.

Several pattern mining problems have never been applied in real life. Why? A first reason is that the assumptions of some of these problems are unrealistic or too simple.

For researchers working on pattern mining, I think that potential applications should always be considered first. Problems that have many potential applications or are more useful should be preferred. Thus, a key lesson is to not forget the user and the applications. If possible, discussions with potential users should be carried out to learn about their needs. As a general principle, the more specialized a problem is, the less likely it is to be used in real life. For example, if someone proposed a very specialized problem such as “mining recent high utility episode patterns in an uncertain data stream when considering a sliding window and a gap constraint”, it would certainly be less likely to be useful than the more general problem of “mining high utility episodes”.

A second reason why many algorithms are not used in real life is that many researchers do not provide their source code or applications. Sometimes, it is because the authors cannot share them due to restrictions from their institutions or collaborators. And sometimes, it is simply because researchers are worried that someone could design a better algorithm. There are also other reasons, such as the lack of time to release the algorithms. But sharing the source code of algorithms could greatly help other researchers and people interested in using the algorithms. I previously wrote a detailed blog post about why researchers should share their implementations.

Research opportunities

Given the state of research on pattern mining discussed above, there are actually many research opportunities, such as:

  • Proposing faster and more memory efficient algorithms,
  • Proposing algorithms that offer more features or are more user-friendly (e.g. interactive algorithms, visualization, or algorithms that let the user specify additional useful constraints),
  • Proposing new pattern mining tasks that have novel challenges,
  • Proposing new applications of existing algorithms,
  • Proposing variations of existing problems (e.g. mining patterns in big data, using parallel architectures, etc.)

I personally think that pattern mining is a good research area because it is challenging and many things can be done.


This is what I wanted to talk about today. I hope you have enjoyed this blog post. If you have any other ideas or comments, please leave them in the comment section.

Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 145 data mining algorithms.



The future of pattern mining — 5 Comments

  1. Dear Philippe Fournier,
    I read the blog and found it very informative and useful. Sir, I have a question: why do we mine top-k periodic frequent patterns, and what are the real-world applications of top-k periodic frequent patterns? I am very thankful for your support.


    • Dear Habib, Thanks for reading the blog and commenting. I think you should first think about why we find periodic patterns. Periodic patterns appear in many real-life applications. For example, maybe a customer buys some wine and bread every week. If we can discover the periodic patterns of this person, we can learn about his behavior. This could be used to give discounts or to apply other marketing strategies. But more generally, if we find that some event is happening periodically, we can use that to make predictions. For example, if we know that a person watches movies every week-end, then we can show him some movie recommendations just before the week-end. I think we can imagine several scenarios like that. Now let me give you the second part of my answer. Why top-k patterns? Usually, the main reason for mining top-k patterns is that we want to find the k best patterns according to some measure. For example, we may want to find the k patterns having the highest utility (those that make the most money), which is called top-k high utility itemset mining. For periodicity, I am not sure what top-k would mean. Would you want to find the k periodic patterns that appear the most frequently (top-k periodic frequent pattern mining), or perhaps the k periodic patterns that yield the most money (top-k periodic high utility itemsets)? I think that for these problems you could maybe find some scenarios where it is useful. Or you could define the “top-k” condition on something else besides frequency and money. I think you can probably find some justifications, but you may need to continue thinking about it a little bit. Best regards, Philippe

  2. Pingback: 25 years of pattern mining - The Data Mining Blog

  3. Hello sir, your blog is very helpful for me but can you please tell me the latest problem related to pattern mining, and which topic I can take for further research.

    • Hi, Glad that the blog is useful. There are many possibilities for research projects in pattern mining. Generally, there are two main types of projects:
      (1) work on some existing pattern mining problem and propose an algorithm that is better than current algorithms in terms of some criteria such as memory consumption, speed, scalability… For example, you can propose a Spark algorithm for mining sequential rules in big data. This is just an example.
      (2) or you can decide to propose a new pattern mining problem. For example, you can extend the problem of sequential rule mining to uncertain data, to mine sequential rules in an uncertain transaction database. This is just an example.

      For a research topic like (1), it is easier if you are a good programmer and can find some original ideas to make the existing algorithms more efficient. For a research topic like (2), you are creating a new problem, so you don't really need to write a highly efficient algorithm because there may be no algorithms from other people to compare with. But you can compare different versions of your own algorithm. For a topic like (2), since you propose a new problem, you need to come up with a problem that is interesting and convince other people that it is useful and not trivial to solve. And preferably, you need to apply your new pattern mining task to real data to show that you can find interesting patterns.

      For example, recently I worked on high utility itemset mining, but found that most models do not consider time. By thinking for a while about this, I observed that the utility (profit) of items in real life sometimes increases during the year, such as before Christmas, and I wanted to find these time intervals. Thus, I proposed with my student a new problem of mining peak high utility itemsets. We then used real data that we found online and showed that we could find some interesting patterns, and we also proposed some algorithms. This is just an example.

      So there are a lot of possibilities, but of course not all possibilities are good. Some topics have never been explored because the topic is not useful or because the problem is too easy. You need to take the time to choose a good topic. For my students, I spend quite a lot of time choosing a good topic and I discuss with them many times to improve their topics. We do not find a topic in just a few minutes. I spend this time because choosing a good topic is really important. I want to work on topics that are useful and where I can do something interesting to make good papers.

      If you are new to data mining, I think a good starting point is to look at my SPMF software. It offers hundreds of algorithms that you can use as the basis to start your research and also the webpage provides many public datasets that you can use. Then, if you choose a topic like (2), you can already have some code that you can extend with new features to make some new algorithm, or if you do a topic like (1), you can compare with some existing algorithm. Having some code and data can save a lot of time.

      About the topic, you could consider something related to high utility pattern mining as it is a popular topic. There is a lot of code and datasets for this in SPMF.

      Or another possibility is to consider sequential rules. There is not much work on that topic, but I think it is an interesting topic.

      Best regards,
