25 years of pattern mining

It is 2019, and 25 years have already passed since Agrawal wrote his seminal papers on frequent itemset mining and association rule mining in 1994. Since then, thousands of papers have been published on this topic, some about algorithm design, some about new pattern mining problems, and others about applications in a multitude of fields. And there are still many research issues to work on!

After all these years, it is a good time to look back at what has been achieved to get a new perspective. This is what I did recently with colleagues in a survey paper called “Frequent Itemset Mining: a 25 Years Review”. If you are interested in frequent pattern mining, I encourage you to read the paper, as it makes some interesting observations. For example, it shows that some ideas used in recent algorithms for mining patterns in big data can be traced back to some of the early algorithms. Here is a picture from the paper showing a timeline of key algorithms and events in frequent pattern mining:

What will be the future of pattern mining? You can read my blog post about the future of pattern mining to learn more about it!

That is all I wanted to write for today!

--
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 150 data mining algorithms.


Comments

25 years of pattern mining — 4 Comments

  1. Thanks for introducing this review paper. As an expert on pattern mining, what do you think is the next phase of this field, and what about its future?

    • Hi Dang, Thanks for your comment! 🙂

      I think that there are many challenges. Currently, I think that one of the biggest problems in many papers is that the pattern mining problems they study are too simple, and some are unrealistic (they have no applications and are not even applied to real data in the papers). Besides, too many researchers focus only on performance. While performance is important, I think the most important thing is to focus on what the user needs. Thus, I see the future as:
      – Treating more complex types of data (e.g. dynamic attributed graphs), that may be more suitable to real applications
      – Finding more complex types of patterns (by considering time, etc.)
      – Having more constraints (because the user may need constraints in practical applications to filter patterns)
      – Integrating the concept of statistical significance in pattern mining to avoid finding spurious patterns that only appear by chance (this is a good topic, and there have been some good papers about it in recent years)
      – Designing more interactive systems for exploring and visualizing the data

      Personally, my view when starting a new pattern mining problem is to always think about applications first. Would it be useful to learn something about the data? Can I have some real data to show that my new problem is useful?

      Best,

      Philippe

  2. Thanks for your useful response. I totally agree that having a real-world application is one of the most important aspects when developing a new pattern mining problem. I have seen many papers introduce new methods but fail to show their real-world applications.

    Recently, I have been working on a problem which learns continuous low-dimensional representations for patterns. The main idea is to learn a continuous vector form for a discrete pattern. For example, a pattern {data, mining} will become [0.1, 0.5, 0.2]. I think this is an interesting problem since it has a wide range of applications such as pattern matching, classification, and clustering. Considering pattern matching, since each pattern is now represented by a vector (not a set), we can use many distance measures to compute their similarity, such as cosine distance, Euclidean distance, Manhattan distance, etc. If we treat a pattern as a set, we can only use Hamming distance. Similarly, using patterns as vectors also shows very good performance in classification and clustering tasks, e.g. graph classification/clustering, sequence (text) classification/clustering, and transaction classification.
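
    To make the comparison of vector-encoded patterns concrete, here is a rough sketch of the three distance measures mentioned above. Only the vector for {data, mining} comes from the comment; the second pattern, its vector, and all function names are made up for illustration, and nothing here shows how the vectors would actually be learned.

```python
import math

# The first vector is the example given in the comment.
# The second pattern and its vector are hypothetical, for illustration only.
data_mining = [0.1, 0.5, 0.2]      # {data, mining}
pattern_mining = [0.2, 0.4, 0.1]   # {pattern, mining} (assumed values)


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Return the straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Return the sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


if __name__ == "__main__":
    for name, fn in [("cosine", cosine_distance),
                     ("Euclidean", euclidean_distance),
                     ("Manhattan", manhattan_distance)]:
        print(f"{name} distance: {fn(data_mining, pattern_mining):.4f}")
```

    Any of these functions could then be passed to a nearest-neighbour search or a clustering routine, which is where the vector form would pay off.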

    • Hi Dang,
      Always great to read your comments, and glad to know you are using pattern mining in your current work.

      What you are doing is quite interesting. I think that representing the patterns as vectors is indeed a good idea for combining patterns with classification, clustering and other methods, and, as you said, other distance measures. It provides some interesting new possibilities. 😉

      Best regards,
      Philippe
