Some common problems in pattern mining papers

Today, I will talk about pattern mining, a subfield of data mining that aims at discovering interesting patterns in data. In particular, I will talk about why several papers from that field do not have much impact. This is generally because of one or more of the following limitations:

  • Unrealistic problem definition. Several papers on pattern mining define a problem that is unrealistic and has few or no real applications. For example, I have seen many papers proposing algorithms for mining some types of patterns, but I have never seen any paper that applied these algorithms to do anything in real life.
  • A problem definition that is too narrow or an algorithm that is not flexible. Another issue is that many papers propose algorithms for problem definitions that are too specialized or too simple to be used in real life. For example, there are a lot of papers that talk about analyzing customer shopping data but fail to consider important information such as the customer profile, the categories of items, the cost and profit, and the time of purchase. Without considering such data, many pattern mining models are not useful.
  • Incremental contributions. Too many papers in pattern mining simply combine existing ideas without adding anything really new. It is important to come up with original research ideas.
  • Focusing too much on performance. A lot of papers on pattern mining focus on performance. While performance is important and it is exciting to design fast algorithms, it is not always what matters most to the user. A user may rather want more flexibility and the ability to set constraints on the patterns to be found.
  • Poor or limited experiments. Another problem of many pattern mining papers is that poor or limited experiments are carried out to evaluate the algorithms. The experiments often focus on performance but do not show that interesting patterns are discovered or that the patterns are useful in real life. It is important to show that the algorithms are useful for something.
  • Poor presentation. Another issue is poor presentation. Even if an idea is very good, it will have a limited impact if it is poorly presented in the paper.
  • Incorrect and incomplete algorithms. Several pattern mining algorithms are claimed to be correct and complete but are not. Generally, this is because no proofs are given in the papers and the authors overlooked some important cases when designing the algorithms. This is something to be careful about.
  • Approach that is not reasonable or well-justified. Another problem in several pattern mining papers is that there is not enough justification for the design of the algorithm to convince the reader that the approach is reasonable and innovative.
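
To make the points about flexibility and constraints a bit more concrete, here is a minimal Python sketch of itemset mining where the user can plug in a constraint, in this case one based on item profit. Everything in it is an assumption made for illustration: the toy transactions, the `profit` table, and the names `mine_itemsets` and `profitable` are hypothetical, and the brute-force search is only suitable for tiny examples, not a real algorithm.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

# Hypothetical unit profit of each item (assumed values, for illustration only).
profit = {"bread": 1.0, "milk": 0.5, "butter": 2.0}


def mine_itemsets(transactions, min_support, constraint=lambda itemset: True, max_size=3):
    """Return {itemset: support} for itemsets that are frequent and satisfy the constraint.

    This is a brute-force enumeration of all candidates up to max_size items.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions containing every item of the candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= min_support and constraint(candidate):
                result[candidate] = support
    return result


def profitable(itemset):
    """User-defined constraint: the unit profits of the items sum to at least 2.5."""
    return sum(profit[item] for item in itemset) >= 2.5


all_frequent = mine_itemsets(transactions, min_support=2)
patterns = mine_itemsets(transactions, min_support=2, constraint=profitable)
print(patterns)
```

On this toy data, six itemsets are frequent, but only the two that contain butter together with another item pass the profit constraint. The point is simply that letting the user express what "interesting" means can matter more than how fast the frequent itemsets are found.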

That was just a short blog post! I hope it has been interesting. Leave a comment below if you want to add something or give your opinion.

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.
