Why should data mining researchers evaluate their algorithms against state-of-the-art algorithms?

A common problem in data mining research is that researchers proposing new data mining algorithms often do not compare the performance of their new algorithm with the current state-of-the-art algorithms.

Let me illustrate this problem with an example. Recently, I did a literature review on the topic of frequent subgraph mining. I searched for all recent papers on this topic, as well as older ones. Then, based on this search, I drew the following diagram, where an arrow X → Y from an algorithm X to an algorithm Y indicates that X was shown to be faster than Y in terms of execution time by the authors of X in an experiment.

What is the problem?  The problem is that several algorithms have never been compared with each other.  Therefore, if someone wants to know which algorithm is the best, it is impossible to answer without reimplementing the algorithms and performing the comparison (because many authors have not publicly released their source code or binary files).  For example, as we can see in the above diagram,

  • GSPAN (2002) was shown to be faster than FSG (2001), the initial algorithm for this problem.
  • Then FFSM (2003) was shown to be faster than GSPAN (2002).
  • Then, GASTON (2004) was shown to be faster than FSG (2001) and GSPAN (2002).  But it was not compared with FFSM (2003). It is therefore still unknown if GASTON is faster than FFSM (2003).
  • Thereafter, SLAGM (2007) was proposed and was not compared with any of the previous algorithms.
  • Then, FSMA (2008) was proposed and was only compared with SLAGM (2007), which has still not been compared with any of the previous algorithms.
  • Then, FPGraphMiner (2011) was proposed.  It was compared with GSPAN (2002). But it was not compared with GASTON (2004), which was shown to be faster than GSPAN (2002). Moreover, it was not compared with FFSM (2003).
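The gaps above can be made explicit by encoding the published comparisons as a directed graph and listing the pairs of algorithms that were never directly compared. The following is just an illustrative sketch; the edge list is my reading of the comparisons described above, not an exhaustive survey.

```python
from itertools import combinations

# An edge (X, Y) means: the authors of X reported X to be faster than Y
# (edges taken from the comparisons listed above).
comparisons = {
    ("GSPAN", "FSG"),
    ("FFSM", "GSPAN"),
    ("GASTON", "FSG"), ("GASTON", "GSPAN"),
    ("FSMA", "SLAGM"),
    ("FPGraphMiner", "GSPAN"),
}

algorithms = {a for edge in comparisons for a in edge}

def uncompared_pairs(edges, nodes):
    """Return unordered pairs of algorithms with no direct comparison."""
    return sorted(
        (a, b) for a, b in combinations(sorted(nodes), 2)
        if (a, b) not in edges and (b, a) not in edges
    )

# Print every pair for which no experiment was ever published.
for pair in uncompared_pairs(comparisons, algorithms):
    print(pair)
```

Even with only seven algorithms and six published comparisons, most pairs (for example, GASTON vs. FFSM) turn out to have no direct comparison at all, which is exactly the "mess" discussed below.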

Note that in the above diagram, the boxes in green represent algorithms for closed subgraph mining (ClosedGraphMiner (2003)) and maximal subgraph mining (SPIN (2004) and MARGIN (2006)).  For these algorithms:

  • ClosedGraphMiner (2003) was compared with GSPAN (2002), which is fine since it was the best algorithm at that time.
  • SPIN (2004) was compared with FFSM (2003) and GSPAN (2002), which is fine since they were the two best algorithms at that time.
  • However, MARGIN (2006) was not compared with SPIN (2004). It was only compared with FFSM (2003). Moreover, it was not compared with GASTON (2004).

As you can see from this example, the topic of subgraph mining is a bit of a “mess” if one wants to know which algorithm is the best.  It could be tempting to use the above diagram to draw transitive inferences. For example, we could use the links FFSM <– MARGIN <– FPGraphMiner to infer that FPGraphMiner should be faster than FFSM. However, it is risky to draw such inferences, given that performance comparisons can often vary depending on the choice of datasets, parameters, and implementations.
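To see why such chains are tempting, one can compute which algorithms are reachable from a given one in the comparison graph. The sketch below uses a hypothetical edge list based on the comparisons described in this post; any multi-hop path it finds is a conjecture, not a measured result.

```python
# An edge (X, Y) means X was reported faster than Y in some experiment.
comparisons = {
    ("GSPAN", "FSG"),
    ("FFSM", "GSPAN"),
    ("GASTON", "FSG"), ("GASTON", "GSPAN"),
    ("FPGraphMiner", "GSPAN"),
}

def reachable(edges, start):
    """All algorithms that `start` beats directly or transitively."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for x, y in edges:
            if x == node and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

# FPGraphMiner -> GSPAN -> FSG suggests FPGraphMiner beats FSG,
# yet that comparison was never actually measured.
print(reachable(comparisons, "FPGraphMiner"))
```

The transitive chain reaches FSG through GSPAN, but since each edge was measured on different datasets and hardware, the inferred ranking may well not hold in practice.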

So what is the solution to this problem?  Personally, I think that researchers should make their source code or binary files public, as well as their datasets. This would facilitate the comparison of algorithm performance.  However, it would not solve every problem, since different researchers may, for example, use different programming languages.  Another part of the solution would be for researchers to make the extra effort of implementing more competitor algorithms when doing performance comparisons, and to do their best to choose the most recent algorithms.

This is all I wanted to write for today! I hope that you have enjoyed this post.  If you like this blog, you can tweet about it and/or subscribe to my Twitter account @philfv to get notified about new posts.

Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 52 data mining algorithms.

 


This entry was posted in Data Mining, Programming, Research.

4 Responses to Why should data mining researchers evaluate their algorithms against state-of-the-art algorithms?

  1. Dang Nguyen says:

    Hi Prof. Philippe,

    I enjoy reading your posts so much. Your shared experiences and ideas help me a lot in writing a good research paper, for example, the way to make charts look more professional. My current research interests are association rule mining and class association rule mining. After visiting your blog and being encouraged by your ideas, I have published some of my algorithm implementations on my personal blog.

    I’m really happy to see your new posts.

    Best and Happy New Year,
    Dang Nguyen

  2. Dang Nguyen says:

    Dear Prof. Philippe,

    I have a trivial question: how can we identify which algorithm is state-of-the-art, the first algorithm developed for the problem or one cited by many other researchers? Indeed, we can’t implement all existing algorithms. Thus, I’d like to know which algorithm I should compare against when I propose a new algorithm, so that my work can impress the reviewers. I often evaluate my method against the latest one; is that a good strategy?

    Thanks so much for your response.
    Dang Nguyen

    • Dear Dang,

      Thanks for your message. It is great to know that you like the blog and that it is useful. Could you give me the link to your blog? I would like to have a look at it!

      Now, let’s talk about your question about how to determine the state-of-the-art algorithm. The answer is to quickly read all the articles on the topic to see which algorithm is the best. For example, this is what I did in this blog post for the topic of “frequent subgraph mining”. I took one day to quickly check all the papers on this topic, trying to find the best algorithm by looking at the experiments in each paper. I then drew the diagram that you see in the blog post. But from this diagram, there is no clear winner.

      In general, when you propose a new algorithm, you should compare with the best algorithm if there is one. Ideally, you should also compare with some algorithm(s) that have been published in good journals and conferences and that are not too old.

      If you choose an old algorithm, the reviewers may wonder why you did not choose a recent algorithm and whether your algorithm would perform well against a newer one.

      If you compare with a famous algorithm that is not the best, it can be acceptable. But it is always better to compare with the best algorithm.

      Besides, it is sometimes good to compare your algorithm with an algorithm that has a similar design to yours. For example, if you modified the GSPAN algorithm to make it faster, then you would need to compare with GSPAN to show the improvement over it. Moreover, you should ideally compare with newer algorithms too.

      Finally, it is always better to compare with more than one algorithm, if you can and have the time to do so. Obviously, more comparisons are better.

      Happy new year!

      Philippe

  3. Dang Nguyen says:

    Dear Prof. Philippe,

    Thanks for your quick reply. I absolutely agree with you about this point “it is ok if we compare with a famous method but it will be better if we also compare with the best one”.

    In fact, I only started researching data mining recently, about one year ago. I currently apply parallel computing techniques to data mining. My personal website is http://nphdang.wordpress.com. I highly appreciate any suggestions and feedback from experts like you.

    Best,
    Dang Nguyen
