SPMF 2.55 is released (10 new algorithms!)

Hi everyone,

This is a short blog post to let you know that a new version of the SPMF data mining library has been released (version 2.55) with 10 new algorithms for pattern mining. SPMF is by far the most complete library for pattern mining and can be used from various languages. Thanks again to all contributors and users who made this project a success and supported it through the years.
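Since SPMF ships as a runnable jar, calling it from another language mostly means building its documented command line (`java -jar spmf.jar run <algorithm> <input> <output> <params…>`). Here is a small Python sketch of that idea; the jar location and file names are placeholders, so adjust them to your setup:

```python
import subprocess  # to actually launch SPMF (commented out below)

def spmf_command(algorithm, input_file, output_file, *params, jar="spmf.jar"):
    """Build SPMF's documented command line:
    java -jar spmf.jar run <algorithm> <input> <output> <params...>"""
    return ["java", "-jar", jar, "run", algorithm,
            input_file, output_file, *[str(p) for p in params]]

# Example: mine frequent itemsets with Apriori at 40% minimum support
cmd = spmf_command("Apriori", "contextPasquier99.txt", "output.txt", "40%")
# subprocess.run(cmd, check=True)  # requires Java and spmf.jar on disk
```

The same command list can of course be assembled from any other language that can spawn a process.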

If you are a researcher and would like your algorithms to be included in SPMF to provide more visibility to your work, feel free to send me an e-mail.

Hope you will enjoy this new version of SPMF!


Philippe Fournier-Viger is a distinguished professor of computer science.


Upcoming SPMF 2.55 + UDML 2022 + BDA 2022

I have not written on the blog in the past few weeks. This is because I was quite busy and it was also the summer vacation, so I had to temporarily focus on other things. Today, I want to give you some quick news:

  • There is about one more week left to submit your paper to a workshop that I co-organize at the IEEE ICDM 2022 conference: UDML 2022 (5th International Workshop on Utility-Driven Mining and Learning). You are welcome to submit any paper related to machine learning or pattern mining. The deadline is September 2nd, 2022.
  • A new version of SPMF (v2.55) is under preparation. There will be about 10 new algorithms, including 5 high utility itemset mining algorithms and 3 episode mining algorithms. I am very excited to release this new version soon. If you have an algorithm implementation that you would like to include in SPMF, feel free to contact me at philfv8 AT yahoo DOT COM.
  • You may also want to consider the BDA 2022 conference on Big Data Analytics for submitting your papers. It is an international conference held in India, with proceedings published by Springer. The deadline is September 5th.

That is all for today. I will be back soon with more content for the blog and my Youtube channel. Thanks for reading! 🙂


Turnitin, a smart tool for plagiarism detection?

In this blog post, I will talk about Turnitin, a service used for plagiarism-checking by some journals and conferences in academia. I already wrote a blog post about this, which you can read here:

How journal paper similarity checking works? (CrossCheck) | The Data Mining Blog (philippe-fournier-viger.com)

Today, I just want to show you that this service is, in my opinion, not very “smart”. Although this service is useful to detect plagiarism, I have noticed that it also sometimes flags very generic text that, in my opinion, should not be considered when evaluating plagiarism. As examples, I will show you seven excerpts from a Turnitin report for a conference paper that I submitted:

(1) In a sentence of 23 words, Turnitin flagged a similarity with another source because I used six words in the same order as that source, even though no more than three of those words appear consecutively:

(2) At another place in the paper, Turnitin flags a similarity because I used the same keyword as another paper, while all the other keywords are different.

(3) At another place, the submitted paper is considered similar to another paper because I say that in this paper I propose something novel (!):

(4) Here is another example that shows how not “smart” this tool can sometimes be (in my opinion). Having a section called “Experimental evaluation” and saying that we will assess the performance of something is flagged as similar (!).

(5) Another example of very generic sequences of words that are deemed similar:

(6) And another example of a paragraph where some sentences are said to be similar to four different sources, but actually all of this is just very generic text used to describe experiments with the same dataset as another paper:

(7) And in the conclusion, a few words are said to match another source, but this is not relevant:

With the above examples, I just want to show that Turnitin can sometimes flag text as similar when the similarity is simply due to using very generic text. Sometimes this happens because an author tends to keep the same writing style across different papers, and sometimes it is because there are just not that many different ways of explaining something. For example, here is one more example:

In the last sentence, if I want to say that the next section will talk about some new algorithm, there are not many ways to say it. I could write “the … algorithm will be presented/described/introduced in the next section” or “The next section will present/describe/introduce the … algorithm”. But I do not see many other ways of explaining this.
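To illustrate why such generic phrasings get flagged, here is a naive word n-gram overlap check in Python. This is only a rough sketch of the general idea; Turnitin's actual matching method is proprietary and certainly more sophisticated:

```python
def ngrams(text, n=3):
    """Set of word n-grams of a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(a, b, n=3):
    """Fraction of a's n-grams that also occur in b."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga) if ga else 0.0

s1 = "the proposed algorithm will be presented in the next section"
s2 = "the new algorithm will be presented in the following section"
print(round(overlap(s1, s2), 2))  # → 0.5, although neither sentence is plagiarized
```

Half the trigrams match here simply because both sentences use the same standard academic phrasing, which is exactly the kind of false positive discussed above.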

Conclusion

In this blog post, I have shown some examples from a Turnitin report that I consider “not smart”. To be fair, I should say that Turnitin will also flag some text that is very relevant or somewhat relevant. Here I just wanted to show some examples that do not look relevant, to highlight that there is still a lot of room for improvement in this tool.

As for the use of Turnitin, it is certainly useful for plagiarism detection, but like any other tool, it also depends on how the results are used by humans. Unfortunately, I notice that many conferences and journals do not take the time to read the reports and instead just fix some thresholds to determine what is acceptable. For example, one conference stated that the “similarity index should not be greater than 20% in general and not more than 3% with any single source”. This may seem reasonable, but in practice it is quite strict, as there is always some similarity with other papers. Ideally, someone would manually check the report to determine what is acceptable.


Brief report about the SMARTDSC 2022 conference

This week, besides IEA AIE 2022, I am also participating in the SMARTDSC 2022 conference (5th International Conference on Smart Technologies in Data Science and Communication) as general co-chair and keynote speaker. I will give a brief report about this conference in this post.

What is SMART-DSC?

SMARTDSC is a conference organized by the KL (deemed to be) University in India in collaboration with several international researchers. This is the fifth edition of the conference. The conference focuses on data science, communication and smart technologies, and the quality is good. This year, over 150 papers were received and fewer than 20% were accepted for oral presentation, which makes this conference competitive. The proceedings are also published by Springer, which ensures indexing and good visibility for papers.

The accepted papers are from over ten different states in India and also from 5 other countries. There is also an excellent line-up of eight keynote speakers for the conference from various countries, including Turkey, Egypt, China, France, and Malaysia.

The first keynote talk was by Shumaila Javaid, affiliated with the Shanghai Research Institute for Intelligent Autonomous Systems in China. The talk was about medical sensors and their integration for pervasive healthcare. This was quite an interesting topic, as it has real-life applications that may change lives.

Then, I gave a keynote talk about the automatic discovery of interesting patterns in data. Here are a few slides of my talk where I introduced various topics.

There were then several paper presentations, followed by other keynote talks. I will try to add more details about these presentations to this blog post later. The SMARTDSC conference is held over three days. I am attending the conference at different moments during these three days, as I have to participate in two conferences at the same time (SMARTDSC 2022 and IEA AIE 2022).

Overall, SMARTDSC 2022 is an interesting conference. It is especially convenient for participants in India in terms of travelling, but it is also international, with several participants and keynote speakers from abroad. I am happy to participate in it.


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.


Brief report about the IEA AIE 2022 conference

This week, I am attending the 35th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA AIE 2022) as a member of the organization committee. I will give a brief report about the conference.

What is IEA AIE?

IEA AIE is a well-established conference (35 years already!) about applied artificial intelligence, that is, not only the theoretical aspects of artificial intelligence but also its applications. It is a medium-sized conference with a very international audience and authors from all over the world. I have attended this conference many times, and it has always been interesting. You can see my reports about previous editions of the conference here: IEA AIE 2016, IEA AIE 2018, IEA AIE 2019, IEA AIE 2020, and IEA AIE 2021.

Program

This year, the conference received 127 paper submissions, from which 65 have been accepted as full papers, and 14 as short papers. The proceedings are published by Springer. All the papers have been reviewed by at least 3 members from the program committee. The program committee consists of 73 persons from 23 countries. I was one of the two PC chairs this year.

Opening ceremony

The IEA AIE 2022 conference was held in Kitakyushu City in Japan in hybrid mode. I think the majority of attendees were online, but there were still many people attending in person. On the first day, there was the opening ceremony, where the conference was introduced, including the program and other aspects.

It was announced that IEA AIE 2023 will be held in Shanghai, China. The call for papers of IEA AIE 2023 was presented, as well as an overview of the organization and other details. Here is a screenshot of the call for papers of IEA AIE 2023 (http://www.ieaaie2023.com/):

The deadline for submitting papers to IEA AIE 2023 is in December 2022.

Paper presentations

There were several paper presentations, covering many different topics such as industrial applications, health informatics, optimization, video and image processing, natural language processing, agent and group-based systems, pattern recognition, and security.

Here are screenshots from some presentations that I attended.

This is a paper about air pollution, which uses image processing combined with a periodic pattern mining algorithm to obtain good detection results:

Below is a paper from my collaborators about parallel high utility itemset mining based on Spark. In that paper, good results are obtained: parallel versions of EFIM and d2HUP provide a good speed-up (up to 20 times) over the sequential versions of those algorithms for mining high utility itemsets.

There was also an interesting paper about weighted sequential pattern mining in uncertain data:

There were also many other papers that I listened to. I will not report on all of them.

Keynote talks

At IEA AIE 2022, there were two keynote talks. The first keynote was by Prof. Tao Wu from Shanghai University of Medicine & Health Sciences, about health informatics. I missed the first half of the keynote, so I will not report on the details, but it seems that the audience enjoyed the presentation very much.

The second keynote talk was by Prof. Sebastian Ventura from the University of Cordoba, Spain, about improving predictive maintenance with advanced machine learning. He talked about how to build models and systems that prevent failures in industrial systems by doing maintenance in advance (predictive maintenance, or PdM). He explained that various techniques can be used, such as outlier detection and classification. Prof. Ventura said that he is working on a project for the maintenance of military vehicles. Following the talk, there was a good discussion with conference participants. Prof. Ventura explained that building simple models is good, but not necessarily the most important thing: a complex model can be acceptable if it is explainable. In fact, he said that it is more important to have explainable models, because in real applications models often need to be verified by domain experts. Here are a few slides from the introduction of that talk:

Here are some slides about potential data mining techniques that can be used:

And here are some techniques that have been used in the specific project for predictive maintenance of vehicles:

Here are some challenges and open problems and the conclusion from the talk:

If you are interested in this topic, you may also check the survey paper recently published by Prof. Ventura:

A. Esteban, A. Zafra & S. Ventura. Data Mining in Predictive Maintenance Systems. WIREs Data Mining and Knowledge Discovery. https://doi.org/10.1002/widm.1471

Best paper awards

Several awards were also announced at the conference. The selection was made by considering the scores from the review process, but also by carefully analyzing the reviews and the papers.

  • Best student paper award
    Question Difficulty Estimation with Directional Modality Association in Video Question Answering
    Bong-Min Kim and Seong-Bae Park
  • Best theory paper award
    Evolution of Prioritized EL Ontologies 
    Rim Mohamed and Zied Loukil and Faiez Gargouri and Zied Bouraoui
  • Best application paper award
    A Generalized Inverted Dirichlet Predictive Model for Activity Recognition using Small Training Data
    Jiaxun Guo and Manar Amayri and Wentao Fan and Nizar Bouguila
  • Best special session paper award
    An Oriented Attention Model for Infectious Disease Cases Prediction
    Peisong Zhang and Zhijin Wang and Guoqing Chao and Yaohui Huang and Jingwen Yan

There was also an award for the best technical presentation given to the best presenter who attended the IEA AIE conference in person.

Conclusion

This was a good conference. Looking forward to IEA AIE 2023 next year in Shanghai, China.

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.


Drawing a figure of subgraphs side-by-side with captions in Latex with TIKZ

Today, I will give an example of how to draw a figure containing three subgraphs that appear side by side in Latex using the TIKZ library, where each subgraph has its own caption. This can be useful when writing research papers in which we want to discuss different types of subgraphs.

The result will be like this:

And here is the Latex code:

\documentclass{article} 
\usepackage{caption}
\usepackage{subcaption}
\usepackage{tikz}
\usetikzlibrary{automata,arrows,positioning,calc}

\begin{document}


\begin{figure}
  \begin{subfigure}[b]{0.30\textwidth}
    \centering
      \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 1.5cm]
        \node[state] (v) {$A$};
        \node[state] (w) [right of=v] {$B$};
        \node[state] (t) [right of=w] {$C$};

		\path[->] (v)  edge node {0} (w);
		\path[->] (w) edge   node  {1}(t);
      \end{tikzpicture}
    \caption{Subgraph 1}
  \end{subfigure}
  \begin{subfigure}[b]{.30\textwidth}
    \centering
      \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 1.5cm]
        \node[state] (x) {$E$};
        \node[state] (y) [below of=x] {$F$};
        \node[state] (n) [right of=x] {$G$};
        \node[state] (z) [below of=y] {$H$};


		\path[->] (x)  edge node {0} (y);
		\path[->] (x)  edge node {0} (n);
		\path[->] (n)  edge node {0} (z);
		\path[->] (y) edge   node  {1}(z);
      \end{tikzpicture}
    \caption{Subgraph 2}
  \end{subfigure}
  \begin{subfigure}[b]{.30\textwidth}
    \centering
      \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 1.7cm]
        \node[state] (g) {$A$};
        \node[state] (h) [above of=g] {$B$};
        \node[state] (e) [above right= 0.3 cm and 0.3 cm of g] {$C$};
		\path[->] (g)  edge node {} (e);
		\path[->] (g)  edge node {} (h);
		\path[->] (h) edge   node  {}(e);
      \end{tikzpicture}
    \caption{Subgraph 3}
  \end{subfigure}
\caption{Three subgraphs}
\end{figure}

\end{document}

In this code, I use the automata library of TIKZ, which is great for drawing graphs. You could also use other TIKZ libraries and tweak the above example.

Hope this is useful.


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.


(videos) Introduction to sequential rule mining + the CMRules algorithm

I have made two new videos to explain interesting topics about pattern mining. The first video is an introduction to sequential rule mining, while the second video explains in more detail how the CMRules algorithm for sequential rule mining works!

You can watch the videos here:

And you can also find them on my Youtube Channel.

If you want to try these algorithms, you can check the SPMF open-source software, which offers fast implementations of these algorithms.

Hope you will enjoy the videos. I will make more videos about pattern mining soon. By the way, you can also check my website for the Pattern Mining Course. It provides videos and slides for a free online course on pattern mining. It covers all the main topics of pattern mining and is good for students who are starting to do research in this area. The course is still in beta, which means that I am still updating it; more videos and content will be added over time.
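For readers who prefer code to slides, here is a simplified Python sketch of the core measures behind sequential rule mining: the support and confidence of a rule X → Y, where the rule occurs in a sequence if all items of X appear before all items of Y. Note that this sketch treats each sequence as a list of single items, whereas SPMF's actual input format uses sequences of itemsets:

```python
def rule_occurs(seq, X, Y):
    """True if all items of X appear in seq before all items of Y
    (items within X or within Y may appear in any order)."""
    if not all(x in seq for x in X) or not all(y in seq for y in Y):
        return False
    last_x = max(seq.index(x) for x in X)  # point where X is fully seen
    return all(any(v == y for v in seq[last_x + 1:]) for y in Y)

def support_confidence(db, X, Y):
    """Support and confidence of the sequential rule X -> Y in db."""
    sup_rule = sum(rule_occurs(s, X, Y) for s in db)
    sup_x = sum(all(x in s for x in X) for s in db)
    return sup_rule / len(db), (sup_rule / sup_x if sup_x else 0.0)

db = [list("ABCD"), list("ABDC"), list("ACB"), list("AB")]
sup, conf = support_confidence(db, {"A", "B"}, {"C"})
print(sup, conf)  # → 0.5 0.5
```

Algorithms like CMRules are of course much smarter than this brute-force check: they avoid scanning the database for every candidate rule.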


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.


(video) Periodic Pattern Mining

Hi all, this is to let you know that I have made another video to explain some interesting pattern mining topics. This time, I will talk about periodic pattern mining.

You can watch the video here: (pdf / ppt / video – 34 min) 

This video is part of the free online course on pattern mining.
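In a nutshell, periodic pattern mining looks at the gaps (periods) between consecutive occurrences of a pattern and keeps patterns whose periods satisfy some constraints (for example, a small maximal period). Here is a minimal Python sketch of the period computation, using the common convention of boundary points at 0 and n; exact definitions vary between algorithms, so treat this as an illustration:

```python
def periods(positions, n):
    """Periods of a pattern that occurs in the transactions whose
    1-based ids are in `positions`, within a database of n transactions."""
    pts = [0] + sorted(positions) + [n]
    return [b - a for a, b in zip(pts, pts[1:])]

occ = [2, 4, 5, 9]        # transactions where the pattern appears
p = periods(occ, 10)      # → [2, 2, 1, 4, 1]
print(max(p))             # maximal periodicity → 4
```

A pattern with a maximal period of 4 would be kept if the user sets a maximal-period threshold of 4 or more.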

Hope that it is interesting!


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.


Drawing the Powerset of a Set using Latex and TIKZ (Hasse Diagram)

Today, I will show how to draw the powerset of a set using Latex and TIKZ, to produce some nice figures for papers written in Latex. The code shown below is adapted from code found on StackOverflow.

First, we will draw the powerset of the set {a,b} as a Hasse diagram:

\documentclass{article}
\usepackage{tikz-cd}
\begin{document}

\begin{tikzpicture}
    \matrix (A) [matrix of nodes, row sep=1.2cm]
    { 
        & $\{a,b\}$ \\  
	$\{a\}$ &  & $\{b\}$\\
        & $\emptyset$ \\
    };
    \draw (A-1-2)--(A-2-1);
    \draw (A-1-2)--(A-2-3);
    \draw (A-2-1)--(A-3-2);
    \draw (A-2-3)--(A-3-2);
\end{tikzpicture}
\end{document} 

Then, I will show how to draw the powerset of the set {a,b,c}:

\documentclass{article}
\usepackage{tikz-cd}
\begin{document}

\begin{tikzpicture}
    \matrix (A) [matrix of nodes, row sep=1.2cm]
    {
        $\{a,b\}$ & $\{a,c\}$ & $\{b,c\}$ \\
        $\{a\}$ & $\{b\}$ & $\{c\}$ \\
        & $\emptyset$ \\
    };
    \path (A-1-1)--(A-1-2) node[above=1.2cm] (link) {$\{a,b,c\}$};
    
    \foreach \i in {1,...,3}
    \draw (link.south) -- (A-1-\i.north);
    \foreach \i/\j in {1/2, 3/2, 2/1, 1/1, 3/3, 2/3}
    \draw (A-1-\i.south)--(A-2-\j.north);
    \foreach \i/\j in {1/2, 2/2, 3/2}
    \draw (A-2-\i.south)--(A-3-\j.north);
\end{tikzpicture}

\end{document} 

Then, I will show how to draw the powerset of the set {a,b,c,d}:

\begin{tikzpicture}
    \matrix (A) [matrix of nodes, row sep=1.5cm]
    { 
       ~ &  ~ & $\{a,b,c,d\}$ \\  

	~ &  $\{a,b,c\}$ & $\{a,b,d\}$  & $\{a,c,d\}$ & $\{b,c,d\}$\\
	$\{a,b\}$ & $\{a,c\}$  & $\{a,d\}$ & $\{b,c\}$ & $\{b,d\}$ & $\{c,d\}$\\
	~ &  $\{a\}$ & $\{b\}$  & $\{c\}$ & $\{d\}$\\
      ~ & ~ &  $\emptyset$ \\
    };
    \draw (A-1-3.south)--(A-2-2.north);
    \draw (A-1-3.south)--(A-2-3.north);
    \draw (A-1-3.south)--(A-2-4.north);
    \draw (A-1-3.south)--(A-2-5.north);

    \draw (A-2-2.south)--(A-3-1.north);
    \draw (A-2-2.south)--(A-3-2.north);
    \draw (A-2-2.south)--(A-3-4.north);

    \draw (A-2-3.south)--(A-3-1.north);
    \draw (A-2-3.south)--(A-3-3.north);
    \draw (A-2-3.south)--(A-3-5.north);

    \draw (A-2-4.south)--(A-3-2.north);
    \draw (A-2-4.south)--(A-3-3.north);
    \draw (A-2-4.south)--(A-3-6.north);

    \draw (A-2-5.south)--(A-3-4.north);
    \draw (A-2-5.south)--(A-3-5.north);
    \draw (A-2-5.south)--(A-3-6.north);

    \draw (A-3-1.south)--(A-4-2.north);
    \draw (A-3-1.south)--(A-4-3.north);

    \draw (A-3-2.south)--(A-4-2.north);
    \draw (A-3-2.south)--(A-4-4.north);

    \draw (A-3-3.south)--(A-4-2.north);
    \draw (A-3-3.south)--(A-4-5.north);

    \draw (A-3-4.south)--(A-4-3.north);
    \draw (A-3-4.south)--(A-4-4.north);

    \draw (A-3-5.south)--(A-4-3.north);
    \draw (A-3-5.south)--(A-4-5.north);

    \draw (A-3-6.south)--(A-4-4.north);
    \draw (A-3-6.south)--(A-4-5.north);

    \draw (A-4-2.south)--(A-5-3.north);
    \draw (A-4-3.south)--(A-5-3.north);
    \draw (A-4-4.south)--(A-5-3.north);
    \draw (A-4-5.south)--(A-5-3.north);
\end{tikzpicture}

Here is another diagram that I made, but this time using the TIKZ automata library instead:

Here is the code:

\documentclass{article} 
\usepackage{caption}
\usepackage{subcaption}
\usepackage{tikz}  
\usetikzlibrary{automata,arrows,positioning,calc}

\begin{document}


\begin{figure}\centering
\resizebox{\columnwidth}{!}{
       \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 3cm,state/.style={circle, draw, minimum size=1.8cm}]

%%% LEVEL 1
        \node[state] (x) {$\emptyset$};
%%% LEVEL 2
        \node[state] [below left of=x] (B) {$\{ B \}$};
        \node[state][below right of=x]  (C) {$\{ C \}$};
        \node[state] [left  of=B] (A) {$\{ A \}$};
        \node[state][right of=C]  (D) {$\{ D \}$};
%%%% LEVEL 3
        \node[state][below of=B]  (AD) {$\{ A,D \}$};
        \node[state][below of=C]  (BC) {$\{ B,C \}$};
        \node[state][below of=D]  (BD) {$\{ B,D \}$};
        \node[state][below of=A]  (AC) {$\{ A,C \}$};
        \node[state]  [left  of=AC] (AB){$\{ A,B \}$};
        \node[state]  [right of=BD] (CD){$\{ C,D \}$};

%%  LEVEL   4
        \node[state][below of=AD]  (ABD) {$\{ A,B,D \}$};
        \node[state][below of=BC]  (ACD) {$\{ A,C,D \}$};
        \node[state][left of=ABD]  (ABC) {$\{ A,B,C \}$};
        \node[state][right of=ACD]  (BCD) {$\{ B,C,D \}$};

%%%% LEVEL 5
\node[state][below right of=ABD]  (ABCD) {$\{ A,B,C,D \}$};


%% LEVEL 1 to LEVEL 2
		\path[->] (x)  edge node {} (A);
		\path[->] (x)  edge node {} (B);
		\path[->] (x)  edge node {} (C);
		\path[->] (x)  edge node {} (D);

%% LEVEL 2 to LEVEL 3
		\path[->] (A)  edge node {} (AB);
		\path[->] (A)  edge node {} (AC);
		\path[->] (A)  edge node {} (AD);
		\path[->] (B)  edge node {} (AB);
		\path[->] (B)  edge node {} (BC);
		\path[->] (B)  edge node {} (BD);
		\path[->] (C)  edge node {} (AC);%
		\path[->] (C)  edge node {} (BC);
		\path[->] (C)  edge node {} (CD);
		\path[->] (D)  edge node {} (AD);
		\path[->] (D)  edge node {} (BD);
		\path[->] (D)  edge node {} (CD);
%%% LEVEL 3 TO 4

		\path[->] (AB)  edge node {} (ABC);
		\path[->] (AB)  edge node {} (ABD);
		\path[->] (AC)  edge node {} (ABC);
		\path[->] (AC)  edge node {} (ACD);
		\path[->] (AD)  edge node {} (ABD);
		\path[->] (AD)  edge node {} (ACD);
		\path[->] (BC)  edge node {} (ABC);
		\path[->] (BC)  edge node {} (BCD);
		\path[->] (BD)  edge node {} (ABD);
		\path[->] (BD)  edge node {} (BCD);
		\path[->] (CD)  edge node {} (ACD);
		\path[->] (CD)  edge node {} (BCD);
%%%% LEVEL 4 to 5
		\path[->] (ABC)  edge node {} (ABCD);
		\path[->] (ABD)  edge node {} (ABCD);
		\path[->] (ACD)  edge node {} (ABCD);
		\path[->] (BCD)  edge node {} (ABCD);
%%%% Dashed lines
\draw[dashed] (-10,-1) -- (9,-1); 
\draw[dashed] (-10,-4) -- (9,-4); 
\draw[dashed] (-10,-6) -- (9,-6); 
\draw[dashed] (-10,-9) -- (9,-9); 
\node[] at (-10,-1.5) {level 1};
\node[] at (-10,-4.5) {level 2};
\node[] at (-10,-6.5) {level 3};
\node[] at (-10,-9.5) {level 4};
      \end{tikzpicture}
}
\caption{The caption}
\end{figure}

\end{document}

That is all for today. Hope this will be useful.

Note that I have also written a blog post about how to draw the powerset of a set using Java and GraphViz instead of Latex.
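Whether you target TIKZ or GraphViz, the nodes and edges of such a Hasse diagram can first be generated programmatically and then printed as \draw or GraphViz statements. A small Python sketch of the idea:

```python
from itertools import combinations

def powerset_levels(items):
    """Subsets of items grouped by size (the levels of the Hasse diagram)."""
    return [list(combinations(items, k)) for k in range(len(items) + 1)]

def hasse_edges(items):
    """Edges of the Hasse diagram: each subset is linked to the
    supersets having exactly one additional item."""
    subsets = [frozenset(c) for level in powerset_levels(items) for c in level]
    return [(a, b) for a in subsets for b in subsets
            if a < b and len(b - a) == 1]

print(len(hasse_edges("ab")))  # → 4 edges, as in the first diagram above
```

For {a,b}, this produces the four edges ∅–{a}, ∅–{b}, {a}–{a,b} and {b}–{a,b}, matching the first Hasse diagram shown in this post.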


A New Tool for Running Performance Comparison of Algorithms in SPMF 2.54

Today, I am happy to announce a cool new feature in SPMF 2.54: a tool to automatically run performance experiments that compare several algorithms while a parameter is varied. This is useful for comparing algorithms when writing a research paper. The new tool lets you choose one or more algorithms, indicate default parameter values, and specify which parameter must be varied. A time-out can be set to avoid running algorithms for too long. Besides, each algorithm execution is done in a separate Java virtual machine to ensure that the comparison is always fair (e.g., memory does not accumulate from previous algorithm executions). The results can also be easily exported to Excel and Latex to generate charts for research papers. This tool can save a lot of time when performing experimental comparisons of algorithms!

Briefly the main way to use the new tool is to run the graphical user interface of SPMF and then select the tool from the list of algorithms:

This will open up a new window, where we can configure the parameters for running the experiment:

For example, without going into details, here we choose five algorithms called Apriori, Eclat, FPGrowth_itemsets, FPClose and FPMax. We also select an input file called Chess.text on which the algorithms will be run, and a directory called EXPERIMENTS where all the generated results will be saved to files. We indicate that the algorithms have one parameter that will be varied (marked with the special code ##), and that this parameter will be varied from 0.95 to 0.80. We set a time limit of 10 seconds for each execution, and we select the options to compare not only time and memory usage but also the number of patterns (lines in the output), and to save all results as Latex figures as well.
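Conceptually, varying a parameter boils down to substituting the ## marker with each concrete value and launching one JVM per run. The following Python sketch only illustrates this idea; the command template and file names below are made up for the example and are not SPMF internals:

```python
def build_runs(template, values):
    """One concrete command per value, replacing the '##' marker."""
    return [[tok.replace("##", str(v)) for tok in template] for v in values]

# Hypothetical template mirroring the experiment described above
template = ["java", "-jar", "spmf.jar", "run", "Apriori",
            "Chess.text", "out_##.txt", "##"]
values = [round(0.95 - 0.01 * i, 2) for i in range(16)]  # 0.95 down to 0.80
runs = build_runs(template, values)
# Each command could then be launched in its own JVM with a time-out, e.g.:
# subprocess.run(cmd, timeout=10)
print(runs[0][-1], runs[-1][-1])  # → 0.95 0.8
```

Running each command as a fresh process is what guarantees that one algorithm's memory usage cannot pollute the measurements of the next.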

Then we click “Run the experiments” to run the experiments. We will get summarized results as shown below for execution time, memory and number of patterns:

TIME (S)
parameter 0,95 0,94 0,93 0,92 0,91 0,9 0,89 0,88 0,87 0,86 0,85 0,84 0,83 0,82 0,81 0,8
Apriori 0,39 0,4 0,23 0,48 0,52 0,59 0,67 0,75 0,93 1,25 1,52 1,94 2,33 2,48 2,87 3,43
Eclat 0,53 0,61 0,32 0,58 0,63 0,71 0,75 0,81 0,88 0,86 0,92 1,07 1,11 1,27 1,41 1,47
FPGrowth_itemsets 0,77 0,59 0,39 0,55 0,57 0,53 0,6 0,65 0,63 0,54 0,55 0,62 0,6 0,6 0,62 0,69
FPClose 0,61 0,57 0,41 0,67 0,63 0,66 0,65 0,54 0,62 0,59 0,72 0,64 0,63 0,72 0,69 0,73
FPMax 0,67 0,51 0,33 0,5 0,59 0,56 0,58 0,59 0,56 0,58 0,59 0,59 0,55 0,72 0,68 0,58

MEMORY (MB)
parameter 0,95 0,94 0,93 0,92 0,91 0,9 0,89 0,88 0,87 0,86 0,85 0,84 0,83 0,82 0,81 0,8
Apriori 18,88 18,88 18,88 18,88 18,88 19,32 19,76 19,76 19,76 20 20,44 21,32 21,32 22 3,34 3,36
Eclat 19,92 27,93 27,93 26,76 11,96 38,9 72,43 110,26 154,03 94,25 141,23 91,27 176,8 71,23 15,29 172,98
FPGrowth_itemsets 6,75 7,43 7,43 7,43 7,44 7,44 8,11 8,1 8,78 9,44 10,1 10,77 11,44 12,78 14,12 15,45
FPClose 6,76 7,43 7,43 7,43 7,43 7,43 8,1 8,77 8,76 9,43 10,1 11,43 12,11 12,1 13,45 15,45
FPMax 6,76 7,44 7,44 7,42 7,43 7,43 7,44 7,43 8,1 8,1 8,77 9,44 9,43 9,44 10,11 10,78

OUTPUT_SIZE (LINES)
parameter 0,95 0,94 0,93 0,92 0,91 0,9 0,89 0,88 0,87 0,86 0,85 0,84 0,83 0,82 0,81 0,8
Apriori 77 139 -1 305 421 622 883 1195 1553 1987 2669 3484 4243 5312 6656 8227
Eclat 77 139 -1 305 421 622 883 1195 1553 1987 2669 3484 4243 5312 6656 8227
FPGrowth_itemsets 77 139 -1 305 421 622 883 1195 1553 1987 2669 3484 4243 5312 6656 8227
FPClose 73 124 -1 269 362 498 689 922 1183 1457 1885 2400 2883 3487 4216 5083
FPMax 11 18 -1 26 30 34 51 75 69 89 119 145 133 163 221 226

This was generated in just a few seconds. If we had run these experiments by hand, it would have taken a lot more time. Now that we have these results, since they are tab-separated, we can directly import them into Excel to generate charts:

Besides, since we selected the option to generate PGFPlots figures for Latex, SPMF will also have generated Latex documents as output that we can directly compile and use in Latex documents:

As you can see, these figures are quite beautiful.

This is just a brief overview of this new feature to do performance experiments. There are more explanations about how it works in the documentation of SPMF, here:
Example: Run Experiments To Compare The Performance of One or More Algorithms on a Dataset (One Parameter Is Varied)  (SPMF – Java)

I will continue improving this experiment tool in the next few weeks. In particular, I am now working on an option to generate scalability experiments as well, which should be released soon. I will also modify the tool to make it easier to compare algorithms from SPMF with algorithms that are not in SPMF.


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.
