The Data Blog

An EXE version of SPMF for Windows

Posted on 2026-04-20 by Philippe Fournier-Viger

Today, I want to announce that I have added a compiled EXE version of SPMF.jar for 64-bit Windows to the Download page of SPMF. It is especially useful if you cannot or do not want to install Java on a computer.

This portable EXE version of SPMF is larger (55 MB instead of around 11 MB) because it includes the Java runtime environment.

You can download it from the download page on the website of SPMF.

Later, I might also include a compiled version for Linux and other platforms, if some people request it.

—
Philippe Fournier-Viger is a computer science professor and founder of the SPMF open-source data mining library, which offers more than 300 algorithms for analyzing data, implemented in Java.

Posted in Java, open-source, spmf | Leave a comment

CFP: Special session at SOMET 2026

Posted on 2026-02-25 by Philippe Fournier-Viger

In this post, I want to talk about a special session on Knowledge Science and Intelligent Computing (KSIC) that I am co-organizing this year at the SOMET 2026 conference (25th Int. Conf. on Intelligent Software Methodologies, Tools, and Techniques). I would like to invite you to submit your research papers!

The conference proceedings will be published by IOS Press and indexed in SCOPUS. The important dates are:

Deadline: April 14, 2026 (updated)
Notification to authors: May 10, 2026
Camera-ready papers: June 15, 2026

Relevant topics include, but are not limited to, the following:

Knowledge Reasoning and Representation
Knowledge-based software engineering.
Knowledge Representation and Reasoning
Knowledge engineering application.
Ontological engineering.
Symbolic reasoning in Large Language Models
Reality automated generation
Cognitive foundations of knowledge
Intelligent systems.
Intelligent Information Systems.
Robotics and Cybernetics.
Distributed and Parallel Processing.
Aspects of Data Mining.
Bio-informatics.
Knowledge extraction from text, video, signals and images.
Search and Mining of variety of data including scientific and engineering, social, sensor/IoT/IoE.
Intelligent Computational Modeling.
Mobility and Big Data.

Session Organizers
▪ Nhon V. Do, Hong Bang International University, Vietnam.
▪ Philippe Fournier-Viger, Shenzhen University, China.
▪ Hien D. Nguyen, University of Information Technology, VNU-HCM, Vietnam.

For more information about the special session and conference, click here.

Posted in cfp, Conference | Leave a comment

SPMF 2.65 is released!

Posted on 2026-02-19 by Philippe Fournier-Viger

Today, I want to announce that a new version of the SPMF data mining library and software has been released: version 2.65. This version brings several improvements, including eight new algorithms, several optimizations, new user interface tools for data analysis, and some tools for data processing. The details of this new version can be found on the download page of SPMF. Here is a brief overview.

Eight new algorithms:

The LinearTable algorithm for mining frequent itemsets, which can work especially well when the number of items is relatively small. This algorithm has very low memory usage in some cases (Lu et al., 2023)
The SAM algorithm for mining frequent itemsets (Borgelt et al., 2009)
The TM algorithm for mining frequent itemsets (Song et al., 2006)
The NEWCHARM algorithm for mining frequent closed itemsets (Ye et al., 2015)
The DBVMiner algorithm for mining frequent closed itemsets (Vo et al., 2012)
The FTARM algorithm for top-k association rule mining, which is a variation of ETARM with additional strategies (Liu et al., 2019)
The ETARM algorithm for top-k association rule mining, which is a variation of TopKRules with additional pruning strategies (Nguyen et al., 2017)
The AprioriTID_HD algorithm, a modification of AprioriTID for better performance (thanks to Harshil Damania for proposing this improvement)

Performance improvement

I have added several optimizations to improve the performance of algorithms such as Apriori, AprioriClose, Eclat, Relim, AprioriInverse, AprioriRare, AprioriTopK, dEclat, Charm, dCharm, TopKRules, and TopKClassRules. In some cases, the algorithms run several times faster and use considerably less memory.

New user interface tools

One new user interface tool is the Itemset-Item Matrix Viewer, which visualizes the relationships and similarities between itemsets discovered by itemset mining algorithms. Here is a screenshot:

There is also a new Item Co-Occurrence HeatMap Viewer to visualize co-occurrences between items in transaction databases. For example, here is a visualization of the co-occurrences of the 20 most frequent items in the Chess dataset:
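To give an idea of the numbers that such a heat map displays, here is a minimal Python sketch of counting co-occurrences between the most frequent items of a transaction database. This is only an illustration under my own assumptions, not the code of the viewer in SPMF: the function name, the toy data and the choice to count each pair once per transaction are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(transactions, top_n=20):
    """Count how often each pair of the top_n most frequent items appears together."""
    # Support of each item: the number of transactions that contain it.
    item_support = Counter(item for t in transactions for item in set(t))
    top_items = {item for item, _ in item_support.most_common(top_n)}

    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & top_items)
        # Each unordered pair is counted once per transaction.
        pair_counts.update(combinations(kept, 2))
    return item_support, pair_counts


# Toy transaction database (hypothetical data, not the Chess dataset).
transactions = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 3, 4]]
item_support, pair_counts = cooccurrence_counts(transactions, top_n=3)
```

Each entry of pair_counts corresponds to one cell of a heat map: the larger the count, the more often the two items appear in the same transaction.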

I have also added panels in the dataset viewers to provide interesting statistics about datasets. For example, for the Transaction dataset viewer:

Bug fixes

I have also fixed various small bugs.

Conclusion

This is just a quick overview of this new version of the SPMF pattern mining software, version 2.65. Thanks again to all users of SPMF and contributors for your support!

Posted in open-source, Pattern Mining | Tagged association rule, data mining, data science, itemset mining, java, library, pattern mining, software, spmf | Leave a comment

Merry X-Mas and Happy New Year to SPMF users!

Posted on 2025-12-24 by Philippe Fournier-Viger

Just a short blog post today to wish happy holidays to all SPMF users, and a Merry X-Mas to those who celebrate it! Thanks again for your support!

Posted in Other | Leave a comment

The 1st HP4MoDa workshop was held at BIBM 2025

Posted on 2025-12-15 by Philippe Fournier-Viger

Today, the 1st Workshop on Heuristic and Pattern Mining for Multi-Omics Data Analytics (HP4MoDa) was held online at IEEE BIBM 2025. I co-organized this workshop with M. Saqib Nawaz and other collaborators. The workshop focuses on various machine learning and pattern mining methods and their applications to the analysis of multi-omics data.

Here are the slides from the opening ceremony:

It was announced that 8 papers have been accepted by the workshop this year. They cover multiple topics such as interpretable deep learning for regulatory sequence analysis, graph neural networks for single-cell and spatial omics data, heuristic and bio-inspired optimization for protein fitness landscapes, minimum description length and evolutionary approaches for protein compression, advanced sequential and temporal pattern mining methods, and multi-source data fusion models for predicting complex genetic traits.

We also mentioned that we are currently working to organize a special issue for extensions of the papers (to be confirmed later).

We also announced that the best paper award of the workshop was given to this paper by researchers from Canada:

Mahshad Hashemi, Sharjeel Mustafa, Alioune Ngom, and Luis Rueda, HeteroGraphNet: A Ligand–Receptor Informed, Heterophily-Adapted Graph Neural Network for Cell Type Prediction in scRNA-seq Data

There were several interesting presentations on diverse topics. Here, for example, is a screenshot from the first paper presentation by Lin, Yuexi et al. about Decoding Translation-Related Functional Sequences in 5’ UTRs Using Interpretable Deep Learning Models:

Among the papers, my PhD student presented a new algorithm called GMP for protein sequence compression based on pattern mining. Here are a few slides to show an overview:

This is just a short report about the workshop. This first edition was a success, so we plan to organize it again next year!

Posted in Bioinformatics, Conference | Tagged bibm, bioinformatics, hp4moda, ieee bibm, machine learning, pattern, pattern mining, workshop | Leave a comment

A prototype of an improved GUI for the SPMF pattern mining software

Posted on 2025-12-06 by Philippe Fournier-Viger

Recently, I have been working on improving the SPMF data mining software. One good thing about SPMF is that it has a simple user interface. But as SPMF has grown to include more and more algorithms, the list of algorithms in the software has become very long and is no longer easy to browse. Thus, I have started to think about upgrading the user interface to make it more user-friendly. Here is a prototype of a new welcome window for SPMF that I am working on:

This window provides access to all the main features of SPMF through a centralized screen. Thus, the user can clearly focus on the different tasks, such as generating data, choosing a data mining or pattern mining algorithm, or viewing and transforming data. For example, when the user clicks on “View and transform data”, only the algorithms and tools for viewing and transforming data are shown.

I think that this type of interface can be an improvement over the existing user interface. However, for now, this is only a prototype, and I am still putting it all together and testing it. I will not release a new interface for SPMF until I am sure that everything works well and that the result is good. I might also leave the option of choosing between the traditional user interface and the new one.

If you have any ideas or suggestions to make this better, please leave me a comment below or email me! If work on this user interface goes well, it could be released early next year.

Again, thanks to all users of the SPMF pattern mining library for your support!

Posted in Pattern Mining, spmf | Tagged open source, pattern mining, pattern mining software, spmf, user interface | Leave a comment

Upcoming in SPMF 2.64b : The “Pattern Diff Analyzer”

Posted on 2025-12-05 by Philippe Fournier-Viger

Today, I will talk about an upcoming feature of version 2.64b of the SPMF pattern mining software, which I think will be very useful to many people. It is a new tool called the Pattern Diff Analyzer, which calculates the contrast between two files containing patterns.

For example, let's say that you extract sequential patterns from two text documents. You can now use this tool to compare the patterns found in the two documents and discover the patterns that distinguish each document. As another example, you may extract patterns from the genome sequences of two viruses and want to find the patterns that differ between the two sequences.

The new Pattern Diff Analyzer tool is very simple to use and looks like this:

In this screen, we can select two files containing patterns found by a pattern mining algorithm. For example, I will use two files called patternsA.txt and patternsB.txt.
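To make the input more concrete, here is a minimal Python sketch of reading such pattern files. It assumes the usual text output of SPMF algorithms, where each line contains a pattern followed by one or more measures such as "#SUP: 40". Whether the Pattern Diff Analyzer accepts other layouts is not covered here, and the function names are hypothetical.

```python
def parse_pattern_line(line):
    """Split a line such as '1 2 3 #SUP: 5' into (pattern, {measure: value})."""
    pattern, *fields = line.strip().split("#")
    measures = {}
    for field in fields:
        # Each field looks like 'SUP: 5'; the measure name comes before the colon.
        name, _, value = field.partition(":")
        measures[name.strip()] = float(value)
    return pattern.strip(), measures


def parse_patterns(lines, measure="SUP"):
    """Map each pattern to its value for the chosen measure, skipping blank lines."""
    result = {}
    for line in lines:
        if line.strip():
            pattern, measures = parse_pattern_line(line)
            if measure in measures:
                result[pattern] = measures[measure]
    return result


# Hypothetical content of a pattern file, one pattern per line.
sample = ["1 2 #SUP: 40", "1 3 #SUP: 12", "", "2 ==> 3 #SUP: 7 #CONF: 0.5"]
supports = parse_patterns(sample)
confidences = parse_patterns(sample, measure="CONF")
```

With real files, the list of lines could simply come from open("patternsA.txt"), since iterating over a file yields its lines.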

After that, we can go to the second tab, called “Compute contrast”, to find the differences in patterns between these two files. In the picture below, I chose the “SUP” measure (support) for calculating the difference, and “Absolute difference” with a threshold of 10. This means that I want to find all the patterns whose support differs by more than 10 between the two files. The result is 20 patterns:

I can also choose other contrast methods such as “Exclusive in file 1”, which selects all patterns that appear in the first file but not in the second file:

Or similarly, I can choose “Exclusive in file 2”:

There are also other contrast methods available such as the ratio of a pattern’s measure value for file A to that in file B. For example, here I select patterns where the ratio of A to B is at least 1.2:

After discovering the contrast patterns, we can also export them to a text file to save the results!
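The contrast methods described above can be sketched in a few lines of Python. This is only my own illustration of the ideas, not the code of the tool: the function names and toy values are hypothetical, and treating a pattern that is missing from one file as having a value of 0 in the absolute difference is an assumption.

```python
def absolute_difference(a, b, threshold):
    """Patterns whose measure differs by more than threshold between the two files.

    Assumption: a pattern missing from one file counts as having the value 0.
    """
    result = {}
    for pattern in a.keys() | b.keys():
        diff = abs(a.get(pattern, 0) - b.get(pattern, 0))
        if diff > threshold:
            result[pattern] = diff
    return result


def exclusive(a, b):
    """Patterns that appear in the first file but not in the second one."""
    return {pattern: value for pattern, value in a.items() if pattern not in b}


def ratio_at_least(a, b, min_ratio):
    """Patterns found in both files whose ratio a/b is at least min_ratio."""
    return {
        pattern: a[pattern] / b[pattern]
        for pattern in a
        if pattern in b and b[pattern] > 0 and a[pattern] / b[pattern] >= min_ratio
    }


# Toy support values for two pattern files (hypothetical data).
file_a = {"1 2": 40, "1 3": 11, "2 3": 30}
file_b = {"1 2": 25, "1 3": 10, "4": 8}

diff_patterns = absolute_difference(file_a, file_b, threshold=10)
only_in_a = exclusive(file_a, file_b)   # "Exclusive in file 1"
only_in_b = exclusive(file_b, file_a)   # "Exclusive in file 2"
ratio_patterns = ratio_at_least(file_a, file_b, min_ratio=1.2)
```

In this toy example, the pattern "1 2" is selected both by the absolute difference (40 versus 25) and by the ratio (1.6), while "2 3" and "4" are exclusive to the first and second file respectively.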

I think this tool will be very useful for classification problems where we want to compare patterns from different classes. Related to classification, note that in SPMF, we also have multiple algorithms for classification using association rules. But this is a different approach.

So today, I just wanted to show you a preview of this upcoming tool in SPMF. I will continue testing and may make some changes before the final release. Also, I will provide an algorithm that can be called from the command line to do the same thing as the Pattern Diff Analyzer tool, so that it can be used without the graphical user interface as well.

Posted in Data Mining, Pattern Mining, spmf | Tagged contrast pattern, emerging pattern, itemset, pattern, pattern diff, pattern mining library, pattern mining software, sequential pattern, spmf | Leave a comment

Fixing the reviewresponse.cls LaTeX Class to Allow Multi-Page Comments

Posted on 2025-10-22 by Philippe Fournier-Viger

Today, I will show how to fix the LaTeX class reviewresponse.cls to allow multi-page comments.

If you have ever written a detailed response to reviewers in LaTeX, you may have noticed that long reviewer comments sometimes get cut off instead of continuing on the next page. This happens because the comments are enclosed in non-breakable tcolorbox environments.

The Problem

In the original version of reviewresponse.cls, the environments for reviewer comments look something like this:

\newenvironment{generalcomment}{%
  \begin{tcolorbox}[attach title to upper,
    title={General Comments},
    after title={.\enskip},
    fonttitle={\bfseries},
    coltitle={colorcommentfg},
    colback={colorcommentbg},
    colframe={colorcommentframe},
  ]
}{\end{tcolorbox}}

\newenvironment{revcomment}[1][]{\refstepcounter{revcomment}
  \begin{tcolorbox}[adjusted title={Comment \arabic{revcomment}},
    fonttitle={\bfseries},
    colback={colorcommentbg},
    colframe={colorcommentframe},
    coltitle={colorcommentbg},
    #1
  ]
}{\end{tcolorbox}}

\newenvironment{changes}{\begin{tcolorbox}[colback={colorchangebg},
  colframe={colorchangeframe},enhanced jigsaw,]
}{\end{tcolorbox}}

These definitions produce nice colored boxes, but the problem is that tcolorbox by default does not break across pages. When your reviewer writes a long paragraph, LaTeX tries to keep the entire box on one page, which can result in missing text or strange layout issues.

The Solution

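The fix is simple: make the boxes breakable and use the enhanced skin. The tcolorbox package provides two key options for this:

breakable: allows the content to flow onto the next page. This option comes from the breakable library of tcolorbox, so if the class does not already load it, add \tcbuselibrary{breakable} (or load tcolorbox with the most option).
enhanced jigsaw: ensures compatibility with decorations, titles, and other layout features when breaking boxes. This option comes from the skins library. The original changes environment already used it, so only breakable is new there.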

Here is the fixed version of the environments:

\newenvironment{generalcomment}{%
  \begin{tcolorbox}[
    enhanced jigsaw,
    breakable,
    attach title to upper,
    title={General Comments},
    after title={.\enskip},
    fonttitle={\bfseries},
    coltitle={colorcommentfg},
    colback={colorcommentbg},
    colframe={colorcommentframe},
  ]
}{\end{tcolorbox}}

\newenvironment{revcomment}[1][]{%
  \refstepcounter{revcomment}
  \begin{tcolorbox}[
    enhanced jigsaw,
    breakable,
    adjusted title={Comment \arabic{revcomment}},
    fonttitle={\bfseries},
    colback={colorcommentbg},
    colframe={colorcommentframe},
    coltitle={colorcommentbg},
    #1
  ]
}{\end{tcolorbox}}

\newenvironment{revresponse}[1][{}]{%
  \textbf{Response:} #1\par
}{\vspace{4em plus 0.2em minus 1.5em}}

\newenvironment{changes}{%
  \begin{tcolorbox}[
    enhanced jigsaw,
    breakable,
    colback={colorchangebg},
    colframe={colorchangeframe},
  ]
}{\end{tcolorbox}}

Result

After this modification, your reviewer comments and “changes” boxes will automatically continue onto the next page, no matter how long they are. You can now safely include large comments or detailed explanations without worrying about text being cut off.

Conclusion

By simply adding enhanced jigsaw and breakable to the tcolorbox environments, you make your LaTeX review responses much more robust. This small fix prevents truncated comments and keeps your document professional and reviewer-friendly.

Posted in Latex | Tagged fix, latex, response, review, reviewresponse.cls | Leave a comment

How to fix reviewresponse.cls for custom reviewer numbering

Posted on 2025-09-30 by Philippe Fournier-Viger

Recently, I found a nice LaTeX class that can be used to write answers to reviewers for the rebuttal of journal papers. This class is called reviewresponse.cls and can be found on GitHub. It lets you write an answer to reviewers with comments such as:

....

\reviewer

\begin{revcomment}
Figure 4 - please include legend to the right or below the main figure as in panel b legend overlaps with line of plot making confusion i interpretation. gentle grey grid in backround will also be valuable for plot investigation.
\end{revcomment}
\begin{revresponse}
    [your answer]
\end{revresponse}
\begin{changes}
    some changes you made
\end{changes}

\begin{revcomment}
    No avaliable implementation.
\end{revcomment}
\begin{revresponse}
     [your answer]
\end{revresponse}
\begin{changes}
    some changes you made
\end{changes}

which will then generate something beautiful like:

However, I have found a problem with this class: the reviewers are automatically numbered sequentially as Reviewer 1, 2, 3, 4, 5, and so on. But in several cases, the reviewers are not numbered sequentially and some numbers may be skipped.

To fix this issue, the solution is to replace the definition of the \reviewer command in reviewresponse.cls with the following:

\newcommand*{\reviewer}[1][]{%
  \clearpage
  % If no optional argument, step the counter as before.
  \if\relax\detokenize{#1}\relax
    \refstepcounter{reviewer}%
  \else
    % If an argument was given, set reviewer to N-1 then refstep to N.
    % Using \numexpr avoids the off-by-one problem while keeping refstepcounter
    % (so labels/anchors behave correctly).
    \setcounter{reviewer}{\numexpr#1-1\relax}%
    \refstepcounter{reviewer}%
  \fi
  \@ifundefined{pdfbookmark}{}{%
    \pdfbookmark[1]{Reviewer \arabic{reviewer}}{hyperref@reviewer\arabic{reviewer}}%
  }%
  \section*{Authors' Response to Reviewer~\arabic{reviewer}}
}

After making this modification, the \reviewer command can be used in your LaTeX document with an optional argument to specify the reviewer number that you want, like this: \reviewer[5]. Without the argument, it keeps numbering from the previous reviewer as before. The result then looks like this:

And now the problem is fixed.

That is all for today. I just wanted to share this solution in case someone has the same problem with reviewresponse.cls.

Posted in Latex | Leave a comment

The Conference Hotel Booking Scam

Posted on 2025-08-31 by Philippe Fournier-Viger

Something interesting happened to me in the last few days. As far as I can tell, it is a scam, and a relatively new one, so I want to share the information.

Here is the context. I will be a keynote speaker at a conference in Asia in a few months, and out of the blue, a company that appeared to be based in the Netherlands contacted me a few days ago by email offering to arrange my hotel accommodation. At first, the email from “ExploreEra Reservations” (reservation.nl@exploreera.info) looked very professional. They mentioned the conference location and month, and politely asked for my exact arrival and departure dates to reserve my hotel room. Their email was worded in the kind of tone you might expect from a real conference travel desk. Here is a screenshot:

But there were already some red flags in this e-mail, such as a requirement to cancel the reservation 30 days in advance, which is highly unusual. In general, most hotels allow a reservation to be cancelled without fees until about 24 hours before arrival. But I still responded with basic details about my dates to see what they would say. In the follow-up email, there were more serious red flags. Here is a screenshot:

At about the same time, in a separate e-mail, they sent me a PandaDoc form for a hotel booking with a proposed rate of about €200 per night, which also asked for personal information and a signature. There was a weird disclaimer in small print indicating that they are not affiliated with the conference (very suspicious!), and there were HUGE cancellation fees:

Thus, I decided to investigate this. I googled the proposed hotel name and found that its real rate is more like 20-50 euros per night on Booking.com, not 199 euros.

Then, I googled their organization (ExploreEra.info) and quickly discovered that at least two conferences have issued very serious warnings about emails from this domain approaching their attendees to book hotels on their behalf without authorization.

For example, the World Psychiatric Association (WPA) posted an alert noting that emails from ExploreEra.info have been contacting their delegates, pretending to arrange accommodation on behalf of the conference. Here is a screenshot of this warning:

Another event also issued a similar warning:

So, is this a scam? Well, in the emails I have received, they never directly said that they work for the conference, but the emails are worded in a way that gives this impression. And based on the above warnings from other conferences, the apparently inflated price, and the 30-day cancellation policy, it does indeed seem to be a scam. Thus, be warned!

By the way, there are several messages on Twitter warning about similar schemes, although I don't know if they are from the same people:

Posted in Academia | Tagged conference, exploreera, fraud, hotel booking, scam | Leave a comment
