The 1st HP4MoDa workshop was held at BIBM 2025

Today, the 1st Workshop on Heuristic and Pattern Mining for Multi-Omics Data Analytics (HP4MoDa) was held online at IEEE BIBM 2025. I co-organized this workshop with M. Saqib Nawaz and other collaborators. The workshop focuses on various machine learning and pattern mining methods and their applications to the analysis of multi-omics data.

Here are the slides from the opening ceremony:

It was announced that 8 papers have been accepted by the workshop this year. They cover multiple topics such as interpretable deep learning for regulatory sequence analysis, graph neural networks for single-cell and spatial omics data, heuristic and bio-inspired optimization for protein fitness landscapes, minimum description length and evolutionary approaches for protein compression, advanced sequential and temporal pattern mining methods, and multi-source data fusion models for predicting complex genetic traits.

We also mentioned that we are currently working to organize a special issue for extensions of the papers (to be confirmed later).

We also announced that the best paper award of the workshop was given to this paper by researchers from Canada:

Mahshad Hashemi, Sharjeel Mustafa, Alioune Ngom, and Luis Rueda, HeteroGraphNet: A Ligand–Receptor Informed, Heterophily-Adapted Graph Neural Network for Cell Type Prediction in scRNA-seq Data

There were several interesting presentations on diverse topics. Here, for example, is a screenshot from the first paper presentation, by Lin, Yuexi et al., about Decoding Translation-Related Functional Sequences in 5’ UTRs Using Interpretable Deep Learning Models:

Among the papers, my PhD student presented a new algorithm called GMP for protein sequence compression based on pattern mining. Here are a few slides to show an overview:


This is just a short report about the workshop. This first edition was a success, so we plan to organize it again next year!

Posted in Bioinformatics, Conference

Another release of SPMF: v2.64b

In November, I released version 2.64 of SPMF with 5 new algorithms and 2 new tools, among other improvements, but I still had several other pattern mining algorithms waiting to be released after more tests.

Thus, today, I am making a new release of SPMF for December 2025, called version 2.64b. This release includes several new algorithms:

  • the HMP-SA algorithm for discovering compressing itemsets in a transaction database using a simulated annealing approach (Chen, E. et al., 2026)
  • the HMP-HC algorithm for discovering compressing itemsets in a transaction database using a hill-climbing approach (Chen, E. et al., 2026)
  • the GENMAX algorithm for mining maximal frequent itemsets from a transaction database (Gouda et al., 2005)
  • the DIC algorithm for mining frequent itemsets in a transaction database using dynamic itemset counting (Brin et al., 1997)
  • the Talky-G and Talky-G-Diffset algorithms for mining frequent generator itemsets in a transaction database (Szathmary, 2009)
  • the iMEFIM algorithm for high utility itemset mining, a variation of EFIM that can be faster but can consume more memory (Nguyen et al., 2019)
  • the PUCPMiner algorithm for high utility itemset mining, a variation of FHM with a potentially tighter upper bound (Patel et al., 2022)

Moreover, there is a new Pattern Diff Analyzer tool that compares two pattern files to find contrast patterns. The tool works by letting the user first select two files containing patterns:

And then we can compute the differences between these files using different contrast methods. For example, we can identify all patterns that are in the first file but not in the second, among multiple other options for identifying contrast patterns:

That is all I wanted to share about this new version! Hope that you will enjoy this new release of the SPMF pattern mining software. So far we have released 18 new algorithms this year! And more features are coming soon…

Posted in Uncategorized

A prototype of an improved GUI for the SPMF pattern mining software

Recently, I have been working on improving the SPMF data mining software. Something good about SPMF is that it has a simple user interface. But as SPMF has evolved with more and more algorithms, the list of algorithms in the software has become very long, and it may not be so easy to browse through it. Thus, I have started to think about upgrading the user interface to make it more user-friendly. Here is a prototype of a new welcome window for SPMF that I am working on:

This window provides access to all the main features of SPMF through a centralized screen. Thus, the user can clearly focus on the different tasks, such as generating data, choosing a data mining or pattern mining algorithm, or viewing and transforming data. When the user clicks on “View and transform data”, for example, they will access only the algorithms and tools for viewing and transforming data.

I think that this type of interface can be an improvement over the existing user interface. However, for now, this is only a prototype, and I am still working on putting it all together and testing it. I will not release a new interface for SPMF until I am sure that everything works well. I might also leave the option of choosing between the traditional user interface and the new one.

If you have any ideas or suggestions to make this better, please leave me a comment below or email me! If the work on this user interface goes well, it could be released early next year.

Again, thanks to all users of the SPMF pattern mining library for your support!

Posted in Pattern Mining, spmf

Upcoming in SPMF 2.64b: The “Pattern Diff Analyzer”

Today, I will talk to you about an upcoming feature of the SPMF pattern mining software 2.64b, which I think will be very useful to many people. It is a new tool, called the Pattern Diff Analyzer, that computes the contrast between two files containing patterns.

For example, let's say that you extract sequential patterns from two text documents. You can now use this tool to find variations between the patterns found in the two documents and discover patterns that distinguish each document. Another example is extracting patterns from the genome sequences of two viruses and finding the patterns that differ between the two sequences.

The new Pattern Diff Analyzer tool is very simple to use and looks like this:

In this screen, we can select two files containing patterns found by a pattern mining algorithm. For example, I will use two files called patternsA.txt and patternsB.txt.

After that, we can go to the second tab called “Compute contrast” to find the differences in patterns between these two files. In the picture below, I choose the “SUP” measure (support) for calculating the difference, and I choose “Absolute difference” with the threshold of 10. This means that I want to find all the patterns where the difference in support is more than 10 between the two files. The result is 20 patterns:

I can also choose other contrast methods such as “Exclusive in file 1”, which selects all patterns that appear in the first file but not in the second file:

Or similarly, I can choose “Exclusive in file 2”:

There are also other contrast methods available such as the ratio of a pattern’s measure value for file A to that in file B. For example, here I select patterns where the ratio of A to B is at least 1.2:

After discovering the contrast patterns, we can also export them to a text file to save the results!
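To make the contrast methods described above more concrete, here is a minimal Python sketch of three of them. It is not the code of the Pattern Diff Analyzer. It assumes that each line of a pattern file looks like `1 2 3 #SUP: 42`, and that a pattern missing from one file counts as having a support of 0 for the absolute difference; the tool may handle these cases differently. All function names are hypothetical.

```python
import re

# Assumed line format: "<pattern> #SUP: <support>", e.g. "1 2 3 #SUP: 42"
LINE_RE = re.compile(r"^(?P<pattern>.*?)\s*#SUP:\s*(?P<sup>\d+)\s*$")

def parse_patterns(lines):
    """Map each pattern string to its support count."""
    supports = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # skip blank or unrecognized lines
            supports[match.group("pattern")] = int(match.group("sup"))
    return supports

def exclusive(first, second):
    """Patterns that appear in `first` but not in `second`."""
    return {p: s for p, s in first.items() if p not in second}

def absolute_difference(a, b, threshold):
    """Patterns whose support differs by more than `threshold`.
    Assumption: a pattern missing from one file has support 0 there."""
    result = {}
    for p in a.keys() | b.keys():
        diff = abs(a.get(p, 0) - b.get(p, 0))
        if diff > threshold:
            result[p] = diff
    return result

def ratio_at_least(a, b, min_ratio):
    """Patterns found in both files where support_A / support_B >= min_ratio."""
    return {p: a[p] / b[p] for p in a.keys() & b.keys()
            if b[p] > 0 and a[p] / b[p] >= min_ratio}

# Small made-up example standing in for patternsA.txt and patternsB.txt
file_a = ["1 2 #SUP: 30", "1 3 #SUP: 12", "2 #SUP: 50"]
file_b = ["1 2 #SUP: 15", "2 #SUP: 48", "4 #SUP: 9"]
a, b = parse_patterns(file_a), parse_patterns(file_b)

print(exclusive(a, b))                # "Exclusive in file 1"
print(exclusive(b, a))                # "Exclusive in file 2"
print(absolute_difference(a, b, 10))  # "Absolute difference" with threshold 10
print(ratio_at_least(a, b, 1.2))      # ratio of A to B at least 1.2
```

Storing each file as a dictionary from pattern to support keeps every contrast method down to a simple comparison of two lookups.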

I think this tool will be very useful for classification problems where we want to compare patterns from different classes. Related to classification, note that in SPMF, we also have multiple algorithms for classification using association rules. But this is a different approach.

So today, I just wanted to show you a preview of this upcoming tool in SPMF. I will continue testing and may make some changes before the final release. Also, I will provide an algorithm that can be called from the command line to do the same thing as the Pattern Diff Analyzer tool, so that it can be used without the graphical user interface as well.

Posted in Data Mining, Pattern Mining, spmf

A new version of SPMF (v2.64, November 2025)!

Today, I am happy to announce that a new version of the SPMF data mining library has been released.

It introduces five new algorithms:

  • the Carpenter algorithm (Pan et al., ICDM 2003), which is specialized for mining closed itemsets in databases where each transaction contains a large number of items and the number of transactions is relatively small. This is especially useful for biological data. The implementation is very efficient, with multiple optimizations such as transaction merging.
  • the Carpenter Max algorithm, which is a version of Carpenter for mining maximal itemsets by post-processing the output of Carpenter.
  • the HMG_GA algorithm for discovering compressing sequential patterns in sequences using a genetic algorithm (M. Z. Nawaz et al., 2025)
  • the HMG_SA algorithm for discovering compressing sequential patterns in sequences using simulated annealing
  • and the GRIMP algorithm for discovering compressing itemsets in a transaction database using a genetic algorithm (M. Z. Nawaz et al., 2025).
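The idea of obtaining maximal itemsets by post-processing closed itemsets can be illustrated with a short Python sketch. This is not the SPMF implementation of Carpenter Max, only a naive illustration of the principle: every maximal frequent itemset is also closed, so it is enough to keep the closed itemsets that have no proper superset among the closed itemsets. The function name and the sample data are made up.

```python
def maximal_itemsets(closed_itemsets):
    """Keep only the itemsets that have no proper superset in the collection."""
    sets = [frozenset(s) for s in closed_itemsets]
    # `s < other` is True when s is a proper subset of other
    return [s for s in sets if not any(s < other for other in sets)]

# Made-up closed itemsets, as a closed itemset miner could output them
closed = [{1, 2}, {1, 2, 3}, {2, 4}, {4}]

for itemset in maximal_itemsets(closed):
    print(sorted(itemset))
```

This version compares all pairs of itemsets, which is quadratic in the number of closed itemsets. That is fine for a small result set, but an efficient implementation would likely use an index or a smarter ordering.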

Besides, various other improvements have been made to the SPMF software, including a new tool in the GUI of SPMF called the Algorithm_Graph_Visualizer, which displays as a graph the algorithms that are similar in terms of input and output types or categories (I discussed it in a recent blog post):

Besides, I have improved various graphical interface components such as the MemoryViewer, HistogramViewer, GraphViewer and SPMF Text Editor by fixing a few bugs and adding new features. I have also added a new mode in the TransactionDatabaseViewer to view transactions either as lists or as columns.

If everything goes well, I might release more algorithms before the end of this year or early next year. I have several algorithms waiting to be integrated in SPMF!

To all users, thanks again for using the software! And to all contributors: thanks for your help!

If you have any algorithm that you would like to integrate in SPMF, feel free to contact me by e-mail.

Posted in spmf, Uncategorized

GMP: A new algorithm for compressing protein sequences

Today, I am pleased to share that our team has published a new algorithm called GMP for the compression and analysis of protein sequences. The paper will appear next month at BIBM 2025. Here is the full reference:

Nawaz, M. Z., Nawaz, S., Fournier-Viger, P., Niu, X., Li, M. (2025). A Multipurpose Protein Compressor based on MDL and Genetic Algorithm. Proceedings of BIBM 2025.

And you can watch the video of the presentation on YouTube:
Presentation (18 minutes)

Abstract

The rapid expansion of protein sequence databases has created challenges for efficient storage, transmission, and analysis. Unlike genomic sequences with only four nucleotide bases, proteins are composed of twenty amino acids, making compression more complex. Existing specialized protein compressors, such as AC, AC2, and CPM-FCM, have achieved promising performance but still face limitations, including high computational cost, low adaptability, and limited biological interpretability. This paper introduces GMP (Genetic algorithm-based MDL Protein compressor), a novel protein compression framework that leverages the Minimum Description Length (MDL) principle with a genetic algorithm to discover optimal patterns of amino acid subsequences (kAA-mers). Experimental results demonstrate that GMP attains compression performance comparable to state-of-the-art methods while additionally supporting tasks such as classification and clustering—capabilities absent from traditional protein compressors. This makes GMP not only an efficient compression framework but also a biologically interpretable tool for protein sequence analysis. GMP is available at github.com/MuhammadzohaibNawaz/GMP.

Index Terms—Protein sequences, Compression, Genetic Algorithm, Minimum Description Length, kAA-mers

In summary

GMP was designed not only to compress protein sequences but also to provide insights into their structure through the discovery of meaningful subsequence patterns. By integrating MDL with a genetic algorithm, it strikes an effective balance between compression quality and interpretability. One of the unique strengths of GMP is that it can simultaneously serve multiple purposes: compression, classification, clustering, and pattern discovery—functions rarely combined in a single framework. Here is the main flowchart from the paper:
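To give an intuition of how the MDL principle can score a set of kAA-mers, here is a toy Python sketch. It is not the encoding used in the GMP paper. It assumes a simplified two-part cost counted in symbols: the cost of a code table (each pattern plus one code) and the cost of the sequences once each pattern occurrence is replaced by a single code. The function names and the sequences are invented for illustration.

```python
def encode(sequence, patterns):
    """Greedily replace pattern occurrences (longest first) by one code symbol."""
    ordered = sorted(patterns, key=len, reverse=True)
    encoded, i = [], 0
    while i < len(sequence):
        for p in ordered:
            if sequence.startswith(p, i):
                encoded.append(p)        # one code standing for the whole pattern
                i += len(p)
                break
        else:
            encoded.append(sequence[i])  # no pattern matched: literal amino acid
            i += 1
    return encoded

def description_length(sequences, patterns):
    """Toy two-part MDL score: symbols in the code table + symbols in the data."""
    table_cost = sum(len(p) + 1 for p in patterns)  # pattern letters + its code
    data_cost = sum(len(encode(s, patterns)) for s in sequences)
    return table_cost + data_cost

# Made-up protein fragments
proteins = ["MKVLAAMKVL", "AAMKVLMKVL"]

print(description_length(proteins, []))        # no patterns: raw length
print(description_length(proteins, ["MKVL"]))  # one 4-mer in the code table
```

In this simplified view, a pattern set is good when it lowers the total score, and a search procedure such as a genetic algorithm would explore candidate pattern sets to look for a low score.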

We will release the paper soon after it is published next month at BIBM 2025.

Posted in Uncategorized

A new tool for visualizing algorithms from SPMF

Today, I wrote some code to visualize the relationships between the algorithms offered in the SPMF open-source pattern mining library. Here is the graph of all algorithms in SPMF (excluding tools) that have the same input and output type:

We can see a few big clusters such as high utility itemset mining algorithms:

Frequent itemset mining algorithms:

Closed itemset mining algorithms:

Episode rule mining algorithms:

Frequent sequential pattern mining algorithms:

Sequential rule mining algorithms:

Frequent episode mining algorithms:

If we instead connect two algorithms with an edge whenever they have the same input file type (but not necessarily the same output type), the graph is different:

Now, there is a huge cluster for itemset mining and association rule mining:

And there is a big cluster for sequence mining algorithms:

And a smaller cluster for episode mining:

And a big cluster for high utility pattern mining:

The tool that I used to draw these pictures is the AlgorithmGraphVisualizer, a new GUI tool that will be offered in the next version of SPMF. It visualizes the relationships between algorithms with different options and can export the graph to PNG. Here is the current interface:
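The clustering idea behind these graphs can be sketched in a few lines of Python. This is not the code of the AlgorithmGraphVisualizer. The small catalogue below is hypothetical and only pairs a few algorithm names with illustrative input and output types. Since "having the same type" is an equivalence relation, the algorithms linked by edges form groups that can be obtained by simply grouping on the type.

```python
from collections import defaultdict

# Hypothetical catalogue: algorithm name -> (input type, output type)
ALGORITHMS = {
    "Apriori": ("transaction database", "frequent itemsets"),
    "FPGrowth": ("transaction database", "frequent itemsets"),
    "Charm": ("transaction database", "closed itemsets"),
    "PrefixSpan": ("sequence database", "sequential patterns"),
    "SPAM": ("sequence database", "sequential patterns"),
}

def clusters(algorithms, same_output=True):
    """Group the algorithms that would be linked by an edge.

    With same_output=True, two algorithms are linked if they share both
    the input and the output type; otherwise the input type is enough.
    """
    groups = defaultdict(list)
    for name, (inp, out) in algorithms.items():
        key = (inp, out) if same_output else inp
        groups[key].append(name)
    return sorted(sorted(group) for group in groups.values())

print(clusters(ALGORITHMS))                     # same input and output type
print(clusters(ALGORITHMS, same_output=False))  # same input type only
```

Relaxing the condition to the input type alone merges groups, which matches the larger clusters observed in the second graph above.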

Hope that this has been interesting! The new version of SPMF should probably come out next week!

Posted in Uncategorized

Fixing the reviewresponse.cls LaTeX Class to Allow Multi-Page Comments

Today, I will show how to fix the LaTeX reviewresponse.cls class to allow multi-page comments.

If you have ever written a detailed response to reviewers in LaTeX, you may have noticed that long reviewer comments sometimes get cut off instead of continuing on the next page. This happens because the comments are enclosed in non-breakable tcolorbox environments.

The Problem

In the original version of reviewresponse.cls, the environments for reviewer comments look something like this:

\newenvironment{generalcomment}{%
  \begin{tcolorbox}[attach title to upper,
    title={General Comments},
    after title={.\enskip},
    fonttitle={\bfseries},
    coltitle={colorcommentfg},
    colback={colorcommentbg},
    colframe={colorcommentframe},
  ]
}{\end{tcolorbox}}

\newenvironment{revcomment}[1][]{\refstepcounter{revcomment}
  \begin{tcolorbox}[adjusted title={Comment \arabic{revcomment}},
    fonttitle={\bfseries},
    colback={colorcommentbg},
    colframe={colorcommentframe},
    coltitle={colorcommentbg},
    #1
  ]
}{\end{tcolorbox}}

\newenvironment{changes}{\begin{tcolorbox}[colback={colorchangebg},
  colframe={colorchangeframe},enhanced jigsaw,]
}{\end{tcolorbox}}

These definitions produce nice colored boxes, but the problem is that tcolorbox by default does not break across pages. When your reviewer writes a long paragraph, LaTeX tries to keep the entire box on one page, which can result in missing text or strange layout issues.

The Solution

The fix is simple: you need to make the boxes breakable and enhanced. The tcolorbox package provides two key options for this:

  • breakable — allows the content to flow onto the next page.
  • enhanced jigsaw — ensures compatibility with decorations, titles, and other layout features when breaking boxes.

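Note that these options come from the breakable and skins libraries of tcolorbox, so the class must load them if it does not already (for example with \tcbuselibrary{breakable,skins} or the most package option). Here is the fixed version of the environments: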

\newenvironment{generalcomment}{%
  \begin{tcolorbox}[
    enhanced jigsaw,
    breakable,
    attach title to upper,
    title={General Comments},
    after title={.\enskip},
    fonttitle={\bfseries},
    coltitle={colorcommentfg},
    colback={colorcommentbg},
    colframe={colorcommentframe},
  ]
}{\end{tcolorbox}}

\newenvironment{revcomment}[1][]{%
  \refstepcounter{revcomment}
  \begin{tcolorbox}[
    enhanced jigsaw,
    breakable,
    adjusted title={Comment \arabic{revcomment}},
    fonttitle={\bfseries},
    colback={colorcommentbg},
    colframe={colorcommentframe},
    coltitle={colorcommentbg},
    #1
  ]
}{\end{tcolorbox}}

\newenvironment{revresponse}[1][{}]{%
  \textbf{Response:} #1\par
}{\vspace{4em plus 0.2em minus 1.5em}}

\newenvironment{changes}{%
  \begin{tcolorbox}[
    enhanced jigsaw,
    breakable,
    colback={colorchangebg},
    colframe={colorchangeframe},
  ]
}{\end{tcolorbox}}

Result

After this modification, your reviewer comments and “changes” boxes will automatically continue onto the next page, no matter how long they are. You can now safely include large comments or detailed explanations without worrying about text being cut off.

Conclusion

By simply adding enhanced jigsaw and breakable to the tcolorbox environments, you make your LaTeX review responses much more robust. This small fix prevents truncated comments and keeps your document professional and reviewer-friendly.

Posted in Latex

How to fix reviewresponse.cls for custom reviewer numbering

Recently, I found a nice LaTeX class that can be used to write answers to reviewers for the rebuttal of journal papers. This LaTeX class is called reviewresponse.cls and can be found on GitHub. It lets you write an answer to reviewers with comments such as:

....

\reviewer

\begin{revcomment}
Figure 4 - please include legend to the right or below the main figure as in panel b legend overlaps with line of plot making confusion i interpretation. gentle grey grid in backround will also be valuable for plot investigation.
\end{revcomment}
\begin{revresponse}
    [your answer]
\end{revresponse}
\begin{changes}
    some changes you made
\end{changes}

\begin{revcomment}
    No avaliable implementation.
\end{revcomment}
\begin{revresponse}
     [your answer]
\end{revresponse}
\begin{changes}
    some changes you made
\end{changes}

which will then generate something beautiful like:

However, I have found a problem with this class, which is that the reviewers are automatically numbered as Reviewer 1, 2, 3, 4, 5…. But, in several cases, the reviewers are not numbered sequentially and some numbers may be skipped.

To fix this issue, the solution is to redefine the \reviewer command in reviewresponse.cls as follows:

\newcommand*{\reviewer}[1][]{%
  \clearpage
  % If no optional argument, step the counter as before.
  \if\relax\detokenize{#1}\relax
    \refstepcounter{reviewer}%
  \else
    % If an argument was given, set reviewer to N-1 then refstep to N.
    % Using \numexpr avoids the off-by-one problem while keeping refstepcounter
    % (so labels/anchors behave correctly).
    \setcounter{reviewer}{\numexpr#1-1\relax}%
    \refstepcounter{reviewer}%
  \fi
  \@ifundefined{pdfbookmark}{}{%
    \pdfbookmark[1]{Reviewer \arabic{reviewer}}{hyperref@reviewer\arabic{reviewer}}%
  }%
  \section*{Authors' Response to Reviewer~\arabic{reviewer}}
}

After making this modification, the \reviewer command can be used in your LaTeX document with an optional argument to specify the reviewer number that you want, like this: \reviewer[5]. Without the argument, it numbers reviewers sequentially as before. The result then looks like this:


And now the problem is fixed.

That is all for today. I just wanted to share this solution in case someone has the same problem with reviewresponse.cls.

Posted in Latex

The Conference Hotel Booking Scam

Something interesting happened to me in the last few days. To my knowledge, it seems to be a scam, and a relatively new one, so I want to share the information.

Here is the context. I will be a keynote speaker at a conference in Asia in a few months, and out of the blue, a company that appeared to be based in the Netherlands contacted me a few days ago by email offering to arrange my hotel accommodation. At first, the email from “ExploreEra Reservations” (reservation.nl@exploreera.info) looked very professional. They mentioned the conference location and month, and politely asked for my exact arrival and departure dates to reserve my hotel room. Their email was worded in the kind of tone you might expect from a real conference travel desk. Here is a screenshot:

But there were already some red flags in this e-mail, such as the indication that they require 30 days to cancel the reservation, which is highly unusual. In fact, most hotels generally allow cancelling a reservation without fees with 24 hours' notice. But I still responded with basic details about my dates to see what they would say. In the follow-up email, there were more serious red flags. Here is a screenshot:

At about the same time, in a separate e-mail, they sent me a PandaDoc form for a hotel booking with a proposed rate of €200 per night, which also asked for personal information and a signature. There was a weird disclaimer in small print indicating that they are not affiliated with the conference (very suspicious!), and there were HUGE cancellation fees:


Thus, I decided to investigate this. I googled the proposed hotel name and found that its real rate is more like 20-50 euros per night on Booking.com, not 199 euros.

Then, I googled their organization, ExploreEra.info, and quickly discovered that at least two conferences have issued very serious warnings about emails from this domain approaching their attendees to book hotels on their behalf without authorization.

For example, the World Psychiatric Association (WPA) posted an alert noting that emails from ExploreEra.info have been contacting their delegates, pretending to arrange accommodation on behalf of the conference. Here is a screenshot of this warning:

Another event also issued a similar warning:


So, is this a scam? Well, in the emails I have received, they never directly mentioned that they work for the conference, but the emails are worded in a way that gives this impression. And based on the above warnings from other conferences, the apparently inflated price, and the 30-day cancellation policy, it does indeed seem to be a scam. Thus, be warned!

By the way, there are several messages on Twitter warning about similar schemes, although I don't know if they are from the same people:

Posted in Academia