There is always a debate about the performance of different programming languages, although performance is only one of many criteria for choosing a language. As the founder of the SPMF data mining library, which is implemented in Java and focused on performance, I am always curious about how it compares with implementations in other languages.
Today, I was reading the VEPRECO paper by Mordvanyuk et al. (2022), and I found an interesting comparison between the Java implementation of CM-SPAM (from SPMF) and an implementation in Python. I reproduce the table from the paper, which shows the comparison in terms of memory (MB) and time (s):

The authors conclude that: “The results show that CM-SPAM implemented in java is from 10 to 24 times faster than its implementation in python, and consumes from 2 to 45 times less memory, depending on the dataset. This means, that in java with lower support, we spend much less memory, than in python with higher support.”
I think that this is a very interesting observation. I don't know whether the Python implementation is as optimized as the Java version. I cannot verify this, as the Python code does not seem to be publicly available, and the BitBucket repository from the paper is apparently down. But I would assume that the two implementations are more or less equally optimized. If so, the performance difference is in fact quite big.
By the way, the upcoming SPMF 2.77 will include an implementation of VEPRECO, as well as a few other very recent sequential pattern mining algorithms contributed by researchers, such as TriBackClo (2026).




