Analyzing the source code of SPMF (5 years later)

Five years ago, I had analyzed the source code of the SPMF data mining software using an open-source tool called CodeAnalyzer ( http://sourceforge.net/projects/codeanalyze-gpl/ ). This had provided some interesting insights about the structure of the project, especially in terms of lines of codes and code to comment ratio. In 2013, for SPMF 0.93, the results were as follows:

Metric                Value
——————————-    ——–
    Total Files                     280
Total Lines                   53165
Avg Line Length                  32
    Code Lines                   25455
    Comment Lines               23208
Whitespace Lines                5803
Code/(Comment+Whitespace) Ratio        0,88
   Code/Comment Ratio                1,10
Code/Whitespace Ratio            4,39
Code/Total Lines Ratio            0,48
Code Lines Per File                  90
 Comment Lines Per File              82
Whitespace Lines Per File              20

Today, in 2018 I decided to analyze the code of SPMF again to get an overview of how the code has evolved over the last few years. Here are the result for the current version of SPMF (2.35):

Metric Value
——————————- ——–
Total Files 1385
Total Lines 238938
Avg Line Length 32
Code Lines 118117
Comment Lines 91241
Whitespace Lines 32797
Code/(Comment+Whitespace) Ratio 0,95
Code/Comment Ratio 1,29
Code/Whitespace Ratio 3,60
Code/Total Lines Ratio 0,49
Code Lines Per File 85
Comment Lines Per File 65
Whitespace Lines Per File 23

Many numbers remain more or less the same. But it is quite amazing to see that the number of lines of code has increased from 25,455 to 118,117 lines. The project is thus about four times larger now. This is in part due to contributions from many people, in recent years, while at the beginning the software was mainly developed by me. The total number of lines may still not seem very big for a software. However, most of the code is quite optimized and implement complex algorithms. Thus, many of these lines of code took quite a lot of time to write.

The number of comment lines has also increased, from 23,208 to 91,241 lines. But the ratio of code to comment lines has slightly increased. Thus, perhaps that adding some more comments is needed.

What is next for SPMF? Currently, I am preparing to release a new version of SPMF, which will include about 10 new algorithms. It should be released in about 1 or 2 weeks, as I need to finish other things first.

That is all for today! If you have comments or questions, please post them in the comment section below.


Philippe Fournier-Vigeris a full professor working in China and founder of the SPMF open source data mining software.

This entry was posted in Data Mining, Data science, open-source, spmf and tagged , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *