Analyzing the source code of the SPMF data mining software

Hi everyone,

In this blog post, I will discuss how I applied an open-source tool named Code Analyzer (http://sourceforge.net/projects/codeanalyze-gpl/) to analyze the source code of SPMF, my open-source data mining software.

I first ran the tool on the previous version of SPMF (0.92c). It printed the following result:

Metric                              Value
-------------------------------   -------
Total Files                           360
Total Lines                         50457
Avg Line Length                        30
Code Lines                          31901
Comment Lines                       13297
Whitespace Lines                     6583
Code/(Comment+Whitespace) Ratio      1,60
Code/Comment Ratio                   2,40
Code/Whitespace Ratio                4,85
Code/Total Lines Ratio               0,63
Code Lines Per File                    88
Comment Lines Per File                 36
Whitespace Lines Per File              18
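To make the three line categories in this report concrete, here is a minimal Python sketch of how a line counter could classify Java source lines as code, comment, or whitespace. It is only an illustration under simple assumptions (a line counts as a comment if it starts with `//` or lies inside a `/* ... */` block); it is not Code Analyzer's actual algorithm, and the function name `count_lines` is hypothetical.

```python
def count_lines(source: str) -> dict:
    """Classify each line of Java-like source as code, comment, or whitespace.

    Assumption: a line is a comment if it starts with // or /* or lies
    inside a /* ... */ block. Lines mixing code and a trailing comment
    are counted as code. A real tool may use different rules.
    """
    counts = {"code": 0, "comment": 0, "whitespace": 0}
    in_block = False  # True while inside an unterminated /* ... */ comment
    for line in source.splitlines():
        stripped = line.strip()
        if in_block:
            counts["comment"] += 1
            if "*/" in stripped:
                in_block = False
        elif not stripped:
            counts["whitespace"] += 1
        elif stripped.startswith("//"):
            counts["comment"] += 1
        elif stripped.startswith("/*"):
            counts["comment"] += 1
            # Stay in block mode unless the comment closes on the same line.
            if "*/" not in stripped[2:]:
                in_block = True
        else:
            counts["code"] += 1
    return counts


sample = """/**
 * Adds two numbers.
 */
public int add(int a, int b) {

    // sum the operands
    return a + b;
}
"""

print(count_lines(sample))
```

On this small sample, the Javadoc block and the `//` line are counted as comments, the empty line as whitespace, and the remaining three lines as code.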

Now, what is interesting is the difference when I apply the same tool to the latest version of SPMF (0.93). It gives the following result:

Metric                              Value
-------------------------------   -------
Total Files                           280
Total Lines                         53165
Avg Line Length                        32
Code Lines                          25455
Comment Lines                       23208
Whitespace Lines                     5803
Code/(Comment+Whitespace) Ratio      0,88
Code/Comment Ratio                   1,10
Code/Whitespace Ratio                4,39
Code/Total Lines Ratio               0,48
Code Lines Per File                    90
Comment Lines Per File                 82
Whitespace Lines Per File              20

As these statistics show, I have done a lot of refactoring for the latest version. There are now 280 files instead of 360. Moreover, I have shrunk the code from 31901 lines to 25455 lines, without removing any functionality!

I have also added a lot of comments to SPMF. The “Code/Comment” ratio has thus dropped from 2.40 to 1.10, and the “Comment Lines Per File” value went up from 36 to 82. In total, there are now around 10,000 more lines of comments than in the previous version (the number of comment lines has increased from 13297 to 23208).
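The derived figures in both reports can be re-checked from the raw counts. The following is a minimal Python sketch, not part of Code Analyzer; the function name `derived_metrics` and the dictionary keys are illustrative. It assumes ratios are rounded to two decimals and per-file values are rounded down, which is inferred from the printed numbers rather than from the tool's documentation.

```python
def derived_metrics(files, total, code, comment, whitespace):
    """Recompute the ratio and per-file rows of the report from raw counts."""
    return {
        "code_per_comment_ws": round(code / (comment + whitespace), 2),
        "code_per_comment": round(code / comment, 2),
        "code_per_whitespace": round(code / whitespace, 2),
        "code_per_total": round(code / total, 2),
        # Assumption: per-file values are floored (e.g. 31901 / 360 = 88.6 -> 88).
        "code_lines_per_file": code // files,
        "comment_lines_per_file": comment // files,
        "whitespace_lines_per_file": whitespace // files,
    }


# Raw counts copied from the two reports.
v092c = derived_metrics(files=360, total=50457, code=31901, comment=13297, whitespace=6583)
v093 = derived_metrics(files=280, total=53165, code=25455, comment=23208, whitespace=5803)

# Differences between the two versions.
code_removed = 31901 - 25455      # lines of code removed by refactoring
comments_added = 23208 - 13297    # lines of comments added
code_reduction_pct = round(100 * code_removed / 31901, 1)

print(v092c)
print(v093)
print(code_removed, comments_added, code_reduction_pct)
```

Under these assumptions the recomputed values match the printed reports, and the differences come out to 6446 fewer code lines (about 20.2%) and 9911 more comment lines.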

That’s all I wanted to write for today! If you like this blog, you can subscribe to the RSS Feed or follow my Twitter account (https://twitter.com/philfv) to get notified about future blog posts. Also, if you want to support this blog, please tweet and share it!
