Upcoming in SPMF 2.64b: The “Pattern Diff Analyzer”

Today, I will talk to you about an upcoming feature of version 2.64b of the SPMF pattern mining software, which I think will be very useful to many people. It is a new tool called the Pattern Diff Analyzer, which lets you calculate the contrast between two files containing patterns.

For example, let's say that you extract sequential patterns from two text documents. You can now use this tool to compare the patterns found in the two documents and discover the patterns that distinguish each document. As another example, you could extract patterns from the genome sequences of two viruses and look for patterns that differ between the two sequences.

The new Pattern Diff Analyzer tool is very simple to use and looks like this:

In this screen, we can select two files containing patterns found by a pattern mining algorithm. For example, I will use two files called patternsA.txt and patternsB.txt.

After that, we can go to the second tab called “Compute contrast” to find the differences in patterns between these two files. In this example, I choose the “SUP” measure (support) for calculating the difference, and I choose “Absolute difference” with a threshold of 10. This means that I want to find all the patterns whose support differs by more than 10 between the two files. The result is 20 patterns.
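To make the “Absolute difference” idea concrete, here is a minimal Python sketch of the same computation. It is not the tool's actual code. It assumes that each line of a pattern file follows the usual SPMF output style (`pattern #SUP: value`), that only patterns present in both files are compared, and that "more than 10" means a strict comparison. The function names `parse_patterns` and `absolute_difference` and the sample data are made up for illustration.

```python
def parse_patterns(lines, measure="SUP"):
    """Map each pattern (as text) to the value of the chosen measure.

    Assumes lines such as "1 2 #SUP: 30", possibly followed by other
    measures like "#CONF: 0.7". Lines without the measure are skipped.
    """
    marker = f"#{measure}:"
    patterns = {}
    for line in lines:
        line = line.strip()
        if not line or marker not in line:
            continue
        before, _, after = line.partition(marker)
        pattern = before.split("#")[0].strip()    # text before the first measure
        value = float(after.split("#")[0].strip())  # number right after the marker
        patterns[pattern] = value
    return patterns


def absolute_difference(a, b, threshold):
    """Keep patterns found in both files whose measure differs by more than threshold."""
    result = {}
    for pattern in a.keys() & b.keys():
        diff = abs(a[pattern] - b[pattern])
        if diff > threshold:  # assumption: strict comparison
            result[pattern] = diff
    return result


# Hypothetical contents of patternsA.txt and patternsB.txt
file_a = ["1 2 #SUP: 30", "2 3 #SUP: 12", "4 #SUP: 7"]
file_b = ["1 2 #SUP: 15", "2 3 #SUP: 10", "5 #SUP: 9"]

contrast = absolute_difference(parse_patterns(file_a), parse_patterns(file_b), 10)
print(contrast)  # {'1 2': 15.0}
```

In this toy data, only the pattern `1 2` has a support difference above 10 (30 versus 15). The patterns `4` and `5` each appear in a single file, so this sketch ignores them.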

I can also choose other contrast methods such as “Exclusive in file 1”, which keeps all the patterns that appear in the first file but not in the second file.

Similarly, I can choose “Exclusive in file 2” to keep the patterns that appear only in the second file.
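The two “Exclusive” methods amount to a set difference between the patterns of the two files. The short Python sketch below illustrates this under the assumption that each line has the form `pattern #SUP: value` and that two patterns are the same only if their text matches exactly. The helper names `pattern_set` and `exclusive` are hypothetical and are not part of SPMF.

```python
def pattern_set(lines):
    """Collect the pattern part of each line, i.e. the text before the first '#'."""
    return {line.split("#")[0].strip() for line in lines if line.strip()}


def exclusive(first, second):
    """Return the patterns that appear in `first` but not in `second`."""
    return first - second


# Hypothetical contents of the two pattern files
file_1 = ["1 2 #SUP: 30", "2 3 #SUP: 12", "4 #SUP: 7"]
file_2 = ["1 2 #SUP: 15", "2 3 #SUP: 10", "5 #SUP: 9"]

only_in_1 = exclusive(pattern_set(file_1), pattern_set(file_2))
only_in_2 = exclusive(pattern_set(file_2), pattern_set(file_1))
print(only_in_1)  # {'4'}
print(only_in_2)  # {'5'}
```

Because the comparison is purely textual, an itemset written as `1 2` in one file and `2 1` in the other would be treated as two different patterns in this sketch.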

There are also other contrast methods available, such as the ratio of a pattern’s measure value in file A to its value in file B. For example, I can select the patterns where the ratio of A to B is at least 1.2.
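As a rough illustration of the ratio method, the Python sketch below keeps the patterns whose measure value in file A divided by the value in file B is at least a given minimum. It assumes the measure values have already been loaded into dictionaries, that only patterns present in both files are compared, and that a zero value in file B is skipped to avoid dividing by zero. How the actual tool handles these cases is not specified here, and `ratio_contrast` is a made-up name.

```python
def ratio_contrast(a, b, min_ratio):
    """Keep patterns found in both files where a[pattern] / b[pattern] >= min_ratio."""
    result = {}
    for pattern in a.keys() & b.keys():
        if b[pattern] == 0:
            continue  # assumption: the ratio is undefined, so skip the pattern
        ratio = a[pattern] / b[pattern]
        if ratio >= min_ratio:
            result[pattern] = ratio
    return result


# Hypothetical support values for file A and file B
support_a = {"1 2": 30, "2 3": 12, "6": 8}
support_b = {"1 2": 15, "2 3": 10, "6": 10}

selected = ratio_contrast(support_a, support_b, 1.2)
for pattern in sorted(selected):
    print(pattern, selected[pattern])
```

With this data, `1 2` (ratio 2.0) and `2 3` (ratio 1.2) are kept, while `6` (ratio 0.8) is discarded.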

After discovering the contrast patterns, we can also export them to a text file to save the results!

I think this tool will be very useful for classification problems where we want to compare patterns from different classes. Related to classification, note that in SPMF, we also have multiple algorithms for classification using association rules. But this is a different approach.

So today, I just wanted to show you a preview of this upcoming tool in SPMF. I will continue testing and may make some changes before the final release. Also, I will provide an algorithm that can be called from the command line to do the same thing as the Pattern Diff Analyzer tool, so that it can also be used without the graphical user interface.
