In this blog post, I will explain how to call the SPMF data mining library from Python. Generally, there are two ways to call SPMF from Python.
The first way is to use an unofficial Python wrapper for SPMF such as SPMF.py. You can check the website of that wrapper to learn how to use it.
The second way that I will explain below is just to call SPMF as an external program directly from a Python program.
First, you should make sure that you have installed Python and Java on your computer.
Second, you should download the spmf.jar file from the SPMF website and put it in the same directory as your Python program.
Third, you should make sure that your Java installation is correct. In particular, you should be able to execute the java command from the command line of your computer. If you type “java -version” in the command line of your computer, you should see the version of Java that is installed.
If you see this, then it is OK.
If you do not see this but instead get an error that java.exe is not found, it means that either you have not installed Java, or the PATH to Java is not set up properly on your computer, so you cannot use it from the command line. If you are using the Windows operating system and you have installed Java, you need to make sure that the folder containing java.exe is in the PATH environment variable. On Windows 11, you can fix this problem as follows:
1) Press WINDOWS + R.
2) Run the command “sysdm.cpl”.
3) Click the Advanced tab.
4) Click Environment Variables.
5) In the section System Variables, find the PATH environment variable and select it.
6) Click Edit and add the path to the folder containing java.exe, which will be something like C:\Program Files\Java\jdk-17.0.1\bin (depending on your version of Java and where you have installed it).
7) Click OK and close all windows.
Then, you can open a new command prompt and try running “java -version” again to see if the problem is fixed. If you are using another version of Windows or the Linux operating system, you can find similar steps online about how to set up Java on your computer.
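The same check can also be done from Python before trying to call SPMF. The following is only a minimal sketch using the standard library functions shutil.which and subprocess.run. The helper name java_version is made up for this illustration and is not part of SPMF.

```python
import shutil
import subprocess


def java_version():
    """Return the text printed by "java -version", or None if java is not on the PATH."""
    # shutil.which searches the PATH in the same way as the command line does.
    java_path = shutil.which("java")
    if java_path is None:
        return None
    # "java -version" usually prints to the error stream, so capture both streams.
    result = subprocess.run([java_path, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


version = java_version()
if version is None:
    print("java was not found on the PATH")
else:
    print(version)
```

If the first message is printed, the PATH needs to be fixed as explained above before SPMF can be called from Python.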
1) Launching the GUI of SPMF from a Python program
Now that I have explained the basic requirements, I will first show you how to launch the GUI of SPMF from Python. This is very simple. Here is the code of a simple Python program that calls the Jar file of SPMF to launch the GUI of SPMF:
import os
# Launch the GUI of SPMF
os.system("java -jar spmf.jar")
What does this program do? It basically just runs the command
java -jar spmf.jar
When this Python program is run, the GUI of SPMF is launched.
2) Executing an algorithm from SPMF from a Python program
Now, let’s look at something more interesting. How can we run an algorithm from SPMF from Python? We just need to modify the above program a little bit. Let’s say that we want to run the Apriori algorithm on an input file called contextPasquier99.txt (this file is included with SPMF).
To do this, we need to first check the documentation of SPMF to see how to run the Apriori algorithm from the command line. In the documentation, we find that we can use this command
java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%
to run Apriori on the file contextPasquier99.txt with the parameter minsup = 40% and to save the result to a file output.txt.
To do this from Python, we can write a Python program like this:
import os
# Run Apriori
os.system("java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%")
Then, if we run this program in a folder that contains spmf.jar and contextPasquier99.txt, it will produce the file output.txt as a result.
If we open the file “output.txt”, we can see the content:
Each line of this file is a frequent itemset found by the Apriori algorithm. To understand the input and output file format of Apriori, you can see the documentation of the Apriori algorithm.
If you want to call other algorithms that are offered in SPMF besides Apriori, you can look up the algorithm that you want to call in the SPMF documentation to see how to run it and then change the above program accordingly.
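To make it easier to switch between algorithms, the command can be built from its parts instead of being written as one string. Below is a minimal sketch that uses subprocess.run from the standard library instead of os.system. The function names build_spmf_command and run_spmf are hypothetical helpers written for this example, and the sketch assumes that the algorithm follows the same "run <algorithm> <input> <output> <parameters>" pattern as the Apriori command above.

```python
import subprocess


def build_spmf_command(algorithm, input_file, output_file, *parameters, jar="spmf.jar"):
    """Build the argument list for "java -jar spmf.jar run <algorithm> ..."."""
    return ["java", "-jar", jar, "run", algorithm, input_file, output_file,
            *[str(p) for p in parameters]]


def run_spmf(algorithm, input_file, output_file, *parameters, jar="spmf.jar"):
    """Run an SPMF algorithm as an external program.

    check=True raises CalledProcessError if the Java process ends with a
    non-zero exit code (assumption: the failure is reported that way).
    """
    command = build_spmf_command(algorithm, input_file, output_file,
                                 *parameters, jar=jar)
    subprocess.run(command, check=True)
```

With these helpers, calling run_spmf("Apriori", "contextPasquier99.txt", "output.txt", "40%") runs the same command as the program above. Passing the arguments as a list also avoids problems with file names that contain spaces.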
3) Executing an algorithm from SPMF from a Python program and then reading the output file
Now let me show you another example. I will explain how to call an algorithm from SPMF and then read the output file from a Python program.
Generally, the output of algorithms from SPMF is a text file (such as in the above example). Thus, to read the output of an SPMF algorithm from Python, you just need to know how to read a text file from a Python program.
For example, I modified the previous Python program to read the content of the file “output.txt” that is produced by SPMF to show its content in the console.
This is the modified Python program:
import os
# Run Apriori
os.system("java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%")
# Read the output file line by line
with open("output.txt", "r", encoding="utf-8") as outFile:
    for line in outFile:
        # Remove the end of line character to avoid printing empty lines
        print(line.rstrip())
If we run the program, it will run the Apriori algorithm and then read the output file and write each line of the output file to the console:
We could further modify this program to do something more meaningful with the content of the output file. But at least, I wanted to show you the basic idea of how to read an output file from SPMF from a Python program.
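As one possible way of doing something more meaningful, each line could be converted into a list of items and a support count. The sketch below assumes that each line of the output file has the form "1 2 #SUP: 3" (the items, then the keyword #SUP: and the support count) and that items are integers. Please check the documentation of the algorithm to confirm the format before relying on this. The function names are hypothetical.

```python
def parse_itemset_line(line):
    """Split a line such as "1 2 #SUP: 3" into ([1, 2], 3).

    Assumption: items are integers separated by spaces and the support
    count follows the keyword "#SUP:".
    """
    items_part, support_part = line.split("#SUP:")
    items = [int(item) for item in items_part.split()]
    support = int(support_part.strip())
    return items, support


def read_itemsets(path):
    """Read all the itemsets of an output file, skipping empty lines."""
    itemsets = []
    with open(path, "r", encoding="utf-8") as out_file:
        for line in out_file:
            if line.strip():
                itemsets.append(parse_itemset_line(line))
    return itemsets
```

For example, read_itemsets("output.txt") would return a list of pairs that can then be sorted or filtered by support in Python.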
4) Writing the input file for SPMF from a Python program and then executing an algorithm from SPMF
Lastly, you can also write the input file that is given to SPMF from a Python program by using code to write a text file.
For example, I will modify the example above to write a new text file called “input.txt” that will contain the following data:
1 2 3 4
2 3 4
2 3 4 5 6
1 2 4 5 6
and then I will call SPMF to execute the Apriori algorithm on that file. Then, the program will read the output file “output.txt” from Python. Here is the code:
import os
# Write the input file (one transaction per line)
# Note: "\n" is used because Python already converts it to the line ending of the operating system
with open("input.txt", "w") as f:
    f.write("1 2 3 4\n")
    f.write("2 3 4\n")
    f.write("2 3 4 5 6\n")
    f.write("1 2 4 5 6")
# Run Apriori
os.system("java -jar spmf.jar run Apriori input.txt output.txt 40%")
# Read the output file line by line
with open("output.txt", "r", encoding="utf-8") as outFile:
    for line in outFile:
        print(line.rstrip())
After running this program, the file “input.txt” is successfully created:
And the content of the output file is shown in the console:
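In practice, the data to be written often comes from a Python list rather than from fixed strings. Here is a minimal sketch for that case, assuming the same input format as in the example (one transaction per line, items separated by single spaces). The function names format_transactions and write_transactions are hypothetical.

```python
def format_transactions(transactions):
    """Convert a list of transactions into text with one transaction per line."""
    lines = [" ".join(str(item) for item in transaction)
             for transaction in transactions]
    return "\n".join(lines) + "\n"


def write_transactions(path, transactions):
    """Write the transactions to a text file that can be given to SPMF."""
    with open(path, "w", encoding="utf-8") as input_file:
        input_file.write(format_transactions(transactions))


# The same data as in the example above
transactions = [[1, 2, 3, 4], [2, 3, 4], [2, 3, 4, 5, 6], [1, 2, 4, 5, 6]]
print(format_transactions(transactions), end="")
```

Calling write_transactions("input.txt", transactions) would then create the input file before running Apriori as shown above.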
In this blog post, I have shown the basic idea of how to call SPMF from Python by calling SPMF as an external program. This is not very complicated, as you only need to know how to run a command and how to read and write text files.
Philippe Fournier-Viger is a full professor and the founder of the open-source data mining software SPMF, offering more than 250 data mining algorithms.