How to call SPMF from R?

In previous blog posts, I have explained how to call SPMF as an external program from Python and how to call SPMF from C#. Today, I will explain how to call SPMF from an R program.

Requirements

Since SPMF is implemented in Java, the first requirement is to make sure that Java is installed on your computer. Of course, you also need to install R on your computer to run R programs.

Second, you should download the spmf.jar file from the SPMF website.

Third, you should make sure that your Java installation is correct. In particular, you should be able to execute the java command from the command line (terminal) of your computer because we will use the java command to call SPMF. If you type “java -version” in the command line of your computer, you should see the version of Java:

If you see this, then it is OK.

If you do not see this but instead get an error saying that java.exe is not found, it means either that Java is not installed or that the PATH to Java is not set up properly on your computer, so that you cannot use it from the command line. If you are using the Windows operating system and you have installed Java, you need to make sure that the folder containing java.exe is in the PATH environment variable. On Windows 11, you can fix this problem as follows: 1) Press WINDOWS + R. 2) Run the command “sysdm.cpl”. 3) Click the Advanced tab. 4) Click Environment Variables. 5) In the section System Variables, find the PATH environment variable and select it. 6) Click Edit and add the path to the folder containing java.exe, which will be something like C:\Program Files\Java\jdk-17.0.1\bin (depending on your version of Java and where you have installed it). 7) Click OK and close all windows. Then, you can open a new command prompt and try running “java -version” again to see if the problem is fixed. If you are using another version of Windows or the Linux operating system, you can find similar steps online about how to set up Java on your computer.

1) Launching the GUI of SPMF from R

Now that I have explained the basic requirements, I will first show you how to launch the GUI of SPMF from R. This is very simple. Here is the code of a simple R program that calls the Jar file of SPMF to launch the GUI of SPMF.

#Set the working directory to the folder containing spmf.jar
setwd("C:\\Users\\philippe\\Desktop\\")

#Call SPMF as an external program
system("java -jar C:\\Users\\philippe\\Desktop\\spmf.jar")

What does this program do? It basically just runs the command
java -jar spmf.jar

By running this program, SPMF is successfully launched:

[Screenshot: the SPMF data mining interface]

2) Executing an algorithm from SPMF from an R program

Next, I will explain something more useful: how to run an algorithm from SPMF from an R program. We will modify the above program to do this. Let’s say that we want to run the Apriori algorithm on an input file called contextPasquier99.txt (this file is included with SPMF and can be downloaded here).

To do this, we need to first check the documentation of SPMF to see how to run the Apriori algorithm from the command line. The documentation of SPMF is here. How to run Apriori is explained in this page of the documentation. We find that we can use this command

java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%

to run Apriori on the file contextPasquier99.txt with the parameter minsup = 40% and to save the result to a file output.txt.

To do this from R, we can write a simple R program like this:

#Set the working directory to the folder containing spmf.jar and the file contextPasquier99.txt
setwd("C:\\Users\\philippe\\Desktop\\")

#Call SPMF as an external program to run the Apriori algorithm
system("java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%")

Then, it will produce the file output.txt as result in the working directory:

If we open the file “output.txt”, we can see the content:

Each line of this file is a frequent itemset found by the Apriori algorithm. To understand the input and output file format of Apriori, you can see the documentation of the Apriori algorithm.

If you want to call other algorithms that are offered in SPMF besides Apriori, you can look up the algorithm that you want to call in the SPMF documentation. For each algorithm, the documentation provides an example and an explanation of how to run it.

3) Executing an algorithm from SPMF from an R program and then reading the output file

Now, I will explain how to read the output file produced by SPMF from an R program. When running an algorithm of SPMF such as in the previous example, the output is generally a text file. We can easily read an output file from R to obtain the content.

For instance, I modified the previous R program to read the content of the file “output.txt” that is produced by SPMF to show its content in the console. The new R program is below:

#Set the working directory to the folder containing spmf.jar and the file contextPasquier99.txt
setwd("C:\\Users\\philippe\\Desktop\\")

#Call SPMF as an external program to run the Apriori algorithm 
system("java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%")

# Read the output file line by line and print to the console
myCon = file(description = "output.txt", open="r", blocking = TRUE)
repeat{
  pl = readLines(myCon, n = 1) # Read one line from the connection.
  if(identical(pl, character(0))){break} # If there are no more lines to read, exit.
  print(pl) # Otherwise, print and repeat next iteration.
}
close(myCon)
rm(myCon) 

If we execute this R program, it will first call the Apriori algorithm from SPMF. Then, the R program will read the content of the output file output.txt line by line and display the content in the console:

We could further modify this program to do something more meaningful with the content of the output file such as reading the content in R data frames to do further processing. But at least, I wanted to show you the basic idea of how to read an output file from SPMF from an R program.

4) Writing an input file for SPMF from an R program, and then running an algorithm from SPMF

Lastly, you can also write the input file that is given to SPMF from an R program by using code to write a text file.

For example, I will modify the example above to write a new text file called “input.txt” that will contain the following data:

1 2 3 4
2 3 4
2 3 4 5 6
1 2 4 5 6

and then I will call SPMF to execute the Apriori algorithm on that file. Then, the program will read the output file “output.txt” from R. Here is the code:

#Set the working directory to the folder containing spmf.jar
setwd("C:\\Users\\philippe\\Desktop\\")

#  Write an input file for Apriori
file.create("input.txt")
sink("input.txt")                                  
cat("1 2 3 4\r\n")
cat("2 3 4\r\n")
cat("2 3 4 5 6\r\n")
cat("1 2 4 5 6")
sink()

#Call SPMF as an external program to run the Apriori algorithm 
system("java -jar spmf.jar run Apriori input.txt output.txt 40%")

# Read the output file line by line and print to the console
myCon = file(description = "output.txt", open="r", blocking = TRUE)
repeat{
  pl = readLines(myCon, n = 1) # Read one line from the connection.
  if(identical(pl, character(0))){break} # If there are no more lines to read, exit.
  print(pl) # Otherwise, print and repeat next iteration.
}
close(myCon)
rm(myCon) 

After running this program, the file “input.txt” is successfully created:

And the content of the output file output.txt is shown in the console:

Conclusion

In this blog post, I have shown the basic idea of how to call SPMF from R by calling SPMF as an external program. It is quite simple. It just requires knowing how to read and write files in R.

Hope that this has been interesting.

==
Philippe Fournier-Viger is a full professor  and the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.


How to call SPMF from Python (as an external program)?

In this blog post, I will explain how to call the SPMF data mining library from Python. Generally, there are two ways to call SPMF from Python.

The first way is to use an unofficial Python wrapper for SPMF such as SPMF.py, and you can check the website of that wrapper to know how to use it.

The second way, which I will explain below, is simply to call SPMF as an external program directly from a Python program.

Requirements

First, you should make sure that you have installed Python and Java on your computer.

Second, you should download the spmf.jar file from the SPMF website and put it in the same directory as your Python program.

Third, you should make sure that your Java installation is correct. In particular, you should be able to execute the java command from the command line of your computer. If you type “java -version” in the command line of your computer, you should see the version of Java:

If you see this, then it is OK.

If you do not see this but instead get an error saying that java.exe is not found, it means either that Java is not installed or that the PATH to Java is not set up properly on your computer, so that you cannot use it from the command line. If you are using the Windows operating system and you have installed Java, you need to make sure that the folder containing java.exe is in the PATH environment variable. On Windows 11, you can fix this problem as follows: 1) Press WINDOWS + R. 2) Run the command “sysdm.cpl”. 3) Click the Advanced tab. 4) Click Environment Variables. 5) In the section System Variables, find the PATH environment variable and select it. 6) Click Edit and add the path to the folder containing java.exe, which will be something like C:\Program Files\Java\jdk-17.0.1\bin (depending on your version of Java and where you have installed it). 7) Click OK and close all windows. Then, you can open a new command prompt and try running “java -version” again to see if the problem is fixed. If you are using another version of Windows or the Linux operating system, you can find similar steps online about how to set up Java on your computer.
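A similar check can also be done from Python before calling SPMF. The following is only a minimal sketch using the standard library of Python. The function name java_available is my own choice for this illustration and is not part of SPMF.

```python
import shutil


def java_available():
    """Return True if a 'java' executable can be found through the PATH."""
    # shutil.which searches the folders listed in the PATH environment variable
    return shutil.which("java") is not None


if not java_available():
    print("The java command was not found: check the PATH environment variable.")
```

If the message is printed, the steps described above for fixing the PATH should be checked first.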

1) Launching the GUI of SPMF from a Python program

Now that I have explained the basic requirements, I will first show you how to launch the GUI of SPMF from Python. This is very simple. Here is the code of a simple Python program that calls the Jar file of SPMF to launch the GUI of SPMF:

import os
# Launch the GUI of SPMF
os.system("java -jar spmf.jar")

What does this program do? It basically just runs the command
java -jar spmf.jar

By running this Python program, SPMF is successfully launched:

2) Executing an algorithm from SPMF from a Python program

Now, let’s look at something more interesting. How can we run an algorithm from SPMF from Python? We just need to modify the above program a little bit. Let’s say that we want to run the Apriori algorithm on an input file called contextPasquier99.txt (this file is included with SPMF and can be downloaded here).

To do this, we need to first check the documentation of SPMF to see how to run the Apriori algorithm from the command line. The documentation of SPMF is here. How to run Apriori is explained here. We find that we can use this command

java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%

to run Apriori on the file contextPasquier99.txt with the parameter minsup = 40% and to save the result to a file output.txt.

To do this from Python, we can write a Python program like this:

import os

# Run Apriori
os.system( "java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%")

Then, if we run this program in a folder that contains spmf.jar and contextPasquier99.txt, it will produce the file output.txt as result:

If we open the file “output.txt”, we can see the content:

Each line of this file is a frequent itemset found by the Apriori algorithm. To understand the input and output file format of Apriori, you can see the documentation of the Apriori algorithm.
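To make the idea of a frequent itemset more concrete, here is a small sketch that counts the support of an itemset, that is the number of transactions that contain it. The transactions below are a made-up example and the function name support is hypothetical. This is only an illustration of the definition, not the way that Apriori is implemented in SPMF.

```python
def support(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for transaction in transactions if items.issubset(transaction))


# A small made-up database of four transactions (one list of items per transaction)
transactions = [
    [1, 2, 3, 4],
    [2, 3, 4],
    [2, 3, 4, 5, 6],
    [1, 2, 4, 5, 6],
]

print(support([2, 4], transactions))  # 4: every transaction contains 2 and 4
print(support([1, 2], transactions))  # 2: only the first and last transactions
```

With minsup = 40% and four transactions, an itemset would need to appear in at least two transactions (40% of 4 is 1.6), assuming that the threshold is rounded up to the next integer.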

If you want to call other algorithms that are offered in SPMF besides Apriori, you can look up the algorithm that you want to call in the SPMF documentation to see how to run it and then change the above program accordingly.
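As a variation, the command can be built as a list of arguments and run with the subprocess module of Python, which makes it easier to change the algorithm and its parameters and to detect when the command fails. This is only a sketch: the function names build_spmf_command and run_spmf are my own, and it assumes that spmf.jar is in the current folder and that java is in the PATH.

```python
import subprocess


def build_spmf_command(algorithm, input_file, output_file, *parameters, jar="spmf.jar"):
    """Build the arguments for: java -jar spmf.jar run <algorithm> <input> <output> <parameters>"""
    return ["java", "-jar", jar, "run", algorithm, input_file, output_file,
            *[str(p) for p in parameters]]


def run_spmf(algorithm, input_file, output_file, *parameters, jar="spmf.jar"):
    """Run SPMF and raise subprocess.CalledProcessError if java exits with an error code."""
    command = build_spmf_command(algorithm, input_file, output_file, *parameters, jar=jar)
    subprocess.run(command, check=True)


# Show the command that would be executed for the Apriori example
print(" ".join(build_spmf_command("Apriori", "contextPasquier99.txt", "output.txt", "40%")))
```

Calling run_spmf("Apriori", "contextPasquier99.txt", "output.txt", "40%") would then run the same command as in the program above.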

3) Executing an algorithm from SPMF from a Python program and then reading the output file

Now let me show you another example. I will explain how to call an algorithm from SPMF and then read the output file from a Python program.

Generally, the output of algorithms from SPMF is a text file (such as in the above example). Thus, to read the output of an SPMF algorithm from Python, you just need to know how to read a text file from a Python program.

For example, I modified the previous Python program to read the content of the file “output.txt” that is produced by SPMF to show its content in the console.

This is the modified Python program:

import os

# Run Apriori
os.system( "java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%")

# Read the output file line by line
with open("output.txt", "r", encoding="utf-8") as outFile:
    for line in outFile:
        print(line, end="")  # each line already ends with a line break

If we run the program, it will run the Apriori algorithm and then read the output file and write each line of the output file in the console:

We could further modify this program to do something more meaningful with the content of the output file. But at least, I wanted to show you the basic idea of how to read an output file from SPMF from a Python program.
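For example, each line could be converted into a list of items and a support value. The sketch below assumes that each line of the output file has the form "items #SUP: support" (for example "1 2 #SUP: 3"). Other algorithms may use other formats, so it is better to check the documentation of the algorithm. The function name parse_itemset_line is hypothetical.

```python
def parse_itemset_line(line):
    """Split a line such as '1 2 #SUP: 3' into a tuple of items and a support count."""
    items_part, _, support_part = line.strip().partition("#SUP:")
    items = tuple(int(item) for item in items_part.split())
    return items, int(support_part)


# Two sample lines written by hand in the assumed format (not real SPMF output)
sample_lines = ["1 #SUP: 2\n", "2 4 #SUP: 4\n"]
for sample in sample_lines:
    print(parse_itemset_line(sample))
```

The same function could be applied to each line read from output.txt to collect the itemsets in a list for further processing.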

4) Writing the input file for SPMF from a Python program and then executing an algorithm from SPMF

Lastly, you can also write the input file that is given to SPMF from a Python program by using code to write a text file.

For example, I will modify the example above to write a new text file called “input.txt” that will contain the following data:

1 2 3 4
2 3 4
2 3 4 5 6
1 2 4 5 6

and then I will call SPMF to execute the Apriori algorithm on that file. Then, the program will read the output file “output.txt” from Python. Here is the code:

import os

# Write a file
f= open("input.txt","w+")
f.write("1 2 3 4\r\n")
f.write("2 3 4\r\n")
f.write("2 3 4 5 6\r\n")
f.write("1 2 4 5 6")
f.close()

# Run Apriori
os.system( "java -jar spmf.jar run Apriori input.txt output.txt 40%")

# Read the output file line by line
with open("output.txt", "r", encoding="utf-8") as outFile:
    for line in outFile:
        print(line, end="")  # each line already ends with a line break

After running this program, the file “input.txt” is successfully created:

And the content of the output file is shown in the console:
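If the data is already stored in a Python list, the input file can also be generated with a loop. This is a minimal sketch, assuming the input format used above (one transaction per line, with items separated by single spaces). The function name write_transactions is my own.

```python
def write_transactions(path, transactions):
    """Write one transaction per line, with items separated by single spaces."""
    # newline="\n" keeps the line endings the same on every operating system
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for transaction in transactions:
            f.write(" ".join(str(item) for item in transaction) + "\n")
```

For instance, write_transactions("input.txt", [[1, 2, 3, 4], [2, 3, 4]]) would create a file input.txt containing two lines.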

Conclusion

In this blog post, I have shown the basic idea of how to call SPMF from Python by calling SPMF as an external program. This is not very complicated, as it is only necessary to know how to read and write files.

I have also written some blog posts about how to call SPMF as an external program from R and how to call SPMF from C#.

==
Philippe Fournier-Viger is a full professor  and the founder of the open-source data mining software SPMF, offering more than 250 data mining algorithms. 


How to call SPMF from C#?

This blog post is the first of a series of blog posts on using SPMF from different programming languages. Today, I will explain how to call the SPMF data mining library from C#. In other blog posts, I explain how to call SPMF from R and how to call SPMF from Python.

Requirements

First, you should make sure that you have installed a C# development environment (for example, the .NET SDK or Visual Studio) and Java on your computer.

Second, you should download the spmf.jar file from the SPMF website and put it in the same directory as your C# program.

Third, you should make sure that your Java installation is correct. In particular, you should be able to execute the java command from the command line of your computer. If you type “java -version” in the command line of your computer, you should see the version of Java:

If you see this, then it is OK.

If you do not see this but instead get an error saying that java.exe is not found, it means either that Java is not installed or that the PATH to Java is not set up properly on your computer, so that you cannot use it from the command line. If you are using the Windows operating system and you have installed Java, you need to make sure that the folder containing java.exe is in the PATH environment variable. On Windows 11, you can fix this problem as follows: 1) Press WINDOWS + R. 2) Run the command “sysdm.cpl”. 3) Click the Advanced tab. 4) Click Environment Variables. 5) In the section System Variables, find the PATH environment variable and select it. 6) Click Edit and add the path to the folder containing java.exe, which will be something like C:\Program Files\Java\jdk-17.0.1\bin (depending on your version of Java and where you have installed it). 7) Click OK and close all windows. Then, you can open a new command prompt and try running “java -version” again to see if the problem is fixed. If you are using another version of Windows or the Linux operating system, you can find similar steps online about how to set up Java on your computer.

1) Launching the GUI of SPMF from C#

Now that I have explained the basic requirements, I will first show you how to launch the GUI of SPMF from C#. This is very simple. Here is the code of a simple C# program that calls the Jar file of SPMF to launch the GUI of SPMF:

using System.Diagnostics;

Process myProcess = new Process();
myProcess.StartInfo.UseShellExecute = false;
myProcess.StartInfo.RedirectStandardOutput = true;
myProcess.StartInfo.FileName = "java";
myProcess.StartInfo.Arguments = "-jar spmf.jar"; // You could also put a path to the jar file like C:\\Users\\philippe\\Desktop\\
myProcess.Start();
myProcess.WaitForExit(); // If you want to wait for the program to terminate

What does this program do? It basically just runs the command
java -jar spmf.jar

By running this C# program, SPMF is successfully launched:

2) Executing an algorithm from SPMF from a C# program

Now, let’s look at something more interesting. How can we run an algorithm from SPMF from C#? We just need to modify the above program a little bit. Let’s say that we want to run the Apriori algorithm on an input file called contextPasquier99.txt (this file is included with SPMF and can be downloaded here).

To do this, we need to first check the documentation of SPMF to see how to run the Apriori algorithm from the command line. The documentation of SPMF is here. How to run Apriori is explained here. We find that we can use this command

java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%

to run Apriori on the file contextPasquier99.txt with the parameter minsup = 40% and to save the result to a file output.txt.

To do this from C#, we can write a C# program like this:

using System.Diagnostics;

Process myProcess = new Process();
myProcess.StartInfo.UseShellExecute = false;
myProcess.StartInfo.RedirectStandardOutput = true;
myProcess.StartInfo.FileName = "java";
myProcess.StartInfo.Arguments = "-jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%";
myProcess.Start();
myProcess.WaitForExit(); // If you want to wait for the program to terminate

Then, if we run this program in a folder that contains spmf.jar and contextPasquier99.txt, it will produce the file output.txt as result:

If we open the file “output.txt”, we can see the content:

Each line of this file is a frequent itemset found by the Apriori algorithm. To understand the input and output file format of Apriori, you can see the documentation of the Apriori algorithm.

If you want to call other algorithms that are offered in SPMF besides Apriori, you can look up the algorithm that you want to call in the SPMF documentation to see how to run it and then change the above program accordingly.

3) Executing an algorithm from SPMF from a C# program and then reading the output file

Now let me show you another example. I will explain how to call an algorithm from SPMF and then read the output file from a C# program.

Generally, the output of algorithms from SPMF is a text file (such as in the above example). Thus, to read the output of an SPMF algorithm from C#, you just need to know how to read a text file from a C# program.

For example, I modified the previous C# program to read the content of the file “output.txt” that is produced by SPMF to show its content in the console.

This is the modified C# program:

using System.Diagnostics;
using System;
using System.IO;

// Call Apriori
Process myProcess = new Process();
myProcess.StartInfo.UseShellExecute = false;
myProcess.StartInfo.RedirectStandardOutput = true;
myProcess.StartInfo.FileName = "java";
myProcess.StartInfo.Arguments = "-jar spmf.jar run Apriori contextPasquier99.txt output.txt 40%";
myProcess.Start();
myProcess.WaitForExit(); // wait for the program to terminate

// Read the output file
using (StreamReader file = new StreamReader("output.txt")) {
    string ln;

    while ((ln = file.ReadLine()) != null) {
        Console.WriteLine(ln);
    }
    // No explicit Close() is needed: the using statement disposes the reader
}

If we run the program, it will run the Apriori algorithm and then read the output file and write each line of the output file in the console:

We could further modify this program to do something more meaningful with the content of the output file. But at least, I wanted to show you the basic idea of how to read an output file from SPMF from a C# program.

4) Writing an input file for SPMF from a C# program, and then running an algorithm from SPMF

Lastly, you can also write the input file that is given to SPMF from a C# program by using code to write a text file.

For example, I will modify the example above to write a new text file called “input.txt” that will contain the following data:

1 2 3 4
2 3 4
2 3 4 5 6
1 2 4 5 6

and then I will call SPMF to execute the Apriori algorithm on that file. Then, the program will read the output file “output.txt” from C#. Here is the code:

using System.Diagnostics;
using System;
using System.IO;

// Write a file
using (StreamWriter writer = File.CreateText(@"input.txt"))
{
    writer.WriteLine("1 2 3 4");
    writer.WriteLine("2 3 4");
    writer.WriteLine("2 3 4 5 6");
    writer.WriteLine("1 2 4 5 6");
}

// Call Apriori
Process myProcess = new Process();
myProcess.StartInfo.UseShellExecute = false;
myProcess.StartInfo.RedirectStandardOutput = true;
myProcess.StartInfo.FileName = "java";
myProcess.StartInfo.Arguments = "-jar spmf.jar run Apriori input.txt output.txt 40%";
myProcess.Start();
myProcess.WaitForExit(); // wait for the program to terminate

// Read the output file
using (StreamReader file = new StreamReader("output.txt")) {
    string ln;

    while ((ln = file.ReadLine()) != null) {
        Console.WriteLine(ln);
    }
    // No explicit Close() is needed: the using statement disposes the reader
}

After running this program, the file “input.txt” is successfully created:

And the content of the output file is shown in the console:

Conclusion

In this blog post, I have shown the basic idea of how to call SPMF from C# by calling SPMF as an external program. It is quite simple. It just requires knowing how to read and write files in C#.

Hope that this has been interesting.

I will later post similar tutorials for other programming languages such as Python and R, so as to make it easier to use SPMF from other languages.



Happy New Year 2023!

To all readers of this blog, I would like to wish you a happy new year 2023!

Recently, I have been very busy due to the end of semester and also having COVID. But now, I am fully recovered.

There are two things that I would like to talk about.

  1. Recently, I learnt the sad news that a researcher from the field of data mining, who was still very young (around 50 years old), has passed away. He was a good researcher and also a very friendly person whom I met several times at conferences. This reminds me that life can be short and that it is important to enjoy it and also to take care of your health. As researchers, we often work very hard. But we should also think about having more balance between work and other aspects of life so as to be healthier, and also do sport, eat well and sleep well. I talk more about success and health for researchers in this blog post.
  2. On a different topic, I have recently released a new version of the SPMF data mining software (SPMF version 2.59, which you can download here). It offers two new tools: a graph viewer and an algorithm explorer (which I previously described on this blog), and also three new algorithms for periodic pattern mining, contributed by Prof. Vincent Nofong. I recommend checking out these new algorithms (PPFP, NPFPM and SRPFPM). They offer several possibilities for further research and applications.

This was just a short blog post to wish you a happy new year, talk to you about life, and tell you about the new version of SPMF.



Brief report about the BDA 2022 conference

This week, I was attending the BDA 2022 conference, which is the 10th International Conference on Big Data Analytics. The BDA 2022 conference was held in Hyderabad, India from the 20th to the 22nd of December 2022.

The BDA conference is an international conference organized in India, which is quite good and whose proceedings are published by Springer in the Lecture Notes in Computer Science series. I previously attended it in 2019 (see my report of BDA 2019). This year, I am attending it again as co-author of a paper and also as a keynote speaker and moderator for a panel.

My report for this conference is a little short because I have been a bit sick during the conference and could not attend all the presentations due to this.

Proceedings

All papers in BDA are published in Springer LNCS which gives good visibility and indexing.

Opening session

I first attended the opening session. Below are some slides from the opening session that provide some interesting information.

The BDA conference is held every year in different cities in India:

The registration fee is quite reasonable:

About the program of BDA 2022, there were 41 valid submissions, from which 14 were accepted, that is 7 short papers and 7 full papers.

There were several keynotes at the BDA conference. This year, I am one of the keynote speakers. I gave a talk about pattern mining.

There were also several invited talks, some from IBM and the Bank of America, which is quite interesting.

There was also a panel on big data analysis for attaining sustainability. I am one of the two moderators for this panel.

A lot of people are working behind the scenes for this conference:

Paper presentations

There were many paper presentations.

My collaborator from Tunisia, Prof. Khaled Belghith, presented a paper on using high utility itemsets for transaction embeddings, which may be interesting for those working on pattern mining as it is a kind of bridge between machine learning and pattern mining:


Belghith, K., Fournier-Viger, P., Jawadi, J. (2022). Hui2Vec: Learning Transaction Embedding Through High Utility Itemsets. Proc. of 10th Intern. Conf. on Big Data Analytics (BDA 2022), Springer, to appear.

Panel on data science for sustainable development goals

There was a very interesting panel on data science for sustainable development goals with Prof. Masaru Kitsuregawa (The University of Tokyo), Prof. Longbing Cao (University of Technology Sydney), Prof. Yun Sing Koh (University of Auckland), and Jaideep Srivastava (University of Minnesota).

The panelists brought several interesting perspectives. In particular, some case studies were discussed about algae bloom detection and about environmental monitoring. Besides, some other topics were discussed such as the importance of large data centers, health monitoring devices, data collection, and data sovereignty, to name a few. The four invited experts also talked about the challenges of interdisciplinary work.

Conclusion

This is a short report about BDA 2022 because I was sick during the event and did not attend many activities due to this. But the BDA conference in general is a well-organized conference. The program is good with many excellent guests and speakers, and I also know several researchers involved in this conference. Thus, I will be looking forward to attending it again.


Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 120 data mining algorithms.


Unethical reviewers in academia (part 3)

Previously, I wrote two blog posts about unethical reviewers in academia (part 1 and part 2). It is not that I like this topic, but today, I will talk again about that. Why? Because, I keep encountering them, unfortunately. It is something very common.

What is an unethical reviewer? As I explained in previous blog posts, there are various types of unethical behaviors that a reviewer may have such as (1) reviewing his own papers, (2) reviewing papers while having some other conflict of interest, or just (3) asking authors to cite his papers to boost his citation count.

Recently, the last case happened again. A collaborator had a paper rejected by two reviewers, and each of the two reviewers asked to cite between 3 and 5 irrelevant papers. One of the reviewers even wrote some comments that were unrelated to the paper, which shows that he did not even take the time to do his work seriously, being in a hurry to boost his citations. This is unprofessional behavior, which wastes time and reduces the quality of the peer-review process.

Personally, every time that this happens, I am a bit angry, and because of this phenomenon I think that there are several people in academia who do not care about research and honesty. For example, it is easy to find on Google Scholar the profiles of some researchers who suddenly have thousands of citations that come from random journals, so it is obvious that they cheat rather than obtain citations due to the quality of their research work.

So what to do in this situation?

Unfortunately, the balance of power is unequal between authors and reviewers. If an author submits a paper to a journal and the paper is rejected due to unethical reviews, what can he do? He can write an e-mail to the associate editor or editor-in-chief to complain, but from my experience, decisions are almost never reversed in a journal. In fact, I have never seen the option of reversing a decision even being available in the paper management systems of journals. In the best case, maybe the editor could ask to submit the paper again, but usually editors are very busy (some journals receive thousands of papers per year!) and I think many editors do not want to deal with authors who argue about the decisions on papers, no matter what the reason is.

So what else could be done?

In my opinion, even if has few chances of working, the best is perhaps to send an e-mail to the associate editor and/or editor-in-chief to report the unethical behavior. Maybe that the reviewer could then be blacklisted or that a note could be put in its user profile of the management system as a result. But I would still not bet on this…

In my opinion, if we want something to change about this, those who have the most power over a journal are the publishers, the societies that are responsible for these journals (e.g. ACM and IEEE), and the companies that take care of impact factors and other academic metrics and rankings of journals.

For example, in a famous case several years ago, an IEEE journal (the IEEE Transactions on Industrial Informatics) lost its impact factor due to citation stacking (artificially increasing the number of self-citations). Losing the impact factor is a serious consequence for the journal that can make things change. So a possibility to make things change is to also complain to the publisher or affiliated societies. This can have some impact although I did not see this happen often.

Another possibility would be to create a public website where every researcher could upload the potentially unethical reviews that they have received. These reviews could be categorized by journal, and perhaps by the authors of the papers that reviewers ask to cite. This could reveal some interesting trends and could perhaps make some things change. But such a website would also need moderators to verify its content, and who would take care of this? It would certainly not be a perfect solution, and people would perhaps still find a way to game that system…

Another possibility is to have some external persons that occasionally check what is happening inside the different journals to evaluate them. I think that this is something that does exist. But I do not think that it is for all journals and obviously in some journals nothing is changing over the years.

That is all for today. I just wanted to post my thoughts about this topic once again but this time by discussing also some other solutions.


Philippe Fournier-Viger is a computer science professor and founder of the SPMF open-source data mining library, which offers more than 170 algorithms for analyzing data, implemented in Java.


CFP Special issue on Data Science to Transform How We Work and Live

I am co-organizing a special issue in the DSE (Data Science and Engineering) journal, published by Springer, about how data science can be used to transform how we work and live.

Guest Editors:

  • Yee Ling Boo, RMIT University, Melbourne, Australia
  • Manik Gupta, Birla Institute of Technology and Science, Pilani (BITS Pilani), Hyderabad, India
  • Weijia Zhang, Southeast University, Nanjing, China
  • Philippe Fournier-Viger, Shenzhen University, Shenzhen, China

The planned schedule is as follows:

Submission Deadline: 30 April 2023
Expected publication: December 2023

See the special issue webpage for the DSE journal for more details:
Data Science and Engineering | Call for papers: special issue on “the innovative use of data science to transform how we work and live” (springer.com)


Brief report about the BIBM 2022 conference

This week, I attended the BIBM 2022 (IEEE International Conference on Bioinformatics and Biomedicine) conference online. I will give a brief report about this conference, although I did not have time to attend many sessions.

BIBM 2022 was held from the 6th to the 8th of December 2022 in Las Vegas, USA, and in Changsha, China. However, due to the COVID situation, attendance in Changsha was changed to online attendance.

BIBM is an international conference on bioinformatics. Interestingly, it has been held for over 15 years, most of the time in either the USA or China.

A bilingual conference

BIBM is actually held partly in English and partly in Chinese. Some sessions from the USA were completely in English, while some sessions from China were completely in Chinese. But authors from the USA and China could in theory watch the talks from the other location. Talks from the USA could be watched using Zoom, while talks from Changsha could be watched using Tencent Meeting.

Proceedings of BIBM

The proceedings of the BIBM conference are published by IEEE. These proceedings contain regular papers, workshop papers as well as poster papers. It will be interesting for some authors that all these papers are published in the same proceedings.

The proceedings were quite large, with numerous papers. It was a PDF file of over 1 GB and over 3900 pages! Luckily, the Edge browser of Windows on my computer can open such a large file without problem.

This year, there was also a good range of workshops, with over 25 workshops hosted at BIBM. So even if a paper was not accepted as a regular paper, there were many workshops to choose from for publishing it.

Day 1 – Opening ceremony and first keynote in Las Vegas

On the first day, I stayed up late (until 1:30 AM) to watch the conference opening held in the USA. Unfortunately, there was a technical problem at the conference site in Las Vegas and we could not hear anything during roughly the first 45 minutes. The sound came back during the first keynote, which was about bioinformatics. Due to the technical problem, I did not get much information from the opening. I will try to get the slides from the opening and update this post later with the information.

Day 1 – Opening ceremony and first keynote in Changsha

Then, after waking up, I attended the second opening ceremony, for people attending from Changsha, China, which also gave some general information about the conference. The opening and the following keynote talks from that session were given in Chinese.

During the opening, it was explained that part of the registration fee will be refunded to attendees as the conference is online, which is good news. The details will be explained after the conference.

Also, from what I understood, a best paper award was announced.

Below, I show some pictures from the opening in Changsha. There were a lot of attendees (over 250 at some point!).

And here are a few slides from the first keynote on “Brain imaging pattern recognition methods and imaging representation of mental disorders” by Huafu Chen, which had some interesting content.

A second keynote was given by Min Li about “Computational solutions to explore genomic 3D organization”. Here are a few slides:

An interesting paper on sequential patterns for insurance claims

I also noticed a very interesting paper on utilizing sequential patterns with the CM-SPAM algorithm and periodic patterns with the LPP-Growth algorithm to analyze courses of medical treatment to obtain insights about anomalous insurance claims.

Kemp, J., Barker, C., Good, N., Bain, M. (2022) Sequential pattern detection for identifying courses of treatment and anomalous claim behaviour in medical insurance. Proc. of BIBM 2022.

That is a cool topic and a real application. Using the periodic patterns and sequential patterns, previously unknown anomalous claim patterns were found, and previously suspected anomalous claim patterns were confirmed. The authors reported that up to $486,617.60 in potentially recoverable costs were identified, and that a benefit of using a pattern mining approach is interpretability.

Analyzing COVID protein structures

My research team also published a paper about the analysis and classification of protein structures from COVID-19. You may read the paper below:

Nawaz, S., Fournier-Viger, P., He, Y. (2022). S-PDB: Analysis and Classification of SARS-COV2 Spike Protein Structures. HPC4COVID-19 workshop at IEEE BIBM 2022, to appear.

Registration fee

Although the program of the BIBM conference is good, the registration fee is, I would say, very expensive at 985 USD to publish a single paper. This is the main drawback that I see for this conference, and it would make me think twice about publishing there again (since some conferences are much cheaper). But with the refund due to the conference being online (which is expected to be around 300 USD), the price is now more reasonable.

Next year: BIBM 2023

It was announced that BIBM 2023 will be held in Istanbul, Turkey. That is a nice location.

Conclusion

This was a short blog post about the BIBM 2022 conference, which I attended. I hope it was interesting.


SPMF Upcoming feature: Graph viewer

Today, I will give you a preview of another upcoming feature of SPMF, which will be released in the next version of SPMF (2.59). It is the Graph Viewer tool.

The Graph Viewer is a simple tool for visualizing graphs. The Graph Viewer is designed to display graphs that can be directed or undirected, and have labels. The Graph Viewer can also automatically choose an appropriate layout for visualizing a graph.

Why a Graph Viewer in SPMF? It will allow users to visualize input files containing graphs and output files containing frequent subgraphs. This is useful to visualize the input files of frequent subgraph mining algorithms such as gSpan, cgSpan and TKG, as well as the patterns that are discovered by these algorithms (frequent subgraphs).

I have completely implemented the Graph Viewer in Java, without using external libraries, so as to avoid dependencies and to make it as lightweight and fast as possible, a long-time design goal of SPMF. In fact, unlike many other data mining libraries and open-source projects, SPMF does not have any external dependencies and the code is well optimized. This ensures the stability of the project and avoids problems that could arise from relying on external libraries.

Let me now show you the current features of the Graph Viewer, which may still be updated or improved in the final release.

Opening a graph file

The first feature is to open an input file containing one or more graphs. This is done by selecting the Graph Viewer tool:

Then, let’s say that we open the example file contextTKG.txt offered in SPMF, which contains three graphs. The Graph viewer will display graphs in a window like this:

Here we see the third graph from the file (Graph 3 of 3). At the bottom, there are two buttons < > for navigating to the previous or the next graph. This graph has ID 3, and contains 4 nodes and 4 edges, as indicated in the bottom part of the window. The nodes are displayed with a text of the form x:y where x is the node ID and y is the node label. Edges are displayed in blue with their labels.

To display the graph in a pleasant way, I have implemented a force-directed graph layout algorithm, which is the Fruchterman/Reingold (1991) algorithm. It automatically places the nodes in appropriate locations so that the graph can be displayed in a beautiful way.
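To give an idea of how a force-directed layout of this kind works, here is a minimal Python sketch of the Fruchterman/Reingold idea. It is only an illustration and not the Java code of the Graph Viewer: the function name, the parameters (frame size, number of iterations, seed) and the cooling schedule are assumptions made for this example. All pairs of nodes repel each other, nodes connected by an edge attract each other, and a decreasing "temperature" limits how far a node can move in each iteration.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0,
                         iterations=50, seed=0):
    """Return a dict {node: (x, y)} computed by a simple force-directed layout.

    nodes is a list of node IDs and edges is a list of (u, v) pairs.
    This is an illustrative sketch, not the code used in SPMF.
    """
    rng = random.Random(seed)
    # Start from random positions inside the drawing frame
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    if len(nodes) < 2:
        return {v: tuple(p) for v, p in pos.items()}

    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10.0                  # maximum move per iteration
    cooling = temperature / (iterations + 1)

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Repulsive forces between every pair of nodes
        for i, v in enumerate(nodes):
            for u in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[v][0] += dx / dist * force
                disp[v][1] += dy / dist * force
                disp[u][0] -= dx / dist * force
                disp[u][1] -= dy / dist * force

        # Attractive forces along the edges
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[v][0] -= dx / dist * force
            disp[v][1] -= dy / dist * force
            disp[u][0] += dx / dist * force
            disp[u][1] += dy / dist * force

        # Move each node, limited by the temperature, and stay in the frame
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy)
            if length > 0:
                step = min(length, temperature)
                pos[v][0] += dx / length * step
                pos[v][1] += dy / length * step
            pos[v][0] = min(width, max(0.0, pos[v][0]))
            pos[v][1] = min(height, max(0.0, pos[v][1]))

        temperature -= cooling

    return {v: tuple(p) for v, p in pos.items()}


# Example: a small graph with 4 nodes and 4 edges (made up for illustration)
layout = fruchterman_reingold([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
```

The resulting dictionary gives a position to each node, which a viewer could then use to draw the nodes and edges. Using a fixed seed makes the layout reproducible.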

Opening a pattern file

We can also use the graph viewer tool to display the frequent subgraphs found by an algorithm such as TKG. For example, here I apply the TKG algorithm and select the “Graph Viewer” tool to open the result file.

The result is 16 frequent subgraphs, which are displayed by the Graph Viewer as follows:

In the above picture, we see the frequent subgraph 9. We can use the buttons < > to move to the previous or next frequent subgraph, and thus view all of the subgraphs that have been found. The support of each subgraph is displayed.

Moving the graph nodes with the mouse

Another feature of the Graph Viewer is that we can move the nodes with the mouse by dragging them over the panel:

Running the graph viewer from the command line

It will also be possible to call the Graph Viewer from the command line, just like almost all algorithms and tools from SPMF. For example, if we put the spmf.jar file in the same folder as the file contextTKG.txt, we can apply this command:

java -jar spmf.jar run Open_graph_database_file_with_graph_viewer contextTKG.txt

Then, this will start the Graph Viewer to display the file:
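The same command could also be launched from a script. Below is a small Python sketch that does this with the standard subprocess module. The two function names are hypothetical helpers written for this example, and it is assumed that java is available on the PATH and that spmf.jar and contextTKG.txt are in the current folder.

```python
import subprocess


def build_graph_viewer_command(jar_path, graph_file):
    """Build the argument list for the command line shown above."""
    return ["java", "-jar", jar_path, "run",
            "Open_graph_database_file_with_graph_viewer", graph_file]


def launch_graph_viewer(jar_path="spmf.jar", graph_file="contextTKG.txt"):
    """Run SPMF as an external program (hypothetical helper).

    Assumes that java is on the PATH and that both files exist.
    An exception is raised if the java process returns an error code.
    """
    command = build_graph_viewer_command(jar_path, graph_file)
    return subprocess.run(command, check=True)


# To open the viewer, one would call: launch_graph_viewer()
```

Passing the command as a list of arguments rather than as a single string avoids problems with file names that contain spaces.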

Displaying other types of graphs

The Graph Viewer is designed in a quite general way so that it could also display other types of graphs and be used for other functions in SPMF in the future. For example, below, I show an example graph that is created programmatically rather than by reading a file.

I use this example to show the display of directed and undirected edges. We can also see that the automatic layout algorithm works quite well and displays the graph in a proper way. Here is another example:

In the Java code, we can also change how the nodes are displayed. I did not offer this option in the user interface as I think it is less important though. What do you think?

Update: Choosing different types of graph layout

I had one hour of free time this morning, so I decided to add an option to choose between different graph layout algorithms. For example, here we see four types of layout:

1) Using the Fruchterman/Reingold (1991) algorithm:

2) Using a random layout:

3) Using a grid layout:

4) Using a circle layout:

I might add more graph layout algorithms later. I think that these algorithms are quite interesting.
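The simpler layouts are easy to describe in code. As a rough illustration, here is a Python sketch of a circle layout and a grid layout. It is not the Java code of the Graph Viewer: the function names and the parameters (center, radius, spacing) are assumptions made for this example.

```python
import math


def circle_layout(nodes, cx=0.5, cy=0.5, radius=0.4):
    """Place the nodes at equal angles on a circle centered at (cx, cy)."""
    n = len(nodes)
    return {
        v: (cx + radius * math.cos(2 * math.pi * i / n),
            cy + radius * math.sin(2 * math.pi * i / n))
        for i, v in enumerate(nodes)
    }


def grid_layout(nodes, spacing=1.0):
    """Place the nodes row by row on a grid that is roughly square."""
    columns = max(1, math.ceil(math.sqrt(len(nodes))))
    return {
        v: ((i % columns) * spacing, (i // columns) * spacing)
        for i, v in enumerate(nodes)
    }


# Example with five nodes (labels made up for illustration)
example_nodes = ["a", "b", "c", "d", "e"]
circle_positions = circle_layout(example_nodes)
grid_positions = grid_layout(example_nodes)
```

Unlike a force-directed layout, these two layouts ignore the edges completely, so they are very fast but may draw many crossing edges.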

Update 2: a few more features

I have fixed some bugs and added a few more improvements. There is now a panel which can show the textual representation of the graphs that are displayed (right side of the picture below). There is also a new button to save a graph visualization to PNG. Moreover, there is a button to resize the canvas so as to be able to show larger graphs.

Conclusion

I hope that this blog post has been interesting. My goal was to show you an upcoming feature, which I think will be useful for those working on frequent subgraph mining. If you have some suggestions to improve this tool, you may let me know in the comments below. I will consider them. Also, I might still improve this tool before it is released.




SPMF upcoming feature: Algorithm Explorer

Today, I want to show you an upcoming feature of SPMF, which is a new tool called the Algorithm Explorer to explore the list of algorithms and tools offered in SPMF. It is actually a simple tool, but I think it can be useful, as there are many algorithms.

Note that this is a preview of the tool, as it will be released in the next version of SPMF (2.59).

To open the new Algorithm Explorer, in the GUI of SPMF, we have to choose:

Then, this will open the new tool called Algorithm Explorer, where algorithms are shown as a tree on the left, classified by categories, and we can see information about the selected algorithm on the right:

In the above picture, we selected the AFEM-Rules algorithm. Thus, we can see the name of the algorithm, the category, the authors of the implementation, the link to the example page, the input and output types as well as the parameters.

Searching for similar algorithms

Update: I also added two buttons that allow searching for algorithms that are similar to a selected algorithm. More precisely, we can search for (1) algorithms with the same input, output and mandatory parameters, and (2) algorithms with the same input and output but that may not have the same parameters.

For example, if we select the RuleGrowth algorithm for sequential rule mining on the left and click on this button:

It will highlight all algorithms that have the same input and output types as RuleGrowth:

And if we instead click on this button:

It will highlight the algorithms that not only have the same input and output types as RuleGrowth but also the same mandatory parameters:

In this case, we can notice that TRuleGrowth is not included anymore because although it has the same input type and output type as RuleGrowth, it has an extra parameter that is the window length.
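The two types of search can be sketched in a few lines of Python. This is only an illustration of the idea and not the code of the Algorithm Explorer: the AlgorithmInfo class, the two functions, and the small catalog (including the type names and parameter names) are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmInfo:
    """Hypothetical description of an algorithm, for illustration only."""
    name: str
    input_type: str
    output_type: str
    mandatory_params: tuple


def same_input_output(selected, catalog):
    """Other algorithms with the same input and output types as selected."""
    return [a for a in catalog
            if a != selected
            and a.input_type == selected.input_type
            and a.output_type == selected.output_type]


def same_input_output_and_params(selected, catalog):
    """Same as above, but the mandatory parameters must also be identical."""
    return [a for a in same_input_output(selected, catalog)
            if a.mandatory_params == selected.mandatory_params]


# Illustrative catalog (the values are assumptions made for this example)
rule_growth = AlgorithmInfo("RuleGrowth", "sequence database",
                            "sequential rules", ("minsup", "minconf"))
catalog = [
    rule_growth,
    AlgorithmInfo("TRuleGrowth", "sequence database",
                  "sequential rules", ("minsup", "minconf", "window size")),
    AlgorithmInfo("ERMiner", "sequence database",
                  "sequential rules", ("minsup", "minconf")),
    AlgorithmInfo("EFIM", "transaction database with utility values",
                  "high utility itemsets", ("minutil",)),
]

loosely_similar = [a.name for a in same_input_output(rule_growth, catalog)]
strictly_similar = [a.name
                    for a in same_input_output_and_params(rule_growth, catalog)]
```

With this catalog, TRuleGrowth appears in the first list but not in the second one, because of its extra window parameter. In this sketch, the selected algorithm itself is left out of the results.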

Let me show you one more example. Let’s say that I choose a high utility itemset mining algorithm like EFIM. Then, I can quickly find that many algorithms have the same input and output types and also the same mandatory parameters:

Conclusion

That is the current version of this tool. I will think about other potential improvements. If you have any suggestions, you may tell me in the comments below, either to add more functions or improve the user interface. I will try to take them into account, when I have time.



