The Data Blog

SPMF 2.60 is released!

Posted on 2024-04-21 by Philippe Fournier-Viger

This is a short message to announce that SPMF 2.60 is finally released!

This is a major version that contains many new things. The full list of changes can be found on the download page. The main improvements include 18 new algorithms, 21 new tools to visualize different types of data, several improvements to the user interface (some less visible than others), and several new tools, such as a workflow editor for running several algorithms one after the other and new tools for data generation and transformation. Here is a picture of a few of the new windows in the graphical user interface:

Besides, for developers of algorithms, a collection of new data structures optimized for primitive types (int, double, etc.) is provided in the package ca.pfv.spmf.datastructures.collections. They can replace several standard Java data structures to speed up algorithms or reduce memory usage. Here is a screenshot of some of those data structures:
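To give a rough idea of why such structures help, here is a minimal sketch of a growable list of int values. It is not the actual SPMF code: the class name IntListSketch and its methods are made up for this illustration. The point is only that values are stored directly in an int[] array, so no Integer objects are created as would happen with an ArrayList<Integer>.

```java
import java.util.Arrays;

/** Hypothetical illustration of a primitive list; NOT a class from SPMF. */
final class IntListSketch {
    private int[] data = new int[8];
    private int size = 0;

    /** Appends a value, doubling the backing array when it is full. */
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    /** Returns the value at the given position, without any unboxing. */
    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    /** Returns the number of stored values. */
    int size() {
        return size;
    }

    public static void main(String[] args) {
        IntListSketch list = new IntListSketch();
        for (int i = 0; i < 20; i++) {
            list.add(i * i);
        }
        System.out.println(list.size() + " values, last = " + list.get(19));
    }
}
```

The structures shipped with SPMF may be designed differently, so please look at the package itself for the real classes.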

I have also fixed several bugs in the software (thanks to all users who reported them). It is possible that some bugs remain, especially because there is a lot of new code. If you find any problems, please let me know at philfv AT qq DOT com. You can also send me your suggestions for improvements, if you have some ideas. 🙂 And if you want to contribute code to SPMF, please contact me (for example, if you want me to integrate your algorithm in the software).

Thanks again to all users of SPMF and the contributors, who support this project and make it better.

Posted in Data Mining, Data science, Java, Pattern Mining, spmf | Tagged algorithms, data mining, data science, fast, implementations, java, open source, pattern mining, software, spmf | Leave a comment

How to download an offline copy of the SPMF documentation?

Posted on 2024-04-15 by Philippe Fournier-Viger

Today, I will show you how to download an offline copy of the SPMF documentation.

In the upcoming version 2.60 of SPMF, you can run this algorithm to open the window of developer tools:

Then you can click here to open the tool to download an offline copy of the SPMF documentation:

This will open a window to start the download:

Then, you will have a local copy of the SPMF documentation on your computer and the main page is documentation.html:

If you want to download a copy of the SPMF documentation directly using Java code, here is how it is done:


import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * Copyright (c) 2022 Philippe Fournier-Viger
 *
 * This file is part of the SPMF DATA MINING SOFTWARE
 * (http://www.philippe-fournier-viger.com/spmf).
 *
 * SPMF is free software: you can redistribute it and/or modify it under the
 * terms of the GNU General Public License as published by the Free Software
 * Foundation, either version 3 of the License, or (at your option) any later
 * version.
 *
 * SPMF is distributed in the hope that it will be useful, but WITHOUT ANY
 * WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
 * A PARTICULAR PURPOSE. See the GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along with
 * SPMF. If not, see <http://www.gnu.org/licenses/>.
 */
/**
 * This is a tool to download an offline copy of the SPMF documentation.
 * 
 * @author Philippe Fournier-Viger
 *
 */
public class AlgoSPMFDownloadDoc {

	/** The URLs that have been already downloaded */
	Set<String> alreadyDownloaded;

	/** Method to run this algorithm
	 */
	public void runAlgorithm() {
		alreadyDownloaded = new HashSet<>();
		String mainUrl = "https://philippe-fournier-viger.com/spmf/index.php?link=documentation.php";
		String folderPath = "doc";
		createDirectory(folderPath);
		savePage(mainUrl, folderPath + "/documentation.html", mainUrl);

		BufferedReader br = null;
		try {
			// Download the main documentation page
			URL url = new URL(mainUrl);
			HttpURLConnection conn = (HttpURLConnection) url.openConnection();
			br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
			String inputLine;
			StringBuilder content = new StringBuilder();
			while ((inputLine = br.readLine()) != null) {
				content.append(inputLine);
				content.append(System.lineSeparator());
			}

			// Replace all .php references with .html in the content
			String updatedContent = content.toString().replaceAll("\\.php", ".html");

			// Save CSS files
			Pattern cssPattern = Pattern.compile("href=\"(.*?\\.css)\"");
			Matcher cssMatcher = cssPattern.matcher(updatedContent);
			while (cssMatcher.find()) {
				String cssLink = cssMatcher.group(1);
				savePage(cssLink, folderPath + "/" + cssLink.substring(cssLink.lastIndexOf('/') + 1), mainUrl);
			}

			// Save pages and images that start with "Example"
			Pattern examplePattern = Pattern.compile("<a href=\"([^\"]+)\">Example");
			Matcher exampleMatcher = examplePattern.matcher(updatedContent);
			while (exampleMatcher.find()) {
				String exampleLink = exampleMatcher.group(1);
				savePage(exampleLink, folderPath + "/" + exampleLink, mainUrl);

			}

		} catch (MalformedURLException e) {
			System.err.println("The URL provided is not valid: " + mainUrl);
			e.printStackTrace();
		} catch (IOException e) {
			System.err.println("An I/O error occurred while processing the URL: " + mainUrl);
			e.printStackTrace();
		} finally {
			if (br != null) {
				try {
					br.close();
				} catch (IOException e) {
					System.err.println("An error occurred while closing the BufferedReader.");
					e.printStackTrace();
				}
			}
		}
	}

	/**
	 * Method to create a folder
	 * @param folderPath the path
	 */
	private void createDirectory(String folderPath) {
		try {
			Files.createDirectories(Paths.get(folderPath));
		} catch (IOException e) {
			System.err.println("An error occurred while creating the directory: " + folderPath);
			e.printStackTrace();
		}
	}

	/**
	 * Method to save a webpage
	 * @param urlString the url
	 * @param filePath the filepath where it should be saved
	 * @param baseUri the base URI
	 */
	private void savePage(String urlString, String filePath, String baseUri) {
		if (alreadyDownloaded.contains(urlString)) {
			return;
		}
		alreadyDownloaded.add(urlString);

		BufferedReader reader = null;
		try {
			URL url;
			// Check if the URL is absolute or relative
			if (urlString.startsWith("http://") || urlString.startsWith("https://")) {
				url = new URL(urlString);
			} else {
				// Convert relative URL to absolute URL
				URI base = new URI(baseUri);
				url = base.resolve(urlString).toURL();
			}

			HttpURLConnection conn = (HttpURLConnection) url.openConnection();
			reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
			StringBuilder contentBuilder = new StringBuilder();
			String inputLine;
			while ((inputLine = reader.readLine()) != null) {
				contentBuilder.append(inputLine);
				contentBuilder.append(System.lineSeparator());
			}

			// Change the file extension from .php to .html
			filePath = filePath.replace(".php", ".html");

			// Update links in the content
			String content = contentBuilder.toString();
			content = content.replaceAll("href=\"([^\"]+)\\.php\"", "href=\"$1.html\"");
			content = content.replaceAll("https://www.philippe-fournier-viger.com/spmf/index.php\\?link=documentation\\.html", "documentation.html");

	        // Find and save images
	        Pattern imgPattern = Pattern.compile("src=\"([^\"]+\\.(png|jpg))\"");
	        Matcher imgMatcher = imgPattern.matcher(content);
	        while (imgMatcher.find()) {
	            String imgLink = imgMatcher.group(1);
	            String imgName = imgLink.substring(imgLink.lastIndexOf('/') + 1);
	            saveImage(imgLink, "doc/" + imgName, baseUri);
	        }
	        
			// Save the updated content to file
			Files.write(Paths.get(filePath), content.getBytes(StandardCharsets.UTF_8));
		} catch (URISyntaxException e) {
			System.err.println("The URI provided is not valid: " + urlString);
			e.printStackTrace();
		} catch (MalformedURLException e) {
			System.err.println("A malformed URL has occurred for the URI: " + urlString);
			e.printStackTrace();
		} catch (IOException e) {
			System.err.println("An I/O error occurred while saving the page: " + urlString);
			e.printStackTrace();
		} finally {
			if (reader != null) {
				try {
					reader.close();
				} catch (IOException e) {
					System.err.println("An error occurred while closing the BufferedReader.");
					e.printStackTrace();
				}
			}
		}
	}
	
	/**
	 * Method to save an image
	 * @param urlString the url
	 * @param filePath the filepath where it should be saved
	 * @param baseUri the base URI
	 */
	private void saveImage(String urlString, String filePath, String baseUri) {
		if (alreadyDownloaded.contains(urlString)) {
			return;
		}
		alreadyDownloaded.add(urlString);
		
	    InputStream in = null;
	    try {
	        URL url;
	        // Check if the URL is absolute or relative
	        if (urlString.startsWith("http://") || urlString.startsWith("https://")) {
	            url = new URL(urlString);
	        } else {
	            // Convert relative URL to absolute URL
	            URI base = new URI(baseUri);
	            url = base.resolve(urlString).toURL();
	        }
	        
	        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
	        in = conn.getInputStream();
	        Files.copy(in, Paths.get(filePath), StandardCopyOption.REPLACE_EXISTING);
	    } catch (URISyntaxException e) {
	        System.err.println("The URI provided is not valid: " + urlString);
	        e.printStackTrace();
	    } catch (MalformedURLException e) {
	        System.err.println("A malformed URL has occurred for the URI: " + urlString);
	        e.printStackTrace();
	    } catch (IOException e) {
	        System.err.println("An I/O error occurred while saving the image: " + urlString);
	        e.printStackTrace();
	    } finally {
	        if (in != null) {
	            try {
	                in.close();
	            } catch (IOException e) {
	                System.err.println("An error occurred while closing the InputStream.");
	                e.printStackTrace();
	            }
	        }
	    }
	}

}
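The two core ideas of the code above, rewriting .php links to .html and turning relative links into absolute URLs, can be tried in isolation with the small sketch below. The class LinkRewriteSketch and the example.org address are made up for this illustration and are not part of SPMF.

```java
import java.net.URI;

/** Hypothetical stand-alone demo of link rewriting; NOT part of SPMF. */
final class LinkRewriteSketch {

    /** Rewrites href targets ending in .php so that they end in .html. */
    static String rewritePhpLinks(String html) {
        // The dot is escaped so that only a literal ".php" is matched
        return html.replaceAll("href=\"([^\"]+)\\.php\"", "href=\"$1.html\"");
    }

    /** Resolves a possibly relative link against the page it was found on. */
    static String toAbsolute(String baseUrl, String link) {
        return URI.create(baseUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"Apriori.php\">Example 1</a> <link href=\"style.css\">";
        System.out.println(rewritePhpLinks(html));

        // Assumed example address, used only to show how resolution works
        String base = "https://example.org/spmf/index.php?link=documentation.php";
        System.out.println(toAbsolute(base, "Apriori.php"));
    }
}
```

Note that resolving a relative link drops the query string of the base page and keeps only its folder, which is why a relative link found on the main page ends up in the same folder as index.php.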

Hope that this blog post has been interesting. The new version 2.60 of SPMF will be released in the next few days.

Posted in spmf | Tagged data mining, documentation, java, pattern mining, python, software, spmf | Leave a comment

Some interesting statistics about SPMF

Posted on 2024-04-10 by Philippe Fournier-Viger

While I am preparing the next version of the Java SPMF data mining software (2.60), here are some interesting statistics about the project, which I have generated directly from the metadata provided by SPMF:

The number of algorithms implemented per person (based on metadata)
Note: this list is generated automatically from the metadata of each algorithm in SPMF using the class DescriptionOfAlgorithm. Some author names are spelled in multiple ways, and the list may contain some errors. The full list of contributors of SPMF is displayed on the SPMF website.

Philippe Fournier-Viger	206
Yang Peng	12
Antonio Gomariz Penalver	9
Jayakrushna Sahoo	6
Jerry Chun-Wei Lin	5
Lu Yang	5
Chen YangMing	5
Wei Song et al.	5
Yangming Chen	5
Wei Song	4
Ting Li	4
Azadeh Soltani	4
Peng Yang and Philippe Fournier-Viger	4
Nader Aryabarzan	4
Vincent M. Nofong modified from Philippe Fournier-Viger	3
Cheng-Wei Wu et al.	3
Zhihong Deng	3
Prashant Barhate	3
Chaomin Huang et al.	3
Jiaxuan Li	3
Zhitian Li	3
Antonio Gomariz Penalver & Philippe Fournier-Viger	3
Yimin Zhang	2
Chaomin Huang	2
Nouioua et al.	2
Ting Li et al.	2
Philippe Fournier-Viger and Yuechun Li	2
Song et al.	2
Fournier-Viger et al.	2
Saqib Nawaz et al.	2
Chao Cheng and Philippe Fournier-Viger	2
Zevin Shaul et al.	2
Alan Souza	2
Rathore et al.	2
Bay Vo et al.	2
Junya Li	2
Ryan Benton and Blake Johns	2
Siddharth Dawar et al.	2
Yanjun Yang	2
Siddhart Dawar et al.	2
Huang et al.	1
M.	1
C.W. Wu et al.	1
Philippe Fournier-Viger and Cheng-Wei Wu	1
Sacha Servan-Schreiber	1
Dhaval Patel	1
jnfrancis	1
Cheng-Wei. et al.	1
Ganghuan He and Philippe Fournier-Viger	1
Siddharth Dawar	1
Improvements by Nouioua et al.	1
Philippe Fournier-Viger and Chao Cheng	1
Yang Peng et al.	1
Salvemini E	1
Java conversion by Xiang Li and Philippe Fournier-Viger	1
Alex Peng et al.	1
Hoang Thanh Lam	1
Souleymane Zida	1
F.	1
Shifeng Ren	1
Lanotte	1
github: limuhangk	1
Youxi Wu et al.	1
Hazem El-Raffiee	1
Jiakai Nan	1
Ahmed El-Serafy	1
Souleymane Zida and Philippe Fournier-Viger	1
Feremans et al.	1
Han J.	1
Shi-Feng Ren	1
Fumarola F	1
Vikram Goyal	1
P. F.	1
Petijean et al.	1
Srinivas Paturu	1
Malerba D	1
& Malerba	1
Ashish Sureka	1
Fumarola	1
Ying Wang and Peng Yang and Philippe Fournier-Viger	1
Sabarish Raghu	1
Wu et al.	1
D.	1
Srikumar Krishnamoorty	1
Siddharth Dawar et al	1
Ceci	1
Wu	1
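A tally like the one above can be obtained by counting how often each author string occurs and sorting by count. The sketch below is only an illustration of that idea: it does not use the DescriptionOfAlgorithm class, and the class name AuthorTallySketch and the sample data are made up. It also shows why spelling variants appear as separate rows: the strings are compared as they are.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tally of author strings; NOT the code used to build the list. */
final class AuthorTallySketch {

    /** Counts how many times each author string occurs, most frequent first. */
    static List<Map.Entry<String, Integer>> tally(List<String> authorFields) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String author : authorFields) {
            counts.merge(author.trim(), 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Stable sort: authors with equal counts keep their order of first appearance
        sorted.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()));
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample: "Yang Peng" and "Peng Yang" are counted separately
        List<String> authors = Arrays.asList("Philippe Fournier-Viger", "Yang Peng",
                "Philippe Fournier-Viger", "Peng Yang", "Yang Peng", "Philippe Fournier-Viger");
        for (Map.Entry<String, Integer> entry : tally(authors)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```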

The number of algorithms per category

HIGH-UTILITY PATTERN MINING	83
FREQUENT ITEMSET MINING	54
SEQUENTIAL PATTERN MINING	48
TOOLS – DATA VIEWERS	22
TIME SERIES MINING	16
ASSOCIATION RULE MINING	16
TOOLS – DATA TRANSFORMATION	15
PERIODIC PATTERN MINING	13
EPISODE MINING	10
EPISODE RULE MINING	10
CLUSTERING	10
SEQUENTIAL RULE MINING	10
GRAPH PATTERN MINING	6
TOOLS – DATA GENERATORS	5
TOOLS – STATS CALCULATORS	4
TOOLS – SPMF GUI	4
TOOLS – RUN EXPERIMENTS	1
PRIVACY-PRESERVING DATA MINING	1

The number of algorithms per type

DATA_MINING	259
DATA_PROCESSOR	30
DATA_VIEWER	25
DATA_GENERATOR	5
OTHER_GUI_TOOL	4
DATA_STATS_CALCULATOR	4
EXPERIMENT_TOOL	1

The number of algorithms for each input data type

Transaction database (194)
Simple transaction database (80)
Transaction database with utility values (77)
Sequence database (73)
Simple sequence database (48)
Transaction database with timestamps (17)
Time series database (16)
Sequence database with timestamps (9)
Database of double vectors (8)
Labeled graph database (6)
Graph database (6)
Transaction database with utility values and time (5)
Multi-dimensional sequence database with timestamps (4)
Text file (4)
Multi-dimensional sequence database (4)
Time interval sequence database (3)
Sequence database with utility values (3)
Transaction database with utility values and taxonomy (3)
Transaction database with shelf-time periods and utility values (3)
Transaction database with utility values (HUQI) (3)
Sequence database with cost and binary utility (3)
Simple time interval sequence database (3)
Frequent closed itemsets (3)
Sequence database with cost and numeric utility (2)
Transaction database with utility values skymine format (2)
Transaction database with profit information (2)
Uncertain transaction database (2)
ARFF file (2)
Transaction database with utility and cost values (2)
Sequence database with strings (2)
Dynamic Attributed Graph (2)
Simple sequence database with strings (2)
Sequence database with utility and probability values (2)
Cost sequence database (2)
Sequential patterns (1)
Set of text documents (1)
Sequence database in non SPMF format (1)
Clusters (1)
Sequence (1)
Single sequence (1)
Transaction database with utility values (MEMU) (1)
Transaction database in non SPMF format (1)

The number of algorithms for each output data type

High-utility patterns (91)
High-utility itemsets (60)
Frequent patterns (56)
Sequential patterns (51)
Frequent itemsets (37)
Frequent sequential patterns (30)
Database of instances (22)
Association rules (16)
Episodes (15)
Time series database (14)
Periodic patterns (13)
Transaction database (12)
Periodic frequent patterns (12)
Sequential rules (11)
Episode rules (10)
Frequent closed itemsets (9)
Frequent closed sequential patterns (8)
Simple transaction database (8)
Sequence database (8)
Closed itemsets (8)
Top-k High-utility itemsets (7)
Closed high-utility itemsets (7)
Simple sequence database (6)
Frequent Sequential patterns (6)
Clusters (6)
Closed patterns (6)
Frequent episodes (6)
Frequent sequential rules (5)
Rare itemsets (5)
Rare patterns (5)
High average-utility itemsets (5)
Frequent episode rules (5)
Skyline patterns (4)
Subgraphs (4)
Generator patterns (4)
High-Utility episodes (4)
Generator itemsets (4)
Cost-efficient Sequential patterns (3)
Skyline High-utility itemsets (3)
Frequent sequential generators (3)
Frequent subgraphs (3)
Frequent itemsets with multiple thresholds (3)
Local Periodic frequent itemsets (3)
Correlated patterns (3)
Quantitative high utility itemsets (3)
Maximal patterns (2)
Cross-Level High-utility itemsets (2)
Multi-dimensional frequent closed sequential patterns (2)
Maximal itemsets (2)
High-utility probability sequential patterns (2)
Frequent maximal sequential patterns (2)
Density-based clusters (2)
Periodic frequent itemsets common to multiple sequences (2)
On-shelf high-utility itemsets (2)
Multi-dimensional frequent closed sequential patterns with timestamps (2)
Top-k frequent sequential rules (2)
Frequent maximal itemsets (2)
Frequent closed and generator itemsets (2)
Closed association rules (2)
Frequent time interval sequential patterns (2)
Perfectly rare itemsets (2)
Sequence Database with timestamps (2)
Top-k frequent sequential patterns (2)
Top-k High-Utility episodes (2)
Transaction database with utility values (2)
Closed and generator patterns (2)
Periodic high-utility itemsets (2)
Minimal rare itemsets (2)
Trend patterns (2)
Correlated High-utility itemsets (2)
Generator high-utility itemsets (2)
Association rules with lift and multiple support thresholds (2)
Rare correlated itemsets common to multiple sequences (1)
Productive Periodic frequent itemsets (1)
Peak high-utility itemsets (1)
Indirect association rules (1)
Top-k non-redundant association rules (1)
Top-k class association rules (1)
Database of double vectors (1)
Uncertain frequent itemsets (1)
Non-redundant Periodic frequent itemsets (1)
Transaction database with utility values and time (1)
Top-k Stable Periodic frequent itemsets (1)
Rare correlated itemsets (1)
Frequent sequential rules with strings (1)
Top-k frequent episodes (1)
Minimal itemsets (1)
High-utility association rules (1)
Uncertain patterns (1)
Ordered frequent sequential rules (1)
Top-k association rules (1)
High-utility itemsets with length constraints (1)
High-utility generator itemsets (1)
Multi-dimensional frequent sequential patterns with timestamps (1)
Local high-utility itemsets (1)
High-utility sequential rules (1)
Periodic frequent itemsets (1)
Density-based cluster ordering of points (1)
Frequent sequential patterns with occurrences (1)
Frequent sequential patterns with timestamps (1)
Frequent closed sequential patterns with timestamps (1)
Minimal high-utility itemsets (1)
Significant Trend Sequences (1)
Frequent fuzzy itemsets (1)
Periodic rare patterns (1)
Top-k Frequent subgraphs (1)
Minimal patterns (1)
Stable Periodic frequent itemsets (1)
Maximal high-utility itemsets (1)
Self-Sufficient Itemsets (1)
Text clusters (1)
Multi-dimensional frequent sequential patterns (1)
Top-k frequent non-redundant sequential rules (1)
Cost transaction database (1)
Hierarchical clusters (1)
Top-k frequent sequential patterns with leverage (1)
Compressing sequential patterns (1)
Minimal non-redundant association rules (1)
Sporadic association rules (1)
High-utility rules (1)
Locally trending high-utility itemsets (1)
Skyline Frequent High-utility itemsets (1)
High-utility sequential patterns (1)
Progressive Frequent Sequential patterns (1)
Erasable itemsets (1)
Attribute Evolution Rules (1)
Generators of high-utility itemsets (1)
Multiple Frequent fuzzy itemsets (1)
Correlated itemsets (1)
Multi-Level High-utility itemsets (1)
Association rules with lift (1)
Top-k sequential patterns with quantile-based cohesion (1)
Erasable patterns (1)
Frequent generator itemsets (1)
Frequent high-utility itemsets (1)

Conclusion

Hope that this is interesting 🙂 If you have any comments, please leave them below.

Posted in Data Mining, Data science, spmf | Tagged big data, data mining, data science, itemset mining, pattern mining, spmf | Leave a comment

Sneak peek at the new user interface of SPMF (part 3)

Posted on 2024-04-03 by Philippe Fournier-Viger

Today, I would like to talk to you about another upcoming feature of the next version of SPMF (2.60), which will be released soon. It is a Workflow Editor that allows the user to select multiple algorithms from the user interface and run them one after the other, such that each algorithm takes as input the output of the preceding algorithm. This will remove one limitation of SPMF, which is that the user can only run one algorithm at a time from the user interface.

Here is a brief overview of this new feature. The user interface looks like this:

On the left, there is a space to visualize a workflow consisting of multiple algorithms and on the right details are displayed about the current algorithm.

To use it, first, we click on “Add an algorithm”. This will create a new node for an algorithm like this:

Then, on the right, we need to select the algorithm and set its parameters. For example, I will choose Eclat and set its parameter to 0.8:

Then, as you can see on the left, two orange boxes have appeared that symbolize the input and output of the algorithm. I can click on the input box and then choose an input file:

Then, I could also set the output file name in the same way. After that, I can add another algorithm to be run after Eclat. I can click again on “Add an algorithm” and choose some algorithm:

This means that after running Eclat, the output file will be opened by the system text editor.

Now, the workflow has two algorithms. I can click on another button called “Run” (which I did not show until now) to execute the workflow, and information about the execution will be displayed in a console:

This is just a preview of a new feature of SPMF called the Workflow Editor. It already works, but there are still a few bugs that need to be fixed before it can be released. The user interface may change in the final release, and if you have any suggestions, please leave your comments below!
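For readers who prefer code, the idea of a workflow, where each algorithm takes as input the output of the preceding one, can be sketched as follows. This is only an illustration and not how the Workflow Editor is implemented: the class WorkflowSketch is made up, and each step is simplified to a function that receives the path of its input file and returns the path of the output file that it produced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical chain of steps; NOT the implementation of the Workflow Editor. */
final class WorkflowSketch {
    /** Each step maps the path of its input file to the path of its output file. */
    private final List<UnaryOperator<String>> steps = new ArrayList<>();

    /** Adds a step at the end of the workflow. */
    WorkflowSketch addStep(UnaryOperator<String> step) {
        steps.add(step);
        return this;
    }

    /** Runs all steps in order, feeding each output to the next step. */
    String run(String inputPath) {
        String current = inputPath;
        for (UnaryOperator<String> step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Placeholder steps: a real step would run an algorithm and write a file
        WorkflowSketch workflow = new WorkflowSketch()
                .addStep(in -> in + ".patterns.txt")
                .addStep(in -> in + ".viewed");
        System.out.println(workflow.run("input.txt"));
    }
}
```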

Posted in Data Mining, open-source, spmf | Tagged algorithm, association rule, data, data mining, graph mining, gui, itemset, pattern, pattern mining, sequential pattern mining, spmf | Leave a comment

ChatGPT, LLMs and homework

Posted on 2024-03-29 by Philippe Fournier-Viger

Today, I want to talk briefly about ChatGPT and similar large language models (LLMs) and how they are used by students in universities. From what I observe, I believe that many students are using LLMs in universities nowadays. Among them, some students use LLMs to get ideas and suggestions on their work. But other students rather use LLMs to avoid working and to quickly generate reports and essays, as well as to write code for their assignments. These students often believe that text generated by LLMs cannot be detected by teachers.

But this is false. From my experience, it is quite easy to know which documents submitted by students have been generated by an LLM because of three main factors:

First, there is the writing style. Text written by LLMs will often be written too well, which will raise suspicions. Then, after that, the teacher might look more closely at the document to see if there are other problems.
Second, texts generated by LLMs may look real, but when a teacher looks at them closely, the teacher can find that they often contain fake information and other inconsistencies, which makes the teacher realize that the content is fake. For example, I know a professor in another university who asked students to write project reports in a course, and then he found that several reports contained a reference section with research papers that did not exist. It is then obvious that the text was generated by an LLM and that the fake bibliography was a so-called “hallucination” of the LLM. Such signs are clear indicators that an LLM was used.
Third, text generated by LLMs will often not follow the requirements. For example, a student may use an LLM to generate a very convincing essay, but that essay may still fail to meet all the homework’s requirements. Thus, the student may still lose points for not following the requirements.

Thus, what I want to say is that students using LLMs to do their homework are taking risks as LLMs can easily generate fake, inconsistent and incorrect content, which may also not meet the requirements.

This was just a short blog post to talk about this topic. Hope it has been interesting. Please share your perspective, opinions, or comments in the comment section, below.

Posted in Academia, Plagiarism | Tagged academia, chatgpt, llm, plagiarism, university | Leave a comment

When ChatGPT is used to write papers…

Posted on 2024-03-17 by Philippe Fournier-Viger

Today, I want to share with you something funny but also alarming: some papers published in academic journals contain text indicating that parts were apparently written by LLMs.

The first example is the paper “The three-dimensional porous mesh structure of Cu-based metal-organic-framework – aramid cellulose separator enhances the electrochemical performance of lithium metal anode batteries” in the Elsevier journal Surfaces and Interfaces. The first sentence of the introduction is “Certainly, here is a possible introduction for your topic:”

It is quite surprising that authors and reviewers did not see this!

A second example of this problem is the case report “Successful management of an iatrogenic portal vein and hepatic artery injury in a 4-month-old female patient: A case report and literature review”, published in the open-access Elsevier journal Radiology Case Reports:

Again, it is surprising that this has passed through the review process unnoticed by the editor, reviewers or authors.

Have you found other similar cases? If so please share in the comment section!

Posted in Academia, Machine Learning | Tagged chatgpt, gpt, large language models, llm, text generation | Leave a comment

Sneak peek at the new user interface of SPMF (part 2)

Posted on 2024-03-14 by Philippe Fournier-Viger

Today, I will continue to show you some upcoming features of SPMF 2.60, which is still in development. This new version of SPMF should be released in the coming weeks. The new feature that I will talk about today is the Timeline Viewer. It is a powerful tool for visualizing temporal data. Let me now show you this in more detail.

The Timeline Viewer can first display event sequences, which are files taken as input by episode mining algorithms, among others. For example, we can use the Timeline Viewer to see a visual representation of this input file:

@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
1|1
1|2
1 2|3
1|6
1 2|7
3|8
2|9
4|11
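As a side note for developers, a file like this one can be read with a few lines of Java. The sketch below is not the SPMF parser: the class EventSequenceSketch is made up, and it assumes, based on the sample above, that each data line has the form "items|timestamp" with item ids separated by spaces, and that lines starting with @ITEM give the name of an item. It does no error handling for malformed lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for the event sequence format; NOT the SPMF parser. */
final class EventSequenceSketch {

    /** One data line of the file: the items that occur at a given timestamp. */
    static final class Event {
        final long timestamp;
        final List<Integer> items = new ArrayList<>();

        Event(long timestamp) {
            this.timestamp = timestamp;
        }
    }

    final Map<Integer, String> itemNames = new LinkedHashMap<>();
    final List<Event> events = new ArrayList<>();

    /** Parses the lines of a file (assumed format: "items|timestamp"). */
    static EventSequenceSketch parse(List<String> lines) {
        EventSequenceSketch result = new EventSequenceSketch();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("@ITEM=")) {
                // "@ITEM=1=apple" means that item 1 is named "apple"
                String[] parts = line.split("=", 3);
                result.itemNames.put(Integer.parseInt(parts[1]), parts[2]);
            } else if (!line.startsWith("@")) {
                // "1 2|3" means that items 1 and 2 occur at timestamp 3
                int bar = line.indexOf('|');
                Event event = new Event(Long.parseLong(line.substring(bar + 1).trim()));
                for (String id : line.substring(0, bar).trim().split("\\s+")) {
                    event.items.add(Integer.parseInt(id));
                }
                result.events.add(event);
            }
            // Other lines starting with '@' are metadata and are skipped
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("@CONVERTED_FROM_TEXT", "@ITEM=1=apple",
                "@ITEM=2=orange", "1|1", "1 2|3");
        EventSequenceSketch data = parse(lines);
        for (Event event : data.events) {
            StringBuilder names = new StringBuilder();
            for (int id : event.items) {
                names.append(' ').append(data.itemNames.get(id));
            }
            System.out.println("t=" + event.timestamp + ":" + names);
        }
    }
}
```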

To visualize this file, we first select the input file “contextEMMA.txt” using SPMF (1) and then click on the new “View dataset” button (2):

This opens a table representation of the dataset:

Then, we click on the “View with Timeline Viewer” button (3) to see the visual representation:

The Timeline Viewer provides several options such as exporting to image, changing the tick interval, and the minimum and maximum timestamps, as well as applying a scaling ratio on the X axis. Moreover, the Timeline Viewer has a built-in custom algorithm to automatically determine the best parameters to ensure a good visualization. Here are some of the options available:

The second feature of the Timeline Viewer is to view time-interval datasets such as those taken as input by the FastTIRP and VertTIRP algorithms (to be released in SPMF 2.60). To use this feature, we again select an input file (1) and click on the “View dataset” button (2):

Then, we obtain a table representation of the dataset and click on the “View with Timeline Viewer” button (3) to see the visual representation:

The result is like this:

At the bottom, we have the timeline. On the left side, we can see the sequence IDs (S0, S1, S2, S3…), and the time intervals of each sequence are depicted using a different color for easier visualization. We can also adjust various parameters to customize the visualization and export the picture as a PNG file.

Here is another example with a smaller data file containing three time interval sequences:

OK, so that’s all for today. I just wanted to give you a preview of upcoming features in SPMF. Hope that it is interesting. There are still some bugs to be fixed and other improvements to be made, so that feature may still change a bit before it is released.

By the way, the Timeline Viewer is built completely from scratch to ensure efficiency (which is an important design goal of SPMF). Building a timeline viewer was quite challenging. There are many special cases to consider and tricky aspects to handle to ensure a good visualization.

If you have any comments or suggestions about this feature or what you would like to have in SPMF, please leave a comment below or send me a message.

—
Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in spmf | Tagged algorithm, data mining, episode, episode mining, event sequence, open source, pattern mining, patterns, spmf, time interval data, timeline, timeline viewer, tirp, visualization | Leave a comment

Sneak peek at the new user interface of SPMF (part 1)

Posted on 2024-02-27 by Philippe Fournier-Viger

I am currently working on the next version of SPMF, which will be called 2.60. There will be several improvements to the user interface of SPMF. Here is an overview of some of the improvements to give you a sneak peek at what is coming. Note that more changes may still occur before the next version is released ;-P

The new VIEW button is one of the most important new features of the upcoming SPMF 2.60. It provides many different views of various types of input files. For example, if we open an input file for high utility itemset mining, the view is like this:

There are also many other viewers that are integrated in the new version of SPMF, that cover all the main types of data available in SPMF.

Hope that this is interesting. This is just to give you a preview of what is coming in SPMF. Of course, this might still be a little different when it is released as I am still thinking about other possible improvements.

Posted in Big data, Data Mining, Data science, spmf

UDML 2024 Accepted papers

Posted on 2024-02-18 by Philippe Fournier-Viger

Today, I want to talk to you about the upcoming UDML 2024 workshop at the PAKDD 2024 conference. This year is the 6th edition of the UDML workshop. I am happy to say that we received a record number of submissions this year (23 submissions), which shows that the workshop and the research direction of utility mining and learning are doing well.

Given the number of submissions, the selection process was quite competitive: there were many good papers, and some could not be accepted even though they were actually very good.

The list of the 10 accepted papers is as follows:

This will certainly be a very interesting workshop this year at PAKDD.

Posted in Conference, Data Mining, Data science, Pattern Mining, Utility Mining | Tagged data mining, itemset mining, pakdd, pattern mining, udml, workshop

SPMF 2.60 is coming soon!

Posted on 2024-02-05 by Philippe Fournier-Viger

Today, I want to talk a little bit about the next version of SPMF, which is coming very soon. Here are some highlights of the upcoming features:

1) A Memory Viewer to help monitor the performance of algorithms in real-time:

In addition, the popular MemoryLogger class of SPMF is improved: it can now save all recorded memory values to a file when it is set in recording mode and a file path is provided. This is done using two new methods, “startRecordingMode” and “stopRecordingMode”. While recording, the MemoryLogger writes the memory usage to the file every time that an algorithm calls the checkMemory method. Recording stops when the stopRecordingMode method is called.
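To picture how such a recording mode behaves, here is a small stand-alone Java sketch. It is not the source code of SPMF's MemoryLogger: the class name `RecordingMemoryLoggerSketch`, the use of a `Path` parameter, the one-value-per-line file layout, and the megabyte unit are all assumptions made for illustration. Only the method names startRecordingMode, stopRecordingMode and checkMemory come from the description above.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a memory logger with a recording mode (NOT SPMF's actual class). */
final class RecordingMemoryLoggerSketch {
    private double maxMemoryMb = 0;
    private BufferedWriter writer; // non-null only while recording

    /** Start writing every checked memory value to the given file (assumed layout: one value per line). */
    public void startRecordingMode(Path file) throws IOException {
        stopRecordingMode(); // close any previous recording first
        writer = Files.newBufferedWriter(file);
    }

    /** Stop recording and close the file, if a recording is active. */
    public void stopRecordingMode() throws IOException {
        if (writer != null) {
            writer.close();
            writer = null;
        }
    }

    /** Measure the current heap usage in MB, update the maximum, and record the value if recording. */
    public double checkMemory() throws IOException {
        Runtime rt = Runtime.getRuntime();
        double usedMb = (rt.totalMemory() - rt.freeMemory()) / 1024d / 1024d;
        maxMemoryMb = Math.max(maxMemoryMb, usedMb);
        if (writer != null) {
            writer.write(Double.toString(usedMb));
            writer.newLine();
        }
        return usedMb;
    }

    public double getMaxMemory() {
        return maxMemoryMb;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("memory", ".txt");
        RecordingMemoryLoggerSketch logger = new RecordingMemoryLoggerSketch();
        logger.startRecordingMode(file);
        for (int i = 0; i < 3; i++) {
            int[] work = new int[100_000]; // simulate some allocation done by an algorithm
            work[0] = i;
            logger.checkMemory();          // each call appends one value to the file
        }
        logger.stopRecordingMode();
        System.out.println(Files.readAllLines(file).size() + " values recorded in " + file);
    }
}
```

The point of the design is that an algorithm does not need to change: it keeps calling checkMemory as before, and the values are written to a file only while a recording is active.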

2) A tool to generate cluster datasets using different data distributions, such as the Normal and Uniform distributions. Here are some screenshots of it:

3) A simple tool to visualize transaction datasets. This tool is simple but can be useful for quickly exploring a dataset and seeing its content. It provides various information. This is an early version, and more features will be considered.

The tool has two visualization features, to view the frequency distribution of transactions according to their lengths, as well as the frequency distribution of items according to their support:
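To make these two distributions concrete, here is a minimal Java sketch that computes them from a few transactions. It is only an illustration and not the code of the SPMF tool. It assumes the usual text layout of transaction files (one transaction per line, items as integers separated by spaces, and lines starting with `#`, `%` or `@` skipped as comments or metadata). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of SPMF) computing two simple distributions of a transaction dataset. */
final class TransactionStatsSketch {

    /** Maps each transaction length to the number of transactions having that length. */
    public static Map<Integer, Integer> lengthDistribution(List<String> lines) {
        Map<Integer, Integer> dist = new TreeMap<>();
        for (String line : lines) {
            if (isSkipped(line)) continue;
            dist.merge(line.trim().split("\\s+").length, 1, Integer::sum);
        }
        return dist;
    }

    /** Maps each support value to the number of items having exactly that support. */
    public static Map<Integer, Integer> supportDistribution(List<String> lines) {
        Map<Integer, Integer> supportOfItem = new TreeMap<>();
        for (String line : lines) {
            if (isSkipped(line)) continue;
            // distinct(): an item is counted at most once per transaction
            Arrays.stream(line.trim().split("\\s+"))
                  .map(Integer::valueOf)
                  .distinct()
                  .forEach(item -> supportOfItem.merge(item, 1, Integer::sum));
        }
        Map<Integer, Integer> dist = new TreeMap<>();
        for (int support : supportOfItem.values()) {
            dist.merge(support, 1, Integer::sum);
        }
        return dist;
    }

    /** Assumed convention: empty lines and lines starting with '#', '%' or '@' carry no transaction. */
    private static boolean isSkipped(String line) {
        String t = line.trim();
        return t.isEmpty() || t.charAt(0) == '#' || t.charAt(0) == '%' || t.charAt(0) == '@';
    }

    public static void main(String[] args) {
        List<String> db = List.of("1 2 3", "1 2", "2 3 4", "2");
        System.out.println("length  -> number of transactions: " + lengthDistribution(db));
        System.out.println("support -> number of items:        " + supportDistribution(db));
    }
}
```

For the four toy transactions in `main`, there is one transaction of length 1, one of length 2 and two of length 3. Item 2 has a support of 4, items 1 and 3 have a support of 2, and item 4 has a support of 1.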

4) A simple tool to visualize sequence datasets. This is similar to the above tool but for sequence datasets.

5) A new tool to visualize the frequency distribution of patterns found by an algorithm. To use this feature, when running an algorithm, select the “Pattern viewer” for opening the output file. Then, select the support #SUP and click “View”. This will open a new window that displays the frequency distribution of support values, as shown below. This feature also works with other measures besides the support, such as the confidence and the utility.
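The idea behind such a distribution can be sketched in a few lines of Java. This is not the code of the Pattern viewer, only an illustration under the assumption that each line of the output file ends with measures written as `#NAME: value` (for example `1 2 #SUP: 3`). The class name and method name are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of SPMF) counting how many patterns share each value of a measure. */
final class MeasureDistributionSketch {

    /** Returns value -> number of patterns, for a measure such as "#SUP" or "#CONF". */
    public static Map<Double, Integer> distribution(List<String> outputLines, String measure) {
        // Matches e.g. "#SUP: 3" or "#CONF: 0.75" (assumed "#NAME: value" layout)
        Pattern p = Pattern.compile(
                Pattern.quote(measure) + ":\\s*([-+]?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)");
        Map<Double, Integer> dist = new TreeMap<>();
        for (String line : outputLines) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                dist.merge(Double.parseDouble(m.group(1)), 1, Integer::sum);
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        List<String> output = List.of("1 #SUP: 3", "2 #SUP: 4", "1 2 #SUP: 3");
        // Print a tiny text histogram: one star per pattern having that support
        distribution(output, "#SUP").forEach((value, count) ->
                System.out.println(value + " | " + "*".repeat(count)));
    }
}
```

Passing another measure name, such as `#CONF`, gives the distribution of that measure instead, as long as the lines of the file contain it.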

6) A tool to compute statistics about graph database files in SPMF format. This feature was missing in previous versions of SPMF but is actually useful when working with graph datasets.
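As a rough idea of what such statistics involve, the following Java sketch counts graphs, vertices and edges. It is not the SPMF tool, and it assumes a gSpan-style text layout where a line `t # id` starts a new graph, `v id label` declares a vertex and `e v1 v2 label` declares an edge. The class name and the choice of statistics are assumptions for illustration.

```java
import java.util.List;

/** Hypothetical helper (not part of SPMF) computing basic statistics of a graph database. */
final class GraphDatabaseStatsSketch {
    public final int graphs;
    public final int vertices;
    public final int edges;

    private GraphDatabaseStatsSketch(int graphs, int vertices, int edges) {
        this.graphs = graphs;
        this.vertices = vertices;
        this.edges = edges;
    }

    /** Counts graphs, vertices and edges from lines in the assumed "t / v / e" layout. */
    public static GraphDatabaseStatsSketch of(List<String> lines) {
        int graphs = 0, vertices = 0, edges = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("t ")) graphs++;        // "t # 0": start of a graph
            else if (line.startsWith("v ")) vertices++; // "v 0 10": vertex 0 with label 10
            else if (line.startsWith("e ")) edges++;    // "e 0 1 20": edge 0-1 with label 20
        }
        return new GraphDatabaseStatsSketch(graphs, vertices, edges);
    }

    public double averageVerticesPerGraph() {
        return graphs == 0 ? 0 : (double) vertices / graphs;
    }

    public double averageEdgesPerGraph() {
        return graphs == 0 ? 0 : (double) edges / graphs;
    }

    public static void main(String[] args) {
        List<String> db = List.of(
                "t # 0", "v 0 10", "v 1 11", "e 0 1 20",
                "t # 1", "v 0 10", "v 1 11", "v 2 12", "e 0 1 20", "e 1 2 21");
        GraphDatabaseStatsSketch stats = of(db);
        System.out.println("graphs: " + stats.graphs
                + ", avg vertices: " + stats.averageVerticesPerGraph()
                + ", avg edges: " + stats.averageEdgesPerGraph());
    }
}
```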

7) Several new data mining algorithm implementations. Some that are ready are FastTIRP, VertTIRP, Krimp, and SLIM. Others are being integrated.

8) A new set of highly efficient data structures implemented using primitive types, which can replace standard collection classes from Java to further improve the performance of data mining algorithm implementations. Some of them are visible in the picture below. It actually took weeks of work to develop these classes and make them compatible with comparators and other expected features of collections in the Java language.
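To illustrate why collections based on primitive types can help, here is a minimal Java sketch of a growable list of `int` values. It is not one of the new SPMF classes. The name `IntArrayListSketch` and its methods are hypothetical, and it only shows the general technique: the values are stored in an `int[]` array, so no `Integer` object is created for each element, unlike with `ArrayList<Integer>`.

```java
import java.util.Arrays;

/** Hypothetical primitive list (NOT SPMF's implementation): stores ints without boxing. */
final class IntArrayListSketch {
    private int[] data = new int[8];
    private int size = 0;

    /** Appends a value, doubling the backing array when it is full. */
    public void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }

    /** Sorts the stored values in ascending order, directly on the primitive array. */
    public void sort() {
        Arrays.sort(data, 0, size);
    }

    public static void main(String[] args) {
        IntArrayListSketch list = new IntArrayListSketch();
        for (int i = 20; i > 0; i--) {
            list.add(i); // 20 values, so the backing array has to grow
        }
        list.sort();
        System.out.println("size = " + list.size() + ", first = " + list.get(0)
                + ", last = " + list.get(list.size() - 1));
    }
}
```

A real replacement for the standard collections needs much more than this (iteration, removal, custom ordering, and so on), which is consistent with the amount of work mentioned above.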

Conclusion

This is just to give you an overview of the upcoming version of SPMF. I hope to release it in the next week or two. By the way, if anyone has implemented some algorithms and would like them to be included in SPMF, please send me an e-mail at philfv AT qq DOT com.

Posted in Data Mining, Pattern Mining, spmf | Tagged algorithm, association rule, data mining, data science, efficient algorithm, free software, graph, implementation, itemset, itemset mining, open source, pattern mining, software, spmf
