Category Archives: Big data

Subgraph mining datasets

In this post, I will provide links to standard benchmark datasets that can be used for frequent subgraph mining. Moreover, I will provide a set of small graph datasets that can be used for debugging subgraph mining algorithms. The format of graph datasets A graph dataset is a text … Continue reading

Posted in Big data, Data Mining | Tagged , , , , | Leave a comment

On the correctness of the FSMS algorithm for frequent subgraph mining

In this blog post, I will explain why the FSMS algorithm for frequent subgraph mining is an incorrect algorithm.  I will publish this blog post because I have found that the algorithm is incorrect after spending a few days to implement the algorithm in 2017 and wish to save time to other researchers … Continue reading

Posted in Big data, Data Mining, Pattern Mining | Tagged , , , , | 2 Comments

Introduction to the Apriori algorithm (with Java code)

This blog post provides an introduction to the Apriori algorithm, a classic data mining algorithm for the problem of frequent itemset mining. Although Apriori was introduced in 1993, more than 20 years ago, Apriori remains one of the most important data mining algorithms, not because it is the fastest, but because it has … Continue reading

Posted in Big data, Data Mining, Pattern Mining, Programming | Tagged , , , , , , | 11 Comments

This is why you should visualize your data!

In the data science and data mining communities, several practitioners are applying various algorithms on data, without attempting to visualize the data.  This is a big mistake because sometimes, visualizing the data greatly helps to understand the data. Some phenomena are obvious when visualizing the data. In this blog post, I will give a few … Continue reading

Posted in Big data, Data Mining, Data science | Tagged , , , , | Leave a comment

An Introduction to Sequential Pattern Mining

In this blog post, I will give an introduction to sequential pattern mining, an important data mining task with a wide range of applications from text analysis to market basket analysis.  This blog post is aimed to be a short introduction. If you want to read a more … Continue reading

Posted in Big data, Data Mining, Pattern Mining | Tagged , , , , | 20 Comments

We are launching a new data mining journal

In this blog post, I will discuss one of my recent and current project. I have been recently working with my colleague Chun-Wei Lin on launching a new journal, titled “Data Science and Pattern Recognition“. This is a new open-access journal, … Continue reading

Posted in Big data, Data Mining, Data science, Research | Tagged , , , | Leave a comment

Introduction to clustering: the K-Means algorithm (with Java code)

In this blog post, I will introduce the popular data mining task of clustering (also called cluster analysis).  I will explain what is the goal of clustering, and then introduce the popular K-Means algorithm with an example. Moreover, I will briefly explain how an open-source Java implementation of … Continue reading

Posted in Big data, Data Mining, Data science, open-source | Tagged , , , , , , | 1 Comment

Introduction to time series mining with SPMF

This blog post briefly explain how time series data mining can be performed with the Java open-source data mining library SPMF (v.2.06).  It first explain what is a time series and then discuss how data mining can be performed on time series. What is … Continue reading

Posted in Big data, Data Mining, open-source, spmf, Time series | Tagged , , , , , , , , | 1 Comment

The KDDCup 2015 dataset

The KDD cup 2015 dataset is about MOOC dropout prediction. I have  had recently found that the dataset had been offline on the official website. Thus, I have uploaded a copy of the KDD cup 2015 dataset on my website. You can … Continue reading

Posted in Big data, Data Mining, Data science, Research | 6 Comments

Discovering hidden patterns in texts using SPMF

This tutorial will explain how to analyze text documents to discover complex and hidden relationships between words.  We will illustrate this with a Sherlock Holmes novel. Moreover we will explain how hidden patterns in text can be used to recognize the author of a … Continue reading

Posted in Big data, Data Mining, Data science, open-source, spmf | Tagged , , , | 9 Comments