The Data Blog

An Introduction to Frequent Subgraph Mining Algorithms

Posted on 2022-10-17 by Philippe Fournier-Viger

This post was originally published in Chinese as an introduction to frequent subgraph mining; an English version is also available.

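In this blog post, I will introduce an interesting data mining task called frequent subgraph mining, which is used to discover useful patterns in graphs. This task is important because data in many domains is naturally represented as graphs (for example social networks, chemical molecules, and national road networks). It is therefore worthwhile to analyze graph data to discover interesting, unexpected, and useful patterns, which can help to understand the data or to make decisions.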

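What is a graph? A bit of theory…

Before discussing graph analysis, let us introduce a few definitions. A graph is a set of vertices and edges, each having some labels. Let me illustrate this with an example. Consider the following graph:

This graph contains four vertices (depicted as yellow circles). These vertices have labels such as “10” and “11”. These labels provide information about the vertices. For example, imagine that this graph is a chemical molecule. The labels 10 and 11 could represent the chemical elements hydrogen and oxygen, respectively. Labels do not need to be unique. In other words, the same label can be used to describe several vertices in the same graph. For example, if the above graph represents a chemical molecule, the labels “10” and “11” could be used for all vertices representing hydrogen and oxygen, respectively.

Besides vertices, a graph also contains edges. Edges are the lines between vertices, drawn here as thick black lines. Edges also have labels. In this example, four labels are used, namely “20”, “21”, “22” and “23”. These labels represent different types of relationships between vertices. Edge labels do not need to be unique either.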

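Types of graphs: connected and disconnected

Many different types of graphs can be found in real life. Graphs are either connected or disconnected. Let me explain this with an example. Consider the following two graphs:

The graph on the left is said to be connected because it is possible to go from any vertex to any other vertex by following the edges. For example, imagine that vertices represent cities and that edges are roads between cities. A connected graph is then one where it is possible to travel from any city to any other city along the roads. If a graph is not connected, it is said to be a disconnected graph. For example, the graph on the right is disconnected because vertex A cannot be reached from the other vertices by following edges. In the rest of this post, we will use the term “graph” to refer to connected graphs. Thus, all the graphs discussed below are connected graphs.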
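To make the definition of a connected graph concrete, here is a minimal Python sketch that checks connectivity with a breadth-first search. The adjacency-list representation, the function name `is_connected`, and the two small example graphs are assumptions made for illustration; they are not the graphs from the figures.

```python
from collections import deque


def is_connected(adjacency):
    """Return True if every vertex can be reached from every other vertex.

    `adjacency` maps each vertex to the set of its neighbours
    (undirected graph, so each edge is listed in both directions).
    """
    if not adjacency:
        return True  # convention: the empty graph is treated as connected
    start = next(iter(adjacency))
    visited = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency[vertex]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    # Connected if the search starting from one vertex reached all of them.
    return len(visited) == len(adjacency)


# Hypothetical examples: cities as vertices, roads as edges.
connected_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
disconnected_graph = {"A": set(), "B": {"C"}, "C": {"B"}}

print(is_connected(connected_graph))     # True
print(is_connected(disconnected_graph))  # False
```

In the second example, vertex A has no road to any other city, which is why the search never reaches B and C.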

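Types of graphs: directed and undirected

It is also useful to distinguish between directed and undirected graphs. In an undirected graph, edges are bidirectional, while in a directed graph, edges can be unidirectional or bidirectional. Let me illustrate this with an example.

The graph on the left is undirected, while the graph on the right is directed. What are some real-life examples of directed graphs? For example, consider a graph where vertices are locations and edges are roads. Some roads can be travelled in both directions, while others can be travelled in only one direction (“one-way” streets in a city).

Some data mining algorithms are designed to handle only undirected graphs, only directed graphs, or both.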

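Analyzing graphs

Now that we have introduced a bit of theory about graphs, what kinds of data mining tasks can we perform to analyze graphs? There are many answers to this question. The answer depends on what our goal is, and also on the type of graph that we are analyzing (directed/undirected, connected/disconnected, a single graph or many graphs).

In this blog post, I will explain a widely studied mining task called frequent subgraph mining. The goal of subgraph mining is to discover interesting subgraphs in a set of graphs (a graph database). But how can we judge whether a subgraph is interesting? This depends on the application. Interestingness can be defined in various ways. Traditionally, a subgraph is considered interesting if it appears multiple times in a set of graphs. In other words, we want to discover subgraphs that are common to multiple graphs. This can be useful, for example, to find associations between chemical elements that are common to several chemical molecules.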

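The task of finding frequent subgraphs in a set of graphs is called frequent subgraph mining. As input, the user must provide:
▪ a graph database (a set of graphs)
▪ a parameter called the minimum support threshold (minsup).

Then, a frequent subgraph mining algorithm will enumerate as output all the frequent subgraphs. A frequent subgraph is a subgraph that appears in at least minsup graphs of the graph database. Let us now look at a graph database containing the following three graphs:

Now, suppose that we want to discover all subgraphs that appear in at least three graphs. Thus, we will set the minsup parameter to 3. By applying a frequent subgraph mining algorithm, we will obtain the set of all subgraphs that appear in at least three graphs:

Consider the third subgraph (“Frequent subgraph 3”). This subgraph is frequent and has a support (frequency) of 3 because it appears in three of the input graphs. These occurrences are highlighted in red below: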

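An important question is how to set the minsup parameter. In practice, the parameter is generally set by trial and error. If this parameter is set too high, very few subgraphs will be found, while if it is set too low, millions of subgraphs may be found, depending on the input database.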
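The notions of support and minsup can be sketched in a few lines of Python. To stay short, the sketch below only considers the smallest possible subgraphs, made of a single labelled edge and its two labelled vertices; it is not GASTON, FSG or GSPAN, and it does not grow larger subgraphs. The graph representation, the function names, and the three-graph database are assumptions made for illustration.

```python
from collections import Counter


def edge_patterns(graph):
    """Return the set of single-edge patterns found in one graph.

    A pattern is (vertex label, edge label, vertex label); the two vertex
    labels are sorted so that an undirected edge has a single form.
    """
    labels, edges = graph["labels"], graph["edges"]
    patterns = set()
    for u, v, edge_label in edges:
        low, high = sorted((labels[u], labels[v]))
        patterns.add((low, edge_label, high))
    return patterns


def frequent_edge_patterns(database, minsup):
    """Count in how many graphs each pattern appears; keep those >= minsup."""
    support = Counter()
    for graph in database:
        # A set is used so that a graph counts at most once per pattern.
        support.update(edge_patterns(graph))
    return {p: s for p, s in support.items() if s >= minsup}


# Hypothetical database of three small labelled graphs.
database = [
    {"labels": {1: 10, 2: 11, 3: 10},
     "edges": [(1, 2, 20), (2, 3, 21)]},
    {"labels": {1: 11, 2: 10, 3: 11},
     "edges": [(1, 2, 20), (2, 3, 22)]},
    {"labels": {1: 10, 2: 11},
     "edges": [(1, 2, 20)]},
]

print(frequent_edge_patterns(database, minsup=3))       # {(10, 20, 11): 3}
print(len(frequent_edge_patterns(database, minsup=1)))  # 3
```

Lowering minsup from 3 to 1 already triples the number of patterns on this toy database, which hints at why a low threshold can produce a huge output on real data.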

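Now, in practice, which tools or algorithms can be used to find frequent subgraphs? There are various frequent subgraph mining algorithms. Some of the most famous are GASTON, FSG and GSPAN.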

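Mining frequent subgraphs in a single graph

Besides discovering subgraphs that are common to several graphs, there is also a variation of the frequent subgraph mining problem, which consists of finding all frequent subgraphs in a single graph rather than in a graph database. The idea is almost the same. The goal is also to discover subgraphs that appear frequently or that are interesting. The only difference is how the support (frequency) is calculated. For this variation, the support of a subgraph is the number of times that it appears in the single input graph. For example, consider the following input graph:

This graph contains seven vertices and six edges. If we perform frequent subgraph mining on this single graph by setting the minsup parameter to 2, the following five frequent subgraphs can be discovered:

These subgraphs are frequent because they appear at least twice in the input graph. For example, consider “Frequent subgraph 5”. This subgraph has a support of 2 because it has two occurrences in the input graph. These two occurrences are highlighted below in red and blue, respectively.

Algorithms for discovering patterns in a graph database can generally also be used to discover patterns in a single graph.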
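For the single-graph setting, the change in how support is counted can be sketched as follows. Again, only single-edge patterns are counted to keep the example short, and the graph below (seven vertices and six edges, like the one in the text, but not the same graph) is a made-up example. Counting occurrences of larger subgraphs in a single graph raises extra questions, such as how to treat overlapping occurrences, which this sketch does not address.

```python
from collections import Counter


def edge_pattern_counts(labels, edges):
    """Count how many times each single-edge pattern occurs in ONE graph."""
    counts = Counter()
    for u, v, edge_label in edges:
        # Sort the vertex labels so an undirected edge has a single form.
        low, high = sorted((labels[u], labels[v]))
        counts[(low, edge_label, high)] += 1
    return counts


# Hypothetical single graph: seven vertices, six edges (a simple path).
labels = {1: 10, 2: 11, 3: 10, 4: 11, 5: 12, 6: 10, 7: 12}
edges = [(1, 2, 20), (2, 3, 21), (3, 4, 20),
         (4, 5, 22), (5, 6, 23), (6, 7, 23)]

counts = edge_pattern_counts(labels, edges)
minsup = 2
frequent = {p: c for p, c in counts.items() if c >= minsup}
print(frequent)  # {(10, 20, 11): 2, (10, 23, 12): 2}
```

Here the support is the number of occurrences inside the graph itself, rather than the number of graphs containing the pattern.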

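Conclusion

In this blog post, we have introduced the problem of frequent subgraph mining, which consists of discovering subgraphs that appear frequently in a set of graphs. This data mining problem has been studied for more than 15 years, and many algorithms have been proposed. Some algorithms are exact (they will find the correct answer), while others are approximate (they do not guarantee finding the correct answer, but may be faster).

Some algorithms are also designed to handle directed or undirected graphs, or to mine subgraphs in a single graph or in a graph database, or to do both. Moreover, there are several other variations of the subgraph mining problem, such as discovering frequent paths in a graph or discovering frequent trees in a graph.

Besides, in data mining in general, many other problems related to graphs are studied, such as optimization problems, community detection in social networks, and relational classification.

Compared with other types of data, problems related to graphs are in general quite complex. One of the reasons why subgraph mining is difficult is that algorithms typically need to check for “subgraph isomorphism”, that is, to compare subgraphs to determine whether they are equivalent. Nevertheless, I think that these problems are quite interesting, as they raise several research challenges.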
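To give an idea of why the subgraph isomorphism check is costly, here is a brute-force Python sketch that tests whether a small labelled pattern occurs inside a labelled graph. It simply tries every possible assignment of pattern vertices to graph vertices, so the number of candidates grows factorially. This is only an illustration under assumed data structures and names (`contains_subgraph`, `edge_index`); real subgraph mining algorithms use much more refined techniques to avoid or speed up this test.

```python
from itertools import permutations


def edge_index(edges):
    """Map each undirected edge {u, v} to its label."""
    return {frozenset((u, v)): label for u, v, label in edges}


def contains_subgraph(graph, pattern):
    """Brute-force test: does `pattern` occur as a subgraph of `graph`?

    Tries every injective mapping of pattern vertices to graph vertices,
    so it is only practical for tiny graphs.
    """
    g_labels, g_edges = graph["labels"], edge_index(graph["edges"])
    p_labels, p_edges = pattern["labels"], edge_index(pattern["edges"])
    p_vertices = list(p_labels)
    for candidate in permutations(g_labels, len(p_vertices)):
        mapping = dict(zip(p_vertices, candidate))
        # Vertex labels must be preserved by the mapping.
        if any(p_labels[p] != g_labels[g] for p, g in mapping.items()):
            continue
        # Every pattern edge must map to a graph edge with the same label.
        if all(
            g_edges.get(frozenset(mapping[x] for x in edge)) == label
            for edge, label in p_edges.items()
        ):
            return True
    return False


# Hypothetical graph and two candidate patterns.
graph = {"labels": {1: 10, 2: 11, 3: 10},
         "edges": [(1, 2, 20), (2, 3, 21)]}
pattern_yes = {"labels": {"a": 11, "b": 10}, "edges": [("a", "b", 21)]}
pattern_no = {"labels": {"a": 10, "b": 10}, "edges": [("a", "b", 20)]}

print(contains_subgraph(graph, pattern_yes))  # True
print(contains_subgraph(graph, pattern_no))   # False
```

The second pattern is rejected because the two vertices labelled 10 in the graph are not directly connected by an edge.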

I hope you enjoyed this blog post. If there is interest in this topic, I may write another blog post about graph mining in the future.

—
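Philippe Fournier-Viger is a professor of computer science and the founder of the SPMF open-source data mining software, which offers more than 200 data mining algorithms.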

Posted in Chinese posts, Data Mining | Tagged data mining, data science, pattern mining | Leave a comment

A brief report about the IEEE DSAA 2022 conference

Posted on 2022-10-12 by Philippe Fournier-Viger

In this blog post, I will talk about the IEEE DSAA 2022 conference, which was held from the 13th to the 16th October 2022.

What is IEEE DSAA?

DSAA 2022 is the 9th edition of the IEEE International Conference on Data Science and Advanced Analytics (DSAA). DSAA is an international conference that has been held in many countries, and focuses on data mining, data science, big data, machine learning and related topics.

DSAA is a relatively young conference compared to some top data mining and machine learning conferences, but DSAA has become more and more successful over the years.

Location

This year, the DSAA conference was planned to be held in the city of Shenzhen, China. However, due to the COVID-19 pandemic, the conference was held online (with local organization by Shenzhen University), using Zoom as the videoconferencing platform.

Proceedings of DSAA 2022

The proceedings of DSAA are published by IEEE.

Authors could submit a paper to the main track of DSAA. Besides that, the DSAA conference hosts several special sessions on emerging topics. An attractive point for authors is that papers accepted in the special sessions are published as regular papers in the proceedings of DSAA. This year, I was a co-organizer of a special session called DSSBA (1st Special Session on Data Science for Social and Behavioral Analytics), which was quite successful with 5 papers accepted (more on that later).

Another interesting aspect of DSAA is that there are two special issues, organized in the International Journal on Data Science and Analytics (JDSA) and the Machine Learning Journal (MLJ), respectively. Authors could submit papers to these special issues and then present the articles at the conference.

Keynote speakers of the main conference

This year, there was a good line-up of keynote speakers:

Conference opening

The conference opening was on the 13th of October. Several interesting pieces of information were given about the DSAA conference. Here are some slides from the opening:

Country distribution for the application track:

Country distribution for the research track:

The paper acceptance rate statistics:

The main topics of papers published in DSAA:

Several awards were given, some of which came with a cash prize of 1,000 USD.

The DSSBA special session

At DSAA 2022, I co-organized a special session called DSSBA 2022 (1st Special Session on Data Science for Social and Behavioral Analytics). This special session received many papers, among which 5 were accepted for publication as regular papers in the conference.

A keynote talk was given in this special session by Prof. Yun Sing Koh from the University of Auckland, New Zealand. She presented some of her latest research work on using machine learning to tackle environmental science challenges. In particular, she presented two recent research projects published in the Machine Learning journal and in AAAI 2022, which are about air quality index inference and about algal bloom monitoring, respectively. Below, I share a few slides from her talk:

For more details about these two research projects, you can see these two papers:

Olivier Graffeuille, Yun Sing Koh, Jörg Wicker, Moritz K. Lehmann: Semi-supervised Conditional Density Estimation with Wasserstein Laplacian Regularisation. AAAI 2022: 6746-6754
Ben Halstead, Yun Sing Koh, Patricia Riddle, Russel Pears, Mykola Pechenizkiy, Albert Bifet, Gustavo Olivares, Guy Coulson: Analyzing and repairing concept drift adaptation in data stream classification. Mach. Learn. 111(10): 3489-3523 (2022)

In the DSSBA special session, we also had five paper presentations, where the last three papers are about pattern mining.

SA-FGDEM: A Self-adaptive E-Learning Performance Prediction Model
Wang, Liping; Ye, Mingtao; Zhang, Guodao; Sheng, Xin; Zhang, Jingran
Heterogeneous Drift Learning: Classification of Mix-Attribute Data with Concept Drifts
Zhao, Lang; Zhang, Yiqun; Ji, Yuzhu; Zeng, An; Gu, Fangqing; Luo, Xiaopeng
Fast Mining RFM Patterns for Behavioral Analytics
Wan, Shicheng; Deng, Jieyin; Chen, Jiahui; Gan, Wensheng; Yu, Philip S
Constraint-based Sequential Rule Mining
Yin, Zhaowen; Gan, Wensheng; Huang, Gengsen; Wu, Yongdong; Fournier-Viger, Philippe
Discovering Geo-referenced Periodic-Frequent Patterns in Geo-referenced Time Series Databases
Ravikumar, Penugonda; Palla, Likhitha; T, Chandrasekhar; RAGE, Uday Kiran; Watanobe, Yukata; Zettsu, Koji

Other paper presentations

There were also several other interesting paper presentations and talks at the conference, but due to my busy schedule, I was not able to attend many of them. An interesting paper that I saw, related to periodic pattern mining, is:

Discovering Periodicity in Locally Repeating Patterns
Alfred Krzywicki (University of Adelaide); Ashesh Mahidadia (Rich Data Corporation); Michael Bain (University of New South Wales)

Conclusion

Overall, the conference was very interesting and well organized. I will try to participate in IEEE DSAA again next year.

—
Philippe Fournier-Viger is a distinguished professor of computer science

Posted in artificial intelligence, Big data, Data Mining, Data science, Machine Learning | Tagged artificial intelligence, big data, conference, data mining, dsaa, dsaa 2022, ieee dsaa, machine learning | Leave a comment

SPMF 2.55 is released (10 new algorithms!)

Posted on 2022-10-05 by Philippe Fournier-Viger

Hi everyone,

This is a short blog post to let you know that a new version of the SPMF data mining library has been released (version 2.55) with 10 new algorithms for pattern mining. SPMF is by far the most complete library for pattern mining and can be used from various languages. Thanks again to all contributors and users who made this project a success and supported it through the years.

If you are a researcher and would like your algorithms to be included in SPMF to provide more visibility to your work, feel free to send me an e-mail.

Hope you will enjoy this new version of SPMF!

—
Philippe Fournier-Viger is a distinguished professor of computer science

Posted in spmf, Uncategorized | Leave a comment

How to update the Cloudera VM in 2022? (solved)

Posted on 2022-09-28 by Philippe Fournier-Viger

In this blog post, I provide instructions on how to update the Cloudera VM, even though the CentOS version used by Cloudera has reached its end-of-life and the Cloudera VM is no longer updated. I spent quite a lot of time on this and finally found out how to do it based on various sources from the Internet.

1. Open a terminal window

2. Enter the command

sudo gedit /etc/yum.repos.d/CentOS-Base.repo

to open a text editor with elevated rights.

3. Replace the content of that file with this:

[base]
name=CentOS-$releasever - Base
# mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
# baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/os/x86_64/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

# released updates
[updates]
name=CentOS-$releasever - Updates
# mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates&infra=$infra
# baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/updates/x86_64/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

# additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
# mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras&infra=$infra
# baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/extras/x86_64/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=centosplus
#baseurl=http://mirror.centos.org/centos/$releasever/centosplus/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/centosplus/x86_64/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

#contrib - packages by Centos Users
[contrib]
name=CentOS-$releasever - Contrib
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=contrib
#baseurl=http://mirror.centos.org/centos/$releasever/contrib/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/contrib/x86_64/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

Then, save the file and close the application.

4. Next, go back to the terminal to type the following commands:

sudo yum-config-manager --disable epel

sudo yum-config-manager --disable cloudera-cdh5

sudo yum-config-manager --disable cloudera-manager

5. Type this command in the terminal to open the configuration file of Yum:

sudo gedit /etc/yum.conf

6. Add this entry at the end of the [main] section:

sslverify=false

Then, save the file and close the application.
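If you prefer not to edit the file by hand, the edit of steps 5 and 6 can be sketched in Python. The sketch below only works on the text of a configuration file given as a string; reading and writing /etc/yum.conf with elevated rights is left out. The function name `add_to_main_section` and the sample configuration are assumptions made for illustration.

```python
def add_to_main_section(config_text, entry="sslverify=false"):
    """Insert `entry` at the end of the [main] section of a yum.conf text."""
    lines = config_text.splitlines()
    if entry in (line.strip() for line in lines):
        return config_text  # already present, nothing to do
    in_main = False
    insert_at = None
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_main:
                break  # the next section begins, so [main] has ended
            in_main = stripped == "[main]"
        if in_main and stripped:
            # Remember the position just after the last non-blank line of [main].
            insert_at = index + 1
    if insert_at is None:
        raise ValueError("no [main] section found")
    lines.insert(insert_at, entry)
    return "\n".join(lines) + "\n"


# Hypothetical, shortened configuration used only to demonstrate the function.
sample = "[main]\ncachedir=/var/cache/yum\ngpgcheck=1\n\n[other]\nfoo=1\n"
updated = add_to_main_section(sample)
print(updated)
```

Calling the function a second time on its own output leaves the text unchanged, so the entry is never duplicated.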

7. After that, we will type some additional commands to update and install additional software. During this step, you will be asked several times if you want to continue. When this happens, press “y” and “enter”. The commands are:

sudo yum update ca-certificates

sudo yum update

sudo yum install epel-release

sudo yum --enablerepo=extras install epel-release

8. Let’s say we want to install the Python libraries “matplotlib” and “imaging”:

sudo yum install python-matplotlib

sudo yum install python-imaging

We can also install PIP:

sudo yum install -y --enablerepo="epel" python-pip

We can also install other software such as LibreOffice, to use programs such as Oocalc:

sudo yum install libreoffice

That is all!

Posted in Big data | Tagged cloudera vm, end-of-life, how to update | 5 Comments

Upcoming SPMF 2.55 + UDML 2022 + BDA 2022

Posted on 2022-08-26 by Philippe Fournier-Viger

I have not written on the blog in the past few weeks. This is because I was quite busy and it was also the summer vacation, so I had to temporarily focus on other things. Today, I want to give you some quick news:

There is about one more week to submit your paper to a workshop that I co-organize at the IEEE ICDM 2022 conference. The workshop is UDML 2022 (5th International Workshop on Utility-Driven Mining and Learning). You are welcome to submit any papers related to machine learning or pattern mining. The deadline is the 2nd of September 2022.
A new version of SPMF (v 2.55) is under preparation. There will be about 10 new algorithms, including 5 high utility itemset mining algorithms and 3 episode mining algorithms. I am very excited to release this new version soon. If you have some algorithm implementation that you would like to include in SPMF, feel free to contact me at philfv8 AT yahoo DOT COM.
You may also want to consider the BDA 2022 conference on Big Data Analytics for submitting your papers. It is an international conference held in India and published by Springer. The deadline is September 5th.

That is all for today. I will be back soon with more content for the blog and my Youtube channel. Thanks for reading! 🙂

Posted in General, spmf | Tagged blog, spmf | Leave a comment

Turnitin, a smart tool for plagiarism detection?

Posted on 2022-08-12 by Philippe Fournier-Viger

Plagiarism is a serious issue in academia. In this blog post, I will talk about Turnitin, a service used for plagiarism-checking by some journals and conferences in academia.

I already wrote a blog post about this, which you can read here:

How journal paper similarity checking works? (CrossCheck) | The Data Mining Blog (philippe-fournier-viger.com)

Today, I just want to show you that this service is, in my opinion, not very “smart”. Although this service is useful to detect plagiarism, I notice that it also sometimes flags very generic text that, in my opinion, should not be considered when evaluating plagiarism. I will show you seven excerpts from a Turnitin report for a conference paper that I submitted, as examples:

(1) In a sentence with 23 words, Turnitin flagged a similarity with another source because I used six words in the same order as that source, even though no more than three of these words appear consecutively:

(2) At another place in the paper, Turnitin flags a similarity because I used the same keyword as another paper, while all the other keywords are different:

(3) At another place, the submitted paper is considered similar with another paper because I say that in this paper I propose something novel (!):

(4) Here is another example that shows how not “smart” this tool can sometimes be (in my opinion). Having a section called “Experimental evaluation” and saying that we will assess the performance of something is considered similar (!).

(5) Another example of very generic sequences of words that are deemed similar:

(6) And another example of a paragraph where some sentences are said to be similar to four different sources, but actually all of this is just very generic text used to describe experiments with the same dataset as in another paper:

(7) And in the conclusion, a few words are said to match another source, but this is not relevant:

With the above examples, I just want to show that Turnitin can sometimes flag some text as similar when the similarity is merely due to the use of very generic text. Sometimes, this is because an author tends to keep the same writing style across different papers, and sometimes it is because there are just not so many different ways of explaining something. Here is one more example:

In the last sentence, if I want to say that the next section will talk about some new algorithm, there are not so many ways that I can say that. I could say “the … algorithm will be presented/described/introduced in the next section” or “The next section will present/describe/introduce the … algorithm”. But I do not see many other ways of explaining this.

Conclusion

In this blog post, I have shown some examples from a Turnitin report that I consider “not smart”. But to be fair, I should say that Turnitin will also flag some text that is very relevant or somewhat relevant. Here, I just wanted to show some examples that do not look relevant, to highlight that there is still a lot of room for improvement in this tool.

As for the use of Turnitin, it is certainly useful for plagiarism detection but, like any other tool, it also depends on how the results are used by humans. Unfortunately, I notice that many conferences and journals do not take the time to read the reports and instead just fix some thresholds to determine what is acceptable. For example, one conference stated that “similarity index should not be greater than 20% in general and not more than 3% with any single source”. This may seem reasonable, but in practice it is quite strict, as there is always some similarity with other papers. Ideally, someone would manually check the report to determine what is acceptable.

Posted in Academia, Research | Tagged academia, plagiarism, Research, research papers, writing | Leave a comment

Brief report about the SMARTDSC 2022 conference

Posted on 2022-07-20 by Philippe Fournier-Viger

This week, besides IEA AIE 2022, I am also participating in the SMARTDSC 2022 conference (5th International Conference on Smart Technologies in Data Science and Communication) as general co-chair and keynote speaker. I will give a brief report about this conference in this post.

What is SMART-DSC?

SMARTDSC is a conference organized by the KL (deemed to be) University in India in collaboration with several international researchers. This is the fifth edition of the conference. The conference focuses on data science, communication and smart technologies and the quality is good. This year, over 150 papers have been received and less than 20% have been accepted for oral presentation, which makes this conference competitive. The proceedings are also published by Springer, which ensures indexing and a good visibility for papers.

The accepted papers are from over ten different states in India and also from 5 other countries. There is also an excellent line-up of eight keynote speakers for the conference from various countries including Turkey, Egypt, China, France, and Malaysia.

The first keynote talk was by Shumaila Javaid, affiliated with the Shanghai Research Institute for Intelligent Autonomous Systems in China. The talk was about medical sensors and their integration for pervasive healthcare. This was quite an interesting topic, as it has real-life applications that may change lives.

Then, I gave a keynote talk about the automatic discovery of interesting patterns in data. Here are a few slides of my talk where I introduced various topics.

There were then several paper presentations followed by other keynote talks. I will try to add more details about these presentations later in this blog post. The SMARTDSC conference is held over three days. I am attending the conference at different moments during these three days, as I have to participate in two conferences at the same time (SMARTDSC 2022 and IEA AIE 2022).

Overall, SMARTDSC 2022 is an interesting conference. It is especially convenient for participants in India in terms of travelling, but it is also international, with several participants and keynote speakers from abroad. I am happy to participate in it.

—
Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Conference, Data Mining, Data science | Tagged communication, conference, data mining, data science, india, smart technologies, smartdsc, smartdsc2022 | Leave a comment

Brief report about the IEA AIE 2022 conference

Posted on 2022-07-19 by Philippe Fournier-Viger

This week, I am attending the 35th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA AIE 2022). I will give a brief report about the conference.

Program

This year, the conference received 127 paper submissions, from which 65 have been accepted as full papers, and 14 as short papers. All the papers have been reviewed by at least 3 members from the program committee.
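For readers who like to see the corresponding acceptance rates, the small Python sketch below derives them from the numbers given above. The rates themselves were not announced in this form; they are simply computed from the 127 submissions, 65 full papers and 14 short papers, and the variable names are only illustrative.

```python
submissions = 127
full_papers = 65
short_papers = 14

# Share of submissions accepted as full papers.
full_rate = full_papers / submissions
# Share of submissions accepted as either full or short papers.
overall_rate = (full_papers + short_papers) / submissions

print(f"Full papers: {full_rate:.1%}")      # Full papers: 51.2%
print(f"Full + short: {overall_rate:.1%}")  # Full + short: 62.2%
```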

Opening ceremony

The IEA AIE 2022 conference was held in Japan in hybrid mode. I think the majority of attendees were online, but there were still many people attending in person. On the first day, there was the opening ceremony, where the conference was introduced, including the program and other aspects.

Paper presentations

There were several paper presentations, covering many different topics such as industrial applications, health informatics, optimization, video and image processing, natural language processing, agent and group-based systems, pattern recognition, and security.

Here are screenshots from some presentations that I attended.

This is a paper about air pollution, which uses image processing combined with a periodic pattern mining algorithm to obtain good detection:

Below is a paper from my collaborators about parallel high utility itemset mining based on Spark. In that paper, parallel versions of EFIM and d2HUP provide a good speed-up (up to 20 times) over the sequential versions of those algorithms for mining high utility itemsets.

There was also an interesting paper about weighted sequential pattern mining in uncertain data:

There were also many other papers that I listened to. I will not report on all of them.

Keynote talks

At IEA AIE 2022, there were two keynote talks. The first keynote was by Prof. Tao Wu from Shanghai University of Medicine & Health Sciences, about health informatics.

The second keynote talk was by Prof. Sebastian Ventura from the University of Cordoba, Spain, about Improving Predictive Maintenance with Advanced Machine Learning. He talked about how to build models and systems to prevent failures from happening in industrial systems by doing maintenance in advance (predictive maintenance, or PdM). He explained that various techniques can be used, such as outlier detection and classification. Prof. Ventura said that he is doing a project on the maintenance of military vehicles. Following the talk, there was a good discussion with conference participants. Prof. Ventura explained that building simple models is good, but it is not necessarily the most important. A complex model can be acceptable if it is explainable. In fact, he said that it is more important to have explainable models because in real applications, models often need to be verified by domain experts. Here are a few slides from the introduction of that talk:

Here are some slides about potential data mining techniques that can be used:

And here are some techniques that have been used in the specific project for predictive maintenance of vehicles:

Here are some challenges and open problems and the conclusion from the talk:

If you are interested in this topic, you may also check the survey paper published recently by Prof. Ventura:

A. Esteban, A. Zafra & S. Ventura. Data Mining in Predictive Maintenance Systems. WIREs DMKD. https://doi.org/10.1002/widm.1471

Conclusion

That’s all for this report. You can see some of my reports about previous editions of that conference here: IEA AIE 2016, IEA AIE 2018, IEA AIE 2019, IEA AIE 2020, and IEA AIE 2021.

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in artificial intelligence, Conference, Machine Learning | Tagged applications, applied artificial intelligence, artificial intelligence, conference, iea aie, ieaaie2022, machine learning | 4 Comments

How to draw subgraphs side-by-side in Latex? (with TIKZ)

Posted on 2022-07-09 by Philippe Fournier-Viger

Today, I will give an example of how to draw a figure containing three subgraphs that appear side by side in Latex using the TIKZ library, and where each subgraph has a caption. This can be useful when writing research papers, where we want to discuss different types of subgraphs.

The result will be like this:

And here is the Latex code:

\documentclass{article} 
\usepackage{caption}
\usepackage{subcaption}
\usepackage{tikz}
\usetikzlibrary{automata,arrows,positioning,calc}

\begin{document}


\begin{figure}
  \begin{subfigure}[b]{0.30\textwidth}
    \centering
      \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 1.5cm]
        \node[state] (v) {$A$};
        \node[state] (w) [right of=v] {$B$};
        \node[state] (t) [right of=w] {$C$};

		\path[->] (v)  edge node {0} (w);
		\path[->] (w) edge   node  {1}(t);
      \end{tikzpicture}
    \caption{Subgraph 1}
  \end{subfigure}
  \begin{subfigure}[b]{.30\textwidth}
    \centering
      \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 1.5cm]
        \node[state] (x) {$E$};
        \node[state] (y) [below of=x] {$F$};
        \node[state] (n) [right of=x] {$G$};
        \node[state] (z) [below of=y] {$H$};


		\path[->] (x)  edge node {0} (y);
		\path[->] (x)  edge node {0} (n);
		\path[->] (n)  edge node {0} (z);
		\path[->] (y) edge   node  {1}(z);
      \end{tikzpicture}
    \caption{Subgraph 2}
  \end{subfigure}
  \begin{subfigure}[b]{.30\textwidth}
    \centering
      \begin{tikzpicture}[> = stealth,  shorten > = 1pt,   auto,   node distance = 1.7cm]
        \node[state] (g) {$A$};
        \node[state] (h) [above of=g] {$B$};
        \node[state] (e) [above right= 0.3 cm and 0.3 cm of g] {$C$};
		\path[->] (g)  edge node {} (e);
		\path[->] (g)  edge node {} (h);
		\path[->] (h) edge   node  {}(e);
      \end{tikzpicture}
    \caption{Subgraph 3}
  \end{subfigure}
\caption{Three subgraphs}
\end{figure}

\end{document}

In this code, I use the automata library of TIKZ, which is great for drawing graphs. You could also use other libraries and tweak the above example.

Hope this is useful.

—
Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Latex | Tagged figure, graph, latex, subgraph, tikz | Leave a comment

(videos) Introduction to sequential rule mining + the CMRules algorithm

Posted on 2022-07-07 by Philippe Fournier-Viger

I have made two new videos to explain interesting topics about pattern mining. The first video is an introduction to sequential rule mining, while the second video explains in more detail how the CMRules algorithm for sequential rule mining works!

You can watch the videos here:

An Introduction to Sequential Rule Mining (pdf / ppt / video – 33 min )
The CMRules algorithm (pdf / ppt / video – 32 min)

And you can also find them on my Youtube Channel.

If you want to try these algorithms, you can check the SPMF open-source software, which offers fast implementations of these algorithms.

Hope you will enjoy the videos. I will make more videos about pattern mining soon. By the way, you can also check my website about The Pattern Mining course. It gives videos and slides for a free online course on pattern mining. It explains all the main topics about pattern mining and is good for students who are starting to do research in this area. But this course is in beta version, which means that I am still updating it. More videos and content will be added over time.

—
Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Data Mining, Pattern Mining, Video | Tagged cmrules, data mining, pattern mining, sequential rule mining, video | 2 Comments
