How to package SPMF as an EXE file with jpackage?

In this blog post, I will explain how to use the jpackage tool that is provided with Java to (1) package the JAR file of SPMF into an EXE file for Windows, and (2) create an installer for SPMF.

Requirements

It is necessary to have:

  • A computer with Windows (I will give instructions for this platform, but you can do something similar on other operating systems)
  • A recent version of the JDK (at least version 14) so that you have the jpackage command available.
  • The JAR file of SPMF. You can download it here: spmf.jar or from the website of SPMF.
  • The ICO file of SPMF (if you want to make an application that has an icon): SPMF.ico

How to create an EXE application of SPMF for Windows

Step 1. On a Windows computer, create two folders named input and output on the desktop.

Step 2. Put the files spmf.jar and SPMF.ico in the input folder that you have just created.

Step 3. Open the command line of Windows and execute this command:

jpackage --input C:\Users\philippe\Desktop\input\ --dest C:\Users\philippe\Desktop\output\ -n "SPMF" --main-jar spmf.jar --main-class ca.pfv.spmf.gui.Main --type app-image --icon C:\Users\philippe\Desktop\input\SPMF.ico

where C:\Users\philippe\Desktop\ should be replaced by the path to your desktop on your computer.

After executing this command, an application folder named SPMF that contains the file SPMF.exe will have been created in the output folder.

Step 4. To launch the software, you can now double-click on SPMF.exe.

How to create an installer for SPMF for Windows?

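You can follow the same steps as above but use the command below. Note that to build EXE installers on Windows, jpackage relies on the WiX Toolset, which must be installed and available on the PATH.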

jpackage --input C:\Users\philippe\Desktop\input\ --dest C:\Users\philippe\Desktop\output5\ -n "SPMF" --main-jar spmf.jar --main-class ca.pfv.spmf.gui.Main --type exe --icon C:\Users\philippe\Desktop\input\SPMF.ico --win-dir-chooser --win-menu --win-shortcut-prompt
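If you need to run these commands often, they can be assembled from a small script. The following Python sketch only builds the argument list for the two commands of this post and prints it. The function name build_jpackage_command is an illustrative name of mine, and I assume that both commands write to the same output folder on the desktop.

```python
from pathlib import PureWindowsPath


def build_jpackage_command(desktop, package_type="app-image", installer_options=()):
    """Return the jpackage argument list used in this post for spmf.jar."""
    desktop = PureWindowsPath(desktop)
    input_dir = desktop / "input"
    command = [
        "jpackage",
        "--input", str(input_dir),
        "--dest", str(desktop / "output"),  # assumption: a single output folder
        "-n", "SPMF",
        "--main-jar", "spmf.jar",
        "--main-class", "ca.pfv.spmf.gui.Main",
        "--type", package_type,
        "--icon", str(input_dir / "SPMF.ico"),
    ]
    command.extend(installer_options)
    return command


# The application image (folder with SPMF.exe)
app_image = build_jpackage_command(r"C:\Users\philippe\Desktop")

# The installer, with the three Windows-specific options from the post
installer = build_jpackage_command(
    r"C:\Users\philippe\Desktop",
    package_type="exe",
    installer_options=["--win-dir-chooser", "--win-menu", "--win-shortcut-prompt"],
)

print(" ".join(app_image))
print(" ".join(installer))
```

On a computer where the JDK is installed, either list can then be passed to subprocess.run(command, check=True) to actually run jpackage.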

Then, this will create an installer, which looks like this:

This is the installation process on Windows 11:

And those are the files after installation:

Customization and generating installers for other platforms

There are also many other options offered by jpackage, including generating packages for other platforms. For more information see the documentation of the jpackage command.

Conclusion

This was just a short blog post to show how to package SPMF.jar into a native application. I think the process of using jpackage is quite simple. In the past, I had used some other commercial tools to create EXE files for Java programs but the process was more complicated. I am thus happy to have found jpackage.


Philippe Fournier-Viger is a distinguished professor of computer science

Posted in Java, open-source, spmf

New version of SPMF (2.58)

This is to announce that a new version of SPMF was released on 27 November 2022. This version has 7 new pattern mining algorithms:

  • the HUCI-Miner algorithm to mine closed high utility itemsets and generators at the same time (thanks to Jayakrushna Sahoo et al. for the original code)
  • the FHIM algorithm to mine all high utility itemsets (thanks to Jayakrushna Sahoo et al. for the original code)
  • the HGB algorithm to mine non-redundant high utility association rules (thanks to Jayakrushna Sahoo et al. for the original code)
  • the HGB-all algorithm to derive all high utility association rules from the non-redundant high utility association rules (thanks to Jayakrushna Sahoo et al. for the original code)
  • algorithms for mining sequential patterns with flexible constraints in a time-extended sequence database (e.g. MOOC data)
    • the SPM-FC-L algorithm (thanks to Wei Song et al. for the original code)
    • the SPM-FC-P algorithm (thanks to Wei Song et al. for the original code)

Besides, it has several new features such as:

(1) An integrated text editor to open output files (as an alternative to the system’s default text editor)

(2) Some improvements to the graphical user interface, such as colors to highlight algorithm categories and a window icon:

Some bugs have also been fixed.

Besides, a new MOOC.txt dataset of sequences of courses with timestamps has been added to the dataset page of SPMF.

Thanks again to all users and contributors to SPMF!



Posted in Data Mining, Data science, open-source, spmf

Brief report about the MEDI 2022 conference

In this blog post, I will talk about the 11th International Conference on Model & Data Engineering (MEDI 2022), which I attended this week. It was held from the 21st to the 23rd of November 2022 in Cairo, Egypt, and also online.

MEDI 2022 conference

MEDI is a good conference for modelling, database, and related topics. MEDI is ranked C in the CORE 2021 ranking. It has been held in various countries over the years such as: Egypt, Estonia, France, Morocco, Spain, Cyprus, Italy and Portugal.

The 2022 edition of MEDI was organized locally by Nile University.

Nile University

This year, I was glad to serve as Program Committee (PC) Chair of this conference. Below, I give an overview of the event.

Opening ceremony

The opening ceremony was at 10:00 AM. The conference was introduced, and an overview of the program was given. Below, I present some slides from the opening ceremony with some more details.

Countries where MEDI has been held:

MEDI conference by countries

Organizations. Here is an overview of the committees behind MEDI 2022:

MEDI Conference committees

Paper selection. This year, 65 papers were submitted to the main conference, of which 18 were accepted for long presentation (28%) and 11 for short presentation (17%). The program committee consisted of 60 researchers from 23 countries, who provided 200 reviews, for an average of 3.5 reviews per paper.

MEDI 2022 statistics

Proceedings. Full presentation papers are published in a Springer LNCS volume, while short presentation papers are published together with workshop papers in a Springer CCIS volume.

MEDI 2022 conference proceedings

In total, there were 190 authors, and submissions came from 18 countries on five main topics: (1) Modelling, (2) Image processing and diagnosis, (3) Machine learning and optimization, (4) Natural language processing and (5) Database systems.

Workshop. A workshop called DETECT was co-organized with MEDI 2022, and 4 papers were accepted in that workshop (44%).

Special issues. Two special issues are organized for the best papers of MEDI 2022.

MEDI 2022 special issues

Keynote talks

There were three keynote talks.

The first keynote talk was by Prof. Vincent S. Tseng from National Yang Ming Chiao Tung University about “Broad and Deep Learning of Big Heterogeneous Health Data for Medical AI: Opportunities and Challenges”.

Vincent S. Tseng keynote talk

The second talk was “A service-based approach to drone service delivery in Skyway networks” by Prof. Athman Bouguettaya from The University of Sydney, Australia.

Athman Bouguettaya keynote talk

The third talk was “Safety and security are key considerations in the design of critical systems” by Dr. Colin Snook from the University of Southampton, United Kingdom.

Best paper awards of MEDI 2022

The best paper awards were announced during the closing ceremony. These papers were selected based on the peer reviews and also based on the presentations given at the MEDI conference.

Conclusion

This is all for this blog post! I hope that it has been interesting. I am looking forward to MEDI 2023!
In about one week, I will talk to you about ICDM 2022.



Posted in Conference, Database

Brief report about the MIWAI 2022 conference

In this blog post, I will talk about the 15th International Conference on Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2022), which was held as a virtual event on November 18th, 2022. I attended this conference to present two papers related to pattern mining.

MIWAI is a conference that has been held every year in the Asia-Pacific region for 15 years. In the past, MIWAI has been held in countries such as Vietnam, India, Thailand, China, Malaysia and Brunei. I also attended MIWAI in Malaysia in 2019 (see my blog post about MIWAI 2019 here).

Proceedings of MIWAI 2022

This year, the MIWAI conference received 42 submissions, of which 19 were accepted: 14 as full papers (an acceptance rate of 33% for full papers) and 5 as short papers (an acceptance rate of 45% for full and short papers combined).
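For readers who want to check how the two acceptance rates relate to each other, here is a small Python calculation using only the numbers given above. The variable names are mine.

```python
submitted = 42
full_papers = 14
short_papers = 5

# Acceptance rate when counting only full papers
full_rate = round(100 * full_papers / submitted)

# Acceptance rate when counting full and short papers together
overall_rate = round(100 * (full_papers + short_papers) / submitted)

print(f"full papers: {full_rate}%, full + short papers: {overall_rate}%")
# prints "full papers: 33%, full + short papers: 45%"
```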

The proceedings of the conference are published by Springer in a book from the LNAI series. Hence, the papers are well-indexed in various publication databases such as DBLP, which is good.

Schedule of the conference

The conference was held online during a single day (November 18) using the Zoom platform. It started with an opening session, followed by a keynote talk by Prof. Rapeepan Pitakaso from Thailand about “Artificial Multiple Intelligence System (AMIS) and its Applications”. Then, there were paper presentations organized as parallel sessions during the rest of the day.

Opening

During the opening, the organizers talked about the program, the review process, the organization, etc. It was nice to see several people that I knew already from MIWAI 2019. The organizers are very friendly and professional.

It was announced that next year, MIWAI 2023 will be in Hyderabad, India. The submission deadline is planned for 10th January 2023.

Then, MIWAI 2024 will be in Chongqing, China.

Awards were also announced. I am pleased that the best paper award was given to “LCIM: Mining Low Cost High Utility Itemsets”, which is my paper. Another paper also received an award.

Keynote talk

There was a keynote talk, “AMIS – Artificial Multiple Intelligence System: Theory and Application”, by Prof. Rapeepan Pitakaso from Ubon Ratchathani University, Thailand. The talk was about a system called Artificial Multiple Intelligence System (AMIS), which aims to combine multiple types of intelligence, just like humans do (verbal, linguistic intelligences, etc.). The system is based on heuristic algorithms and also combines CNN (Convolutional Neural Networks) and other techniques. Here are some pictures and slides from this presentation.

Papers on pattern mining

As readers of this blog know, I am interested in the field of pattern mining. At the conference, there were two papers about pattern mining (my papers):

  • Philippe Fournier-Viger, M. Saqib Nawaz, Yulin He, Youxi Wu, Farid Nouioua and Unil Yun: MaxFEM: Mining Maximal Frequent Episodes in Complex Event Sequences
  • M. Saqib Nawaz, Philippe Fournier-Viger, Naji Alhusaini, Yulin He, Youxi Wu and Debdatta Bhattacharya: LCIM: Mining Low Cost High Utility Itemsets

These papers are about episode mining and high utility pattern mining.

Affordable registration fee

A good thing about this conference is that the registration fee is affordable. Registering one paper costs 250 USD, and for two papers I spent only 350 USD. This is much cheaper than many other conferences published by Springer or IEEE. For example, an alternative that I considered was to publish in European conferences published by Springer such as DAWAK or DEXA, but registering two papers at those conferences would have cost me over 1240 euros instead of 350 USD. This is a big difference. Many top IEEE conferences are also very expensive, and their prices have increased in recent years. For example, this year, IEEE ICDM has a registration price of 1300 USD, while the price 10 years ago was only about 500 USD. Thus, I appreciate that MIWAI offers registration at a very reasonable price. I think that it allows researchers from all around the world to publish their papers more easily, especially when research funding is limited.

Conclusion

MIWAI 2022 was a good conference. It is not a very large conference, but there are some interesting papers of good quality, and the conference is well-organized. I will try to attend MIWAI 2023 as well.
Later this month, I will talk to you about the MEDI 2022 and ICDM 2022 conferences.



Posted in artificial intelligence, Conference, Data Mining

How to change the language of Tableau Desktop from the registry?

This is a short post about how to change the language of Tableau if the option is not available in the Help menu.

  1. Close Tableau.
  2. On Windows, open Regedit.
  3. Search for “Language code” to find the registry entries containing the language options of Tableau.
  4. Change the three following entries: “Language code”, “Repository language” and “SamplesLanguage” to your preferred language. For example, “en_US” is for English and “zh_CN” is for Chinese.
  5. Start Tableau again and the language will have been changed.
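Instead of searching manually in Regedit, the three entries can also be located with a script. The following Python sketch uses the standard winreg module to list where they are stored and their current data, without modifying anything. The starting key Software\Tableau under HKEY_CURRENT_USER is an assumption of mine that you should check against what Regedit shows on your computer, and the function names are illustrative.

```python
import sys

# Names of the three entries mentioned in the steps above
LANGUAGE_VALUE_NAMES = ("Language code", "Repository language", "SamplesLanguage")

# Assumption: the Tableau settings are stored under this key of HKEY_CURRENT_USER
ASSUMED_ROOT = r"Software\Tableau"


def is_language_value(name):
    """Return True if a registry value name is one of the three language entries."""
    return name in LANGUAGE_VALUE_NAMES


def find_language_values(root=ASSUMED_ROOT):
    """List (key path, value name, current data) for the language entries (Windows only)."""
    import winreg  # this module only exists on Windows

    found = []

    def walk(path):
        try:
            key = winreg.OpenKey(winreg.HKEY_CURRENT_USER, path)
        except OSError:
            return  # the key does not exist or cannot be read
        with key:
            subkey_count, value_count, _ = winreg.QueryInfoKey(key)
            for i in range(value_count):
                name, data, _type = winreg.EnumValue(key, i)
                if is_language_value(name):
                    found.append((path, name, data))
            subkeys = [winreg.EnumKey(key, i) for i in range(subkey_count)]
        for subkey in subkeys:
            walk(path + "\\" + subkey)

    walk(root)
    return found


if sys.platform == "win32":
    for path, name, data in find_language_values():
        print(f"{path}: {name} = {data}")
```

The script is read-only on purpose. Once the entries are found, they can be changed in Regedit as described in step 4.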

Posted in Data science

An integrated text editor in SPMF

Today, I will talk about an upcoming feature of the next release of SPMF (2.58), which is a simple integrated text editor. This new release should happen in about 1 or 2 weeks (as I have been very busy recently) and will contain some new algorithms, in addition to the integrated text editor. This may seem strange. Why a text editor? I will briefly explain the idea in this blog post to give an overview of this feature, as I am testing it now. If you have comments, you may leave them in the comment section below.

Why a text editor in SPMF?

Well, in previous versions of SPMF, there were two options to open the output files produced by the algorithms: using the system’s default text editor (e.g. Notepad on Windows) or using the Pattern Viewer tool of SPMF. Although using the system’s text editor can be good, I was thinking that having a customized text editor could be interesting, and it is actually not difficult to implement. So, I will explain how it works below.

The SPMF text editor

In the next release of SPMF, there will be a new option to open the output file using the SPMF text editor instead of the default system text editor.

The SPMF text editor looks like this:

It has some useful features such as showing the line count and the line and column numbers in the status bar at the bottom. This bar can also be hidden.

Another useful feature is that the current line is always highlighted (which is not done by Notepad, for example):

Besides, it is possible to activate a “night mode” from the menu:

And there is also a search bar that is very convenient for highlighting some keywords in a file, and works like the search bar in some Web browsers:

Also, there is of course the possibility of changing the font and the font size.

Besides, all the user-defined preferences are saved. So next time that you open SPMF, the same font, font size, night mode preference, window size and location, and other preferences of the SPMF text editor are kept.

The SPMF text editor can also display the size of the last file that is opened:

And here is an overview of the menus:

It has the basic important functions such as “Line wrap” and “Word wrap”.

Conclusion

Interesting? You will be able to try it in the next version of SPMF to be released soon.

For now, the features are quite simple because the aim is to provide an alternative to the system text editor that is specialized for opening the output files of SPMF. It is not designed to compete with more complex text editors or to be used as a general-purpose text editor (although it could be).

A limitation of the SPMF text editor is that text files are loaded in memory, so it is restricted to opening files that are not too big.

Many other features could be added like highlighting keywords in the output file. So there is something to think about. Which features would be useful? You may let me know what you think. If some features are not too hard to implement and useful, I may add them. I will also do some more debugging before it can be released.

If you have any suggestions or ideas, you can let me know in the comment section or by e-mail at philfv AT qq.com

Posted in Data Mining, open-source, spmf

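An introduction to frequent subgraph mining algorithms

This post was originally published in Chinese as an introduction to frequent subgraph mining (an English version is also available).

In this blog post, I will introduce an interesting data mining task called frequent subgraph mining, which is used to discover useful patterns in graphs. This task is important because data in many domains is naturally represented as graphs (e.g. social networks, chemical molecules, and national road networks). It is therefore useful to analyze graph data to discover interesting, unexpected, and useful patterns that can help to understand the data or to make decisions.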

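What is a graph? A bit of theory

Before discussing graph analysis, let us introduce a few definitions. A graph is a set of vertices that have some labels. Let me illustrate this with an example. Consider the following graph:

graph

This graph contains four vertices (depicted as yellow circles). The vertices have labels such as “10” and “11”, which provide information about them. For example, imagine that this graph is a chemical molecule. The labels 10 and 11 could then represent the chemical elements hydrogen and oxygen, respectively. Labels do not need to be unique. In other words, the same label may be used to describe several vertices of the same graph. For example, if the above graph represents a chemical molecule, the labels “10” and “11” could be used for all the vertices representing hydrogen and oxygen, respectively.

Besides vertices, a graph also contains edges. Edges are the lines between vertices, drawn here as thick black lines. Edges also have labels. In this example, four labels are used, namely “20”, “21”, “22” and “23”. These labels represent different types of relationships between vertices. Edge labels do not need to be unique either.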
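To make these definitions more concrete, here is a minimal Python sketch of a graph with labeled vertices and labeled edges. The class name LabeledGraph and the way the four vertices are connected are assumptions of mine for illustration. Only the labels (10 and 11 for vertices, 20 to 23 for edges) come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """An undirected graph whose vertices and edges carry labels."""
    vertex_labels: dict = field(default_factory=dict)  # vertex id -> label
    edge_labels: dict = field(default_factory=dict)    # (u, v) -> label

    def add_vertex(self, vertex_id, label):
        self.vertex_labels[vertex_id] = label

    def add_edge(self, u, v, label):
        if u not in self.vertex_labels or v not in self.vertex_labels:
            raise KeyError("both endpoints must be added before the edge")
        self.edge_labels[(u, v)] = label

    def neighbors(self, vertex_id):
        """Return the vertices connected to vertex_id, in either direction."""
        result = []
        for u, v in self.edge_labels:
            if u == vertex_id:
                result.append(v)
            elif v == vertex_id:
                result.append(u)
        return result


# Hypothetical layout: four vertices connected as a cycle
graph = LabeledGraph()
for vertex_id, label in [(1, 10), (2, 11), (3, 10), (4, 11)]:
    graph.add_vertex(vertex_id, label)  # labels are not unique
for u, v, label in [(1, 2, 20), (2, 3, 21), (3, 4, 22), (4, 1, 23)]:
    graph.add_edge(u, v, label)

print(graph.neighbors(1))  # [2, 4]
```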

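Types of graphs: connected and disconnected graphs

Many different types of graphs can be found in real life. A graph is either connected or disconnected. Let me explain this with an example. Consider the following two graphs:

connected graph and disconnected graph

The graph on the left is called a connected graph because, by following the edges, it is possible to go from any vertex to any other vertex. For example, imagine that the vertices represent cities and that the edges are roads between cities. A connected graph is then a graph where it is possible to travel from any city to any other city by following the roads. If a graph is not connected, it is called a disconnected graph. For example, the graph on the right is disconnected because vertex A cannot be reached from the other vertices by following the edges. In the rest of this post, we will use the term “graph” to refer to connected graphs. Thus, all the graphs discussed below are connected graphs.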
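Checking whether a graph is connected can be done with a breadth-first search: start from one vertex, follow the edges, and verify that all vertices have been visited. Below is a minimal Python sketch for undirected graphs. The two small graphs are hypothetical examples of mine and not the ones from the picture.

```python
from collections import deque


def is_connected(vertices, edges):
    """Return True if every vertex can be reached from any other by following edges."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: an edge can be followed both ways
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(adjacency)


print(is_connected("ABCD", [("A", "B"), ("B", "C"), ("C", "D")]))  # True
print(is_connected("ABCD", [("B", "C"), ("C", "D")]))              # False: A is isolated
```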

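Types of graphs: directed and undirected graphs

It is also useful to distinguish between directed and undirected graphs. In an undirected graph, edges are bidirectional, while in a directed graph, edges can be unidirectional or bidirectional. Let me illustrate this with an example.

directed and undirected graphs

The graph on the left is undirected, while the graph on the right is directed. What are some examples of directed graphs in real life? For example, consider a graph where vertices represent locations and edges are roads. Some roads can be travelled in both directions, while others can only be travelled in one direction (“one-way” streets in a city).

Some data mining algorithms are designed to handle only undirected graphs, only directed graphs, or both.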

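Analyzing graphs

Now that we have introduced some theory about graphs, what kind of data mining tasks can we perform to analyze them? There are many answers to this question. The answer depends on our goal, and also on the type of graph that we are analyzing (directed/undirected, connected/disconnected, a single graph or many graphs).

In this blog post, I will explain a widely studied task called frequent subgraph mining. The goal of subgraph mining is to discover interesting subgraphs appearing in a set of graphs (a graph database). But how can we judge whether a subgraph is interesting? This depends on the application. Interestingness can be defined in various ways. Traditionally, a subgraph is considered interesting if it appears multiple times in a set of graphs. In other words, we want to discover subgraphs that are common to several graphs. For example, it can be useful to find associations between chemical elements that are common to several chemical molecules.

The task of finding frequent subgraphs in a set of graphs is called frequent subgraph mining. As input, the user must provide:

  • a graph database (a set of graphs)
  • a parameter called the minimum support threshold (minsup).

Then, a frequent subgraph mining algorithm enumerates and outputs all frequent subgraphs. A frequent subgraph is a subgraph that appears in at least minsup graphs of the graph database. For example, consider the following graph database containing three graphs:

a graph database

Now, suppose that we want to discover all subgraphs that appear in at least three graphs. We thus set the minsup parameter to 3. By applying a frequent subgraph mining algorithm, we obtain the set of all subgraphs appearing in at least three graphs:

frequent subgraphs

Consider the third subgraph (“Frequent subgraph 3”). This subgraph is frequent and has a support (frequency) of 3, because it appears in the three input graphs. These occurrences are highlighted in red below: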

occurrences of frequent subgraphs
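To illustrate how the support is counted, here is a simplified Python sketch. It is not a full frequent subgraph mining algorithm: it only considers the smallest possible subgraphs, made of a single edge and its two vertices, so that no subgraph isomorphism check is needed. The three graphs of the database are hypothetical examples of mine, and the function names are illustrative.

```python
from collections import Counter


def edge_patterns(graph):
    """Return the set of one-edge patterns (vertex label, edge label, vertex label) of a graph."""
    vertex_labels, edges = graph
    patterns = set()
    for u, v, edge_label in edges:
        # Sort the two vertex labels so that an undirected edge has a single representation
        a, b = sorted((vertex_labels[u], vertex_labels[v]))
        patterns.add((a, edge_label, b))
    return patterns


def frequent_edge_patterns(database, minsup):
    """Return the one-edge patterns appearing in at least minsup graphs, with their support."""
    support = Counter()
    for graph in database:
        support.update(edge_patterns(graph))  # a graph counts at most once per pattern
    return {pattern: count for pattern, count in support.items() if count >= minsup}


# Each graph is a pair: ({vertex id: label}, [(vertex id, vertex id, edge label), ...])
database = [
    ({1: 10, 2: 11, 3: 10}, [(1, 2, 20), (2, 3, 21)]),
    ({1: 10, 2: 11}, [(1, 2, 20)]),
    ({1: 11, 2: 10, 3: 11}, [(1, 2, 20), (2, 3, 22)]),
]

print(frequent_edge_patterns(database, minsup=3))  # {(10, 20, 11): 3}
```

Real algorithms extend such small frequent patterns edge by edge to find larger frequent subgraphs, which is where most of the complexity lies.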

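An important question is how to set the minsup parameter. In practice, it is generally set by trial and error. If this parameter is set too high, few subgraphs will be found, while if it is set too low, millions of subgraphs may be found, depending on the input database.

In practice, which tools or algorithms can be used to find frequent subgraphs? There are various frequent subgraph mining algorithms. Some of the most famous are GASTON, FSG and GSPAN.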

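Mining frequent subgraphs in a single graph

Besides discovering subgraphs common to several graphs, there is also a variation of the frequent subgraph mining problem, which consists of finding all frequent subgraphs in a single graph rather than in a graph database. The idea is almost the same. The goal is also to discover subgraphs that appear frequently or that are interesting. The only difference is how the support (frequency) is calculated. For this variation, the support of a subgraph is the number of times that it appears in the single input graph. For example, consider the following input graph:

a single large graph

This graph contains seven vertices and six edges. If we perform frequent subgraph mining on this single graph with the minsup parameter set to 2, the following five frequent subgraphs are discovered:

frequent subgraphs

These subgraphs are frequent because they appear at least twice in the input graph. For example, consider “Frequent subgraph 5”. This subgraph has a support of 2 because it has two occurrences in the input graph. These two occurrences are highlighted in red and blue, respectively, below.

Algorithms for discovering patterns in a graph database can generally also be used to discover patterns in a single graph.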
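The different way of counting the support in a single graph can be illustrated with a simplified Python sketch. As a simplifying assumption, it only counts the occurrences of one-edge patterns, which avoids subgraph isomorphism checks and the question of overlapping occurrences. The graph has seven vertices and six edges like the example, but its labels and structure are hypothetical.

```python
from collections import Counter


def edge_pattern_occurrences(vertex_labels, edges):
    """Count how many times each one-edge pattern occurs in a single graph."""
    counts = Counter()
    for u, v, edge_label in edges:
        # Sort the two vertex labels so that an undirected edge has a single representation
        a, b = sorted((vertex_labels[u], vertex_labels[v]))
        counts[(a, edge_label, b)] += 1  # every occurrence counts, not just the first
    return counts


# Hypothetical input graph with seven vertices and six edges
vertex_labels = {1: 10, 2: 11, 3: 10, 4: 11, 5: 10, 6: 11, 7: 10}
edges = [(1, 2, 20), (2, 3, 21), (3, 4, 20), (4, 5, 21), (5, 6, 20), (6, 7, 22)]

minsup = 2
counts = edge_pattern_occurrences(vertex_labels, edges)
frequent = {pattern: count for pattern, count in counts.items() if count >= minsup}
print(frequent)  # {(10, 20, 11): 3, (10, 21, 11): 2}
```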

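Conclusion

In this blog post, we have introduced the problem of frequent subgraph mining, which consists of discovering subgraphs that appear frequently in a set of graphs. This data mining problem has been studied for more than 15 years, and many algorithms have been proposed. Some algorithms are exact algorithms (they find the correct answer), while others are approximate algorithms (they do not guarantee finding the correct answer, but may be faster).

Some algorithms are also designed to handle directed or undirected graphs, or to mine subgraphs in a single graph or in a graph database, or to do both. Moreover, there are several other variations of the subgraph mining problem, such as discovering frequent paths in a graph, or frequent trees in a graph.

Besides, in data mining in general, many other graph-related problems are studied, such as optimization problems, community detection in social networks, and relational classification.

Generally, problems related to graphs are quite complex compared to those for other types of data. One of the reasons why subgraph mining is difficult is that algorithms typically need to check for “subgraph isomorphism”, that is, to compare subgraphs to determine whether they are equivalent. Nevertheless, I think that these problems are quite interesting, as they raise research challenges.

I hope that you have enjoyed this blog post. If there is interest in this topic, I may write another blog post on graph mining in the future.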


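Philippe Fournier-Viger is a professor of computer science and the founder of the open-source data mining software SPMF, which offers more than 200 data mining algorithms.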

Posted in Chinese posts, Data Mining

A brief report about the IEEE DSAA 2022 conference

In this blog post, I will talk about the IEEE DSAA 2022 conference, which was held from the 13th to the 16th October 2022.

What is IEEE DSAA?

DSAA 2022 is the 9th edition of the IEEE International Conference on Data Science and Advanced Analytics (DSAA). DSAA is an international conference that has been held in many countries, and focuses on data mining, data science, big data, machine learning and related topics.

DSAA is a relatively young conference compared to some top data mining and machine learning conferences, but DSAA has become more and more successful over the years.

Location

This year, the DSAA conference was planned to be held in the city of Shenzhen, China. But due to the COVID-19 pandemic, the conference was held online (with local organization by Shenzhen University), using Zoom as the videoconferencing platform.

Proceedings of DSAA 2022

The proceedings of DSAA are published by IEEE.

Authors could submit a paper to the main track of DSAA. Besides that, the DSAA conference hosts several special sessions on emerging topics. It is interesting for authors that papers accepted in the special sessions are published as regular papers in the proceedings of DSAA. This year, I was co-organizer of a special session called DSSBA (1st Special Session on Data Science for Social and Behavioral Analytics), which was quite successful with 5 papers accepted (more on that later).

Another interesting aspect of DSAA is that there are two special issues, organized in the International Journal of Data Science and Analytics (JDSA) and the Machine Learning Journal (MLJ), respectively. Authors could submit papers to these special issues and then present the articles at the conference.

Keynote speaker from the main conference

This year, there was a good line up of keynote speakers:

Conference opening

The conference opening was on the 13th of October. Some interesting information was given about the DSAA conference. Here are some slides from the opening:

Country distribution for application track:

Country distribution for the research track:

The paper acceptance rate statistics:

The main topics of papers published in DSAA:

Some awards were given, some of which came with a cash prize of 1000 USD.

The DSSBA special session

At DSAA 2022, I co-organized a special session called DSSBA 2022 (1st Special Session on Data Science for Social and Behavioral Analytics). This special session received many papers, among which 5 were accepted for publication as regular papers in the conference.

A keynote talk was given in this special session by Prof. Yun Sing Koh from the University of Auckland, New Zealand. She presented some of her latest research work related to machine learning to tackle environmental science challenges. In particular, she presented two recent research projects published in the Machine Learning journal and in AAAI 2022, which are about air quality index inference and about algal bloom monitoring, respectively. Below, I share a few slides from her talk:

For more details about these two research projects, you can see these two papers:

  • Olivier Graffeuille, Yun Sing Koh, Jörg Wicker, Moritz K. Lehmann: Semi-supervised Conditional Density Estimation with Wasserstein Laplacian Regularisation. AAAI 2022: 6746-6754
  • Ben Halstead, Yun Sing Koh, Patricia Riddle, Russel Pears, Mykola Pechenizkiy, Albert Bifet, Gustavo Olivares, Guy Coulson: Analyzing and repairing concept drift adaptation in data stream classification. Mach. Learn. 111(10): 3489-3523 (2022)

In the DSSBA special session, we also had five paper presentations, where the last three papers are about pattern mining.

  • SA-FGDEM: A Self-adaptive E-Learning Performance Prediction Model
    Wang, Liping; Ye, Mingtao; Zhang, Guodao; Sheng, Xin; Zhang, Jingran
  • Heterogeneous Drift Learning: Classification of Mix-Attribute Data with Concept Drifts
    Zhao, Lang; Zhang, Yiqun; Ji, Yuzhu; Zeng, An; Gu, Fangqing; Luo, Xiaopeng
  • Fast Mining RFM Patterns for Behavioral Analytics
    Wan, Shicheng; Deng, Jieyin; Chen, Jiahui; Gan, Wensheng; Yu, Philip S
  • Constraint-based Sequential Rule Mining
    Yin, Zhaowen; Gan, Wensheng; Huang, Gengsen; Wu, Yongdong; Fournier-Viger, Philippe
  • Discovering Geo-referenced Periodic-Frequent Patterns in Geo-referenced Time Series Databases
    Ravikumar, Penugonda; Palla, Likhitha; T, Chandrasekhar; RAGE, Uday Kiran; Watanobe, Yukata; Zettsu, Koji

Other paper presentations

There were also several interesting paper presentations and talks at the conference, but due to my busy schedule, I was not able to attend many of them. An interesting paper that I saw related to periodic pattern mining is:

Discovering Periodicity in Locally Repeating Patterns
Alfred Krzywicki (University of Adelaide); Ashesh Mahidadia (Rich Data Corporation); Michael Bain (University of New South Wales)

Conclusion

Overall, the conference was very interesting and well-organized. I will try to participate in IEEE DSAA again next year.



Posted in artificial intelligence, Big data, Data Mining, Data science, Machine Learning

SPMF 2.55 is released (10 new algorithms!)

Hi everyone,

This is a short blog post to let you know that a new version of the SPMF data mining library has been released (version 2.55) with 10 new algorithms for pattern mining. SPMF is by far the most complete library for pattern mining and can be used from various languages. Thanks again to all contributors and users who made this project a success and supported it through the years.

If you are a researcher and would like your algorithms to be included in SPMF to provide more visibility to your work, feel free to send me an e-mail.

Hope you will enjoy this new version of SPMF!



Posted in spmf, Uncategorized

How to update the Cloudera VM in 2022? (solved)

In this blog post, I provide instructions about how to update the Cloudera VM, even though the CentOS version used by Cloudera has reached its end-of-life and the Cloudera VM is no longer updated. I spent quite a lot of time on this and finally found how to do it based on various sources from the Internet.

1. Open a terminal window

2. Enter the command

sudo gedit /etc/yum.repos.d/CentOS-Base.repo

to open a text editor with elevated rights.

3. Replace the content of that file with this:

[base]
name=CentOS-$releasever - Base
# mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os&infra=$infra
# baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/os/x86_64/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

# released updates
[updates]
name=CentOS-$releasever - Updates
# mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates&infra=$infra
# baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/updates/x86_64/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

# additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
# mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras&infra=$infra
# baseurl=http://mirror.centos.org/centos/$releasever/extras/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/extras/x86_64/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=centosplus
#baseurl=http://mirror.centos.org/centos/$releasever/centosplus/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/centosplus/x86_64/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

#contrib - packages by Centos Users
[contrib]
name=CentOS-$releasever - Contrib
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=contrib
#baseurl=http://mirror.centos.org/centos/$releasever/contrib/$basearch/
baseurl=https://mirror.nsc.liu.se/centos-store/6.10/contrib/x86_64/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

Then, save the file and close the application.
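Since the five sections of this file follow the same pattern, the file content can also be generated by a script, which is convenient if you want to switch to another mirror. The following Python sketch produces the active (non-commented) lines of the file above. The names render_repo_file and REPOSITORIES are illustrative names of mine, and the script only prints the text rather than writing to /etc/yum.repos.d/.

```python
MIRROR = "https://mirror.nsc.liu.se/centos-store/6.10"
GPG_KEY = "file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6"

# (section name, display name, sub-directory on the mirror, enabled?)
REPOSITORIES = [
    ("base", "Base", "os", True),
    ("updates", "Updates", "updates", True),
    ("extras", "Extras", "extras", True),
    ("centosplus", "Plus", "centosplus", False),
    ("contrib", "Contrib", "contrib", False),
]


def render_repo_file(mirror=MIRROR, arch="x86_64"):
    """Return the text of CentOS-Base.repo for the given mirror and architecture."""
    sections = []
    for section, title, subdir, enabled in REPOSITORIES:
        lines = [
            f"[{section}]",
            f"name=CentOS-$releasever - {title}",
            f"baseurl={mirror}/{subdir}/{arch}/",
            "gpgcheck=1",
        ]
        if not enabled:
            lines.append("enabled=0")  # centosplus and contrib stay disabled
        lines.append(f"gpgkey={GPG_KEY}")
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"


print(render_repo_file())
```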

4. Next, go back to the terminal to type the following commands:

sudo yum-config-manager --disable epel

sudo yum-config-manager --disable cloudera-cdh5

sudo yum-config-manager --disable cloudera-manager

5. Type this command in the terminal to open the configuration file of Yum:

sudo gedit /etc/yum.conf 

6. Add this entry at the end of the [main] section:

sslverify=false

Then, save the file and close the application.
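If you prefer not to edit the file by hand, the same change can be described as a small text transformation. The following Python sketch takes the content of yum.conf as a string and returns it with the entry added at the end of the [main] section. The function name add_sslverify and the sample content are assumptions of mine, and the sketch does not write to /etc/yum.conf itself.

```python
def add_sslverify(conf_text):
    """Return conf_text with sslverify=false added at the end of the [main] section."""
    lines = conf_text.splitlines()
    if any(line.strip().startswith("sslverify") for line in lines):
        return conf_text  # an sslverify entry already exists; leave the text unchanged
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == "[main]")
    except StopIteration:
        raise ValueError("no [main] section found")
    # The section ends at the next "[...]" header, or at the end of the file
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].strip().startswith("[")),
        len(lines),
    )
    # Skip back over blank lines so that the entry follows the last setting
    while end > start + 1 and not lines[end - 1].strip():
        end -= 1
    lines.insert(end, "sslverify=false")
    return "\n".join(lines) + "\n"


sample = "[main]\ncachedir=/var/cache/yum\ngpgcheck=1\n"
print(add_sslverify(sample))
```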

7. After that, we will type some additional commands to update and install additional software. During this step, you will be asked several times if you want to continue. When this happens, press “y” and “enter”. The commands are:

sudo yum update ca-certificates

sudo yum update

sudo yum install epel-release

sudo yum --enablerepo=extras install epel-release

8. Let’s say that we want to install the Python libraries “matplotlib” and “imaging”:

sudo yum install python-matplotlib

sudo yum install python-imaging

We can also install PIP:

sudo yum install -y --enablerepo="epel" python-pip

We can also install other software such as LibreOffice, which provides tools like Oocalc:

sudo yum install libreoffice

That is all!

Posted in Big data