Datasets of 30 English novels for pattern mining and text mining

Today, I want to announce that I have just made public datasets of 30 English novels by 10 authors of the 19th century. These datasets can be used to test algorithms for sequential pattern mining and sequential rule mining, as well as for some text mining applications such as authorship attribution (guessing the author of an anonymous text) and sequence prediction.

All the datasets are public domain texts that were prepared and converted into a format suitable for text analysis by Jean-Marc Pokou et al. (2016), so that they can be used with the SPMF library.

Each dataset has two versions: (1) sequences of words and (2) sequences of part-of-speech (POS) tags.
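To illustrate what the two versions look like, here is a minimal Python sketch that turns tokenized sentences into SPMF-style sequences. It assumes, based on my understanding of the SPMF sequence format, that each sentence is one sequence, each token is one itemset, -1 ends an itemset and -2 ends a sequence, and that optional @CONVERTED_FROM_TEXT and @ITEM=id=name header lines carry the item names. The function name to_spmf is hypothetical, and the SPMF documentation should be checked for the exact header syntax.

```python
def to_spmf(sentences):
    """Convert tokenized sentences into SPMF-style sequence lines.

    Assumptions (not taken from the datasets themselves): one sentence is
    one sequence, one token is one itemset, and item IDs are assigned in
    order of first appearance.
    """
    ids = {}    # token -> positive integer item ID
    body = []
    for tokens in sentences:
        parts = []
        for tok in tokens:
            if tok not in ids:
                ids[tok] = len(ids) + 1
            parts.append(f"{ids[tok]} -1")  # -1 closes an itemset
        parts.append("-2")                 # -2 closes the sequence
        body.append(" ".join(parts))
    # Header lines map item IDs back to names (the "with item names" variant).
    header = ["@CONVERTED_FROM_TEXT"]
    header += [f"@ITEM={i}={tok}" for tok, i in ids.items()]
    return "\n".join(header + body)


if __name__ == "__main__":
    words = [["the", "cat", "sat"], ["the", "dog", "ran"]]
    pos_tags = [["DT", "NN", "VBD"], ["DT", "NN", "VBD"]]
    print(to_spmf(words))
    print(to_spmf(pos_tags))
```

With POS tags, the alphabet is much smaller than with words, so the two example sentences map to the same sequence (1 -1 2 -1 3 -1 -2). This hints at why POS sequences are used to capture an author's style rather than the topic of a text.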

The authors and the total number of words/sentences in the corpus of each author are as follows: Catharine Traill (276,829/6,588), Emerson Hough (295,166/15,643), Henry Adams (447,337/14,356), Herman Melville (208,662/8,203), Jacob Abbott (179,874/5,804), Louisa May Alcott (220,775/7,769), Lydia Maria Child (369,222/15,159), Margaret Fuller (347,303/11,254), Stephen Crane (214,368/12,177), and Thornton W. Burgess (55,916/2,950).

For each book, the datasets are provided in three forms, each with a words version and a POS version: (1) datasets (books) in SPMF format, (2) datasets in SPMF format with item names, which can be used with the GUI of SPMF, and (3) the original books as text. The books are:

Catharine Traill – A Tale of The Rice Lake Plains; Lost in the Backwoods; The Backwoods of Canada
Emerson Hough – The Girl at the Halfway House; The Law of the Land; The Man Next Door
Henry Adams – Democracy, an American novel; Mont-Saint-Michel and Chartres; The Education of Henry Adams
Herman Melville – I and My Chimney; Israel Potter; The Confidence-Man: His Masquerade
Jacob Abbott – Alexander the Great; History of Julius Caesar; Queen Elizabeth
Louisa May Alcott – Eight Cousins; Rose in Bloom; The Mysterious Key and What It Opened
Lydia Maria Child – A Romance of the Republic; Isaac T. Hopper; Philothea
Margaret Fuller – Life Without and Life Within; Summer on the Lakes, in 1843; Woman in the Nineteenth Century
Stephen Crane – Active Service; Last Words; The Third Violet
Thornton W. Burgess – The Adventures of Buster Bear; The Adventures of Chatterer the Red Squirrel; The Adventures of Grandfather Frog

All 30 books above are also provided together in each of the three forms (words / POS).

If you use the above book datasets, you may want to cite this paper:

Pokou, J. M., Fournier-Viger, P., Moghrabi, C. (2016). Authorship Attribution Using Small Sets of Frequent Part-of-Speech Skip-grams. Proc. 29th Intern. Florida Artificial Intelligence Research Society Conference (FLAIRS 29), AAAI Press, pp. 86-91.

In that paper, we discovered skip-grams (sequential patterns) and n-grams (consecutive sequential patterns) of part-of-speech tags and used them to guess the authors of books.
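The difference between these two kinds of patterns can be illustrated with a small Python sketch. This is not the code used in the paper, and the function names are hypothetical. Skip-grams are taken here as k-skip-n-grams (n tags kept in their original order, with at most k tags skipped in total), which is one common definition; the paper's exact definition and mining procedure are described there.

```python
from collections import Counter
from itertools import combinations


def ngrams(tags, n):
    """Contiguous n-grams (consecutive sequential patterns)."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def skipgrams(tags, n, k):
    """k-skip-n-grams: n tags in their original order, skipping at most
    k tags in total between the first and the last one."""
    grams = []
    for i in range(len(tags)):
        # The remaining n-1 positions are chosen within the next n-1+k tags.
        window = range(i + 1, min(len(tags), i + n + k))
        for rest in combinations(window, n - 1):
            grams.append((tags[i],) + tuple(tags[j] for j in rest))
    return grams


if __name__ == "__main__":
    pos = ["DT", "NN", "VBD", "DT", "NN"]
    print(ngrams(pos, 2))
    print(Counter(skipgrams(pos, 2, 1)).most_common(3))
```

With k = 0, the skip-grams reduce to the n-grams. In an authorship attribution setting, the relative frequencies of each author's most frequent patterns could then serve as a profile of that author, which is the general idea behind using small sets of frequent patterns.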

More datasets can also be found on the dataset webpage of the SPMF software.


Philippe Fournier-Viger is a computer science professor and founder of the SPMF open-source data mining library, which offers more than 170 algorithms for analyzing data, implemented in Java.


The PAKDD 2020 conference (a brief report)

In this report, I will talk about the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2020), which was held from May 11 to 14, 2020.

The PAKDD conference

PAKDD is a top international conference on data mining and big data in the Pacific-Asia region. I have attended this conference several times and have written reports about previous editions. If you are interested, you can read these reports here: PAKDD 2014, PAKDD 2015, PAKDD 2017, PAKDD 2018 and PAKDD 2019.

PAKDD Proceedings

As usual, the proceedings of PAKDD 2020 are published by Springer in the Lecture Notes in Artificial Intelligence (LNAI) series. This ensures that the proceedings are indexed in DBLP and other major indexes, and gives good visibility to papers.

This year, there were 628 submissions to PAKDD 2020. Of those, 135 papers were accepted, which means an acceptance rate of 21.5%.

The conference went online

This year, the PAKDD 2020 conference was planned to be held in Singapore. But due to the COVID-19 pandemic, it was held online instead. Part of the registration fee was reimbursed to the authors because the organizers saved money by holding the conference online. And of course, since the conference was online, all social events, such as the banquet and the reception, were cancelled.

All authors were asked to submit, before the conference, a pre-recorded 13-minute video of their paper in 720p resolution with their slides. Then, during the conference, authors had to be available online to answer questions after the presentation of their paper. Thus, each paper was allotted a total of 17 minutes. This is somewhat less than in previous years, when long presentations had about 30 minutes, if I remember correctly.

The conference could be accessed through the Zoom online meeting system. To attend the different sessions, a password was required, which was made available to registered attendees.

Some video etiquette tips were given to authors.

As for the proceedings, since the conference was online, they were made available for download in PDF format from the conference website.

Day 1 – Tutorials and workshop day

On the first day, there were five workshops and two tutorials.

I first had a look at the literature-based discovery workshop using Zoom. There were about 22 people in that workshop at 9:26 AM, watching a presentation about using evolutionary algorithms for matching biomedical ontologies.

Then, I popped into the Data Science for Fake News workshop at 9:40 AM to see how it was going. Although it was supposed to start at 9 AM, the workshop had not started. I asked in the chatroom and was told that it was delayed until 10 AM (perhaps due to a technical problem, or someone missing due to time zones?).

Next, I checked the Game Intelligence & Informatics workshop, where about 11 people were watching the presentations at 9:47 AM. Game intelligence is quite an interesting topic. Here is a screenshot from that workshop, where game strategies were analyzed:

Then, at 9:57 AM, I had a quick look at the Tutorial on Deep Explanations in Machine Learning via Interpretable Visual Methods, which was in the fourth parallel session. About 44 people were watching it, so it seemed to be the most popular session. This topic is interesting because neural networks can be very effective but are mostly black-box models. In that tutorial, they talked about how to interpret such models, and they also discussed other ways of interpreting knowledge in data mining, such as how to visualize association rules (screenshot below).

So far, all of this was quite interesting, and there were some good questions in the sessions that I attended.

In the afternoon at 2 PM, I attended the 9th Workshop on Biologically Inspired Data Mining (BDM 2020). This workshop has been running for many years at PAKDD, and I personally like it as it covers various topics such as genetic algorithms, particle swarm optimization (PSO), ant colony optimization, and applications of such algorithms. There were about 18 people attending the workshop at 2:11 PM. First, the organizer Shafiq Alam gave an overview of the motivations for biologically inspired data mining, explaining that optimization algorithms like genetic algorithms can quickly find approximate solutions to hard problems, if we are willing to sacrifice a little accuracy. Then, some results were presented about using PSO for clustering and recommendation. This was followed by some paper presentations and a discussion about current trends.

At the same time in the afternoon, there was a tutorial on deep Bayesian networks, which had about 31 attendees at 2:19 PM, and a workshop on Learning Data Representations for Clustering, which had about 14 attendees at 2:21 PM. Overall, it seems that the tutorials were the most popular sessions on this first day.

Day 2

From 8:30 to 9:00 AM, there was the conference opening, with about 59 people attending at 8:58 AM. Some awards were announced:

It was followed by a keynote from Prof. Bing Liu about open-world AI and “continual learning”, which discussed the need for software that can continuously learn. Here are a few slides:

This was followed by two industry talks, one by Usama Fayyad and another by Ankur Teredesai. Below are a few slides from the talk of A. Teredesai about AI for health. He discussed how data mining and AI can help in healthcare. In particular, he talked about epidemiological models for diseases such as COVID-19. At 11:18 AM, there were about 27 people in that session. The talk was interesting, but there were some Internet connection problems at one point, so the audio was hard to hear for a few minutes. After that, it was fine.

Then, in the afternoon, there were paper presentations.

Day 3

At 8:30 AM, there was a keynote talk by Inderjit S. Dhillon about multi-output prediction, with about 42 people watching at 8:51 AM. Here is a screenshot of that talk:

In the afternoon, there was a keynote talk by Prof. Samuel Kaski titled “Data Analysis with Humans” about how humans can participate in the machine learning process. About 34 people were attending the talk at 2:08 PM. He first illustrated that different problems (and methods) require different levels of human intervention.

Generally, the user can participate in different ways in the machine learning or data mining process.

First, the user can be a passive data source. Second, the user can participate more actively in the machine learning or data mining process to guide the software program.

Here is a slide about the first approach.

There were more slides and details, but I did not take notes on everything.

After that, there were more paper presentations.

Day 4

On Day 4, there was the most influential paper talk, a keynote talk by Prof. Jure Leskovec in the afternoon, and more paper presentations.

Papers about pattern mining

Now I will talk a little bit about papers related to pattern mining, as it is one of my topics of interest. I presented a paper about a new algorithm named LTHUI-Miner, which discovers high utility itemsets that are trending in non-predefined time periods in customer transaction databases. This is the work of my master's student:

Fournier-Viger, P., Yang, Y., Lin, J. C.W., Frnda, J. (2020). Mining Locally Trending High Utility Itemsets. Proc. 24th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD 2020), Springer, LNAI.

You can watch the video of my presentation here.

Another paper related to pattern mining published at PAKDD this year is about discovering frequent subsequences in a set of sequences using an algorithm called Tree-Miner:

Tree-Miner: Mining sequential patterns from SP-Tree. Redwan Ahmed Rizvee (University of Dhaka), Chowdhury Farhan Ahmed (University of Dhaka), Mohammad Fahim Arefin (University of Dhaka)
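Tree-Miner itself builds an SP-Tree, and the sketch below does not reproduce it. It only shows, under the usual definition of sequential pattern mining, what a "frequent subsequence" is: the support of a pattern is the number of sequences that contain it, where each itemset of the pattern must be included, in order, in a later itemset of the sequence (gaps allowed). The function names and the small database are illustrative.

```python
def contains(sequence, pattern):
    """True if every itemset of pattern is included, in order, in a
    distinct later itemset of sequence (gaps are allowed)."""
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest itemset that includes this one.
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def support(database, pattern):
    """Number of sequences of the database that contain the pattern."""
    return sum(1 for seq in database if contains(seq, pattern))


if __name__ == "__main__":
    db = [
        [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
        [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
        [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
        [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
    ]
    min_sup = 2
    for pattern in ([{"a"}, {"b"}], [{"a", "b"}, {"c"}], [{"d"}, {"a"}]):
        s = support(db, pattern)
        print(pattern, s, "frequent" if s >= min_sup else "infrequent")
```

Greedy earliest matching is enough to decide containment. Algorithms such as Tree-Miner exist because checking every candidate pattern this way against the whole database is far too slow on real data.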

Is this online format a success?

Overall, the online format of this conference was fine. But I miss the social activities of an offline conference, like the coffee breaks, where we can talk with other researchers to exchange ideas and meet new people. For me, this is one of the most interesting aspects of a conference.

Also, as a suggestion, it would have been nice to have a replay feature to watch presentations that we missed. In my case, I am in the same time zone as Singapore, so it was convenient for me to watch the presentations, but I can imagine that people from some other countries (e.g., some parts of Canada, with a 12-hour time difference) would have had a harder time watching some presentations.

Special journal issues

Some papers were invited to a special issue of the JDSA journal. It is always nice to be invited to a special issue. However, although this journal is published by Springer, a problem is that it is still quite new and, to my knowledge, is not indexed in databases like SCI or EI. In some countries, such as the one where I work, indexing is important, and papers in non-indexed journals do not have much value. For this reason, I had to decline the invitation to extend my paper. I would have preferred to be invited to a special issue of a more established journal, as some other conferences do.

In the call for papers, there was also a mention that some papers would be invited to an issue of the KAIS journal. This is quite a good journal, but apparently, it was only for a few of the very best papers.

Conclusion

Overall, it was an interesting conference. Due to the virus situation, the conference was held online, and the organizers managed to organize it very well in this situation. Looking forward to PAKDD 2021 next year.



“Pattern Mining: Theory and Practice” (textbook in Thai, with SPMF)

Hi all, this is to announce that a new textbook in Thai about pattern mining has been published, which includes many examples using the SPMF software. The textbook, named “Pattern Mining: Theory and Practice”, is written by teacher Panida Songram from Mahasarakham University (Thailand) and can be used for teaching or self-learning by students or practitioners. I have known the author for many years, and I am very happy that she let me host a copy of the book, which you can download from this link:
Pattern Mining: Theory and Practice (PDF, 14.2 MB)

The book gives good coverage of pattern mining. It explains algorithms but also contains many practical examples of how to use SPMF. Some key topics in the book are itemset mining, sequential pattern mining, and multi-dimensional sequential pattern mining.

That is all I wanted to share for today. If you can read Thai, I highly recommend downloading this book. 😉



(video) Mining Locally Trending High Utility Itemsets

Today, I want to share with you the video presentation that I prepared for my paper at PAKDD 2020. It presents a new problem where we want to discover locally trending high utility itemsets (LTHUIs). An LTHUI is a set of items purchased by customers that is trending, that is, the money it generates follows an upward or downward trend during some non-predefined time periods. It is a variation of the popular high utility itemset mining problem.
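LTHUI-Miner itself is described in the paper and is not reproduced here. To give an intuition of what a "trend of utility" means, here is a simplified, hypothetical Python sketch. It assumes that the utility of an itemset in a transaction is the sum of quantity × unit profit of its items (when the transaction contains all of them), groups transactions into fixed-size time bins, and reports a period (a run of consecutive bins) when the least-squares slope of the itemset's utility over that period is steep enough. The parameters bin_size, min_bins and min_slope, and all function names, are illustrative and are not the paper's definitions.

```python
def itemset_utility(transaction, itemset, profit):
    """Sum of quantity * unit profit of the itemset's items, or 0 if the
    transaction (dict item -> quantity) does not contain all of them."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * profit[item] for item in itemset)


def slope(values):
    """Least-squares slope of values against x = 0, 1, ..., len(values)-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def trending_periods(db, itemset, profit, bin_size, min_bins, min_slope):
    """Return (first_bin, end_bin_exclusive, slope) for every run of at
    least min_bins consecutive bins whose utility trend is steep enough."""
    if min_bins < 2:
        raise ValueError("a trend needs at least two bins")
    bins = [sum(itemset_utility(t, itemset, profit) for t in db[s:s + bin_size])
            for s in range(0, len(db), bin_size)]
    found = []
    for start in range(len(bins)):
        for end in range(start + min_bins, len(bins) + 1):
            s = slope(bins[start:end])
            if abs(s) >= min_slope:  # upward or downward trend
                found.append((start, end, s))
    return found


if __name__ == "__main__":
    profit = {"bread": 2, "milk": 1}  # made-up unit profits
    # Made-up transactions in time order (item -> quantity).
    db = [{"bread": 1, "milk": 1}, {"bread": 2, "milk": 1},
          {"bread": 3, "milk": 2}, {"bread": 4, "milk": 2},
          {"milk": 3}, {"bread": 1, "milk": 1}]
    periods = trending_periods(db, ("bread", "milk"), profit,
                               bin_size=1, min_bins=3, min_slope=2.0)
    for start, end, s in periods:
        print(f"bins {start}..{end - 1}: slope {s:+.2f}")
```

Because every run of consecutive bins is examined, the periods are not fixed in advance, which mirrors the idea of non-predefined time periods; an efficient algorithm would of course avoid this exhaustive enumeration.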

VIDEO LINK: http://philippe-fournier-viger.com/spmf/videos/pakdd720p.mp4

Hope you will enjoy this video! If you want more details about this topic, you can read this paper:

Fournier-Viger, P., Yang, Y., Lin, J. C.W., Frnda, J. (2020). Mining Locally Trending High Utility Itemsets. Proc. 24th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD 2020), Springer, LNAI, 12 pages.

The source code will be released soon in the SPMF data mining software.

==
Philippe Fournier-Viger is a professor, data mining researcher and the founder of the SPMF data mining software, which includes more than 150 algorithms for pattern mining.


Success and Health for Researchers

Many researchers and students want to be successful in their field. To achieve this, they make many sacrifices, such as working long hours in the lab every day from morning to evening. This is important because, honestly, success comes with hard work. But it is important to still keep a good work-life balance to stay healthy. In this blog post, I will talk about the importance of having good life and work habits for researchers.

First, let me tell you a bit about my story. Since the start of my graduate studies, I have worked countless hours to improve myself. For example, during my master's and Ph.D. studies, I would basically not take any rest during the whole year, and I worked maybe 12 hours every day. That allowed me to be successful in my field, receive big grants during my studies, publish many papers, and then land some good jobs in academia. Nowadays, as I have a family, I cannot work as much as when I was a student, but I still work hard, and I am much more efficient than I was before due to the skills that I have gained. For example, I can write a paper much more quickly. I still work very late at night almost every day.

Health is important

Now, what I have learnt over the years is that work is not everything. Health is also very important. Working long hours in the lab can eventually bring several health problems, like wrist and neck pain, back problems, and eye problems. Luckily, I do not have any major problems, but it is something to be aware of, as such problems typically appear later down the road.

My advice

First, it is important to eat healthy food.

Second, it is important to have a good posture while working. For example, it is worth finding a good chair, adjusting the height of the table and screen, and using an appropriate mouse and keyboard, to be comfortable.

Third, it is important to avoid sitting for too long and to rest your eyes from time to time. Several studies have shown that sitting for long periods of time may lead to various diseases. Thus, it is good, for example, to stand up every hour and go for a walk for a few minutes.

Fourth, it is equally important to do some exercise every week. Even a few hours of exercise per week, such as running, swimming or playing badminton, can make you feel better. I personally like to go running for 30 minutes to an hour every day.

Also, if you are tired or are always sitting on a chair, you may consider working in a standing position. I have recently started to do this, and it really feels great. I even wonder why I did not do this before! It is very good for the posture and the back. Here is a picture of my setup at home:

working in a standing position

Some people recommend alternating between standing and sitting to avoid getting tired. But personally, I have no problem working for several hours in a standing position. If you don't have a stand like mine in the picture, you could also use some boxes to raise your computer.

Another good piece of advice is that if you are working on a laptop, you should consider using an external screen or an external keyboard. The reason is that if you put your laptop low, the keyboard may be at an appropriate height, but the screen will be too low and you will have to bend your neck. On the other hand, if you put your laptop higher, the screen will be at an appropriate height for your eyes, but the keyboard will be too high. Thus, using an external screen or keyboard can solve this problem.

Conclusion

In this blog post, I have discussed the importance of having good life habits to be a healthy researcher and avoid health problems later in life. If you have other suggestions related to this, please post them in the comment section below!


A few errors to avoid in research papers

Today, I will write a short blog post to list some common errors that I have recently observed in journal and conference papers:

  • Using a reference number as the subject of a verb. For example, “[12] proposed an algorithm” should be written as “Smith et al. [12] proposed an algorithm”.
  • When there is a shorter way of writing something, it should be used. For example, “in order to” should be replaced by “to”. Another example: “this new type of algorithm is” can be replaced by “this new algorithm type is”. Similarly, “A is an extension of B” can be replaced by “A extends B”. One should write concisely.
  • The title of a paper is too long. I recommend no more than 10 words, and preferably fewer. I recently read a paper whose title had more than 20 words!
  • Using the word “we” too often. Generally, it is better to avoid using “we” as much as possible.
  • Using the words “you” or “I”. These words should never appear in a research paper.

I could say much more about this. Indeed, you can look at my other blog posts about writing research papers for more information. But my goal was just to remind you about some common errors!


UDML 2020 – Utility Driven Mining and Learning Workshop

Hi all, this is to let you know the good news that the UDML workshop on Utility Driven Mining and Learning will be back this year at IEEE ICDM 2020, for its third edition (UDML 2020).

This is a good venue to submit your papers about data mining and machine learning, especially given that all accepted papers will be published in the IEEE ICDM workshop proceedings, just like last year! Also, we are planning to have a special issue in a good SCI/EI journal for the best papers of the workshop (to be confirmed).


In particular, if you have some papers about high utility pattern mining (including topics such as high utility itemset mining, high utility episode mining or high utility sequential pattern mining), this is a perfect place to submit them 😉

But we are also looking for papers on other, more general topics related to the concept of utility, such as analyzing or learning the important factors (e.g., economic factors) in the data mining or machine learning process. Here is a non-exhaustive list of potential topics:

  • Theory and core methods for utility mining and learning
  • Utility pattern mining in large datasets, e.g., high-utility itemset mining, high-utility sequential patterns/rules mining, high-utility episode mining, and other novel patterns
  • Analysis and learning of novel utility factors in the mining and learning process
  • Predictive modeling/learning, clustering and link analysis that incorporate utility factors
  • Incremental utility mining and learning
  • Utility mining and learning in streams
  • Utility mining and learning in uncertain systems
  • Utility mining and learning in big data
  • Knowledge representations for utility patterns
  • Privacy preserving utility mining/learning
  • Visualization techniques for utility mining/learning
  • Open-source software/libraries/platforms
  • Innovative applications in interdisciplinary domains, like finance, biomedicine, healthcare, manufacturing, e-commerce, social media, education, etc.
  • New, open, or unsolved problems in utility-driven mining
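For readers new to the topic, the core notion behind high utility itemset mining can be sketched in a few lines of Python. This naive version enumerates every itemset, which is exponential in the number of items; real algorithms such as HUI-Miner or EFIM (both available in SPMF) prune the search space instead. The database, unit profits and function names are made-up examples.

```python
from itertools import combinations


def utility(database, itemset, profit):
    """Total utility of itemset: for each transaction (dict item -> quantity)
    containing all its items, add quantity * unit profit of those items."""
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(database, profit, min_util):
    """Naive exhaustive search: return {itemset: utility} for all itemsets
    whose utility is at least min_util."""
    items = sorted(profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(database, itemset, profit)
            if u >= min_util:
                result[itemset] = u
    return result


if __name__ == "__main__":
    profit = {"a": 5, "b": 2, "c": 1}  # made-up unit profits
    db = [{"a": 1, "c": 1}, {"a": 2, "b": 3}, {"b": 1, "c": 4}]
    print(high_utility_itemsets(db, profit, min_util=10))
```

In this example, {a, b} has a utility of 16 while its subset {b} only has 8, so utility is not anti-monotonic. This is why the support-based pruning of frequent itemset mining does not directly apply, and dedicated algorithms rely on upper bounds instead.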

The website of the UDML 2020 workshop is here:
http://www.philippe-fournier-viger.com/utility_mining_workshop_2020/

Submissions are limited to 10 pages and must be formatted according to the IEEE 2-column format. Papers will be evaluated based on the evaluation criteria of the main IEEE ICDM 2020 conference for research papers. In particular, papers must present original research that is not under consideration in other journals, conferences or workshops.


(video) Mining Cost-Effective Patterns

In this blog post, I will share another talk that I have recently recorded. This time, I explain a new paper from my team about discovering cost-effective patterns using two algorithms called CEPB and CEPN. Mining cost-effective patterns is a new topic in pattern mining that combines the concept of utility with that of cost.
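To give an intuition of how cost and utility can be combined, here is a hypothetical Python sketch. It is not the CEPB or CEPN algorithm, and its measures (support, average cost, average utility and cost per unit of utility) are simplified; the exact definitions are in the KBS paper. It assumes a made-up event log where each sequence lists (event, cost) pairs and has a numeric utility, such as the time a student spends on activities and the student's final grade. The cost of a pattern in a sequence is taken from its first greedy occurrence, which is a simplification.

```python
def occurrence_cost(events, pattern):
    """Cost of the first (greedy) occurrence of pattern in a sequence of
    (event, cost) pairs, or None if the pattern does not occur."""
    cost, pos = 0, 0
    for wanted in pattern:
        while pos < len(events) and events[pos][0] != wanted:
            pos += 1
        if pos == len(events):
            return None
        cost += events[pos][1]
        pos += 1
    return cost


def cost_profile(log, pattern):
    """(support, average cost, average utility, cost per unit of utility)
    over the sequences of the log that contain the pattern."""
    costs, utilities = [], []
    for events, utility in log:
        c = occurrence_cost(events, pattern)
        if c is not None:
            costs.append(c)
            utilities.append(utility)
    if not costs:
        return None
    avg_cost = sum(costs) / len(costs)
    avg_util = sum(utilities) / len(utilities)
    ratio = avg_cost / avg_util if avg_util else float("inf")
    return len(costs), avg_cost, avg_util, ratio


if __name__ == "__main__":
    # Hypothetical log: activities with time spent (cost), final grade (utility).
    log = [
        ([("read", 10), ("quiz", 5), ("video", 20)], 80),
        ([("read", 30), ("video", 10)], 60),
        ([("quiz", 5), ("read", 20)], 70),
    ]
    for pattern in (("read", "video"), ("quiz",)):
        print(pattern, cost_profile(log, pattern))
```

A lower cost per unit of utility suggests a more cost-effective pattern; in this toy log, doing the quiz is associated with both a lower cost and a higher average grade than reading followed by a video.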

VIDEO LINK: http://www.philippe-fournier-viger.com/spmf/videos/cost_video.mp4

Hope you will enjoy this video! If you want more details about this topic, you can read this paper:

Fournier-Viger, P., Li, J., Lin, J. C., Chi, T. T., Kiran, R. U. (2019). Mining Cost-Effective Patterns in Event Logs. Knowledge-Based Systems (KBS), Elsevier

Moreover, you can download these algorithms, their source code, and datasets from the SPMF data mining library.

That is all for today.

(video) Discovering interpretable high utility patterns in databases

Today, I will share a short keynote talk (28 min) about discovering interpretable high utility patterns in data that I have presented at the CCNS 2020 conference. This talk gives an overview of techniques for finding interesting and useful patterns that can help to understand data.

VIDEO LINK: http://www.philippe-fournier-viger.com/spmf/videos/ccns_small.mp4

Hope you will enjoy this video! If you want to know more about how to find interesting and useful patterns in data, I have written a series of blog posts on this topic.

I have also published various videos that you can find on this blog. Moreover, to apply this in your projects, you can use the SPMF open-source data mining software (of which I am the founder). It provides more than 150 algorithms for identifying useful patterns in data.


(video) Identifying Stable Periodic Frequent Patterns using SPP-Growth

Today, I present a video about finding stable periodic patterns in data, and discuss a new algorithm named SPP-Growth for this task.
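The idea of stability can be sketched as follows in Python, based on a reading of the definitions in the paper; the exact definitions (for example, how the first and last periods are handled) should be checked there, and this is not the SPP-Growth algorithm itself. The periods of a pattern are the gaps between consecutive transactions containing it, plus the gap from the start and to the end of the database. Each period longer than max_per increases a "lability" value by its excess, shorter periods decrease it (never below 0), and a pattern is taken as stable periodic if its support is at least min_sup and its maximum lability is at most max_la. All names are illustrative.

```python
def periods(tids, n):
    """Periods of a pattern: gaps between consecutive transactions that
    contain it, plus the gap from the start (0) and to the end (n).
    tids are sorted 1-based transaction IDs; n is the database size."""
    points = [0] + list(tids) + [n]
    return [b - a for a, b in zip(points, points[1:])]


def stability(tids, n, max_per):
    """Maximum lability: each period longer than max_per adds its excess,
    shorter periods pay it back, and the value never drops below 0."""
    lability, worst = 0, 0
    for per in periods(tids, n):
        lability = max(0, lability + per - max_per)
        worst = max(worst, lability)
    return worst


def is_stable_periodic(tids, n, min_sup, max_per, max_la):
    return len(tids) >= min_sup and stability(tids, n, max_per) <= max_la


if __name__ == "__main__":
    n = 10                   # hypothetical database of 10 transactions
    tids = [1, 3, 5, 6, 9]   # transactions containing some itemset
    print(periods(tids, n))
    print(stability(tids, n, max_per=2))
    print(is_stable_periodic(tids, n, min_sup=3, max_per=2, max_la=1))
```

Here the periods are 1, 2, 2, 1, 3 and 1. Under a strict periodic-frequent definition with max_per = 2, the single period of 3 would disqualify the pattern; with a lability threshold of 1, this small irregularity is tolerated.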

VIDEO LINK: http://www.philippe-fournier-viger.com/spmf/videos/SPPGrowth.mp4

The SPP-Growth algorithm and datasets for evaluating its performance are available in the SPMF software, which is open-source and programmed in Java.

Source code and datasets:

The source code of SPP-Growth and datasets are available in the SPMF software.

The research paper:

Fournier-Viger, P., Yang, P., Lin, J. C.-W., Kiran, U. (2019). Discovering Stable Periodic-Frequent Patterns in Transactional Data. Proc. 32nd Intern. Conf. on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA AIE 2019), Springer LNAI, pp. 230-244

If you want to watch more videos about data mining algorithms that I have recorded, you can click on the “video” category of this blog.
