The Data Blog

An Introduction to Episode Mining

Posted on 2021-08-22 by Philippe Fournier-Viger

In this blog post, I will talk about pattern mining (finding patterns in data) data mining task called episode mining. It aim at discovering interesting patterns in a long sequence of symbols or events. Sequence data is an important type of data found in many domains. For instance, a sequence can encode data such as a sequence of alarms produced by a computer system, a sequence of words in a text, and a sequence of purchases made by a customer. Using episode mining, patterns can be discovered in a sequence. These patterns can then be used to understand the data, and support decision making.

I will first define what is an event sequence. Then, I will introduce the basic problem of episode mining. Then, I will discuss some extensions and software for episode mining.

What is an event sequence?

An event sequence (also called a discrete sequence) is a list of events, where each event has a distinct timestamp. For example, consider the following event sequence:

<(t1, {a,c}), (t2, {a}), (t3, {a,b}), (t6,{a}),(t7,{a,b}), (t8, {c}), (t9, {b}), (t11, {d})>

In that sequence, each letter (a,b,c,d) is an event type and t1, t2, … t11 are timestamps. The above sequence indicates that event “a” and “c” occurred at time t1, was followed by event “a” at time t2, then by events “a” and “b” at time 3, then followed by event “a” at time t6, and so on. A visual representation of this sequence is shown below:

a complex event sequence for episode mining

If an event sequence contains some events that occur simultaneously (have the same timestamp), it is said to be a complex event sequence. Otherwise, it is said to be a simple sequence.

As previously said, sequence data is found in many domains. For example, the above sequence could represent the purchases made by a customer in a store over time where “a”, “b”, “c” and “d” represents different types of products such as “apple”, “bread”, “cake” and “dattes”. Another example is a sequence of locations visited by a person in a city, where “a”, “b”, “c” and “d” could represent the home, a restaurant, the university and the gym. Another example is a sequence of words in a text, where “a”, “b”, “c” and “d” represent different words.

Frequent Episode mining

To analyze such sequence of events or symbols, a popular data mining task is frequent episode mining. The aim is to find all the episodes (subsequences of events) that appear frequently in a sequence over time. An episode is said to be frequent if it appears at least minsup times, where minsup is a parameter set by the user.

The task of finding these frequent episodes is called frequent episode mining. It was proposed by Toivonen and Mannila in 1995 to analyze alarm sequences in this paper:

Mannila, H., Toivonen, H., Verkamo, A.I.: Discovering frequent episodes in sequences. In: Proc. 1st Int. Conf. on Knowledge Discovery and Data Mining (1995)

Toivonen and Manila proposed two algorithms called WINEPI and MINEPI to discover frequent episodes (sets of events frequently appearing in an event sequences). These two algorithms have the same goal of finding episodes but these algorithms adopt different ways of counting how many times an episode appear. Thus, the output of the two algorithms is different. After the WINEPI and MINEPI algorithms were proposed, several other algorithms have been introduced to discover episodes more efficiently such as MINEPI+ and EMMA. Here I will explain the problem as it is defined for the EMMA algorithm. The paper proposing the EMMA algorithm is:

Huang, K., Chang, C.: Efficient mining of frequent episodes from complex sequences. Inf. Syst. 33(1), 96–114 (2008)

The goal of episode mining is to find frequent episodes. An episode E is a sequence of event sets of the form E = <X1, X2, . . . , Xp>. This notation indicates that a set of events X1 occurred, was followed by another set of events X2, and so on, and finally followed by a set of events Xp.

If we look at the sequence of Fig. 1 an example of episode is <{a},{a,b}> which contains two event sets: {a} and {a,b}. This episode <{a},{a,b}> means that the event “a” appears and is followed by the events “a” and “b” at the same time. This episode can observed many times in the example sequence of Fig. 1. Below, I have highlighted the six occurrences of that episode:

In the EMMA algorithm, the user must set two parameters to find frequent episodes. The first parameter is the window length (winlen). This parameter is used to filter out occurrences that are spanning over a too long period of time. For example, if the parameter winlen is set as winlen = 3, it means that we are only interested in counting the occurrences of episodes that last no more than 3 time units. In the above example, only three occurrences of episode <{a},{a,b}> satisfy this criterion:

For instance the first occurrence (1) has a duration of 3 time units., while the occurrence (3) has a duration of two time units.

The second parameter that must be set to find frequent episodes in a sequence is called the minimum support threshold (minsup). This parameter indicates the minimum number of occurrences that an episode must have to be considered as a frequent episode. Let say that this parameter is set to 3. Then the goal of episode mining is to find all frequent episodes that have at least three occurrences having a duration that is no more than winlen. The number of occurrences of an episode is called its support.

Before showing an example, I should also say that the EMMA algorithm has a particular way of counting occurrences. It is that if two or more occurrences start at the same timestamp, EMMA will count this as only one occurrence. For example, the occurrences (1) and (2) of episode <{a},{a,b}> are counted as only one occurrence because they both start at the timestamp t1. It was argued in the paper of EMMA that this counting strategy is more appropriate than the counting methods used by MINEPI and WINEPI for real applications.

Now, let me show you what is the result of frequent episode mining with EMMA. For instance, if we apply EMMA on the above sequence with winlen = 2 and minsup = 2, the seven frequent episodes are discovered:

These frequent episodes indicate sequential relationships between events (or symbols) in the sequence. If we assume that the sequence of Fig. 1 is a sequence of events on a computer network, these patterns may be useful for someone doing the maintenance of the network, as it indicates for example that event “a” is often followed by events “a” or “b” within a short period of time.

An improvement: Top-K Frequent Episode Mining

Although frequent episode mining is useful, a problem is that it is not always intuitive to set the minsup parameter. For example, for some sequences setting minsup = 2 may result in finding only a few episodes while on other sequences, millions of episodes may be found for minsup = 2. But the user, typically dont have time to look at millions of episodes, and tuning the minsup parameter to just find enough episodes can be time-consuming (it is done by trial and error).

To address this issue, the task of frequent episode mining was recently redefined as the task of top-k frequent episode mining. Rather than using the minsup parameter, the user can direclty indicate the number of patterns k to be found, which is much more convenient and intuitive. The TKE algorithm is the first algorithm for discovering the top-k most frequent episodes in a sequence. It adopts the same definition of episode as the EMMA algorithm, which was presented above. But it let the user directly specify how many episodes must be discovered.

For example, if we run the TKE algorithm with k = 3 and winlen = 2, it is found that the top 3 most frequent episodes in the sequence of Fig. 1 are <{a},{a}>, <{a}> and <{b}>.

The TKE algorithm was presented in this paper:

Fournier-Viger, P., Wang, Y., Yang, P., Lin, J. C.-W., Yun, U. (2020). TKE: Mining Top-K Frequent Episodes. Proc. 33rd Intern. Conf. on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA AIE 2020), pp. 832-845. [source code] [ppt]

An extension: Episode Rules

Although frequent episodes are interesting for some applications, it is sometimes more desirable for decision making to find patterns that have the form of a rule such as IF something happens THEN something else will happen.

To address this issue, several algorithms have been designed to find episode rules. The concept of episode rule is based on the concept of episode, as traditionally, episode rules are derived from frequent episodes. An episode rule was first defined by Manila and Toivonen (1995) as having the form X –> Y indicating that if an episode X is observed, it is likely to be followed by another episode Y.

To find interesting episode rules, several measures can be used. The two most popular are the support and confidence. The support of an episode rule X–> Y is the number of times that the rule appeared. The confidence of a rule X–> Y is the number of times that X is followed by Y, divided by the number of times that X was observed. Then, the task of episode rule mining is to find all the episode rules that appear at least minsup times and have a confidence that is no less than minconf, where minsup and minconf are parameters set by the user. Besides, as for frequent episode mining, it is expected that rules appear whithin a maximum window length (time duration) called winlen.

For instance, a rule can be <{a,b}> –> <{c}> indicating that event “a” and “b” appear together and are followed by “c”. If winlen = 2, then this rule has only one occurrence, shown below:

Thus, the support of that episode rule <{a,b}> –> <{c}> is said to be 1. Now, if we want to calculate the confidence of that episode rule, we need to also count the number of times that its antecedent, episode {a,b}, appears. There are two occurrences of episode {a,b}, displayed below:

The confidence of the episode rule <{a,b}> –> <{c}> is the number of times that <{a,b}> –> <{c}> appears divided by the number of times that <{a,b}> appears, that is 1 / 2 = 0.5. This means that 50 % of the times <a,b> is followed by <{c}>.

This is just a brief description of traditional episode rules. There is also another type of episode rules that has been recently proposed called partially-ordered episode rules and was argued to be more useful. The idea of partially-ordered episode rules comes from the observation that traditional episode rules require a very strict ordering between events. For example, the rule <{a,b}> –> <{c}> is viewed as different from <{a},{b}> –> <{c}>, or the rule <{b},{a}> –> <{c}>. But in practice, these three rules only have some small variations in the order between events, and may thus be viewed as describing the same relationship between events “a”, “b” and “c”. Based on this observation, the concept of partially-ordered episode rules was proposed. The aim is to simplify the rule format such that the ordering constraint between events in the left part of a rule is removed, as well as the ordering constraint between events in the right part of the rule. Thus, an episode rule has the form X –> Y where X and Y are two sets of events, but the events in X are unordered and the events in Y are also unoredered. For example, a partially ordered episode rule is {a,b} –> {c} which means that if “a” and “b” appear in any order, they will be followed by “c”. This rule replaces the three traditional rules mentioned previously and can thus provide a good summary to the user.

To discover episode rules, there are two ways. To discover traditional episode rules, the main approach is to first apply a frequent episode mining algorithms such as TKE, EMMA, WINEPI and MINEPI. Then, the episode rules are derived from these episodes by combining pairs of episodes. To discover partially-ordered episode rules, the process is slightly different. It is not required to first discover frequent episodes. Instead, rules are extracted directly from a sequence using a tailored algorithm such as POERM or POERMH. The POERM algorithm is presented in this paper:

Fournier-Viger, P., Chen, Y., Nouioua, F., Lin, J. C.-W. (2021). Mining Partially-Ordered Episode Rules in an Event Sequence. Proc. of the 13th Asian Conference on Intelligent Information and Database Systems (ACIIDS 2021), Springer LNAI, pp 3-15 [video] [software] [ppt]

High utility episode mining

Another popular extensions of the episode mining problem is high utility episode mining. This problem is a generalization of frequent episode mining where each event can be annotated with a utility value (a number indicating the importance of the event), and the goal is to discover episodes having à high utility (importance). An example of application of high utility episode mining is to find sequence of purchases made by customers that yield a high profit in a sequence of customer transactions.

If you are interested by this topic, you may check the HUE-SPAN algorithm which was shown to outperform previous algorithm. It is described in this paper:

Fournier-Viger, P., Yang, P., Lin, J. C.-W., Yun, U. (2019). HUE-SPAN: Fast High Utility Episode Mining. Proc. 14th Intern. Conference on Advanced Data Mining and Applications (ADMA 2019) Springer LNAI, pp. 169-184. [ppt] [software]

Software for episode mining

If you want to try episode mining with your data or do research on episode mining, the most complete software for episode mining is by far the SPMF data mining library.

SPMF is an open-source software implemented in Java, which has a simple user interface and can be called from the command line. Morevoer, there exists some unofficial wrappers to call SPMF from other programming languages such as Python. The SPMF library offers more than 200 algorithms for pattern mining, including a dozen algorithms for discovering frequent episodes such as TKE, MINEPI and EMMA, and also several algorithms for episode rule generation such as POERM. Besides, several algorithms for high utility episode mining are implemented such as HUE-SPAN.

SPMF also offers algorithms to find other types of patterns in sequences. For example, it is possible to find sequential patterns and sequential rules (patterns appearing in multiple sequences instead of only one sequence), and to find periodic patterns (patterns that are periodically appearing in a sequence). Other interesting tasks that can be done with a sequence is sequence prediction.

Conclusion

That is all for today. I just wanted to give a brief introduction about episode mining, as it is a fundamental task for analyzing sequences of events or symbols. If you have comments, please share them in the comment section below. I will be happy to read them.

—

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Key Papers about Episode Mining

Posted on 2021-08-16 by Philippe Fournier-Viger

This post presents the key papers about episode mining. If you are not familiar with what is episode mining, it is a data mining task, which aims at finding patterns in a sequence of events or symbols . A short introduction to episode mining is posted on my blog. Finding patterns in a sequence can be useful to discover regularities that may provide insights about the data, and even support decision making.

Below, I list the most important episode mining papers as a table with my comments on each paper to allow to get a quick overview of the field.

Note: that list is based on my analysis of the papers and is thus subjective and I may have missed some papers that other researchers would have deemed important.

Author and date	Paper title	Algorithm(s)	Key idea
Mannila et al. (1995)	Discovering frequent episodes in sequences	WINEPI, MINEPI	– This paper proposed the problem of episode mining. It is thus a key paper in that field. – A first algorithm called WINEPI finds episodes by performing a breadth-first search and using a sliding window. It counts the support (occurrence frequency) of an episode as the number of windows where the episode appears. However, WINEPIhas the problem that an occurrence may be counted more than once. – To address this issue, a second algorithm called MINEPI find frequent episodes by only considering the minimal occurrences of each episode. – This paper also presents a basic algorithm to generate episode rules by combining pairs of episodes. This is done as post-processing after applying WINEPI or MINEPI.
Huang et a. (2008)	Efficient mining of frequent episodes from complex sequences	EMMA, MINEPI+	– It is observed that window-based algorithms for episode mining such as MINEPI and WINEPI sometimes produce some strange results. – To address this issue, a novel measure is proposed to count the support (number of occurrences) of an episode called the head support or head frequency. – Two algorithms are designed to find frequent episodes using the head frequency: EMMA (a depth-first search algorithm) and MINEPI+ (a modified version of MINEPI)
Fournier-Viger et al. (2019)	TKE: Mining Top-K Frequent Episodes	TKE	– This paper makes the observation that it is difficult for users to set the minsup parameter for frequent episode mining. As a result users often have to spend considerable time fine tuning the parameters and may find too many or too few episodes. – As a solution this paper redefine the task of episode mining as top-k frequent episode mining, where the user can directly choose the number of patterns to be discovered (which is more intuitive). – The TKE algorithm is defined to efficiently find the top-k most frequent episodes.
Ao et al. (2018)	Online frequent episode mining		– This paper extended the concept of episode mining
Zhou et al. (2010)	Mining closed episodes from event sequences efficiently	Clo_episode	– This paper designed an algorithm to find closed episodes. The goal is to reduce the number of episodes presented by only showing some episodes that are said to be closed. This gives sometimes a small set of episodes that summarizes the frequent episodes. – An algorithm is presented to find closed episodes called Clo_episode. This algorithm adopts a breadth-first search and use the concept of minimal occurrences. – A limitation: this paper cannot handle the case of multiple events happening at the same time (parallel episodes). while most algorithms can handle this case.
Tati and Cule (2011)	Mining closed episodes with simultaneous events		– This paper presents algorithms to find closed episodes where events may be simultaneous. – This fixes the main limitation of the paper of Zhou et al. (2010).
Laxman et al.(2007)	Discovering frequent episodes and learning Hidden Markov Models: A formal connection		– Laxman proposes to only count the non overlapping occurrences of episodes. – This counting method is called the non overlapping support or non overlapping frequency, and was then used in many other papers after.
Laxman et al. (2007)	A Fast Algorithm For Finding Frequent Episodes In Event Streams	Algorithm 1 and Algorithm 2	– This paper introduces an algorithms to find frequent episodes in a potentially infinite sequence of events (a data stream)
Oualid et al. (2021)	Mining Episode Rules from Event Sequences Under Non-overlapping Frequency	NONEPI	– This paper presents an algorithm named NONEPI for episode rule mining using the concept of non overlapping frequency. – The goal is to find rules that are easier to interpret as occurrences must be non overlapping.
Fournier-Viger et al (2021)	Mining Partially-Ordered Episode Rules in an Event Sequence	POERM	– This paper makes the observation that traditional episode rules have a very strict ordering between events. – This paper defines are more general type of episode rules called partially-ordered episode rules. These rules loosen the ordering constraint between events. As a result, a partially-ordered episode rule can summarize multiple traditional episode rules. – The POERM algoriths is defined to find partially-ordered episode rules. It finds rules directly without first having to apply a frequent episode mining algorithm. – Another version of POERM called POERM_H was proposed in a subsequent paper “Mining Partially-Ordered Episode Rules with the Head Support“, where occurrences are counted using the head support of EMMA.
Fahed et al. (2018)	DEER: Distant and Essential Episode Rules for early prediction	DEER	– This paper presented an algorithm named DEER to find episore rules that can predict distant events rather than events that appear closely after other events. These rules are called essential rules. – The algorithm is based on the concept of minimal occurrences. – Limitation: the algorithm does not handle the case of simultaneous events.
Wu et al. (2013)	Mining high utility episodes in complex event sequences	US-SPAN	– A problem with traditional episode mining algorithm is that all event types are considered as equally important. – To address this issue, the concept of utility is added to episode mining to define a new problem of high utility episode mining. – In that problem each event occurrence can have a quantity as well as a weight. This allows for example to model the purchases of customers with quantities and unit prices to find episodes that yield the most money (the highest utility). – An efficient algorithm called US-SPAN is proposed for this problem, which is based on the concept of minimal occurrences.
Fournier-Viger et al. (2019)	HUE-Span: Fast High Utility Episode Mining	HUE-SPAN	– This paper makes the important observation that the previous algorithm for high utility episode mining US-SPAN can underestimate the utilitiy of episodes by not taking into account all timestamps of minimal occurrences for utility calculations. As a result, some high utility episodes can be missed. – To address this issue, the definition of utility is modified. – Moreover, a new and more efficient algorithm named HUE-SPAN is proposed for high utility episode mining. – This algorithm is based on the concept of minimal occurrences.
Ao et al. (2018)	Large-scale Frequent Episode Mining from Complex Event Sequences with Hierarchies	LA-FEMH	– A big data algorithm for episode mining called LA-FEMH is proposed using the Spark architecture. – The algorithm can find closed and maximal episodes and also consider that events are organized as a taxonomy. – Limitation: This algorithm does not handle the case of simultaneous events. In other words, the algorithm can only find serial episodes.
Fournier-Viger et al. (2022)	MaxFEM: Mining Maximal Frequent Episodes in Complex Event Sequences	MaxFEM, AFEM	– An algorithm called MaxFEM to find maximal episodes in a complex event sequence (the general case). – A version called AFEM to find all frequent episodes. – Those extends the EMMA algorithm with new optimizations and use the head support definition.

Implementations

There are very few software programs and source code available online for episode mining. The most complete software, which offers a dozen episode mining algorithms and is open-source is the SPMF data mining software (which I am the founder). It provides implementations of many algorithms such as MINEPI, EMMA, TKE， US-SPAN, POERM, and HUE-SPAN.

Besides episode mining, SPMF also offers algorithms for many other data mining tasks such as sequence prediction, high utility itemset mining, sequential pattern mining, periodic pattern mining, sequential rule mining and subgraph mining.

Survey paper on episode mining

If you want to read a detailed survey paper about episode mining, you can also check this survey paper:

Ouarem, O., Nouioua, F., Fournier-Viger, P. (2023). A Survey of Episode Mining. WIREs Data Mining and Knowledge Discovery, Wiley, to appear.

Conclusion

In this blog post, I have given a list of key papers about episode mining. Of course, making such list is subjective. But I believe that this list can be useful for those who wants to learn quickly about episode mining by having a quick summary.

If you have enjoyed this blog post, you may also check other content of this blog. There are many posts and resources related to pattern mining.

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Big data, Data Mining, Data science, Pattern Mining | Tagged big data, data mining, data science, episode, literature, Minepi, papers, pattern mining, patterns, references, winepi | Leave a comment

What is Machine Learning?

Posted on 2021-08-09 by Philippe Fournier-Viger

In recent years, machine learning has become a popular research area of computer science. In this blog post, I will talk about what is machine learning.

What is machine learning?

Machine learning is computer programs (algorithms) that can learn to do a task by doing it or by learning it from some data. For example, a computer program can be designed to learn to play Chess by trying various moves and strategies against human players to select the most effective one (learning by experience). Or a program could be designed to learn the best Chess tactics from historical records of matches between human players (data).

What is the difference with artificial intelligence?

Another term that is often talked about in the media is artificial intelligence. So what is the difference between artificial intelligence and machine learning? Generally, artificial intelligence refers to computer programs that can do some task that requires intelligence (e.g. translating a document from English to French, writing a summary of a text, playing a game, composing some music). Some artificial intelligence program will be designed to learn from experience or data, and can thus be viewed as using machine learning. But there are also some other artificial intelligence programs that do not require learning. For example, one can build a program to play the game of Tic Tac Toe by explicitly writing the optimal rules for playing that game in the program. In this case, the program can use that knowledge base (rules) to play and does not need to learn. Thus, not all artificial intelligence programs need to learn, and machine learning can be viewed as a subset of artificial intelligence that represents the programs learn.

Why is machine learning popular?

There exists many machine learning techniques for making computer programs that can learn such as artificial neural networks, support vector machines and clustering. These techniques are popular because they can be used to create programs that learn to do some complicated tasks that would be otherwise very hard to build as a computer program by a human. For example, while it is extremely hard to program a computer to recognize objects in a video by hand or to translate a text accurately, these tasks can now be learned using machine learning techniques.

What are the applications of machine learning?

There are many applications of machine learning such as to play games, process video and audio data, translate documents, and recommend songs or movies to users of a website. Generally, each machine learning program is built to solve a specific task (e.g. playing the game of Go) rather than to solve many tasks. It is a major challenge to design machine learning programs that could learn many tasks.

What are some good books about machine learning?

Nowadays, many young researchers will directly focus on popular techniques such as deep learning. But in my opinion, one should try to have a broader picture of the field of machine learning as there are many other techniques. Some good book on machine learning in general are:

Pattern Recognition and Machine Learning by C. Bishop
Machine Learning by T. Mitchell
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Hastie et al.
Machine Learning: A Probabilistic Perspective – K. Murphy
Artificial Intelligence: A Modern Approach – Russell & Norvig

If you are a young researcher, you may also want to read about how to find a good machine learning research topic.

What is the relationship with data mining, big data and data science?

Another subfield of computer science that is popular nowadays is data mining, data science and big data. These terms generally refer to the use of algorithms to analyze data. There are generally two main goals to analyze the data: (1) understanding the data to learn something useful from the data (e.g. understanding the past, like why a tsunami occurred) and (2) predicting the future (e.g. predicting when the next tsunami will hit a country).

The data mining techniques that aim at making predictions using data can be viewed as a form of machine learning, while other techniques may just be viewed as method to analyze data. Thus, data mining can be viewed as intersecting with machine learning. If you want to know more about this, I wrote a blog post on the relationship between data mining and machine learning.

Conclusion

Hope that this blog post has been interesting! If you have any comments, please leave them in the comment section below.

—-

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in artificial intelligence, Machine Learning | Tagged artificial intelligence, data mining, machine learning, ml | 8 Comments

Why Journal Special Issues are Popular?

Posted on 2021-08-02 by Philippe Fournier-Viger

In this blog post, I will talk about journal special issues. I will talk about how special issues of journals are organized and why it can be good to submit to or organize academic journal special issues.

What is a special issue of a journal?

First, I will explain some basic concepts about journals. An academic journal publishes papers written by researchers. These papers are generally published as part of an issue and a volume. An issue is a collection of multiple papers that may not be on the same topic, while a volume is a collection of one or more issues.

For example, a journal may publish an issue every month and four volumes per year. Thus, the first volume would contains the issues of January, February, March and April. Then, the second volume would contain the issues of May, June, July and August, and so on.

Volumes and issues are generally numbered as 1,2,3…. But it is to be noted that some journals do not follow this. For example, some journal will name volumes according to years such as volume 2021 for the year 2021. Moreover, some journals only group papers by volumes and don’t have issue numbers.

Having explained that, what is a special issue? A special issue is a group of papers that are generally on the same topic. For example, a journal may publish a special issue on pattern mining where all the papers are about this topic. There are also some conferences that will organize special issues in a journal that will contain the best papers of the conference. In that case, the paper of the special issue may not be on the same topic but are grouped based on another criterion.

Each journal has one or more editors that take care of managing the review process of papers by doing some tasks such as inviting reviewers, reading the recommendations of reviewers and deciding to accept, revise or reject papers. For special issues, the papers are generally handled by some guest editors rather than the regular editor(s) of the journal. This means that some researchers are responsible of the special issue, and those researchers are called “guest” because they do not usually work for the journal.

How are the special issues organized?

Generally, the special issues are organized by researchers that talk with the main editor of the journal and ask to organize a special issue in that journal on a specific topic. Then, if the editor accepts, the special issue will be created and there will be some advertisement to invite authors to submit papers to the special issue. A special issue may be open for several months, which means that authors may have several months to write a paper and submit it to a given special issue. When a paper is received for a special issue, it is the guest editors of the special issue that will organize its review.

To propose a special issue, several journals will require to submit a special issue proposal explaining the topic of the special issue, the reasons why the topic is timely, the background of the guest editors, etc. Some journals will receive many such proposals and will only organize a few special issues.

Why organizing special issues?

Journals typically organize special issues to attract papers on emerging or trending topics, as this is what will bring more papers and also citations. For example, a journal may organize a special issue about machine learning for analyzing the COVID19 genome as COVID is now a popular topic. Generally, journals do not like to organize special issues on old topics.

For a journal, special issues can help to bring more papers. This is the reason why some young journals will organize many special issues, while some top journals will rarely organize special issues. Thus a young journal is more likely to accept a proposal for making a special issue than a popular journal, who may just ignore them.

Why researchers want to organize special issues?

There are several reasons. First, it shows that a researcher is able to organize things and it gives editorial experience to a researcher about how to manage papers in a journal. This may latter help a researcher when applying to organize other things such as a workshop, a book or a position in a journal. Second, it gives visibility to a researcher and may help build connections with other researchers.

There are also some researchers that will try to abuse the concept of special issue by organizing many special issues and accepting the papers of their friends in exchange for their papers being in turn accepted in other special issues. This is something unethical that some people do. I have noticed this on websites like DBLP. I will not give any names but I found that some researchers always publish in some special issues of their friends. This does not look good and should be avoided. A researcher should always avoid conflicts of interest and be honest.

Why researchers publish in special issues?

A main reason for a researcher to publish in a special issue is to be part of an issue where all papers are on the same main topic. This will thus give visibility to the paper. For example, a special issue on periodic pattern mining may publish numerous papers on that topic.

Another reason why a researcher may choose to publish in a special issue is that often the review process is faster and sometimes there is a higher chance for a paper to be accepted than if the paper was submitted as a regular paper to the journal. This is for example often the case for special issues about the best papers of a conference. In that case, maybe 14 papers may be invited and 10 or more will be accepted.

Conclusion

In this blog post, I talked about the concept of special issue for academic journals. I gave an overview about why publish in and organize special issues can be interesting for researchers. Hope it has been interesting. Feel free to leave some comments below!

—

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Academia, General, Research | Tagged academia, editor, guest editor, journal, Research, special issue | Leave a comment

Useful Latex tricks for Writing Research Papers

Posted on 2021-07-27 by Philippe Fournier-Viger

In this blog post, I will talk about some useful latex tricks for researchers writing research papers using Latex. This blog post is aimed at those who knows already how to use Latex but maybe do not know these tricks.

1.Reducing the length of your paper with \vspace

A common problem when writing a research paper is that the paper is too long. Besides rewriting the text to make it shorter, a solution is to use some special Latex commands to reduce the space. WARNING: But be aware that it is sometimes forbidden to use these commands, so use them at your own risk!

The main command is \vspace. It allows to reduce the vertical space between elements on a page. For example using \vspace{-0.5cm} before a figure will reduce the space before that figure of 0.5 cm. This is a very useful command. But it is recommended to use it after finishing writing a paper as this command can easily mess up the layout of your paper if the content is then changed.

2. Reducing the length of an algorithm written using algorithm2e

Another way of reducing the space in a paper is to reduce the size of an algorithm. A command that can be used is \scriptsize after \begin{algorithm}. This will reduce the font size of the algorithm and thus the space.

If you are using the algorithm2e package for your algorithms, another way of reducing the length of an algorithm is to use an inline IF instead of a regular IF. This is done by replacing \if{} by \lIf{}. The result is:

This can save a few lines. Similarly, it is possible to replace a \forEach{} loop by the inline version \lForEach{}. Oher algorithm2e commands can also be used as inline such as \else and \lElse.

Another useful command to reduce the size of an algorithm written with algorithm2e is to use \SetAlgoNoEnd after \begin{algorithm}. This will remove the “END” labels for all the IF, ELSE and FOR EACH parts. For example, the below picture show the effect:

3. Check if your paper contains uncited references with \refcheck

If you want to quickly find all the references that are not cited in your paper, you just need to add this: \usepackage{refcheck}. It will higlight the references that are not used from your bibliography. For example:

4. Comparing two versions of your LaTeX document with Latexdiff

Another very useful tool is LatexDiff. Many journals will ask authors to highlight the differences between two versions of their papers. I previously wrote a detailled blog post about using LatexDiff. Please see that blog post for details. The result is like this:

5. Adding TODO notes

Another useful tool is the TODONOTES package. It allows to add TODO comments on a latex document. This works well with the IEEE template. For example, by adding \usepackage{todonotes}, we can add comments in the document such as \todo{Error!} and it will appear like this:

6. Adding color to your Latex document

Another useful package is the color package. It allows to change the color of some part of your document. This can be useful to highlight what remains to be done in your paper or what has should be revised.

7. Converting Latex to HTML

Sometimes, you may want to convert your Latex paper to an HTML document. You may have a look at my previous blog post on this topic to see how to do it with HTLATEX.

Conclusion

In this blog post, I wanted to share a few useful Latex commands. If you think I have missed some other important commands (surely!), please share in the comment section below. I might then add them to the blog post.

—

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Latex, Research | Tagged latex, Research, research paper, writing | Leave a comment

Brief report about the IEA AIE 2021 conference

Posted on 2021-07-26 by Philippe Fournier-Viger

This week, it is the IEA AIE 2021 conference (34th Intern. Conf. on Industrial, Engineering & Other Applications of Applied Intelligent Systems), which is held from 26th to 28th June 2021. This year, the conference is held online due to the COVID pandemic situation around the world.

In this blog post, I will give an overview of the conference.

About IEA AIE 2021

The IEA AIE conference is a conference that focuses on artificial intelligence and its applications. I have attended this conference several times over the year. I have written some blog posts also about IEA AIE 2016, IEA AIE 2018, IEA AIE 2019 and IEA AIE 2020.

This year, there has been 145 papers submitted. From this, 87 papers were accepted as full papers, and 19 as short papers.

Special sessions

This year, there was eight special sessions organized at IEA AIE on some emerging topics. A special session is a special track for submitting papers, organized by some guest researchers. All accepted papers from special sessions are published in the same proceedings as regular papers.

Special Session on Data Stream Mining: Algorithms and Applications
(DSMAA2021)
Special Session on Intelligent Knowledge Engineering in Decision Making Systems
(IKEDS2021)
Special Session on Knowledge Graphs in Digitalization Era (KGDE2021)
Special Session on Spatiotemporal Big Data Analytics (SBDA2021)
Special Session on Big Data and Intelligence Fusion Analytics (BDIFA2021)
Special Session on AI in Healthcare (AIH2021)
Special Session on Intelligent Systems and e-Applications (iSeA2021)
Special Session on Collective Intelligence in Social Media (CISM2021).

Opening ceremony

On the first day, there was the opening ceremony. It was announced that IEA AIE 2022 will be held in Japan next year.

Keynote speakers

There was two keynote speakers: (1) Prof. Vincent Tseng from National Yang Ming Chiao Tung University, (2) Prof. Francisco Herrera from University of Granada.

Paper presentations

I have attended several paper presentations through the conference. There was some high quality papers on various topics related to artificial intelligence. There was four rooms with paper presentations. Here is a screenshot of one of the rooms:

In particular, this year, there was six papers on pattern mining topics such as high utility pattern mining, sequential pattern mining and periodic pattern mining:

Oualid Ouarem, Farid Nouioua, Philippe Fournier-Viger: Mining Episode Rules from Event Sequences Under Non-overlapping Frequency. 73-85
Comment: This paper presents a novel algorithm for episode rule mining called NONEPI. The idea is to find rules using the non-overlapping frequency in a sequence of events.
Sumalatha Saleti, Jaya Lakshmi Tangirala, Thirumalaisamy Ragunathan: Distributed Mining of High Utility Time Interval Sequential Patterns with Multiple Minimum Utility Thresholds. 86-97
Comment: This paper presents a new algorithm DHUTISP-MMU for mining high utility time interval sequential patterns with multiple minimum utility thresholds. A key idea in this paper is to add information about the time intervals between items of sequential patterns. Besides, the algorithm is distributed.
Xiangyu Liu, Xinzheng Niu, Jieliang Kuang, Shenghan Yang, Pengpeng Liu: Fast Mining of Top-k Frequent Balanced Association Rules. 3-14
Comment: This paper presents an algorithm named TFBRM for mining the top-k balanced association rules. There has been a few algorithms for top-k association rule mining in the bast. But here a novelty is to combine support, kulczynski (kulc) and imbalance ratio (IR) as measures to find balanced rules.
Penugonda Ravikumar, Likhitha Palla, Rage Uday Kiran, Yutaka Watanobe, Koji Zettsu: Towards Efficient Discovery of Periodic-Frequent Patterns in Columnar Temporal Databases. 28-40
Comment: This paper presents an Eclat-based algorithm for periodic pattern mining called PF-Eclat. From the presentation it seems to me that this algorithm is very similar to the PFPM algorithm (2016) that I proposed 5 years ago. The difference seems to be that the vertical representation is a list of timestamps instead of list of TIDs, and it has two less constraints. That is the user can only use maxPer and minSup(minAvg) as constraints but PFPM also offers two more constraints: minPer and maxAvg. By the way, there exists also another Eclat based algorithm for a similar task (mining top-k periodic frequent patterns) called MTKPP (2009).
Sai Chithra Bommisetty, Penugonda Ravikumar, Rage Uday Kiran, Minh-Son Dao, Koji Zettsu: Discovering Spatial High Utility Itemsets in High-Dimensional Spatiotemporal Databases. 53-65
Tzung-Pei Hong, Meng-Ping Ku, Hsiu-Wei Chiu, Wei-Ming Huang, Shu-Min Li, Jerry Chun-Wei Lin: A Single-Stage Tree-Structure-Based Approach to Determine Fuzzy Average-Utility Itemsets. 66-72
Comment: This paper is about fuzzy high utility itemset mining. A novel algorithm is presented. A difference also with previous paper is the use of the average utility function in fuzzy high utility itemset mining.

Next year

The IEA AIE 2022 conference will be held in Kitakyushu, Japan.

Conclusion

This was a good conference. I have attended several presentations and had a chance to discuss with some interesting researchers. Looking forward to the IEA AIE 2022 conference.

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in artificial intelligence, Conference, Machine Learning | Tagged ai, artificial intelligence, conference, iea aie, ieaaie, ieaaie2021, machinelearning | 7 Comments

Brief report about the DSIT 2021 conference (4th Intern. Conf. on Data Science and Information Technology)

Posted on 2021-07-23 by Philippe Fournier-Viger

This week, I am attending the DSIT 2021 conference (4th International Conference on Data Science and Information Technology) from July 23 to 25 in Shanghai, China.

The DSIT 2021 conference is co-located with the DMBD 2021 conference (the 4th International Conference on Data Mining and Big Data).

DSIT is a relatively young conference, which focuses on data science and data mining. But the quality was good and it was well organized. The proceedings of the conference are published by ACM. Thus, all papers are in the ACM Digital Library. This gives visibility to the papers.

A total of 150 submissions were received and 80 full papers were accepted for publication (acceptance rate = 53%). The papers were from several countries including China, Japan, Singapore, Vietnam, Philippines, Pakistan, Thailand, USA, Greece, France and Germany.

There was also several keynote speakers: Prof. Tok Wang Ling from National University of Singapore, Prof. Ma Maode from Nanyang Techn. University of Singapore, Prof. Shigeo Akashi from Tokyo University of Science, Japan and Prof. Philippe Fournier-Viger (myself) from Harbin Inst. of Technology (Shenzhen), China.

Due to the COVID pandemic and travel restrictions, the conference was held in Shanghai but some speakers were online through Zoom.

Day 1 – Registration

On the first day, I registered at the conference reception desk at hotel and receive a bag with the program, ID card, a small gift, and other things.

Day 2 – Keynote Talk

First, there was the opening ceremony.

Then, it was the keynote talks. I started first with my invited talk on algorithms for discovering patterns in data that are in interpretable (pattern mining).

Then, there was the talk by Prof. Jie Yang on adversarial attacks on deep neural networks. He has shown some recent work on generating adversarial pictures to fool neural networks. For instance a picture of a car may be slightly modified to fool a neural network into believing it is a house. What I find the most interesting about this talk is that it was shown that some modified pictures can fool not only one network but all the state of the art deep neural networks for image recognition. The reason why it is possible to fool multiple networks with a same modified picture is that an attack based on attention was used and that many deep neural networks will use attention in a similar way (focusing on the same image features). A dataset of adversarial images called DAmageNet was also presented, which can be helpful to test ways to protecting against such attacks. An interesting conclusion was that these attacks are possible because deep neural models tend to ignore some important features and incorporate unnecessary features.

Then, there was the other keynote talks.

Day 2 – Paper presentation

Then there was the regular paper presentations and a poster session.

There was two papers related to pattern mining. The first one was about high utility itemset mining and the other about frequent pattern mining.

High Utility pattern mining based on historical data table over data streams by Xinru Chen, Pengjun Zhai and Yu Fang
MaxRI: A method for discovering maximal rare itemsets by Sadeq Darrab et al.

I took some pictures of a few slides from that paper about maximal rare itemsets, as I find this to be an interesting topic:

Conclusion

This is all I will write for this conference. Overall, that was an interesting conference. It is not a very big conference but I met some other interesting researchers and we had some good discussions. Some papers were also quite good.

In a few days, I will be attending the IEA AIE 2021 conference and will report also about it.

—

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in artificial intelligence, Big data, Conference, Data Mining | Tagged artificial intelligence, china, data, data mining, data science, dmbd, dsit, machine learning | 2 Comments

Brief report about the CCF-AI 2021 conference

Posted on 2021-07-21 by Philippe Fournier-Viger

This week, I attend the CCF-AI 2021 conference, which is the Chinese Computer Federation conference on Artificial Intelligence. This conference is held in the city of Yantai (烟台) in Shandong province of China, from the 22th to 24th July 2021.

About CCF-AI

CCF-AI is a national conference. But it is a major conference in China, with over 1,000 attendees. I attend this conference to meet other researchers and get to know about the recent results in this area. There are many high level speakers at the conference and activities.

In the past CCF-AI has been held in various locations around China. Here is a few of them:

Location

The city where CCF-AI is held this year is Yantai (烟台). It is a coastal city in eastern China, in Shandong province. It has good weather during the summer, beaches and many other activities.

The conference was held at the Yantai International Expo Center:

Registration

After arriving at the hotel, all attendees have to pass a test for the COVID to ensure the safety of everyone at the conference. Then, I registered and received my bag and badge with the program and other information.

Day 1 – Multi-Agent Systems forum

The conference is divided into some sub-forums. On the morning of the first day, I attended the multi-agent system forum. I also had some good discussions with other researchers.

Day 1 – Meeting of CCF-AI members

On the evening, I attended the meeting of CCF-AI members.

It was voted that CCF-AI 2023 will be held at Xinjiang University in Urumqi, China.

There was also a vote to select new members of CCF-AI. I am happy to have been selected:

It was said that for CCF–AI 2021, 339 papers were submitted and 128 papers were accepted (38% acceptance rate).

Other days and conclusion

There was also many other interesting activities and talks at this conference in the following days. However, my schedule was very tight. I came to CCF-AI, right after attending ICSI 2021, and I had to leave on the second day of CCF-AI to go to Shanghai to attend the DSIT 2021 conference in Shanghai, which I will talk about in the next blog post! Then, I will also attend the IEA AIE 2021 conference.

—

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in artificial intelligence, Conference, Machine Learning | Tagged 2021, ai, artificial intelligence, ccf, ccfai, ccfai 2021, china, conference, machine learning, ml, yantai | 1 Comment

Brief report about ICSI 2021 (12th Int. Conference on Swarm Intelligence)

Posted on 2021-07-15 by Philippe Fournier-Viger

In this blog post, I will talk about attending the 12th International Conference on Swarm Intelligence (ICSI 2021). The ICSI conference is a relatively young conference about swarm intelligence, metaheuristics and related topics and applications. This year, ICSI 2021 is held in Qingdao, a coastal city in eastern China, from July 17–21, 2021. The conference is also held partially online for those that cannot attend due to travel restrictions.

The conference was held at the Blue Horizon Hotel:

The ICSI conference has been held in several cities and countries, over the years:

ICSI 2020 – Serbia (virtual)
ICSI 2019 – Chiang Mai, Vietnam
ICSI 2018 – Shanghai, China
ICSI 2017 – Fukuoka, Japan
ICSI 2016 – Bali, Indonesia
ICSI-CCI 2015 – Beijing, China
ICSI 2014 – Hefei, China
ICSI 2013 – Harbin, China
ICSI 2012 – Shenzhen, China
ICSI 2011 – Chongqing, China
ICSI 2010 – Beijing, China

Proceedings

The proceedings of the ICSI conference are published in the Springer Lecture Notes in Computer Science (LNCS) series as two volumes (Part 1 and Part 2). This ensures that the proceedings are indexed by EI and other indexes like DBLP.

This year, the conference received 177 submissions, which were reviewed on average by 2.5 reviewers. From this 104 papers were accepted for publications, which means an acceptance rate of 58.76%. The paper were organized into 16 sessions.

Day 1 – Registration

On the first day, I registered. I received a paper bag with a badge and the conference program. The proceedings was available online as a download.

Day 1 – Reception

There was also a reception at the hotel in the evening that lasted about an hour. There was food, beer and other drinks. This was a social activity, which is a good opportunity to discuss with other researchers that attend the conference.

Day – 2 – Opening ceremony
On the second day there was the opening ceremony, where the general chair talked about the conference, and the program.

The program committee chair also talked about the paper selection process.

Day – 2 – Keynote talks and invited talks

On the second day, there was two keynote talks and two invited talks. Some good researchers had been invited, and some of the talks were quite interesting. Below is a very brief overview.

The first keynote talk was by Prof. Qirong Tang from Tongji University who talked about “Large-Scale Heterogeneous Robotic Swarms”. He developed a swarm robotic platform that is used for some applications such as searching for multiple light sources, searching for a target, drug delivery in the body, etc. The idea is that some robots can cooperate together to perform a task more quickly (e.g. cooperative search) and thus outperform a single high quality robot. The swarm can be heterogeneous, that is using different types of robots such as flying robots and ground robots. Many bio-inspired algorithms are used to control a robot swarm such as particle swarm optimization (PSO) and genetic algorithms but it was argued that PSO is particularly suited for this task.

The second keynote talk was online by Prof. Chaomin Luo from USA about swarm intelligence applications to robotics and autonomous systems. This includes for example, exploration robots, search and rescue robots.

There was an invited talk by Prof. Gai-Ge Wang from Ocean university. He talked about how to improve the performance of metaheuristics using information feedback. The idea is that during iterations, some feedback of previous iterations is used to guide the search process towards better solutions.

The second invited talk was by Prof. Wenjian Luo from Harbin Institute of Technology (Shenzhen) about many-objectives optimization when multiple parties are involved. For example, to buy a car, many objectives may have to be considered such as the price, size, and fuel consumption and multiple parties such as an husband and wife may put different weights on those objectives. The goal is to find a solution that is optimal for all the parties involved but it is not always possible.

Day 2 – Paper presentations

On the afternoon, there was paper presentations and a poster session. There was some good papers about a variety of topics such as sheep optimization, classification of imbalanced data with PSO, citation analysis, swarm intelligence for UAVs, and multi-robot cooperation.

I have presented the below data mining paper about proof searching for proving theorems using simulated anneealing (which is mainly the work of my post-doc. M. S. Nawaz). In that paper, we use the simulated annealing metaheuristic to search for proofs to PVS theorems and compare with a genetic algorithm.

Nawaz, M. S., Sun, M., Fournier-Viger, P. (2021). Proof Searching in PVS theorem prover using Simulated Annealing. Proceedings of the 12th Intern. Conf. on Swarm Intelligence (ICSI 2021) Part II, pp. 253-262

There was also a good paper by Prof. Wei Song et al. about using fish swarm optimization for high utility itemset mining:

Song, W. Li, J. Huang, C.: Artificial Fish Swarm Algorithm for Mining High Utility Itemsets. ICSI (2) 2021: 407-419

Day 2 – Banquet

In the evening, there was a banquet. The best paper awards were announced.

ICSI 2022
It was announced that next year the ICSI 2022 conference will be held in Xian, China from July 15 to 19 2022.

Conclusion
Swarm intelligence is not my main research area although I have participated to several papers on this topic. But the conference was interesting and well organized. The quality was generally good. I would attend it again if I have some papers on this topic.

Now, I will leave Qingdao, and next I will attend the CCF-AI 2021 conference, DSIT 2021 conference, and then the IEA AIE 2021 conference.

—
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 120 data mining algorithms.

Posted in artificial intelligence, Conference, Data Mining, Data science | Tagged ai, artificial intelligence, china, icsi, icsi conference, machine learning, metaheuristics, optimization, Qingdao, swarm intelligence | 2 Comments

SPMF 2.48

Posted on 2021-06-28 by Philippe Fournier-Viger

Hi all, I have not been very active on the blog during the last month. This is because I had many thinsg going on in my personal and professional life that I will not reveal here. But I will be back soon with more regular content for the blog. Today, I write a blog post to give you some news:

SPMF 2.48

First, I would like to say that a new version of SPMF data mining software has just been released (v. 2.48) with two new algorithms:
– NEclatClosed for mining closed itemsets
– HUIM-SPSO for mining high utility itemsets using Set-based Particle Swarm Optimization
Those are the original implementations, provided by the authors.

T com.

MLiSE 2021 – deadline extension

Third, I would like to mention that the deadline for submiting your papers to the MLiSE 2021 workshop at PKDD that I co-organize has been extended to the 15th July. The theme of the workshop is Machine Learning in Software Engineering but the scope can be more broad so if you have any question about the workshop, feel free to contact with me. I would be happy to see your paper 🙂

Conclusion

This blog post was just to give some quick update. Hope it has been interesting.

—
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 200 data mining algorithms.

Posted in Machine Learning, Pattern Mining, spmf, Website | Tagged mlise, pattern mining, pattern mining wiki, spmf, wiki | Leave a comment

An Introduction to Episode Mining

Key Papers about Episode Mining

What is Machine Learning?

Why Journal Special Issues are Popular?

Useful Latex tricks for Writing Research Papers

Brief report about the IEA AIE 2021 conference

Brief report about the DSIT 2021 conference (4th Intern. Conf. on Data Science and Information Technology)

Brief report about the CCF-AI 2021 conference

Brief report about ICSI 2021 (12th Int. Conference on Swarm Intelligence)

SPMF 2.48

Archives

Categories

Recent Posts

Recent Comments

Number of visitors:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Archives

Categories

Recent Posts

Recent Comments

Tag cloud

Number of visitors: