The Data Blog

Brief report about IEEE ICDM 2021

Posted on 2021-12-08 by Philippe Fournier-Viger

This blog post provides a brief report about the IEEE ICDM2021 conference (International Conference on Data Mining), which was held virtually from New Zealand from the 7th December to the 10th December 2021.

What is ICDM?

ICDM is one of the very top data mining conferences. ICDM 2021 is the 21st edition of the conference. The focus of this conference is on data mining and machine learning. I have attended ICDM a few times. For example, you can read my report about ICDM 2020.

Opening ceremony

The opening ceremony started by a performance from local people. Then, there was some greetings from the general chair Prof. Yun Sing Koh.

Then Prof. Xingdong Wu, the founder of the conference, talked about ICDM. Here are a few slides from this presentation:

It was said that this year, that there is over 500 participants.

This is the acceptance rate of ICDM over the years:

The program co-chairs then gave some more details about the ICDM 2021 review process. In particular, this year, there was 990 submission, and 98 were accepted as regular papers (9.9% acceptance rate), and 100 as short papers, for a global acceptance rate of 20%. All papers have been reviewed in a triple-blind way.

In terms of papers by country, the largest number of accepted papers came from China, and then from the USA.

The most popular topics were about deep learning, neural networks and classification.

Most of the program committee members are from USA and China

The workshop chairs then talked about the workshops. This year, there was 18 workshops on various topics. On overall, the acceptance rate for workshop papers was about 50%. All the workshop papers are published in formal proceedings by IEEE and indexed by EI and stored in the IEEE Digital Library. In particular, this year, I have co-organized the UDML 2021 workshop on utility-driven mining and learning. Here are more details about the workshops:

Each workshop had from 4 to 12 accepted papers.

Then, the virtual platform chair Heitor Murilo Gomes introduced the virtual platform used by ICDM2021.

Keynote talks

There was several keynote talks.

The first keynote talk was by Prof. Masashi Sugiyama from Japan about robust machine learning against various factors (weak supervision, noisy labels and beyond). Here are a few slides:

Paper presentation

There was many paper presentations. I will not report in details about them.

Conclusions

The ICDM 2021 conference was interesting. It is a leading conference in data mining. Looking forward to attending it again next year for ICDM 2022 in Orlando, Florida.

—
Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Big data, Conference, Data Mining | Tagged big data, conference, data mining, data science, icdm, icdm 2021, ieee icdm, machine learning | 5 Comments

Brief report about UDML 2021 (4th International Workshop on Utility-Driven Mining and Learning

Posted on 2021-12-07 by Philippe Fournier-Viger

Today, it was the 4th International Workshop on Utility-Driven Mining and Learning (UDML 2021), held at the IEEE ICDM 2021 conference. I am a co-organizer of the workshop and will give a brief report about it.

What is UDML?

UDML is a workshop that has been held for the last four years. It was first held at the KDD 2018 conference, and then at the ICDM 20119, ICDM 2020 and ICDM 2021 conference.

The focus of the UDML workshop is how to integrate the concept of utility in data mining and machine learning. Utility is a broad concept that represents the importance or value of patterns or models. For instance, in the context of analyzing customer data, the utility may represent the profit made by sales of products, while in the context of a multi-agent system, utility may be a measure of how desirable a state is. For many machine learning or data mining problems, it is desirable to find patterns that have a high utility or models that optimize some utility functions. This is the core topic of this workshop. But the workshop is also open to other related topics. For example, most pattern mining papers can fit the the scope of the workshop as utility can take a more broad interpreation of finding interesting patterns.

The program

UDML 2021 was held online due to the COVID pandemic. We had a great program with an invited keynote talk and 8 accepted papers selected from about 14 submissions. All papers were reviewed by several reviewers. The papers are published in the IEEE ICDM Workshop proceedings, which ensures a good visibility. Besides, a special issue in the journal of Intelligent Data Analysis was announced for extended versions of the workshop papers.

Keynote talk by Prof. Tzung-Pei Hong

The first part of the workshop was the keynote by Prof. Tzung-Pei Hong from National University of Kaohsiung, who kindly accepted to give a talk. The talk was very interesting. Basically, Prof. Hong has shown that the problem of erasable itemset mining can be converted to the problem of frequent itemset mining and that the opposite is also possible. This implies that one can simply reuse the very efficient itemset mining algorithms with some small modifications to solve the problem of erasable itemset mining. The details about how to convert one problem to the other were presented. Besides, some experimental comparison was presented using the Apriori algorithm (for frequent itemset mining) and the META algorithm (for erasable itemset mining). It was found that META is faster for smaller or more sparse databases but that in other cases, Apriori was faster.
Here is a few slides from this presentation:

Paper presentations

Eight paper were presented:

Paper ID: DM368, Md. Tanvir Alam, Amit Roy, Chowdhury Farhan Ahmed, Md. Ashraful Islam, and Carson Leung, “Mining High Utility Subgraphs“

Paper ID: S10201, Cedric Kulbach and Steffen Thoma, “Personalized Neural Architecture Search” (best paper award )

Paper ID: S10213, Uday Kiran Rage, Koji Zettsu, “A Uniﬁed Framework to Discover Partial Periodic-Frequent Patterns in Row and Columnar Temporal Databases“

Paper ID: S10210, Wei Song, Caiyu Fang, and Wensheng Gan, “TopUMS: top-k utility mining in stream data“

Paper ID: S10211, Mourad Nouioua, Philippe Fournier-Viger, Jun-Feng Qu, Jerry Chun-Wei Lin, Wensheng Gan, and Wei Song, “CHUQI-Miner: Mining Correlated Quantitative High Utility Itemsets“

Paper ID: S10209, Chi-Jen Wu, Wei-Sheng Zeng, and Jan-Ming Ho, “Optimal Segmented Linear Regression for Financial Time Series Segmentation“

Paper ID: S10202, Jerry Chun-Wei Lin, Youcef Djenouri, Gautam Srivastava, and Jimmy Ming-Tai Wu, “Large-Scale Closed High-Utility Itemset Mining“

Paper ID: S10203, Yangming Chen, Philippe Fournier-Viger, Farid Nouioua, and Youxi Wu, “Sequence Prediction using Partially-Ordered Episode Rules“

These papers cover various topic in pattern mining such as high utility itemset mining, subgraph mining, high utility quantitative itemset mining and periodic pattern mining but also about some machine learning topics such as for linear regression and neural networks.

Best paper award of UDML 2021

This year, a best paper award was given. The paper was selected based on the review scores and a discussion among the organizers. The recipient of the award is the paper “Personalized Neural Architecture Search“, which presented an approach to search for a good neural network architecture that optimize some criteria (in other words, a form of utility).

Conclusion

That was the brief overview about UDML 2021. The workshop was quite successful. Thus, we plan to organize the UDML workshop again next year, likely at the ICDM conference, but we are also considering KDD as another possibility. There will also be another special issue for next year’s workshop.

—
Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Conference, Data Mining, Pattern Mining, Utility Mining | Tagged data mining, pattern mining, udml, utility mining, workshop | 1 Comment

What is a good pattern mining algorithm?

Posted on 2021-12-05 by Philippe Fournier-Viger

Today, I will continue talking about pattern mining, and in particular about what is a good pattern mining algorithm. There are a lot of algorithms for discovering patterns in data that are not useful in real-life. To design a good pattern mining algorithm, I will argue that it should ideally have some of the following desirable properties (not all are required):

Can be used in many scenarios or applications: Some algorithms are designed for tasks that are not realistic or that makes assumption that do not hold in real life. It is important to design algorithms that can be used in real-life scenarios.
Is flexible. An algorithm should ideally provide optional features so that it can be used in different situations where requirements are different.
Has excellent performance: An algorithm should be efficient, especially if the goal is to analyze large datasets. In particular, it can be desirable to have algorithms that have linear scalability or can scale well to be able to handle big data. Efficiency can be measured in terms of runtime and memory.
Has few parameters: An algorithm that has too many parameters is generally hard to use. However, it is OK to have optional parameters to provide more features to users. Some algorithms also do not have any parameters. This is the case for example of skyline pattern mining algorithms and some compression-based pattern mining algorithms.
Is interactive: Some systems for pattern mining will provide interactive features such as giving to the user the ability to guide the search for patterns by providing feedback about the patterns that are discovered. Some systems will also let the user perform targeted queries about specific items rather than finding all possible patterns.
Visualization: Having visualization capabilities is also a useful feature to help browsing through numerous patterns.
Can deal with complex data: It is also desirable to design algorithms that can deal with complex data types such as sequences and graphs as real-life data is often complex.
Can discover statistically significant or correlated patterns: Many algorithms can find millions of patterns but many of the patterns are spurious. In other words, some patterns may be weakly correlated or just appear by chance. To find significant patterns, a key approach is to use statistical test or correlation measures.
Let the user select constraints to filter out patterns: For users, a key features is to be able to set constraints on the patterns to be found such as a length constraint, so as to reduce the number of patterns.
Summarize or compress the data: Another important feature of a good pattern mining algorithm is the ability to find patterns that summarize or compress the data. In fact, rather than finding millions of patterns, it can be useful to find patterns that are representative in the sense that they capture well the characteristics of the data.
Discover pattern types that are interesting: The type of patterns that is discovered by an algorithm must be useful, or event surprising. It should not be too complex or too simple.
Can find an approximate solution: Because exact algorithms for pattern mining are often slow or do not scale well, designing approximate algorithms that can give an approximate solution is also important.

This is the list of properties that I think are the most important for pattern mining algorithms. Hope it has been interesting. Leave a comment below, if you want to add something else. 🙂

—
Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Data Mining, Data science, Pattern Mining | Tagged data mining, data science, pattern mining | Leave a comment

Delayed Conference Notifications

Posted on 2021-12-01 by Philippe Fournier-Viger

Today, I will talk about the delay for conference paper notifications. This topic idea came to me as I observed the reaction of some Twitter users to the AAAI 2022 paper notification. What happened? Briefly, I was looking at the AAAI 2022 website yesterday and found out that many tweets on the conference page were joking or complaining about the notifications being delayed. All these tweets appeared on the webpage because it automatically aggregates and display the tweets having the hashtag #AAAI2022.

Here is a screenshot of the webpage:

And here are a few tweets about the delay. Some people are complaining, while others are joking or just want to know what is happening, or share their emotions about having to wait.

Having said that, the notification is just late by a single day. This is not something uncommon in academia. So, personally, I would not complain about this. But I would like to talk a little bit about why a conference may be late in general. I think this is something interesting to talk about.

Why the notification of a conference may be late? The notifications of conferences are often late and there can be many reasons. Sometimes, it is by several days, and sometimes even by a week or more. It is understandable because some conferences have to deal with a large number of papers, and the reviewers may be late, or some other problems may occur. For example, a situation that happens quite often is that several reviewers are late or do not submit their reviews on time. In that case, the organizers have to quickly find some alternative reviewers to make sure that all papers have some mimimum number of reviews. In the case of a big conference, there are thousands of papers so this can be very difficul and time-consuming to manage. Other reasons are that organizing committees involve researchers from all over the world and it is sometimes hard for committee members to have discussion on the papers to be accepted due to the time difference. And also, all the organizers of conferences generally have a job, family, and some unexpected events may also happen.

What a conference organizer can do? If the delay is going to be short, often the organizers will not update the website. But if the delay is going to be long, it is a good idea to announce that there is a delay on the website because some researchers are eagerly waiting for the notification. This is generally the case of young researchers who need a paper to graduate. I remember that for my first papers, I was very anxious about receiving the decision, and I was reloading the website many times to see if the decision would come out.

What researchers who submitted papers can do? In general, there is not much to do beside waiting. Complaining will not change anything to that. So personally, I do not recommend to complain about this publicly. The best is just to do something else and try to stop thinking about this. Eventually, the decision will come…

Conclusion

This is just a short blog post to talk about why conference notifications are sometimes delayed. The idea for writing about this came to me from the tweets that I noticed on the conference webpage, yesterday. Hope this has been interesting

Posted in Academia, artificial intelligence, Conference, Research | Tagged conference, delay, notification, paper | Leave a comment

Some common problems of pattern mining papers

Posted on 2021-11-29 by Philippe Fournier-Viger

Today, I will talk about pattern mining, a subfield of data mining, that aims at discovering interesting patterns in data. In particular, I will talk about why several papers from that field do not have much impact. This is generally because of some of the following limitations:

Irrealistic problem definition: Several papers on pattern mining defines a problem that is irrealistic and has no real application or few applications. For example, I have seen many papers about proposing algorithms for mining some types of patterns, but never saw any papers that have applied these algorithms to do anything in real life.
A problem definition that is too narrow or an algorithm that is not flexible. Another issue is that many papers propose algorithms for problem definitions that are too specialized or too simple to be used in real-life. For example, there are a lot of papers that talk about analyzing customer shopping data but fail to consider many important data such as the customer profile, the categories of items, the cost and profit, and the time of purchase. Without considering such data, many pattern mining model are not useful.
Incremental contributions. Too many papers in pattern mining provide simple ideas that are just a simple combination of existing ideas. It is important to come up with original research ideas.
Focusing too much on performance. A lot of papers on pattern mining focus on performance. While performance is important and it is exciting to design fast algorithm, it is not always the most important for the user. A user may rather want to have more flexibility and to have the ability to set constraints on patterns to be found.
Poor or limited experiments. Another problem of many pattern mining papers is that poor or limited experiments are carried out to evaluate the algorithms. The experiments often focus on performance but do not show that interesting patterns are discovered or that the patterns are useful in real-life. This is important to show that the algorithms are useful for something.
Poor presentation. Another issue is a poor presentation. Even if an idea is very good, if it is poorly presented in a paper, it will have a limited impact.
Incorrect and imcomplete algorithms. Several pattern mining algorithms are claimed to be correct and complete but are not. Generally this is because no proofs are made in the papers and the authors forget some important cases when designing the algorithms. This is something to be careful about.
Approach that is not reasonable or well-justified. Another problem in several pattern mining papers is that there is not enough justifications about the design of the algorithm to convince the reader that the approach is reasonable and innovative.

That was just a short blog post! Hope it has been interesting. Leave a comment below, if you want to add something else or give your opinion.

—
Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Data Mining, Pattern Mining, Research | Tagged data, data mining, data science, pattern mining | Leave a comment

Three videos about Recent Studies on Pattern Mining

Posted on 2021-11-11 by Philippe Fournier-Viger

In this blog post, I will share three videos about pattern mining describing three research papers from from my team. These videos are not very long, as they are designed to just give a brief overview about the papers. Hope it will be interesting.

Chen, Y., Fournier-Viger, P., Nouioua, F., Wu, Y. (2021). Mining Partially-Ordered Episode Rules with the Head Support. Proc. 23rd Intern. Conf. on Data Warehousing and Knowledge Discovery (DAWAK 2021), Springer, LNCS, 7 pages [ppt] [***Watch the video***]

Nouioua, M., Fournier-Viger, P., Qu, J.-F., Lin, J.-C., Gan, W., Song, W. (2021). CHUQI-Miner: Correlated High Utility Quantitative Itemset Mining. Proc. 4th International Workshop on Utility-Driven Mining (UDML 2021), in conjunction with the ICDM 2021 conference, IEEE ICDM workshop proceedings, to appear. [ppt] [**** Watch the video ****]

Chen, Y., Fournier-Viger, P., Nouioua, F., Wu, Y.. (2021). Sequence Prediction using Partially-Ordered Episode Rules. Proc. 4th International Workshop on Utility-Driven Mining (UDML 2021), in conjunction with the ICDM 2021 conference, IEEE ICDM workshop proceedings, to appear. [ppt] [**** Watch the video ****]

That is all for today! I just wanted to share these videos 🙂

—
Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Data Mining, Pattern Mining, Video | Tagged episode mining, pattern mining, video | Leave a comment

Brief report about the PRICAI 2021 conference

Posted on 2021-11-09 by Philippe Fournier-Viger

In this blog post, I will write briefly about the 18th Pacific Rim International Conf on Artificial Intelligence (PRICAI 2021) conference, which was held from November 8–12, 2021, in Hanoi, Vietnam, and online.

What is PRICAI ?

PRICAI is an artificial intelligence (AI) conference that is well-established and is always held in Asia since over 30 years. PRICAI was previously held in various locations such as:

Nagoya (1990), Seoul (1992), Beijing (1994), Cairns (1996), Singapore (1998), Melbourne (2000), Tokyo (2002), Auckland (2004), Guilin (2006), Hanoi (2008), Daegu (2010), Kuching (2012), Gold Coast (2014), Phuket (2016), Nanjing (2018), Fiji (2019), and Yokohama (2020, online).

The proceedings of the conference are published as three proceedings books in the Springer Lecture Notes in Artificial Intelligence (LNAI) series, which ensures good visibility.

This year, there was 382 submissions, from which 92 regular papers were accepted (24.8% acceptance rate), and 28 short papers. Thus the combined acceptance rate for all papers is 31.41%.

Keynote talk by Prof. Virginia Dignum about responsable AI

There was an interesting keynote talk by Prof. Virginia Dignum about responsible artificial intelligence, what is it, and how it is relevant for today’s society.

She first said that AI is still not intelligent as smart as we would like it to be, and that three important components of AI are : adaptability, autonomy and interactions.

She then said that responsible AI is about understanding that AI systems are not alone. There are users, institutions and other systems around these AI systems, and we need to consider them.

There are several dilemnas. For example, should we just focus on optimizing the accuracy of AI systems or should we consider other criterias such as biases and energy consumption. We can view this as different objectives and we cannot optimize all of them. Thus, what should be prioritized?

There are various issues about AI that should be considered such as transparency, accountability and responsability:

The speaker also talked about other things but I will not report everything. If you are interested by this topic, she has published some papers on that subject.

Paper presentation

There was numerous paper presentations on various artificial intelligence topics. In particular, I participated to a paper that was presented to the conference, about high utility itemset mining:

Song, W., Zheng, C., Fournier-Viger, P. (2021). Mining Skyline Frequent-Utility Itemsets with Utility Filtering. Proc. 18th Pacific Rim Intern. Conf on Artificial Intelligence (PRICAI-2021), Springer LNCS, to appear.

Best papers

The best paper awards went to these two papers:

Best Paper: Ning Dong and Einoshin Suzuki, GIAD: Generative Inpainting-Based Anomaly Detection via Self-Supervised Learning for Human Monitoring
Best Student Paper: Bojian Wei, Jian Li, Yong Liu, and Weiping WangFederated Learning for Non-IID Data: From Theory to Algorithm

Conclusion

This was just a brief report about the PRICAI conference. Since my schedule is quite busy this week, I did not attend all the talks. But on overall, it was an interesting conference. It is a medium size conference that had some good talks.

Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 120 data mining algorithms.

Posted in artificial intelligence, Conference | Tagged ai, artificial intelligence, conference, machine learning, pricai 2021, pricai conference | Leave a comment

2,000,000 visitors on this blog!

Posted on 2021-11-03 by Philippe Fournier-Viger

This blog has been created about 8 years ago (in 2013) for the purpose of talking about research, academia, and data mining.

Today, I just saw that the counter of visitors to this blog has passed 2,000,000. This is of course just a number, but still it is nice to see this. Thus, I would like to say thank you to all the readers of this blog.

Thanks for supporting the blog!

—-

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in General | Tagged blog, website | Leave a comment

Paying to be a keynote speaker?

Posted on 2021-11-01 by Philippe Fournier-Viger

Recently, I received an e-mail from the organizer of an event in the UK called “International Conference on Artificial Intelligence and Machine Learning ” from a website called UnitedResearchForum, where the organizers asked me to participate as a keynote speaker:

From that e-mail, I thought that it was some SPAM as I have never heard about this event before. But just to make sure, I sent an e-mail to ask how much they would pay me to give the keynote speech:

Then, I got their answer, which is quite amazing. They tell me that to be a keynote speaker at their event, I would need to pay 249$ USD:

This is quite ridiculous. In general, a keynote speaker must be paid by the conference. Not the other way around. A keynote speaker is supposed to be a special guest, and generally a conference will pay hotel and airplane plus some salary to their keynote speaker.

This conference should not call their speakers “keynote speakers” if they ask them to pay by themselves for the registration fee. This seems to just be a tactic to get attention.

Thus, I clicked the button in my e-mail inbox to report the e-mail as SPAM.

Conclusion

This is just another example of academic SPAM. I do not recommend publishing there.

—-

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Academia | Leave a comment

The “Top 2% most influential scientists of 2020” according to Stanford

Posted on 2021-10-28 by Philippe Fournier-Viger

In recent days, I see many posts on Linkedin about researchers that mention that they are in the list of the top 2% most influential scientists of 2020 according to Stanford. Here is an example:

I understand why people post about this. The reason is that they are happy to be in the ranking and to see that their research is impactful.

To see more about how this ranking works, I downloaded the data from : https://elsevier.digitalcommonsdata.com/datasets/btchxktzyw/3

I see that my name is also in that ranking, somewhere around rank #48,000. But this is not very important for me. Let’s talk instead about the data.

The data is provided as Excel files and there is also some Python code that was used to generate the ranking.

In the Excel files, it is interesting to see that each author is described by numerous metrics such as the number of citations as first author, the number of citation as corresponding author, the number of self-citations, the percentage of self-citations, etc.

Of course, all these metrics are not perfect. For example, in some fields it is easier to be cited than in some other fields and some researchers will try to manipulate these metrics for example by asking other researchers to cite their papers rather than deserving the citations. Thus, a higher rank does not necessarily mean that someone is a better researcher in real-life. But nonetheless, it is still interesting to look at this data.

By looking at the data for year 2020, I make a few observations that I find interesting.

After sorting the authors by the year of their first paper, I find that some authors have published papers for more than 180 years. For example, below, Marshall, William S. had his first paper in 1834 and his last paper in 2020. I guess that it must be some error in the data and that two persons must have the same name.

2. The attribute that I perhaps find the most interesting is “self %” which indicates the percentage of self-citations. If I sort from largest to smallest, I can see that some persons have from 90% to 100% self-citations in year 2020, which appears to be very high. Some of these persons also have a quite high “rank”. There can be some reasons for these high percentages… It perhaps that these authors work in some smaller research communities or on very specialized research topics.

If I look more closely at the data (year 2020), I find that:
– about 0.6% of the researchers have more than 50% self-citations,
– about 4.2% have more than 30% citations
– about 13.7% have more than 20% self-citations
– about 48% have more than 10% self-citations

If I look at the top 30 persons in the ranking, the self-citations percentage is all below 15% and sometimes even below 1 %:

For me this makes sense. I think a young researcher will typically have more self-citations while a very famous researcher should have less self-citations and more from other people.

That is the most interesting thing that I have found so far in this data.

I did not do a very deep analysis of this data, as I am quite busy. But I just wanted to share a few observations. It would be interested to go beyond that and look for example at the data by countries or to draw some charts to see the distribution of values for different attributes, and to measure the correlation between attributes.. If you find something interesting in this data, you may share it in the comment section below!

—
Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Academia, Research | Tagged Research, stanford, to 2% most influential scientists | Leave a comment

Brief report about IEEE ICDM 2021

Brief report about UDML 2021 (4th International Workshop on Utility-Driven Mining and Learning

What is a good pattern mining algorithm?

Delayed Conference Notifications

Some common problems of pattern mining papers

Three videos about Recent Studies on Pattern Mining

Brief report about the PRICAI 2021 conference

2,000,000 visitors on this blog!

Paying to be a keynote speaker?

The “Top 2% most influential scientists of 2020” according to Stanford

Archives

Categories

Recent Posts

Recent Comments

Number of visitors:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Archives

Categories

Recent Posts

Recent Comments

Tag cloud

Number of visitors: