Analyzing COVID-19 tweets to understand the public opinion

In this blog post, I will briefly talk about how tweets collected on Twitter can be analyzed to understand public opinion about COVID-19. This is based on the research paper below, in which I recently participated:


Noor, S., Guo, Y., Shah, S. H. H., Fournier-Viger, P., Nawaz, M. S. (2020). Analysis of Public Reaction to the Novel Coronavirus (COVID-19) Outbreak on Twitter. Kybernetes, Emerald Publishing, to appear.

I will give an overview of the above paper. For more details, you can follow the above link to read the whole research paper.

Why analyze tweets? There has been a lot of research on analyzing tweets in the past, for example to detect the sentiment and feelings of people on different topics, or to detect fake news and bots, among other things. Twitter data is interesting because Twitter is used by millions of people and tweets are posted in real time. Thus, tweets can be used to analyze what people are saying about a topic such as the coronavirus.

How can we understand public opinion about COVID-19 on Twitter? In the above research paper, we applied the following methodology. We first collected thousands of tweets in English about COVID-19 during the first months of the pandemic. Then, we applied clustering algorithms to discover the main themes related to COVID-19 that were discussed on Twitter. Moreover, we applied sequential pattern mining algorithms to find frequent word patterns in tweets.
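To give a rough idea of what a frequent word pattern is, here is a minimal Python sketch. It is not the algorithm used in the paper: it only counts ordered pairs of words (a word followed, not necessarily immediately, by another word) and keeps the pairs that appear in at least `min_support` tweets. The function name, the threshold and the toy tweets are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_word_sequences(tweets, min_support=2):
    """Return ordered word pairs (a, b), with a before b, found in enough tweets."""
    support = Counter()
    for tweet in tweets:
        words = tweet.lower().split()
        # combinations() keeps the original word order, so (a, b) means
        # "a appears before b". The set counts a pair once per tweet.
        support.update(set(combinations(words, 2)))
    return {pair: count for pair, count in support.items() if count >= min_support}


# Made-up toy tweets (hypothetical, not data from the paper).
toy_tweets = [
    "coronavirus testing lockdown",
    "coronavirus lockdown extended",
    "testing centers open",
]
patterns = frequent_word_sequences(toy_tweets)
print(patterns)
```

Real sequential pattern mining algorithms find longer patterns and avoid enumerating all pairs, but the notion of support (the number of tweets containing a pattern) is the same.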

What have we discovered? We have found several interesting things. For the cluster analysis, we found seven main clusters of tweets that indicate some main themes discussed by Twitter users:

  • Cluster 1 (green): public sentiments about COVID-19 in the USA.
  • Cluster 2 (red): public sentiments about COVID-19 in Italy and Iran and a vaccine.
  • Cluster 3 (purple): public sentiments about doomsday and science credibility.
  • Cluster 4 (blue): public sentiments about COVID-19 in India.
  • Cluster 5 (yellow): public sentiments about COVID-19’s emergence.
  • Cluster 6 (light blue): public sentiments about COVID-19 in the Philippines.
  • Cluster 7 (orange): public sentiments about the COVID-19 US intelligence report.

For example, this is cluster 1:

And this is cluster 2:

Cluster 3:

Some part of cluster 4:

Some part of cluster 5:

Some part of cluster 6:

We also found several patterns related, for example, to “Coronavirus, testing, lockdown”. Here are some of the most frequent words:

More results are presented in the paper.

The above results show what the sampled English tweets related to COVID-19 were talking about on Twitter from January to March 2020.

Conclusion

In this blog post, I have given a very brief overview of what can be learned from tweets about public opinion. For more details, please check the above paper! There are obviously also some limitations to that study, such as that the tweets were not geolocated and that only tweets in English were used. If you have any comments, you may post them in the comment section below. I hope this has been interesting.


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.


The Controversy around Extreme Learning Machines (ELM) and related models

Today, I will talk about an interesting topic in academia: the controversy around ELM (Extreme Learning Machine) and its origins. This has been a hot topic of discussion in the field of machine learning for more than a decade, since some researchers started to question the high similarity of ELM to models published earlier such as RBF (Radial Basis Function). More recently, some researchers have also argued about the similarities between ELM and RVFL (Random Vector Functional Link) and other models.

In this blog post, I will give an overview of this controversy and its impact, but I will not take sides. I will just look at it from an outsider’s perspective. You can read the arguments from both sides, form your own opinion and draw your own conclusions.

Some arguments against ELM

ELM was proposed in 2004. The controversy around the origins of ELM started around 2008 with a letter in IEEE Transactions on Neural Networks, which claimed that it is unnecessary to give a new name to a model that already existed, with perhaps minor modifications:

  • L. P. Wang and C. R. Wan, “Comments on ‘The extreme learning machine’,” IEEE Trans. Neural Networks, vol. 19, no. 8, pp. 1494-1495, 2008.

Other researchers have raised this issue as well. To understand this perspective, there is an anonymous website that provides a good summary of the issues raised by some researchers against ELM. It is called ELM Origin (webs.com).

A problem with this website, though, is that it is anonymous, which means that we cannot be sure who wrote it. However, the website provides annotated ELM papers and claims that several ELM models are similar to models from papers published many years before. For example, it is said that ELM-Kernel is similar to LS-SVM with zero bias and to kernel ridge regression.

I did not read the information in detail, as this is outside my main research field, so I am personally not sure whether all the claims are reasonable or not.

Some arguments for ELM

Some researchers have responded to these claims, arguing that there are indeed differences between ELM and previous work. For example:

  • G.-B. Huang, “Reply to comments on ‘the extreme learning machine’,” IEEE Trans. Neural Networks, vol. 19, no. 8, pp. 1495-1496, Aug. 2008.
  • G.-B. Huang, “What are Extreme Learning Machines? Filling the Gap between Frank Rosenblatt’s Dream and John von Neumann’s Puzzle,” Cognitive Computation, vol. 7, 2015.

However, some researchers argue that these differences are tiny. It was also argued in the defense of ELM that researchers may have simply missed some related work and thus not been aware of the prior work. This might be true… as it has happened in the past that some discoveries were made independently by several researchers.

Yann LeCun’s opinion

One of the fathers of deep learning has also given his opinion on this topic in a Facebook post:

He was clearly not impressed by ELM. However, this is just a Facebook post, and it seems that LeCun perhaps did not read all the papers about ELM to get a clear idea of the topic.

Who is right?

As I said previously, I will not take a position as this is not my main area. You may make up your own mind or write your opinion in the comment section below if you have one.

What is the impact of this controversy?

This controversy has resulted in a kind of war between some researchers working in that area. I have observed that some researchers against ELM and some researchers for ELM have been quite aggressive towards each other, and there are also many researchers who do not want to take sides but are caught between the two camps.

As I work as associate editor for various journals, I have noticed, for example, that at some point a reviewer wanted to directly reject a paper just for using the name ELM. I also noticed some researchers who tried to push their citations against ELM or for ELM. In other cases, I have seen a reviewer arguing that authors should change their paper because it showed that ELM was better than some other models and the reviewer could not accept that conclusion, even arguing that this must have been due to experimental errors.

I personally don’t really know what to think about this. But as an outsider, it seems to me that today there is still a kind of war on this topic involving various people, and I think it is a pity for the people who are caught in the middle of that war but do not want to take sides.

Conclusion

This is a short blog post about the controversy around ELM. I am just reporting on this topic, as I think it is interesting. As said above, you can read about it and form your own opinion. But personally, I think it is better not to take any side, to avoid conflicts.


Discovering Alarm Correlation Rules for Network Fault Management (video)

In this blog post, I will share the video of our new paper about analyzing alarms in telecommunication networks, presented at the AIOPS 2020 workshop. This work is part of an industrial collaboration project. The motivation for this project is that there are typically thousands of alarms in a telecommunication network, and not all of them are important. To allow network operators to focus on fixing the most important issues, we propose a method to discover correlations between alarms.

For this purpose, we view a telecommunication network as an attributed graph where nodes represent devices, edges indicate connections between devices, and attributes of nodes represent alarms. Then, we apply a novel algorithm to find rules of the form A -> B, indicating that if alarm A appears, alarm B is likely to occur. Using these rules, we can reduce the number of alarms presented to network maintenance workers. Although the approach is designed for analyzing alarms, it could be applied to other data modelled as graphs.
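To illustrate what a rule A -> B means, here is a minimal Python sketch. It is a strong simplification and not the algorithm from the paper: it ignores the graph structure and only assumes that alarms have been grouped into sets (for example, alarms seen in the same time window). The confidence of A -> B is then the number of sets containing both A and B divided by the number of sets containing A. The function name, the threshold and the alarm names are hypothetical.

```python
from itertools import permutations


def alarm_rules(windows, min_confidence=0.6):
    """Return {(a, b): confidence} for rules a -> b over sets of alarms."""
    alarms = set().union(*windows) if windows else set()
    rules = {}
    for a, b in permutations(sorted(alarms), 2):
        count_a = sum(1 for w in windows if a in w)
        count_ab = sum(1 for w in windows if a in w and b in w)
        # confidence(a -> b) = P(b appears | a appears), estimated by counting.
        if count_a and count_ab / count_a >= min_confidence:
            rules[(a, b)] = count_ab / count_a
    return rules


# Made-up alarm sets (hypothetical alarm names, not data from the paper).
toy_windows = [
    {"link_down", "high_latency"},
    {"link_down", "high_latency", "fan_failure"},
    {"link_down"},
    {"fan_failure"},
]
rules = alarm_rules(toy_windows)
for (a, b), confidence in rules.items():
    print(f"{a} -> {b} (confidence {confidence:.2f})")
```

With such rules, if A -> B has a high confidence, alarm B could be hidden or grouped with alarm A when both occur, which is the general idea behind reducing the number of alarms shown to operators.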

Here is the link to watch the paper presentation:
http://philippe-fournier-viger.com/AIOPS.mp4

And here is the reference to the paper:


Fournier-Viger, P., Ganghuan, H., Zhou, M., Nouioua, M., Liu, J. (2020). Discovering Alarm Correlation Rules for Network Fault Management. Proc. of the International Workshop on Artificial Intelligence for IT Operations (AIOPS), in conjunction with the 18th International Conference on Service-Oriented Computing (ICSOC 2020).

That is all I wanted to write for today!


Merry Christmas and Happy New Year!

Today, I would like to wish all readers of this blog and users of my SPMF data mining software a merry Christmas and a happy new year!


This year has been a special year due to the worldwide pandemic, with several challenges and changes in our habits. But this year will soon be behind us, and I wish you all health, happiness and success for 2021.

At the same time, I would like to thank all the users of SPMF and readers of this blog for supporting these projects. For the SPMF software, a new version will be released very soon with several new algorithms! I am working on it these days, and I will keep you updated soon…



Conference Badges: the Best and the Worst

Today, I will talk about the conference badges that I have collected since I was a PhD student. I have attended over 50 events and have kept all of the conference badges except maybe one or two. Here is a picture of all these conference badges:

conference badges

In total, I have visited 28 countries and/or special territories, but not all of them to attend conferences. Sometimes, it was only for a research visit or a vacation. Below, I will talk about what makes a great conference badge and take a look at some of them to compare the different designs.

Generally, a good badge should have the following characteristics: (1) it is big enough, (2) the name is written in big letters, (3) it does not contain irrelevant information (e.g. it is unnecessary to write the conference dates and hotel), (4) it is beautiful, and (5) it cannot flip over, or otherwise it is printed on both sides.

The simple black and white badges

The badge below for DEXA 2018 is the simplest one. Printed on a piece of standard paper with a black and white printer, it only indicates the conference name, attendee name and country. Simple and effective. But it could be more beautiful.

dexa conference badge

This is another simple black and white badge, for KDD 2018:

kdd 2018 conference badge

The simple badges with color

The badge below is still quite simple but has a bit more color which makes it more enjoyable than the black and white badges.

canadian ai conference badge

The one below from IEA AIE 2018 is simple, colorful and effective, as the key information is easy to read and big enough:

iea aie 2018 conference badge

The one below from PAKDD 2014 is also quite good as the name is really big and the design is nice and colorful. However, there is a lot of empty space at the bottom. The bottom third of the badge could be cut entirely.

pakdd 2014 conference badge

The one below from PAKDD 2017 is a bit better in my opinion as it is more beautiful. But the font for the name is a bit hard to read. Generally, it is better to print the first name bigger and to put the first name and the last name on different lines, to avoid squeezing all the letters on a single line like below.

pakdd 2017 conference badge

I like badges like the one below from IDA 2014 that are simple, colorful and just contain the key information (name, affiliation and conference acronym), and are also beautiful. That one uses a color picture which is nice.

ida 2014 conference badge

Badges with text that is too small

Some badges like the one below from ADMA 2018 are very big but do not use the space very well. The name of the attendee is actually very small. More than 50% of the space is basically empty.

adma 2018 conference badge

Badges with too much information

The badge below from PAKDD 2018 is beautiful but really contains too much information. It is not necessary for attendees of the conference to see the full conference name, dates, name of the hotel (!), and country. If we are attending the conference, we already know which hotel we are at and what the dates are.

pakdd 2018 conference badge

Badges where you write your name by yourself

For some conferences, I had to write my name by myself. This is not a very good idea… Look at the messy result below when the ink does not dry well at ADMA 2013!

adma 2013 conference badge

Badges with a fancy design

The badge below is one of my favorites as it is made of plastic and has a very beautiful design representing the architecture of a famous tower in the city (Liaocheng). It could have been improved by adding the names of attendees.

Badges with a special material

Another badge that is quite special is the one below for the BDA 2019 conference, as it has been etched in a piece of wood. That is the most unusual material for a conference badge that I have seen, and for this it is really nice. However, I think that some information could be removed, like the full conference name and dates. Just writing BDA 2019 would be enough and would make it easier to read.

bda conference badge

Badges with photo

Badges for some events also have a photo. Below is an example. Having a photo is nice and probably also a security measure to ensure that the badge is not stolen and used by someone else.

Another badge with photo is below. This one is really nice but a problem is that the name is really small.

The badges with no names

A few conferences have given badges with no names like below. Although I have enjoyed these conferences, I have to say that having a name on the badge would have been much better. It is important to help start conversations with other attendees!

adma conference badge
icgec 2018 conference badge

Badges with text that is too small and too many colors

And the following badge is one of the worst (in my opinion). The problem with this badge is that it is really small (smaller than a credit card) and that the text is really hard to read because of the colors. At that time, I was a graduate student, and I had printed these badges and helped with the design, so I am partly responsible for that! What happened is that we first bought badge paper that was too small and did not know what it would look like when printed in color. Also, I had no experience in designing badges and we were in a rush, so we did not have time to print them again. Today, I would not do it like that 😉

its 2008 conference badge

But I also designed the badge below at the same time, and it looked a bit better:

educational data mining 2008 conference

Conclusion

In this blog post, I have talked about how a good conference badge should be designed and have shown some of the best and worst badges from my collection. 😉

Do you also keep all your conference badges? Which badge do you like the most or think is the worst? You may tell me in the comment section below.



Real Conferences VS Virtual Conferences

The year 2020 is ending soon, and it has been quite a special year due to the coronavirus pandemic around the world. This has forced many researchers to work from home, and to cancel or change their research travel plans. Moreover, many academic conferences in 2020 have been held online as virtual conferences, as a safety measure and due to travel restrictions in several countries. In this blog post, I will talk about this new trend of holding virtual conferences and their benefits and drawbacks compared to “real” conferences (held in a physical location).

Since the beginning of the year, I have attended several virtual conferences such as PAKDD 2020, ICDM 2020, IEA AIE 2020 and DAWAK 2020, as well as the AIOPS 2020 and UDML 2020 workshops. Generally, these events have been well-organized. While some conferences took great care to schedule the talks of researchers based on their time zones, some other events had small time management problems. For example, a session chair thought that a session was starting earlier due to a wrong time conversion, and the wrong time zone was indicated in the program of another conference, which led to some confusion. But overall, it worked as planned.

Benefits of virtual conferences

Attending a conference online has some benefits. One of them is that it is not necessary to travel very far to give a talk. Rather than flying to a location, one can just connect to a server, which is not time-consuming. Online conferences also provide flexibility, as one can listen to talks while doing other things at home, or from various locations. Moreover, a few conferences have provided a playback option to watch the videos of previous presentations in case we missed them. Another benefit of online conferences is that the registration fees have often been reduced, and in some cases, attending the conference became free. This may have helped some students or researchers to attend conferences that they would otherwise not have attended.

Drawbacks of virtual conferences

There are also some drawbacks to online conferences. The first one is that the schedule is not suitable for everyone. For example, one may have to present a paper in the middle of the night due to the time difference. This was generally not a problem in my case, but I know some other researchers that had problems with this.

A second drawback is that the ability to socialize with other researchers is greatly reduced in online conferences. In a real conference, we can shake hands and talk with many people that we know or don’t know, especially during the coffee breaks and other social activities. This is important to establish contact with other researchers. However, in virtual conferences, there are not many opportunities for that… Some conferences like ICDM have adopted online systems such as Gather.Town, where we could walk using an avatar in a virtual room to talk with other people using a webcam and microphone, but I found that the room was essentially empty every time I checked, or had only a few inactive people. Thus, although that concept was nice, in practice I was not able to talk with anyone using it.

Another issue with virtual conferences is that it is easy to not feel motivated to listen to the talks, since they are all online and the schedule often conflicts with real-life activities. Some talks may be in the middle of the night, or during work hours or lunch. Thus, I personally did not listen to many talks, while at a real conference, I would attend most of the sessions.

Another thing that I don’t like so much about virtual conferences is that we often do not see the audience when we give a talk (unless they turn on their webcams). In this case, we are in front of the computer talking into our microphone, but we get little feedback during the presentation. And in many cases, the talks are required to be pre-recorded, which makes them not interactive at all.


Attending real conferences again

Recently, I attended some real conferences again. This is because the pandemic is under control in the country where I live (China). The second week of December 2020 was the first time that I attended a real conference this year. And it was a really enjoyable feeling to be able to meet researchers again and talk with them face to face. I met some very nice people and those were some great events. In general, life where I am went back to normal several months ago, which I am very happy about. However, I am looking forward to the day when I can also attend international conferences abroad, as I used to do many times per year in the past. I think that next year, real conferences will start to happen again… or perhaps some hybrid conferences that will be partly online and partly offline (e.g. IEA AIE 2020).

Conclusion

In this blog post, I talked about the experience of attending real and virtual conferences, and especially the benefits and drawbacks of virtual conferences. I hope that it has been interesting. If you want to share your thoughts and experience about that, please leave a comment below! I will be happy to read it.


How to prepare your thesis defense?

Today, I will talk about an important topic for graduate students, which is how to prepare for your thesis defense. I will explain what should be done to prepare yourself well, and also talk about my experience as a student and currently as a professor and judge for thesis defenses.


Before the thesis defense

  • If you have a chance, attend some thesis defenses by other graduate students to get familiar with the process.
  • Ask how thesis defenses are done at your institution and who the judges will be. In particular, you need to know how much time you will have to give your presentation.
  • Start to prepare early and talk to your thesis supervisor about your preparation. Your supervisor may give you some good advice, especially with respect to how defenses are conducted at your school.
  • Spend a good amount of time preparing your presentation. Preferably, prepare your slides a week in advance and show them to your supervisor and friends for comments. You may read my advice about how to give a good talk. In particular, avoid having too many slides and too many details, and make sure there are no errors or typos.
  • Rehearse your presentation several times to make sure you are comfortable giving it, and that you can present within the time limit. You may ask some friends to listen to your presentation.
  • Eat well and have a good sleep before the talk. This can make a big difference. For example, in the past, I was a judge for a thesis defense where a student fell down and almost lost consciousness due to high stress, fatigue and not eating breakfast. To be able to sleep well and be at your best, you need to finish your preparation at least one day before the defense.
  • Prepare a list of questions that you think judges may ask you and a list of corresponding answers. This will help you to better answer questions.

If you prepare yourself well, you will not be stressed and you will perform better.

During the defense

  • Wear some suitable clothing. Be polite.
  • Don’t talk too fast. A common mistake is that some students will try to talk very fast to say more things. But this is not necessary. Instead, summarize and talk about what is important at normal speed.
  • Look at your audience. Another common mistake is to look at your screen instead of looking at your audience. A presentation is much more interesting when the presenter looks at the attendees.
  • Keep track of the time. This is one of the most important things. You need to make sure that you will not exceed the time limit. Thus, keep an eye on the clock, your cellphone or your watch to know how much time is left.
  • Listen carefully to the questions from judges before answering. If you did not understand, ask the judge to clarify the question, or repeat the question in your own words before answering. This is important because if you did not understand a question, you may give an unrelated answer.
  • When answering a question, remember that the judge may not be an expert on your topic. Thus, try to give an answer that is easy to understand if you think the judge may not be familiar with your research area.

Conclusion

In this blog post, I gave some advice about how to prepare for your thesis defense. I hope it will be useful. If you think I missed something or would like to talk about your experience, please leave a comment below!

Wish you a successful thesis defense!



The importance of explainable data science and machine learning models

In this blog post, I will talk about an important concept, which is often overlooked in data mining and machine learning: explainability.

To discuss this topic, it is necessary to first recall the goals of data mining and machine learning. The goal of data mining is to extract models, knowledge, or patterns from data that can help to understand the data and make predictions. There are various types of data mining techniques such as clustering, pattern mining, classification, and outlier detection. The goal of machine learning is more general. It is to build software that can automatically learn to do some tasks. For example, a program can be trained to recognize handwritten characters, play chess, or explore a virtual world. Generally, data mining can be viewed as a field of research that overlaps with machine learning and statistics.

Machine learning and data mining techniques can be unsupervised (do not require labelled data to learn models or extract patterns from data) or supervised (labelled data is needed).

In general, the outcome of data mining or machine learning can be evaluated to determine if something useful is obtained by applying these techniques. For example, a handwritten character recognition model may be evaluated in terms of its accuracy (number of characters correctly identified divided by the number of characters to be recognized) or using other measures. By using evaluation measures, a model can be fine-tuned or several models can be compared to choose the best one.
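As a small illustration of the accuracy measure described above, here is a minimal Python sketch. The function name and the toy predictions are made up; the two "models" are just hard-coded lists of predicted characters.

```python
def accuracy(predicted, expected):
    """Number of correctly identified characters divided by the total number."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Made-up example: the characters to recognize and the outputs of two
# hypothetical handwriting recognition models.
toy_expected = list("hello")
toy_model_a = list("hallo")  # one mistake
toy_model_b = list("hxlxo")  # two mistakes

scores = {
    "model A": accuracy(toy_model_a, toy_expected),
    "model B": accuracy(toy_model_b, toy_expected),
}
best_model = max(scores, key=scores.get)
print(scores, "-> best:", best_model)
```

Comparing models then simply means computing the same measure for each of them on the same data and keeping the one with the best score.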

In data mining and machine learning, several techniques work as black-boxes. A black box model can be said to be a software module that takes an input and produces an output but does not let the user understand the process that was applied to obtain the output.

Neural networks are an example of black box models. Several neural networks may provide very high accuracy for tasks such as face recognition but will not let the user easily understand how the model makes predictions. This is not true for all models, but as neural networks become more complex, it becomes more and more difficult to understand them. The opposite is a glass box model, which lets the user understand the process used to generate an output. Decision trees are an example of glass box models. If a decision tree is not too big, it can be easy to understand how it makes its predictions. Although such models may yield a lower accuracy than some black box models, glass box models are easily understood by humans. In data mining, another example of explainable models is the patterns extracted by pattern mining algorithms.
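To show why a small decision tree is easy to understand, here is a minimal Python sketch of a hand-written tree that returns its prediction together with the list of tests that led to it. The tree, its features and its thresholds are invented for illustration and were not learned from any data.

```python
def classify_with_explanation(sample, node):
    """Walk a small decision tree and record every test made on the way."""
    path = []
    while "label" not in node:
        feature, threshold = node["feature"], node["threshold"]
        if sample[feature] <= threshold:
            path.append(f"{feature} <= {threshold}")
            node = node["left"]
        else:
            path.append(f"{feature} > {threshold}")
            node = node["right"]
    return node["label"], path


# Hypothetical hand-written tree about customers (made-up features and thresholds).
toy_tree = {
    "feature": "purchases_per_month",
    "threshold": 2,
    "left": {"label": "occasional customer"},
    "right": {
        "feature": "average_basket",
        "threshold": 50,
        "left": {"label": "regular customer"},
        "right": {"label": "premium customer"},
    },
}

label, reasons = classify_with_explanation(
    {"purchases_per_month": 5, "average_basket": 80}, toy_tree
)
print(label, "because", " and ".join(reasons))
```

The list of tests is the explanation: a human can read it and check whether the reasoning makes sense, which is generally not possible with the weights of a large neural network.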

A glass box model is thus said to be explainable. Explainability means that a model or knowledge extracted by data mining or machine learning can be understood by humans. In many real-world applications, explainability is important. For example, a marketing expert may want to apply data mining techniques to customer data to understand the behavior of customers. Then, the expert may use the learned knowledge to make marketing decisions or to design a new product. Another example is when data mining techniques are used in a criminal case. If a model predicts that someone is the author of an anonymous text containing threats, then it may be required to explain how this prediction was made to be able to use this model as evidence in court.

On the other hand, there are also several applications where explainability is not important. For example, a software program that does face recognition can be very useful even though how it works may not be easily understandable.

Nowadays, many data mining or machine learning models are not explainable. There is thus an important research opportunity to build explainable models. If we build explainable models, a user can participate in the decision process of machines and learn from the obtained models. On the other hand, if a model is not explainable, a user may be left out of the decision process. This raises the question of whether machines should be trusted to make decisions without human intervention.

Conclusion

In this blog post, I have described the concept of explainability. What is your opinion about it? You can share your opinion in the comment section below.


Brief Report about IEEE ICDM 2020

In this blog post, I will talk about the IEEE ICDM 2020 conference that I have attended virtually. The conference was supposed to be held in Italy but due to the coronavirus pandemic, it was held online.

About the ICDM conference

This year was the 20th edition of the IEEE ICDM conference. It is a well-known conference that is quite competitive. It is one of the top data mining conferences. The proceedings are published by IEEE. The conference has a research paper track, as well as a dozen workshops and tutorials. The papers are mainly about  data mining and machine learning.

Conference opening

The first day was mainly for workshops. On the second day, there was the ICDM conference opening. In the opening, the organizers were introduced, and an overview of the conference was given. Here are some of the slides, below.

The main research topics this year were:

Some statistics about the review process and accepted papers:

Most accepted papers are from China and the US, followed by Australia, Germany, India, France and Japan.

The online conference system

The ICDM conference was held on the website Underline.io, where the prerecorded videos of papers could be viewed at any time. Then, during the sessions of the conference, authors would join a Zoom session, give a 3-minute summary of their papers and answer questions live, assuming that people had already watched the videos. A few sessions like the conference opening ceremony were held live.

Besides, there is an interesting function on Underline called the Lounge, implemented with Gather.town, which lets you have a video/audio chat with other conference attendees in a game-like virtual world (see picture below). In the lounge, the chat function is proximity-based. You can move your avatar close to the avatar of another person to start a discussion with that person or listen to a discussion.

This is an interesting concept that aims to recreate how people talk with each other during the coffee breaks of an on-site conference. However, in practice, there were not many people in the lounge. I checked a few times during the first days of the conference and there were about 3 to 5 people there, but no one was talking with anyone else. So the function is an interesting concept, but I did not see it being used in practice.

My opinion about Underline is that it is relatively simple and it did the job but it relies on external services such as Zoom and Gather.town. Thus, Underline is more like a hub for different services for the conference. Having all these services under a single website or software would have been better in my opinion.

Registration

The registration fee for ICDM was quite low this year, at 500 USD, because the conference was held online (due to the coronavirus pandemic). This is appreciated, as ICDM is typically quite expensive, just like some other top conferences.

UDML 2020 workshop on Utility-Driven Mining and Learning

This year was the third edition of the workshop on Utility-Driven Mining and Learning (UDML 2020). Eleven papers were submitted and five were selected for publication, for an acceptance rate of 45%. Three of the selected papers are about algorithms for high utility pattern mining, another is related to spatiotemporal data mining, and the last one is about multi-objective recommendation.

[Picture: the five accepted papers]

There was a good discussion during the workshop and it was nice to see some researchers that I knew already.

If you want to see the video of the paper of my post-doc about mining cross-level high utility itemsets, you may watch the video here.

Retrospect about 20 years of ICDM

To close the ICDM 2020 conference, there was a panel called “20 Years of IEEE ICDM: Retrospect and Prospect” to discuss the two decades during which the conference has been held.

Next year: ICDM 2021

ICDM 2021 will be held in New Zealand, organized by the University of Auckland, from December 7 to 10, 2021.

Regular papers

There were numerous paper presentations on various topics. I listened to a few of them related to my interests. In some sessions, there were several viewers and several questions were asked.

Conclusion

In this blog post, I have given a quick overview of the ICDM 2020 conference. I will try to write more about the event later. I am looking forward to attending ICDM 2021 and then ICDM 2022.

—-
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 170 data mining algorithms.


Is it a good idea to change research area?

In this blog post, I will talk about what it implies for a researcher to change to another research area. Moreover, I will talk about what makes a good research area, and the importance of continuity for researchers. I will also discuss my own experience of changing research areas.


Reasons for changing research areas

There are several reasons for considering a change of research area at different points in the career of a researcher, and also for graduate students. Some reasons are:

  • Changing for a more popular research area. One may want to work on a more popular research area to follow some new trends. For example, in computer science, one may want to change from a more traditional research area like compiler design to a more popular topic like big data, data science, the internet of things, sensor networks, or machine learning. By following some trends, it may be easier to find a job, get some research funding, get some industrial collaboration projects, publish papers in special issues or workshops, get more citations and have a greater research impact, etc.
  • Personal interests. A researcher may want to try something new, or may feel more interested in a different research area, to explore new problems and learn other things.
  • Joining a research team that works on a specific research area. For example, a professor joining a university may want to slightly change research area to integrate with a research team that is specialized in a research topic.
  • Changing for a research area where it is easier to publish articles. For many universities, publishing papers is a performance evaluation criterion. In this context, some researchers will want to work on topics where it is easier to make new contributions, carry out experiments and publish articles.

Those are some of the key reasons that a researcher may consider. Whether those are good reasons or not depends on each case. For example, a researcher may not care about working on a popular topic but may rather work on something that he really likes.

I will talk about my own experience as an example. In my early research career, I worked on intelligent tutoring systems and cognitive modelling, but I found that it was a difficult topic for carrying out research, as it required doing experiments with people to evaluate my proposals, which was very time-consuming. Moreover, the research community around intelligent tutoring systems is quite small (maybe a few hundred people), so the possibility of having a great research impact was, in my opinion, limited. Also, I have a personal interest in algorithm design and optimization. Hence, at the end of my Ph.D., I started to switch from this research area towards doing research on data mining. Nowadays, my research area is data mining, and more specifically pattern mining. I think it was a good decision in my case because data mining is a more popular research area, I like this field, it is easier to do research and write papers, and there are more job opportunities. Besides, by working at a more fundamental level (algorithm design) rather than at the level of applications, I can have a greater research impact. For instance, my algorithms are not limited to intelligent tutoring systems but can be used in other fields. If I had kept working on a narrow topic with a small research community, it would have been harder to get citations (not so important, but it is still a performance evaluation criterion at some universities).

What is a good research area?

There is no absolute answer to this question. But a researcher can try to answer these questions to assess a research area:

  • Is this a research area that is interesting to you?
  • Is this a research area where you can make some good contributions?
  • Is this related to your current expertise? This is important to avoid starting again from zero… If you change to a research area that is somewhat related to your current research area, it may be better.
  • Is this a popular research area?
  • Can you get some special opportunities in that research area (join a team, get a job, funding, etc.)?

Those are some important criteria, but it is not necessary to meet all of them.

The importance of continuity

Changing research area can be good. However, continuity is also important in the career of a researcher. Changing too often from one research area to another is not good. It shows a lack of focus, and it may seem that the researcher is a specialist of nothing. It is better for the career of a researcher to focus on a specific research area and make several good contributions in that area over the years, to become better and better known in that area and benefit from this. As a researcher continues to work in the same area, it becomes easier (and faster) to make better research contributions and write papers. The researcher can also build many collaborations with other researchers over the years, and it also becomes easier to obtain research funding in a research area where you have published many papers.

In my opinion, the best time to change research area is at the beginning of the career of a researcher. For example, I gradually changed to data mining towards the end of my Ph.D. and now do mostly data mining research. Ten years later, I would not change research area again, because I am now a well-established researcher in that area, and I am also happy to work on this. If I changed again to another research area, it would become harder to publish papers and obtain grants, and I would have to learn many things again. So my focus is on data mining, but I still sometimes work on other topics as side-projects. 😉

Changing research area also requires some planning and thinking ahead. It is also better to change gradually toward the new research area, if possible.

Conclusion

In this blog post, I talked about changing research areas, as it is a concern for several researchers, especially early in their career. I hope that it has been interesting. If you would like to share your own experience or have comments related to this, please post in the comment section below!

