How to become a journal or conference reviewer?

Today, I will write a blog post aimed at young researchers who want to know what the work of a reviewer in academia is, how to become a reviewer for international journals or conferences, and what the benefits of being a reviewer are.

What is the work of a reviewer?

The main work of a reviewer is to read articles and evaluate whether their content is suitable for publication. Reviewers play a very important role in the academic publication process, as they help to filter out bad papers and provide advice for improving other papers.

To review an article, a reviewer may spend from a few minutes (e.g. when a paper is clearly bad or contains plagiarism, and is directly rejected by the reviewer) to several hours (when the paper is complex and is read carefully by the reviewer). Typically, in good journals and conferences, a paper is evaluated by several reviewers, as they may have different opinions and backgrounds, which helps to reach a fair decision on whether to publish the paper or not.

In general, the review process of top journals and conferences is quite effective at eliminating bad papers, as these venues can recruit excellent reviewers. Smaller or less famous journals sometimes have more difficulty finding good reviewers related to the topics of papers, and once in a while some bad manuscripts pass through the review process for various reasons. As for predatory journals, they often do not have reviewers and will publish anything just to earn money.

What are the benefits of being a reviewer?

The main benefit is to help the academic community by providing feedback to publishers and authors.

Reviewers typically work for free. But sometimes publishers provide some gifts to reviewers. For example, Elsevier has offered reviewers a free one-month subscription to Scopus, one of its online services, while some top Springer journals offer a free e-book download from the Springer library after a review is completed on time. Such offers may help to convince researchers to work as reviewers.

Some other benefits of reviewing are:

  • Read the latest research and learn about topics that one would maybe not take the time to read about otherwise.
  • Obtain some visibility in the research community. Some conferences will, for example, publish the names of reviewers in the conference proceedings or on their websites.
  • Learn to think like a reviewer, and become more familiar with the review process of journals. This helps to write better papers and to know what to expect when submitting papers to the same conferences and journals.
  • Put this on a CV. Especially for those aiming to work in academia, being a reviewer of a good journal or conference can be useful on a CV.

How to become a reviewer?

A graduate student may start by reviewing papers for their supervisor. In that case, the supervisor lets the student write the review and then checks it carefully before it is submitted, to ensure that it is a good and fair review. This gives students the opportunity to learn how to be a reviewer.

Sometimes, a PhD student may also find opportunities to review papers on their own. For example, when I was a PhD student, I visited some journal websites that advertised that they needed reviewers. I then sent an email with my CV to ask to be a reviewer. The journal was not famous, but it gave me some experience to start reviewing papers.

But generally, most journals contact potential reviewers rather than the other way around. When a paper is submitted, the journal editor searches for researchers who have published papers on the same topic or work in a related area, and asks them to review the paper. Thus, reviewers of a journal paper are often experts in the field. Often, editors prefer to ask researchers who have previously published in the same journal. Typically, a reviewer has a PhD or is at least a PhD student with good publications. In some cases, master's degree students may be asked to review papers. I have seen it once or twice, but this is quite rare and these students had very strong publications.

For a conference, a program committee is typically established. It is a set of researchers who will review the submitted papers. Each reviewer may, for example, have to review 3 to 5 papers. To join the program committee of a conference, one may email the organizers to ask if they need additional reviewers. The organizers may then accept if the applicant has a good CV. But for famous conferences, joining the program committee typically requires being recommended by a conference organizer or by some members of the program committee. It is generally not easy to join the program committee (to be a reviewer) of top conferences.

Drawback of being a reviewer

One of the drawbacks of being a reviewer is that it can require a considerable amount of time. For example, I receive numerous emails asking me to review papers for various journals on a wide range of topics. Although I do several reviews every month, I also decline several invitations because of time constraints and because I simply receive too many of them. For instance, I will often decline to review papers that are not in my research area, or that are for unknown journals or journals not related to my research area.

How to review a paper?

If you have been selected to review papers, you may be interested in reading a blog post that I wrote about how to review papers.

Conclusion

Today, I have discussed the work of reviewers and how to become a reviewer. If you have comments, please share them in the comment section below. I will be happy to read them.


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Academia, Research

If I did a PhD again, what would I do differently?

Recently, I gave an invited talk (online) at the University of Pisa in Italy. A PhD student asked me: if you did a PhD again, what would you do differently? In this blog post, I will answer this question, which I think can be interesting for graduate students.

First, I think that one of the key aspects to consider for a PhD is to choose a good research team, preferably in a good university, where you will have a good research environment and can work on an important research topic. In my case, I did my PhD in a university that is maybe not so high in the world rankings but is still good, and more importantly, my supervisor was great and gave me several opportunities through his social network. Thus, I would not change this.

A second important aspect is time management. If I did a PhD again, I would try to manage my time better to be more effective. As a student, I had a lot of time but sometimes spent it on things that were not so important. It is important to be able to assess what matters most and to choose carefully how to spend time. For example, if you have one day left to submit a paper, is it more important to spend it improving the colors of figures or proofreading? Generally, the latter is more useful.

A third important aspect is to collaborate more with other researchers. As a PhD researcher, it is easy to work by yourself on your thesis. But the feedback of others can be highly valuable. Moreover, collaborating with others can help to write more papers and find other opportunities. On this aspect, I did well during my PhD, as I had several collaborations, but I could perhaps have discussed my project more with other researchers.

A fourth important aspect is to choose a research topic that you like. Personally, during my PhD, I first started doing something on e-learning before gradually moving towards data mining, which is my current research area. If I had made that decision earlier, it would have been better. But this is easy to say afterwards. Although I also liked working on e-learning, the community working on Intelligent Tutoring Systems was quite small, and thus it was hard to have some impact in this field despite doing good research. Another reason why I stopped working on this is that conducting experiments was quite time-consuming and complex, while in data mining, testing a new algorithm can be as simple as running it on a benchmark dataset that you download. Besides, I personally like research on algorithm design.

A fifth important aspect is to have clear goals for your career path after the PhD. It is never too early to search for jobs or opportunities such as postdoc positions. I think I did quite well on this part as I got a postdoc position in a good data mining team. But I could have started searching earlier.

A sixth important aspect is to focus on having quality papers in good journals and conferences that are recognized worldwide, if you intend to have an international career. In some countries like Canada, some conference papers are well regarded in computer science, and there is not much pressure on PhD students to write journal papers. At some universities, some PhD students may even graduate without any papers. But internationally, several countries consider journal papers as highly important and have various ranking systems to evaluate journals and conferences. For researchers who intend to work internationally, it is thus something important to consider. Sometimes, it is better to have a few very good papers than to have too many papers.

Conclusion

In this blog post, I gave some answers to the question of what I would do differently if I did another PhD. I hope it was interesting. If you have some comments, please write them below.

Posted in Academia, General, Research

Analyzing COVID-19 tweets to understand the public opinion

In this blog post, I will talk briefly about how tweets collected on Twitter can be analyzed to understand public opinion about COVID-19. This is based on the research paper below, in which I recently participated:


Noor, S., Guo, Y., Shah, S. H. H., Fournier-Viger, P., Nawaz, M. S. (2020). Analysis of Public Reaction to the Novel Coronavirus (COVID-19) Outbreak on Twitter. Kybernetes, Emerald Publishing, to appear.

I will give an overview of the above paper. For more details, please refer to the full research paper.

Why analyze tweets? There has been a lot of research on analyzing tweets in the past, for example to detect the sentiments and feelings of people on different topics, or to detect fake news and bots, among other things. The interest of analyzing Twitter data is that Twitter is used by millions of people and that tweets are posted in real time. Thus, tweets can be used to analyze what people are saying about a topic such as the coronavirus.

How can we understand public opinion about COVID-19 on Twitter? In the above research paper, we applied the following methodology. We first collected thousands of tweets in English about COVID-19 during the first months of the pandemic. Then, we applied clustering algorithms to discover the main COVID-19-related themes that were discussed on Twitter. Moreover, we applied sequential pattern mining algorithms to find frequent word patterns in tweets.
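To give a rough idea of what finding frequent word patterns can look like, here is a minimal sketch in Python. It is not the algorithm used in the paper: it only counts ordered word pairs (a word followed later by another word in the same tweet) and keeps those that appear in enough tweets. The function name `frequent_word_pairs`, the `min_support` parameter, and the toy tweets are all hypothetical and made up for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_word_pairs(tweets, min_support):
    """Return ordered word pairs (a, b), with a appearing before b,
    that occur in at least `min_support` tweets."""
    support = Counter()
    for tweet in tweets:
        words = tweet.lower().split()
        # Collect each ordered pair once per tweet, so that the
        # support of a pair is the number of tweets containing it.
        pairs_in_tweet = set()
        for i, j in combinations(range(len(words)), 2):
            if words[i] != words[j]:
                pairs_in_tweet.add((words[i], words[j]))
        support.update(pairs_in_tweet)
    return {pair: count for pair, count in support.items()
            if count >= min_support}


# Hypothetical toy tweets, invented for illustration only.
toy_tweets = [
    "coronavirus testing starts before lockdown",
    "more coronavirus testing needed",
    "lockdown extended after coronavirus testing",
]
patterns = frequent_word_pairs(toy_tweets, min_support=3)
print(patterns)
```

Real sequential pattern mining algorithms handle longer patterns and prune the search space much more efficiently, but the idea of counting in how many sequences a pattern appears (its support) is the same.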

What have we discovered? We have found several interesting things. For the cluster analysis, we found seven main clusters of tweets that indicate some main themes discussed by Twitter users:

  • Cluster 1 (green): public sentiments about COVID-19 in the USA.
  • Cluster 2 (red): public sentiments about COVID-19 in Italy and Iran and a vaccine.
  • Cluster 3 (purple): public sentiments about doomsday and science credibility.
  • Cluster 4 (blue): public sentiments about COVID-19 in India.
  • Cluster 5 (yellow): public sentiments about COVID-19’s emergence.
  • Cluster 6 (light blue): public sentiments about COVID-19 in the Philippines.
  • Cluster 7 (orange): public sentiments about the COVID-19 US Intelligence Report.

[Figures: cluster 1, cluster 2, cluster 3, part of cluster 4, part of cluster 5, part of cluster 6]

We also found several patterns related, for example, to “Coronavirus, testing, lockdown”, as well as some of the most frequent words.

More results are presented in the paper.

The above results show what the sampled English-language tweets related to COVID-19 were talking about from January to March 2020.

Conclusion

In this blog post, I have just given a very brief overview of what can be learnt from tweets about public opinion. For more details, please check the above paper! There are obviously also some limitations to that study, such as the fact that tweets were not geolocalized and that only English-language tweets were used. If you have any comments, you may post them in the comment section below. I hope this has been interesting.


Posted in artificial intelligence, Data Mining, Data science

The Controversy around Extreme Learning Machines (ELM) and related models

Today, I will talk about an interesting topic in academia, which is the controversy around ELM (Extreme Learning Machine) and its origins. This has been a hot topic of discussion in the field of machine learning for more than a decade, since some researchers started to question the high similarity of ELM to models published earlier, such as RBF (Radial Basis Function) networks. Recently, some researchers have also argued about the similarities between ELM and RVFL (Random Vector Functional Link) and other models.

In this blog post, I will give an overview of this controversy and its impact, but I will not take any side. I will just look at it from an outsider’s perspective. You can read the arguments from both sides and draw your own conclusions.

Some arguments against ELM

ELM was proposed in 2004. The controversy around the origins of ELM started around 2008 with a letter in IEEE Transactions on Neural Networks claiming that it is unnecessary to give a new name to a model that already existed, with perhaps minor modifications:

  • L. P. Wang and C. R. Wan, “Comments on ‘The extreme learning machine’,” IEEE Trans. Neural Networks, vol. 19, no. 8, pp. 1494-1495, 2008.

Other researchers have raised this issue as well. To understand this perspective, there is an anonymous website, called ELM Origin (webs.com), that provides a good summary of the issues raised by some researchers against ELM.

A problem with this website, though, is that it is anonymous, which means that we cannot be sure who wrote it. However, the website provides annotated ELM papers and claims that several ELM models are similar to models in papers published many years before. For example, it is said that ELM-Kernel is similar to LS-SVM with zero bias and to kernel ridge regression.

I did not read the information in detail, as this is outside my main research field, so I am personally not sure whether all the claims are reasonable or not.
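For readers who wonder what such a similarity claim looks like in practice, below are the two prediction formulas as they are commonly written in the literature. The notation is an assumption chosen for this illustration (it is not taken from the website): K is the kernel function, x_1, ..., x_N are the training points, Ω and K are the kernel matrix with entries K(x_i, x_j), C is the ELM regularization constant, λ is the ridge penalty, and T and y are the training targets. This is only meant to illustrate the claim and is not a judgment about who is right.

```latex
% Kernel ELM prediction, as commonly stated
f_{\mathrm{ELM}}(x) = \big[ K(x, x_1), \dots, K(x, x_N) \big]
                      \left( \tfrac{I}{C} + \Omega \right)^{-1} T

% Kernel ridge regression prediction, as commonly stated
f_{\mathrm{KRR}}(x) = \big[ K(x, x_1), \dots, K(x, x_N) \big]
                      \left( K + \lambda I \right)^{-1} y
```

Written this way, the two expressions have the same form when λ = 1/C and the same kernel is used. Whether the remaining differences between the models are significant is exactly what the two sides disagree about.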

Some arguments for ELM

Some researchers have responded to these claims, arguing that there are indeed differences between ELM and previous work. For example:

  • G.-B. Huang, “Reply to comments on ‘the extreme learning machine’,” IEEE Trans. Neural Networks, vol. 19, no. 8, pp. 1495-1496, Aug. 2008.
  • G.-B. Huang, “What are Extreme Learning Machines? Filling the Gap between Frank Rosenblatt’s Dream and John von Neumann’s Puzzle,” Cognitive Computation, vol. 7, 2015.

However, some researchers argue that these differences are tiny. It was also argued in the defense of ELM that researchers may have simply missed some related work and thus not been aware of the prior work. This might be true… as it has happened in the past that some discoveries were made independently by several researchers.

Yann LeCun’s opinion

One of the fathers of deep learning has also given his opinion on this topic in a Facebook post:

He was clearly not impressed by ELM. However, this is just a Facebook post, and it seems that LeCun had perhaps not read all the papers about ELM to get a clear idea of the topic.

Who is right?

As I said previously, I will not take a position, as this is not my main area. You may make up your own mind, or write your opinion in the comment section below if you have one.

What is the impact of this controversy?

This controversy has resulted in a kind of war between some researchers working in that area. I have observed that there are researchers against ELM and some that are for ELM that have been quite aggressive towards each other, and there are also many researchers that do not want to take sides but are caught between the two sides.

As I work as an associate editor for various journals, I have noticed, for example, that at some point a reviewer wanted to directly reject a paper just for using the name ELM. I also noticed some researchers who tried to push their citations against ELM or for ELM. In other cases, I have seen a reviewer argue that authors should change their paper because it showed that ELM was better than some other models and the reviewer could not accept that conclusion, even arguing that it must have been due to experimental errors.

I personally don't really know what to think about this. But as an outsider, it seems to me that today, there is still a kind of war on this topic involving various people, and I think it is a pity for the people who are caught in the middle of that war but do not want to take sides.

Conclusion

This was a short blog post about the controversy around ELM. I just report on this topic, as I think it is interesting. As said above, you can read about it and form your own opinion. But personally, I think it is better not to take any side, to avoid conflicts.

Posted in Machine Learning

Discovering Alarm Correlation Rules for Network Fault Management (video)

In this blog post, I will share the video of our new paper about analyzing alarms in telecommunication networks, presented at the AIOPS 2020 workshop. This work is part of an industrial collaboration project. The motivation for this project is that there are typically thousands of alarms in a telecommunication network, and not all of them are important. To allow network operators to focus on fixing the most important issues, we propose a method to discover correlations between alarms.

For this purpose, we view a telecommunication network as an attributed graph where nodes represent devices, edges indicate connections between devices, and attributes of nodes represent alarms. Then, we apply a novel algorithm to find rules of the form A -> B, indicating that if alarm A appears, alarm B is likely to occur. Using these rules, we can reduce the number of alarms presented to network maintenance workers. Although the approach is designed for analyzing alarms, it could be applied to other data modelled as graphs.
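To illustrate the general idea of a rule A -> B on an attributed graph, here is a small Python sketch. It is a strong simplification and not the algorithm from the paper: it assumes that the network is given as a dictionary of neighbors and that each device has a set of alarms, and it estimates the confidence of a rule as the fraction of devices raising alarm A for which alarm B is raised on the same device or on a directly connected one. The function name `rule_confidence`, the device names and the alarm names are hypothetical.

```python
def rule_confidence(neighbors, alarms, a, b):
    """Estimate the confidence of the rule a -> b: among devices raising
    alarm `a`, the fraction for which alarm `b` is raised on the same
    device or on a directly connected device."""
    devices_with_a = [d for d, raised in alarms.items() if a in raised]
    if not devices_with_a:
        return 0.0
    hits = 0
    for device in devices_with_a:
        nearby = [device] + list(neighbors.get(device, []))
        if any(b in alarms.get(n, set()) for n in nearby):
            hits += 1
    return hits / len(devices_with_a)


# Hypothetical toy network: nodes are devices, edges are connections.
neighbors = {
    "router1": ["switch1", "switch2"],
    "switch1": ["router1"],
    "switch2": ["router1"],
    "switch3": ["switch4"],
    "switch4": ["switch3"],
}
# Hypothetical alarms attached to each device (the node attributes).
alarms = {
    "router1": {"LINK_DOWN"},
    "switch1": {"PACKET_LOSS"},
    "switch2": {"LINK_DOWN", "PACKET_LOSS"},
    "switch3": {"LINK_DOWN"},
    "switch4": {"HIGH_TEMP"},
}
confidence = rule_confidence(neighbors, alarms, "LINK_DOWN", "PACKET_LOSS")
print(confidence)
```

A rule with a high confidence suggests that alarm B could be hidden or grouped with alarm A when both appear, which is how such rules can reduce the number of alarms shown to operators.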

Here is the link to watch the paper presentation:
http://philippe-fournier-viger.com/AIOPS.mp4

And here is the reference to the paper:


Fournier-Viger, P., Ganghuan, H., Zhou, M., Nouioua, M., Liu, J. (2020). Discovering Alarm Correlation Rules for Network Fault Management. Proc. of the International Workshop on Artificial Intelligence for IT Operations (AIOPS), in conjunction with the 18th International Conference on Service-Oriented Computing (ICSOC 2020).

That is all I wanted to write for today!

Posted in Data Mining, Data science, Video

Merry Christmas and Happy New Year!

Today, I would like to wish all readers of this blog and users of my SPMF data mining software a merry Christmas and a happy new year!

This year has been a special year due to the worldwide pandemic, with several challenges and changes in our habits. But this year will soon be behind us, and I wish you all health, happiness and success for 2021.

At the same time, I would like to thank all the users of SPMF and readers of this blog for supporting these projects. For the SPMF software, a new version will be released very soon with several new algorithms! I am working on it these days, and I will keep you updated soon…


Posted in General

Conference Badges: the Best and the Worst

Today, I will talk about the conference badges that I have collected from the time I was a PhD student until today. I have attended over 50 events and have kept all of the conference badges except maybe one or two. Here is a picture of all these conference badges:

[Image: conference badges]

In total, I have visited 28 countries and/or special territories, but not all of them to attend conferences. Sometimes, it was only for a research visit or vacation. Below, I will talk about what makes a great conference badge and take a look at some of them to compare the different designs.

Generally, a good badge should have the following characteristics: (1) it is big enough, (2) the name is written in big letters, (3) it does not contain irrelevant information (e.g. it is unnecessary to write the conference dates and hotel), (4) it is beautiful, and (5) it cannot flip over, or else it is printed on both sides.

The simple black and white badges

The badge below for DEXA 2018 is the simplest one. Printed on a piece of standard paper with a black and white printer, it only indicates the conference name, attendee name and country. Simple and effective, but it could be more beautiful.

[Image: DEXA 2018 conference badge]

This is another simple black and white badge, for KDD 2018:

[Image: KDD 2018 conference badge]

The simple badges with color

The badge below is still quite simple but has a bit more color which makes it more enjoyable than the black and white badges.

[Image: Canadian AI conference badge]

The one below from IEA AIE 2018 is simple, colorful and effective, as the key information is big enough and easy to read:

[Image: IEA AIE 2018 conference badge]

The one below from PAKDD 2014 is also quite good as the name is really big and the design is nice and colorful. However, there is a lot of empty space at the bottom. The bottom third of the badge could be cut entirely.

[Image: PAKDD 2014 conference badge]

The one below from PAKDD 2017 is a bit better in my opinion, as it is more beautiful. But the font for the name is a bit hard to read. Generally, it is better to print the first name bigger and to put the first name and the last name on different lines, to avoid squeezing all letters on a single line like below.

[Image: PAKDD 2017 conference badge]

I like badges like the one below from IDA 2014 that are simple, colorful and just contain the key information (name, affiliation and conference acronym), and are also beautiful. That one uses a color picture which is nice.

[Image: IDA 2014 conference badge]

Badges with text that is too small

Some badges like the one below from ADMA 2018 are very big but do not use the space very well. The name of the attendee is actually very small. More than 50% of the space is basically empty.

[Image: ADMA 2018 conference badge]

Badges with too much information

The badge below from PAKDD 2018 is beautiful but really contains too much information. It is not necessary for attendees of the conference to know the full conference name, dates, name of the hotel (!), and country. If we are attending the conference, we already know which hotel we are at and what the dates are.

[Image: PAKDD 2018 conference badge]

Badges where you write your name by yourself

For some conferences, I had to write my name by myself. This is not a very good idea… Look at the messy result below when the ink does not dry well at ADMA 2013!

[Image: ADMA 2013 conference badge]

Badges with a fancy design

The badge below is one of my favorites, as it is made of plastic and has a very beautiful design representing the architecture of a famous tower in the city (Liaocheng). It could have been improved by adding the names of attendees.

Badges with a special material

Another badge that is quite special is the one below for the BDA 2019 conference, as it has been etched in a piece of wood. That is the most unique material for a conference badge that I have seen, and for this it is really nice. However, I think that some information could be removed, like the full conference name and dates. Just writing BDA 2019 would be enough and would make it easier to read.

[Image: BDA 2019 conference badge]

Badges with photo

Badges for some events also have a photo. Below is an example. Having a photo is nice and is probably also a security measure to ensure that the badge is not stolen and used by someone else.

Another badge with photo is below. This one is really nice but a problem is that the name is really small.

The badges with no names

A few conferences have given badges with no names, like below. Although I have enjoyed these conferences, I have to say that having a name on the badge would have been much better. It is important, as it helps to start conversations with other attendees!

[Image: ADMA conference badge]
[Image: ICGEC 2018 conference badge]

Badge with text that is too small and too many colors

And the following badge is one of the worst (in my opinion). The problem with this badge is that it is really small (smaller than a credit card) and that the text is really hard to read because of the colors. At that time, I was a graduate student, and I printed these badges and helped with the design, so I am partly responsible for that! What happened is that we first bought badge paper that was too small and did not know what it would look like when printed in color. Also, I had no experience in designing badges and we were in a rush, so we did not have time to print them again. Today, I would not do it like that 😉

[Image: ITS 2008 conference badge]

But I also designed the badge below at the same time, and it looked a bit better:

[Image: Educational Data Mining 2008 conference badge]

Conclusion

In this blog post, I have talked about how a good conference badge should be designed and have shown some of the best and worst badges from my collection. 😉

Do you also keep all your conference badges? Which badge do you like the most or think is the worst? You may tell me in the comment section below.


Posted in Academia, Conference, General, Research

Real Conferences VS Virtual Conferences

The year 2020 is ending soon, and it has been quite a special year due to the coronavirus pandemic around the world. This has forced many researchers to work from home, and to cancel or change their research travel plans. Moreover, many academic conferences in 2020 have been held online as virtual conferences, as a safety measure and due to travel restrictions in several countries. In this blog post, I will talk about this new trend of holding virtual conferences and its benefits and drawbacks compared to “real” conferences (held in a physical location).

Since the beginning of the year, I have attended several virtual conferences such as PAKDD 2020, ICDM 2020, IEA AIE 2020 and DAWAK 2020, as well as the AIOPS 2020 and UDML 2020 workshops. Generally, these events have been well organized. While some conferences took great care to schedule the talks of researchers based on their time zones, some other events had some small time management problems. For example, a session chair thought that a session was starting earlier due to a wrong time conversion, and the wrong time zone was indicated in the program of another conference, which led to some confusion. But overall, it worked as planned.

Benefits of virtual conferences

Attending a conference online has some benefits. One of them is that it is not necessary to travel very far to give a talk. Rather than flying to a location, one can just connect to a server, which is not time-consuming. Online conferences also provide flexibility, as one can listen to talks while doing some other things at home, or from various locations. Moreover, a few conferences have provided a playback option to watch the videos of previous presentations in case we missed them. Another benefit of online conferences is that the registration fees have often been reduced, and in some cases, attending the conference became free. This may have helped some students or researchers to attend some conferences that they would otherwise not have attended.

Drawbacks of virtual conferences

There are also some drawbacks to online conferences. The first one is that the schedule is not suitable for everyone. For example, one may have to present a paper in the middle of the night due to the time difference. This was generally not a problem in my case, but I know some other researchers that had problems with this.

A second drawback is that the ability to socialize with other researchers is greatly reduced in online conferences. In a real conference, we can shake hands and talk with many people that we know or don’t know, especially during the coffee breaks and other social activities. This is important to establish contact with other researchers. However, in virtual conferences, there are not many opportunities for that… Some conferences like ICDM have adopted online systems such as Gather.Town, where we could walk around a virtual room using an avatar and talk with other people using a webcam and microphone. But every time I checked, I found that the room was essentially empty or had only a few inactive people. Thus, although the concept was nice, in practice, I was not able to talk with anyone using it.

Another issue with virtual conferences is that it is easy to not feel motivated to listen to the talks, since they are all online and the schedule often conflicts with real-life activities. Some talks may be in the middle of the night, or during work hours or lunch. Thus, I personally did not listen to many talks, while at a real conference, I would attend most of the sessions.

Another thing that I don’t like so much about virtual conferences is that we often do not see the audience when we give a talk (unless they turn on their webcams). In this case, we are in front of the computer talking into our microphone, but we have little feedback during the presentation. And in many cases, the talks are required to be pre-recorded, which does not make them interactive at all.

Attending real conferences again

Recently, I attended some real conferences again. This is because the pandemic is under control in the country where I live (China). The second week of December 2020 was the first time that I attended a real conference this year. It was a really enjoyable feeling to be able to meet researchers again and talk with them face to face. I met some very nice people and those were some great events. In general, life where I am has been back to normal for several months already, which I am very happy about. However, I am looking forward to the day when I can also attend international conferences abroad, as I used to do many times per year. I think that next year, real conferences will start to happen again… or perhaps some hybrid conferences that will be partly online and partly offline (e.g. IEA AIE 2020).

Conclusion

In this blog post, I talked about the experience of attending real and virtual conferences, and especially the benefits and drawbacks of virtual conferences. I hope that it has been interesting. If you want to share your thoughts and experience about that, please leave a comment below! I will be happy to read it.


How to prepare your thesis defense?

Today, I will talk about an important topic for graduate students, which is how to prepare for your thesis defense. I will explain what should be done to prepare yourself well, and also talk about my experience as a student and currently as a professor and judge for thesis defenses.


Before the thesis defense

  • If you have a chance, attend some thesis defenses by other graduate students to get familiar with the process.
  • Ask how thesis defenses are conducted at your institution and who the judges will be. In particular, you need to know how much time you will have to give your presentation.
  • Start to prepare early and talk to your thesis supervisor about your preparation. Your supervisor may give you some good advice, especially with respect to how defenses are conducted at your school.
  • Spend a good amount of time preparing your presentation. Preferably, prepare your slides a week in advance and show them to your supervisor and friends for comments. You may read my advice about how to give a good talk. In particular, avoid putting too many slides and too many details, and make sure there are no errors or typos.
  • Rehearse your presentation several times to make sure you are comfortable giving it, and that you can present within the time limit. You may ask some friends to listen to your presentation.
  • Eat well and have a good sleep before the talk. This can make a big difference. For example, in the past, I was a judge for a thesis defense where a student fell down and almost lost consciousness due to high stress, fatigue and not eating breakfast. To be able to sleep well and be at your best, you need to finish your preparation at least one day before the defense.
  • Prepare a list of questions that you think judges may ask you and a list of corresponding answers. This will help you to better answer questions.

If you prepare yourself well, you will be less stressed and you will perform better.

During the defense

  • Wear some suitable clothing. Be polite.
  • Don’t talk too fast. A common mistake is that some students try to talk very fast to say more things. But this is not necessary. Instead, summarize and talk about what is important at a normal speed.
  • Look at your audience. Another common mistake is to look at your screen instead of looking at your audience. A presentation is much more interesting when the presenter looks at the attendees.
  • Keep track of the time. This is one of the most important things. You need to make sure that you will not exceed the time limit. Thus, keep an eye on the clock, your cellphone or your watch to know how much time is left.
  • Listen carefully to the questions from judges before answering. If you do not understand a question, ask the judge to clarify it or repeat it in your own words before answering. This is important because if you do not understand a question, you may give an unrelated answer.
  • When answering a question, remember that the judge may not be an expert on your topic. Thus, try to give an answer that is easy to understand if you think the judge may not be familiar with your research area.

Conclusion

In this blog post, I gave some advice about how to prepare for your thesis defense. I hope it will be useful. If you think I missed something or would like to talk about your experience, please leave a comment below!

Wish you a successful thesis defense!


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.


The importance of explainable data science and machine learning models

In this blog post, I will talk about an important concept, which is often overlooked in data mining and machine learning: explainability.

To discuss this topic, it is necessary to first recall the goals of data mining and machine learning. The goal of data mining is to extract models, knowledge, or patterns from data that can help to understand the data and make predictions. There are various types of data mining techniques such as clustering, pattern mining, classification, and outlier detection. The goal of machine learning is more general. It is to build software that can automatically learn to do some tasks. For example, a program can be trained to recognize handwritten characters, play chess, or explore a virtual world. Generally, data mining can be viewed as a field of research that overlaps with machine learning and statistics.

Machine learning and data mining techniques can be unsupervised (do not require labelled data to learn models or extract patterns from data) or supervised (labelled data is needed).

In general, the outcome of data mining or machine learning can be evaluated to determine if something useful is obtained by applying these techniques. For example, a handwritten character recognition model may be evaluated in terms of its accuracy (number of characters correctly identified divided by the number of characters to be recognized) or using other measures. By using evaluation measures, a model can be fine-tuned or several models can be compared to choose the best one.
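The accuracy measure described above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `accuracy`, the two model names and the five characters are made up for the example and do not come from a real recognition system.

```python
def accuracy(predicted, expected):
    """Number of correctly identified characters divided by the number to recognize."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Hypothetical outputs of two character recognition models on five characters
expected = ["a", "b", "c", "d", "e"]
model_1 = ["a", "b", "c", "x", "e"]  # one mistake
model_2 = ["a", "x", "c", "x", "e"]  # two mistakes

# Compare the models with the same measure and keep the best one
scores = {
    "model_1": accuracy(model_1, expected),
    "model_2": accuracy(model_2, expected),
}
best_model = max(scores, key=scores.get)
```

Here, comparing the two scores is what allows choosing one model over the other. In practice, other measures may be preferred depending on the task.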

In data mining and machine learning, several techniques work as black boxes. A black-box model can be described as a software module that takes an input and produces an output but does not let the user understand the process that was applied to obtain the output.

Neural networks are an example of black-box models. Several neural networks may provide a very high accuracy for tasks such as face recognition but will not let the user easily understand how the model makes predictions. This is not true for all such models, but as neural networks become more complex, it becomes more and more difficult to understand them. The opposite is glass-box models, which let the user understand the process used to generate an output. Decision trees are an example of glass-box models. If a decision tree is not too big, it can be easy to understand how it makes its predictions. Although such models may yield a lower accuracy than some black-box models, glass-box models are easily understood by humans. In data mining, another example of explainable models is the patterns extracted by pattern mining algorithms.
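To make the idea of a glass-box model more concrete, here is a small Python sketch of a decision tree that returns not only a prediction but also the list of tests that led to it. The tree, its features (`age`, `income`), its thresholds and the function name `predict_with_explanation` are all hypothetical and written by hand for illustration. They are not learned from real data.

```python
# A tiny hand-written decision tree (hypothetical, not learned from real data).
# Internal nodes test one feature against a threshold; leaves hold a label.
TREE = {
    "feature": "age",
    "threshold": 30,
    "left": {"label": "no purchase"},  # taken when age <= 30
    "right": {                         # taken when age > 30
        "feature": "income",
        "threshold": 50000,
        "left": {"label": "no purchase"},  # income <= 50000
        "right": {"label": "purchase"},    # income > 50000
    },
}


def predict_with_explanation(tree, record):
    """Return (label, steps), where steps lists every test applied to the record."""
    steps = []
    node = tree
    while "label" not in node:
        feature, threshold = node["feature"], node["threshold"]
        value = record[feature]
        if value <= threshold:
            steps.append(f"{feature} = {value} <= {threshold}")
            node = node["left"]
        else:
            steps.append(f"{feature} = {value} > {threshold}")
            node = node["right"]
    return node["label"], steps


label, steps = predict_with_explanation(TREE, {"age": 42, "income": 60000})
```

For this customer, `steps` contains the two tests that were applied, so a human can read exactly why the model answered `purchase`. A large neural network offers no such short and readable trace.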

A glass-box model is thus said to be explainable. Explainability means that a model or knowledge extracted by data mining or machine learning can be understood by humans. In many real-world applications, explainability is important. For example, a marketing expert may want to apply data mining techniques on customer data to understand the behavior of customers. Then, the expert may use the learned knowledge to make some marketing decisions or to design a new product. Another example is when data mining techniques are used in a criminal case. If a model predicts that someone is the author of an anonymous text containing threats, then it may be required to explain how this prediction was made to be able to use this model as evidence in court.

On the other hand, there are also several applications where explainability is not important. For example, a software program that does face recognition can be very useful even though how it works may not be easily understandable.

Nowadays, many data mining or machine learning models are not explainable. There is thus an important research opportunity to build explainable models. If we build explainable models, a user can participate in the decision process of machines and learn from the obtained models. On the other hand, if a model is not explainable, a user may be left out of the decision process. This raises the question of whether machines should be trusted to make decisions without human intervention.

Conclusion

In this blog post, I have described the concept of explainability. What is your opinion about it? You can share your opinion in the comment section below.

