Funny pictures about data mining / machine learning

Today, I will share a few funny pictures related to data mining and machine learning that I have found online. These pictures come from various sources (I don’t remember who created them). I may also add more to this page later.

Associations between customer purchases?

Everybody is doing AI

Toy datasets vs real-life


Training a model

Overfitting (1)

Data distributions



Overfitting (2)


What people think I do?


If you also have some interesting pictures, you may share them in the comments below and I may add them to this page.

What are the milestones in the career of an academic researcher?

Today, I will talk about the different milestones that a researcher may meet during his career. I will start from the first stage, graduate studies, and go up to the stage of being a permanent researcher working at a research institution or being a well-known researcher. I will give some advice about what is important at each stage of the career of a researcher.

Stage 1: Graduate student

The first stage is graduate studies. The goal of the master’s degree is to learn how to do research by joining a research team. At that stage, one should learn how to read research papers about state-of-the-art research, develop ideas to solve research problems, develop a solution, carry out experiments, and write papers.

During the master’s degree, the supervisor usually guides the student and helps him with some of the tasks (e.g. writing a paper). This is different from doing a PhD, where a student should do more tasks by himself. After completing a PhD, one should be an autonomous researcher. This means that someone who has completed a PhD should be able to find interesting research problems by himself (without help from others) and to perform all the other steps of a research project by himself.

Normally a graduate student will initially need much help to do research. But after completing a few projects and writing papers, one will become more and more efficient and autonomous. It is important to have that as a goal.

What should one focus on during graduate school?

  • learn to write research papers well (writing is a key skill for a researcher),
  • publish several papers, and at least some in good conferences and journals (to convince other people of your research ability and then land a researcher job),
  • learn to find research problems and develop original research solutions,
  • improve your presentation skills (not only to present papers at conferences but because researchers who will work as lecturers or professors will be expected to teach well),
  • try to obtain grants and prizes during studies,
  • try to build a network of contacts in academia and have collaborations with other students or researchers,
  • try to publish some papers that may obtain citations (because citation count is sometimes considered as a performance indicator),
  • try to have some teaching experience such as teaching an undergraduate course, or being a teaching assistant,
  • try to have good grades (although this is less important than having good research output),
  • learn other useful research related skills such as finding papers online, using LaTeX for writing papers (especially for science papers), managing time well,
  • learn to identify limitations and weaknesses in the research of others when reading a paper or attending a presentation,
  • try to always ask at least one question when attending a presentation,
  • try to be involved in reviewing papers and other important academic activities.

Stage 2: Postdoctoral researcher

Many people become postdoctoral researchers after completing the PhD. Such a position may last one or two years, and sometimes more, usually with the goal of then obtaining a position as a professor or lecturer, or working in industry.

Why do a postdoc? It gives the opportunity to explore new research topics, which are often different from those of the PhD, to write more papers and further improve research skills, and it gives some extra time to find a job. A postdoc will also generally be done with a research team that is not the same as that of the PhD, and sometimes even in another country. This allows one to learn other ways of doing research and to build contacts with other researchers.

What should one focus on during a postdoc?

  • Find a good team,
  • Write quality papers,
  • Be almost autonomous in finding research problems and doing research,
  • Try to participate in the research of other team members or researchers and perhaps even unofficially cosupervise students,
  • Try participating in funding applications,
  • Work on projects that will lead to papers in a relatively short time and have a relatively low chance of failure, as a postdoc is often short and one may need to show results to then apply for other jobs,
  • Don’t be a postdoc for too many years (ideally no more than two years) as more than that may be considered negative in some fields.

Stage 3: Faculty member / researcher

The next stage for an academic researcher is usually to become a faculty member or professional researcher, that is to work for a university or research center and perform research and perhaps also teach.

There are different ranks for faculty members in universities, which depend on the country. In North America and China, some typical ranks in a university are lecturer, assistant professor, associate professor and professor (also called full professor). Sometimes there are also some honorific ranks such as distinguished professor. Typically, the rank of lecturer consists of only teaching (no research), while the lowest rank that consists of doing research and teaching is assistant professor.

The goal of a new faculty member should be to climb ranks by:

  • Creating a research program that spans several years with a long-term vision (unlike a graduate student, who typically does not think further ahead than one paper at a time),
  • Writing research proposals that obtain significant research funding,
  • Writing high quality papers that have a significant impact,
  • Being an excellent teacher,
  • Obtaining awards, getting involved in international committees,
  • Supervising graduate students successfully, and learning to manage a team,
  • Having international collaborations and industry collaborations,
  • Being involved in university affairs,
  • Having other activities such as publishing books, organizing workshops, conferences, and being a journal editor.

Several young faculty members have problems developing a long-term research plan, and/or still have difficulty finding good research problems. This leads to an inability to obtain research funding and publish good papers, and is often caused by not learning to become autonomous during the PhD. It is thus important to develop these skills as early as possible in one’s career. If one is unable to have a research plan or obtain funding, he may not be promoted and may even not have his contract renewed. I have seen this several times.

Besides climbing the ranks, one may aim at becoming influential and well-known in his field. This involves the same goals, but requires putting in extra effort and working strategically to achieve them.

For young faculty members, the most critical period is the first three to five years, during which one needs to prove himself to become permanent or be promoted. This requires a huge amount of work because one not only needs to prepare new courses as a new faculty member, but also to teach and do well in terms of research.


This post has given an overview of the main steps in the career of an academic researcher. Hope it was interesting. If you have comments and think that I have missed something important, please post a comment below. I will be happy to read it.

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

What happens after the PhD?

People work several years to obtain a PhD, sometimes with the goal of becoming a researcher in academia or industry, or a lecturer. Some think that getting a PhD is enough to become a successful researcher. But obtaining a PhD is not enough to ensure that.

For example, when I was doing my PhD in Canada, I noticed that there was a huge difference between the best and the worst students who completed their PhD studies. Some students would finish a PhD without publishing a paper (only a thesis), while others had scholarships, dozens of papers, awards, and multiple collaborations with international researchers. All these students received the same Ph.D. diploma. But their CVs were not equal, and it made a big difference when it was time to apply for a job, and in how successfully they would establish a research career.

I also noticed that some students would finish their PhD in the minimum amount of time, while in some cases a student finished in ten years due to a lack of motivation, a part-time job, not producing meaningful research, and perhaps a lack of support from his advisor. This latter student was then unable to pursue a research career despite having finally obtained his PhD. In fact, he should perhaps have chosen another career path earlier.

Another problem that some PhD students face is that they wait until perhaps just a month before graduating to look for a job. But finding a good research position after the PhD is not always easy and requires preparation.

So how can one ensure a successful career after the PhD?

I will give some advice:

  • Try to find a mentor who has research experience to give you advice about how to succeed in your field and overcome the challenges that you are facing in establishing your career. This can greatly help, as you will avoid making some errors that other people have made.
  • Set a clear goal for your career as early as possible, then think about the milestones or subgoals that you need to attain to succeed.
  • Make a realistic plan of how to attain your goals as early as possible.
  • Build a network of contacts and collaborators in your field. This can help you to find opportunities and bring other benefits. Attend conferences, talk with other researchers online, in your university, etc.
  • Create a website, an online profile on research-oriented social networks like ResearchGate, and a LinkedIn profile. This can help to promote your research and keep contact with other researchers. Share your papers online.
  • Publish your data, or software programs that you developed as open source. People who will use them will cite you.
  • Find an important research problem to work on and develop something innovative. Choose a project that is realistic (will not likely lead to failure), not too long (will not likely delay your PhD), and can lead to good publications.
  • Improve your writing skills. This is a key aspect for researchers in academia, as writing papers and grant proposals is something researchers always do. A well-written paper or grant proposal that is convincing always has a better chance of being accepted/funded.
  • Aim at publishing in good journals and conferences. Getting your papers accepted there will show that your research is recognized by your peers. Publishing in unknown conferences and journals, or not having publications, will not convince anyone of your research abilities when it is time to look for a job or apply for funding. Often, publishing good papers is more important than publishing many papers.
  • Improve your presentation skills. As a researcher, you will often need to present your research and deliver talks. A good presentation can make an enormous difference. Besides, when it is time to apply for a job in academia, the hiring committee will likely ask you to present your work and give a teaching demo to evaluate your teaching skills. A poor presenter may not be hired even if he is a good researcher. And an average researcher with poor presentation skills will likely not be hired. Here are some tips for improving your presentation skills.
  • Choose a good PhD supervisor, with a strong team. A good team will give you a good environment for your research and bring opportunities. Working with a famous researcher in your field may bring various benefits, including learning from successful researchers.
  • Don’t be afraid to go abroad or to other cities to find better opportunities. For researchers, having experience abroad generally looks good on a CV, and is even a criterion for hiring in some universities. If no suitable jobs are available in your country, looking abroad may help find one. I, for example, did my PhD in Canada, my postdoc in Taiwan, and moved to another province in Canada, before going back to China. This strategy of going abroad has paid off well, as it opened new opportunities that I would not have had if I had always stayed in the same city.


That is all for today, as I am writing this on the airplane and it will soon land. If you have comments, please share them in the comment section below. I will be happy to read them.


A Tribute to Hypercard

In this blog post, I will talk about the first programming language that I learned, which is HyperTalk. Younger readers may have never heard about it, as it was mostly popular in the 1980s and 1990s. Though it is not a complex language, it was ahead of its time with many ideas, and has influenced many other important technologies such as the Web that we use today. I will briefly introduce the main ideas behind HyperTalk and its authoring system named HyperCard, and also talk a bit about my experience.

HyperCard software

HyperCard is a visual authoring tool for writing software that was developed for Apple computers. It was designed to be used by novices, as the user interface of a program could be built visually by drawing, and by dragging and dropping buttons and fields. But one could also use the HyperTalk programming language to add more complex functions to the software. It became popular at the end of the 1980s, mainly due to its ease of use compared to other programming languages, and because at some point it was distributed for free with all Apple computers. The last release was in 1998.

A program written using HyperCard was called a stack, and contained several cards. You can consider a card as a page, where you could draw using painting tools and add elements such as buttons and text fields for entering data and interacting with the software. Then it was possible to program buttons to perform actions such as going to another card, displaying messages, processing the data that the user entered in the text fields, and playing sounds.

The concept of a stack of cards with links between them was very innovative and can be seen as a precursor of the World Wide Web. Indeed, the authors of the Mosaic Web browser in the 1990s have indicated that it inspired them. But the difference with the Web is that Web pages are on different computers, rather than being inside a program. HyperCard can also be seen as something similar to PowerPoint, as cards could be viewed as slides, but HyperCard allowed more complex programming and was not designed for presentations.

Another innovative aspect of HyperCard was its programming language, which was designed to be close to the English language to make it very easy to read and learn. For example, some code in a button would look like this:

on mouseUp
  -- "ask" shows a dialog and stores the user's reply in the variable "it"
  ask "What is your name?"
  put it into field "output"
  go to next card
end mouseUp

This code is very simple and easy to understand even by someone who has not learned the HyperTalk language. When the user clicks the button, it displays a dialog asking to enter a name, and then the name is put in a text field called “output”. Then the next card is displayed. Clicking on a button could also create new cards. For example, one could write a program to manage contacts, where each card stored the information of one contact.

An address book HyperCard stack
A battleship game stack

But one of the best things about HyperCard is that it promoted open-source software. In fact, HyperTalk is an interpreted programming language, and the HyperCard software initially acted as both an authoring tool for developing software and a player for running the software. As a user, this concept was extremely interesting, as one could obtain a stack (a program) made by someone else, run the stack, and at any time look at the code inside the buttons, fields and other objects to learn how it worked and modify it. There were of course some ways to hide the code, such as calling some binary code external to the stack (e.g. XCMDs) or setting up a password, but by default, the code of a stack was open to anyone.

Because HyperTalk was an interpreted language, it was not designed to run very fast, but it made it easy to build software with a graphical user interface, and that in the 1980s. Building a user interface with other programming languages was far from easy for novices. When I was 12 years old, I learned programming using HyperCard on a black and white Mac computer with an 80 MB hard drive and 2 MB of RAM. During that year, I was in high school and took a week-long summer camp at a college to learn HyperTalk, and then bought a book to learn more. I then programmed a few interesting software programs:

  • House of horror 1 and 2. This was a video game where you had to enter a haunted house and click on the right doors to find the exit. Choosing the wrong door would show a monster and the player would lose. Creating this type of visual game with HyperCard was not that hard, as one could draw on the cards. In the second version of the game, I made it more complicated by implementing a life bar so that one would not die right away after an attack by a monster. That software was then installed on some computers in a local school for kids to play.
  • A fighting game. I also programmed a simple fighting game played using the keyboard. There were a few keys to punch, kick or block, with a life bar for the player and for the opponent, which was controlled by the computer. The opponents could not move forward or backward, but just kick, punch and block. There were three fighters, and it was inspired by the Street Fighter II game, popular in 1992.
  • Encryption software. I also developed a simple program for encrypting/decrypting messages using a password.
  • A program for playing mazes. The software would allow the user to load or save a maze. Then the maze was drawn on the screen. The user would have to drag the mouse inside the maze to reach the exit, while avoiding touching the walls.

Unfortunately, I don’t have a copy of these software programs anymore. They were on a 3.5 inch floppy disk, and such disks were not reliable. But anyway, it was just a fun experience and it does not really matter.

For those who want to play with HyperCard, it is still possible to use it inside an emulator of a Macintosh computer with the System 7 operating system.

Another interesting thing about HyperCard is that it was basically designed by a single man: Bill Atkinson. This man is a legendary software developer. On the first Apple computer, I would open the MacPaint drawing software and see his name as the lead developer, and then open HyperCard and also see his name as the lead developer. He wrote a large part of this software by himself. Moreover, he designed core parts of the operating system of Apple computers, such as QuickDraw for drawing graphics on screen, the event manager and the menu system. Bill Atkinson was a very smart man. He had actually almost completed a PhD in neuroscience before being called by Steve Jobs to join Apple and write these software programs. For those interested, there are some videos of interviews with him available online.


After learning HyperTalk, I learned many other programming languages, including COBOL, C, C++, Java, Assembly language for SPARC processors, and Lisp among others.

That is all for today. I wanted to share something a bit different on this blog this time. What was your first programming language? Or have you used HyperCard? If you want to share your experience, please post in the comment section below!


A brief report about the IEA AIE 2019 conference

I have just arrived in Austria to attend the IEA AIE 2019 conference (32nd Intern. Conf. on Industrial, Engineering and Other Applications of Applied Intelligent Systems), which is held in Graz from the 9th to the 11th of July. In this blog post, I will give a report about the conference.

About the IEA AIE conference

It is a conference on artificial intelligence and applications that has been held for more than 30 years. The proceedings of IEA AIE 2019 are published by Springer in the Lecture Notes in Artificial Intelligence series, which ensures good visibility for the papers.

I have attended this conference several times. You can read my reports about IEA AIE 2018 (Canada) and IEA AIE 2016 (Japan). And I also had papers at IEA AIE 2009, IEA AIE 2010, IEA AIE 2011 and IEA AIE 2014.

This year, 151 papers were submitted. Of those, 41 were selected as full papers, and 32 as short papers.
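As a quick sanity check on these numbers, here is a small Python sketch that turns the counts into acceptance rates. Only the three counts come from the conference statistics; the variable names and the choice to report percentages rounded to one decimal are just my own illustration.

```python
# Counts reported for IEA AIE 2019 (taken from the statistics above).
submitted = 151
full_papers = 41
short_papers = 32


def rate(accepted, total):
    """Return the acceptance rate as a percentage rounded to one decimal."""
    return round(100 * accepted / total, 1)


full_rate = rate(full_papers, submitted)
short_rate = rate(short_papers, submitted)
overall_rate = rate(full_papers + short_papers, submitted)

print(f"Full papers:  {full_rate}%")
print(f"Short papers: {short_rate}%")
print(f"Overall:      {overall_rate}%")
```

In other words, roughly 27% of the submissions were accepted as full papers, and a bit less than half were accepted in some form.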


The IEA AIE 2019 conference was held in the city of Graz in Austria, and more precisely at the Graz University of Technology.

iea aie 2019 map of location

The Graz University of Technology:

Opening ceremony

The organizers first introduced the program of this year’s conference. Below is a picture of the general chair, Prof. M. Ali, saying a few words, and then below it a slide about statistics.

Keynote by Reiner John, titled “The 2nd wave of AI – Thesis for success of AI in trustworthy, safety critical mobility systems”

The talk was about highly automated driving. It covered the challenges of highly automated driving (HAD), an architecture for HAD, the opportunities for AI in components and subsystems, and how AI can participate in the system at an application level.

One of the challenges is how to drive in extreme weather conditions. Humans often rely on experience, precaution, adaptation, training and foreseen scenarios to handle difficult situations.

A car is a very complex system and AI can be used to control that complexity. Safety is also very important, as well as predictive maintenance. AI can be used to enhance safety, efficiency and functionality. Here is a picture of some requirements for automated cars:

Another important aspect is connectivity between cars to collaboratively manage traffic. There were then a lot more details, but here I just report some main ideas.

Welcome reception

On the first evening, there was a nice welcome reception on the top of a building that belongs to the university. A dinner was served. Here are a few pictures:

Paper presentation

I am also excited to present a paper at this conference proposing a new model to discover stable periodic patterns in a sequence of transactions (transaction database). This paper, which was written by my student, received a best paper award. The solution based on the cumulative sum is quite innovative and could be extended to other pattern mining problems. I will also release the source code soon in my SPMF software. You can read the paper here:

Fournier-Viger, P., Yang, P., Lin, J. C.-W., Kiran, U. (2019). Discovering Stable Periodic-Frequent Patterns in Transactional Data. Proc. 32nd Intern. Conf. on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA AIE 2019), Springer LNAI, 14 pages (to appear)


On the evening of the second day, there was a banquet on the top of a hill with a good view of the city. The awards were announced.

Keynote by Dietmar Jannach

Prof. Jannach gave a talk about recommender systems. Recommender systems have numerous applications in our daily lives. They help to filter information and find relevant information. Research in that field started as far back as the 1970s with “Selective Dissemination of Information”, followed by “collaborative filtering” and “content-based” approaches in the 1990s.

A common abstraction of the recommendation problem is to see it as a matrix completion task, where the goal is to learn a function to recommend that can be assessed using measures such as accuracy.

The above problem has been well studied. The topic of this talk was session-based recommendation, where instead of a rating matrix, we have a sequentially ordered log of user interactions (item views, purchases, etc.). In many cases, we don’t have a user id, long-term preference information, etc. We also don’t know the user’s intent, but want to predict the next user action(s) given his last actions (in the current session) and other types of information (community behavior, etc.).

How to solve these problems? Some methods are to use association rules, Markov chains, sequential rules, sequential patterns, neural networks, session-based nearest neighbors, etc.

A problem in evaluating session-based recommender systems is that there are no standard benchmark protocols and datasets.
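To give an idea of how simple some of these approaches can be, here is a minimal Python sketch of a first-order Markov chain recommender. It is only my own illustration, not the method presented in the talk: the session data, the item names and the function names `train_markov` and `recommend_next` are all made up for this example. The idea is to count how often an item directly follows another one in past sessions, and then to recommend the most frequent followers of the last item seen in the current session.

```python
from collections import Counter, defaultdict


def train_markov(sessions):
    """Count, for each item, how often every other item directly follows it."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current_item, next_item in zip(session, session[1:]):
            transitions[current_item][next_item] += 1
    return transitions


def recommend_next(transitions, last_item, k=2):
    """Return up to k items that most often followed last_item in past sessions."""
    followers = transitions.get(last_item, Counter())
    return [item for item, _ in followers.most_common(k)]


# Hypothetical anonymous sessions (ordered logs of item views, no user id).
sessions = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "charger"],
    ["laptop", "mouse"],
]

model = train_markov(sessions)
print(recommend_next(model, "phone"))    # followers of "phone", most frequent first
print(recommend_next(model, "tablet"))   # item never seen before: empty list
```

Such a model ignores everything except the last item of the session, which is a strong simplification, but it shows why this kind of baseline is cheap to build and easy to compare against more complex models.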

The speaker also mentioned that neural networks often do not perform much better than simple approaches.

There were then more details, but I will not report everything in this blog post.

Next year: IEA AIE 2020

It was announced that IEA AIE 2020 will be held in Kitakyushu, Japan from the 21st to the 24th of July. The website of IEA AIE 2020 is online already. I am one of the Program Chairs of IEA AIE 2020, and I am looking forward to it.


The conference was good overall. It was well organized, and the location was interesting. I had a chance to meet several researchers that I knew beforehand and also to meet some interesting new researchers. Looking forward to next year!

Postdoctoral positions in data mining in Shenzhen, China (apply now)

The CIID research center of the Harbin Institute of Technology (Shenzhen campus, China) is looking to hire two postdoctoral researchers to carry out research on data mining / big data.

Harbin Institute of Technology (Shenzhen)

An applicant:

  • must have obtained a Ph.D. in computer science within the last 3 years,
  • must be less than 36 years old,
  • must have a strong research background in data mining/big data or artificial intelligence,
  • must have demonstrated the ability to publish papers in excellent conferences and/or journals in the field of data mining or artificial intelligence,
  • must have an interest in the development of data mining algorithms and their applications,
  • can come from any country (but if the applicant is Chinese, s/he should hold a Ph.D. from a 211 or 985 university, or from a university abroad).

The successful applicant will:

  • work on a data mining project that could be related to sequences, time series and spatial data,  or some other topics related to data mining with both a theoretical part and an applied part (the exact topic will be open for discussion to take advantage of the applicant’s strengths),
  • join an excellent research team, led by Prof. Philippe Fournier-Viger, the founder of the popular SPMF data mining library, and have the opportunity to collaborate with researchers from other fields,
  • have the opportunity to work in a laboratory equipped with state-of-the-art equipment (e.g. very expensive workstations, a cluster of servers to carry out big data research, GPU servers, virtual reality equipment, body sensors, and much more).
  • be hired for 2 years, at a salary of 231,600 RMB / year (51,600 RMB from the university + 180,000 RMB from the city of Shenzhen). Note that the postdoctoral researcher will pay no tax on the salary, and that an apartment can be rented at a very low price through the university depending on availability (around 1500 RMB / month, which saves a lot of money).
  • work in one of the top 50 universities in the field of computer science in the world, and one of the top 10 universities in China.
  • work in Shenzhen, one of the fastest-growing cities in the south of China, with low pollution, warm weather all year, and close to Hong Kong.

If you are interested in this position, please apply as soon as possible by sending your detailed CV (including a list of publications and references) and a cover letter to Prof. Philippe Fournier-Viger. It is possible to apply for the year 2020.

Correlation does not imply causation

There is a well-known principle in statistics that correlation does not imply causation. It means that even if we observe that two variables behave in the same way, we should not conclude that the behavior of one of those variables is the cause of (or is related to) the behavior of the other.

In statistics and data mining, we can calculate the correlation between two variables or time series to see if they are correlated. The range of values for the correlation is usually [-1,1], where -1 indicates a perfect negative correlation (two variables that behave in opposite ways), 0 indicates no correlation, and 1 indicates a perfect positive correlation. Two variables that have a high correlation may be related. But if two variables have a high correlation but are not related, this is called a spurious correlation.
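To make the range of values more concrete, here is a minimal Python sketch. I assume here that the correlation measure is the Pearson correlation coefficient, which is one common choice but not the only one, and the small data series are invented for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up series for illustration.
x = [1, 2, 3, 4, 5]
same_direction = [2, 4, 6, 8, 10]      # grows whenever x grows
opposite_direction = [10, 8, 6, 4, 2]  # shrinks whenever x grows
no_linear_relation = [2, 1, 0, 1, 2]   # symmetric, no overall linear trend

print(pearson(x, same_direction))      # perfect positive correlation
print(pearson(x, opposite_direction))  # perfect negative correlation
print(pearson(x, no_linear_relation))  # approximately zero
```

Note that the function only looks at the numbers. It knows nothing about what the variables mean, which is exactly why a high value alone cannot tell us whether two variables are really related.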

To be convinced of the principle that correlation does not imply causation, I will share a few examples from a very good website on this topic, which lists thousands of spurious correlations.

spurious correlation
Correlation of 0.78
spurious correlation 2
Correlation of 0.66
spurious correlation 3
Correlation of 0.99
spurious correlation of time series

Obviously, these correlations are totally spurious although the variables show very similar behavior. This shows the need to always look further than just using a correlation measure.

Those are just a few examples of spurious correlations. If you try the website, you can also browse various variables to find other spurious correlations.


In this short blog post, I have shown a few examples of spurious correlations. I think it is quite interesting. If you have comments, please share them in the comments section below.


Too many machine learning papers?

A few days ago, I read a post on LinkedIn showing that the number of Machine Learning (ML) papers has been increasing very quickly over the last few years, to about 100 ML papers per day (on arXiv, a popular public repository of research papers).

growth of machine learning papers
Chart obtained from LinkedIn (a reader pointed out that it is from )

That is about 33,000 papers per year. This shows the excitement about the new advances in that field, in particular with respect to deep learning, which has led to good results for various applications. Some people on LinkedIn wondered if there are too many ML papers and how they could keep up with advances in that field.

I will make a few comments about this.

  • First, in general in computer science, the papers that present a major innovation or breakthrough are few. There is always a lot of papers that make incremental advances by simply reusing ideas with some small modifications, or that just focus on applications rather than on fundamental problems. In fact, generally, few papers are highly cited while many paper receive few citations. Thus, although there may be a great increase in ML papers, one can ignore a huge amount of low quality papers. It is thus important to learn some strategies to detect low quality papers such as looking at the reputation of conferences and journals where papers are published and other criteria such as paper citation count.
  • Second, the large increase of ML papers result in a huge demand to review ML papers but a problem is that there is perhaps not enough experts to read those. I can share some story related to that. Recently, I have been invited to join the program committee of a good neural network conference. Honesty, I was surprised because I have never published there, and I have never made any significant contributions in that field. I have used neural networks as a tool with other techniques in an applied paper about 4 years ago but that is all, and it should not count. Thus, I tend to think that there is not enough expert reviewers and they may have invited many researchers such as me because I work on data mining, which is related. I also noticed an increase in the number of invitation to review ML papers for journals in my mailbox. But honestly, I rarely accept these invitations because it is not much related to my research. If there is not enough reviewers though, this may just be a temporary problem.
  • Third, due to the increasing number of papers, some conferences on related or overlapping topics such as databases or data mining have started to receive many ML papers. There is generally no problem with that. But in some cases, these papers are not suitable for the topic of the conference. For example, this year, a database conference that I will not name clearly told reviewers not to recommend an ML paper for acceptance if they do not understand its content or if it does not seem interesting to the target audience. As always, it is important to choose a relevant conference when submitting a conference paper (for papers on any topic).
  • Fourth, ML currently has a lot of hype because of some excellent results obtained for applications such as computer vision and translation. Should there be so many researchers working in that area? I do not have the answer, but it is a question worth asking. For example, I know that in some universities, more than 50% of graduate students are now working on deep learning. But it remains that deep learning cannot solve all the problems of computer science, and many other research areas still have complex challenges to address. Also, there are always trends in research that come and go every few years. For example, a technique like SVM was quite popular 10 years ago but is now less popular than deep learning. Neural networks have also had cycles of popularity over the last forty years. As an individual, it can be good to somewhat follow the trends to take advantage of opportunities, or at least to be aware of them.


In this short blog post, I have just shared a few comments and observations related to the ML trend. If you have other comments, please share them in the comments section below. I will be happy to read them.

Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Unethical reviewers in academia

In this blog post, I will discuss the importance of an ethical review process in academia, and the problem of unethical reviewers. I will share some stories about unethical reviewers in journals and conferences.

Peer review in academia

The process of peer review in academia consists of several researchers evaluating the work of other researchers to determine if it should be published, revised or rejected.

Peer review is important because it acts as a filter to ensure the quality of the papers that are published. For conferences, the goal of peer review is also to rank the papers to select the best ones to be published.

In the best case, the peer review process is fair and the papers that are the most worthy of being published are published. But this is not always the case. One of the reasons is that the opinion of reviewers is sometimes subjective. But sometimes, it is also due to unethical behaviour. I will discuss this problem in more detail.

Case 1. Reviewers who ask authors to cite their paper to increase their citation count

This is one of the problems that I see quite often in academia. It has happened to me several times that after I submitted a paper to a journal, a reviewer asked me to cite 3 to 10 of his papers as a condition for accepting the paper. Of course, the review is anonymous, but when the reviewer asks to cite several papers by the same author and these papers are not really related to the topic, it is quite obvious that this author is the reviewer. In an extreme case, I saw a reviewer asking to cite 10 papers, and I complained to the editor of that journal. But it did not appear to have much effect.

As a program committee member of a good conference, I once saw an anonymous reviewer who reviewed several papers and, in each of his reviews, systematically asked the authors to cite his paper(s). This is unprofessional.

Case 2. Editor who asks authors to cite his papers or papers from his journal

Yes, sometimes it is the editor who directly asks an author to cite his papers (!). This is surprising, but it has happened to me and my collaborators at least twice. In that case, the editor seemingly wants to increase his citation count. In some cases, the editor also asks to cite papers from his own journal to increase its impact factor.

This type of behaviour is very serious and has led some journals from famous publishers to be banned from journal indexes. For example, the IEEE Transactions on Industrial Informatics was banned from the JCR (Journal Citation Reports) in 2015 for “citation stacking”. A special committee was set up by the IEEE to oversee that journal and the IEEE Transactions on Industrial Electronics for a similar reason.

As an author, if a reviewer is unethical, you can complain to the editor, but if the editor is unethical, then this is a difficult situation to handle. And this can happen even in journals published by some famous publishers.

Case 3. Reviewer reviewing his own papers or those of his friends to accept them

This is another type of cheating in academia. As a program committee member of a conference, I once found that a reviewer had created two accounts and was reviewing his own paper under a slightly different name. I then reported him to the conference organizers, who banned him from the conference. This is relatively easy to detect. But it becomes more difficult to detect such a problem when a person reviews the papers of his friends instead of his own papers. I have heard rumors that some authors were doing this type of cheating in some top conferences.

Case 4. Reviewer rejecting papers because of a conflict of interest

Another problem in academia is that a reviewer may reject a paper just because it is in conflict with his own research. For example, an unethical reviewer may reject a paper because he does not want someone else to publish on a topic before him. This is unethical, but it does happen, and as an author there is not much that one can do because usually reviewers are anonymous.

Case 5. Reviewer who discloses an unpublished paper publicly, or to his collaborators

A reviewer should always ensure that unpublished papers remain confidential and are not leaked to the public. But this is not always the case. I found this out the hard way around 2012, when I submitted my TRuleGrowth paper to the PKDD conference. My paper was rejected, but by searching on Google, I found that the paper that I had submitted was publicly available on the website of the reviewer. I then contacted the PKDD organizers to complain about that reviewer, who had leaked my unpublished paper. The reviewer then apologized and said that he had just put the paper on his web server because he was travelling and did not expect it to appear in Google…

In some cases, an unethical reviewer will also send unpublished papers to his collaborators.


The peer-review process is very important in academia. Although some authors are unethical, reviewers and editors may also be unethical. In this blog post, I have discussed several such scenarios that I have noticed or heard of. Of course, a researcher should always behave ethically and avoid cheating. If you want to share your own experiences, please post them in the comment section below. I would like to hear your stories too.


China International BigData Industry Expo 2019 (a brief report)

This week I am attending the 2019 China International Big Data Industry Expo (CIBD 2019), held in Guiyang, China. I will report on the event on this blog. The event runs from May 26 to 29.

Why is this event important?

The China International Big Data Industry Expo is a huge event, and the biggest related to big data in China. This year, 448 companies are participating, including over 150 foreign companies such as SAS and Microsoft, and major Chinese companies like Tencent and Huawei. The exhibition space is more than 60,000 square meters, and more than 1700 foreign visitors from 38 countries are attending. In previous years, many leaders of the Chinese industry such as Pony Ma and Jack Ma have also given talks at this expo.

Why do I attend?

It is an excellent event to connect with the industry and see the trends and recent innovations related to big data, and also to learn about new government policies. I attended CIBD 2018 last year (report about CIBD 2018 here), and I think it was a great event. I attend as a VIP guest.

Why is it held in the city of Guiyang?

I will explain briefly. Guiyang is located in the province of Guizhou in China. Historically, Guizhou has not been one of the richest provinces, in part due to its location a bit far from the coast. However, key features of the region are its large water and electricity supply, its cool weather, and its location in a stable geological area. All these factors are highly desirable for setting up large data centers for storing big data. For this reason, Guiyang has been selected as a key city for the development of the big data industry in China. Huge government incentives are in place to transform Guiyang into the Chinese city of big data. Due to this, it has grown very fast in recent years. Numerous large international and Chinese companies such as Apple, Tencent, and Alibaba have data centers in Guiyang. It is said that more than 1600 big data companies are now operating in Guiyang, generating a yearly revenue of more than 15 billion USD. The GDP of the city is also growing very fast (it increased by 10 percent last year!). It is thus a very interesting place for everything about big data. The Big Data Expo is held every year in Guiyang around the end of May.

Location of Guiyang in China

Theme: Data creates value

This year, the expo has a special theme on the applications of big data. Besides the exhibition, 49 forums and several talks, conferences and other activities are held. Some of the topics that are going to be discussed are big data, AI, self-driving cars, security, data science, 5G, intelligent manufacturing, blockchains, and smart cities.

Some announcements are also expected about new policies in Guiyang to attract talent, and about the growth of Shubo Avenue, a new district in Guiyang for big data companies and projects that is receiving major investments.

2019 China International Big Data Fusion and AI Global Competition

On the afternoon of May 25, I attended this competition, which was held at the Empark hotel and sponsored by Intel. The format of this competition is quite interesting, with a set of 9 judges evaluating competitors, followed by an award ceremony. The judges included Prof. Jian Pei, King Wang (Tencent cloud), and others. Each competing team had 8 minutes to present its project and answer questions from the judges. The event was very well organized, offering simultaneous translation from Chinese to English, which made it accessible to non-Chinese speakers.

The first team was from an Israeli company called Keepod. They mentioned that 4 billion people don’t have access to personal computing (excluding mobile devices), and that the solution is not to buy a computer for each of them. They instead propose to distribute to each student an encrypted USB drive that contains data and applications, so that many people can share the same computers by just plugging their USB drive into a computer to work and then leaving with it. The project is used in Cameroon and other countries.

The second team was a startup called iSpace that relies on AI. They develop an advanced recovery control system and fault analysis for rockets. The system appeared interesting, but the presenter spoke very fast and sometimes changed slides very quickly. In my opinion, they should have made the presentation more succinct rather than trying to show too much in a short time. But the technology looks great.

The third team was a company from Beijing working on AR (augmented reality). They mentioned that the resolution of AR glasses is important. They have developed prototypes of advanced AR glasses that can have various applications, such as military ones. They focus on the hardware and optics.

The next company was Braid (不来赛德) from Shenzhen, which relies on AI for industrial projects. They use knowledge and concept graphs, deep learning and other technologies. Some of their projects are related to analyzing transaction data from stock markets.

The next company was TrueMicro. They work on low-power chips. One of their products is a computing stick called Movidius. One application is traffic monitoring. They also develop chips for AI based on the RISC-V architecture. They supply chips for some Huawei servers and 5G base stations. They also provide ASICs and FPGAs.

The next team was Pzartech from Israel. Its goal is to provide solutions to reduce the downtime of complex mechanical systems such as airplane engines. In fact, if an airplane has a problem, then it cannot fly until it is fixed, and money is lost. The proposed solution uses image processing, deep learning and semi-synthetic data generation. A technician who repairs an engine takes a picture of a part with his cellphone to find information about the part such as its name, which greatly helps to fix a system more quickly. It is basically an object recognition problem supported by the cloud.

The next company, named CranCloud, works on IoT with 5G technology. It works on base stations, and on integrated solutions rather than only chips.

The next company works on AI for smart security checks. They mostly offer AI-based solutions for the analysis of pictures or videos, such as pictures from security checks at airports or pictures of parcels sent through the mail. They use labelled data from train or subway security checks. They aim to detect forbidden objects such as lithium batteries.

The next company works on computer vision with AI. They mentioned that there are many applications of computer vision. They discussed some applications such as intelligent security checks and intelligent kitchens. They propose an algorithm platform named Extreme Vision for vision recognition, which has more than 500 algorithms. Some applications are fire detection and detecting construction workers who don’t wear helmets. One of the judges mentioned that there are already many AI vision companies. The presenter explained that they provide a platform to facilitate the development of AI vision solutions.

The last company was a Shanghai-based company, also working on deep learning technology for image processing and other related topics, which collaborates with Huawei, Xiaomi, Toyota and Apple. They have a transportation big data platform, and analyze data from vehicles to improve self-driving cars, among other projects. They also have technology to analyze industrial parts. Their business model is to sell licenses for their software.

The judges then provided some general comments. One of the comments was that many teams were focusing on computer vision with AI, that solutions for this type of problem have become quite mature, and that it is perhaps important to focus on specific applications such as security checks for this type of project. Moreover, a judge was happy to see more fundamental research, such as on chips. There were also several other comments.

The awards were then presented. Keepod, Braid, and Pzartech received some “access” awards. Three companies received an “innovation award”, including the company from Shenzhen behind the Extreme Vision platform. Finally, the top three winners were announced. The third prize went to TrueMicro, the first prize went to iSpace, and the second prize went to CranCloud. I perhaps missed a few details about the awards and may not be totally accurate.

Opening ceremony

The opening ceremony was held on the 26th May at the Guiyang International Eco Conference Center.

Several leaders from the Chinese government were present such as:

  • Wang Chen, Member of the Political Bureau of the CPC Central Committee, Vice Chairman of the Standing Committee of the National People’s Congress
  • Miaowei, Minister of Industry and Information Technology
  • Guo Zhenhua, Deputy Secretary-General of the Standing Committee of the National People’s Congress
  • Yang Xiaowei, Deputy Director of the National Internet Information Office
  • Rongfa, Vice-Director of the State Administration of Taxation
  • Xianzude, State Statistics Bureau
  • Wang Mingyu, Vice-Governor of Liaoning Province

and representatives and CEOs from many companies.

A letter from the Chinese president Xi Jinping supporting the expo was read.

A letter from the Secretary-General of the United Nations was also read.

It was mentioned during the ceremony that some of the goals are to support big data companies and the recruitment of talent, and to address how big data can support the industry, how to ensure the security of data, how to build core technology, and how to design regulations about the handling of data.

Paul M. Romer, winner of the 2018 Nobel Prize in Economics, gave a talk. He talked about the concept of cyber sovereignty, that is, that each country should be able to regulate the Internet. He mentioned that in some countries like the USA, what is good for firms often takes precedence over what is good for society. The most common business model is targeted advertising, and users often don’t know what data they are giving away. He also talked about other things, such as using big data for road networks to improve people’s lives.

There was then a talk by Whitfield Diffie, the famous cryptography specialist and Turing Award winner. He first contrasted the situation 5000 years ago with today, when we are moving our culture into the cloud. He mentioned that computers were designed for big data, as the properties of big data such as variety have always been there. For big data, we need computers to store and process data. He defined artificial intelligence as using computers to do things that people used to do, such as playing chess or Go, translation and autonomous driving. For him, the most important aspect of AI is to leverage huge amounts of data to think about things that people cannot think about. He also talked about cyber-security. Information can be corrupted (integrity), and we need to know its source (authenticity). Confidentiality of data is important and depends on authenticity (for example, phishing websites). Big data can be used to reduce security, but AI can provide new techniques for controlling computers that may improve security. Big data security depends on the control of the input data, the mining process, and the results. Big data will be everywhere in our society, and its security is crucial.

He also showed a quote from the Chinese president.

There was then a talk by Prof. Gao Wen from Beijing University. He said that a fourth industrial revolution may happen in 10 years, in which artificial intelligence would play the key role. He talked about weak vs. strong AI, and said that the technology we have today is weak AI. He also mentioned that, in terms of trends, AI has evolved from coding in the 1970s, to expert systems with rules in the 1980s, to today’s deep neural nets trained using big data. He mentioned that having data is key to doing big data research, so working with companies is good for academia. He thinks that the computers that we use today will not be able to achieve strong AI. For example, the brain consumes much less energy than a computer, so we should reach that efficiency. An advantage of China is its large amount of data. Open source platforms are great for advancing technology. China would benefit from developing its own open source platforms, and from having more AI specialists.

The Industry Expo

The Interactive Art Exhibition

There was also a very interactive art exhibition where people could interact with art using technology. Here are a few pictures.

The Big Data Concert

In the evening, I was invited to a great Big Data concert by the Symphony Orchestra of Guiyang.

The Belt and Road Forum

I attended the “Belt and Road” Big Data Innovation Entrepreneurship Forum. There were several guests.

From the industry, Johannes Vizethum from Advisory Allies gave a talk about the applications of AI with big data, and about how AI can benefit the industry. He discussed several use cases, including an augmented reality system to help repair cars. This system can recognize parts of a car using image processing. Another system can evaluate the fatigue level or productivity of workers from video cameras. Another use case is intelligent tools for construction sites. Here are a few interesting slides related to AI:

The next speaker was Michael Eagleton, and his presentation was called “Together we build prosperity”. He now lives in Shenzhen and is involved in several businesses, including his own, Shenzhen Xinshunao Co. Ltd. He first talked about what big data is, and how the internet is important in our daily lives. He showed statistics indicating that 97.2% of organizations are investing in AI and big data, but it is not clear if these statistics are from China, the USA or other places. He indicated that according to Wikibon, the big data and analytics market is worth $49 billion. Moreover, he showed statistics from Statista indicating that the big data market is expected to grow by 20%. He also cited Forbes/IBM, according to which data science and analytics jobs will reach 2.7 million worldwide by 2020, there is a gap between supply and demand, and more talent is needed on the job market. According to Domo, every person will generate 1.7 MB of data per second in 2020. He mentioned that automated analytics will be crucial in the future, and that all countries should collaborate to build the future.

The next speaker was Philip Beck, and his talk was “Expand your business with Big Data”. He is an “angel investor” who has worked a lot on marketing, and who has been living in China for more than a decade. He mentioned that for a business, some customers are more valuable (e.g. spend more money) than others, and that with big data, we can understand what characterizes the most valuable customers. He mentioned how mobile payment systems like Alipay and WeChat Pay are widely used in China, and that the data collected from these systems can be used for marketing (e.g. sending targeted advertisements to some specific customers).

The next speaker was David Kovacs, director of CDSI Startup Campus Global. He first mentioned the strong relations between China and Europe. He also mentioned that the regulations and technology in China are suitable for innovation, and that there is a lot of support from the government.

Then, there was a talk by Marian Danko, founder and CEO of weHustle and TECOM Conf Startup Grind. This latter company provides a platform that connects entrepreneurs so that they can share knowledge and experience and support each other. They also organize events for entrepreneurs to meet others.

The next speaker was Rasmus Rasmusson, founder and CEO of VARM. They are working on a machine for people who can’t cook. It can steam, cook, fry, etc., and it is very precise and can cook many types of food. The goal is to let the user buy a pack of food and press a single button to cook it. The machine would download a cooking program from the cloud to know how to cook the food perfectly. He mentioned that the machine can collect a lot of data about users, such as what they eat and when.

The next speaker was Yoann Delwarde, who talked about turning buzzwords such as AI and big data into business. He has a software company named Bassetti. He mentioned that 50 to 90 percent of startups fail within five years because of a lack of market need or because they run out of funding. He gave various pieces of advice for startups to succeed.

The last speaker was Adam Rush. He talked about opportunities from the Belt and Road program, and showed various statistics.


In the evening, there was a nice banquet on the 29th floor of the Novotel.

About the rapid growth of Guiyang

Here are some interesting facts about the growth of Guiyang (pictures from the Guiyang Today newspaper).

It is also interesting to see the goals for the growth of the city for the next 15 years, announced during the fifth plenary session of the tenth Guiyang Municipal Committee of the CPC:


In this blog post, I have talked about the CIBD 2019 expo. It was a great event! I hope you have enjoyed reading about it! If you have comments, please share them in the comment section below. I will be happy to read them.
