Plagiarism by Nasreen Ali A and Arunkumar M at Ilahia College of Engineering and Technology

I have recently found another case of plagiarism from India. It was committed by two professors from the Ilahia College of Engineering and Technology, named Nasreen Ali A and Arunkumar M.

The plagiarized paper

The plagiarized paper is the following:

Nasreen Ali A., Arunkumar M. Mining High Utility Itemsets from its Concise and Lossless Representations. IOSR Journal of Computer Engineering (IOSR-JCE), 1(17), pp. 8-14.


Plagiarized paper by Arunkumar M and Nasreen Ali A at Ilahia College of Engineering

The paper plagiarized my paper describing the FHM algorithm, which was published at ISMIS 2014. Basically, these two authors copied the algorithm and claim that it is theirs. They even claim to have proposed the EUCP strategy, which was in fact introduced in my ISMIS 2014 paper.

This is unacceptable. I will thus contact the editor of the IOSR Journal of Computer Engineering so that this paper is removed from the journal. Moreover, I will contact the head of the Ilahia College of Engineering and Technology (Mahatma Gandhi University, India) to report this serious case of plagiarism.

About Nasreen Ali A and Arunkumar M

Who are these two professors named Nasreen Ali A and Arunkumar M who plagiarized my paper? I did a quick search, and according to the website of their university, here are their pictures:

Plagiarists: Nasreen Ali A and Arunkumar M from Ilahia college of engineering


The webpage of the Ilahia College of Engineering and Technology is:

Ilahia College of Engineering and Technology


About the journal “IOSR Journal of Computer Engineering” 

The journal where the plagiarized paper was published is the IOSR Journal of Computer Engineering. It appears to be an obscure journal that charges a fee to get published and to obtain a certificate of publication. I had never heard of that journal or publisher before. Since it publishes plagiarized papers, it raises serious doubts about the quality of its peer review.

IOSR Journal of Computer Engineering (IOSR-JCE)


About plagiarism in India

Unfortunately, it is not the first time that researchers from India have plagiarized my papers. I think that it has happened three times in 2016 alone, and at least five or six times in the last two or three years. Plagiarism is of course not limited to India. My papers have also been plagiarized by people from France, Algeria, and other countries. And it should be said that there are many very good Indian researchers and universities. From my experience, the plagiarism issue in India seems to be mostly in small universities.


I will keep you updated about this case of plagiarism. I look forward to seeing what the Ilahia College of Engineering will do about these two professors who have committed plagiarism.


Postdoctoral position in data mining in Shenzhen, China (apply now)

The Center of Innovative Industrial Design of the Harbin Institute of Technology (Shenzhen campus, China) is looking to hire a postdoctoral researcher to carry out research on data mining / big data.

Harbin Institute of Technology (Shenzhen)

The applicant must have:

  • a Ph.D. in computer science,
  • a strong research background in data mining / big data or artificial intelligence,
  • a demonstrated ability to publish papers in good conferences or journals in the field of data mining or artificial intelligence,
  • an interest in the development of data mining algorithms and their applications.

Applicants may come from any country (but Chinese applicants should hold a Ph.D. from a 211 or 985 university).

The successful applicant will:

  • work on a data mining project related to sequences, time series and spatial data, with both a theoretical part and an applied part related to industrial design (the exact topic will be open for discussion, to take advantage of the applicant's strengths),
  • join an excellent research team, led by Prof. Philippe Fournier-Viger, the founder of the popular SPMF data mining library, and have the opportunity to collaborate with researchers from other fields,
  • have the opportunity to work in a laboratory equipped with state-of-the-art equipment (e.g. brand-new powerful workstations, a cluster for big data research, virtual reality equipment, body sensors, and much more),
  • be hired for up to 2 years, at a salary of 51,600 RMB / year (from the university) + 120,000 RMB / year (from the city of Shenzhen) = 171,600 RMB / year. Moreover, an apartment can be rented at a very low price through the university,
  • work in one of the top 50 universities in the field of computer science in the world, and one of the top 10 universities in China,
  • work in Shenzhen, one of the fastest-growing cities in the south of China, with low pollution, warm weather all year, and close proximity to Hong Kong.

If you are interested in this position, please apply as soon as possible by sending your detailed CV (including a list of publications and references) and a cover letter to:  Note that it is possible to apply for the year 2016-2017 or for the year 2017-2018.



The KDDCup 2015 dataset

The KDD Cup 2015 dataset is about MOOC dropout prediction. I recently found that the dataset had gone offline on the official website. Thus, I have uploaded a copy of the KDD Cup 2015 dataset on my website. You can download it below.

I don’t have other information on this dataset besides what is provided above. If you want to share ideas with others or ask questions you can use the comment section below.

Hope that the dataset is useful! 🙂

If you like this blog, you can subscribe to the RSS Feed or my Twitter account (@philfv) to get notified about future blog posts. Also, if you want to support this blog, please tweet and share it!


What not to do when applying for an M.Sc. or Ph.D. position

This brief blog post discusses what not to do when applying for an M.Sc. or Ph.D. position in a research lab. The aim of this post is to give advice to those applying for such positions.

I had previously discussed this topic in another blog post, where I explained that it is important to send personalized e-mails to potential supervisors rather than sending the same e-mail to many professors. I will thus not explain that again.

In this post, I will rather emphasize another important aspect, which is to give a good impression of yourself. I discuss this using an e-mail that I received today:

From *****@***.sa
Subject: Apply For Scholars ship Ph.d

Sir Philippe Fournier

A person sending this type of e-mail has zero chance of getting a position in my team. Why?

  • It is poorly written. There are many typos and English errors. If this person cannot take the time to write an e-mail properly, it gives the impression that this person is careless and would do a bad job.
  • The person did not take the time to spell my full name correctly. Not spelling the name properly shows a lack of respect or carelessness. This is something to absolutely avoid.
  • The person asks to work on web security. I have never published a single paper on that topic. Thus, I would not hire that person. This person should contact a professor working on such topics.
  • The applicant does not provide his CV and gives very little information about himself. He mentions that he has two publications, but this does not mean anything if I don't know where they were published. There are many bad conferences and journals. An applicant should always send his CV, to avoid exchanging e-mails back and forth to obtain this information. I will often not answer an e-mail with no CV attached, simply because it wastes time to ask for a CV that should have been provided. Besides, when a CV is provided, it should be detailed enough. It is even better when a student can provide transcripts showing his grades in previous studies.
  • The applicant does not really explain why he wants to work with me or how he found my profile. For a master's degree, this is not so important. But when applying for a Ph.D., it is expected that the applicant will choose his supervisor for a reason such as common research interests (as I have discussed above).

That is all I wanted to write for today.



Tutorial: Discovering hidden patterns in texts using SPMF

This tutorial explains how to analyze text documents to discover complex and hidden relationships between words. We will illustrate this with a Sherlock Holmes novel. Moreover, we will explain how hidden patterns in text can be used to recognize the author of a text.

The Java open-source SPMF data mining library will be used in this tutorial. It is a library designed to discover patterns in various types of data, including sequences, and it can also be used as a standalone software. Handling text documents is a new feature of the most recent release of SPMF (v. 2.01).

Obtaining a text document to analyze


The first step of this tutorial is to obtain a text document to analyze. A simple way of obtaining text documents is to visit the website of Project Gutenberg, which offers numerous public domain books. I have chosen the novel “Sherlock Holmes: The Hound of the Baskervilles” by Arthur Conan Doyle. For the purpose of this tutorial, the book can be downloaded here as a single text file: SHERLOCK.text. Note that I have quickly edited the book to remove unnecessary information, such as the table of contents, which is not relevant for our analysis. Moreover, I have renamed the file so that it has the extension “.text”, so that SPMF recognizes it as a text document.

Downloading and running the SPMF software

The first task that we will perform is to find the most frequent sequences of words in the text. We first download the SPMF software from the SPMF website by going to the download page. On that webpage, there are detailed instructions explaining how the software can be installed. For the purpose of this tutorial, we will directly download spmf.jar, which is the version of the library that can be used as a standalone program with a user interface.

Now, assuming that you have Java installed on your computer, you can double-click on spmf.jar to launch the software. This will open a window like this:


Discovering hidden patterns in the text document

Now, we will use the software to discover hidden patterns in the Sherlock Holmes novel. There are many algorithms that could be applied to find patterns. We will choose the TKS algorithm, which finds the k most frequent subsequences in a set of sequences. In our case, a sequence is a sentence. Thus, we will find the k most frequent sequences of words in the novel. Technically, this type of pattern is called a skip-gram. Discovering the most frequent skip-grams is done as follows.

A) Finding the K most frequent sequences of words (skip-grams)


  1. We will choose the TKS algorithm
  2. We will choose the file SHERLOCK.text as input file
  3. We will enter the name  test.txt as output file for storing the result
  4. We will set the parameter of this algorithm to 10 to find the 10 most frequent sequences of words.
  5. We will click the “Run algorithm” button.

The result is a text file containing the 10 most frequent patterns:


For example, the first line indicates that the word “the” is followed by “of” in 762 sentences of the novel. The second line indicates that the word “in” appears in 773 sentences. The third line indicates that the word “the” is followed by “the” in 869 sentences. And so on. Next, we will change the parameters to find consecutive sequences of words.
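To make the notion of skip-gram support concrete, here is a minimal Python sketch that counts in how many sentences each two-word subsequence occurs. This is only a brute-force illustration with made-up sentences, not the actual TKS algorithm, which uses clever pruning to avoid enumerating all candidates:

```python
from collections import Counter
from itertools import combinations

def count_skipgrams(sentences, k):
    """Count in how many sentences each ordered pair of words
    (a 2-word skip-gram, gaps allowed) occurs; return the k most frequent."""
    support = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # A set, because support counts sentences, not occurrences
        pairs = {(words[i], words[j])
                 for i, j in combinations(range(len(words)), 2)}
        support.update(pairs)
    return support.most_common(k)

# Toy sentences standing in for the sentences of the novel
sentences = [
    "the hound of the baskervilles",
    "the lamp in the window",
    "the end of the story",
]
print(count_skipgrams(sentences, k=1))  # [(('the', 'the'), 3)]
```

As in the tutorial output, “the … the” is the most frequent pair because support allows a gap between the two words.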

B) Finding the K most frequent consecutive sequences of words (ngrams)

The above patterns are not so interesting because most of them are very short. To find more interesting patterns, we will set the minimum pattern length to 4. Moreover, another problem is that some patterns such as “the the” contain gaps between words. Thus, we will also specify that we do not want gaps between words by setting the max gap constraint to 1. Moreover, we will increase the number of patterns to 25. This is done as follows:


  1. We set the number of patterns to 25
  2. We set the minimum length of patterns to 4 words
  3. We require that there is no gap between words (max gap = 1)
  4. We will click the “Run algorithm” button.

The result is the following patterns:


Now this is much more interesting. It shows some sequences of words that the author of Sherlock Holmes tends to use repeatedly. The most frequent 4-word sequence is “in front of us”, which appears 13 times in this story.
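The stricter pattern type of this section, consecutive word sequences (n-grams, i.e. max gap = 1), can be sketched the same way in Python. Again, this is a toy illustration with invented sentences, not the actual SPMF implementation:

```python
from collections import Counter

def count_ngrams(sentences, n, k):
    """Count in how many sentences each consecutive n-word sequence
    (an n-gram, i.e. max gap = 1) appears; return the k most frequent."""
    support = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # A set, because support counts sentences, not occurrences
        grams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        support.update(grams)
    return support.most_common(k)

# Toy sentences; in the tutorial the input is the sentences of the novel
sentences = [
    "a dark figure stood in front of us",
    "the moor lay silent in front of us",
    "he waited alone in the hall",
]
print(count_ngrams(sentences, n=4, k=1))  # [(('in', 'front', 'of', 'us'), 2)]
```

Requiring consecutive words removes spurious patterns like “the the”, which is why the results of this section are more readable than those of the previous one.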

It would be possible to further adjust the parameters to find other types of patterns. For example, using SPMF, it is also possible to find all patterns having a frequency higher than a threshold. This can be done, for example, with the CM-SPAM algorithm. Let's try this.

C) Finding all sequences of words appearing frequently

We will use the CM-SPAM algorithm to find all patterns of at least 2 words that appear in at least 1 % of the sentences in the text. This is done as follows:


  1. We choose the CM-SPAM algorithm
  2. We set the minimum frequency to 1 % of the sentences in the text
  3. We require that patterns contain at least two words
  4. We will click the “Run algorithm” button.

The result is 113 patterns. Here are a few of them:


These are some quite interesting patterns. For example, we can see that “Sherlock Holmes” is a frequent pattern, appearing 31 times in the text, and that “sir Charles” is actually more frequent than “Sherlock Holmes”. Other patterns are also interesting and give some insight into the writing habits of the author of this novel.

Now let’s try another type of patterns.

D) Finding sequential rules  between words 

We will now try to find sequential rules. A sequential rule X -> Y is a sequential relationship between two unordered sets of words appearing in the same sentence. For example, we can apply the ERMiner algorithm to discover sequential rules between words and see what kind of results can be obtained. This is done as follows.


  1. We choose the ERMiner algorithm
  2. We set the minimum frequency to 1 % of the sentences in the text
  3. We require that rules have a confidence of at least 80% (a rule X->Y has a confidence of 80% if the unordered set of words X is followed by the unordered set of words Y at least 80% of the times when X appears in a sentence)
  4. We will click the “Run algorithm” button.

The result is a set of three rules.


The first rule indicates that 96% of the time, when “Sherlock” appears in a sentence, it is followed by “Holmes”, and that “Sherlock Holmes” appeared 31 times in total in the text.
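The confidence of such a rule is easy to compute by hand. Here is a minimal Python sketch for single-word rules, using invented sentences; ERMiner itself handles unordered sets of words on each side and searches the whole rule space efficiently:

```python
def rule_confidence(sentences, antecedent, consequent):
    """For the sequential rule antecedent -> consequent (single words):
    among sentences containing the antecedent, the fraction in which
    the consequent appears somewhere after it."""
    containing = 0  # sentences containing the antecedent
    followed = 0    # ... in which it is followed by the consequent
    for sentence in sentences:
        words = sentence.lower().split()
        if antecedent in words:
            containing += 1
            if consequent in words[words.index(antecedent) + 1:]:
                followed += 1
    confidence = followed / containing if containing else 0.0
    return followed, containing, confidence

# Toy sentences standing in for the sentences of the novel
sentences = [
    "sherlock holmes rose and lit his pipe",
    "said sherlock holmes quietly",
    "sherlock said nothing",
    "holmes looked at me",
]
print(rule_confidence(sentences, "sherlock", "holmes"))
# (2, 3, 0.6666666666666666)
```

Here “sherlock” appears in three sentences and is followed by “holmes” in two of them, giving a confidence of 2/3.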

For this example, I have chosen the parameters so as not to obtain too many rules. But it is possible to change the parameters to obtain more rules, for example by changing the minimum confidence requirement.

Applications of discovering patterns in texts

Here we have shown how various types of patterns can be easily extracted from text files using the SPMF software. The goal was to give an overview of some types of patterns that can be extracted. There are also other algorithms offered in SPMF, which could be used to find other types of patterns.

Now let’s talk about the applications of finding patterns in text. One popular application is called “authorship attribution”. It consists of extracting patterns from a text to learn about the writing style of an author. The patterns can then be used to automatically guess the author of an anonymous text.


For example, if we have a set of texts written by various authors, it is possible to extract the most frequent patterns in each text to build a signature representing each author's writing style. Then, to guess the author of an anonymous text, we can compare the patterns found in the anonymous text with the signature of each author, to find the most similar signature. Several papers have been published on this topic. Besides using words for authorship attribution, it is also possible to analyze the part-of-speech tags in a text. This requires first transforming the text into sequences of part-of-speech tags. I will not show how to do this in this tutorial, but it is the topic of a few papers that I have recently published with my student, also using the SPMF software. If you are curious and want to know more about this, you may look at the following paper:

Pokou J. M., Fournier-Viger, P., Moghrabi, C. (2016). Authorship Attribution Using Small Sets of Frequent Part-of-Speech Skip-grams. Proc. 29th Intern. Florida Artificial Intelligence Research Society Conference (FLAIRS 29), AAAI Press, pp. 86-91.
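The signature idea described above can be sketched in a few lines of Python. This is only a naive illustration of the principle (made-up mini-corpora, word-trigram signatures, Jaccard similarity); the cited paper uses part-of-speech skip-grams and more careful comparison methods:

```python
from collections import Counter

def signature(sentences, n=3, top=50):
    """An author's signature: the `top` most frequent word n-grams."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return {gram for gram, _ in counts.most_common(top)}

def jaccard(a, b):
    """Similarity between two signatures (sets of patterns)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def guess_author(anonymous_sentences, signatures):
    """Guess the author whose signature is most similar to the text's."""
    anon = signature(anonymous_sentences)
    return max(signatures, key=lambda author: jaccard(anon, signatures[author]))

# Made-up mini-corpora for two "authors"
signatures = {
    "doyle": signature(["in front of us they stood", "in front of us again"]),
    "other": signature(["once upon a time far away", "once upon a time again"]),
}
print(guess_author(["they were in front of us"], signatures))  # doyle
```

The anonymous sentence shares trigrams such as “in front of” with the first corpus only, so its signature is closest to that author.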

There are also other possible applications of finding patterns in text such as plagiarism detection.


In this blog post, I have shown how open-source software can be used to easily find patterns in text. The SPMF library can be used as a standalone program or can be called from other Java programs. It offers many algorithms with several parameters to find various types of patterns. I hope that you have enjoyed this tutorial. If you have any comments, please leave them below.

Philippe Fournier-Viger is a full professor and also the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.


Why I left Canada to work as a University Professor in China

A year and a half ago, I was working as a professor at a university in Canada. But I decided not to renew my contract and to move to China. At that time, some people may have thought that I was crazy to leave my job in Canada, since it was an excellent job, and I also had a house and a car. Why go somewhere else? However, as of today, I can tell you that moving to China has been one of the best decisions I ever took for my career. In this blog post, I will tell you my story and explain why I moved. I will also compare the working conditions that I had in Canada with those that I have now in China.

China flag

Before moving to China

After finishing my Ph.D. in 2010, I worked as a post-doctoral researcher in Taiwan for a year. Then, I came back to Canada and worked as a faculty member there for about 4 years. However, in Canada, faculty positions are very rare. When I was in Canada, I was hoping to move to another faculty position closer to my hometown, to be closer to my family, but it was almost impossible: every year, there are only about five faculty positions related to my research area in computer science that I could apply for, in the whole country! Thus, getting a faculty position in Canada is extremely difficult and competitive. There are tons of people applying and very few positions available.

I had several interviews at various universities in Canada. But getting a faculty position at another university in Canada was hard for various reasons. Sometimes a job is announced, but the committee already has someone in mind or prefers another candidate for various strange reasons. For example, at the last interview that I had in Canada, about two years ago, at a university in Quebec, they basically hired someone else who had almost no research experience, due to some “political reasons”. Just to give you a sense of how biased that hiring process was, here is a comparison of the selected candidate and me:

Total number of citations: < 150 (the selected candidate) vs. 1031 (me)
Number of citations (most cited paper): < 20 (the selected candidate) vs. 134 (me)
Number of citations (last year): < 30 (the selected candidate) vs. > 300 (me)
Number of papers (this year): 4 (the selected candidate) vs. > 40 (me)

So who would you hire? 🙂 Anyway, I mention this only as an example of how the hiring process is not always fair. Actually, this could have happened anywhere in the world. But when there are very few jobs available, as in Canada, it makes it even harder to find a position. In the end, it did not bother me, since it led me to try something else and move to China, which has been one of the best decisions for my career!

Before explaining what happened next, let me make it clear that I did not leave my previous job in Canada because I did not like it. Actually, I had the chance to work at a great university in Canada, where I made many friends and had some wonderful students. I had my first opportunity to work as a professor there, and it was a hard decision to leave. However, to go further in my career as a researcher, I wanted to move to a bigger university.

Moving to China

Thus, at the end of June 2015, I applied for a faculty position at a top university in China. I quickly passed the interview and started to work there a few months later, after quickly selling my house and my car in Canada. So now let's talk about what you probably want to know: how does my current job in China compare to my previous position in Canada?

Well, I must first say that I moved to one of the top 10 universities in China, which is also one of the top 100 universities in the world for computer science. Thus, the level of the students is quite high, and it is also an excellent research environment. But let's analyze this in more detail.

In terms of research funding:

  • In Canada, it has become extremely difficult to receive research funding, due to budget cuts in research and the lack of major investment in education by the government. To give you an idea, the main research funding agency, called NSERC, could only give me 100,000 $ CAD for five years, and I was considered lucky to have this funding. But this is barely enough to pay one graduate student and attend one or two conferences per year.
  • In China, on the other hand, the Chinese government offers incredible research funding opportunities. Of course, not everyone is equally funded. The smaller universities do not receive as much funding as the top universities. But there are some very good research programs to support researchers, especially the top researchers. In my case, I applied for a special program to recruit young talents, called the Youth 1000 Talent program of the NSFC (National Science Foundation of China). I was awarded 4,000,000 RMB in research funding (about 800,000 $ CAD) for five years. Thus, I now receive about eight times more funding for my research than I received in Canada. This of course makes a huge difference. I can buy expensive equipment that I need, such as a big data cluster, hire a post-doc, pay many students, and perhaps eventually even hire a professional programmer to support my research. Besides, after getting this grant for young talents, I was automatically promoted to Full Professor, will soon become the director of a research center, and will get my own lab. This is a huge improvement for my career compared to what I had in Canada.

Now let’s compare the salary:

  • In Canada, I had a decent salary for a university professor.
  • In China, my base salary is already higher than what I received in Canada. This is partly due to the fact that I work at a top university, located in a rich city (Shenzhen), and that I also received a major pay increase after receiving the young talent funding. Moreover, besides the salary, it is possible in China to receive bonuses through various programs. Just to give you an example, in the city of Shenzhen, there is a program called the Peacock program that can provide more than 2,000,000 RMB (about 400,000 $ CAD) for living expenses, over five years, to excellent researchers working in that city. I will not say how much I earn. But including these special programs, I can say that my salary is now about twice what I earned in Canada.

In terms of living expenses, living in China is of course much less expensive than living in Canada. And the income tax is more or less similar, depending on the province in Canada. In the bigger cities in China, renting an apartment can be expensive. However, everything else is cheap. Thus, the overall cost of living is much lower than in Canada.

In terms of life in general, life in China is of course different from life in Canada, in many ways. There are always advantages and disadvantages to living in any country, as nothing is perfect anywhere. But I really enjoy my life in China. And since I greatly enjoy Chinese culture (and speak some Chinese), this is great for me. The city where I work is very modern and very safe (I would never be worried about walking late at night). In terms of work environment, I am also very satisfied. I have great colleagues and everyone is friendly. It is overall very exciting to work there, and I expect that it will greatly improve my research in the next few years.

Also, it is quite inspiring to work at and contribute to a top university and a city that are currently expanding very quickly. To give you an idea, the population of the city has almost doubled in the last fifteen years, reaching more than 10 million people, and 18 million when including the surrounding areas. There are also many possibilities for projects with industry and the government in such a large city.


In this blog post, I wanted to discuss the reasons why I decided to move to China, and why I consider it one of the best decisions that I ever took for my career, as I think this may be interesting for other researchers.

By the way, if you are a data mining researcher and are looking for a faculty position in China, you may leave me a message. My research center is currently looking for a professor with a data mining background.



Brief report about the Dexa 2016 and Dawak 2016 conferences

This week, I attended the DEXA 2016 and DAWAK 2016 conferences in Porto, Portugal, from the 4th to the 8th of September 2016, to present three papers. In this blog post, I will give a brief report about these conferences.

DEXA 2016

About these conferences

The DEXA conference is a well-established international conference related to databases and expert systems. This year was the 27th edition of the conference. It is held in Europe every year, but still attracts a considerable number of researchers from other continents.

DEXA is a kind of multi-conference. It actually consists of 6 smaller conferences that are organized together. Below, I provide a description of each of these sub-conferences and indicate their acceptance rates.

  • DEXA 2016 (27th Intern. Conf.  on Database and Expert Systems Applications).
    Acceptance rate: 39 / 137 = 28% for full papers, and another 29 / 137 = 21 % for short papers
  • DaWaK 2016 (18th Intern. Conf.  on Big Data Analytics and Knowledge Discovery)
    Acceptance rate: 25 / 73= 34% 
  • EGOVIS 2016  (5th Intern. Conf.  on Electronic Government and the Information Systems Perspective)
    Acceptance rate: not disclosed in the proceedings, 22 papers published
  • ITBAM 2016  (7th Intern. Conf.  on Information Technology in Bio- and Medical Informatics)
    Acceptance rate: 9 / 26 = 36 % for full papers,  and another 11 / 26 = 42% for short papers
  • TrustBus 2016 (13th Intern. Conf.  on Trust, Privacy, and Security in Digital Business)
    Acceptance rate: 25 / 73= 43%
  • EC-Web 2016 (17th Intern. Conf.  on Electronic Commerce and Web Technologies)

Thus, the DEXA conference is more popular than DaWaK and the other sub-conferences, and is also more competitive in terms of acceptance rate.


The proceedings of each of the first five sub-conferences are published by Springer in the Lecture Notes in Computer Science series, which is quite nice, as it ensures that the papers are indexed by major indexes in computer science such as DBLP. The proceedings of the conferences were provided on a USB drive.

DEXA 2016 proceedings

badge and proceedings of DEXA

The conference location

The conference was locally organized by the Instituto Superior de Engenharia do Porto (ISEP) in Porto, Portugal. The location was great, as Porto is a beautiful European city with a long history. The old town of Porto is especially beautiful. Moreover, visiting Porto is quite inexpensive.



First day of the conference

The first day of the conference started at 10:30 AM and mostly consisted of paper presentations. The main topics of the papers during the first day of DEXA were temporal databases, high-utility itemset mining, periodic pattern mining, privacy-preserving data mining, and clustering. In particular, I gave two paper presentations related to itemset mining:

  • a paper presenting a new type of patterns called minimal high-utility itemsets
  • a paper about discovering high utility itemsets with multiple thresholds.

Besides, there was a keynote entitled “From Natural Language to Automated Reasoning” by Bruno Buchberger from Austria, a famous researcher in the field of symbolic computation. The keynote was about using formal automated reasoners (e.g. math theorem provers) based on logic to analyze texts. For example, the speaker proposed to extract formal logic formulas from tweets, and then to understand their meaning using automated reasoners and a knowledge base provided by the user. This was a quite unusual perspective on tweet analysis, since nowadays researchers in natural language processing prefer statistical approaches to analyze texts rather than approaches relying on logic and a knowledge base. This gave rise to some discussion during the question period after the keynote.


DEXA 2016 first keynote speech

In the evening, there was also a reception in a garden inside the institute where the conference was held.

DEXA 2016 reception

Second day of the conference

On the second day, I attended DAWAK. In the morning, there were several paper presentations. I presented a paper about recent high-utility itemset mining. The idea is to discover itemsets (sets of items) that have recently been profitable in customer transactions, to then use this insight for marketing decisions. There was also an interesting paper presentation about big data itemset mining by the student Martin Kirchgessner from France.

Then, there was an interesting keynote about the price of data by Gottfried Vossen from Germany. The talk started by discussing the fact that companies are collecting more and more rich data about people. Usually, many people give away personal data for free to use services such as Facebook or Gmail. There also exist several marketplaces where companies can buy data, such as the Microsoft Azure Marketplace, as well as different pricing models for data. For example, one could use different pricing models to sell more or less detailed views of the same data. There also exist repositories of public data. Another issue is what happens to someone's data when he dies. In the future, a new way of buying products could be to pay for data about the design of an object, and then print it ourselves using 3D printers or other tools. Other issues related to the sale of data are DRM, selling second-hand data, etc. Overall, it was not a technical presentation, but it discussed an important topic nowadays, which is the value of data in a society that relies more and more on technology.


Third day of the conference

On the third day of the conference, there were more paper presentations and also a keynote that I missed. In the evening, there was a nice banquet at a wine cellar named Taylor. We had the pleasure of visiting the cellar, enjoying a nice dinner, and listening to a Portuguese band at the end of the evening.



Overall, this was a very interesting conference. I had the opportunity to discuss with some excellent researchers, especially from Europe, including some that I had met at other conferences. There were also some papers quite related to my sub-field of research in data mining. DEXA may not be a first-tier conference, but it is a very decent conference, to which I would submit more papers in the future.

DEXA 2017 / DAWAK 2017 will be held in Lyon, France.

Philippe Fournier-Viger is a full professor and also the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.

Posted in Big data, Conference, Data Mining, Research | 2 Comments

Brief report about the IEA AIE 2016 conference

This week, I attended the IEA AIE 2016 conference, held in Morioka, Japan from the 2nd to the 4th of August 2016. In this blog post, I will briefly discuss the conference.


About the conference

IEA AIE 2016 (29th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems) is an artificial intelligence conference with a focus on applications of artificial intelligence. But the conference also accepts theoretical papers, as well as data science papers. This year, 85 papers were accepted. The proceedings were provided on a USB drive:


There was also the option to buy the printed proceedings, though:

IEA AIE 2016 proceedings

First day of the conference

Two keynote speeches were given on the first day of the conference. For the second keynote, about 50 people were in attendance. Many attendees came from Asia, which is understandable since the conference was held in Japan.

The paper presentations on the first day were about topics such as Knowledge-based Systems, Semantic Web, Social Network (clustering, relationship prediction), Neural Networks, Evolutionary Algorithms and Heuristic Search, Computer Vision and Adaptive Control.

Second day

On the second day of the conference, there was a great keynote talk by Prof. Jie Lu from Australia about recommender systems. She first introduced the main recommendation approaches (content-based filtering, collaborative filtering, knowledge-based recommendation, and hybrid approaches) and some of the typical problems that recommender systems face (the cold-start problem, the sparsity problem, etc.). She then talked about her recent work on extensions of the recommendation problem such as group recommendation (e.g. recommending a restaurant that will globally satisfy, or at least not disappoint, a group of persons), trust-based recommendation (e.g. a system that recommends products to you based on what friends that you trust, or the friends of your friends, have liked), fuzzy recommender systems (recommender systems that consider that each item can belong to more than one category), and cross-domain recommendation (e.g. if you like reading books about Kung Fu, you may also like watching movies about Kung Fu).
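As a toy illustration of the collaborative filtering approach mentioned in the keynote, here is a minimal user-based sketch: a rating is predicted as a similarity-weighted average of the ratings of other users. The ratings data and function names are my own, not from the talk.

```python
import math

# Hypothetical user -> {item: rating} data
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 4, "m2": 3, "m3": 5},
    "carol": {"m1": 1, "m2": 5},
}

def cosine(u, v):
    """Cosine similarity between two users, over their common items."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(u[i] ** 2 for i in common))
           * math.sqrt(sum(v[i] ** 2 for i in common)))
    return num / den

def predict(user, item):
    """Predict a rating as a similarity-weighted average over other users."""
    num = den = 0.0
    for other, r in ratings.items():
        if other != user and item in r:
            sim = cosine(ratings[user], r)
            num += sim * r[item]
            den += abs(sim)
    return num / den if den else None

print(predict("carol", "m3"))
```

Real systems of course use far more data and address the cold-start and sparsity problems mentioned above, but the core weighting idea is the same.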

After that, there were several paper presentations. The main topics were Semantic Web, Social Networks, Data Science, Neural Networks, Evolutionary Algorithms, Heuristic Search, Soft Computing and Multi-Agent Systems.

In the evening, there was a great banquet at the Morioka Grand Hotel, a nice hotel located on a mountain, on the outskirts of the city.


Moreover, during the dinner, there was a live music band:


Third day

On the third day, there was an interesting keynote on robotics by Prof. Hiroshi Okuno from Waseda University / University of Tokyo. His team has developed an open-source software called HARK for robot audition. Robot audition refers to the general process by which a robot can process sounds from its environment. The software, which is the result of years of research, has been used in robots equipped with arrays of microphones. By using the HARK library, robots can listen to multiple persons talking to the robot at the same time, localize where the sounds come from, and isolate sounds, among many other capabilities.


It was followed by paper presentations on topics such as Data Science (KNN, SVM, itemset mining, clustering), Decision Support Systems, Medical Diagnosis and Bio-informatics, Natural Language Processing, and Sentiment Analysis. I presented a paper about a new high-utility itemset mining algorithm called FHM+.


The conference was located in Morioka, a not very large city in Japan. However, the timing of the conference was perfect: it was held during the Sansa Odori festival, one of the most famous festivals in Japan. Thus, during the evenings, it was possible to watch the Sansa parade, where people wearing traditional costumes were playing Taiko drums and dancing in the streets.



The conference was quite interesting. Since it is a quite general conference, I did not talk with many people close to my research area. But I met some interesting people, including some top researchers. During the conference, it was announced that IEA AIE 2017 will be held in Arras (close to Paris, France).


Philippe Fournier-Viger is a full professor and also the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.


Posted in artificial intelligence, Conference, Data Mining, Data science, Research | 2 Comments

How to give a good paper presentation at an academic conference?

In this blog post, I will discuss how to give a good paper presentation at an academic conference. If you are a researcher, this is an important topic, because giving a good presentation of your work will raise interest in it. In fact, a good researcher should not only be good at doing research, but also at communicating its results, both in writing and orally.

Rule 1 : Prepare yourself, and know the requirements

Giving a good oral presentation starts with good preparation. One should not prepare a presentation the day before, but a few days in advance, to make sure that there is enough time to prepare well. A common mistake is to prepare a presentation the evening before giving it. In that case, the person may finish preparing late, not sleep well, be tired, and as a result give a poor presentation.

Preparing a presentation does not only mean designing some PowerPoint slides. It also means practicing your presentation. If you are going to give a paper presentation at a conference, you should ideally practice it several times in your bedroom or in front of friends before giving it in front of your audience. Then, you will be more prepared, feel less nervous, and give a better presentation.

It is also important to understand the context of your presentation: (1) who will attend the presentation? (2) how long should the presentation be? (3) what kind of equipment will be available (projector, computer, room, etc.)? (4) what is the background of the people attending? These are general questions that need to be answered to help you prepare an oral presentation.

Who will attend the presentation is important. A presentation given in front of experts from your field should be different from one given to your friends, your research advisor, or to some kids. A presentation should always be adapted to its audience.

To avoid bad surprises, it is always better to check the equipment that will be available for your presentation and prepare a backup plan in case problems occur. For example, one may bring a laptop and a copy of the presentation on a USB drive, as well as a copy in an e-mail inbox, just in case.

It is also important to know the expected length of the presentation. If the presentation at a conference should last no more than 20 minutes, for example, then one should make sure that the presentation will not last more than 20 minutes. At an academic conference, it is quite possible that someone will stop your presentation if you exceed the time limit. Moreover, exceeding the time limit may be seen as disrespectful.

Rule 2 :  Always look at your audience

When giving a presentation, there are a few important rules that should always be followed. One of the most important ones is to always look at your audience when talking.

One should NEVER read the slides and turn one's back to the audience for more than a few seconds. I have seen presenters at academic conferences who did not look at the audience for long periods of time, and it is one of the best ways to annoy the audience. For example, here are some pictures that I took at an academic conference.

Not looking at the audience

Turning your back to the audience

In that presentation, the presenter barely looked at the audience. Either he was looking at the floor (first picture) while talking, or he was reading the slides (second picture). This is one of the worst things to do, and the presentation was in fact awful. Not because the research work was not good, but because it was poorly presented. To give a good presentation, one should try to look at the audience as much as possible. It is OK to sometimes look at a slide to point something out, but not for more than a few seconds. Otherwise, the audience may lose interest in your presentation.

Personally, when I give a presentation, I look very quickly at the computer screen to see some keywords and remember what I should say, and then I look at the audience to talk. When I go to the next slide, I briefly look at the screen again to remember what I should say on that slide, and then continue looking at the audience while talking. Doing this results in a much better presentation, but it may require some preparation. If you practice giving your talk several times, in your bedroom for example, then you will become more natural and will not need to read your slides when the time comes to present in front of an audience.

Rule 3:  Talk loud enough

Another important thing is to talk LOUD enough when giving a presentation, and to speak clearly. Make sure that even the people at the back of the room can hear you clearly. This seems obvious, but several times at academic conferences I have seen presenters who did not speak loud enough, which becomes very boring for the audience, especially for those at the back of the room.

Rule 4:  Do not sit 

Another good piece of advice is to stand when giving a presentation. I have sometimes seen people giving a presentation while seated. In general, if you are seated, you will be less “dynamic”. It is always better to stand up to give a presentation.

Rule 5:  Make simple slides

A very common problem that I have observed in presentations at academic conferences is that presenters put way too much content on their slides. For example, here are some pictures that I took at an academic conference:

A slide with too many formulas

In this picture, the problem is that there are too many technical details and formulas. It is impossible for someone attending a 20-minute presentation, with slides full of formulas, to read, understand, and remember all these formulas and symbols.

In general, when I give a presentation at a conference, I do not show all the details, formulas, or theorems. Instead, I give only the minimum details needed for the audience to understand the basic idea of my work: the problem that I want to solve and the main intuition behind the solution. I also try to explain some applications of my work and show some illustrations or simple examples to make it easy to understand. Actually, the goal of a paper presentation is for the audience to understand the main idea of your work. Then, anyone who wants to know all the technical details can read your paper.

If someone does as in the picture above, giving way too many technical details or showing a lot of formulas during a presentation, the audience will very quickly get lost in the details and stop following the presentation.

Here is another example:

A slide with way too much content

In the above slide, there is way too much text. Nobody in the audience will read all this text. To make a good presentation, you should try to make your slides as simple as possible. You should also not write full sentences, but rather just some keywords or short parts of sentences. The reason is that during the presentation, you should not read the slides, and the audience should also not be reading long text on your slides. You should talk, and the audience should listen to you rather than read your slides. Here is an example of a good slide design:

A PowerPoint slide with just enough content

This slide has just enough content. It has some very short text that gives only the main points. Then, the presenter can talk to explain these points in more detail while looking at the audience rather than reading the slides.


There are a lot of other things that could be said about giving a good presentation, but I did not want to write too much for today. Actually, giving good presentations is something that is learned through practice. The more you practice giving presentations, the more comfortable you will become talking in front of many people.

Also, it is quite normal for a student to be nervous when giving a presentation, especially in a foreign language. In that case, it requires more preparation.

Hope that you have enjoyed this short blog post.

Philippe Fournier-Viger is a full professor  and the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.

Posted in Conference, General, Research | 7 Comments

Brief report about the 12th International Conference on Machine Learning and Data Mining conference (MLDM 2016)

In this blog post, I will provide a brief report about the 12th International Conference on Machine Learning and Data Mining (MLDM 2016), which I attended from the 18th to the 20th of July 2016 in New York, USA.

About the conference
This is the 12th edition of the conference. The MLDM conference is co-located and co-organized with the 16th Industrial Conference on Data Mining 2016, which I also attended this week. The proceedings of MLDM are published by Springer. Moreover, an extra book containing two late papers, published by Ibai solutions, was offered.

The MLDM 2016 proceedings

Acceptance rate

The acceptance rate of the conference is about 33% (58 papers were accepted out of 169 submissions), which is reasonable.

First day of the conference

The first day of the MLDM conference started at 9:00 with an opening ceremony, followed by a keynote on supervised clustering. The idea of supervised clustering is to perform clustering on data that already has class labels. Thus, it can be used, for example, to discover sub-classes within existing classes. The class labels can also be used to evaluate how good the clusters are. One of the cluster evaluation measures suggested by the keynote speaker is purity, that is, the percentage of instances in a cluster that have the cluster's most frequent class label. The purity measure can be used, among other applications, to remove outliers from clusters.
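The purity measure described above can be sketched in a few lines; the labels below are a made-up example, not data from the talk.

```python
from collections import Counter

def cluster_purity(labels):
    """Purity of a single cluster: the fraction of instances carrying
    the most frequent class label in that cluster."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Example: a cluster where 3 of 4 instances share the label "A"
print(cluster_purity(["A", "A", "B", "A"]))  # → 0.75
```

A cluster with purity close to 1 is dominated by one class; a low purity suggests a mixed cluster or the presence of outliers.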

After the keynote, there were paper presentations for the rest of the day. The topics were quite varied, including clustering, support vector machines, stock market prediction, list price optimization, image processing, automatic authorship attribution of texts, driving style identification, and source code mining.

The conference room

The conference day ended at around 17:00 and was followed by a banquet at 18:00. About 40 people attended the conference in the morning. Overall, there were some interesting paper presentations and discussions.

Second day of the conference

The second day was also a day of paper presentations.

Second day of the conference

Second day of the conference (afternoon)

The topics of the second day included itemset mining algorithms, inferring geo-information about persons, multigroup regression, analyzing the content of videos, time-series classification, gesture recognition (a presentation by Intel) and analyzing the evolution of communities in social networks.

I presented two papers during that day (one of my own and one by a colleague), including a paper about high-utility itemset mining.

Third day of the conference

The third day of the conference also consisted of paper presentations, on various topics such as image classification, image enhancement, mining patterns in cellular radio access network data, random forest learning, clustering, and graph mining.


It was overall an interesting conference. I attended both the Industrial Conference on Data Mining and the MLDM conference this week. MLDM is more focused on theory, while the Industrial Conference on Data Mining is more focused on industrial applications. MLDM is a slightly bigger conference.

Philippe Fournier-Viger is a full professor  and the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.

Posted in Big data, Conference, Data Mining, Data science | 1 Comment