The Data Blog

A brief report about the IEA AIE 2020 conference

Posted on 2020-09-22 by Philippe Fournier-Viger

In this blog post, I will write a brief report about the 33rd International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA AIE 2020), held in Kitakyushu, Japan from the 22nd to 25th September 2020.

What is IEA AIE 2020?

The IEA AIE 2020 is a well-established academic conference that has been running for 33 years. It is about artificial intelligence, and it attracts many papers about the applications of intelligent systems.

I have personally attended this conference many times in the past (IEA AIE 2009, IEA AIE 2010, IEA AIE 2011, IEA AIE 2014, IEA AIE 2016, IEA AIE 2018, IEA AIE 2019), and have 13 papers in its proceedings.

This year, I am attending IEA AIE 2020 as author but also as program chair. The organizers this year are:

Proceedings of IEA AIE 2020

This year, 119 papers have been submitted to the conference. Each paper has been reviewed by at least 3 reviewers. The Program Committee who evaluated the papers is composed of 82 researchers from 36 countries.

A total of 62 full papers were accepted and 17 short papers. All of those were published in the Springer proceedings. Moreover, an additional 9 poster papers were published in a separated poster proceedings with ISBN.

The papers covered many applications to real-life problems such as: •engineering, science, •industry, •automation •robotics, •business and finance, •medicine and biomedicine, •bioinformatics, •cyberspace, •human-machine interaction.

This year, in the new format of the IEA AIE conference, some special sessions were organized. A special session is like a special track about a specific topic of interests. All papers from the special track are published in the main conference proceedings. Here is the information about the two tracks:

Best paper awards

Moreover, this year, four types of awards are announced during the conference:

Best paper award: Dolly Sapra, Andy D. Pimentel: Constrained Evolutionary Piecemeal Training to Design Convolutional Neural Networks.
Best theory paper award: Fan Zhang et al.
A new integer linear programming formulation to the inverse QSAR QSPR for acyclic chemical compounds
Best application paper award: Wei Zhang and Chris Challis
Automatic identification of account sharing for video streaming services
Best special session paper award: Wei Song, Lu Liu, Chaomin Huang
TKU-CE: Cross-Entropy Method for Mining Top-K High Utility Itemsets

These awards were selected by a committee based on review scores and a discussion of the top papers. To ensure that the process is fair, papers from the organization committee members were excluded from receiving awards.

A partly virtual conference

Due to the coronavirus pandemy around the world, the conference was held virtually and also on site in Japan at the same time. This required some special organizations from the local organizers and was very well done. I was happy to saw friends in the conference.

Day 1 – Opening session, keynotes and regular papers

In the opening session of the conference, the conference was presented. Each organizers gave some words about the conference. Then, there was two keynote speeches : one by Prof. Tao Wu about healthcare, and the other by Prof. Ee-Peng Lim about AI for social goods.

The talk of Prof. Lim was very interesting as he talked about two projects that can have a positive implications for the society. The first one was about a probabilistic model of the labor market in Singapore. The second one was about an application that can let users take picture of their food to keep track of what they are eating. The system FoodAI can be tested on this website: https://foodai.org/ Here is a few slides from this presentation.

In his conclusion, Prof. Lim also talked about three challenges for the development and adoption of proposed models.

The keynote was followed by several paper presentations on various topics.

Day 2 – regular papers + keynote talks

On the second day, there was more paper presentations and also two keynote talks (one by Prof. Bo Huang and one by Prof. Enrique Herrera Viedma).

The keynote of Prof. Enrique Herrera Viedma was about group decision making, that is how a group of expert can reach an agreement to take decisions.

Generally, group decision making is reached through a concensus reaching process which requires discussion between experts and involve multi stage negotiation. Here are a few slides, describing the main process:

He explained that nowadays group decision making is done in a new context, with social networks and Web 2.0 tools.

Then, he discussed in more details about properties of social networks and how sentiment analysis can play a role in decision-making models. Here are a few of the important properties of social networks:

Sentiment analysis can be used in group decision making to understand how a user feels about a particular topic, and in particular the preferences of experts about different alternatives. Here are some details:

Here is an overview of the proposed group decision making based framework

Then, there was more details but I will not report on everything.

An upcoming special issue in the Applied Intelligence journal

Another great thing this year at IEA AIE is that there will be a special issue in the Applied Intelligence journal (Springer, Q2). The best papers of the conference will be invited for an extension in the special issue. Details will be announced after the conference.

Next year… IEA AIE 2021… in Kuala Lampur

It was announced that the IEA AIE 2021 conference will be held next year in Kuala Lampur (Malaysia).

The website of IEA AIE 2021 is already online at http://ieaaie2021.wordpress.com/ Here are the key dates related to this conference:

Day 3 and 4 – More paper presentations

On the third and fourth days, there was more paper presentations.

Pattern mining papers

This year, there was 7 pattern mining papers, which shows that it is a popular topic at this conference. These papers cover topics such as periodic itemset mining, and high utility itemset mining. Since this is a topic of interest for me and to several readers of this blog, here is the list of papers:

TKU-CE: Cross-Entropy Method for Mining Top-K High Utility Itemsets
Wei Song, Lu Liu and Chaomin Huang
Mining Cross-Level High Utility Itemsets
Philippe Fournier-Viger, Ying Wang, Jerry Chun-Wei Lin, Jose Maria Luna and Sebastian Ventura [ppt]
Maintenance of Prelarge High Average-Utility Patterns in Incremental Databases
Jimmy Ming-Tai Wu, Qian Teng, Jerry Chun-Wei Lin, Philippe Fournier-Viger and Chien-Fu Cheng
Efficient Mining of Pareto-front High Expected Utility Patterns
Usman Ahmed, Jerry Chun-Wei Lin, Jimmy Ming-Tai Wu, Youcef Djenouri, Gautam Srivastava and Suresh Kumar Mukhiya
TKE: Mining Top-K Frequent Episodes
Philippe Fournier-Viger, Yanjun Yang, Peng Yang, Jerry Chun-Wei Lin and Unil Yun
Parallel Mining of Partial Periodic Itemsets in Big Data [ppt]
C. Saideep, R. Uday Kiran, Koji Zettsu, Cheng-Wei Wu, P. Krishna Reddy, Masashi Toyoda and Masaru Kitsuregawa
A Fast Algorithm for Mining Closed Inter-Transaction Patterns
Thanh-Ngo Nguyen, Loan T.T. Nguyen, Bay Vo and Ngoc-Thanh Nguyen

Conclusion

I have enjoyed the conference. Next year, IEA AIE 2021 will be in Malaysia, and then IEA AIE 2022 in Japan.

—
Philippe Fournier-Viger is a computer science professor and founder of the SPMF open-source data mining library, which offers more than 170 algorithms for analyzing data, implemented in Java.

SPMF 3.0: Towards even more efficiency

SPMF's architecture (1) The Algorithm Manager

How to find a good thesis topic in Machine Learning?

Posted in artificial intelligence, Conference, Data Mining | Tagged 2020, ai, artificial intelligence, conference, data mining, ieaaie, japan, springer | 4 Comments

The Imposter Syndrome in Academia

Posted on 2020-08-27 by Philippe Fournier-Viger

In this blog post, I will talk about something that many students or researchers have or are experiencing in academia, which is called the imposter syndrome. It is the feeling of not being worthy of having achieved some success or being in a given position. For example, a new PhD student accepted in a top university may feel that he was just lucky and did not really get accepted because of his skills or efforts. A professor may similarly feel that he received funding but that it is undeserved. The imposter syndrome is something very common in academia. Many people have experienced it at some point in their career.

Personally, when I was first admitted in the master degree in computer science more than 15 years ago, I felt that there was still some gaps in my knowledge. For example, I thought that I had not learnt enough about some topics in computer science or mathematics during the bachelor degree. Although I was a reasonably good programmer, it appeared to me that some other students were better. Moreover, another question that I had when starting the master degree was: Even if I am a good student, will I be successful at research? This is a question that many students have because doing research is something new at that stage.

Then, during the master and Ph.D degree, I published several papers on e-learning and started to attend academic conferences. But when attending the conferences, I felt sometimes that my knowledge of the field was not so deep compared to that of many experts there.

Later, I changed my research direction towards data mining and became very good in some research areas there. However, I still felt that I did not know enough about some hot topics like big data.

The examples above are situations that could be viewed as some form of imposter syndrome.

Now, I would like to talk more about this.

Is the imposter syndrome something bad?

Yes, if it discourage you. No, if it motivates you to work harder and to improve yourself. Personally, when I perceive that I have some weaknesses, I will work harder to try to overcome them, and in the end, it will be positive. Thus, whether the imposter syndrome is something negative of positive depends on your attitude towards it.

How to overcome the imposter syndrome?

A good start is to recognize that you have several skills and to think about your strengths. Moreover, you should remember that although some other people may appear to be better at some things, you are better at other things. For example, another professor may seem to be better at teaching than you are but you may be a better researcher, or a student may seem better programmer than you, but you are better at writing research papers. And in any case, you can work out on your weaknesses to improve yourself.

Another important thing is to not be scared that people “unmask you” and discover that you are an “imposter“. Remember that no one is perfect and you should not be shy to admit htat you have weaknesses. You can then ask for help or questions to other people because this will help to improve yourself. For example, it is OK to ask a question about something that you do not understand during a research seminar.

Related to this, I will tell you another story. I remember some friend of mine that was scared of telling his supervisor that his programming skills were weak during his PhD studies. He did not tell his supervisor during his whole Ph.D but he was stressed that the supervisor may find out about it. In such case, I think that he should have been honest with the supervisor (and that is what I told him at that time). If he had done that, perhaps that the supervisor could have gave him some suggestions to improve his skills and my friend would have felt less stressful. But my friend found another solution. He instead worked hard and asked for help from many other students, and finally improved himself.

How long the imposter syndrome last?

There is no answer that is suitable for everyone. Some people overcome that syndrome by receiving some recognition from other people such as some award, a prize or obtaining a degree. But sometimes, the imposter syndrome stays there for a long time. For example, I have read some story about a tenured professor in a top level university that mentioned that he felt the imposter syndrome until he retired. After completing a paper, he was always thinking that he could maybe not find good ideas anymore for his next research projects.

Conclusion

In this blog post, I talked about the imposter syndrome and told you a few stories about it. The imposter syndrome is something very common at all levels from students to professors. The important is to know that you are not alone that you have strengths, and to think about this in a positive way to help you grow and improve yourself rather than discourage you. Don’t be afraid that people “unmask you” but instead ask questions, and work on improving yourself.

—
Philippe Fournier-Viger is a computer science professor and founder of the SPMF open-source data mining library, which offers more than 170 algorithms for analyzing data, implemented in Java.

10 ways of becoming more efficient at doing research

Turnitin, a smart tool for plagiarism detection?

How to write an academic book?

Posted in Academia, Research | Tagged imposter syndrome, Research | 1 Comment

Big problem on my website on IONOS webhosting!

Posted on 2020-08-25 by Philippe Fournier-Viger

Hi all,

A bad news is that the database of this blog was reverted to 3 years ago due to some technical problem. I have used 1and1 IONOS as hosting service for my websites for the last 10 years, but now it seems that the database for the blog was overwritten with an old backup because everything is as it was 3 years ago in January 2017. How could it have happened?

I have contacted 1and1 IONOS to try to fix the issue, but they denied that it is their fault and did not have any backup older than 7 days. And my own backup is a little bit old… This is unfortunate. Thus, I think that maybe all blog posts of the last three years are lost (maybe 50+ posts). Anyway, this can of things happen, and I will continue the blog again soon…

But this time, I will not trust the 1and1 hosting service and do my own backups more often.

I am now trying to recover old posts through the Internet Wayback Machine and the cache of web search engines… I have recovered a dozen posts already and will continue but it may take some time.

Update: After several hours, I think that I have recovered most of the missing blog posts… but maybe there are some broken links. At least, most of the posts are not lost.

Philippe

Posted in Uncategorized | 9 Comments

An Introduction to Data Mining

Posted on 2020-08-22 by Philippe Fournier-Viger

In this blog post, I will introduce the topic of data mining. The goal is to give a general overview of what is data mining.

What is data mining?

Data mining is a field of research that has emerged in the 1990s, and is very popular today, sometimes under different names such as “big data” and “data science“, which have a similar meaning. To give a short definition of data mining, it can be defined as a set of techniques for automatically analyzing data to discover interesting knowledge or pasterns in the data.

The reasons why data mining has become popular is that storing data electronically has become very cheap and that transferring data can now be done very quickly thanks to the fast computer networks that we have today. Thus, many organizations now have huge amounts of data stored in databases, that needs to be analyzed.

Having a lot of data in databases is great. However, to really benefit from this data, it is necessary to analyze the data to understand it. Having data that we cannot understand or draw meaningful conclusions from it is useless. So how to analyze the data stored in large databases? Traditionally, data has been analyzed by hand to discover interesting knowledge. However, this is time-consuming, prone to error, doing this may miss some important information, and it is just not realistic to do this on large databases. To address this problem, automatic techniques have been designed to analyze data and extract interesting patterns, trends or other useful information. This is the purpose of data mining.

In general, data mining techniques are designed either to explain or understand the past (e.g. why a plane has crashed) or predict the future (e.g. predict if there will be an earthquake tomorrow at a given location).

Data mining techniques are used to take decisions based on facts rather than intuition.

What is the process for analyzing data?

To perform data mining, a process consisting of seven steps is usually followed. This process is often called the “Knowledge Discovery in Database” (KDD) process.

Data cleaning: This step consists of cleaning the data by removing noise or other inconsistencies that could be a problem for analyzing the data.
Data integration: This step consists of integrating data from various sources to prepare the data that needs to be analyzed. For example, if the data is stored in multiple databases or file, it may be necessary to integrate the data into a single file or database to analyze it.
Data selection: This step consists of selecting the relevant data for the analysis to be performed.
Data transformation: This step consists of transforming the data to a proper format that can be analyzed using data mining techniques. For example, some data mining techniques require that all numerical values are normalized.
Data mining: This step consists of applying some data mining techniques (algorithms) to analyze the data and discover interesting patterns or extract interesting knowledge from this data.
Evaluating the knowledge that has been discovered: This step consists of evaluating the knowledge that has been extracted from the data. This can be done in terms of objective and/or subjective measures.
Visualization: Finally, the last step is to visualize the knowledge that has been extracted from the data.

Of course, there can be variations of the above process. For example, some data mining software are interactive and some of these steps may be performed several times or concurrently.

What are the applications of data mining?

There is a wide range of data mining techniques (algorithms), which can be applied in all kinds of domains where data has to be analyzed. Some example of data mining applications are:

fraud detection,
stock market price prediction,
analyzing the behavior of customers in terms of what they buy

In general data mining techniques are chosen based on:

the type of data to be analyzed,
the type of knowledge or patterns to be extracted from the data,
how the knowledge will be used.

What are the relationships between data mining and other research fields?

Actually, data mining is an interdisciplinary field of research partly overlapping with several other fields such as: database systems, algorithmic, computer science, machine learning, data visualization, image and signal processing and statistics.

There are some differences between data mining and statistics although both are related and share many concepts. Traditionally, descriptive statistics has been more focused on describing the data using measures, while inferential statistics has put more emphasis on hypothesis testing to draw significant conclusion from the data or create models. On the other hand, data mining is often more focused on the end result rather than statistical significance. Several data mining techniques do not really care about statistical tests or significance, as long as some measures such as profitability, accuracy have good values. Another difference is that data mining is mostly interested by automatic analysis of the data, and often by technologies that can scales to large amount of data. Data mining techniques are sometimes called “statistical learning” by statisticians. Thus, these topics are quite close.

What are the main data mining software?

To perform data mining, there are many software programs available. Some of them are general purpose tools offering many algorithms of different kinds, while other are more specialized. Also, some software programs are commercial, while other are open-source.

I am personally, the founder of the SPMF open-source data mining library, which is free and open-source, and specialized in discovering patterns in data. But there are many other popular software such as Weka, Knime, RapidMiner, and the R language, to name a few.

Data mining techniques can be applied to various types of data

Data mining software are typically designed to be applied on various types of data. Below, I give a brief overview of various types of data typically encountered, and which can be analyzed using data mining techniques.

Relational databases: This is the typical type of databases found in organizations and companies. The data is organized in tables. While, traditional languages for querying databases such as SQL allow to quickly find information in databases, data mining allow to find more complex patterns in data such as trends, anomalies and association between values.
Customer transaction databases: This is another very common type of data, found in retail stores. It consists of transactions made by customers. For example, a transaction could be that a customer has bought bread and milk with some oranges on a given day. Analyzing this data is very useful to understand customer behavior and adapt marketing or sale strategies.
Temporal data: Another popular type of data is temporal data, that is data where the time dimension is considered. A sequence is an ordered list of symbols. Sequences are found in many domains, e.g. a sequence of webpages visited by some person, a sequence of proteins in bioinformatics or sequences of products bought by customers. Another popular type of temporal data is time series. A time series is an ordered list of numerical values such as stock-market prices.
Spatial data: Spatial data can also be analyzed. This include for example forestry data, ecological data, data about infrastructures such as roads and the water distribution system.
Spatio-temporal data: This is data that has both a spatial and a temporal dimension. For example, this can be meteorological data, data about crowd movements or the migration of birds.
Text data: Text data is widely studied in the field of data mining. Some of the main challenges is that text data is generally unstructured. Text documents often do no have a clear structure, or are not organized in predefined manner. Some example of applications to text data are (1) sentiment analysis, and (2) authorship attribution (guessing who is the author of an anonymous text).
Web data: This is data from websites. It is basically a set of documents (webpages) with links, thus forming a graph. Some examples of data mining tasks on web data are: (1) predicting the next webpage that someone will visit, (2) automatically grouping webpages by topics into categories, and (3) analyzing the time spent on webpages.
Graph data: Another common type of data is graphs. It is found for example in social networks (e.g. graph of friends) and chemistry (e.g. chemical molecules).
Heterogeneous data. This is some data that combines several types of data, that may be stored in different format.
Data streams: A data stream is a high-speed and non-stop stream of data that is potentially infinite (e.g. satellite data, video cameras, environmental data). The main challenge with data stream is that the data cannot be stored on a computer and must thus be analyzed in real-time using appropriate techniques. Some typical data mining tasks on streams are to detect changes and trends.

What types of patterns can be found in data?

As previously discussed, the goal of data mining is to extract interesting patterns from data. The main types of patterns that can be extracted from data are the following (of course, this is not an exhaustive list):

Clusters: Clustering algorithms are often applied to automatically group similar instances or objects in clusters (groups). The goal is to summarize the data to better understand the data or take decision. For example, clustering techniques such as K-Means can be used to automatically groups customers having a similar behavior.
Classification models: Classification algorithms aims at extracting models that can be used to classify new instances or objects into several categories (classes). For example, classification algorithms such as Naive Bayes, neural networks and decision trees can be used to build models that can predict if a customer will pay back his debt or not, or predict if a student will pass or fail a course. Models can also be extracted to perform prediction about the future (e.g. sequence prediction).
Patterns and associations: Several techniques are developed to extract frequent patterns or associations between values in database. For example, frequent itemset mining algorithms can be applied to discover what are the products frequently purchased together by customers of a retail store. Some other types of patterns are for example, sequential patterns, sequential rules, periodic patterns, episode mining and frequent subgraphs.
Anomalies/outliers: The goal is to detect things that are abnormal in data (outliers or anomalies). Some applications are for example: (1) detecting hackers attacking a computer system, (2) identifying potential terrorists based on suspicious behavior, and (3) detecting fraud on the stock market.
Trends, regularities: Techniques can also be applied to find trends and regularities in data. Some applications are for example to (1) study patterns in the stock-market to predict stock prices and take investment decisions, (2) discovering regularities to predict earthquake aftershocks, (3) find cycles in the behavior of a system, (4) discover the sequence of events that lead to a system failure.

In general, the goal of data mining is to find interesting patterns. As previously mentioned, what is interesting can be measured either in terms of objective or subjective measures. An objective measure is for example the occurrence frequency of a pattern (whether it appears often or not), while a subjective measure is whether a given pattern is interesting for a specific person. In general, a pattern could be said to be interesting if: (1) it easy to understand, (2) it is valid for new data (not just for previous data); (3) it is useful, (4) it is novel or unexpected (it is not something that we know already).

Conclusion

In this blog post, I have given a broad overview of what is data mining. This blog post was quite general. I have actually written it because I am teaching a course on data mining and this will be some of the content of the first lecture. If you have enjoyed reading, you may subscribe to my Twitter feed (@philfv) to get notified about future blog posts.

—
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 120 data mining algorithms.

Analyzing the source code of SPMF (5 years later)

Discovering the Top-K Stable Periodic Patterns in a Sequence of Events

SPMF: upcoming feature: The Memory Viewer

Posted in Big data, Data Mining, Data science | Tagged artificial intelligence, big data, data mining, data science, machine learning | 23 Comments

Unethical Reviewers in Academia!

Posted on 2020-08-15 by Philippe Fournier-Viger

In this blog post, I will talk about a common problem in academia, which is the unethical behavior of some reviewers that ask authors to cite several of their papers.

It is quite common that some reviewer will ask authors to cite his papers to increase his citation count. I have encountered this problem many times for my own papers when submiting to journals. Sometimes the reviewer will try to hide his identify by asking to cite four or five papers and include one or two from himself among those. But sometimes, it is very obvious as the reviewer will directly ask to cite many papers and they will all be from the same author. For example, just a few weeks ago, I received a notification for one of my papers where the reviewer wrote:

The related work needs improvement: Please add the following works:
…. title of paper 1 …
…. title of paper 2 ..
…. title of paper 3 ….
…. title of paper 4 …

That reviewer asked to cite four papers by the same person. In that case, it is very easy to guess who is the reviewer. In some cases, I have even seen two reviewers of the same papers both asking the author to cite their papers. Each of them was asking to cite about five of their papers. This was completely ridiculous and gave a very bad impressionabout the review process. This unethical behavior is quite common. If you submit many papers to journals, you will sooner or later encounter this problem, even for top 20 % journals.

Why it happens? The reason is that many universities consider citation count as an important metric for performance evaluation. Thus, some authors will try to artificially increase their citation count by forcing other authors to cite their papers.

So what are the solutions?

Authors facing this problem will often accept to cite the papers from the reviewer because they are afraid that the reviewer will reject the paper if they don’t. This is understandable. However, if the authors accept, this will encourage the reviewer to continue this unethical behavior for other papers. Thus, the best solution is to send an e-mail to the editor to let them know about it. This is what I do when I am in this situation. If you let the editor knows, the editor will normally take this into account and may even take some punitive actions like removing the reviewer from the journal.
To avoid this problem before it happens, some editors will read carefully the reviews and delete unethical requests by reviewers. However, this does not always happen because editors are often very busy and may not spend the time to read all comments made by reviewers. But it is good that some journal such as IEEE Access will put a disclaimer in the notification to inform authors that they are not required to cite papers that are not relevant to the article. This is a good way of preventing this problem.
Reviewers should only ask to cite papers that are relevant to the paper and will contribute to improving the quality of the paper. To avoid conflict of interests, a reviewer can suggest to cite a paper rather than tell authors that they must cite paper. This is more acceptable.

Conclusion

In this blog post, I have talked about some unethical behavior that many people have encountered when submiting to journals, and sometimes also for conferences. The reason why I wrote this blog post is that I have encountered this situation for two of my papers in the last two months and I have become quite tired to see this happen in academia.

If it also happened to you, please leave a comment below with your story. I will be happy to read it!

—
Philippe Fournier-Viger is a computer science professor and founder of the SPMF open-source data mining library, which offers more than 170 algorithms for analyzing data, implemented in Java.

How to publish in top conferences/journals? (Part 1) – The Blue Ocean Strategy

Real Conferences VS Virtual Conferences

What I learned from being a journal editor for two years?

Posted in Academia, Research | Tagged academia, ethics, review, reviewer | 1 Comment

How to Improve Work Efficiency for Researchers?

Posted on 2020-08-10 by Philippe Fournier-Viger

In this blog post, I will talk about the topic of increasing work efficiency for researchers. This is an important topic as during a researcher’s career, the workload tend to increase over time but there is always only 24 hours every day. Thus, becoming more efficient is important. Being efficient also means to have more time to do other things after work such as spending time with your family and friends. I will share a few ideas below about how to improve efficiency for researchers.

Working on what is important

To improve efficiency, it is important to work on what is really important. For every task that a researcher wants to do, he should first evaluate how much time he will spend on the task and what will be the expected benefits. The reason is that sometimes the time spent on a task could be used to do something else that would bring more benefits for the same amount of time. For example, if someone is writing a research paper, he could spend a day on improving the quality of some figure or instead spend that day to proof-read the paper and improve the writing style. There are sometimes some tasks that we want to do that are not really important and require a lot of time. In that case, we maybe don’t need to do them.

Having a schedule and planning tasks

It is also a good habit to have a schedule to keep track of all the things that you need to do. Moreover, you can order tasks by priority to focus on the more important ones. It is also important to set goals and then try to make a plan of all the tasks that need to be done to achieve these goals.

For scheduling and planning, one can have a calendar and also a to-do list of important things to do. It is also good to keep a small book to write your research ideas when you have some to not forget them.

It is also good to do all the similar tasks on the same day. For example, if you have many papers to review, you can decide to review all of them in one afternoon rather than doing one every few days. Generally, this will be more efficient.

Working in a better environment

The work environment is also very important. It can be good for example to clean your desk, or find a quiet environment to work such as a library, to be more efficient. If you are in a noisy environment, it can also be useful to use some noise cancelling earphones or noise blocking earmuffs.

And of course, one should avoid working in a distracting environment such as while watching TVs or working in positions that decrease productivity such as laying on the bed.

Using software to reduce distractions

There are also exists some software that helps to get more focused. For example, on Windows, I use a software called AutoHideDesktopIcons that will hide the desktop, the taskbar and all opened windows except the current window. This helps to remove many distractions.

There are also exists some software for writing that have minimal user interface to make sure that one can focus on writing. This is the case for example of WriteMonkey on Windows. The user interface of WriteMonkey is basically just a blank page, which can really help to concentrate on writing (see below).

Collaborating with others and giving work to others

Another way of becoming more efficient is to share your workload with other people. For example, if you invite someone else to participate to your paper, then this person will do some work and thus your work will be reduced. If you are a team leader, you can also give some work to your team members to reduce your own work, or even hire a personal assistant or someone else to do some work for your (e.g. paying someone to proofread your papers).

Conclusion

In this blog post, I gave a few tips about how to become more efficient at research. I could certainly say much more about this but I wanted to give a few ideas. Please share your other ideas or views in the comment section, below.

—
Philippe Fournier-Viger is a computer science professor and founder of the SPMF open-source data mining library, which offers more than 170 algorithms for analyzing data, implemented in Java.

The problem with Short Papers

Using LaTeX for writing research papers

How to become a journal or conference reviewer?

Posted in Academia, Research | Tagged academia, efficiency, Research, researcher, work | 1 Comment

Five recent books on pattern mining

Posted on 2020-08-07 by Philippe Fournier-Viger

In this blog post, I will list a few interesting and recent books on the topic of pattern mining (discovering interesting patterns in data). This mainly lists books from the last 5 years.

High utility pattern mining: Theory, Applications and algorithms (2019). This is the most recent book, edited by me. It is about probably the hottest topic in pattern mining right now, which is high utility pattern mining. The book contains 12 chapters written by experts from this field about discovering different kinds of high utility patterns in data. It gives a good introduction to the field, as it contains five survey papers, and also describe some of the latest research. Link: https://link.springer.com/book/10.1007/978-3-030-04921-8

Supervised Descriptive Pattern Mining (2018). A book that focuses on techniques for mining descriptive patterns such as emerging patterns, contrast patterns, class association rules, and subgroup discovery, which are other important techniques in pattern mining. https://link.springer.com/book/10.1007/978-3-319-98140-6

Pattern Mining with Evolutionary Algorithms (2016). A book that focuses on the use of evolutionary algorithms to discover interesting patterns in data. This is another emerging topic in the field of pattern mining. https://link.springer.com/book/10.1007/978-3-319-33858-3

Frequent pattern mining (2014). This book does not cover the latest research as it is already almost five years old. But it gives an interesting overview of some popular techniques in frequent pattern mining. http://link.springer.com/book/10.1007%2F978-3-319-07821-2

Spatiotemporal Frequent Pattern Mining from Evolving Region Trajectories (2018). This is a recent book, which focus on spatio-temporal pattern mining. Adding the time and spatial dimension in pattern mining is another interesting research issue. https://link.springer.com/book/10.1007/978-3-319-99873-2#about

That is all I wanted to write for today. If you know about some other good books related to pattern mining that have been published in recent years, please let me know and I will add them to this list. Also, I am looking forward to edit another book related to pattern mining soon…. What would be a good topic? If you have some suggestions, please let me know in the comment section below!

—
Philippe Fournier-Vigeris a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 150 data mining algorithms.

Mining Episode Rules (video)

Brief report about the ECML PKDD 2021 conference

What are Generator Itemsets?

Posted in Big data, Data Mining, Pattern Mining, Utility Mining | Tagged big data, book, data, data mining, data science, pattern mining | Leave a comment

How to convert Latex to HTML?

Posted on 2020-08-01 by Philippe Fournier-Viger

In this blog post, I will explain a simple way of transforming a Latex document to HTML. Why doing this? There are many reasons. For example, you may have formatted some text in Latex and would like to quickly integrate it in a webpage.

The wrong way

First, there is a wrong way of doing this. It is to first create a PDF from your Latex document, and then use a tool to convert from PDF to HTML. If you try this and the document is even just slightly complex, the result may be very bad… and the HTML code may be horrible with many unecessary tags.

The good way

Thus, the best way to convert Latex to HTML is to use some dedicated tool. There are several free tools, but many are designed to run on Linux. If you are using Windows, it may thus take you some time to find the right tool.

Luckily the popular Latex distributions like MikTek and TexLive include an executable of a softwate to convert from Latex to HTML that works on Windows. Thus, if you have the full TexLive distribution, you do not need to download or install anything else. Below, I will describe how to do with TexLive on Windows.

Using TexLive on Windows

First, you need to open the command line and go to the directory containing your Latex document. Let say that your Latex document is called article.tex. Then, you can run this command:

   htlatex article.tex

The result will be a new file article.html

The result is usually quite good. For example, I have converted a research paper that I wrote about high utility episode mining and the results looks like this:

I would say that 90 % of the paper was converted correctly. There are some other parts that I have not shown like some pseudocode for some algorithms that were not formatted properly. But I would say that the conversion is on overall really good.

Conclusion

In this blog post, I have shown a simple way of converting Latex to HTML on Windows using the TexLive distribution. If you are using MikTex or Linux, similar commands can be used.

—
Philippe Fournier-Viger is a computer science professor and founder of the SPMF open-source data mining library, which offers more than 170 algorithms for analyzing data, implemented in Java.

How to become a journal or conference reviewer?

Writing a research paper (6) – presenting data with tables

Why doing a Ph.D.?

Posted in Academia, Latex, Research | Tagged academia, html, latex, Research, writing | 3 Comments

Datasets of 30 English novels for pattern mining and text mining

Posted on 2020-05-30 by Philippe Fournier-Viger

Today, I want to announce that I have just made public datasets of 30 novels from English Novels from 10 authors of the XIX century. These datasets can be used for testing algorithms for sequential pattern mining, sequential rule mining, as well as for some text mining applications such as authorship attribution (guessing the authors of an anonymous text) and sequence prediction.

All the datasets were public domain texts that have been prepared and converted to a suitable format for text analysis by Jean-Marc Pokou et al. (2016) so that they can be used with the SPMF library.

These books are written by 10 different English novelists from the XIX century. The total number of words/sentences in the corpus of each author is as follows:
Catharine Traill (276,829/ 6,588),
Emerson Hough (295,166/ 15,643),
Henry Addams (447,337/ 14,356),
Herman Melville (208,662/ 8,203),
Jacob Abbott (179,874/ 5,804),
Louisa May Alcott (220,775/ 7,769),
Lydia Maria Child (369,222/ 15,159),
Margaret Fuller (347,303/ 11,254),
Stephen Crane (214,368/ 12,177),
Thornton W. Burgess (55,916/ 2,950).

The list of books is:

Author	Datasets (books) in SPMF format
Catharine Traill	– A Tale of The Rice Lake Plains -Lost in the Backwoods – The Backwoods of Canada
Emerson Hough	– The Girl at the Halfway House – The Law of the Land – The Man Next Door
Henry Addams	– Democracy, an American novel – Mont-Saint-Michel and Chartres – The Education of Henry Adams
Herman Melville	– I and My Chimney -Israel Potter -The Confidence-Man His Masquerade
Jacob Abbott	– Alexander the Great – History of Julius Caesar – Queen Elizabeth
Louisa May Alcott	– Eight Cousins – Rose in Bloom – The Mysterious Key and What Opened
Lydia Maria Child	– A Romance of the Republic -Isaac THoppe -Philothea)
Margaret Fuller	– Life Without and Life Within -Summer on the Lakes, in 1843 – Woman in the Nineteenth Century
Stephen Crane	– Active Service – Last Words – The Third Violet
Thornton WBurgess	– The Adventures of Buster Bear – The Adventures of Chatterer the Red Squirrel -The Adventures of Grandfather Frog

Each dataset has two versions: (1) sequences of words and (2) sequences of Part-of-Speeches (POS) tags (obtained using the Stanford NLP Tagger).

Here are the links to download the books:

If you use the above book datasets, you may want to cite this paper:

Pokou J. M., Fournier-Viger, P., Moghrabi, C. (2016). Authorship Attribution Using Small Sets of Frequent Part-of-Speech Skip-grams. Proc. 29th Intern. Florida Artificial Intelligence Research Society Conference (FLAIRS 29), AAAI Press, pp. 86-91

In that paper, we have discovered skip-grams (sequential patterns) and n-grams (consecutive sequential patterns) of part-of-speech tags to guess the authors of books.

More datasets can also be found on the dataset webpage of the SPMF software.

—
Philippe Fournier-Viger is a computer science professor and founder of the SPMF open-source data mining library, which offers more than 170 algorithms for analyzing data, implemented in Java.

New version of SPMF (2.44): 4 new algorithms, datasets and features

Subgraph mining datasets

How to encourage data mining researchers to share their source code and datasets?

Posted in Data Mining, Data science | Tagged dataset, novel, text | Leave a comment

More problems on IONOS web hosting… 4 days of downtime!

Related posts:

A brief report about the IEA AIE 2020 conference

Related posts:

The Imposter Syndrome in Academia

Related posts:

Big problem on my website on IONOS webhosting!

An Introduction to Data Mining

Related posts:

Unethical Reviewers in Academia!

Related posts:

How to Improve Work Efficiency for Researchers?

Related posts:

Five recent books on pattern mining

Related posts:

How to convert Latex to HTML?

Related posts:

Datasets of 30 English novels for pattern mining and text mining

Related posts:

Archives

Categories

Recent Posts

Recent Comments

Number of visitors:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Related posts:

Archives

Categories

Recent Posts

Recent Comments

Tag cloud

Number of visitors: