The future of pattern mining

In this blog post, I will talk about the future of research on pattern mining. I will also discuss some lessons learnt from the decades of research in this field and talk about research opportunities.


What is the state of research on pattern mining?

Over the last decades, many things have been discovered in pattern mining, and the field has become more mature. For example, algorithms for pattern mining generally follow the same approaches, established more than a decade ago. The main types of algorithms in pattern mining are Apriori-based algorithms, pattern-growth algorithms and vertical algorithms. The proposal of these fundamental approaches has facilitated the development of new algorithms.
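
To make the first of these approaches concrete, here is a minimal sketch of an Apriori-style, level-wise search for frequent itemsets. The function name `apriori`, its parameters and the toy database are chosen for illustration only. This is not the implementation from SPMF or from any particular paper, and it favors readability over speed.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: support count} for itemsets with support >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the support of each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Pruning: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Support counting: one pass over the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= minsup}
        result.update(frequent)
        k += 1
    return result


# Toy database of five transactions (invented for this example).
database = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(database, minsup=3)
print(result[frozenset({"a", "b"})])  # 3
```

Pattern-growth and vertical algorithms explore the same search space but avoid the repeated database passes, which is one reason why they are usually faster in practice.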

However, although traditional pattern mining problems such as frequent itemset mining have been well studied, novel pattern mining problems are constantly proposed, and these problems often have unique challenges that require new tailored solutions. For example, this is the case for subgraph mining, where an algorithm must deal with the problem of subgraph isomorphism checking, which does not arise in traditional pattern mining problems such as itemset mining. Another example is the design of efficient algorithms for novel architectures such as cloud systems, parallel systems, GPUs, and FPGAs, which requires rethinking traditional algorithms and their data structures.

A second observation about the state of research on pattern mining is that not all research areas of pattern mining have been explored equally. For example, some topics such as frequent itemset mining and association rule mining have received a lot of attention, while other problems such as sequential rule mining and periodic pattern mining have been much less explored. In my opinion, this is not because these latter problems are less useful but perhaps because the problem of frequent itemset mining is simpler.

A third observation is that the field of pattern mining seems to have become less popular over the last decade. This is certainly true, but it is not something to worry about because there are countless research problems that have not been solved in this field. Besides, all fields of computer science follow trends that are cyclic. This is the case, for example, for research on artificial intelligence, which currently receives a lot of attention but was previously met with disinterest and a lack of funding opportunities during specific time periods in the last decades (the “AI winters”). Moreover, although pattern mining may seem to be less studied than before, some subfields of pattern mining are actually becoming more and more popular. For example, this is the case for high utility pattern mining, which has been growing steadily over the last 15 years. Here is a plot of the number of papers per year on utility mining (a figure prepared by Gan et al., 2018):

This figure clearly shows a growing interest in the topic of utility pattern mining. Besides, quality papers in the field of pattern mining are still published in top conferences and journals.

What lessons can we learn?

Several lessons can be learnt. The first one is that, in my opinion, too much research has focused on improving the performance of algorithms in the last decades, while neglecting the applications of these algorithms. Don’t get me wrong. Performance is very important, as one does not want to wait several hours to find patterns. However, considering the usefulness of the discovered patterns ensures that these algorithms will actually be used in real applications. If researchers thought more about the usefulness of patterns, I think that this could help grow the field of pattern mining further.

Several pattern mining problems have not been applied in real life. Why? A first reason is that the assumptions of some of these problems are unrealistic or too simple.

For researchers working on pattern mining, I think that potential applications should always be considered first. Working on problems that have many potential applications or are more useful should be preferred. Thus, a key lesson is to not forget the user and the applications. If possible, discussions with potential users should be carried out to learn about their needs. In general, the more specialized a problem is, the less likely it is to be used in real life. For example, if someone proposed a very specialized problem such as “mining recent high utility episode patterns in an uncertain data stream when considering a sliding window and a gap constraint”, it would certainly be less likely to be useful than the more general problem of “mining high utility episodes”.

A second reason why many algorithms are not used in real life is that many researchers do not provide their source code or applications. Sometimes, it is because the authors cannot share them due to restrictions from their institutions or collaborators. And sometimes, it is simply because researchers are worried that someone could design a better algorithm. There are also other reasons such as the lack of time to release the algorithms. But sharing the source code of algorithms could greatly help other researchers and people interested in using the algorithms. I previously wrote a detailed blog post about why researchers should share their implementations.

Research opportunities

Having discussed the state of research on pattern mining, I will now mention some of the many research opportunities:

  • Proposing faster and more memory efficient algorithms,
  • Proposing algorithms that have more features or are more user-friendly (e.g. interactive algorithms, visualization, or algorithms that let the user specify additional useful constraints),
  • Proposing new pattern mining tasks that have novel challenges,
  • Proposing new applications of existing algorithms,
  • Proposing variations of existing problems (e.g. mining patterns in big data, using parallel architectures, etc.)

I personally think that pattern mining is a good research area because it is challenging and many things can be done.

Conclusion

This is what I wanted to talk about today. I hope you have enjoyed this blog post. If you have any other ideas or comments, please leave them in the comment section.

—-
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 145 data mining algorithms.


Top 10 mistakes made by young researchers

Today, I will discuss 10 mistakes that some researchers make early in their career.


1. Not building social relationships with other researchers. Although one can do research by himself, there are many advantages to collaborating with others such as: getting the feedback of others, writing more articles, learning from others, creating more opportunities such as being invited to participate in committees, and even getting help to find a job. Thus, it is important to build connections with others, for example, by attending conferences. For the young researchers, it is also important to work in a good team with a good supervisor.

2. Publishing in inappropriate conferences and journals. Publications in weak conferences and journals will reduce the potential impact that a paper can have. When a CV is evaluated for a job position, weak publications may not even be considered, or may even be considered detrimental. To ensure maximum impact, a researcher should publish not only in good journals and conferences but also in those related to his field.

3. Focusing on quantity rather than quality. Some researchers focus too much on quantity. Although quantity is somewhat important, having a paper in a top conference or journal is often worth more than having many papers in weak conferences and journals. Ideally, a researcher should have some excellent publications to prove his research ability and also have a stable research output. To improve quality, a researcher should not be in too much of a hurry to publish papers and should take the time to develop his ideas well.

4. Having unethical behavior. In research, a researcher’s name is his brand. Any unethical behavior is unacceptable and can also irreversibly damage the career of a researcher. A researcher should never have unethical behavior such as producing fake results, plagiarizing other papers and inaccurately describing experiments.

5. Not having a website. It is not difficult for a researcher to create a simple website with links to his papers and information about his projects, but many people don’t do it. Having a website increases research visibility. Besides, one can also provide code and data to replicate experiments. This can also increase research impact.

6. Not working on topics that can have an impact. Many topics are good but some can have a greater impact than others. Researchers working on very narrow topics with few applications will typically be cited less than those working on topics that have more applications. Hence, choosing research topics is important.

7. Not working on developing writing and presentation skills. Those skills are essential for a successful researcher. Time spent to improve these skills is never wasted.

8. Not working hard enough. Learning to become a researcher requires spending thousands of hours, as numerous skills must be mastered such as identifying problems, developing original solutions, writing and presenting, as well as getting familiar with existing work. Thus, working hard and being organized are common features of top researchers. But although working hard is important, staying healthy is important too.

9. Not staying up-to-date with research. This is a problem for many researchers in academia. After starting to work in a professor position, one may become too busy or lazy and not keep up with new research. This may lead to difficulty in identifying good research problems and an inability to write good papers.

10. Not having a clear career goal or plan to achieve it. Having a clear goal allows one to think about the best ways to attain it, and to set some subgoals. Having a plan allows one to reach a goal more efficiently.

That is all for today. Hope you have enjoyed reading.


An interview with P. Fournier-Viger about AI and data mining

I recently was interviewed by Djavan de Clercq, a graduate student from Tsinghua University, working on Machine Learning and Optimization. The interview can be read here (on LinkedIn).   I answer ten questions related to data mining and AI research.


Writing a research paper (6) – presenting data with tables

Tables are often used in research papers to present data. In this blog post, I will explain when tables should be used, discuss common mistakes when using tables, and give a few tips to improve tables.


When should tables be used?

Tables are useful to display data. A table should be used when there is enough data to display and when that data is easy to read in tabular form.

Common problems with tables in research papers

Here are a few problems related to tables,  often encountered in research papers:

1) The table does not have enough content. Generally, if a table is small or does not have enough content, then one should replace the table by text. For example, consider this table:

Algorithm      Runtime (s)
EFIM                  10.0
FHM                   11.5
EFIM-Closed           16.4

This table does not contain enough information since it can be replaced by a sentence “The EFIM, FHM and EFIM-Closed algorithms spent 10 s, 11.5 s and 16.4 s, respectively”. In this case, text should be used instead of a table.

2) The table is designed to be read horizontally. A table should never be designed to be read horizontally. For example, consider this table:

Algorithm      EFIM    FHM    UPTree
Runtime (s)    10.0   15.0      20.0
Memory (MB)      45     60        70

It should be replaced by a table that can be read vertically:

Algorithm    Runtime (s)    Memory (MB)
EFIM                10.0             45
FHM                 15.0             60
UPTree              20.0             70

This table contains three columns, each representing an attribute. The first line shows the column titles. Then the following lines provide the data, described using these attributes.

3) Data that is also given in the text or in a figure. Normally, one should not present the same data using text and also using a table or figure. If the data is given in the text, there is no need to also show it in a figure or table.

4) The data presented in the table is not meaningful. A table should be used to present data that is meaningful and relevant to the paper. If the content of a table is unimportant or irrelevant, it can be removed.

5) The text in the table is difficult to understand. Ideally, one should be able to understand a table without having to read the text. Thus, the table title and column titles should be chosen appropriately to facilitate understanding. In particular, if one uses abbreviations in a table, these abbreviations should be defined.

6) Numerical data that is not properly formatted or without units. A table should indicate the units used to represent numerical data, such as seconds and megabytes. Moreover, numbers in a table should be properly formatted. In particular, the number of digits after the decimal point should be appropriate for the data in terms of significant digits. Moreover, how numbers are formatted should remain consistent throughout the paper.

7) The table title is not meaningful or too long. Each table must have a title and the title must be meaningful, and not too long. For example, the following table has a short and meaningful title:

Table 1. Performance comparison of EFIM, FHM and UPTree

Algorithm    Runtime (s)    Memory (MB)
EFIM                10.0             45
FHM                 15.0             60
UPTree              20.0             70

A title is normally a short sentence or a part of a sentence.

8) The table is not formatted according to the requirements of the publisher. Before submitting a paper, one should also make sure that the paper meets the requirements of the publisher. Publishers often have a preferred table format.

A few more tips

  • To make a table more beautiful, one can align the table content. Typically, text is aligned to the left, while numbers may be centered or aligned to the right. For example:

    Algorithm    Runtime (s)    Memory (MB)
    EFIM                10.0             45
    FHM                 15.0             60
    UPTree              20.0             70

  • When a table is used to compare multiple objects, the best value(s) in each column may be highlighted in bold. For example, the following table compares three algorithms. The smallest value is considered to be the best one for each column (here, the values of EFIM), and thus it is formatted in bold. This allows the reader to quickly see which algorithm has the best performance.

    Algorithm    Runtime (s)    Memory (MB)
    EFIM                10.0             45
    FHM                 15.0             60
    UPTree              20.0             70

  • Tables are great. But sometimes a chart is more appropriate than a table. It is thus important to consider other possibilities for presenting the data such as a chart or text.
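
The advice above about alignment, units and a consistent number of decimals can be sketched in a few lines of Python. This is only an illustration: the function name `format_table` and the column widths are my own choices, and the values are the example values used earlier in this post.

```python
def format_table(rows):
    """Build a plain-text table: text left-aligned, numbers right-aligned."""
    header = ("Algorithm", "Runtime (s)", "Memory (MB)")
    lines = [f"{header[0]:<12}{header[1]:>12}{header[2]:>13}"]
    for name, runtime, memory in rows:
        # One digit after the decimal point for every runtime, so that the
        # formatting stays consistent from row to row.
        lines.append(f"{name:<12}{runtime:>12.1f}{memory:>13d}")
    return "\n".join(lines)


rows = [("EFIM", 10.0, 45), ("FHM", 15, 60), ("UPTree", 20, 70)]
print(format_table(rows))
```

Because the format is defined in one place, a runtime entered as 15 is still displayed as 15.0, which avoids mixing "15" and "10.0" in the same column.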

Conclusion

This is all for today!  Hope you have enjoyed this blog post about creating tables for research papers. If you have comments and other ideas or suggestions, please write a message in the comment section below.


Writing a research paper (5) – the bibliography

This is the fifth blog post about writing research papers. Today, I will discuss how to select references and write the bibliography of a research paper. I will also explain some common errors and give additional tips.


Why is the bibliography important?

Every research paper must have a bibliography. The purpose of the bibliography is to let the reader know about the sources of information (e.g. research papers, books, webpages) that have been used to write the paper.  More specifically, the bibliography is used to give credit to some ideas that are reused or adapted in the current paper and to cite relevant related papers.

A good bibliography will give a good overview of related work and will give credit to other papers when credit is due. A bad bibliography may contain errors and may omit some important references. Moreover, the references may be cited in an inappropriate way. It is important to prepare the bibliography of a research paper well to increase its chances of getting accepted.

Some common problems

Here are a few common problems found in bibliographies:

  • References contain incorrect information. For example, a paper may be cited with the wrong page numbers, or incorrect information about the publisher. To avoid this problem, it is important to double-check all the information to ensure that it is correct. Many researchers use a website such as Google Scholar to generate bibliographical entries in the appropriate format. Although this can save some time, the generated references often contain errors either in terms of information or format because they are machine-generated.
  • References are not formatted properly. Generally, a publisher will require that references be formatted in a specific format, or sorted in a specific order (e.g. alphabetical order). It is important to get familiar with these rules and follow them. I have seen some journals reject papers just because the references were not in the correct format, and ask the authors to submit them again. Besides, the format must be consistent for all bibliographical entries.
  • References are too old. A bibliography should be up-to-date. If a paper does not cite any papers from the last few years, it generally means that authors are unaware of recent papers. I often see this problem when some authors publish papers a few years after writing their Ph.D. thesis. Often, they will not update the bibliography. Some reviewers think that an out-of-date bibliography is a good enough reason to reject a paper.
  • References are not cited in the text.  All references from a bibliography must be cited in the text.
  • References to Wikipedia or similar websites.  One should avoid citing Wikipedia in a scientific paper.
  • Low quality references.  When possible, it is recommended to not cite papers having a weak research content, or published in journals that do not have a good reputation such as predatory journals (unless there is a good reason to do so). One should prefer citing papers that have good research content or are published in good journals and conferences.
  • Not citing papers in a correct way in the text. Some researchers will cite a paper and then copy and paste some text from that paper. If it is not explicitly indicated, using quotation marks and a citation, that the text is copied from the original paper, then this is plagiarism. This is a very serious issue. There also exist other types of plagiarism such as copying an idea but rewriting it differently without citing the paper. One should avoid doing this.
  • Too many self-citations. It is OK for one to cite his own papers. However, there should not be too many of those self-citations, except when there is a good reason. For example, I have reviewed some papers containing about 30 references but where more than 15 were by the authors. This is way too many self-citations.  It can lead to directly rejecting the paper. A rule of thumb is that not more than about 10 % of the references should be self-citations.
  • Irrelevant references. This is another problem often related to the previous one.  In some cases, an author will cite many of his own papers to increase his citation count, although these papers are not relevant to the current paper. In general, one should only cite relevant papers.
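
Two of the checks above, namely that every reference is cited in the text and that self-citations stay around the 10 % rule of thumb, are easy to automate. The following is a minimal sketch under several assumptions of mine: citations appear in the text as `[key]`, the bibliography is available as a dictionary from citation key to author surnames, and the function name `bibliography_report` and the toy data are invented for illustration.

```python
import re


def bibliography_report(text, references, own_surnames):
    """Return (uncited keys, share of self-citations) for a draft paper."""
    # Citation keys that appear in the text as [key] (assumed convention).
    cited = set(re.findall(r"\[(\w+)\]", text))
    # Entries listed in the bibliography but never cited in the text.
    uncited = sorted(key for key in references if key not in cited)
    # Entries with at least one author among the authors of the paper.
    own = set(own_surnames)
    self_cited = [key for key, authors in references.items()
                  if own & set(authors)]
    share = len(self_cited) / len(references) if references else 0.0
    return uncited, share


# Toy data, invented for this example.
text = "Prior work [doe18] was extended in [roe19] and [poe20]."
references = {
    "doe18": ["Doe"],
    "roe19": ["Roe"],
    "poe20": ["Poe"],
    "doe21": ["Doe", "Roe"],
}
uncited, share = bibliography_report(text, references, ["Doe"])
print(uncited)         # ['doe21']
print(f"{share:.0%}")  # 50%
```

A share well above 10 % does not prove that something is wrong, but it is a signal to check whether each self-citation is really relevant.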

Tips for preparing a bibliography

Some tips:

  • There exist many websites and software tools that can help to prepare a bibliography. However, one should be careful when using tools that automatically generate a bibliography, as they may produce errors.
  • When starting a new research project, it is a good habit to keep track of all the papers that one reads and take notes about them. This will facilitate writing about related work and preparing the bibliography.
  • Before submitting a paper, always double-check the requirements of the publisher in terms of format and make sure that the paper is following them.

Conclusion

I have discussed how to write the bibliography of a research paper, described common problems and given a few tips. I hope this will be useful. If you think I have missed something, please share it in the comment section.


Writing a research paper (4) – the introduction

This is the fourth blog post on how to write research papers.  Today, I will discuss how to write the introduction of a scientific research paper, some common errors, and give some tips.


Why is the introduction important?

The introduction plays a very important role in a paper. If the introduction is not well-written and convincing, the reader may decide to stop reading the paper. The role of the introduction is to explain the context of the paper, describe the problem that will be addressed in the paper, briefly mention why previous studies have limitations with respect to that problem, and then briefly explain what the contribution of the paper will be.

A good introduction will have a clear logical structure, and will convince the reader that the problem addressed in the paper is worthy of being investigated (it is a novel problem that cannot be solved using existing approaches, it is not easy to solve, and solving this problem is useful).  Moreover, the introduction will give an overview of the paper.

On the other hand, a bad introduction will be poorly organized and will not convince the reader that the problem addressed in the paper is useful, important or challenging. Thus, after reading the introduction, the reader may lose his interest in the paper. Writing a good introduction is thus very important.

What is the typical structure of an introduction?

Generally, the introductions of research papers have more or less the same structure:

  • PART 1 (context): The first paragraph introduces the broad context of the paper, and then progressively goes from that broad context to a more specific context
  • PART 2 (problem): Then, a problem is mentioned and why it must be solved.
  • PART 3 (limitations): Then, the introduction briefly mentions that previous studies failed to solve that problem or have limitations. Hence, a new solution is needed (which will be described in the paper).
  • PART 4 (contributions): The following paragraph briefly mentions the main contributions of the paper and the key features of the proposed solutions. This may include one or two sentences about the results and the conclusion drawn from these results.
  • PART 5 (plan of the paper): Then, often there will be a short paragraph explaining how the rest of the paper is organized. For example: “The rest of this paper is organized as follows… Section 2 discusses related work. Section 3 …. Finally, a conclusion is drawn in Section 5.”

Some common errors

Here are a few common errors found in introductions:

  • English errors: An introduction should be well-written and devoid of English errors. 
  • Poor structure: Some introductions do not follow the typical structure of an introduction, and are not organized in a logical way. In this case, the reader may feel lost, may become uninterested or may not be convinced that the research presented in the paper is worthy of being investigated. As a result, the reader may stop reading.
  • A very long introduction, with unnecessary details: Another common mistake is to write a very long introduction that contains too many details.  But an introduction should generally be no longer than a page. Often, an introduction  will contain too many details about related work that are not relevant for the purpose of the introduction. The introduction should only briefly discuss related work  to explain the motivation of the paper. More details about related work can be given in other parts of the paper such as a dedicated “related work” section.
  • An introduction that is not convincing.  The introduction needs to convince the reader that the research problem studied in the paper is important, useful and not trivial to solve. In many papers, a mistake is to not explain why the studied problem is useful. For example, in data mining research, I have read many papers that proposed some new algorithms, evaluated the algorithms with synthetic data, but did not explain clearly or show what are the real applications of the proposed algorithms.
  • An introduction that omits some relevant related work. Sometimes, the introduction of a paper will not cite some relevant studies. This may happen because the author is not very familiar with his field of research, and sometimes authors will purposely not cite some relevant papers for various reasons. This can cause a paper to be rejected by reviewers.

A few tips

To write a good introduction:

  • Make a plan of the main ideas that you want to talk about and of the structure of your introduction before writing it. This will help to organize your ideas, and will help to create an introduction that is logically organized.
  • When planning or writing your introduction, think about your target audience. Choose words and expressions that are appropriate for that audience. In the first paragraph, you can also explain the context of your work in a more general way to try to reach a broader audience.
  • Reading the introductions of other papers, and studying their structure, can help to write better introductions.
  • If necessary, ask a native English speaker to proofread your text.
  • After writing the introduction, read it again, and spend some time to think about how you can improve it. Generally, taking time to read your text again will help to improve your writing skills.

Conclusion

That is all for this topic! In this  blog post, I provided some key ideas about how to write introductions of research papers. If you have any additional comments, please leave them in the comment section below. Hope you have enjoyed this post.


Writing a research paper (3) – the abstract

In this blog post, I will continue the discussion of how to write research papers.  I will discuss the importance of writing a good abstract for research papers, common errors, and give some tips.


Why is the abstract important?

The abstract is often overlooked but it is one of the most important parts of a paper. The purpose of the abstract is to provide a short summary of a paper. A potential reader will often only look at the abstract and title to decide whether to read a paper or not. A good abstract will increase the probability that a paper is read or cited, while a bad abstract will have the opposite effect.

The abstract is also very important because many papers are behind a paywall (a publisher will only provide the abstract and ask readers to pay to read the full paper).

What is the typical structure of an abstract?

The structure of an abstract is always more or less the same. Typically, it is a single paragraph, written using the past tense, containing five parts:

  • PART 1 (context): The first sentences talk about the context of the paper from a very general perspective.
  • PART 2 (problem): Then, a problem is mentioned and why it must be solved.
  • PART 3 (limitations): Then, the abstract briefly mentions that solutions proposed in previous studies are insufficient to solve the problem due to some limitations. Thus, we need a new solution.
  • PART 4 (contributions): Then, the abstract mentions the contributions of the paper, which is to propose a new solution, and what are the key features of that solution.
  • PART 5 (results and conclusion): Then, one or two sentences are used to briefly mention the experimental results, and the conclusion that can be drawn from these results.

This type of structure gives a concise overview  of the content of the paper.  The next paragraph gives an example of an abstract, which adopts this structure, from the paper describing the EFIM algorithm:

PART 1: In recent years, high-utility itemset mining has emerged as an important data mining task. PART 2 and 3: However, it remains computationally expensive both in terms of runtime, and memory consumption.  It is thus an important challenge to design more efficient algorithms for this task. PART 4: In this paper, we address this issue by proposing a novel algorithm named EFIM (EFficient high-utility Itemset Mining), which introduces several new ideas to more efficiently discover high-utility itemsets. EFIM relies on two new upper-bounds named revised sub-tree utility and local utility to more effectively prune the search space. It also introduces a novel array-based utility counting technique named Fast Utility Counting to calculate these upper-bounds in linear time and space.  (… )  PART 5: An extensive experimental study on various datasets shows that EFIM is in general two to three orders of magnitude faster than the state-of-art algorithms d2HUP, HUI-Miner, HUP-Miner, FHM and UP-Growth+ on dense datasets and performs quite well on sparse datasets. Moreover, a key advantage of EFIM is its low memory consumption.

For some other types of paper such as survey papers the structure is similar but some parts are omitted. Here is an example from a survey paper about frequent itemset mining:

PART 1: Itemset mining is an important subfield of data mining, which consists of discovering interesting and useful patterns in transaction databases. The traditional task of frequent itemset mining is to discover groups of items (itemsets) that appear frequently together in transactions made by customers. Although itemset mining was designed for market basket analysis, it can be viewed more generally as the task of discovering groups of attribute values frequently co-occurring in databases. Due to its numerous applications in domains such as bioinformatics, text mining, product recommendation, e-learning, and web click stream analysis, itemset mining has become a popular research area. PART 4: This paper provides an up-to-date survey that can serve both as an introduction and as a guide to recent advances and opportunities in the field. The problem of frequent itemset mining and its applications are described. Moreover, main approaches and strategies to solve itemset mining problems are presented, as well as their characteristics. Limitations of traditional frequent itemset mining approaches are also highlighted, and extensions of the task of itemset mining are presented such as high-utility itemset mining, rare itemset mining, fuzzy itemset mining and uncertain itemset mining. The paper also discusses research opportunities and the relationship to other popular pattern mining problems such as sequential pattern mining, episode mining, sub-graph mining and association rule mining. Main open-source libraries of itemset mining implementations are also briefly presented.

Some common errors

I will now discuss six common errors found in abstracts:

  • An abstract containing English errors. The abstract and title are generally the first things that are read in a paper. If an abstract contains English errors, it may give a bad impression to readers.
  • An abstract that does not accurately describe the content of the paper. Sometimes, only the abstract is available to the reader. If the abstract does not give a good overview of the paper, one may not try to access the full paper.
  • An abstract that does not follow the typical structure and is not logically organized.  A good abstract will follow the standard structure described in this post, to ensure that ideas are presented in a logical way.
  • An abstract that contains abbreviations and acronyms. Generally, it is recommended to avoid using acronyms and abbreviations in an abstract since the reader may not be familiar with them. Moreover since abstracts are short, it is typically unnecessary to define abbreviations in an abstract.
  • An abstract that contains citations, or refers to tables and figures. An abstract should generally not contain citations, except in exceptional cases. Moreover, an abstract should never refer to figures or tables.
  • An abstract that contains irrelevant details. Given that an abstract is often restricted to a maximum length, it is important to avoid wasting this space by discussing details that are not important. Thus, the abstract should be concisely written and focus on the key points of the paper.

A few tips

Here are a few additional tips about writing an abstract:

  • Before writing, check if there is a maximum length constraint for the abstract, specified by the publisher.
  • Think about your target audience and use appropriate keywords and expressions in your abstract to ensure that other people in your field can find your paper.
  • A good way to learn how to write abstracts is to look at the abstracts of other papers in your field.
  • Take your time to write an abstract. If necessary, show it to a peer and ask for their opinion.
  • If necessary, ask someone to proofread your abstract to remove all English errors.
  • Write concise sentences that are not too long.
  • Several researchers prefer to write an abstract after all the other parts of the paper have been written. This makes sense since the abstract is a summary of the content of a paper.

Conclusion

That is all for this topic. I hope that you have enjoyed this blog post. I will continue discussing writing research papers in the next blog post. Looking forward to reading your opinions and comments in the comment section below!

—-
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 145 data mining algorithms.


Writing a research paper (2) – choosing a good title

Today, I will continue my series of blog posts about how to write research papers.  I will discuss the importance of choosing a good title for research papers, some common errors, and give some tips.


Why is the title important?

The title and abstract are some of the most important parts of a paper because they are what one typically looks at first when searching for papers. A good title makes the paper easy to find and increases its visibility. What is a good title? I will discuss this in detail in this post. But generally speaking, it should describe the content of the paper well and make it easy for the target audience to find the paper. Moreover, a good title should not be misleading or contain errors, as this may give a bad impression to potential readers.

Some common errors

Let’s first discuss some common errors in titles of research papers:

  • A title containing English errors. I often see titles containing English errors. This is a very serious problem because not only does it give a bad impression to potential readers, but it also gives a bad impression when applying for jobs, as the title will appear in the CVs of the authors and cannot be changed after the paper is published. Thus, ensuring that there are no English errors is crucial. An example is the title of a paper in the GESTS International Transactions on Computer Science and Engineering journal: “Association rules mining: A recent overview“. Here “rules” should be written as “rule”.
  • A title that is too short or too broad. A title should give a good overview of the content of a paper. For example, a title such as “Methods to analyze data” would be too broad, as it does not tell what kind of methods are used or what kind of data is analyzed. A better title could be “Clustering algorithms to analyze e-learning data”. This is already much better.
  • A title that is too long. A common mistake of young researchers is to write very long titles to try to make the title very specific. Just like a title that is too broad, a title that is too specific is bad. A good rule of thumb is to try to write titles containing no more than 10 words. An example of a title that is too long is: “Management and analytic of biomedical big data with cloud-based in-memory database and dynamic querying: a hands-on experience with real-world data”, published in the proceedings of the KDD 2014 conference. This title contains about 24 words and in my opinion contains a lot of unnecessary information. For example, “a hands-on experience with real-world data” could be removed to make the title shorter. I would rewrite the title as: “Biomedical data analytics with distributed in-memory databases”. This is only 7 words instead of 24 and it keeps the main information.
  • A title that contains unnecessary information. A good title should not contain unnecessary information. An example is the word “new” in the title “New algorithms for parking demand management and a city-scale deployment.” from a KDD 2014 paper. The word “new” should be removed since the algorithms presented in a research paper are always new. Another example is the title “Applying data mining techniques to address critical process optimization needs in advanced manufacturing.” from KDD 2014. In that title, the words “applying”, “needs” and “advanced” are unnecessary. The title could be rewritten as: “Data mining techniques for critical process optimization in manufacturing”.
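The 10-word rule of thumb mentioned above can be checked mechanically. Here is a minimal Python sketch. The function name count_title_words and the split_hyphens option are assumptions made for illustration, and whether a hyphenated compound such as “in-memory” counts as one word or two is a matter of convention.

```python
import re

def count_title_words(title, split_hyphens=False):
    """Count the words in a title.

    If split_hyphens is True, a hyphenated compound such as "in-memory"
    counts as two words instead of one (an illustrative convention).
    """
    if split_hyphens:
        title = title.replace("-", " ")
    # A word is a run of letters/digits, possibly with inner hyphens or apostrophes.
    return len(re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", title))

long_title = ("Management and analytic of biomedical big data with cloud-based "
              "in-memory database and dynamic querying: a hands-on experience "
              "with real-world data")
short_title = "Biomedical data analytics with distributed in-memory databases"

for title in (long_title, short_title):
    n = count_title_words(title)
    verdict = "ok" if n <= 10 else "consider shortening"
    print(f"{n} words ({verdict}): {title}")
```

With hyphenated compounds counted as one word, the long title has 20 words. Counting each part of a compound separately gives 24.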

A few more tips

Here are a few additional tips about writing a title:

  • As shown in the above examples, a title can often be made shorter. Thus, after writing a title, one should spend some time thinking about how to make it shorter.
  • A good idea is to write several potential titles and then choose the best one.  One may also ask a colleague to help select the best title.
  • When choosing the title, one should think about the target audience of the paper. In particular, one should think about which keywords may be used to search for the paper, to ensure that other researchers will find it. Sometimes a paper may have more than one target audience. Thus, one may have to make a decision about which audience to prioritize.
  • Avoid using abbreviations in titles as the reader may not be familiar with them. If an abbreviation is well-known, such as H2O, this is not a problem. But if an abbreviation is not well-known to the target audience, it will reduce the visibility of the paper.
  • A title does not need to be a complete sentence. A good title is short and descriptive.
  • A good way to learn how to write titles is to look at the titles of other papers in the same field.

Conclusion

I hope that you have enjoyed this blog post.  If you have other suggestions or questions, please share them in the comment section below. If there is interest, I will continue writing more blog posts on paper writing in the future.



IEA AIE 2018 conference (a brief report)

This week, I am attending the IEA AIE 2018 conference (31st International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems) in Montreal, Canada.

About the conference

The IEA AIE conference is an international conference on artificial intelligence and related topics that was established more than 30 years ago. It is a decent conference with proceedings published by Springer in the LNAI proceedings series.

Conference opening

The conference opening was held on Tuesday morning.


This year, 146 papers were submitted from 44 countries. Of those, 35 papers were accepted as long papers, 29 as short papers, and 22 in some special tracks. Thus, the acceptance rate for long papers is about 24%, while the overall acceptance rate is around 59%. More details about the review process:

[Figure: IEA AIE 2018 review process]
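The acceptance rates can be recomputed from the numbers quoted above. The short Python sketch below does this under the assumption that the special-track papers are counted among the 146 submissions. The variable and function names are chosen for illustration only.

```python
# Numbers announced at the conference opening, as quoted above.
submitted = 146
accepted = {"long": 35, "short": 29, "special tracks": 22}

def rate(n_accepted, n_submitted):
    """Return the acceptance rate as a percentage."""
    return 100.0 * n_accepted / n_submitted

long_rate = rate(accepted["long"], submitted)
# Assumption: special-track papers are part of the 146 submissions.
overall_rate = rate(sum(accepted.values()), submitted)

print(f"Long papers: {long_rate:.1f}%")
print(f"All accepted papers: {overall_rate:.1f}%")
```

Here 35 out of 146 rounds to 24%, and 86 out of 146 rounds to 59%.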

Here is a breakdown of topics of papers submitted to the conference (from the conference opening):

[Figure: topics of the papers submitted to IEA AIE 2018]

It was also announced that next year, IEA AIE 2019 will be held in Graz, Austria (http://ieaaie2019.ist.tugraz.at/). Then, IEA AIE 2020 will be held in Japan.

I estimate that there were about 80 people attending the conference.

Conference location

The conference was held at Concordia University in Montreal, Canada. The university is in the nice downtown area of Montreal. The talks were held in some meeting rooms and classrooms of the university.

Conference materials

After registering, I received a conference bag with a few papers, a pen, a notepad and the proceedings on a USB stick.


At the end of the conference, free copies of the proceedings were offered on a first-come, first-served basis.

Keynote talk by Guy Lapalme on Question Answering Systems

The first keynote talk was given by Guy Lapalme from the University of Montreal, and was titled “Question-Answering Systems: challenges and perspectives“. It gave an overview of research on question-answering systems. I will give a brief description of this talk below.

The speaker explained what question answering is by comparing it with information retrieval. Information retrieval is about finding documents. On the other hand, question answering is not about finding documents but about finding answers to questions. A question-answering system, according to Maybury (2004), is an interactive system that can understand the user's needs given in natural language, search for relevant data, knowledge and documents, extract and prioritize answers, and show and explain answers to the user.

Building a question-answering system requires considering several aspects related to natural language processing (question/document analysis, information extraction, natural language generation and discourse analysis), information retrieval (query building, document parsing and providing relevant feedback) and computer-human interaction (user preferences and interaction).

Many question-answering systems are built based on some assumptions. Some of these assumptions are that the user prefers an answer rather than a document, that it is not necessary to look at the context, that we are dealing with closed questions (not questions such as: what is the purpose of life?), and that answers are given as nominal phrases or numbers instead of a procedure.

Question answering has been studied since the 1970s. Researchers found that obtaining a general understanding of a question and its answers is a very difficult problem. Thus, in the nineties, research was more focused on answering simple questions about facts such as “Where is the Taj Mahal?” or “Name a film in which actor A acted”. Nowadays these questions are easily answered by some Web search engines. Researchers have since studied how to answer more complex questions.

There are different types of question-answering systems. The simplest ones try to directly find verbatim answers in documents by keyword matching, and do simple reasoning. More advanced systems can do analogical, spatial and temporal reasoning to answer more complicated questions such as “Is Canada still in recession?”. Some systems are also interactive (the system can remember previous questions and answers).

A simple question-answering system has three main components, which respectively (1) analyze questions to find the type of expected answer (who, what, where, how?), (2) analyze documents to find interesting sentences, and (3) extract answers and evaluate them (correct vs. incorrect vs. correct but without backing from a document).
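To make the three components more concrete, here is a toy Python sketch based on simple keyword matching. It is not the system presented by the speaker. All function names, the stopword list, the mapping from question words to answer types and the example documents are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical mapping from question words to expected answer types.
ANSWER_TYPES = {"who": "person", "where": "place", "when": "date",
                "how": "manner", "what": "thing"}
# Hypothetical stopword list (question words and very common words).
STOPWORDS = {"who", "what", "where", "when", "how", "is", "the", "a",
             "an", "of", "in", "was", "did"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def analyze_question(question):
    """Component 1: find the expected answer type and the keywords."""
    tokens = tokenize(question)
    answer_type = next((ANSWER_TYPES[t] for t in tokens if t in ANSWER_TYPES),
                       "unknown")
    keywords = [t for t in tokens if t not in STOPWORDS]
    return answer_type, keywords

def find_candidate_sentences(documents, keywords):
    """Component 2: score each sentence by its keyword overlap."""
    candidates = []
    for doc in documents:
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            score = len(set(keywords) & set(tokenize(sentence)))
            if score > 0:
                candidates.append((score, sentence))
    # Best-scoring sentences first (ties keep document order).
    return sorted(candidates, key=lambda c: c[0], reverse=True)

def extract_answer(candidates):
    """Component 3: return the best supporting sentence, or None.

    A real system would use the answer type to extract a short phrase
    and would also evaluate whether the answer is correct.
    """
    return candidates[0][1] if candidates else None

documents = [
    "The Taj Mahal is in Agra, India. It was built in the 17th century.",
    "Montreal is a city in Canada.",
]
question = "Where is the Taj Mahal?"
answer_type, keywords = analyze_question(question)
answer = extract_answer(find_candidate_sentences(documents, keywords))
print(answer_type, keywords, answer)
```

This sketch only returns the sentence that best matches the keywords, which illustrates why such simple systems are limited to factual questions whose answers appear verbatim in documents.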

Commercial systems like Alexa, Google Assistant and Cortana can usually answer simple questions that are task oriented, and can sometimes maintain a conversation context.

The speaker then gave more explanations. The conclusion of the talk was that question answering is hard and is far from being a solved problem. Moreover, even if many question-answering systems are restricted to specific tasks, they can already provide interesting services.

Conference reception

One of the highlights of this conference was that the conference reception was held at Musée Grévin, a wax museum. This was quite special and was also a good opportunity to talk with other researchers.


Keynote talk by Far H. Behrouz on Autonomous and Connected Vehicles

The second keynote talk was by Far H. Behrouz from the University of Calgary.  It gave an overview of technologies for autonomous and connected vehicles and prospects in that field.

The main motivations for autonomous vehicle development are to reduce pollution, improve safety, and increase transportation efficiency. The introduction of connected and autonomous vehicles can bring benefits but can also have negative impacts such as disruption and loss of jobs (e.g. truck drivers) and resources.

Some terms used to describe autonomous vehicles are “Autonomous Ground Vehicles (AGVs)”, “Unmanned Ground Vehicles (UGVs)” (autonomous, self-driving, driverless vehicles), and “Intelligent Transportation Systems (ITSs)”. Some studies have suggested that autonomous vehicles could reduce road fatalities by 1,600 a year in Canada, and bring billions of dollars in economic benefits. Manufacturers have said that UGVs would be available in 2020-2025.

Developing autonomous vehicles requires many advanced technologies, including sensor technology, navigation technology, communication technology, algorithms (control, guidance), data technology, software technology (artificial intelligence, machine learning, personalized guidance), and computing infrastructure (e.g. cloud). Below, more details will be provided about each of them.

1) Sensor technology: Many types of sensors are available, such as GPS, cameras, odometers, laser scanners, LiDAR and radar. A modern automobile may have 1500+ sensors.

2) Navigation technology: There are both visual and non-visual navigation systems. Visual systems use technologies such as pattern recognition, and feature tracking and matching.

3) Communication technology: Initially, a major problem was the range of communication. Nowadays several standards have been proposed for long range communication, including cellular technology.

4) Algorithms: This includes algorithms for localization, guidance and control, cooperative multi-sensor localization, advanced driver assistance systems, etc.

5) Data technology: Large amounts of structured and unstructured data must be handled. Performance, availability, data security, resiliency, management and monitoring are important.

6) Software technology: This includes artificial intelligence and machine learning techniques, multi-agent systems, personalized intelligent transportation solutions (monitoring and warning systems). It is also desirable to combine sensory information with real time data, navigation technology and artificial intelligence.

[Figure: Personalized Intelligent Transportation System solutions]

7) Computing infrastructure

The speaker then explained that cheap Unmanned Ground Vehicles can be designed with technology that is available today. He showed some simple examples.

[Figure: example of a UGV]

The speaker then discussed more technical details and more complex prototypes that were built in his lab.

Conference banquet

The conference banquet was at an archaeology museum called “Musée Pointe-à-Callière”. The dinner was fine.

Conclusion

That is all I wanted to write about the conference. It was not the first time that I attended the IEA AIE conference. I also attended it in 2009 and 2016. If you are curious, here is my report about IEA AIE 2016 in Japan.



Writing a research paper (1) – keeping it simple

Today, I will give some advice about how to write research papers. This will be the first post of a series of posts on this topic. In this post, I will focus on one aspect, which is to use simple structures and expressions to present the content of a paper. This is a very simple idea. But it is very important.


Generally, the goal of writing a paper is to communicate some ideas (the research contributions) to the reader. To do that, it is important to write in a clear way by using simple structures. Otherwise, the reader may not understand your paper well. This is different from other types of texts such as poetry or novels, where writing using complex structures, rare words and expressions can be considered better.

Let me give you some examples. Below is a table with some unnecessarily complex structures (left) which can be replaced by simpler structures (right) to express the same idea:

Complex structure                      | Simple structure
the great majority                     | most
an example of this is the fact that    | for example
based on the fact that                 | because
for the purpose of                     | for
in a number of cases                   | sometimes
in order to                            | to
with the exception of                  | except

As shown above, it is often possible to replace a phrase of several words with one or two words that express the same idea.
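Such replacements can be searched for automatically in a draft. Below is a minimal Python sketch that uses only the phrases from the table above. The dictionary name SIMPLER, the function name suggest_simplifications and the example sentence are assumptions made for illustration. The code only suggests replacements rather than applying them, because the surrounding sentence may need to be adjusted by hand.

```python
import re

# Complex structures and their simpler alternatives (from the table above).
SIMPLER = {
    "the great majority": "most",
    "an example of this is the fact that": "for example",
    "based on the fact that": "because",
    "for the purpose of": "for",
    "in a number of cases": "sometimes",
    "in order to": "to",
    "with the exception of": "except",
}

def suggest_simplifications(text):
    """Return (complex phrase, simpler alternative) pairs found in the text."""
    found = []
    for phrase, simple in SIMPLER.items():
        # Match whole phrases only, ignoring case.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append((phrase, simple))
    return found

draft = ("In order to evaluate the algorithm, we ran it on "
         "the great majority of the datasets.")
for phrase, simple in suggest_simplifications(draft):
    print(f'"{phrase}" -> "{simple}"')
```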

Why is it important to do that?

  • It saves space.
  • Readers better understand the content of the paper. In particular, the paper is easier to read for non-native English speakers.

If you want to know more about this, there is a good list of complex structures that can be replaced by simpler structures in the book “How to Write and Publish a Scientific Paper” by R. A. Day et al. You can click here for the list (PDF). I have used this list to prepare the above examples.

Besides what I discussed today, I also recommend avoiding very long sentences. A sentence should not contain more than a single idea. Also, the structure of a paper is very important. I will discuss this in another blog post.

