How to cite equations in a research paper?

Today, I will talk about how to cite equations in research papers and give some advice that may be useful to researchers. The idea for this blog post comes from a discussion that I had with some colleagues about the proper way of citing equations. As you know, some papers contain equations like this:

   u = q × p     (1)

Authors then refer to this equation in the text, for example: “Equation (1) explains how to calculate the utility of a pattern”. This way of citing equations is very common and generally viewed as acceptable. But is this the best way of referring to equations?

There are a few things that need to be discussed.

First, some researchers will say that “equation” is a word with a broad meaning and that it is better to use a more specific term when possible. Following this principle, one could say “Formula (1) explains how to calculate the utility of a pattern” in the above example. Several terms can be used depending on the context, such as “formula”, “recurrence”, “inequality” and “identity”. So what is the difference?

  • A mathematical expression is a mathematical phrase that contains some numbers or variables connected by some operators. For example, 1+1 is an expression, and 1+1=2 is also an expression.
  • An equation is a mathematical expression that contains the = symbol. For example, 1 = 1 is an equation, and a = b + c is also an equation. Such a statement can also be called an equality.
  • An identity is an equality that is true no matter what values are given to the variables that it contains. For example, x + x = x × 2 is an identity.
  • An inequality is a mathematical expression that compares two expressions using a symbol such as <, >, ≥, ≤ or ≠. For example, x+c < d is an inequality. It is not an equation!
  • The definition of a formula is quite broad and some will disagree on the exact meaning. But generally, a formula expresses how to calculate some variable based on one or more variables. For example, equation (1) in the example is a formula for calculating the utility of a pattern. It tells how to find the value of a variable “u” from the values of two variables “q” and “p”. Thus, it can be called a formula. Another example is the Pythagorean theorem which is a formula that can be written as an equation: a^2 + b^2 = c^2.
  • and there are others…

Second, some researchers suggest avoiding abbreviations such as “eq. (1)” and “eqn. (1)” when referring to equations, and instead writing “equation (1)” or “formula (1)” in full.

Third, it is recommended not just to refer to equations in the text but also to add a few words that help the reader remember what the equation is about. This is explained clearly with some examples in the Handbook of Writing for the Mathematical Sciences by Higham:

“When you reference an earlier equation it helps the reader if you add a word or phrase describing the nature of that equation. The aim is to save the reader the trouble of turning back to look at the earlier equation. For example, “From the definition (6.2) of dual norm” is more helpful than “From (6.2)”; and “Combining the recurrence (3.14) with inequality (2.9)” is more helpful than “Combining (3.14) and (2.9)”. Mermin [200] calls this advice the “Good Samaritan Rule”. As in these examples, the word added should be something more informative than just “equation” (or the ugly abbreviation “Eq.”), and inequalities, implications and lone expressions should not be referred to as equations.” (color and bold formatting added by me)
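For papers written in LaTeX, this advice can be applied with labels, so that the descriptive word and the equation number always stay together. The following is only a minimal sketch, assuming the amsmath package is available; the label name eq:utility is an illustrative choice and not something required by any journal.

```latex
\documentclass{article}
\usepackage{amsmath} % provides \eqref, which prints the number in parentheses

\begin{document}

The utility $u$ of a pattern is calculated as
\begin{equation}
  u = q \times p \label{eq:utility}
\end{equation}

% A descriptive word plus the automatic number, instead of a bare "Eq. (1)".
By the utility formula~\eqref{eq:utility}, increasing $q$ increases $u$
whenever $p > 0$.

\end{document}
```

Because the number is generated from the label, the reference stays correct if equations are added or reordered later.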

That is all for today! Hope that this has been interesting and will be useful in your papers.


Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 120 data mining algorithms.

Posted in Academia, General, Research

Analyzing the COVID-19 genome with AI and data mining techniques (paper + data + code)

Recently, my team has been working on analyzing COVID-19 genome sequences using pattern mining and other data mining and AI techniques. We have published a paper about this in the Applied Intelligence journal.

In this blog post, I will give a brief overview of this work. The PDF of the paper can be found here:


Nawaz, S., Fournier-Viger, P., Shojaee, A., Fujita, H. (2021). Using Artificial Intelligence Techniques for COVID-19 Genome Analysis. Applied Intelligence, to appear.

And the source code and data: https://github.com/saqibdola/SPM-MA4GSA

The main idea of the paper is the following. We have obtained genome sequences of different strains of the COVID-19 virus. These genome sequences can be viewed as strings of letters (nucleotides). For example, below are four sequences of nucleotides:

Then, after preprocessing these sequences, it is possible to analyze them using pattern mining algorithms and other artificial intelligence techniques. The main process is the following:

First, we prepare the data (step 1), and then we apply different techniques (step 2). To begin, we applied itemset mining, sequential pattern mining and sequential rule mining techniques to find patterns that are common to many genome sequences. Some examples of sequential patterns (sequences of nucleotides that appear often) found by the CM-SPAM algorithm are below:

This is just to give an overview. Other types of patterns are discussed in more detail in the paper.
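To make the idea of a pattern that is common to many sequences more concrete, here is a small Python sketch. It is a deliberately simplified illustration and not the CM-SPAM algorithm used in the paper: it only counts contiguous patterns of a fixed length, and the function name, the toy sequences and the threshold are all assumptions made for the example.

```python
from collections import Counter


def frequent_kmers(sequences, k, minsup):
    """Return contiguous k-letter patterns found in at least `minsup` sequences."""
    support = Counter()
    for seq in sequences:
        # Use a set so that each sequence contributes at most once per pattern.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return {kmer: count for kmer, count in support.items() if count >= minsup}


# Hypothetical toy sequences, not taken from the dataset of the paper.
toy_sequences = ["ATGCGA", "TTGCGC", "ATGCAA", "GGGCGA"]
result = frequent_kmers(toy_sequences, k=3, minsup=3)
print(result)
```

With these toy sequences, only TGC and GCG appear in at least three of the four sequences. Real sequential pattern mining algorithms also allow gaps between the letters of a pattern and avoid enumerating every candidate.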

Second, we also tested sequence prediction models to see if the nucleotides in genome sequences can be predicted. We compared various models offered in the SPMF data mining software and got results like this:

In general, the prediction of genome sequences does not reach a high accuracy, but it is still better than a random prediction. We discuss these results in more detail in the paper.

Third, we also designed a mutation analysis algorithm to compare different strains of the coronavirus. For example, by comparing two strains, we identified some mutations:
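To give a rough idea of what comparing two strains involves, here is a minimal Python sketch that lists the positions where two sequences differ. It is not the algorithm from the paper: it assumes that the two sequences are already aligned and have the same length, so it only finds substitutions, and the sequences are made up for the example.

```python
def point_mutations(reference, variant):
    """List (position, reference_base, variant_base) for two aligned sequences."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles sequences of equal length")
    # Positions are reported starting from 1, as is usual for genomes.
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(reference, variant))
        if a != b
    ]


# Hypothetical sequences, not real genome data.
mutations = point_mutations("ATGCGATT", "ATGTGACT")
print(mutations)
```

Here the two toy sequences differ at positions 4 and 7. Handling insertions and deletions would require aligning the sequences first.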

That is just a brief overview of the paper!

There are many possibilities for extending this research. In particular, various other pattern mining algorithms and machine learning algorithms could be applied as well. Using the code and data provided above, you can also do your own research on this topic! Besides, the tools presented in this paper can also be applied to genome sequences other than those of the COVID-19 virus.

Hope this has been interesting.



Posted in artificial intelligence, Big data, Bioinformatics, Data Mining, Data science, Machine Learning, Pattern Mining

New version of SPMF (2.44): 4 new algorithms, datasets and features

Today, I am happy to announce that a new version of the SPMF open-source data mining software has been released (v. 2.44). This is the download page. This new version was made possible thanks to several contributors.


What is new?

New data mining algorithms:
– LTHUI-Miner for mining the locally-trending high utility itemsets (by Yanjun Yang)
– MLHUI-Miner for discovering the multi-level high utility itemsets (by Ying Wang)
– AER-Miner for mining attribute evolution rules in a dynamic attributed graph (by Ganghuan He)
– TSPIN for mining the top-k stable periodic patterns in a transaction database (by Ying Wang, Peng Yang)

New datasets:
Fruithut, Liquor and E-commerce (prepared by Ying Wang)

Some bug fixes:
– Fixed a small bug in DFIN (thanks to Nader Aryabarzan)
– Fixed a bug in the user interface for TKG (thanks to..)
– Fixed a small error in the DB_LHUI.txt example file

A new tool:
– Added a new tool to fix problems in transaction databases with time and utility information

A new feature:
– Added a feature for saving or loading a trained sequence prediction model from file (for the AKOM, DG, TDAG, LZ78, PPM, CPT, and CPT+ algorithms)

Wrappers for other languages:
– Added a webpage with a list of wrappers for calling some SPMF algorithms from R, Python, Weka and Spark (note: those wrappers were designed by other people and are not an official part of the SPMF project, so be aware that not all algorithms may work, but I think that they can be quite useful for some users)

What is planned for the future?

The SPMF project is still under active development. The next version will be released in a few weeks, as I have received several algorithms that are still waiting to be added to SPMF. The next release will include several new sequential pattern mining algorithms.

Would you like to contribute to SPMF?

The SPMF project is quite successful thanks to the numerous contributors who have provided code, reported bugs and given many comments. If you are interested in contributing something (code of some algorithms or other things), please send me an e-mail at philfv8 AT yahoo DOT com.

Thanks again to all for your support!

Posted in Data Mining, Data science, open-source, Pattern Mining, spmf, Utility Mining

How to write an academic book?

Have you ever wanted to write an academic book or wondered what the steps to write one are? In this blog post, I will give an overview of these steps and mention some lessons learned while working on my recent book on high utility pattern mining.

Step 1. Think about a good book idea.
The first step for writing a book is to think about the topic of the book and who will be the target audience. The topic should be something that will be interesting for an audience. If a book focuses on a topic that is too narrow or targets a small audience, the impact may be less than if a more general topic is chosen or if a larger audience is targeted.

One should also think about the content of the book, evaluate how much time it would take to write the book, and think about the benefits of making the book versus spending that time to do something else. It is also important to determine the book type. There are three main types of academic books:

  • First, one may publish a textbook, reference book or handbook. Such a book must be carefully planned and written in a structured way. The aim is to write a book that can be used for teaching or as a reference by researchers and practitioners. Because such a book must be well-organized, all chapters are often written by the same authors.
  • Second, one may publish an edited book, which is a collection of chapters written by different authors. In that case, the editors typically write one or two chapters and then ask other authors to write the remaining chapters. This is sometimes done by publishing a “call for chapters” online, which invites potential authors to submit a chapter proposal. Then, the editors evaluate the proposals and select some chapters for the book. Preparing such a book is generally less time-consuming than writing a whole book by oneself because the editors do not need to write all the chapters. However, a drawback of such a book is that chapters may contain redundancy and have different writing styles. Thus, the book may be less consistent than a book entirely written by the same authors. A common type of edited book is also the conference or workshop proceedings.
  • Third, one may publish one’s Ph.D. thesis as a book if the thesis is well-written. In that case, one should be careful to choose a good publisher because several predatory publishers offer to publish theses with very little quality control, while taking all the copyrights, and then sell the theses at very expensive prices.

Step 2. Submit a book proposal
After finding a good idea for a book, the next step is to choose a publisher. Ideally, one should choose a famous publisher or a publisher that has a good reputation. This will give credibility to the book, and will help to convince potential authors to write chapters for the book if it is an edited book.

After choosing a publisher, one should write a book proposal and send it to the publisher. Several publishers have specific forms for submitting a book proposal, which can be found on their website or by contacting the publisher. A book proposal form typically requests various information such as: (1) information about the authors or editors, (2) sample chapters (if some have been written), (3) whether similar books are on the market, (4) the primary and secondary audience, (5) information about the conference or workshop if it is a proceedings book, (6) the number of pages, illustrations and figures that the book will contain, (7) the expected completion date, and (8) a short summary of the book idea and the chapter titles.

The book proposal will be evaluated by the publisher and, if it is accepted, the publisher will ask the authors or editors to sign a contract. One should read the contract carefully and then sign it if it is satisfactory.

Step 3. Write the book
The next step is to write the book, which is generally the most time-consuming part. For a book written entirely by the same authors, this can require a few months. For an edited book, it can take much less time, but the editors must still find authors to write the chapters and perhaps also write a few chapters themselves.

After the book has been written, it should be checked carefully for errors and consistency. A good idea is to ask peers to check the book to see if something needs to be improved. For an edited book, a review process can be organized by recruiting reviewers to review each chapter. The editors should also spend some time putting all the chapters together and combining them into a book. This can take quite a lot of time, especially if the authors did not respect the required format. For this reason, it is important to give very clear instructions to authors about the format of their chapters before they start writing.

Step 4. Submit the book to the publisher
After the book is written, it is submitted to the publisher. The publisher will check the content and the format and may offer other services such as creating a book index or revising the English. A publisher may take a month or two to process a book before publishing it.

Step 5. Promote the book
After writing a book, it is important to promote it in an appropriate way on the web, on social media, or at academic conferences. This will help the book to be successful. Of course, if one chooses a good publisher, the book will get more visibility.

Lessons learned
This year, I published an edited book on high utility pattern mining with Springer. I followed all the above steps to edit that book. I first submitted a book proposal to Springer, which was accepted. Then, I signed the contract and posted a call for chapters. I received several chapter proposals and also asked other researchers to write chapters. The writing part took a bit of time because, although I edited the book, I still participated in the writing of six of the twelve chapters. Moreover, I also asked various people to review the chapters. Then, it took me about 2 weeks to put all the chapters together and fix the formatting issues. Overall, the whole process was done over about a year and a half, but I spent perhaps 1 or 2 months of my time. Would I do it again? Yes, because I think it is good for my career, and I have some other ideas for books.

The most important lesson that I learned is to give clearer instructions to authors to reduce formatting problems and other issues that arise when putting all chapters together.

Conclusion
In this blog post, I have discussed how to write an academic book. Hope you have learned something! Please share your comments below. Thanks for reading!


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Posted in Academia, Research

An introduction to frequent subgraph mining

In this blog post, I will give an introduction to an interesting data mining task called frequent subgraph mining, which consists of discovering interesting patterns in graphs (a pattern mining task). This task is important since data is naturally represented as a graph in many domains (e.g. social networks, chemical molecules, the road map of a country). It is thus desirable to analyze graph data to discover interesting, unexpected, and useful patterns that can be used to understand the data or make decisions.

What is a graph? A bit of theory…

But before discussing the analysis of graphs, I will introduce a few definitions. A graph is a set of vertices and edges, which have labels. Let me illustrate this idea with an example. Consider the following graph:

This graph contains four vertices (depicted as yellow circles). These vertices have labels such as “10” and “11”. These labels provide information about the vertices. For example, imagine that this graph is a chemical molecule. The labels 10 and 11 could represent the chemical elements Hydrogen and Oxygen, respectively. Labels do not need to be unique. In other words, the same label may be used to describe several vertices in the same graph. For example, if the above graph represents a chemical molecule, the labels “10” and “11” could be used for all vertices representing Hydrogen and Oxygen, respectively.

Now, besides vertices, a graph also contains edges. The edges are the lines between the vertices, here represented by thick black lines. Edges also have labels. In this example, four labels are used, which are 20, 21, 22 and 23. These labels represent different types of relationships between vertices. Edge labels do not need to be unique.

Types of graphs: connected and disconnected 

Many different types of graphs can be found in real-life. Graphs are either connected or disconnected.  Let me explain this with an example. Consider the two following graphs:

The graph on the left is said to be a connected graph because, by following the edges, it is possible to go from any vertex to any other vertex. For example, imagine that vertices represent cities and that the edges are roads between cities. A connected graph in this context is a graph where it is possible to go from any city to any other city by following the roads. If a graph is not connected, it is said to be a disconnected graph. For example, the graph on the right is disconnected since Vertex A cannot be reached from the other vertices by following the edges. In the following, we will use the term “graph” to refer to connected graphs. Thus, all the graphs that we will discuss in the following paragraphs will be connected graphs.
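The definition of a connected graph can be checked with a short program. The following Python sketch is only an illustration: it assumes an undirected graph given as a set of vertices and a list of edges (a representation chosen for this example), and it visits the graph with a breadth-first search starting from an arbitrary vertex.

```python
from collections import deque


def is_connected(vertices, edges):
    """Return True if every vertex can be reached from every other vertex."""
    vertices = set(vertices)
    if not vertices:
        return True
    # Undirected graph: each edge can be followed in both directions.
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    # The graph is connected if the search reached all vertices.
    return len(seen) == len(vertices)


# Hypothetical graphs: in the second one, no edge touches vertex "A".
cities = {"A", "B", "C", "D"}
connected = is_connected(cities, [("A", "B"), ("B", "C"), ("C", "D")])
disconnected = is_connected(cities, [("B", "C"), ("C", "D")])
print(connected, disconnected)
```

The first call returns True and the second returns False, since vertex A cannot be reached from the other vertices.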

Types of graphs: directed and undirected

It is also useful to distinguish between directed and undirected graphs. In an undirected graph, edges are bidirectional, while in a directed graph, the edges can be unidirectional or bidirectional. Let’s illustrate this idea with an example.

The graph on the left is undirected, while the graph on the right is directed. What are some real-life examples of a directed graph?  For example, consider a graph where vertices are locations and edges are roads. Some roads can be travelled in both directions while some roads may be travelled in only a single direction  (“one-way” roads in a city).

Some data mining algorithms are designed to work only with undirected graphs or only with directed graphs, while others support both.
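In a program, the difference between the two types of graphs often comes down to how the edges are stored. Here is a minimal Python sketch, where the function name and the locations are invented for the example: for an undirected graph, each edge is stored in both directions, while for a directed graph it is stored only from its source to its target.

```python
def build_adjacency(edges, directed):
    """Map each vertex to the set of vertices that can be reached in one step."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())
        if not directed:
            # A two-way road can also be travelled from target to source.
            adjacency[target].add(source)
    return adjacency


# Hypothetical roads between three locations.
roads = [("Museum", "Station"), ("Station", "Harbor")]
two_way = build_adjacency(roads, directed=False)
one_way = build_adjacency(roads, directed=True)
print(two_way["Station"], one_way["Station"])
```

With two-way roads, both the Museum and the Harbor can be reached from the Station. With one-way roads, only the Harbor can.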

Analyzing graphs

Now that we have introduced a bit of theory about graphs, what kind of data mining task can we do to analyze graphs? There are many answers to this question. The answer depends on what our goal is but also on the type of graph that we are analyzing (directed/undirected, connected/disconnected, a single graph or many graphs).

In this blog post, I will explain a popular task called frequent subgraph mining. The goal of subgraph mining is to discover interesting subgraph(s) appearing in a set of graphs (a graph database). But how can we judge if a subgraph is interesting? This depends on the application. The interestingness can be defined in various ways. Traditionally, a subgraph has been considered as interesting if it appears multiple times in a set of graphs. In other words, we want to discover subgraphs that are common to multiple graphs. This can be useful, for example, to find associations between chemical elements that are common to several chemical molecules.

The task of finding frequent subgraphs in a set of graphs is called  frequent subgraph mining.  As input the user must provide:

  • a graph database (a set of graphs)
  • a parameter called the minimum support threshold (minsup).

Then, a frequent subgraph mining algorithm will enumerate as output all frequent subgraphs. A frequent subgraph is a subgraph that appears in at least minsup graphs from a graph database.  For example, let’s consider the following graph database containing three graphs:

Now, let’s say that we want to discover all subgraphs that appear in at least three graphs. Thus, we will set the minsup parameter to 3. By applying a frequent subgraph mining algorithm, we will obtain the set of all subgraphs appearing in at least three graphs:

Consider the third subgraph (“Frequent subgraph 3”).  This subgraph is frequent and is said to have a support (a frequency) of 3 since it appears in three of the input graphs. These occurrences are highlighted in red, below:

Now a good question is how to set the minsup parameter. In practice, the minsup parameter is generally set by trial and error. If this parameter is set too high, few subgraphs will be found, while if it is set too low, hundreds or millions of subgraphs may be found, depending on the input database.

Now, in practice, which tools or algorithms can be used to find frequent subgraphs? There exist various frequent subgraph mining algorithms. Some of the most famous are GASTON, FSG, and gSpan.
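The notion of support in a graph database can be illustrated with the smallest possible subgraphs, those made of a single edge. The following Python sketch is a simplified illustration and not a full frequent subgraph mining algorithm. It assumes undirected graphs, each stored as a pair (vertex labels, list of labeled edges), which is a representation chosen for this example, and the three graphs are made up.

```python
from collections import Counter


def frequent_edges(graph_database, minsup):
    """Find single-edge subgraphs that appear in at least `minsup` graphs."""
    support = Counter()
    for vertex_labels, edges in graph_database:
        patterns = set()
        for u, v, edge_label in edges:
            # Undirected edge: sort the vertex labels so that the
            # pattern 10-11 is the same as the pattern 11-10.
            a, b = sorted((vertex_labels[u], vertex_labels[v]))
            patterns.add((a, edge_label, b))
        # Each graph counts at most once for each pattern.
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= minsup}


# Hypothetical database of three graphs; vertex ids are mapped to labels.
g1 = ({0: 10, 1: 11, 2: 10}, [(0, 1, 20), (1, 2, 21)])
g2 = ({0: 11, 1: 10, 2: 11}, [(0, 1, 20), (1, 2, 22)])
g3 = ({0: 10, 1: 11}, [(0, 1, 20)])
frequent = frequent_edges([g1, g2, g3], minsup=3)
print(frequent)
```

For minsup = 3, the only frequent single-edge subgraph is the edge with label 20 between vertices labeled 10 and 11. Lowering minsup to 1 returns all three edge patterns, which shows on a small scale how the number of results grows when the threshold decreases.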

Mining frequent subgraphs in a single graph

Besides discovering subgraphs common to several graphs, there is also a variation of the problem of frequent subgraph mining that consists of finding all frequent subgraphs in a single graph rather than in a graph database. The idea is almost the same. The goal is also to discover subgraphs that appear frequently or that are interesting. The only difference is how the support (frequency) is calculated. For this variation, the support of a subgraph is the number of times that it appears in the single input graph. For example, consider the following input graph:

This graph contains seven vertices and six edges. If we perform frequent subgraph mining on this single graph by setting the minsup parameter to 2, we can discover the five following frequent subgraphs:

These subgraphs are said to be frequent because they appear at least twice in the input graph. For example, consider “Frequent subgraph 5”. This subgraph has a support of 2 because it has two occurrences in the input graph. Those two occurrences are highlighted below in red and blue, respectively.

Algorithms to discover patterns in a graph database can often be adapted to discover patterns in a single graph.
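For the single-graph setting, the change in how the support is calculated can be shown with a few lines of Python. This is again only a sketch restricted to single-edge subgraphs, with an invented input graph of seven vertices and six edges: every occurrence in the graph is counted, instead of counting each graph once.

```python
from collections import Counter


def edge_pattern_counts(vertex_labels, edges):
    """Count every occurrence of each single-edge pattern in one graph."""
    counts = Counter()
    for u, v, edge_label in edges:
        # Undirected edge: the order of the two vertex labels does not matter.
        a, b = sorted((vertex_labels[u], vertex_labels[v]))
        counts[(a, edge_label, b)] += 1
    return counts


# Hypothetical input graph with seven vertices and six edges.
labels = {0: 10, 1: 11, 2: 10, 3: 11, 4: 10, 5: 11, 6: 10}
edges = [(0, 1, 20), (1, 2, 21), (2, 3, 20), (3, 4, 21), (4, 5, 20), (5, 6, 22)]
counts = edge_pattern_counts(labels, edges)
frequent_in_graph = {p: c for p, c in counts.items() if c >= 2}
print(frequent_in_graph)
```

With minsup = 2, the edge patterns with labels 20 and 21 are frequent (3 and 2 occurrences), while the pattern with label 22 appears only once. For larger subgraphs, occurrences may overlap, and algorithms differ in how they count them.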

Want to try frequent subgraph mining?

If you want to try frequent subgraph mining algorithms, fast open-source Java implementations of gSpan and of TKG (for top-k frequent subgraph mining) are available in the SPMF data mining library. The SPMF library also offers algorithms for many other pattern mining tasks such as high utility itemset mining, sequential pattern mining, sequential rule mining and periodic pattern mining.

Conclusion

In this blog post, I have introduced the problem of frequent subgraph mining, which consists of discovering subgraphs appearing frequently in a set of graphs. This data mining problem has been studied for more than 15 years, and many algorithms have been proposed. Some algorithms are exact algorithms (will find the correct answer), while others are approximate algorithms (do not guarantee to find the correct answer, but may be faster).

Some algorithms are also designed to handle directed or undirected graphs, or to mine subgraphs in a single graph or in a graph database, or can do both. Besides, there exist several other variations of the subgraph mining problem such as discovering frequent paths in a graph, or frequent trees in a graph.

Besides, in data mining in general, many other problems are studied related to graphs such as optimization problems, detecting communities in social networks, relational classification, etc.

In general, problems related to graphs are quite complex compared to those for some other types of data. One of the reasons why subgraph mining is difficult is that algorithms typically need to check for “subgraph isomorphisms”, that is, to compare subgraphs to determine if they are equivalent. But nonetheless, I think that these problems are quite interesting as there are several research challenges.

I hope that you have enjoyed this blog post. If there is some interest in this topic, I may do another blog post on graph mining in the future.
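To see why this check is costly, consider the most naive way of testing whether a small labeled pattern occurs in a larger graph: try every possible assignment of the pattern vertices to distinct graph vertices. The Python sketch below does exactly that. It is a brute-force illustration, not what efficient algorithms such as gSpan do. The graph representation is an assumption made for this example, the graphs are undirected, and extra edges in the larger graph are allowed.

```python
from itertools import permutations


def contains_subgraph(pattern, graph):
    """Brute-force test of whether `pattern` occurs in `graph` (labels must match)."""
    p_labels, p_edges = pattern
    g_labels, g_edges = graph
    # Undirected graph: store each edge label in both directions.
    g_lookup = {}
    for u, v, label in g_edges:
        g_lookup[(u, v)] = label
        g_lookup[(v, u)] = label
    p_vertices = list(p_labels)
    # Try every assignment of pattern vertices to distinct graph vertices.
    for candidate in permutations(g_labels, len(p_vertices)):
        mapping = dict(zip(p_vertices, candidate))
        if any(p_labels[pv] != g_labels[gv] for pv, gv in mapping.items()):
            continue
        if all(
            (mapping[u], mapping[v]) in g_lookup
            and g_lookup[(mapping[u], mapping[v])] == label
            for u, v, label in p_edges
        ):
            return True
    return False


# Hypothetical pattern and graph, each given as (vertex labels, labeled edges).
pattern = ({0: 10, 1: 11}, [(0, 1, 20)])
graph = ({0: 11, 1: 10, 2: 11}, [(0, 1, 20), (1, 2, 22)])
found = contains_subgraph(pattern, graph)
missing = contains_subgraph(({0: 10, 1: 11}, [(0, 1, 23)]), graph)
print(found, missing)
```

The number of assignments to try grows very quickly with the size of the graphs, which is one reason why practical algorithms rely on more clever techniques.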



Posted in Big data, Data Mining, Pattern Mining

Sequential pattern mining vs Sequence prediction?

In this blog post, I will answer a question that I have received by e-mail about the difference between sequential pattern mining and sequence prediction. I think that this is a good question and that sharing the answer can help to clarify some concepts for some people.

Generally speaking, the goal of sequential pattern mining is to find some patterns that appear in many sequences of symbols. For example, let’s say that you have some sequences of purchases made by customers in a retail store. You can then apply a sequential pattern mining algorithm to find sequential patterns, that is, to find sequences of purchases that are common to many customers. For example, you may find that <harrypotter1, spiderman, batman> is a sequential pattern. This pattern means that many people have bought the movie Harry Potter 1, and then Spiderman, and then Batman. If you find such patterns, it can help you to understand the data. If you are the retail store manager, you may use such patterns to make some business decisions, such as offering a discount on Batman to customers who previously bought Harry Potter 1 and Spiderman.
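What it means for a pattern to appear in a sequence can be expressed in a few lines of Python. This sketch only counts the support of one given pattern. It does not search for patterns as a sequential pattern mining algorithm does, and the four customers are invented for the example.

```python
def occurs_in(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so the order is respected
    # and other items may appear in between (gaps are allowed).
    return all(item in remaining for item in pattern)


def support(pattern, sequences):
    """Number of sequences that contain the pattern."""
    return sum(occurs_in(pattern, s) for s in sequences)


# Hypothetical purchase sequences of four customers.
customers = [
    ["harrypotter1", "spiderman", "batman"],
    ["harrypotter1", "hulk", "spiderman", "batman"],
    ["spiderman", "harrypotter1", "batman"],
    ["harrypotter1", "spiderman", "starwars", "batman"],
]
count = support(["harrypotter1", "spiderman", "batman"], customers)
print(count)
```

The pattern is found in three of the four sequences. The third customer bought the same movies but in a different order, so that sequence does not count.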

But there are many other uses of sequential patterns. You can also use the sequential patterns to make some sequence predictions. For example, if someone buys Harry Potter 1 and Spiderman, you may predict that he will buy Batman based on the above sequential pattern. This can be used to perform recommendation.

Another example of an application of sequential pattern mining is to find patterns in text documents. A text document is a set of sentences, and each sentence is a sequence of words. Thus, you can apply a sequential pattern mining algorithm to find the sequential patterns that tell you which sequences of words appear many times in a book. This can tell you about some writing patterns used by some authors, and you can even use these patterns to try to guess who is the author of some anonymous book (if you are curious, I actually did that in a paper: https://www.philippe-fournier-viger.com/FLAIRS2016__AUTHORSHIP_ATTRIBUTION.pdf).

On the other hand, the goal of sequence prediction is to predict the next symbol of a sequence of symbols. For example, suppose that a person has bought the movies Harry Potter 1, Hulk, Batman, and then Star Wars, and that we want to know which movie this person will buy next. There are many ways to do sequence prediction. One way is to use the sequential patterns or a variation called sequential rules. For example, we did sequence prediction using sequential rules in a paper to predict the next webpage that someone will click: https://www.philippe-fournier-viger.com/sequential_rules_prediction_2012.pdf
But there are also many other models for sequence prediction that do not rely on sequential patterns, like the CPT and CPT+ models (video presentation here: https://data-mining.philippe-fournier-viger.com/video-sequence-prediction-with-the-cpt-and-cpt-models/), the All-K-Order Markov model, the DG model, TDAG, and LZ78.
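As an illustration of a model that does not use sequential patterns, here is a Python sketch of a very simple predictor: a first-order Markov model, which predicts the symbol that most often followed the last symbol in the training sequences. It is much simpler than the models listed above and is not how CPT or CPT+ work. The function names and the training data are assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_first_order(sequences):
    """Count, for each symbol, which symbols directly follow it."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, sequence):
    """Predict the most frequent successor of the last symbol, or None."""
    if not sequence or sequence[-1] not in transitions:
        return None
    return transitions[sequence[-1]].most_common(1)[0][0]


# Hypothetical training sequences of purchases.
training = [
    ["harrypotter1", "hulk", "batman", "starwars"],
    ["hulk", "batman", "starwars"],
    ["batman", "spiderman"],
]
model = train_first_order(training)
prediction = predict_next(model, ["harrypotter1", "hulk", "batman"])
print(prediction)
```

In the training data, Batman is followed twice by Star Wars and once by Spiderman, so the model predicts Star Wars. Such a model looks only at the last symbol, which is a limitation that higher-order models and models such as CPT try to overcome.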

Thus, to summarize, the goal of sequential pattern mining is to find patterns. You can find these patterns in data for multiple purposes. It can be just to understand the data and learn something about it. It can be to use these patterns to do sequence prediction, or other tasks like clustering, authorship attribution, etc. Thus, sequential pattern mining has many applications and sequence prediction is one of them. And the goal of sequence prediction is to predict the next symbol of a sequence. There are many methods to do sequence prediction and sequential pattern mining is one of them.

Hope that this short answer will be helpful. Some additional blog posts that I wrote on these topics:


Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 180 data mining algorithms.

Posted in Big data, Data Mining

Email invitation to be a “special” speaker, a scam?

Have you ever received an e-mail from some small conference telling you that they want to invite you to give a talk as a special speaker, honorary speaker or distinguished speaker? I have received several e-mails like this and, while sometimes it is a valuable invitation, most of the time it is just spam. In this blog post, I will discuss this.

Several small conference organizers send unsolicited e-mails to try to attract papers for their conferences and ultimately to collect money in the form of registration fees. As many people ignore these e-mails, one strategy that they use is to mention the title of one of your recent papers and to invite you as a special speaker. This is an example:

Dear Dr. XXXXXX

Please accept my apology if this email bothers you, as I have tried to send you this invitation in last months but without any response from you.

On behalf of the Organizing Committee, it is our delight to extend to NAME_OF_CONFERENCEwhich is going to be held during 2021 (next year) in CITY, Japan.It is our great pleasure and privilege to welcome you to join the NAME_OF_CONFERENCE act as the chair/speaker while presenting about TITLE OF_MY_PAPER.

Another example:

Dear Dr. XXXXX,

Hope you receive this letter in a wonderful mood. Please accept my apology if this email bothers you, as I have tried to send you this invitation to you but without any response. Would you please send a reply?

Thanks for your time to this email from our committee, the committee of XXXXXX-2022 cordially welcome you to share a presentation as a session speaker/chair regarding your research. Sincerely wish you can give us an opportunity to include your research in our program and proceeding.


If you read such e-mails for the first time, you may think that the sender has read your paper and is really interested in your research and wants to invite you as a special guest to their conference to give a talk. So you may be tempted to accept. However, most of the time, this is just SPAM and they only mention the title of one of your paper and that you are invited as a special speaker to catch your attention.

Before accepting such invitation, you should check: (1) is this a well-known conference in your field that has been held for several years? (2) do you know other people having attended it? (3) is it associated with a famous institutions or publisher (beware that the website may be fake and provide misleading information, though). But perhaps the most important is to check if the conference organizers will be asking you to pay a registration fee or give you some special benefits. The reason is that as an invited guest, you would expect some kind of preferential treatment over regular attendees. If they do not give you any special benefits as a special speaker, then you are just another speaker for their conference, and the goal is just to earn money. If you are not sure about whether a conference is legit or not, it is best also to ask your supervisor or senior researchers for their opinion.

There are also many other types of SPAM e-mails that target academic researchers, such as e-mails asking you to publish your thesis as a book. I may talk about this in more detail in a future blog post! That is all for today!


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.



Two journal special issues with deadlines in 2021

This is a short blog post to let you know that I am co-organizing two journal special issues related to data mining, data science and machine learning:

(1) Special issue on Spatiotemporal Big Data Analytics
Journal: Electronics
Publisher: MDPI
Deadline: 30th September 2021
Details: https://www.mdpi.com/journal/electronics/special_issues/Spatiotemporal-BD-Analytics

(2) Special issue on Generative Adversarial Networks for Multi-Modal Multimedia Computing
Journal: Wireless Communication and Mobile Computing
Publisher: Wiley / Hindawi
Deadline: 28th May 2021
Link: https://www.hindawi.com/journals/wcmc/si/710287/

Note that those are open-access journals. Thus, there is a publication fee. If you do not want to pay a publication fee, you may consider submitting your paper to the Data Science and Pattern Recognition journal (DSPR), which is currently free to publish in and has a fast review time. I am editor-in-chief of that journal.

If you have any questions or are not sure whether your research is relevant to these journals, you may send me a message, and we can discuss it.



Atomic Habits to Become a Better Researcher

In this blog post, I will talk about the concept of atomic habits and how it can help you achieve your goals in general, but especially to become a better researcher.

The concept of atomic habits was popularized in a book called “Atomic Habits” by James Clear. The idea in this book is quite simple, yet it is a powerful way of achieving your goals by applying some simple steps in your daily life.

Setting your goal

The first step is to set a clear goal about what you want to do. This is important as it is the target that you want to achieve. A goal could, for example, be to improve your paper writing skills.

The importance of the process or system

Having a goal is good but it is not enough, because people who succeed and people who fail often have the same goal (e.g. the winner of a race vs the losers, or the researchers who got a tenured position vs those who did not). Thus, what makes the difference between succeeding and failing is not the goal but the system or process that is used to achieve it.

The system of atomic habits

The main idea in the book of James Clear is that we can achieve big goals by changing our daily habits. This can be done by adopting some good habits. For example, if your goal is to improve your English writing skills, working on it 20 minutes every day may not do much in the short term but may lead to major improvements in the long term. But it can also be useful to remove bad habits. For example, one may want to stop wasting too much time browsing the Web every day.

However, as many people know, it is often hard to start a good habit and keep it for a long time. Many people will, for example, start to do some physical exercise for a few weeks and then give up quickly. It is also hard to stop bad habits.

To help change habits, the key points proposed by James Clear are:

  • Make it easy: Do not try to make changes that are too challenging early on. For example, if you decide to study English 5 hours per day, it would perhaps be difficult to sustain over the long term. It is perhaps better to start with 20-30 minutes per day, and later you may increase. At first, consistency is what is important and will help you to not give up.
  • Make it obvious: To make sure that you continue your good habit every day, it is important that you do not forget about it. Thus, you may try to connect your new habit with your previous habits. For example, if you want to take some medicine every day, you may put it beside your toothbrush to not forget to take it. If you want to read a book every night, you may put the book on your pillow.
  • Make it attractive and satisfying: Because long-term goals may take a lot of time to achieve, it is important to associate some short-term rewards with your good habits. Thus, you may think about some rewards such as: If I study English every day for one week, I will buy myself a cup of hot chocolate.
  • Make it harder to keep the bad habits: You may think about strategies to make your bad habits harder to do. For example, if you want to drink less alcohol, you may put the bottles out of sight.
  • Track your habits: You may use a notebook to keep a record of your habits over time.
  • Find the right environment and the right people. The environment and the people that we interact with also play a role in changing habits. Changing the environment or getting along with other people having the same goals may help.

These ideas are quite simple and can be applied to many aspects of life (losing weight, etc.), but they can also be used by researchers to become better researchers. Some good habits for researchers may be to waste less time on the Internet, to have a fixed schedule and sleep well every day, to eat well, to exercise, to improve writing skills, to write more papers, to write a book, to improve programming skills, to write a blog post every week, to improve presentation skills, etc.

Conclusion

In this blog post, I gave a short overview of the book Atomic Habits and briefly discussed how it can be used in the life of a researcher. If you do not have time to read the book, you may have got the main idea from this blog post. There are also some good video presentations by James Clear that can be watched online (they are shorter than reading the book, if you are busy like me!).

If you have any comments, please post in the comment section below.



The Hard Road to Success in Academia

Today, I will talk about a topic that not many researchers talk about, which is the long and hard road to becoming a professor in academia, and why some people give up before reaching their goal at different stages of their career. I will talk about this topic because I was recently reading about researchers who decided to leave academia to do something else due to the difficulty of getting a tenured professor position or permanent researcher position.
A list of researchers that have posted about the reasons why they have left academia can be found below, and has inspired this blog post:
https://docs.google.com/spreadsheets/u/0/d/1OODoiZKeAtiGiI3IAONCspryCHWo5Yw9xkQzkRntuMU/edit

By reading these posts, some observations are:

  • Several people complained that there are not enough professor positions available. This is true, as many more people obtain a PhD than can become a professor.
  • Someone can become a post-doctoral researcher after the PhD to gain more experience, but there is typically a limit on the number of years that one can work as a post-doctoral researcher. So this is a temporary solution to get more time before finding a professor position. Some people have spent up to 6 years as a post-doctoral researcher after their Ph.D but could still not find a faculty position, and thus decided to give up and do something else.
  • Some people complained about the low salary of a post-doctoral researcher compared to what they would earn in industry. Some also mentioned the inability to have a stable job, sometimes having to sign a 1-year contract, while some others have been luckier and signed for 2 or 3 years.
  • Several people complained about not being able to choose where they would live (for example, having to accept a post-doc position in another city far from their family).
  • Some people enjoyed working as a post-doctoral researcher, but it is a temporary position as there is typically a limit on the number of years that one can be a post-doctoral researcher.
  • Some people complained that many entry-level faculty positions in universities are short-term contracts and are not permanent. This is a reality in many places. In fact, after my Ph.D and my post-doc, I even started with a 9-month contract as an adjunct professor in Canada, before getting a 3-year contract, and now a 5-year contract. Also, several permanent professors who retired were replaced by temporary positions to reduce costs.
  • Some people have given up on academia to work in industry or start their own business, among other things. Some have decided to do something completely different, such as starting a knitting store! Some have even decided to go back to studying to obtain a degree in another field.
  • Some people have said that they actually gave up on many other things to try to succeed in academia, such as spending less time with their daughter, working every evening and weekend on papers and books, and giving up on other things that they like. Thus, by leaving academia, some have said that they are now happy to pursue other dreams.
  • Several adjunct professors have tried hard to get a tenured position but gave up. One reason is the inability to get national funding, after applying multiple times and failing. In some countries like the USA, the success rate appears to be very low in some fields.
  • Some researchers have complained about some toxic working environments such as other people trying to sabotage their research, etc.
  • Some researchers have talked about negative psychological effects and depression due to various factors such as having to work hard, a toxic work environment, and the pressure to obtain grants and get tenure.
  • Some people claim that a good amount of luck is required to be successful in academia.

I have to say that it is true that it is not easy to succeed in academia. I personally had to go through many challenges to become a professor, eventually get a full professor position, and be successful in my field. I also had to give up on several other things that I like in order to succeed. And I had to work hard for more than a decade, almost every day from morning until late at night (which I still do, by the way). But now, I have a good position and I am quite happy with what I do. For me, all this hard work was worth it as I like doing research and I enjoy teaching. But I certainly gave up on some other things that I enjoy to focus on my research career. For example, I also enjoy hobbies like learning languages, drawing, playing guitar and running, which I do not have much time to do.

For young researchers, my first piece of advice is to get to know yourself and what you really like. If your dream is to become a professor, it is possible, but you need to work hard and work smart to be as effective as possible and reach your goal. You also need to make a realistic plan of how to attain your goal. I have written several blog posts giving advice about how to be successful in academia that you can read:

I hope that this blog post has been interesting! If you want to add something, please share your comments in the comment section below.

