How can you write good research papers and publish them at excellent conferences? I have written a series of blog posts to answer these questions and more. To make this information easy to find, here is the list of these blog posts:
This year, I am attending the PAKDD 2019 conference (23rd Pacific-Asia Conference on Knowledge Discovery and Data Mining), in Macau, China, from the 14th to the 17th of April 2019. In this blog post, I will provide information about the conference.
About the PAKDD conference
PAKDD is one of the most important international conferences on data mining, especially for Asia and the Pacific area. I have attended this conference several times in recent years. I have written reports about the PAKDD 2014, PAKDD 2015, PAKDD 2017 and PAKDD 2018 conferences.
The proceedings of PAKDD are published in the Springer Lecture Notes in Artificial Intelligence (LNAI) series, which ensures good visibility for the papers. Until the end of May 2019, the proceedings of PAKDD 2019 can be downloaded for free.
This year, PAKDD 2019 received a record 567 submissions from 46 countries. 25 papers were rejected because they did not follow the guidelines of the conference. The remaining papers were each reviewed by at least 3 reviewers. 137 papers have been accepted. Thus the acceptance rate is 24.1%.
The conference was held at The Parisian hotel, a 5-star hotel in Macau, China. Macau is a very nice city, located in the south of China. It has nice weather and some of its major industries are casinos and tourism. Macau was once occupied by Portugal before being returned to China. As a result, there is a certain Portuguese influence in Macau.
Day 0: Registration
On the first day, I arrived at the hotel and registered. The staff was very friendly. Below are some pictures of the registration area, the conference bags and materials. The bag is good-looking and contains the proceedings on a USB drive, the program, as well as some delicious local food as a gift.
Day 1: Tutorial: IoT Big Data Stream Mining
In the morning, I attended the IoT Big Data Stream Mining tutorial by Joao Gama, Albert Bifet, and Latifur Khan.
It was first discussed that IoT is a very important topic nowadays. According to Google Trends, IoT (Internet of Things) has become more popular than “Big Data”.
In traditional data mining, we often assume that we have a dataset to train a model. A key difference between traditional data mining and analyzing IoT data is that the data may not be a static dataset but a stream of data, coming from multiple devices. A data stream is a “continuous flow of data generated at high-speed from a dynamic time-changing environment”. When dealing with a stream, we need to build a model that is updated in real-time and can fit in a limited amount of memory, to be able to make predictions at any time. Various tasks can be done on data streams such as classification, clustering, regression and pattern mining. A key idea in stream mining is to extract summaries of the stream because all the data of a stream cannot be stored in memory. Then, the goal is to provide approximate predictions based on these summaries and provide an estimation of the error. It is also possible to not look at all the data but to take some data samples, and to estimate the error based on the sample size.
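To make the idea of sampling with an error estimate more concrete, here is a minimal sketch in Python. It is not taken from the tutorial: it assumes reservoir sampling to keep a fixed-size uniform sample of the stream in memory, and uses the Hoeffding bound to relate the sample size to the error of an estimated mean. The function names and the simulated sensor stream are hypothetical.

```python
import math
import random


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)          # fill the reservoir first
        else:
            j = rng.randint(0, i)     # inclusive on both ends
            if j < k:
                sample[j] = x         # replace an old item with probability k / (i + 1)
    return sample


def hoeffding_error(n, delta=0.05, value_range=1.0):
    """Error eps such that |sample mean - true mean| <= eps with probability >= 1 - delta."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


# Simulated stream of readings in [0, 1] (an assumption made for this illustration).
stream_rng = random.Random(1)
stream = (stream_rng.random() for _ in range(100_000))

sample = reservoir_sample(stream, k=1_000, rng=random.Random(2))
estimate = sum(sample) / len(sample)
print(f"estimated mean: {estimate:.3f} +/- {hoeffding_error(len(sample)):.3f}")
```

The memory used depends only on k and not on the length of the stream, and the error bound shrinks as the sample size grows.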
If you are interested in this topic, slides of this tutorial can be found here.
Day 1: Welcome reception
After the workshops and tutorials, there was a welcome reception in the evening at the Galaxy Hotel. There were drinks and food. It was a good opportunity to talk with other researchers. I met several researchers that I knew and several people that I did not know.
Day 2: Conference Opening
The second day started with the conference opening, where a traditional lion dance was first performed.
Then, the organizers talked. It was announced that there were more than 300 participants at the conference this year.
The PC chair gave information about the conference. Here are some pictures of some slides:
Then, there was a keynote about relational AI by Dr. Jennifer L. Neville. It was about the analysis of graphs or networks such as social networks.
In the evening, no activities were planned, so I went with other researchers to eat at a restaurant in the Taipa area.
Day 3: Keynote on Talent Analytics
In the morning, there was a keynote by Prof. Hui Xiong about “Talent Analytics: Prospects and Opportunities”. The talk was about how to identify and manage talents, which is very important for companies.
A talent is an “experienced professional with deep knowledge”. This is in contrast with personnel who do simple standardized work, have simple knowledge, and may in the future be replaced by machines. Talents are team players and elite talents also have leadership. Leadership means having a vision of the current situation and of what will happen in the next five years, and being able to manage a team and manage risks. In terms of team management, it is important to find talents for the right positions and manage the team well.
The presenter explained that intelligent talent management (ITM) means using data with an objective, making decisions based on data, offering specific solutions to complex scenarios, and being able to make recommendations and predictions. Some examples of tasks are predicting when talents will leave, intelligent recruitment, intelligent talent development, management, organization, and risk control. Doing this well requires big data technical knowledge and human resource management knowledge.
Then, there were paper presentations.
Day 3: Excursion and banquet
In the afternoon, there was a 4-hour city tour of the Ruins of St. Paul's, Senado Square, the A-Ma Temple and the Lotus Flower Square. Here are a few pictures.
Finally, the conference banquet was held in the evening. Several awards were announced.
There was also some music and a show during the banquet:
Day 4: Keynote Talk on Big Data Privacy
In the morning, there was a keynote talk by Josep Domingo-Ferrer about how to reconcile privacy with data analytics. He explained what big data anonymization is, the limitations of the state-of-the-art techniques, how to empower subjects, users and controllers, and opportunities for research.
It was first discussed that several novels have anticipated the problem of data privacy, and nowadays many countries have adopted laws to protect data. A few principles are proposed to handle data: (1) only collect data that is needed and keep it only as long as necessary, (2) let the user give specific and explicit consent, (3) limit collected data to some purpose, (4) the process should be open and transparent, (5) the ability to erase or rectify data, (6) protect data from security threats, (7) accountability, and (8) privacy should be in the design of the system.
But it is sometimes complicated to comply with these principles. They seem to be in conflict with the use of big data.
A solution is data anonymization. After we anonymize data, it may be easier to use the data for secondary uses. Thus a challenge is to create these anonymized big data sets.
Statistical disclosure control is a set of techniques to anonymize data. It is used to reduce the risk that data is re-identified. A goal is often to anonymize the data to reduce the risks of disclosure while preserving the usefulness of the data (utility).
On the other hand, privacy-first models ensure that the anonymized data meet some minimum requirements. One of the most famous approaches is called “k-anonymity”.
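As an illustration (not taken from the talk), a table is usually said to be k-anonymous if every combination of quasi-identifier values that appears in it is shared by at least k records. The short Python sketch below checks this property. The records, the attribute names and the function name are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if each combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


# Hypothetical table where age and zip code have already been generalized.
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "asthma"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "diabetes"},
]

print(is_k_anonymous(records, ["age", "zip"], k=2))
print(is_k_anonymous(records, ["age", "zip"], k=3))
```

In this toy table, each (age, zip) group contains two records, so the table passes the check for k = 2 but not for k = 3.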
Other approaches are “differential privacy” techniques.
One challenge related to privacy for big data is to ensure privacy in dynamic data (data streams). For big data, there are methods that anonymize data locally (e.g. by adding noise or generalization) before sending them to the controller.
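To give a rough idea of what anonymizing data locally by adding noise can look like, here is a small Python sketch of randomized response, a classic local technique. This is my own illustration and not necessarily one of the methods discussed in the talk. Each device reports its true bit only with some probability, and the controller corrects for the noise to estimate the overall proportion. The function names and the parameter epsilon (the privacy parameter) are assumptions made for this example.

```python
import math
import random


def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), otherwise flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_proportion(reports, epsilon):
    """Estimate the true proportion of 1s from the noisy reports (done by the controller)."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    # E[observed] = pi * p + (1 - pi) * (1 - p), solved here for pi.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


rng = random.Random(7)
true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(20_000)]  # simulated devices
reports = [randomize_bit(b, epsilon=1.0, rng=rng) for b in true_bits]

print(f"true proportion:      {sum(true_bits) / len(true_bits):.3f}")
print(f"estimated proportion: {estimate_proportion(reports, epsilon=1.0):.3f}")
```

The controller never sees the true bits, but with many devices the estimate is close to the real proportion. A smaller epsilon gives more privacy but a noisier estimate.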
Some limitations of state-of-the-art techniques are as follows:
There was then some discussion of some proposals for privacy preserving big data analytics. I will not report all the details. The conclusions of the talk:
Day 4 – afternoon
In the afternoon, there was a PAKDD most influential paper award presentation on Extreme Support Vector Machine by Prof. Qing He, as well as the PAKDD 2019 Challenge Award presentation.
Overall, this was an excellent conference. It was well-organized. I met many researchers and listened to several interesting talks. I am looking forward to PAKDD 2020 next year in Singapore.
This week, I attended the 7th China International Technology Expo (CITE 2019), which was held at the Shenzhen Convention and Exhibition Center in the city of Shenzhen, China from the 9th to the 11th of April 2019. In this blog post, I will give a brief overview of this fair, where various companies were showing their new products and services.
The event is organized as a fair, where companies have booths, separated by themes: (1) Smart Home, Smart City, Smart Terminal, (2) New Display, (3) Intelligent Manufacturing and 3D printing, (4) Robot and Intelligent Systems, (5) Artificial Intelligence and Intelligent Hardware, (6) IoT, Blockchain, Cyber security, (7) Automotive electronics, battery, New energy, (8) Basic electronics, components, equipment and materials.
There were numerous Chinese companies as well as some international companies. It was quite interesting to see the various products on display. The CITE 2019 fair is reasonably big but not as big as some other technology fairs in China such as the BIG DATA expo.
Below, I show some selected pictures from the CITE 2019 fair:
This was just a short blog post to give a glimpse of this event. I think it is quite interesting to attend such events to see what is happening in the industry. Hope you have enjoyed reading this blog post about CITE 2019. If you want to get notified about future blog posts, you can follow me on Twitter at @philfv.
Today, I have the pleasure to interview Rage Uday Kiran, a researcher at the National Institute of Informatics in Tokyo, Japan. R. Uday Kiran is an Indian researcher who has been working in Japan for several years. He has been active mainly in the field of data mining, and is a well-known researcher on the topic of discovering patterns in databases. He has taken the time to answer several questions for this interview.
1) Could you please give a brief overview of your most important contributions?
Frequent itemset mining is an important model in data mining. Its mining algorithms discover all itemsets in the data that satisfy the user-specified minimum support (minSup) constraint. The minSup controls the minimum number of transactions that an itemset must cover within the data. Since only a single minSup threshold is used for the entire data, the model implicitly assumes that all items within the data have uniform frequency. However, this is seldom the case in real-world applications. In many applications, some items appear very frequently within the data, while others rarely appear. If the frequencies of items vary greatly, then we encounter the following two problems:
If minSup is set too high, we miss those itemsets that involve rare items in the data.
In order to find the itemsets that involve both frequent and rare items, we have to set minSup very low. However, this may cause a combinatorial explosion, producing too many itemsets, because those frequent items associate with one another in all possible ways and many of them are meaningless depending upon the user and/or application requirements.
This dilemma is known as the rare item problem. During my PhD, I tried to address this problem by developing frequent itemset models based on multiple minimum supports.
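To illustrate the rare item problem, here is a small Python sketch (added for illustration, not the interviewee's algorithms). It counts the support of itemsets by brute force, then compares a single minSup with one common formulation of multiple minimum supports from the literature, where each item has its own minimum support and an itemset must satisfy the lowest minimum support among its items. The transactions, thresholds and function names are hypothetical.

```python
from itertools import combinations


def support_counts(transactions, max_size=2):
    """Count in how many transactions each itemset (up to max_size items) appears."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return counts


def frequent_single_minsup(counts, min_sup):
    """Classic model: one minSup threshold for all itemsets."""
    return {s for s, c in counts.items() if c >= min_sup}


def frequent_multiple_minsup(counts, item_min_sup):
    """Assumed model: an itemset must reach the lowest minimum support of its items."""
    return {s for s, c in counts.items() if c >= min(item_min_sup[i] for i in s)}


transactions = [
    ["bread", "milk"], ["bread", "milk"], ["bread", "milk", "butter"],
    ["bread", "butter"], ["milk", "butter"], ["bread", "milk"],
    ["caviar", "champagne"], ["caviar", "champagne", "bread"],
]
counts = support_counts(transactions)

high = frequent_single_minsup(counts, min_sup=4)  # misses the rare itemsets
low = frequent_single_minsup(counts, min_sup=2)   # also keeps weak frequent-item pairs
mis = frequent_multiple_minsup(
    counts, {"bread": 4, "milk": 4, "butter": 4, "caviar": 2, "champagne": 2}
)
for name, result in [("minSup=4", high), ("minSup=2", low), ("multiple", mis)]:
    print(name, sorted(result))
```

With a single high minSup, the pair (caviar, champagne) is lost. With a single low minSup, it is found, but weak pairs of frequent items such as (bread, butter) are also returned. With item-specific thresholds, the rare pair is kept while those weak pairs are filtered out.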
Periodic itemsets are an important class of regularities that exist within the data. Most previous studies have tried to find periodic itemsets based on an implicit assumption that all transactions within the data occur at a fixed time interval. However, in many real-world applications, transactions occur irregularly within the data. For the past few years, I have been developing models to discover different types of periodic itemsets in irregular time series/temporal databases.
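As a rough illustration of periodicity with irregular timestamps, here is a small Python sketch. It is a simplified assumption and not the interviewee's exact models: the periods of an itemset are taken as the time gaps between its consecutive occurrences (including the gap from the start and to the end of the database), and the itemset is called periodic if no gap exceeds a user-defined maximum period. All names and data are hypothetical.

```python
def occurrence_periods(temporal_db, itemset, start, end):
    """Return the time gaps between consecutive occurrences of an itemset."""
    wanted = set(itemset)
    times = sorted(ts for ts, items in temporal_db if wanted <= set(items))
    points = [start] + times + [end]
    return [b - a for a, b in zip(points, points[1:])]


def is_periodic(temporal_db, itemset, start, end, max_period):
    """An itemset is considered periodic if its largest gap is at most max_period."""
    return max(occurrence_periods(temporal_db, itemset, start, end)) <= max_period


# (timestamp, items) pairs; note that the timestamps are irregularly spaced.
temporal_db = [
    (1, {"a", "b"}), (2, {"a"}), (5, {"a", "b"}),
    (6, {"c"}), (10, {"a", "b"}), (12, {"a", "c"}),
]

print(occurrence_periods(temporal_db, {"a", "b"}, start=0, end=12))
print(is_periodic(temporal_db, {"a", "b"}, start=0, end=12, max_period=5))
print(is_periodic(temporal_db, {"c"}, start=0, end=12, max_period=5))
```

Here, the gaps are measured using the timestamps rather than the positions of transactions, which is what matters when transactions do not occur at a fixed time interval.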
2) What do you think are the key problems that remain to be solved in the field of pattern mining?
1. The rare item problem is still a major problem that needs to be addressed in many pattern mining models.
2. Measures, such as occupancy, have to be investigated to assess the interestingness of an itemset.
3. Tuning is a common practice in pattern mining. So disk-based algorithms have been investigated to lower the operational cost.
3) What do you expect to achieve in the next 5 years?
In the near future, IoT devices will become the main source of data. The data generated by these IoT devices is often large (petabytes of data) and typically has spatiotemporal characteristics. In the next few years, I would like to develop models that can extract useful information from spatiotemporal databases. In addition, I would like to investigate parallel and disk-based algorithms to find useful information in very large databases efficiently.
4) Do you think that it is important to collaborate with the industry? What are the keys to a successful collaboration?
Yes. I firmly believe it is important for academics to collaborate with people from industry. Industrial collaboration helps an academic to know the limitations of current research on a particular topic, thereby enabling them to develop models and algorithms that can cater to industrial requirements. Mutual trust, regular discussions and openness are crucial factors for a successful collaboration.
5) What is the current state of data mining and artificial intelligence technology in Japan?
In my opinion, this is the hardest question to answer. The Japanese government has initiated a project called Society 5.0, which aims for a human-centered society that balances economic advancement with the resolution of social problems by a system that highly integrates cyberspace and physical space. In this context, most researchers in Japan are working on developing parallel deep neural network algorithms that can analyze real-world data effectively. In my lab at the University of Tokyo, researchers are working on language translation using deep neural networks.
6) Which conferences do you like to attend? Why?
I generally wish to attend top international conferences (e.g. KDD, CIKM, PAKDD, SSDBM, EDBT, DASFAA and DEXA). The reasons are as follows: (1) To know about the hot research problems that are being addressed by researchers. (2) To interact with the speakers/authors to gain in-depth understanding of the topics of interest. (3) To collaborate with fellow researchers working on similar topics.
7) Do you have some advice for young researchers?
Have an open mind. Read as many research papers as possible, and ensure that you are covering many topics. Try to grasp the implicit and explicit assumptions made by authors in every research paper. Manage your time carefully. Try to collaborate with the senior research students/persons in your lab.
Today, I will list a few useful mailing lists related to data mining and big data. Subscribing to these mailing lists is useful for PhD students and researchers, as many jobs, conferences, special issues and other opportunities are advertised on these mailing lists. It is also good to post your own announcements for jobs, calls for papers, etc.
Have you ever wanted to write an academic book or wondered what the steps are to write one? In this blog post, I will give an overview of the steps to write an academic book, and mention some lessons learned while writing my recent book on high utility pattern mining.
Step 1. Think about a good book idea
The first step for writing a book is to think about the topic of the book and who will be the target audience. The topic should be something that will be interesting for an audience. If a book focuses on a topic that is too narrow or targets a small audience, the impact may be less than if a more general topic is chosen or if a larger audience is targeted.
One should also think about the content of the book, evaluate how much time it would take to write the book, and think about the benefits of making the book versus spending that time to do something else. It is also important to determine the book type. There are three main types of academic books:
First, one may publish a textbook, reference book or handbook. Such a book must be carefully planned and written in a structured way. The aim is to write a book that can be used for teaching or used as a reference by researchers and practitioners. Because such a book must be well-organized, all chapters are often written by the same authors.
Second, one may publish an edited book, which is a collection of chapters written by different authors. In that case, the editors typically write one or two chapters and then ask other authors to write the remaining chapters. This is sometimes done by publishing a “call for chapters” online, which invites potential authors to submit a chapter proposal. Then, the editors evaluate the proposals and select some chapters for the book. Writing such a book is generally less time-consuming than writing a whole book by oneself because the editors do not need to write all the chapters. However, a drawback of such a book is that chapters may contain redundancy and have different writing styles. Thus, the book may be less consistent than a book entirely written by the same authors. A common type of edited book is also the conference or workshop proceedings.
Third, one may publish one's Ph.D. thesis as a book if the thesis is well-written. In that case, one should be careful to choose a good publisher because several predatory publishers offer to publish theses with very low quality control, while taking all the copyrights, and then selling the theses at very expensive prices.
Step 2. Submit a book proposal
After finding a good idea for a book, the next step is to choose a publisher. Ideally, one should choose a famous publisher or a publisher that has a good reputation. This will give credibility to the book, and will help to convince potential authors to write chapters for the book if it is an edited book.
After choosing a publisher, one should write a book proposal and send it to the publisher. Several publishers have specific forms for submitting a book proposal, which can be found on their website or by contacting the publisher. A book proposal will request various information such as: (1) information about the authors or editors, (2) some sample chapters (if some have been written), (3) are there similar books on the market?, (4) who will be the primary and secondary audience?, (5) information about the conference or workshop if it is a proceedings book, (6) how many pages, illustrations and figures will the book contain?, (7) what is the expected completion date?, and (8) a short summary of your book idea and the chapter titles.
The book proposal will be evaluated by the publisher and if it is accepted, the publisher will ask the authors or editors to sign a contract. One should read the contract carefully and then sign it if it is satisfactory.
Step 3. Write the book
Then the next step is to write the book, which is generally the most time-consuming part. In the case of a book written all by the same authors, this can require a few months. For an edited book, it can take much less time, but the editors must still find authors for writing the chapters and perhaps also write a few chapters.
After the book has been written, it should be checked carefully for errors and consistency. A good idea is to ask peers to check the book to see if something needs to be improved. For an edited book, a review process can be organized by recruiting reviewers to review each chapter. The editors should also spend some time to put all the chapters together and combine them in a book. This can take quite a lot of time, especially if the authors did not respect the required format. For this reason, it is important to give very clear instructions to authors with respect to the format of their chapters before they start writing.
Step 4. Submit the book to the publisher
After the book is written, it is submitted to the publisher. The publisher will check the content and the format and may offer other services such as creating a book index or revising the English. A publisher may take a month or two to process a book before publishing it.
Step 5. Promote the book
After writing a book, it is important to promote it appropriately on the web, social media, or at academic conferences. This will help the book to be successful. Of course, if one chooses a good publisher, the book will get more visibility.
Lessons learned
This year, I published an edited book on high utility pattern mining with Springer. I followed all the above steps to edit that book. I first submitted a book proposal to Springer, which was accepted. Then, I signed the contract, and posted a call for chapters. I received several chapter proposals and also asked other researchers to write chapters. The writing part took a bit of time because although I edited the book, I still participated in the writing of six of the twelve chapters. Moreover, I also asked various people to review the chapters. Then, it took me about 2 weeks to put all the chapters together and fix the formatting issues. Overall, the whole process was done over about a year and a half, but I spent perhaps 1 or 2 months of my time. Would I do it again? Yes, because I think it is good for my career, and I have some other ideas for books.
The most important lesson that I learned is to give clearer instructions to authors to reduce formatting problems and other issues arising when putting all chapters together.
Conclusion
In this blog post, I have discussed how to write an academic book. Hope you have learned something! Please share your comments below. Thanks for reading!
Today, I will discuss how to write a good research grant proposal. This topic is important for researchers who are at the beginning of their careers and want to obtain funding for their research projects. A good research proposal can be career-changing as it may allow one to secure considerable funding that may, for example, help to obtain a promotion. On the other hand, a poorly prepared research proposal is likely to fail. To avoid writing a very long post on this topic, I will focus on the key points for writing a good project proposal.
Before writing a research grant proposal, the first step is preparation. Preparation should ideally start several weeks or months before the deadline. The reason is that writing a proposal takes time and that unexpected events may occur, which may delay the progress. Moreover, starting earlier allows one to ask peers for feedback and to think more about the content and how to improve the proposal.
Another important aspect of preparation is to choose an appropriate funding program for the proposed research project.
The research question
A key aspect of preparing a research grant proposal is to choose a research question that will be addressed by the research project.
The key points to pay attention to regarding the research question are that: (1) the research question is new and relevant, (2) the research project is feasible within the time frame, using the proposed methodology and given the background and skills of the researcher(s), and (3) the research project is expected to have an important impact (be useful). In the project proposal, the above elements (1), (2), and (3) need to be clearly explained to convince the reviewers that this project deserves to be funded.
Choosing a good research question takes time, but it is very important.
Another important part of a project proposal is the literature review, which should provide an overview of relevant and recent studies on the same topic. It is important that the literature review is critical (highlight the limitations of previous studies) with respect to the research question. Moreover, the literature review can be used to highlight the importance of the research question, and its potential impact and applications.
References should be up-to-date (preferably from the last five years). But older references can be included, especially if they are the most relevant.
A good proposal should also clearly explain the methodology that will be used, and the theoretical basis for using that methodology.
Regarding experiments, one should explain how participants will be recruited and/or data will be obtained, how big the sample size will be (to ensure that results are significant), how results will be interpreted, and how to deal with issues related to ethics.
If a methodology is well-explained and detailed, it indicates that the researcher has a clear plan about how they will conduct the research. This is important to show that the project is feasible.
Timeline of the project
To further convince reviewers that the project will succeed, it is important to also provide a clear timeline for the project. That timeline should indicate when each main task will be done, by whom, and what will be the result or deliverables for each task. For example, one could say that during the first 6 months, a PhD student will do a literature review and write a journal paper, while another student will collect data, and so on.
The timeline can be represented visually. For example, I show below a timeline that I have used to apply for some national research funding. That project was a five-year project with three main tasks. I have divided the tasks clearly among several students.
Note that it is good to mention the names of the students or staff involved in the project, if the names are known. It can also be good to explain how the students will be recruited.
It is also useful to mention the equipment or facilities that are available at the institution where the researcher works, and that will help to carry out the project.
Another very important aspect is to write clearly what the expected impact of the research project will be. The impact can be described in terms of advances in knowledge, but also in terms of benefits to society or the economy. In other words, the proposal should explain why the project will be useful.
A project proposal needs to also include a budget that must follow the guidelines of the targeted funding source. It is important that the amounts of money are reasonable and justifications are provided to explain why the money is required.
A proposal should also explain that the applicant and their team have the suitable background and skills required for successfully conducting the project. This is done by describing the background and skills of researchers, and how they fit the project.
In this blog post, I have discussed the key points that a good research proposal should include. I could say more about this topic, but I did not want to make it too long for a blog post.
Today, I will write a blog post aimed at young researchers who want to know what the work of a reviewer in academia is, how to become a reviewer of international journals or conferences, and what the benefits of being a reviewer are.
What is the work of a reviewer?
The main work of a reviewer is to read articles and evaluate if the content is suitable to be published. The role of reviewers is very important for the publication process in academia as it helps to filter out bad papers and provide advice for improving other papers.
To review an article, a reviewer may spend from a few minutes (e.g. when a paper is clearly bad or contains plagiarism, and is directly rejected by the reviewer) to several hours (when the paper is complex and read carefully by the reviewer). Typically, in good journals and conferences, a paper will be evaluated by several reviewers as they may have different opinions and backgrounds, which helps to take fair decisions on whether to publish papers or not.
In general, the review process of top journals and conferences is quite effective at eliminating bad papers, as they can recruit excellent reviewers. Smaller or less famous journals sometimes have more difficulty finding good reviewers related to the topics of papers. Once in a while, some bad manuscripts will pass through the review process for various reasons. Predatory journals often do not have reviewers and will publish anything just to earn money.
What are the benefits of being a reviewer?
The main benefit is to help the academic community by providing feedback to publishers and authors.
The reviewers typically work for free. But sometimes publishers provide some gifts to reviewers. For example, Elsevier had been offering a free one-month subscription to one of its online services named Scopus to reviewers, while some top journals of Springer offer to download a free e-book from the Springer library after completing a review on time. Such offers may help to convince researchers to work as reviewers.
Some other benefits of reviewing are:
Read the latest research and learn about topics that one would maybe not take the time to read otherwise.
Obtain some visibility in the research community. Some conferences will for example publish the names of reviewers in conference proceedings or on their websites.
Learn to think like a reviewer, and become more familiar with the review process of journals. This helps to write better papers and to know what to expect when submitting papers to the same conferences and journals.
Put this on a CV. Especially for those aiming to work in academia, being a reviewer of a good journal or conference can be useful on a CV.
How to become a reviewer?
A graduate student may start to review papers for their supervisor. In that case, the supervisor will let the student write the review and then the supervisor will check it carefully before the review is submitted to ensure that it is a good and fair review. This will give students the opportunity to learn how to be a reviewer.
Sometimes, a PhD student may also find some opportunities to review papers on their own. For example, when I was a PhD student I visited some journal websites that advertised that they needed reviewers. I then sent an email with my CV to ask to be a reviewer. The journal was not famous, but it gave me some experience to start reviewing papers.
But generally, most journals will contact potential reviewers rather than the other way around. Generally, when a paper is submitted, the journal editor will search for reviewers that have papers on the same topic or work in a related area, to ask them to review the paper. Thus, reviewers of a journal paper are often experts in the field. Often, editors will prefer to ask researchers who have previously published in the same journal to review papers. Typically, a reviewer has a PhD or is at least a PhD student with good publications. In some cases, master's degree students may be asked to review papers. I have seen it once or twice. But this is quite rare and these students had very strong publications.
For a conference, there is typically a program committee that is established. It is a set of researchers that will review the submitted papers. Each reviewer may for example have to review 3 to 5 papers. To join the program committee of a conference, one may email the organizers to ask if they need additional reviewers. Then, the organizers may accept if the applicant has a good CV. But for famous conferences, joining the program committee typically requires being recommended by a conference organizer or some members of the program committee. It is generally not easy to join the program committee (to be a reviewer) of top conferences.
Drawback of being a reviewer
One of the drawbacks of being a reviewer is that it can require a considerable amount of time. For example, I receive numerous emails asking me to review journal papers for various journals and on a wide range of topics. Although I do several reviews every month, I also decline several invitations because of time constraints, and because I simply receive too many invitations. I will for instance often decline to review papers not in my research area or that are for unknown journals, or journals not related to my research area.
How to review a paper?
If you have been selected to review papers, you may be interested in reading a blog post that I wrote about how to review papers.
Today, I have discussed the work of reviewers and how to become a reviewer. If you have comments, please share them in the comment section below. I will be happy to read them.
In this blog post, I will talk about an important concept, which is often overlooked in data mining and machine learning: explainability.
To discuss this topic, it is necessary to first recall what the goal of data mining and machine learning is. The goal of data mining is to extract models, knowledge, or patterns from data that can help to understand the data and make predictions. There are various types of data mining techniques such as clustering, pattern mining, classification, and outlier detection. The goal of machine learning is more general. It is to build software that can automatically learn to do some tasks. For example, a program can be trained to recognize handwritten characters, to play chess, or to explore a virtual world. Generally, data mining can be viewed as a field of research that overlaps with machine learning and statistics.
Machine learning and data mining techniques can be unsupervised (do not require labelled data to learn models or extract patterns from data) or supervised (labelled data is needed).
In general, the outcome of data mining or machine learning can be evaluated to determine if something useful is obtained by applying these techniques. For example, a handwritten character recognition model may be evaluated in terms of its accuracy (number of characters correctly identified divided by the number of characters to be recognized) or using other measures. By using evaluation measures, a model can be fine-tuned or several models can be compared to choose the best one.
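To make the accuracy measure above concrete, here is a minimal sketch in Python. The function name accuracy, the two models and the test strings are hypothetical and only serve as an illustration. It computes the number of correctly recognized characters divided by the number of characters to be recognized, and then keeps the model with the higher accuracy.

```python
def accuracy(predicted, expected):
    """Return the fraction of items whose predicted label equals the expected label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Hypothetical outputs of two handwritten character recognizers on the same test set.
expected = list("HELLOWORLD")
model_a = list("HELL0WORLD")  # one mistake: a letter O read as the digit 0
model_b = list("HE1L0W0RLD")  # three mistakes

acc_a = accuracy(model_a, expected)
acc_b = accuracy(model_b, expected)

# Compare the two models and keep the one with the higher accuracy.
best = "model_a" if acc_a >= acc_b else "model_b"
print(acc_a, acc_b, best)
```

In practice, other measures (for example precision and recall) may be more suitable than accuracy, depending on the task.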
In data mining and machine learning, several techniques work as black boxes. A blackbox model can be said to be a software module that takes an input and produces an output but does not let the user understand the process that was applied to obtain the output.
Neural networks are an example of blackbox models. Several neural networks may provide a very high accuracy for tasks such as face recognition but will not let the user easily understand how the model makes predictions. This is not true for all models, but as neural networks become more complex, it becomes more and more difficult to understand them. The opposite is glassbox models, which let the user understand the process used to generate an output. Decision trees are an example of glassbox models. If a decision tree is not too big, it can be easy to understand how it makes its predictions. Although such models may yield a lower accuracy than some blackbox models, glassbox models are easily understood by humans. In data mining, another example of explainable models is the patterns extracted by pattern mining algorithms.
A glassbox model is thus said to be explainable. Explainability means that a model or knowledge extracted by data mining or machine learning can be understood by humans. In many real-world applications, explainability is important. For example, a marketing expert may want to apply data mining techniques on customer data to understand the behavior of customers. Then, the expert may use the learned knowledge to make some marketing decisions or to design a new product. Another example is when data mining techniques are used in a criminal case. If a model predicts that someone is the author of an anonymous text containing threats, then it may be required to explain how this prediction was made to be able to use this model as evidence in court.
On the other hand, there are also several applications where explainability is not important. For example, a software program that does face recognition can be very useful even though how it works may not be easily understandable.
Nowadays, many data mining or machine learning models are not explainable. There is thus an important research opportunity to build explainable models. If we build explainable models, a user can participate in the decision process of machines and learn from the obtained models. On the other hand, if a model is not explainable, a user may be left out of the decision process. This raises the question of whether machines should be trusted to make decisions without human intervention.
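To illustrate why a small decision tree can be read as a glassbox model, here is a minimal sketch in Python. The tree is written by hand rather than learned from data, and the attribute names (purchases_per_month, average_basket), the thresholds and the labels are hypothetical. Besides the prediction, the function returns the list of tests that were applied, which is the kind of explanation that a blackbox model does not easily provide.

```python
def classify_with_explanation(customer):
    """Classify a customer with a tiny hand-written decision tree.

    Returns the predicted label and the list of tests that led to it.
    """
    steps = []
    if customer["purchases_per_month"] >= 4:
        steps.append("purchases_per_month >= 4")
        if customer["average_basket"] >= 50:
            steps.append("average_basket >= 50")
            label = "loyal, high value"
        else:
            steps.append("average_basket < 50")
            label = "loyal, low value"
    else:
        steps.append("purchases_per_month < 4")
        label = "occasional"
    return label, steps


# Hypothetical customer record, used only for illustration.
label, steps = classify_with_explanation(
    {"purchases_per_month": 5, "average_basket": 20}
)
print(label)
print(" and ".join(steps))
```

Each prediction can be traced back to a short sequence of conditions that a human can check. For a very large tree, this trace would become long and thus harder to interpret.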
In this blog post, I have described the concept of explainability. What is your opinion about it? You can share your opinion in the comment section below.
In this blog post, I will list a few interesting and recent books on the topic of pattern mining (discovering interesting patterns in data). This mainly lists books from the last 5 years.
High-Utility Pattern Mining: Theory, Algorithms and Applications (2019). This is the most recent book, edited by me. It is about what is probably the hottest topic in pattern mining right now, which is high utility pattern mining. The book contains 12 chapters written by experts from this field about discovering different kinds of high utility patterns in data. It gives a good introduction to the field, as it contains five survey papers, and it also describes some of the latest research. Link: https://link.springer.com/book/10.1007/978-3-030-04921-8
Supervised Descriptive Pattern Mining (2018). A book that focuses on techniques for mining descriptive patterns such as emerging patterns, contrast patterns, class association rules, and subgroup discovery, which are other important techniques in pattern mining. Link: https://link.springer.com/book/10.1007/978-3-319-98140-6
That is all I wanted to write for today. If you know about some other good books related to pattern mining that have been published in recent years, please let me know and I will add them to this list. Also, I am looking forward to editing another book related to pattern mining soon. What would be a good topic? If you have some suggestions, please let me know in the comment section below!