In this blog post, I will explain a simple way of transforming a LaTeX document to HTML. Why do this? There are many reasons. For example, you may have formatted some text in LaTeX and would like to quickly integrate it in a webpage.
The wrong way
First, there is a wrong way of doing this. It is to first create a PDF from your LaTeX document, and then use a tool to convert from PDF to HTML. If you try this and the document is even slightly complex, the result may be very bad, and the HTML code may be horrible, with many unnecessary tags.
The good way
Thus, the best way to convert LaTeX to HTML is to use a dedicated tool. There are several free tools, but many are designed to run on Linux. If you are using Windows, it may thus take you some time to find the right tool.
Luckily, popular LaTeX distributions like MiKTeX and TeX Live include an executable of a software to convert from LaTeX to HTML that works on Windows. Thus, if you have the full TeX Live distribution, you do not need to download or install anything else. Below, I will describe how to do this with TeX Live on Windows.
Using TeX Live on Windows
First, you need to open the command line and go to the directory containing your LaTeX document. Let's say that your LaTeX document is called article.tex. Then, you can run this command:
The result will be a new file article.html
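As a hedged sketch of the step above, here is how the conversion could be scripted in Python. The exact command is not shown in this post, so the snippet assumes that the converter is htlatex from the TeX4ht package, which is shipped with TeX Live and MiKTeX. The helper names (build_command, expected_output, convert) are hypothetical, and the tool is only run if the document exists and htlatex is found on the PATH.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_command(tex_file: str) -> list[str]:
    """Build the command line (assumption: the tool is TeX4ht's htlatex)."""
    return ["htlatex", tex_file]


def expected_output(tex_file: str) -> Path:
    """article.tex -> article.html, written next to the source file."""
    return Path(tex_file).with_suffix(".html")


def convert(tex_file: str) -> Path | None:
    """Run the conversion; return the HTML path, or None if it cannot run."""
    source = Path(tex_file)
    if not source.is_file():
        print(f"{tex_file} does not exist.")
        return None
    if shutil.which("htlatex") is None:
        print("htlatex not found on PATH; install TeX Live or MiKTeX first.")
        return None
    # Run from the directory containing the document, as described in the post
    subprocess.run(build_command(source.name), cwd=source.parent, check=True)
    return expected_output(tex_file)
```

Calling convert("article.tex") from the directory that contains the document should then produce article.html, under the assumptions stated above.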
The result is usually quite good. For example, I have converted a research paper that I wrote about high utility episode mining and the result looks like this:
I would say that 90 % of the paper was converted correctly. There are some other parts that I have not shown, like some pseudocode for some algorithms, that were not formatted properly. But I would say that the conversion is overall really good.
In this blog post, I have shown a simple way of converting LaTeX to HTML on Windows using the TeX Live distribution. If you are using MiKTeX or Linux, similar commands can be used.
Hi all, I have not written much on the blog in the last month because I have been very busy with numerous projects and deadlines. I thus took a small break to focus on other things. Now that I have more time, I will start to write more often on the blog, as before.
Periodic pattern mining consists of discovering patterns that appear regularly over time in data. For example, by analyzing customer data, one may find that a periodic pattern is that many customers buy wine and cheese together every week.
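To make this idea concrete, here is a minimal sketch of one common way of testing periodicity: a pattern is considered periodic if the gap between two consecutive transactions containing it never exceeds a threshold. The toy data, the function names and the max_period threshold are illustrative assumptions and are not taken from a specific algorithm.

```python
def periods(transactions, pattern):
    """Gaps between consecutive occurrences of `pattern`, including the gap
    before the first occurrence and the gap after the last one."""
    pattern = set(pattern)
    # 1-based positions of the transactions that contain the whole pattern
    positions = [i for i, t in enumerate(transactions, start=1) if pattern <= set(t)]
    bounds = [0] + positions + [len(transactions)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def is_periodic(transactions, pattern, max_period):
    """A pattern is periodic if its largest gap does not exceed max_period."""
    return max(periods(transactions, pattern)) <= max_period


# Toy database: one transaction per time step
transactions = [
    {"wine", "cheese"},
    {"bread"},
    {"wine", "cheese", "bread"},
    {"milk"},
    {"wine", "cheese"},
    {"cheese", "wine", "milk"},
]

print(periods(transactions, {"wine", "cheese"}))         # [1, 2, 2, 1, 0]
print(is_periodic(transactions, {"wine", "cheese"}, 2))  # True
print(is_periodic(transactions, {"bread"}, 2))           # False
```

In this toy example, wine and cheese are bought together at least every two transactions, while bread is not.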
This book will aim at providing an introduction to periodic pattern mining and also showcase some recent research on that topic. Thus, chapters may take the form of a survey paper or that of a research paper.
If you are interested in participating, you can find a copy of the detailed call for chapters here, and submit a short chapter proposal no later than October 1st.
The deadlines are as follows:
Chapter proposal deadline: 1st October 2020
Proposal acceptance date: 10th October 2020
Full chapter submission deadline: 15th January 2021
Planned publication date: 1st July 2021
Looking forward to seeing your submissions on this topic!
By the way, if you want to know more about periodic pattern mining, you can also check other resources:
A few of my algorithms for periodic pattern mining are offered in the SPMF software, which also offers algorithms for many other types of patterns like sequential patterns, frequent itemsets, episodes, association rules, etc.
In this blog post, I will talk about a recent case of serious academic misconduct by Sandeep Kautish from LBEF / APU that I experienced when submitting a book proposal to CRC Press. In that book proposal, I am a collaborator (co-editor). The full story is below.
On the morning of June 4th, 2020, we submitted a book proposal to Prof. Sandeep Kautish, editor of a new book series called “Advances in Informatics and Information Systems Engineering” for CRC Press, to propose a book related to artificial intelligence. We submitted it to him because he had previously made a call for book proposals.
Then, in the afternoon, we received the following e-mail from Sandeep Kautish:
FROM: CRC Editor-AIISE <firstname.lastname@example.org>
TO: +++++, +++++, +++++, +++++
4th June, 14 h 46
Dear All,
Congratulations on the nicely drafted proposal. Also, I wish to get the consent of you all to add myself (Prof. Dr. Sandeep Kautish, Series Editor CRC Press) as 5th Editor in the proposal. I am Series Editor of 3 (three) book series of CRC Press with over 30 books in production and have been the editor of more than five Elsevier, Springer, and IGI Global books (one Elsevier and one Springer already going on). My brief biography is given below –
Dr. Sandeep Kautish is working as Professor & Dean-Academics with LBEF Campus, Kathmandu Nepal running in academic collaboration with Asia Pacific University of Technology & Innovation Malaysia.
(…) – Series Editor Advances in Informatics and Information Systems Engineering CRC Press (Taylor & Francis Group)
Thus, the book series editor Sandeep Kautish acknowledged receiving our proposal and said that it is a good proposal. But he told us that he wants to add himself as a co-editor of our book (!) This is totally unacceptable and inappropriate, as he did not write a single word of our proposal. And it is a clear conflict of interest.
We don’t have any reason to add him as co-editor. We don’t know him and he directly asks to put his name on our proposal that he did not write. And obviously, the purpose of this message is to make us feel that if we do not accept, he will reject the proposal and not transfer it to CRC Press. And if there are any doubts about that, it was confirmed by the next e-mail and phone call.
Now, since we cannot accept such behavior, one of the members of our team told him on the phone that we will not add him to the book proposal. Then, because of this, he wrote another e-mail a few hours later to reject the proposal that he once thought was a good proposal:
FROM: CRC Editor-AIISE <email@example.com>
TO: +++++, +++++, +++++, +++++
4th June, 19 h 27
Dear all,
Based on my discussion with _________ over a phone call, I have decided not to process and accept the said Proposal under my series.
It was later confirmed to me that he was very angry over the phone that we did not accept to put his name on our proposal. This is really unprofessional and unethical.
A book series editor should never ask to be put as co-editor of books that are proposed in his series and that he did not write, as a condition to process the proposal. It is a very serious case of academic misconduct. And I am sure that this is not the policy of CRC Press, either. Thus, I will also file a complaint with CRC Press about this so that he does not try to bully other researchers that are in weak positions into putting his name on their books.
I have previously published a few books with Springer and never had to face such bad behavior from a book series editor. In fact, I would never have imagined that this could happen when submitting to a publisher like CRC Press, which is a decent publisher.
Who is Sandeep Kautish?
So you may now wonder who Sandeep Kautish is. He is an Indian researcher who is professor and dean of a small department called LBEF Campus in Kathmandu, Nepal, for the Asia Pacific University of Technology & Innovation (APU). This is his webpage: http://apiitmalaysia.academia.edu/DrSandeepKautish and his other webpage: https://www.lbef.org/profile/dr-sandeep-kautish/ and this is his official e-mail: firstname.lastname@example.org
As I see from the webpage of Sandeep Kautish, he does not seem to be a strong researcher. He has about 100 citations in Google Scholar. Thus, I think that CRC Press may have made a mistake when appointing him to such a position as book series editor, and as we discovered, he decided to take advantage of this to try to bully people into putting his name on their books. Why? I guess the reason must be to obtain a promotion or something similar.
Update 1: More cases of academic extortion by Sandeep Kautish
2020-6-4 1:00 PM. About two hours after publishing this post, someone else privately contacted me to inform me that Sandeep Kautish has done the same thing to them for another book proposal with CRC Press. They also did not give in to the bullying tactics and refused to add him as co-editor of their book.
2020-6-5 2:50 PM. Then, two more researchers came to talk with me privately to tell me about some bad experiences that they also had with Sandeep Kautish related to bullying for book proposals. The first one told me that such things happened about 10 more times to people that he knows. Here is an excerpt from that discussion:
He told me much more than this, but I just show some key points and I hide some information to preserve the anonymity of that person. The second one told me that he had more or less the same experience as me with Sandeep Kautish for a book proposal a while ago. Below I just show a small part of what he told me.
Update 2: Complaint to CRC Press
2020-6-5: After I complained to CRC Press, they answered me very quickly and offered to reconsider our book proposal for another book series. They also told me that they think that this is unacceptable, that it is not their policy, and that they were also very surprised. CRC Press have been very nice and professional and I am happy that they are now investigating this to take quick action to solve this problem. I have known CRC Press for a long time (I have used some of their textbooks for teaching and published with them before). The problem that I faced here is with a book series editor working for them. But it will not change my overall opinion that CRC Press is a good publisher.
Update 3: Action from CRC Press
2020-6-11: Today I saw the message that after research, evaluation and deliberation, CRC Press has decided to cancel the three following series by Sandeep Kautish, Dr. Pradeep N, Dr. Sountharrajan, and Dr. J. Amudhavel:
Innovations in Computational Approaches with Machine Intelligence
Advances in Informatics and Information Systems Engineering Series
Engineering Reflections on Pandemics and Sustainable Solutions for COVID-19
I am very happy about this quick action from CRC. It will ensure that the same situation will not happen again to other researchers in the future from these people!
In this blog post, I have shared a case of highly unethical behavior in academia by an Indian researcher named Sandeep Kautish who works at APU / LBEF. As always, in such cases, the best solution is to file a complaint and make the story public, otherwise such things will continue to happen. I have previously reported some other cases of academic misconduct that you may be interested to read:
Each dataset has two versions: (1) sequences of words and (2) sequences of part-of-speech (POS) tags.
The authors and the total number of words/sentences in the corpus of each author are as follows: Catharine Traill (276,829 / 6,588), Emerson Hough (295,166 / 15,643), Henry Addams (447,337 / 14,356), Herman Melville (208,662 / 8,203), Jacob Abbott (179,874 / 5,804), Louisa May Alcott (220,775 / 7,769), Lydia Maria Child (369,222 / 15,159), Margaret Fuller (347,303 / 11,254), Stephen Crane (214,368 / 12,177), and Thornton W. Burgess (55,916 / 2,950).
Datasets (books) in SPMF format
Datasets in SPMF format (with item names) – can be used with the GUI of SPMF
Original books as text
– A Tale of The Rice Lake Plains (words / POS)
– Lost in the Backwoods (words / POS)
– The Backwoods of Canada (words / POS)
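For readers who want to load such files in their own code, here is a hedged sketch of a reader. It assumes the usual SPMF sequence format, where each line is a sequence of positive integers, -1 ends an itemset and -2 ends the sequence. It also assumes that lines starting with @ are metadata, with item names declared as @ITEM=id=name. The function name parse_spmf_sequences and the sample lines are made up for illustration.

```python
def parse_spmf_sequences(lines):
    """Parse SPMF-style sequences into lists of itemsets, replacing item ids
    by names when '@ITEM=id=name' lines are present (assumed format)."""
    names = {}
    sequences = []
    for raw in lines:
        line = raw.strip()
        # Skip empty lines and comment lines
        if not line or line.startswith(("#", "%")):
            continue
        if line.startswith("@"):
            # Metadata line; only item-name declarations are used here
            if line.startswith("@ITEM="):
                item_id, name = line[len("@ITEM="):].split("=", 1)
                names[int(item_id)] = name
            continue
        sequence, itemset = [], []
        for token in line.split():
            value = int(token)
            if value == -1:      # end of the current itemset
                sequence.append(itemset)
                itemset = []
            elif value == -2:    # end of the sequence
                break
            else:
                itemset.append(value)
        sequences.append(sequence)
    # Map ids to names where a name is known, keep the id otherwise
    return [[[names.get(i, i) for i in itemset] for itemset in seq]
            for seq in sequences]


sample = [
    "@CONVERTED_FROM_TEXT",
    "@ITEM=1=the",
    "@ITEM=2=whale",
    "@ITEM=3=swam",
    "1 -1 2 -1 3 -1 -2",
    "2 -1 3 -1 -2",
]
print(parse_spmf_sequences(sample))
```

With a real file, the same function can be called on an open file object, since it only iterates over lines.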
In this report, I will talk about the 24th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2020), held from the 11th to the 14th of May 2020.
The PAKDD conference
PAKDD is a top international conference on data mining / big data in the Pacific-Asia part of the world. I have attended this conference several times and written reports about several editions of the conference. If you are interested, you can read these reports here: PAKDD 2014, PAKDD 2015, PAKDD 2017, PAKDD 2018 and PAKDD 2019.
As usual, the conference proceedings of PAKDD 2020 are published by Springer in the Lecture Notes in Artificial Intelligence (LNAI) series. This ensures that the proceedings are indexed in DBLP and other major indexes, and gives good visibility to papers.
This year, there were 628 submissions to PAKDD 2020. From those, 135 papers have been accepted, which means an acceptance rate of 21.5%.
The conference went online
This year, the PAKDD 2020 conference was planned to be held in Singapore. But due to the unforeseen COVID-19 pandemic around the world, the conference was held online instead. Part of the registration fee was reimbursed to the authors because the organizers saved money by holding the conference online. And of course, since the conference was online, all social events like the banquet and the reception were cancelled.
All authors were asked to submit a pre-recorded 13-minute video of their paper in 720p resolution with their slides, before the conference. Then, during the conference, authors had to be available to answer questions online after the presentation of their paper. Thus, each paper was allotted a total of 17 minutes. This is somewhat less than in previous years, where long presentations had about 30 minutes, if I remember well.
The conference could be accessed through the Zoom online meeting system. To attend the different sessions, a password was required, which was made available to registered attendees.
Some video etiquette tips were given to authors.
As for the proceedings, since the conference was online, they were made available for download from the conference website in PDF format.
Day 1 – Tutorials and workshop day
On the first day, there were 5 workshops and 2 tutorials.
I first went to have a look at the literature-based discovery workshop using Zoom. There were about 22 people in that workshop at 9:26 AM, watching a presentation about using evolutionary algorithms for matching biomedical ontologies.
Then, I popped into the Data Science for Fake News workshop at 9:40 AM to see how it was. Although it was supposed to start at 9 AM, the workshop had not started. Using the chatroom, I asked and was told that it was delayed until 10 AM (perhaps some technical problem or someone missing due to time zones?).
Thus, I went next to check the Game Intelligence & Informatics workshop at 9:50. There were about 11 people watching the presentations at 9:47 AM. Game intelligence is quite an interesting topic. Here is a screenshot from that workshop, where game strategies were analyzed:
Then, at 9:57 AM, I went to have a quick look at the Tutorial on Deep Explanations in Machine Learning via Interpretable Visual Methods, which was in the fourth parallel session. There were about 44 people watching it, so it seemed to be the most popular session. This topic is interesting as neural networks can be very effective but are mostly black-box models. In that tutorial, they talked about how to interpret such models, and they also discussed some other ways of interpreting knowledge in data mining, such as how to visualize association rules (screenshot below).
So far, all of this was quite interesting. And there were some good questions in the sessions that I attended.
In the afternoon at 2 PM, I attended the 9th Workshop on Biologically Inspired Data Mining (BDM 2020). This is a workshop that has been running for many years at PAKDD, and that I personally like as it covers various topics such as genetic algorithms, particle swarm optimization (PSO), ant colony optimization, and also applications of such algorithms. There were about 18 people attending the workshop at 2:11 PM. First, the organizer Shafiq Alam gave an overview of the motivations for biologically inspired data mining by explaining that optimization algorithms like genetic algorithms can be used to quickly find approximate solutions to hard problems, if we can accept losing a little bit of accuracy. Then, some results were presented about using PSO for clustering and recommendation. Then, there were some paper presentations, and a discussion about current trends.
At the same time in the afternoon, there was a Tutorial on deep Bayesian networks that had about 31 attendees at 2:19 PM, and a workshop on Learning Data Representations for Clustering, which had about 14 attendees at 2:21 PM. Overall, it seems that the tutorials were the most popular sessions during this first day.
From 8:30 to 9:00 AM, there was the conference opening. There were about 59 people in that session at 8:58 AM. Some awards were announced:
It was followed by a keynote from Prof. Bing Liu about open-world AI and “continual learning”, which discussed the need for software that can continuously learn. Here are a few slides:
This was followed by two industry talks, one by Usama Fayyad and another by Ankur Teredesai. Below are a few slides from the talk of A. Teredesai about AI for health. He discussed how data mining and AI can help healthcare. In particular, he talked about epidemiological models for diseases such as COVID-19. At 11:18 AM, there were about 27 people in that session. That talk was interesting, but there were some internet connection problems at some point, such that the audio was hard to hear for a few minutes. But then, it was OK.
Then, in the afternoon, there were paper presentations.
In the morning at 8:30 AM, there was a keynote talk by Inderjit S. Dhillon about multi-output prediction. There were about 42 people watching at 8:51 AM. Here is a screenshot of that talk:
In the afternoon, there was a keynote talk by Prof. Samuel Kaski titled “Data Analysis with Humans” about how humans can participate in the machine learning process. There were about 34 people attending the talk at 2:08 PM. He first illustrated that different problems (and methods) require different levels of human intervention.
Generally, the user can participate in different ways in the machine learning or data mining process.
First, the user can be a passive data source. Second, the user can participate more actively in the process of machine learning or data mining to guide the software program.
Here is a slide from approach 1).
And here is a slide from approach 2), where the user guides an AI program towards a solution.
Then, there were more slides and details but I did not take note of everything.
Then, after that, there were more paper presentations.
On Day 3, there was the most influential paper talk, a keynote talk by Prof. Jure Leskovec in the afternoon, and more paper presentations.
Papers about pattern mining
Now I will talk a little bit about papers related to pattern mining, as it is one of my topics of interest. I presented a paper about a new algorithm named LTHUI-Miner to discover high utility itemsets that are trending in non-predefined time periods in customer transaction databases. This is the work of my master's student:
Also another paper related to pattern mining that was published in PAKDD this year is about discovering frequent subsequences in a set of sequences using an algorithm called Tree-Miner:
Tree-Miner: Mining sequential patterns from SP-Tree. Redwan Ahmed Rizvee (University of Dhaka), Chowdhury Farhan Ahmed (University of Dhaka), Mohammad Fahim Arefin (University of Dhaka)
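As a small illustration of what a frequent subsequence is, the sketch below counts the support of a candidate pattern, that is, the number of sequences in which its items appear in the same order. This is a naive check written for illustration only. It is not the Tree-Miner algorithm, and the toy sequences and the minimum support value are made up.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order
    (not necessarily contiguously)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so the order is respected
    return all(item in remaining for item in pattern)


def support(pattern, sequences):
    """Number of sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences)


sequences = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "a", "d"],
    ["c", "d"],
]
min_support = 3  # hypothetical threshold

for pattern in (["a", "d"], ["a", "c", "d"], ["c", "d"]):
    count = support(pattern, sequences)
    print(pattern, count, "frequent" if count >= min_support else "infrequent")
```

Real sequential pattern mining algorithms avoid testing every candidate against every sequence in this way, which is why dedicated data structures are used.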
Is this online format a success?
Overall, the online format of this conference is fine. But I miss the social activities of an offline conference like the coffee breaks, where we can talk with other researchers to exchange ideas and meet new people. For me, this is one of the most interesting aspects of a conference.
Also, as a suggestion, it would have been nice if there was a playback feature to watch presentations that we have missed. In my case, I am in the same time zone as Singapore so it was convenient for me to watch the presentations, but I can imagine that people from some other countries (e.g. some parts of Canada with a 12-hour time difference) would have a harder time watching some presentations.
Special journal issues
Some papers were invited for a special issue in the JDSA journal. It is always interesting to be invited to a special issue. However, although this journal is published by Springer, a problem is that this journal is still quite new, and as such it is, to my knowledge, not indexed in databases like SCI or EI. In some countries, like the one where I work, this is important and papers that are not indexed do not have much value. So for this reason I had to decline the invitation to extend my paper. I would have preferred to be invited to a special issue in a more established journal.
In the call for papers, there was also a mention that some papers would be invited for an issue in the KAIS journal. This is a quite good journal, but apparently it was only for the few very best papers.
Overall, it was an interesting conference. Due to the virus situation, the conference was held online. The organizers managed to organize the conference very well in this situation. Looking forward to PAKDD 2021 next year.
Hi all, this is to announce that a new textbook in Thai has been published about pattern mining, which includes many examples using the SPMF software. The textbook named “Pattern Mining: Theory and Practice” is written by teacher Panida Songram from Mahasarakham University (Thailand) and can be used for teaching or self-learning, for students or practitioners. I have known the author for many years and I am very happy that she let me host a copy of the book that you can download from this link: Pattern Mining: Theory and Practice (PDF, 14.2 MB).
The book gives a good coverage of pattern mining. It explains algorithms but also contains many practical examples about how to use SPMF. Some key topics in the book are itemset mining, sequential pattern mining and multi-dimensional sequential pattern mining.
That is all I wanted to share for today. If you can read Thai, I highly recommend downloading this book. 😉
Today, I want to share with you the video presentation that I have prepared for my paper at PAKDD 2020. It presents a new problem where we want to discover locally trending high utility itemsets (LTHUIs). An LTHUI is a set of items purchased by customers that is trending (it generates money that follows an upward or downward trend) during some non-predefined time periods. It is a variation of the popular high utility itemset mining problem.
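To illustrate the notion of utility behind this problem, here is a minimal sketch that computes the utility (profit) of an itemset for each time period. It follows the usual definition from high utility itemset mining: purchase quantity times unit profit, summed over the transactions that contain the whole itemset. The data, the period labels and the function names are made up for illustration, and this is not the LTHUI-Miner algorithm.

```python
from collections import defaultdict

# Hypothetical unit profits of items
unit_profit = {"wine": 5, "cheese": 3, "bread": 1}

# Hypothetical transactions: (time period, {item: purchased quantity})
transactions = [
    (1, {"wine": 1, "cheese": 1}),
    (1, {"bread": 2}),
    (2, {"wine": 2, "cheese": 1, "bread": 1}),
    (3, {"wine": 3, "cheese": 2}),
    (3, {"wine": 1, "cheese": 1}),
]


def utility_in_transaction(itemset, purchased):
    """Profit generated by `itemset` in one transaction (0 if not contained)."""
    if not all(item in purchased for item in itemset):
        return 0
    return sum(purchased[item] * unit_profit[item] for item in itemset)


def utility_per_period(itemset):
    """Total utility of `itemset` in each time period."""
    totals = defaultdict(int)
    for period, purchased in transactions:
        totals[period] += utility_in_transaction(itemset, purchased)
    return dict(totals)


print(utility_per_period({"wine", "cheese"}))  # {1: 8, 2: 13, 3: 29}
```

In this toy example, the utility of wine and cheese grows from one period to the next, which is the kind of upward trend that the problem is about.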
Hope you will enjoy this video! If you want more details about this topic, you can read this paper:
Many researchers or students want to be successful researchers in their field. For this they make many sacrifices such as working long hours at the lab every day from morning to the evening. This is important because honestly, success comes with hard work. But it is important to still keep a good life balance to stay healthy. In this blog post, I will talk about the importance of having good life and work habits for researchers.
First, let me tell you a bit about my story. Since the start of my graduate studies, I have worked countless hours to improve myself. For example, during my master's degree and Ph.D. studies, I would basically not take any rest during the whole year, and work maybe 12 hours every day. That has allowed me to be successful in my field, receive big grants during my studies, publish many papers, and then land some good jobs in academia. Nowadays, as I have a family, I cannot work as much as when I was a student, but I still work hard, and I am much more efficient than I was before due to the skills that I have gained. For example, I can write a paper much more quickly. I still work very late at night almost every day.
Health is important
Now, what I have learnt over the years is that working is not everything. Health is also very important. Working for long hours at the lab can eventually bring several health problems like wrist, neck and back pain, and eye problems. Luckily, I do not have any major problems, but it is something to be aware of, as problems will typically appear later down the road.
First, it is important to eat healthy food.
Second, it is important to have a good posture while working. For example, it is worth finding a good chair for working, adjusting the height of the table and screen, and having an appropriate mouse and keyboard, to be comfortable.
Third, it is important to avoid sitting for too long, and to sometimes rest your eyes. Several studies have shown that sitting for long periods of time may lead to various diseases. Thus, every hour, it is good to stand up and go for a walk for a few minutes, for example.
Fourth, it is equally important to do some exercise every week. Even doing a few hours of exercise every week, like running, swimming or playing badminton, can make you feel better. I personally like to go running for 30 minutes to an hour every day.
Also, if you are tired or are always sitting on a chair, you may consider working in a standing position. I have recently started to do this, and it really feels great. I even wonder why I have not done this before! It is very good for the posture and the back. Here is a picture of my setup at home:
Some people recommend alternating between a standing and a sitting position to avoid getting tired. But personally, I have no problem working for several hours in a standing position. If you don't have a support like mine in the picture, you could just as well use some boxes to raise your computer higher.
Another good piece of advice is that if you are working on a laptop, you should consider using an external screen or external keyboard. The reason is that if you put your laptop low, then the keyboard will perhaps be at an appropriate height but the screen will be too low and you will have to bend your neck. On the other hand, if you put your laptop higher, the screen will be at an appropriate height for your eyes but the keyboard will be too high. Thus, using an external screen or keyboard can solve this problem.
In this blog post, I have discussed the importance of having some good life habits to be a healthy researcher and avoid health problems later in life. If you have some other suggestions related to this, please post them in the comment section below!
Today, I will write a short blog post just to give a list of some common errors that I observed recently in some journal and conference research papers:
Using a reference number as the subject of a verb. For example, “[1] proposed an algorithm” should be written as “Smith et al. [1] proposed an algorithm”.
When there is a shorter way of writing something, it should be used. For example, “in order to” should be replaced by “to“. Another example: “this new type of algorithm is” can be replaced by “this new algorithm type is“. Similarly, “A is an extension of B” can be replaced by “A extends B“. One should write concisely.
The title of a paper is too long. I recommend not having more than 10 words, and preferably fewer. I recently read a paper having a title with more than 20 words!
Using the word “we” too much. Generally, it is better to avoid using “we” as much as possible.
Using the words “you” or “I”. These words should never appear in a research paper.
I could say much more about this. Indeed, you can look at my other blog posts about writing research papers for more information. But my goal was just to remind you about some common errors!