Recently, I found that Kuldeep Singh, Shashank Sheshar Singh, Ajay Kumar, and Bhaskar Biswas from the Indian Institute of Technology (BHU) (India) have plagiarized my papers in a paper published in the IEEE TKDE (Transactions on Knowledge and Data Engineering) journal. I will explain this case of plagiarism below.
*** Important notice: Note that “Kuldeep Singh” is a very common name. This article refers to the Kuldeep Singh working at the Indian Institute of Technology (BHU) in Varanasi, India. It is not about other people with the same name working in Europe or other locations. ***
But first, let me explain what plagiarism is. There are two types. First, some people copy text word for word from another paper without using quotation marks and a citation. Journal editors can easily detect this using automatic tools like CrossCheck. Second, some people are more careful. They copy the ideas of another paper without citations and rewrite the text to avoid being detected. They then take the credit for ideas developed by another researcher. Most of the time, reviewers of top journals detect this, but sometimes it goes undetected. This is what happened in the TKDE paper that I will talk about today. That paper is:
Kuldeep Singh, Shashank Sheshar Singh, Ajay Kumar, and Bhaskar Biswas (2018) “CHN: an efficient algorithm for mining closed high utility itemsets with negative utility”, IEEE Transactions on Knowledge and Data Engineering.
What is wrong with that paper?
That paper proposes a new algorithm called CHN for discovering closed high utility itemsets with negative utility values. In it, the authors extend the EFIM-Closed algorithm that I proposed at the MLDM 2016 conference, but they never mention this. Basically, they copied several techniques from my EFIM and EFIM-Closed papers without saying that they were reusing these ideas. They even renamed some of these techniques (e.g. the “utility-bin” became the “utility array”) and rewrote the text. Thus, it appears as if Kuldeep Singh et al. proposed several of the techniques of EFIM-Closed, which is unacceptable. Some techniques have been adapted for the different problem, and there is a citation for some upper-bounds, but other techniques are exactly the same and are not cited.
What has been plagiarized?
I will list the content that has been plagiarized in the paper and provide screenshots of a side-by-side comparison of the papers.
1) On page 4 of the paper of Kuldeep Singh et al., they copy several definitions such as Property 3.1 and Property 3.2 from our FHN paper in KBS 2016.
2) In Section 4.1, they present two techniques: (1) transaction merging and (2) database projection. But those are the same as in the EFIM-Closed paper. The authors rewrote the text. They mention that they reuse a sorting technique from EFIM-Closed but fail to explain that basically the whole idea of this section is copied, unchanged, from our EFIM-Closed paper!
3) In Section 4.2, they pretend to use a new technique called “utility-array” to calculate the utility, support and upper-bounds of patterns. But basically, they just renamed the “utility-bin” technique of EFIM-Closed to “utility array” and rewrote the text. They copied the idea without citation and then used it to calculate utility and support in the same way, as well as some other upper-bounds.
4) In Section 4.4, the techniques for finding closed patterns are all copied from the EFIM-Closed paper without modifications. EFIM-Closed proposed to use backward/forward extension checking in utility mining, drawing inspiration from sequential pattern mining. Kuldeep Singh et al. rewrote the text, claimed that they were the first to do that, and just cited the sequential pattern mining paper that we cited in our paper.
5) In Section 4.5, they present their CHN algorithm, which incorporates the copied techniques and some other modifications. The pseudocode is very similar to that of EFIM-Closed since they extend that algorithm, but they never explain that EFIM-Closed is the basis of their algorithm.
6) Does the following figure look quite familiar?
7) Besides, it is interesting that in Section 4.2, the authors claimed to have proposed a new RTWU upper-bound, while in Section 3 they had already acknowledged that it was from another paper! It is actually from our FHN paper.
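For readers unfamiliar with the utility-bin idea mentioned in point 3, here is a rough Python sketch of how array-based bins can accumulate an upper bound such as the TWU (transaction-weighted utilization) of every item in a single pass over the database. It is a simplified illustration and not code from either paper: the function name twu_with_bins, the (item, utility) transaction format and the toy data are assumptions made for this example, and it only handles positive utilities.

```python
def twu_with_bins(transactions, num_items):
    """Accumulate the TWU of each item using one array slot (bin) per item.

    `transactions` is assumed to be a list of transactions, each a list of
    (item_id, utility) pairs with item ids in range(num_items).
    """
    bins = [0] * num_items  # one bin per item, initialized to zero
    for transaction in transactions:
        # Transaction utility: sum of the utilities of its items.
        tu = sum(utility for _, utility in transaction)
        # Add the transaction utility to the bin of every item it contains.
        for item, _ in transaction:
            bins[item] += tu
    return bins


# Hypothetical toy database with three items (0, 1, 2).
transactions = [
    [(0, 5), (1, 2)],          # transaction utility 7
    [(1, 4), (2, 3)],          # transaction utility 7
    [(0, 1), (1, 1), (2, 6)],  # transaction utility 8
]
print(twu_with_bins(transactions, 3))
```

The point of the technique is that a single array indexed by item, filled in one scan of the database, replaces more complex structures for computing such values.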
So is there any new contribution in that TKDE paper?
To answer that question, I decided to search a little more, and I found that the authors had proposed an algorithm for high utility mining with negative utility called EHIN in the Expert Systems journal, also in 2018:
Singh, K., Shakya, H. K., Singh, A., & Biswas, B. (2018). Mining of high-utility itemsets with negative utility. Expert Systems, e12296. doi:10.1111/exsy.12296
So what is the difference between the two papers of Kuldeep Singh et al.? The only difference is the technique for checking that an itemset is closed using forward and backward extensions. But as I have shown above, this technique is copied from our EFIM-Closed paper in Section 4.4 without citations. Thus, there is basically nothing new in the TKDE paper.
Another question is whether Kuldeep Singh et al. cite their Expert Systems paper correctly. They include a citation (see below), but they do not explain that the TKDE paper is almost the same as their Expert Systems paper.
Who are the authors?
Kuldeep Singh, Shashank Sheshar Singh, Ajay Kumar, and Bhaskar Biswas work in Computer Science and Engineering at the Indian Institute of Technology (BHU), Varanasi, India 2210
What will happen?
As usual, when I find a case of plagiarism, I report it to the journal. I have thus sent an e-mail to the editor of TKDE to report this case, and filed a formal complaint with IEEE to ask that they retract the paper as soon as possible.
I also sent an e-mail to the dean of the IIT (BHU) to let them know what happened, as this also concerns their institution.
What are the lessons to be learned?
In general, there is no problem with a researcher extending the algorithm of another researcher. It happens all the time. The problem arises when a researcher copies the ideas of another algorithm without citing them, in order to take credit for these ideas. This is what Kuldeep Singh et al. did in that TKDE paper. They extended EFIM-Closed with a few ideas to support negative utility values. That would have been fine if it had been explained. However, the authors instead chose to copy several techniques without citing them or mentioning that EFIM-Closed was extended.
Another lesson is that plagiarism can happen even in the top journals (TKDE is among the top 5 journals in data mining).
I hope you have learned something from this blog post. That is all for today.
Philippe Fournier-Viger is a professor, data mining researcher and the founder of the SPMF data mining software, which includes more than 100 algorithms for pattern mining.