The Semantic Web and why it failed.

In this blog post, I will talk about the vision of the Semantic Web that was proposed in the 2000s, and why it failed. Then, I will talk about how it has been replaced today by data mining and machine learning techniques.

What is the Semantic Web?

The concept of the Semantic Web was proposed as an improved version of the World Wide Web. The goal was to create a Web where intelligent agents would be able to understand the content of webpages in order to provide useful services to humans or interact with other intelligent agents.

To achieve this goal, there was however a big challenge: most of the data on the Web is unstructured text. Thus, it was considered difficult to design software that could understand and process this data to perform meaningful tasks.

Hence, the vision of the Semantic Web proposed in the 2000s was to use various languages to add metadata to webpages, which would then allow machines to understand the content of webpages and reason about it.

Several languages were designed, such as RDF, OWL Lite, OWL DL and OWL Full, as well as query languages like SPARQL. Knowledge described using these languages is called an ontology. The idea of an ontology is to describe, at a very high level, the various concepts occurring in a document, such as car, truck, and computer, and then to link these concepts to webpages or other resources. Then, based on these ontologies, a software program could use a reasoning engine to reason about the knowledge in webpages and perform various tasks, such as finding all car dealers in a city that sell second-hand blue cars.
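To make the car dealer example more concrete, here is a minimal Python sketch. It is only an illustration under strong assumptions: knowledge is stored as plain (subject, predicate, object) triples in the spirit of RDF, all names (dealer1, sells, CityA, and so on) are hypothetical, and the "query" is simple pattern matching rather than real RDF, OWL or SPARQL processing with logical inference.

```python
# Toy knowledge base of (subject, predicate, object) triples, in the spirit of RDF.
# Every identifier below (dealer1, sells, CityA, ...) is hypothetical.
TRIPLES = {
    ("dealer1", "type", "CarDealer"),
    ("dealer1", "locatedIn", "CityA"),
    ("dealer1", "sells", "car1"),
    ("car1", "color", "blue"),
    ("car1", "condition", "second-hand"),
    ("dealer2", "type", "CarDealer"),
    ("dealer2", "locatedIn", "CityA"),
    ("dealer2", "sells", "car2"),
    ("car2", "color", "red"),
    ("car2", "condition", "second-hand"),
    ("dealer3", "type", "CarDealer"),
    ("dealer3", "locatedIn", "CityB"),
    ("dealer3", "sells", "car3"),
    ("car3", "color", "blue"),
    ("car3", "condition", "second-hand"),
}


def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is a known triple."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def dealers_selling(triples, city, color, condition):
    """Find the car dealers in `city` that sell a car with the given color and condition."""
    dealers = {s for (s, p, o) in triples if p == "type" and o == "CarDealer"}
    matches = []
    for dealer in sorted(dealers):
        if (dealer, "locatedIn", city) not in triples:
            continue
        for car in objects(triples, dealer, "sells"):
            if (car, "color", color) in triples and (car, "condition", condition) in triples:
                matches.append(dealer)
                break  # one matching car is enough
    return matches


print(dealers_selling(TRIPLES, "CityA", "blue", "second-hand"))  # prints ['dealer1']
```

Note that this only works because every dealer uses exactly the same vocabulary (locatedIn, sells, color), and because someone wrote all the triples by hand.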

The fundamental problems of the Semantic Web

So what was wrong with this vision of the Semantic Web?  Many things:

  • The languages for encoding metadata were too complex. The languages proposed for adding metadata to webpages and resources were difficult to use, and encoding metadata was time-consuming and prone to errors. Despite the availability of some authoring tools, describing knowledge was not easy. I learned to use OWL and RDF during my studies, and it was complicated, as OWL is actually based on formal logic. Thus, learning OWL requires training, and it is very easy to use the language incorrectly if one does not understand the semantics of its operators. It was thus wrong to think that such a complicated language could be used at a large scale on the Web.
  • The reasoning engines based on logic were slow and could not scale to the size of the Web. Languages like OWL are based on logic, in particular description logics. Why? The idea was that this would make it possible to use inference engines to perform logical reasoning on the knowledge found in webpages. However, most of these inference engines are very slow. When I worked on my master's thesis in the 2000s, reasoning on an OWL file with a few hundred concepts using state-of-the-art inference engines was already slow. This could clearly not scale to the size of the Web, with its billions of webpages.
  • The languages were very restrictive. Another problem is that since some of these languages were based on logic, they were very restrictive. They worked fine for describing very simple knowledge, but it was very hard to properly model something complicated, and in many cases the language simply did not allow describing it.
  • Metadata is not neutral, and can easily be tweaked to “game” the system. The concept of adding metadata to describe objects can work in a controlled environment, such as describing books in a library. However, on the Web, malicious people can try to game the system by writing incorrect metadata. For example, a website could write incorrect metadata to achieve a higher ranking in search engines. Based on this, it is clear that relying on metadata added to webpages cannot work on the open Web. This is actually the reason why most search engines today do not rely much on metadata to index documents.
  • Metadata quickly becomes obsolete and needs to be constantly updated.
  • Metadata interoperability between many websites or institutions is hard. The whole idea of describing webpages using some common concepts to allow reasoning may sound great. But a major problem is that various websites would then have to agree to use the same concepts to describe their webpages, which is very hard to achieve. In real life, what would happen instead is that a lot of people would describe their webpages in inconsistent ways, and the intelligent agents would not be able to reason with these webpages as a whole.

For these reasons, the Semantic Web was never achieved as envisioned (that is, by describing webpages with metadata and using inference engines based on logic).

What has replaced that vision of the Semantic Web?

In the last decades, we have seen the emergence of data mining (also called big data or data science) and machine learning. Using data mining techniques, it is now possible to directly extract knowledge from text. In other words, it has become largely unnecessary to write metadata and knowledge by hand using complicated authoring tools and languages.

Moreover, using predictive data mining and machine learning techniques, it has become possible to automatically perform complex tasks with text documents without even having to extract knowledge from these documents. For example, there is no need to specify an ontology or metadata about a document to be able to translate it from one language to another (although this requires some training data from other documents). Thus, the focus has shifted from reasoning with logic to using machine learning and data mining techniques.

It has to be said, though, that the languages and tools that were developed for the Semantic Web have had some success, but at a much smaller scale than the Web. For example, they have been used internally by some companies. Research about logic, ontologies and related concepts is also active, and there are various applications of those concepts, and challenges that remain to be studied. But the main point of this post is that the vision that these technologies would be used at the scale of the Web to create the Semantic Web did not happen. However, some of these technologies can be useful at a smaller scale (e.g. reasoning about books at the library).

So this is all I wanted to discuss for today. Hope this has been interesting 😉 If you want to read more, there are many other articles on this blog, and you can also follow this blog on Twitter @philfv.


Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 145 data mining algorithms.


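Postdoctoral positions in data mining, location: Shenzhen, China

The Industrial Design Research Center of the Harbin Institute of Technology (Shenzhen) is recruiting two postdoctoral researchers to carry out research on data mining / big data.

Harbin Institute of Technology (Shenzhen)

Requirements:

  • a PhD in Computer Science,
  • a strong research background in data mining or artificial intelligence,
  • papers published in top conferences or journals in data mining or artificial intelligence,
  • a strong interest in the development and application of data mining algorithms,
  • a PhD from a 211/985 university or from a good university abroad is preferred.

The successful applicant will:

  • work on topics related to time series and spatial sequences, or on other theoretical or industrial applications related to data mining (the exact topic will be determined after discussion, based on the strengths of the applicant).
  • join the excellent research team led by Prof. Philippe Fournier-Viger, the founder of the popular data mining library SPMF, who collaborates closely with excellent researchers from other fields.
  • work in a laboratory with advanced equipment (high-end workstations, a server cluster for big data research, GPU servers, virtual reality equipment, body sensors, etc.).
  • be hired for two years at an annual salary of 176,000 RMB (of which 51600 comes from the university and 120,000 from the Shenzhen government). Please note that postdoctoral researchers do not pay any tax on their salary, and the university provides low-cost rental apartments (about 1500 per month, which greatly reduces housing costs).
  • work at one of the top 50 universities in the world for computer science, and one of the top 10 universities in China.
  • work in Shenzhen, one of the fastest-growing cities in southeastern China, with low pollution, a warm climate all year round, and close to Hong Kong.

If you are interested in this position, please send your detailed CV (including a list of publications and references) as soon as possible to Prof. Philippe Fournier-Viger (philfv8@yahoo.com ). Applications are accepted for the 2018 or 2019 postdoctoral positions.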


How to run SPMF without installing Java?

The SPMF data mining software is a popular open-source software for discovering patterns in data and for performing other data mining tasks. Typically, to run SPMF, Java must be installed on the computer. However, it is possible to run SPMF on a computer that does not have Java installed. For example, this can be useful to run SPMF on a Windows computer where the security policy does not allow installing Java. I will explain how to achieve this, and discuss alternative ways of running a Java program either without installing Java or by installing it automatically.

Method 1: Packaging the application in a .exe file with the Java Runtime Environment

This is one of the most convenient approaches. To do this, one may use commercial software such as Excelsior JET (https://www.excelsiorjet.com/). This software is not free but provides a 90-day full-featured evaluation period. Using this software, we can choose a jar file such as spmf.jar. Then, Excelsior JET packages the Java Runtime Environment with that application in a single .exe file. Thus, a user can click on the .exe file to run the program just like any other .exe program.

To try a 32-bit version of SPMF 2.30 that has been packaged in an .exe file using Excelsior JET, you can download this file:  spmf_jet.exe  (2018-04-02)
However, note that I generated this file for testing purposes and will not update it for each future release of SPMF.

While trying Excelsior JET, I made a few observations:

  • If we want to generate a .exe file that can be used on 32-bit computers, we should make sure to package a 32-bit version of the Java Runtime Environment in the .exe file (instead of the 64-bit JRE). This means that the 32-bit version of the Java Runtime Environment must be installed on the computer used to create the .exe file.
  • Although the software does what it should do, it sometimes slows down the application. I assume that this is because files must be uncompressed from the .exe file.
  • Packaging the runtime environment increases the size of your application. For example, the spmf.jar file is about 6 megabytes, while the resulting .exe file is about 15 megabytes.
  • Although a Java application is transformed into an .exe file, it still uses the Java memory management model based on a garbage collector. Thus, the performance of the .exe should not be compared with that of a native application developed in a language such as C/C++.

Method 2: Using a Java compiler such as GCJ

There exist Java compilers such as GNU GCJ (http://gcc.gnu.org/) that can compile a Java program to a native .exe file. I have previously tried to compile SPMF using GCJ. However, it failed since GCJ does not completely support Swing and AWT user interfaces, nor some advanced features of Java. Thus, the graphical user interface of SPMF and some other classes could not be compiled using GCJ. In general, GCJ can be used for compiling simple command-line Java programs.

Method 3: Using JSmooth to automatically detect the Java Runtime Environment or install it on the host computer

An alternative called JSmooth (http://jsmooth.sourceforge.net/ ) makes it possible to create an .exe file from a Java program. Unlike Excelsior JET, JSmooth does not package the Java Runtime Environment in the .exe file. The .exe file is instead designed to search for a Java Runtime Environment on the computer where it is run, or to download it for the user. I did not try it, but it seems like an interesting solution. However, if it is run on a public computer where Java is missing, this approach may fail, as it then requires installing Java on the computer, and local security policies may prevent the installation of Java.
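To give an idea of what "searching for a Java Runtime Environment" can look like, here is a minimal Python sketch. It is not how JSmooth itself is implemented: it simply assumes that a JRE, if present, can be found either through the JAVA_HOME environment variable or through the directories listed in PATH. The function name find_java is hypothetical.

```python
import os
import shutil


def find_java(env=None, which=shutil.which):
    """Return the path of a `java` executable, or None if none is found."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        # Assumption: a JRE pointed to by JAVA_HOME has its launcher in the bin folder.
        exe_name = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe_name)
        if os.path.isfile(candidate):
            return candidate
    # Fall back to searching the directories listed in PATH.
    return which("java")


java = find_java()
if java is None:
    print("No Java Runtime Environment found; a wrapper would have to install one.")
else:
    print("Java found at:", java)
```

If a path is returned, a launcher could then start the application with a command such as java -jar spmf.jar. If nothing is found, the only remaining options are to install Java (which may be blocked by security policies) or to ship a runtime with the application, as in Method 1.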

Method 4: Installing a portable version of the Java Runtime Environment on a USB stick to run a .jar file

There exist portable programs called jPortable and jPortable Launcher (download here: https://portableapps.com/apps ) to easily install a Java Runtime Environment on a USB stick. Then, the jPortable Launcher software can be used to launch a .jar file containing a Java application.

Although this option seems very convenient, as it is free and does not require installing Java on a host computer, the installation of jPortable failed on my computer because it was unable to download the Java Runtime Environment. It appears that the reason is that the download URL is hard-coded in the application and that jPortable has not been updated for several months.

Conclusion

There might be other ways of running Java software on a host computer without installing Java. I have only described the ones that I have tested or considered. If you have other suggestions, please share your ideas in the comments section, below.



KDD 2018 workshop on utility-driven mining

In this blog post, I will present the upcoming 1st International Workshop on Utility-Driven Mining, which will be held in conjunction with the 24th ACM SIGKDD Conference (KDD 2018), on August 20, 2018 in London (United Kingdom).

This KDD workshop is co-organized by Vincent S. Tseng, Philip S. Yu, Jerry Chun-Wei Lin and me (Philippe Fournier-Viger). The workshop aims at bringing together academic and industrial researchers and practitioners from data mining, machine learning and other interdisciplinary communities who have done work related to the topic of utility-driven mining.

The topic of this workshop is very timely as there is more and more research on the topic of utility mining.

The important dates for submitting a paper to the workshop are as follows:

  • Paper submissions due: May 8, 2018
  • Paper notifications: June 8, 2018
  • Workshop date: August 20, 2018

Papers must be formatted according to the format of research papers for the KDD 2018 conference, that is, no more than 9 pages using the standard ACM conference template.

For more details, you may visit the website of the workshop: http://philippe-fournier-viger.com/utility_mining_workshop_2018/index.html



Plagiarism by Sudhir Mohod and Sharda Khode from Bapurao Deshmukh College (BDCE)

Today, I have found yet another case of plagiarism from India, where some people have again plagiarized my papers.  Since I do not have a lot of time, I will write briefly about this case.

The plagiarists are Sudhir Mohod and Sharda Khode from the Bapurao Deshmukh College of Engineering (BDCE), Sevagram in Nagpur, India (http://www.bdce.edu.in/ ).

The paper titled “Mining High Utility Itemsets using TKO and TKU to find Top-k High Utility Web Access Patterns”  was published in the proceedings of the International Conference on Electronics, Communication and Aerospace Technology (ICECA 2017), and appears in IEEE Xplore ( http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8203736&tag=1 ). Here is a screenshot of the first page:

[Screenshot: first page of the paper by Sudhir Mohod and Sharda Khode]

The paper has plagiarized the paper by Tseng et al. published in TKDE (2016), of which I am a co-author ( http://www.philippe-fournier-viger.com/TKDE_TKO.pdf ), and which proposed the TKO and TKU algorithms.

Sudhir Mohod and Sharda Khode have plagiarized several paragraphs of our paper and even claim in the abstract to have proposed the TKO and TKU algorithms, which were proposed by us. Here I highlight some large paragraphs that have been plagiarized from our paper:

Who are Sudhir Mohod and Sharda Khode?

Sudhir Mohod works at the Bapurao Deshmukh College of Engineering as assistant professor and head of department. His e-mail is:  sudhir_mohod@rediffmail.com .  Additional information about Sudhir Mohod (also called S. W. Mohod) can be found on the college webpage:

[Screenshot: college webpage about S. W. Mohod]

Sharda Khode is also an assistant professor at that college. Her e-mail is:  Sharda.khode16@gmail.com . On the webpage of the Bapurao Deshmukh College, we can find additional information about S. K. Khode:

[Screenshot: college webpage about S. K. Khode (Sharda Khode)]

Conclusion

I will of course report this case of plagiarism to the institution where Sudhir Mohod and Sharda Khode work, and also to the IEEE using their Copyright Infringement Form (http://ieeexplore.ieee.org/xpl/copyrightInfringement.jsp ) so that the paper is retracted.  Hopefully, the Bapurao Deshmukh College of Engineering (BDCE)  will also take appropriate action to punish the authors of this plagiarized paper.

Some other recent cases of plagiarism that I have reported from India are below:

Update 2018-03-27

Today, someone from the IEEE Intellectual Property Rights Office replied to acknowledge that they received my complaint, and asked for a copy of my paper. They say that they will send the documents to a committee that will compare the papers and take appropriate action afterwards. I hope that this process will be completed quickly so that the paper by  Sudhir Mohod and Sharda Khode is retracted, as it should be.


Plagiarism by Kalli S N Prasad and S Venkata Suryanaryana at GVIT College Bhimavaram (affiliated to JNTUK) and CVR college

In this blog post, I will again report a case of my papers being plagiarized by researchers from India. This is the second time in less than a month that I find that someone has plagiarized my papers. This time, my papers are plagiarized by a team of three professors, including S Venkata Suryanaryana from the CVR College of Engineering, and  Kalli S N Prasad, the principal of the computer science department of the GVIT college.  This college is affiliated to the Jawaharlal Nehru Technological University (JNTUK), Kakinada.

[Image: Kalli S N Prasad]

What is the paper?

The paper was written by three Indian professors.  The first one is Professor  Kalli S N Prasad  from the GVIT College, Bhimavaram in India.  The second one is Associate Professor  S Venkata Suryanaryana from the CVR College of Engineering in Hyderabad. The third one is Associate Professor D. Srikar of the GVIT College, Bhimavaram.

The paper containing plagiarism is:

Prasad, Kalli SN, S. Venkata Suryanaryana, and D. Srikar. “Efficient Mining Method for Maximal Closed Frequent Sequences without Candidate Generation.” (2017).  International Journal of Engineering Technology  Science and Research (IJETSR) www.ijetsr.com, ISSN 2394 – 3386, Volume 4, Issue 10, October 2017

Here is a screenshot of that paper:

[Screenshot: abstract of the paper by Kalli S N Prasad, GVIT College, Bhimavaram]

On that screenshot, I just show the abstract of the paper. We can already see that the greater part of the abstract consists of sentences copied word for word from my survey paper about sequential pattern mining published in the DSPR journal (my paper is here: http://www.philippe-fournier-viger.com/dspr-paper5.pdf) .

Then, many parts of the paper by Kalli S N Prasad plagiarize my papers. They claim to propose a new algorithm that they call MCFS. But they just copied the pseudocode of the MaxSP sequential pattern mining algorithm that I published in 2014 in this paper: http://www.philippe-fournier-viger.com/ADMA2013_MaxSP_maximal_sequential_patterns.pdf . They basically took my algorithm, changed the name, and copied text from my papers.

[Screenshot: excerpt of the paper by Kalli S N Prasad]

The rest of the paper is similar.

Who are Kalli S N Prasad and S. Venkata Suryanaryana?

Kalli S N Prasad is a professor and the principal of the Computer Science and Engineering department at a college called GVIT college (Grandhi Varalakshmi Venkata Rao) in  Bhimavaram, India. The website of the GVIT college is http://www.gvit.ac.in/ .  On the page of the Department of Computer Science and Engineering, we can find the profile of that professor and his colleague S. Venkata Suryanaryana, who plagiarized my paper.

[Screenshot: webpage of the GVIT College, Bhimavaram]

From that webpage, we can find that the e-mail of S. Venkata Suryanaryana is:  suryabcu@gmail.com

Moreover, the e-mail of Kalli S N Prasad is: principal@gvvit.org

Relationship with the CVR College Of Engineering

I have also found that S. Venkata Suryanaryana is associated with a college called the CVR College of Engineering ( http://cvr.ac.in/it/home.php ). On the Research page of that college, we can find a mention of his plagiarized paper.

[Screenshot: Research page of the CVR College of Engineering]

On the website of the CVR College Of Engineering, we can find various information about S. Venkata Suryanaryana, including his e-mail ( suryanarayana@cvr.ac.in ), and his CV ( http://cvr.ac.in/it/suryanarayana.php ) :

[Screenshot: CVR College page about S Venkata Suryanarayana]

And another picture (http://cvr.ac.in/it/home.php ) :

[Image: S Venkata Suryanarayana]

What will happen?

This is just another case of plagiarism. I will of course report it to the top level of that college, and also to the affiliated  Jawaharlal Nehru Technological University (JNTUK), Kakinada ( http://www.jntuk.edu.in) ,  and perhaps also to the ministry of education.

Moreover, I will also ask for the paper to be retracted from the journal where it is published.  As I said, this is not the only case of plagiarism from India. I have reported many cases in recent years. In many of them, the professors have not even been punished and are still working at their respective colleges as if nothing happened. The four most recent cases are explained in the following blog posts:

Update 2018-03-13.  About two weeks after informing the JNTUK, the GVIT college, and the CVR College of Engineering about this serious case of plagiarism, I have not received any answer yet. It seems that they are just pretending that nothing happened, and that no punishment will be given.  If I still receive no answer next week, I will have to consider contacting the ministry of education.


Comparing Two LaTeX documents with Latexdiff

Many researchers use LaTeX to write research papers instead of Microsoft Word. I previously wrote a blog post about the reasons for using LaTeX to write research papers. Today, I will go into more detail about LaTeX and talk about a nice tool for comparing two LaTeX files to see the differences. The goal is to see the changes made between two files, as they would appear when comparing two revisions of a document using Microsoft Word.  Comparing two documents is very useful for researchers, for example, to highlight the changes that have been made to improve a journal paper when it is revised.

We will use the latexdiff tool.   This tool is a Perl script and can be used on both Windows and Linux/Unix.   I will first explain how to use it on Windows with MiKTeX.

Using latexdiff on Windows with the MiKTeX LaTeX distribution

Consider that you want to compare two files called  v1.tex and v2.tex.

Step 1.  Open the MiKTeX Package Manager and install the package “latexdiff”.

As a result, latexdiff.exe  will be installed in  the directory \miktex\bin\ of your MiKTeX installation.

Step 2.  From the \miktex\bin\  directory, you can now run the following command using the command-line interface:

 latexdiff v1.tex v2.tex > out.tex

This will create a new LaTeX file called out.tex that highlights the differences between your two LaTeX files.  You can then compile out.tex using LaTeX. The result is illustrated below:

[Image: compiled out.tex with the differences highlighted]

Note 1: If you want to use latexdiff in any directory (not just in \miktex\bin\), you should add the path of the directory \miktex\bin\ to the PATH environment variable of Windows.

Note 2: A lot of things can go wrong when installing and using latexdiff on Windows. First, the encoding of your LaTeX files may cause some problems.  It took me several tries before I could make it work on some LaTeX files because they were not encoded in UTF-8.   I first had to convert my files to UTF-8. Also, for some LaTeX files, the output of latexdiff may not compile. Thus, I had to fix the output by hand.  But latexdiff has a lot of command-line parameters that can perhaps be used to fix these problems if you encounter them.
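For the encoding problem mentioned in Note 2, here is a minimal Python sketch of one way to convert a file to UTF-8 before running latexdiff. It assumes that you know the current encoding of the file (latin-1 is used here only as an example). The function name convert_to_utf8 and the file names are hypothetical. Many text editors can also do this conversion.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="latin-1"):
    """Re-encode the file `src` from `source_encoding` to UTF-8 and write it to `dst`."""
    raw = Path(src).read_bytes()
    # Decoding with the wrong encoding may raise an error or silently
    # produce wrong characters, so check the result before using it.
    text = raw.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


# Small demonstration on a temporary file (the file names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "v1.tex"
    new = Path(tmp) / "v1_utf8.tex"
    old.write_bytes("R\u00e9sum\u00e9 of the changes\n".encode("latin-1"))
    convert_to_utf8(old, new)
    print(new.read_bytes())  # prints b'R\xc3\xa9sum\xc3\xa9 of the changes\n'
```

The file is read and written as bytes so that line endings are left untouched. After converting both versions of a document, latexdiff can be run on the converted files as described in Step 2.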

Installing Latexdiff on other platforms

To install latexdiff on other platforms, you should first make sure that Perl is installed on your machine. Perl can be downloaded from here: http://www.perl.org/get.html . Then, you should download the latexdiff package from CTAN to install it:  https://ctan.org/tex-archive/support/latexdiff

I will not provide further details about this because I did not install it that way.

Conclusion

In this blog post, I have shown how to use latexdiff, a very useful tool for researchers who are writing their papers using LaTeX.



The conference that tolerates up to 20 % plagiarism

I often receive e-mails with calls for papers from unknown conferences and journals.  Usually, I ignore these e-mails. But sometimes, I open them just to have a look. Today, I received a very interesting e-mail from the “International Conference on New Trends in Engineering & Technology”, which states that it tolerates up to 20% plagiarism in submitted papers. Here is the e-mail:

Brahma A <ank22@ascsp.co.in>
To:icntet@icntet.org
Bcc: XXXXXXX@gmail.com
Feb 17 at 12:11 PM

Dear Sir/Madam,

GRT College welcomes you to participate in “International Conference on New Trends in Engineering & Technology”. Our motto is to transform the ideas in to reality, for that we need to improvise the idea and get suggestions as well as reference. So we are coming up with the “conference short name” to share, realize and feel the ideas projected. We will give your idea a reputation if it is found to be true and plagiarism does not hold more than 20%. We welcome your participation.

All accepted, registered and projected papers will be submitted for inclusion in IEEE Xplore Digital Library.

Last date of paper submission is extended to: 22nd Feb. 2018.

Conference Venue: GRT Institute of Engineering and Technology. Tirupathi Highway, Tiruvallur Dist, Chennai, Tamil Nadu, India.

Please circulate the same to in your group.

Further Details: http://shorl.com/vygrehobryjunu

Of course, plagiarism is unacceptable. In my opinion, this must be a bad translation, and the conference organizers probably meant “self-plagiarism” instead of “plagiarism”. This shows that sometimes it is important to choose the words carefully when writing.



Subgraph mining datasets

In this post, I will provide two standard benchmark datasets that can be used for frequent subgraph mining. Moreover, I will provide a set of small graph datasets that I have created for debugging subgraph mining algorithms.

The format of graph datasets

A graph dataset is a text file which contains one or more graphs.  A graph is defined by a few lines of text in the following format (used by the gSpan algorithm):

  • t # N    This is the first line of a graph. It indicates that this is the N-th graph in the file
  • v M L     This line defines the M-th vertex of the current graph, which has a label L
  • e P Q L   This line defines an edge, which connects the P-th vertex with the Q-th vertex. This edge has the label L
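To illustrate the format described above, here is a minimal Python sketch of a parser. It is only an example, not the code used by gSpan or SPMF. It assumes that vertex numbers and labels are integers, as in the small datasets of this post, and the function name parse_graphs is hypothetical. The sample text is the content of the single_graph1.txt file shown further down in this post.

```python
def parse_graphs(text):
    """Parse graphs written in the 't # N' / 'v M L' / 'e P Q L' format."""
    graphs = []
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "t":
            # "t # N": start the N-th graph
            current = {"id": int(parts[2]), "vertices": {}, "edges": []}
            graphs.append(current)
        elif current is None:
            raise ValueError("vertex or edge line found before any 't # N' line")
        elif parts[0] == "v":
            # "v M L": vertex M has label L
            current["vertices"][int(parts[1])] = int(parts[2])
        elif parts[0] == "e":
            # "e P Q L": edge between vertices P and Q with label L
            current["edges"].append((int(parts[1]), int(parts[2]), int(parts[3])))
    return graphs


SAMPLE = """t # 1
v 0 10
v 1 11
v 2 12
e 0 1 21
e 2 1 21
"""

graph = parse_graphs(SAMPLE)[0]
print(graph["vertices"])  # prints {0: 10, 1: 11, 2: 12}
print(graph["edges"])     # prints [(0, 1, 21), (2, 1, 21)]
```

Each graph is stored as a dictionary mapping vertex numbers to labels, plus a list of (P, Q, L) edges. This is enough for small debugging tasks such as checking that a file was read correctly.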

Five small datasets

Here are five small datasets that I have created for debugging frequent subgraph mining algorithms. Each dataset contains a single graph, which is enough for some small debugging tasks.

1) single_graph1.txt 

Content of the file:

t # 1
v 0 10
v 1 11
v 2 12
e 0 1 21
e 2 1 21

Visual representation:

(L10) ---L21--- (L11) ---- L21 ---- (L12)
  0              1                   2

2) single_graph2.txt

Content of the file:

t # 1
v 0 10
v 1 11
v 2 10
v 3 10
e 0 1 21
e 2 1 21
e 1 3 21

Visual representation:

(L10) --- L21 --- (L11) --- L21 ---- (L10)
  0                 1                  2
                    |
                    |
                   L21
                    |
                    |
                  (L10)3

3) single_graph3.txt

Content of the file:

t # 1
v 0 10
v 1 10
v 2 10
e 0 1 20
e 1 2 20
e 2 0 20

Visual representation:

(L10) --- L20 --- (L10)
  0 \               / 1
     \             /
     L20         L20
       \         /
        \       /
         (L10)
           2

4) single_graph4.txt

Content of the file:

t # 1
v 0 10
v 1 10
v 2 11
v 3 11
e 0 1 21
e 0 2 20
e 1 3 20
e 2 3 22
e 1 2 23

Visual representation:

    (L10) ------- L20 ------ (L11)
      |                    /   |
      |                 /      |
      |              /         |
      L21          /           |
      |         L23           L22
      |        /               |
      |      /                 |
      |    /                   |
      |  /                     |
    (L10) ------ L20 -------- (L11)

5) single_graph5.txt

Content of the file:

t # 1
v 0 10
v 1 10
v 2 11
v 3 11
e 0 2 20
e 1 3 20
e 1 2 20

Visual representation:

(L10) --- L20 --- (L11) --- L20 --- (L10) --- L20 --- (L11)
  0                 2                 1                 3

Two standard benchmark datasets

Moreover, here are two popular datasets that are used in frequent sub-graph mining (I have obtained them from the GitHub website):

Conclusion

In this blog post, I have shared some helpful datasets.  If you want to know more about subgraph mining, you may read my short introduction to subgraph mining.



Plagiarism by Divvela Srinivasa Rao at Lakireddy Balireddy College of Engineering (LBRCE)

In this blog post, I will report on another serious case of plagiarism from India. I have recently found that Divvela SRINIVASA Rao and Kilaru Gowthami  from the Lakireddy Balireddy College of Engineering in Mylavaram, India have  plagiarized my work in a journal paper.

Which paper?

The paper that contains plagiarism is this one:

Divvela SRINIVASA Rao and Kilaru Gowthami (2017). Discovering Periodic high-utility item sets from transactional databases. IOSR Journal of Computer Engineering (IOSR-JCE), e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 19, Issue 4, Ver. III (Jul.-Aug. 2017), pp. 33-42

Here is a screenshot of the title and abstract of the paper:

[Screenshot: title and abstract of the paper by Divvela SRINIVASA Rao and Kilaru Gowthami]

In that paper, Divvela SRINIVASA Rao and Kilaru Gowthami  copied text, algorithms and figures from my paper and claimed to have proposed the PHM algorithm, which I proposed a year before at the Industrial Conference on Data Mining 2016. The paper of Rao and Gowthami is very poorly written. It contains many typos and errors, and the figures and tables are blurry, as some of them were copied using screenshots. The paper was published in a journal named  IOSR Journal of Computer Engineering (IOSR-JCE) by a publisher called IOSR journals. The fact that they have accepted such a poorly written paper containing plagiarism raises questions about the quality of the journals from that publisher. Actually, I had never heard about that publisher before.

Who is Divvela SRINIVASA Rao?

He is an Indian person apparently working as Sr.Assistant Professor at the Computer science and Engineering (CSE) Department of the  Lakireddy Balireddy College of Engineering (LBRCE) in Mylavaram, India. Here is a picture of Divvela SRINIVASA Rao from his ResearchGate profile:

Divvela.Srinivasa Rao the plagiarist

It is to be noted that it is  that has uploaded the plagiarized paper to ResearchGate from his account (https://www.researchgate.net/profile/Divvela_Rao ) :

Divvela.Srinivasa Rao

The website of the Lakireddy Balireddy College of Engineering (LBRCE), where Divvela Srinivasa Rao works, is http://lbrce.ac.in/:

On the website of the college, we can find the CV of Divvela Srinivasa Rao ( srinumtechcse2007@gmail.com ):

CV of Divvela SRINIVASA Rao

Thus, Divvela Srinivasa Rao has been working for that college since 2014.

Another plagiarized paper by Divvela Srinivasa Rao

Is that the only paper that was plagiarized by Divvela Srinivasa Rao? Actually, no. By looking at the ResearchGate page of D.S. Rao, we can find that he plagiarized another paper in 2016. The paper containing plagiarism is:

Rao, Divvela Srinivasa, and V. Sucharita. “Maximum Utility Item sets for Transactional Databases Using GUIDE.” Procedia Computer Science 92 (2016): 244-252.

Here is a screenshot:

another paper by Divvela.Srinivasa Rao

In that paper, Divvela Srinivasa Rao and V. Sucharita claimed to have proposed the GUIDE algorithm. But again, this is plagiarism because that algorithm was proposed several years earlier by Brian Shie et al. from Taiwan in this paper:

Shie, B.E., Philip, S.Y. and Tseng, V.S., 2012. Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Systems with Applications, 39(17),

Interestingly, I left a comment on that paper by Divvela Srinivasa Rao on ResearchGate in 2016 to warn other readers that it contains plagiarism. Rao simply ignored the comment and replied by asking questions about how to do research, as if he had done nothing wrong (https://www.researchgate.net/publication/306067553_Maximum_Utility_Item_Sets_for_Transactional_Databases_Using_GUIDE?tab=comments ):

Divvela.Srinivasa Rao

Of course, I did not answer after that, and he did not remove the paper either. Then, he even plagiarized my paper. This shows that this researcher has no shame in plagiarizing the papers of other people.

Who is Kilaru Gowthami?

The other author of the 2017 paper is Kilaru Gowthami, who has now graduated from the Lakireddy Balireddy College of Engineering (LBRCE) according to his LinkedIn page:

There is not much information about Kilaru Gowthami on the Internet. He seems to be just a student. However, did he take advantage of the plagiarized paper to graduate? This is an interesting question.

About the Lakireddy Balireddy College of Engineering (LBRCE)

It is the first time that I have heard about this college, so I decided to read about it. I checked the college website and found that the Chairman and founder of this college is Lakireddy Bali Reddy, a wealthy man living in the United States ( http://www.lbrce.ac.in/chairman.html ). But this does not tell the whole story. By reading Wikipedia ( https://en.wikipedia.org/wiki/Lakireddy_Bali_Reddy ), we find that he is a convicted felon. He spent about 8 years in jail in the US, from about 2000 to 2008, for human trafficking, transportation of a minor for sexual activity and immigration fraud. He is also a registered sex offender in the US. I find it very surprising that the LBRCE college still carries the name of Lakireddy Bali Reddy, even after he was convicted of human trafficking and jailed.

What will happen?

On 2018-02-09, I sent an e-mail to the dean (Dr. CH. V. Narayana), principal (Dr. K. Appa Rao) and vice-principal (Dr. K. Srinivasa Reddy) of the LBRCE college where D.S. Rao works to report this serious case of plagiarism. I am now waiting for their answer about how they will handle it. Hopefully, they will take appropriate action, and will also ensure that the plagiarized papers are retracted from the journals.

This is unfortunately not the first time that my papers have been plagiarized by researchers from India. Although there are many excellent Indian researchers, there is an issue with plagiarism in smaller universities. Here are three other recent examples of cases of plagiarism from India that I have reported:

For some of these cases, I contacted the affiliated college several times, and some colleges ignored the problem and let these people keep working there as if nothing had happened. This is quite incredible. I think that the government of India needs to do more to try to eradicate plagiarism in colleges, as in my experience there are quite a few cases. But maybe it is not something easy to do. Hopefully, the situation can be improved.

That is all for today! I just wanted to write about this to make sure that this case of plagiarism becomes public and is not ignored.

Update 2018-02-10: I have received an e-mail from the Principal of the college indicating that they will be taking action against Divvela Srinivasa Rao, that the two papers containing plagiarism will be retracted, and that he will inform me of further actions later. I appreciate that the Principal answered my message quickly. I am looking forward to knowing what actions will be taken.
