When ChatGPT is used to write papers…

Today, I want to share something funny but also alarming: some papers published in academic journals contain text indicating that parts of them were apparently written by LLMs.

The first example is the paper “The three-dimensional porous mesh structure of Cu-based metal-organic-framework – aramid cellulose separator enhances the electrochemical performance of lithium metal anode batteries”, published in the Elsevier journal Surfaces and Interfaces. The first sentence of its introduction is “Certainly, here is a possible introduction for your topic:”

It is quite surprising that neither the authors nor the reviewers noticed this!

A second example of such a problem is the case report “Successful management of an Iatrogenic portal vein and hepatic artery injury in a 4-month-old female patient: A case report and literature review”, published in the open-access Elsevier journal Radiology Case Reports:

Again, it is surprising that this has passed through the review process unnoticed by the editor, reviewers or authors.

Have you found other similar cases? If so, please share them in the comment section!

Posted in Uncategorized

Sneak peek at the new user interface of SPMF (part 2)

Today, I will continue showing you some upcoming features of SPMF 2.60, which is still under development and should be released in the coming weeks. The feature that I will talk about today is the Timeline Viewer, a powerful tool for visualizing temporal data. Let me show it to you in more detail.

First, the Timeline Viewer can display event sequences, which are the input files of episode mining algorithms, among others. For example, we can use the Timeline Viewer to see a visual representation of this input file:

@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
1|1
1|2
1 2|3
1|6
1 2|7
3|8
2|9
4|11
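To make the format above concrete: lines starting with @ are metadata (here, @ITEM=id=name gives the name of each item ID), and every other line lists the IDs of events that occur together, followed by | and their timestamp. The following minimal Java sketch assumes exactly this layout and groups event names by timestamp, which is essentially what a timeline displays. The class EventSequenceReader is hypothetical and is not SPMF's own reader.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical reader for the event sequence format shown above (not SPMF's own class). */
class EventSequenceReader {

    /** Returns, for each timestamp in increasing order, the names of the events occurring at it. */
    static TreeMap<Long, List<String>> parse(List<String> lines) {
        Map<Integer, String> names = new HashMap<>();
        TreeMap<Long, List<String>> events = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("@ITEM=")) {
                // "@ITEM=1=apple" means that item ID 1 is named "apple"
                String[] parts = line.substring("@ITEM=".length()).split("=", 2);
                names.put(Integer.parseInt(parts[0]), parts[1]);
            } else if (!line.startsWith("@")) {
                // "1 2|3" means that items 1 and 2 occur at timestamp 3
                String[] parts = line.split("\\|");
                long timestamp = Long.parseLong(parts[1].trim());
                List<String> atTime = events.computeIfAbsent(timestamp, t -> new ArrayList<>());
                for (String item : parts[0].trim().split("\\s+")) {
                    // fall back to the raw ID when no name was declared for it
                    atTime.add(names.getOrDefault(Integer.parseInt(item), item));
                }
            }
            // other metadata lines, such as "@CONVERTED_FROM_TEXT", are ignored
        }
        return events;
    }
}
```

On the file above, this yields eight distinct timestamps, with apple and orange both occurring at timestamps 3 and 7.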

To visualize this file, we first select the input file “contextEMMA.txt” in SPMF (1) and then click on the new “View dataset” button (2):

This opens a table representation of the dataset:

Then, we click on the “View with Timeline Viewer” button (3) to see the visual representation:

The Timeline Viewer provides several options, such as exporting the visualization as an image, changing the tick interval and the minimum and maximum timestamps, and applying a scaling ratio to the X axis. Moreover, the Timeline Viewer has a built-in custom algorithm that automatically determines the best parameters to ensure a good visualization. Here are some of the available options:
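SPMF's own parameter-selection algorithm is not described in this post. For readers curious about how such an automatic choice can work, here is one widely used heuristic for axis ticks, offered only as a hypothetical sketch (TickChooser and its parameters are assumptions): divide the time range by a target number of ticks, then round the step up to 1, 2, 5 or 10 times a power of ten so that the labels stay readable.

```java
/**
 * Hypothetical "nice tick interval" heuristic, a common approach for chart axes.
 * It is only an illustration and is not SPMF's parameter-selection algorithm.
 */
class TickChooser {

    /** Returns a tick interval of 1, 2, 5 or 10 times a power of ten giving about targetTicks ticks. */
    static double niceTickInterval(double minTimestamp, double maxTimestamp, int targetTicks) {
        double range = maxTimestamp - minTimestamp;
        if (range <= 0 || targetTicks <= 0) {
            return 1.0; // degenerate case, e.g. a single timestamp
        }
        double rawStep = range / targetTicks;
        double magnitude = Math.pow(10, Math.floor(Math.log10(rawStep)));
        double normalized = rawStep / magnitude; // step scaled to lie between 1 and 10
        double niceFactor;
        if (normalized <= 1) {
            niceFactor = 1;
        } else if (normalized <= 2) {
            niceFactor = 2;
        } else if (normalized <= 5) {
            niceFactor = 5;
        } else {
            niceFactor = 10;
        }
        return niceFactor * magnitude;
    }
}
```

For example, timestamps from 1 to 11 with about 10 ticks give niceTickInterval(1, 11, 10) = 1.0, while a range from 0 to 1000 gives a tick interval of 100.0.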

The second feature of the Timeline Viewer is viewing time-interval datasets, such as those taken as input by the FastTIRP and VertTIRP algorithms (to be released in SPMF 2.60). To use this feature, we again select an input file (1) and click on the “View dataset” button (2):

This opens a table representation of the dataset. Then, we click on the “View with Timeline Viewer” button (3) to see the visual representation:

The result is like this:

At the bottom is the timeline. On the left side, we can see the sequence IDs (S0, S1, S2, S3…), and the time intervals of each sequence are depicted using different colors for easier visualization. We can also adjust various parameters to customize the visualization and export the picture as a PNG file.

Here is another example with a smaller data file containing three time-interval sequences:

OK, so that’s all for today. I just wanted to give you a preview of upcoming features in SPMF. I hope that it is interesting. There are still some bugs to fix and other improvements to make, so this feature may still change a bit before it is released.

By the way, the Timeline Viewer was built completely from scratch to ensure efficiency (an important design goal of SPMF). Building a timeline viewer was quite challenging, as there are many special cases to consider and tricky aspects to handle to ensure a good visualization.

If you have any comments or suggestions about this feature or what you would like to have in SPMF, please leave a comment below or send me a message.


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in spmf

Sneak peek at the new user interface of SPMF (part 1)

I am currently working on the next version of SPMF, which will be called 2.60. There will be several improvements to the user interface of SPMF. Here is an overview of some of them, to give you a sneak peek at what is coming. Note that more changes may still occur before the next version is released ;-P

The new VIEW button is one of the most important new features of the upcoming SPMF 2.60. It provides many different views for various types of input files. For example, if we open an input file for high utility itemset mining, the view looks like this:
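As a reminder of what such a viewer has to read, assuming SPMF's usual text format for high utility itemset mining: each line is a transaction of the form items:transaction utility:item utilities, for example 3 5 1 2 4 6:30:1 3 5 10 6 5. The Java sketch below parses one such line and checks that the transaction utility equals the sum of the item utilities. UtilityTransaction is a hypothetical class, not part of SPMF.

```java
import java.util.Arrays;

/** Hypothetical parser for one line of a high utility itemset mining input file. */
class UtilityTransaction {
    final int[] items;
    final int transactionUtility;
    final int[] itemUtilities;

    UtilityTransaction(int[] items, int transactionUtility, int[] itemUtilities) {
        this.items = items;
        this.transactionUtility = transactionUtility;
        this.itemUtilities = itemUtilities;
    }

    /** Parses "items:transaction utility:item utilities", e.g. "3 5 1 2 4 6:30:1 3 5 10 6 5". */
    static UtilityTransaction parse(String line) {
        String[] parts = line.trim().split(":");
        int[] items = Arrays.stream(parts[0].trim().split("\\s+")).mapToInt(Integer::parseInt).toArray();
        int tu = Integer.parseInt(parts[1].trim());
        int[] utilities = Arrays.stream(parts[2].trim().split("\\s+")).mapToInt(Integer::parseInt).toArray();
        if (items.length != utilities.length) {
            throw new IllegalArgumentException("Each item needs exactly one utility value: " + line);
        }
        return new UtilityTransaction(items, tu, utilities);
    }

    /** Checks that the transaction utility equals the sum of the item utilities. */
    boolean isConsistent() {
        return Arrays.stream(itemUtilities).sum() == transactionUtility;
    }
}
```

For the example line, the item utilities 1 + 3 + 5 + 10 + 6 + 5 sum to 30, which matches the transaction utility.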

Many other viewers are also integrated in the new version of SPMF, covering all the main types of data available in SPMF.

I hope that this is interesting. This is just to give you a preview of what is coming in SPMF. Of course, the released version might still be a little different, as I am still thinking about other possible improvements.


Posted in Big data, Data Mining, Data science, spmf

UDML 2024 Accepted papers

Today, I want to talk to you about the upcoming UDML 2024 workshop at the PAKDD 2024 conference. This year is the 6th edition of the UDML workshop. I am happy to say that this year, we received a record number of submissions (23), which shows that the workshop and the research direction of utility mining and learning are doing well.

Because of the large number of submissions, the selection process was quite competitive: there were many good papers, and some could not be accepted even though they were very good.

The list of the 10 accepted papers is as follows:

This will certainly be a very interesting workshop this year at PAKDD.



Posted in Conference, Data Mining, Data science, Pattern Mining, Utility Mining

SPMF 2.60 is coming soon!

Today, I want to talk a little bit about the next version of SPMF, which is coming very soon. Here are some highlights of the upcoming features:

1) A Memory Viewer to help monitor the performance of algorithms in real time:

The popular MemoryLogger class of SPMF has also been improved: when it is set to recording mode and a file path is provided, it can save all recorded memory values to a file. This is done using two new methods, “startRecordingMode” and “stopRecordingMode”. In recording mode, the MemoryLogger writes the memory usage to the file every time that an algorithm calls the checkMemory method. Recording stops when the stopRecordingMode method is called.
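The recording mode described above can be pictured with the following minimal sketch. Only the method names startRecordingMode, stopRecordingMode and checkMemory come from this post. The class RecordingMemoryLoggerSketch, the Path parameter, and the way memory is measured (used heap reported by Runtime) are assumptions, so please refer to the SPMF source code for the real MemoryLogger API.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of a memory logger with a recording mode.
 * Method names follow the blog post; signatures and internals are assumptions, not SPMF's code.
 */
class RecordingMemoryLoggerSketch {
    private double maxMemoryMB = 0;
    private BufferedWriter recorder; // null when not in recording mode

    /** Starts recording: every later call to checkMemory() appends one value to the file. */
    void startRecordingMode(Path file) throws IOException {
        stopRecordingMode(); // close any previous recording first
        recorder = Files.newBufferedWriter(file);
    }

    /** Stops recording and closes the file. */
    void stopRecordingMode() throws IOException {
        if (recorder != null) {
            recorder.close();
            recorder = null;
        }
    }

    /** Measures the used heap in MB, updates the maximum, and writes the value if recording. */
    double checkMemory() {
        Runtime rt = Runtime.getRuntime();
        double currentMB = (rt.totalMemory() - rt.freeMemory()) / 1024.0 / 1024.0;
        maxMemoryMB = Math.max(maxMemoryMB, currentMB);
        if (recorder != null) {
            try {
                recorder.write(Double.toString(currentMB));
                recorder.newLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return currentMB;
    }

    double getMaxMemoryMB() {
        return maxMemoryMB;
    }
}
```

In this sketch, an algorithm would call startRecordingMode once before mining, checkMemory at points of interest, and stopRecordingMode at the end; the file then contains one memory value per checkMemory call.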

2) A tool to generate cluster datasets using different data distributions, such as the Normal and Uniform distributions. Here are some screenshots of it:
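Generating such data mainly consists of drawing points around cluster centers. The sketch below, with a hypothetical ClusterDataGenerator class and parameters (centers, points per cluster, spread, seed), shows one simple way to do it with java.util.Random. For the Normal distribution, each coordinate is the center plus spread times a standard Gaussian value; for the Uniform distribution, it is drawn uniformly within spread of the center. This is not the code of the SPMF tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a cluster dataset generator (not SPMF's actual tool). */
class ClusterDataGenerator {
    enum Distribution { NORMAL, UNIFORM }

    /**
     * Generates pointsPerCluster points around each center (any dimension).
     * NORMAL: coordinate = center + spread * N(0, 1).
     * UNIFORM: coordinate drawn uniformly in [center - spread, center + spread).
     */
    static List<double[]> generate(double[][] centers, int pointsPerCluster,
                                    double spread, Distribution dist, long seed) {
        Random random = new Random(seed); // fixed seed for reproducible datasets
        List<double[]> points = new ArrayList<>();
        for (double[] center : centers) {
            for (int i = 0; i < pointsPerCluster; i++) {
                double[] p = new double[center.length];
                for (int d = 0; d < center.length; d++) {
                    double noise = (dist == Distribution.NORMAL)
                            ? random.nextGaussian()
                            : 2 * random.nextDouble() - 1; // uniform in [-1, 1)
                    p[d] = center[d] + spread * noise;
                }
                points.add(p);
            }
        }
        return points;
    }
}
```

Using a fixed seed is a design choice that makes the generated datasets reproducible, which is convenient when comparing algorithms.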

3) A simple tool to visualize transaction datasets. Although simple, it can be useful for quickly exploring a dataset and seeing its content, and it provides various information about the dataset. This is an early version, and more features will be considered.

The tool has two visualization features: viewing the frequency distribution of transactions according to their lengths, and the frequency distribution of items according to their support:
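Both distributions are simple to compute. Assuming SPMF's transaction format (one transaction per line, items as space-separated integers, and lines starting with #, % or @ treated as comments or metadata), the hypothetical TransactionStats helper below maps each transaction length to the number of transactions with that length, and each support value to the number of items with that support.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper computing the two distributions shown by the transaction viewer. */
class TransactionStats {

    /** Skips empty lines and lines assumed to be comments or metadata (#, %, @). */
    static boolean isTransaction(String line) {
        return !(line.isEmpty() || line.startsWith("#") || line.startsWith("%") || line.startsWith("@"));
    }

    /** Maps each transaction length to the number of transactions having that length. */
    static TreeMap<Integer, Integer> lengthDistribution(List<String> lines) {
        TreeMap<Integer, Integer> dist = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (isTransaction(line)) {
                dist.merge(line.split("\\s+").length, 1, Integer::sum);
            }
        }
        return dist;
    }

    /** Maps each support value to the number of items having that support. */
    static TreeMap<Integer, Integer> itemSupportDistribution(List<String> lines) {
        Map<String, Integer> support = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (!isTransaction(line)) {
                continue;
            }
            // a transaction is a set, so each item is counted at most once per line
            Set<String> items = new HashSet<>(List.of(line.split("\\s+")));
            for (String item : items) {
                support.merge(item, 1, Integer::sum);
            }
        }
        TreeMap<Integer, Integer> dist = new TreeMap<>();
        for (int s : support.values()) {
            dist.merge(s, 1, Integer::sum);
        }
        return dist;
    }
}
```

For the four transactions "1 2 3", "1 2", "1 3 4" and "1", the length distribution is {1=1, 2=1, 3=2}, and the item support distribution is {1=1, 2=2, 4=1} (item 4 has support 1, items 2 and 3 have support 2, and item 1 has support 4).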

4) A simple tool to visualize sequence datasets. This is similar to the above tool but for sequence datasets.
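For sequence datasets, the basic statistics are similar. Assuming SPMF's sequence format, where -1 marks the end of an itemset and -2 the end of a sequence (e.g. 1 -1 1 2 3 -1 4 -1 -2), the hypothetical SequenceStats sketch below counts the itemsets and items of a sequence and builds the distribution of sequence lengths.

```java
import java.util.List;
import java.util.TreeMap;

/** Hypothetical helper for SPMF-style sequence files (-1 ends an itemset, -2 ends a sequence). */
class SequenceStats {

    /** Number of itemsets in a sequence, e.g. 3 for "1 -1 1 2 3 -1 4 -1 -2". */
    static int itemsetCount(String sequence) {
        int count = 0;
        for (String token : sequence.trim().split("\\s+")) {
            if (token.equals("-1")) {
                count++;
            }
        }
        return count;
    }

    /** Number of item occurrences in a sequence, e.g. 5 for "1 -1 1 2 3 -1 4 -1 -2". */
    static int itemCount(String sequence) {
        int count = 0;
        for (String token : sequence.trim().split("\\s+")) {
            if (!token.isEmpty() && !token.equals("-1") && !token.equals("-2")) {
                count++;
            }
        }
        return count;
    }

    /** Maps each sequence length (in itemsets) to the number of sequences having that length. */
    static TreeMap<Integer, Integer> lengthDistribution(List<String> lines) {
        TreeMap<Integer, Integer> dist = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // skip empty lines and lines assumed to be comments or metadata
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("%") || line.startsWith("@")) {
                continue;
            }
            dist.merge(itemsetCount(line), 1, Integer::sum);
        }
        return dist;
    }
}
```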

5) A new tool to visualize the frequency distribution of patterns found by an algorithm. To use this feature, when running an algorithm, select the “Pattern viewer” for opening the output file. Then, select the support (#SUP) and click “View”. This opens a new window that displays the frequency distribution of support values, as shown below. This feature also works with measures other than the support, such as the confidence and the utility.
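Computing such a distribution amounts to reading the value of the selected measure on each output line and counting how many patterns have each value. The sketch below assumes output lines such as 1 2 #SUP: 3, where a measure name (#SUP:, #CONF:, #UTIL: and so on) is followed by its value. PatternMeasureHistogram is a hypothetical class, not the SPMF viewer's code.

```java
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch: frequency distribution of one measure (e.g. #SUP:) in an output file. */
class PatternMeasureHistogram {

    /** Extracts the value that follows tag (e.g. "#SUP:") on a line, or null if the tag is absent. */
    static Double extract(String line, String tag) {
        int start = line.indexOf(tag);
        if (start < 0) {
            return null;
        }
        String rest = line.substring(start + tag.length()).trim();
        String value = rest.split("\\s+")[0]; // stop before the next measure, if any
        return Double.parseDouble(value);
    }

    /** Maps each value of the measure to the number of patterns having that value. */
    static TreeMap<Double, Integer> histogram(List<String> outputLines, String tag) {
        TreeMap<Double, Integer> counts = new TreeMap<>();
        for (String line : outputLines) {
            Double v = extract(line, tag);
            if (v != null) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For instance, histogram(lines, "#SUP:") gives the support distribution, and passing "#CONF:" instead gives the confidence distribution of association rules.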

6) A tool to compute statistics about graph database files in SPMF format. This feature was missing in previous versions of SPMF, but it is useful when working with graph datasets.
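As an illustration of the kind of statistics such a tool can report, the sketch below assumes a gSpan-style text format for graph databases, where t # id starts a graph, v id label declares a labeled vertex, and e v1 v2 label declares a labeled edge. The class GraphDatabaseStats is hypothetical; it counts graphs, vertices, edges, and distinct labels.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical statistics for a gSpan-style graph file: "t # id", "v id label", "e v1 v2 label". */
class GraphDatabaseStats {
    int graphCount;
    int vertexCount;
    int edgeCount;
    final Set<String> vertexLabels = new HashSet<>();
    final Set<String> edgeLabels = new HashSet<>();

    static GraphDatabaseStats compute(List<String> lines) {
        GraphDatabaseStats s = new GraphDatabaseStats();
        for (String raw : lines) {
            String[] tok = raw.trim().split("\\s+");
            switch (tok[0]) {
                case "t": // start of a new graph
                    s.graphCount++;
                    break;
                case "v": // vertex: v <id> <label>
                    s.vertexCount++;
                    s.vertexLabels.add(tok[2]);
                    break;
                case "e": // edge: e <vertex1> <vertex2> <label>
                    s.edgeCount++;
                    s.edgeLabels.add(tok[3]);
                    break;
                default: // empty or unrecognized line
                    break;
            }
        }
        return s;
    }

    double averageVertices() {
        return graphCount == 0 ? 0 : (double) vertexCount / graphCount;
    }

    double averageEdges() {
        return graphCount == 0 ? 0 : (double) edgeCount / graphCount;
    }
}
```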

7) Several new data mining algorithm implementations. Some that are ready are FastTIRP, VertTIRP, Krimp, and SLIM. Others are being integrated.

8) A new set of highly efficient data structures implemented using primitive types, to further improve the performance of data mining algorithms by replacing standard Java collection classes. Some of them are visible in the picture below. Using these structures can improve the performance of algorithm implementations. It took weeks of work to develop these classes and make them compatible with comparators and other features expected of collections in the Java language.
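To show why primitive types help: an ArrayList&lt;Integer&gt; stores each value as a separate Integer object, whereas a list backed by an int[] stores the values directly, without boxing and with a much smaller memory footprint. The IntArrayList below is a hypothetical minimal sketch of this idea, not one of the new SPMF classes, and it does not cover the comparator support mentioned above.

```java
import java.util.Arrays;

/** Hypothetical growable list of primitive ints, avoiding the boxing of ArrayList<Integer>. */
class IntArrayList {
    private int[] data;
    private int size;

    IntArrayList() {
        this(10);
    }

    IntArrayList(int initialCapacity) {
        data = new int[Math.max(1, initialCapacity)];
    }

    /** Appends a value, doubling the capacity when full (amortized O(1)). */
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", size: " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    /** Sorts the stored values in ascending order without creating Integer objects. */
    void sort() {
        Arrays.sort(data, 0, size);
    }

    /** Returns a copy of the stored values. */
    int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```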

Conclusion

This is just to give you an overview of the upcoming version of SPMF. I hope to release it in the next week or two. By the way, if you have implemented some algorithms and would like them to be included in SPMF, please send me an e-mail at philfv AT qq DOT com.



Posted in Data Mining, Pattern Mining, spmf

The importance of using standard terminology in research papers

Today, I will talk about the importance of using standard terminology in research papers in computer science. I had the idea of writing about this after reading an interesting letter about optimization research titled “Metaphor-based metaheuristics, a call for action: the elephant in the room” by Aranha et al. (DOI: 10.1007/s11721-021-00202-9).

This paper explains that, in the field of optimization, a growing number of articles published over the last decade have proposed seemingly new optimization approaches, explained using a wide range of metaphors: some related to animals (e.g. bats, grey wolves, termites, spiders), natural phenomena (e.g. invasive weeds, the big bang, river erosion), and many other odd sources of inspiration (e.g. how musicians play music together, how interior design is carried out, and the political behavior of countries).

A key issue pointed out by the authors and other researchers is that many metaphor-based optimization algorithms introduce unnecessary new terminology, as these algorithms could be explained more simply using existing terminology. For example, Camacho-Villalon et al. (DOI: 10.1007/s11721-019-00165-y) showed that some optimization algorithms, such as Intelligent Water Drop (IWD) optimization, are nothing but a special case of Ant Colony Optimization (ACO). Only the terminology changes: the pheromone of ACO is called soil in IWD, ants become water drops, and so on. Another example is black hole optimization, which was shown to be a special case of particle swarm optimization.

As Aranha et al. explain, the main problems caused by proposing seemingly new algorithms with non-standard terminology are “(i) creating confusion in the literature, (ii) hindering our understanding of the existing metaphor-based metaheuristics, and (iii) making extremely difficult to compare metaheuristics both theoretically and experimentally.”

This problem has become quite big in optimization research, with several papers proposing unrealistic or unnecessary metaphors to explain small modifications of existing algorithms, so as to publish more papers with little innovation. However, this problem also appears in other fields of computer science where researchers use non-standard terminology in their papers. As a result, it often becomes difficult to verify where an idea truly came from, some work may be duplicated, and finding other papers related to an idea can become quite difficult (if several papers use different terminology).

This is why it is important to always use standard terminology when writing a new paper, to clearly indicate its relationship with previous papers, and to give credit where credit is due. This helps the research community by making it easier to find papers and understand the relationships between them.

I hope that this has been an interesting blog post. If you have time, you may read the papers mentioned above. They are quite interesting and highlight this issue.



Posted in Academia

UDML 2024 @ PAKDD 2024 (deadline extended)

This is a short blog post to let you know that the deadline for submitting your papers to the UDML 2024 workshop at the PAKDD 2024 conference has been extended to February 7.

Website: https://www.philippe-fournier-viger.com/utility_mining_workshop_2024/

Note that this year, all accepted papers from UDML 2024 will be invited to submit an extended version to a special issue of the Expert Systems journal.

So this is a very good opportunity for your papers at PAKDD!

And happy new year 2024!



Posted in cfp

Your social network on DBLP as a graph

Today, I discovered an interesting DBLP feature that draws your social network as a graph (assuming that you have a DBLP page). Using it is simple: open your DBLP page, and then click here at the bottom of the page:

Then, your social network will be displayed (it can take a little while). For example, this is mine:

What is interesting is that it shows not only direct co-authorship links but also some transitive links, thus highlighting potential connections that one could make through one's current network.
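DBLP does not explain here how it selects transitive links, but one simple way to find potential connections in a co-authorship graph is to look at people at distance two: co-authors of your co-authors who are not yet your co-authors, ranked by the number of shared co-authors. The hypothetical CoauthorGraph sketch below does exactly this.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical co-authorship graph used to find two-hop ("transitive") connections. */
class CoauthorGraph {
    private final Map<String, Set<String>> neighbors = new HashMap<>();

    /** Adds an undirected co-authorship link between a and b. */
    void addCoauthorship(String a, String b) {
        neighbors.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /**
     * Returns the people at distance exactly two from author, each mapped to
     * the number of co-authors they share with author.
     */
    TreeMap<String, Integer> twoHopConnections(String author) {
        Set<String> direct = neighbors.getOrDefault(author, Set.of());
        TreeMap<String, Integer> result = new TreeMap<>();
        for (String coauthor : direct) {
            for (String candidate : neighbors.getOrDefault(coauthor, Set.of())) {
                if (!candidate.equals(author) && !direct.contains(candidate)) {
                    result.merge(candidate, 1, Integer::sum);
                }
            }
        }
        return result;
    }
}
```

For example, if A wrote papers with B and C, and both B and C wrote papers with D, then D is suggested for A with two shared co-authors.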

The graph of my network is quite dense, since I have 390 co-authors on DBLP.

By observing this graph, we can also see some strange structures like this one:

This structure seems too perfect (all the authors are connected to each other), so I investigated why. The reason is simple: it comes from a paper with 8 authors that I participated in, where most of the authors were not from computer science. Thus, most of the authors of that paper had only one paper on DBLP, and it was that same paper.
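This explains the perfect shape: a paper with k authors links every pair of its authors, which gives k(k-1)/2 edges, that is 8 × 7 / 2 = 28 edges for a paper with 8 authors. Since most of these authors have no other papers on DBLP, no other links blur this complete subgraph (a clique). The hypothetical PaperClique helper below enumerates these edges.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration: the co-authorship edges induced by a single paper. */
class PaperClique {

    /** Returns one undirected edge "a|b" (with a before b alphabetically) for every pair of authors. */
    static Set<String> coauthorEdges(List<String> authors) {
        Set<String> edges = new HashSet<>();
        for (int i = 0; i < authors.size(); i++) {
            for (int j = i + 1; j < authors.size(); j++) {
                String a = authors.get(i);
                String b = authors.get(j);
                edges.add(a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a);
            }
        }
        return edges;
    }
}
```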

There is also a dense cluster here:

which consists mostly of European researchers.

I just wanted to share this interesting feature with you in this blog post, as I discovered it today (but it might have been available for a while!).

Posted in Academia, Research

Call for papers: UDML 2024 workshop @ PAKDD 2024

I am glad to announce that the 6th UDML workshop on Utility-Driven Mining and Learning (UDML 2024) will be held next year at the PAKDD 2024 conference.

IMPORTANT DATES

  • Workshop Paper Submission: January 17, 2024
  • Workshop Paper Acceptance Notification: February 7, 2024
  • Workshop Paper Camera-ready: February 21, 2024

PUBLICATIONS

All the accepted papers will be invited for publication in a special issue of the Expert Systems journal (Wiley, indexed in EI and SCI).

The website of UDML 2024 will be put online soon!

Posted in Uncategorized

A new survey paper on episode mining!

I am pleased to announce today that my collaborators and I have published a new survey paper about episode mining, which gives an introduction to this nice and interesting subfield of pattern mining. To our knowledge, this is the most complete and up-to-date survey paper on this topic.

What is episode mining? Put simply, it is about analyzing a long sequence of timestamped events to discover interesting patterns in it, such as some events often appearing before other events within some time interval. This has many real-life applications, such as analyzing relationships between alarms in computer networks.
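As a toy illustration of the kind of pattern involved, the sketch below counts how often an event of type a is followed by an event of type b within a given time window in a timestamped sequence. This simple count is an assumption made for illustration; episode mining algorithms use more refined frequency definitions (for instance based on minimal occurrences or sliding windows). SimpleEpisodeCounter is hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: how often event type a is followed by event type b within a window. */
class SimpleEpisodeCounter {

    /** A timestamped event, such as an alarm type observed at some time. */
    static final class Event {
        final String type;
        final long time;

        Event(String type, long time) {
            this.type = type;
            this.time = time;
        }
    }

    /**
     * Counts the occurrences of a that are followed by at least one b occurring strictly
     * later and at most window time units after it. Events must be sorted by time.
     */
    static int countFollowedWithin(List<Event> events, String a, String b, long window) {
        int count = 0;
        for (int i = 0; i < events.size(); i++) {
            Event first = events.get(i);
            if (!first.type.equals(a)) {
                continue;
            }
            for (int j = i + 1; j < events.size(); j++) {
                Event next = events.get(j);
                if (next.time - first.time > window) {
                    break; // events are sorted, so all later events are also too far
                }
                if (next.time > first.time && next.type.equals(b)) {
                    count++;
                    break; // count each occurrence of a at most once
                }
            }
        }
        return count;
    }
}
```

For the sequence (A,1), (B,2), (A,5), (C,6), (B,9), (A,10), a window of 3 gives a count of 1 (only A at time 1 is followed by B in time), while a window of 4 gives 2.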

I have previously written a blog post that gives an introduction to episode mining, and I also published a video introduction to episode mining. This time, however, it is a survey paper that gives a broader and more detailed overview of this research topic. You can read the new survey paper here:


Ouarem, O., Nouioua, F., Fournier-Viger, P. (2023). A Survey of Episode Mining. WIREs Data Mining and Knowledge Discovery, Wiley, to appear.

I hope that you will enjoy this new survey!


Posted in Data Mining, Data science, Pattern Mining