A simple BAT script to unzip ZIP files in all sub-directories and then delete the ZIP files

In this blog post, I will share a simple script to recursively unzip all ZIP files in sub-directories of a directory, and then delete the ZIP files. Such script is useful for example when managing the submissions of papers at academic conferences. The script is based some code on StackOverflow where I have added the function to delete the ZIP files:

FOR /D /r %%F in ("*") DO (
    pushd %CD%
    cd %%F
        FOR %%X in (*.rar *.zip) DO (
            "C:\Program Files\7-zip\7z.exe" x "%%X"
		   del *.zip
        )
    popd
)

This is for Windows. It assumes that you have 7ZIP installed on your computer and that the path to 7ZIP is correct. Save this script into a text file. Rename it to have the .bat extension. Then, double click on the file to run the script. That is all!


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Other | Leave a comment

Brief report about the DASFAA 2022 conference

This week, I am attending the DASFAA 2022 conference, which is held online from the 11th to the 14th April 2022. In this blog post, I will talk about this event, and will update the blog through the conference.

DASFAA 2022 is the  27th International Conference on Database Systems for Advanced Applications. It is a well-established conference on database and related research areas and applications such as data mining and machine learning. This year, it was supposed to be held in India, but due to the pandemic, it was held in online mode. The conference is organized by IIT Hyderabad, India.

The DASFAA Proceedings

The DASFAA conference proceedings are published by Springer in book(s) of the Lecture Notes in Computer Sciences series. This ensures that papers in DASFAA are indexed and have a good visibility.

This year the DASFAA program includes 5 keynote talks, 143 research papers, 12 industry presentations, a panel, 11 demos, 5 tutorials and 6 workshops (including the PMDB 2022 workshop on pattern mining and machine learning). It is thus a rather large conference. There was 420 registered attendees.

Acceptance rates

For the main track, there was 400 submissions, and 72 were accepted as full papers ( acceptance rate of 18%) and 76 as short papers (19%).

For the industry track, there was 36 submissions, from which there are 7 accepted full papers (19%) and 6 accepted short papers (17%).

For the demo track, 9 of the 18 submissions were accepted (50%).

For the PhD track, 2 out of 3 submissions were accepted (66%).

The online platform

The conference is held using an online platform called Airmeet. It allows to check the schedule, listen to talks and there are some video chat rooms to socialize with other participants. Here are a few screenshots of that platform. The DASFAA schedule page:

The DASFAA social lounge offering chat rooms:

Day 1 DASFAA Workshop 1 : PMDB 2022

This year, I have co-organized the PMDB 2022 workshop at DASFAA (1st Workshop on Pattern mining and Machine learning in Big complex Databases). The workshop has 6 papers and a keynote talk by Prof. Joshua Zhexue Huang.

The keynote talk by Prof. Huang, founding director of the big data institute at Shenzhen University was about big data approximate computing. He presented a model called RSP (Random Sample Partition) as an alternative to the popular Hadoop and Spark models. The key idea in RSP is to create distributed data blocks that are random samples of the original data. Then, using these random data blocks, big datasets can be used to train approximate models such as SVM. Using RSP, confidence intervals can be calculated on the errors of approximation. RSP allows to process very large datasets efficiently and provide excellent scalability. In some applications, RSP was shown to have better performance than traditional models such as Hadoop.

big data approximate computing by Joshua Zhexue Huang

Then, there was 6 accepted papers presentations:

9:45Paper #2 An Algorithm for Mining Fixed-Length High Utility Itemsets (PDF)
Le Wang
10:05Paper #3 A Novel Method to Create Synthetic Samples with Autoencoder Multi-layer Extreme Learning Machine (PDF)
Qihang Huang, Yulin He, Shengsheng Xu and Joshua Zhexue Huang
10:25Paper #4 Pattern Mining: Current Challenges and Opportunities (PDF, video)
Philippe Fournier-Viger, Wensheng Gan, Youxi Wu, Mourad Nouioua, Wei Song, Tin Truong and Hai Duong Van
10:45Paper #7 Why not to Trust Big Data: Identifying Existence of Simpson’s Paradox (PDF)
Rahul Sharma, Minakshi Kaushik, Sijo Arakkal Peious, Mahtab Shahin, Ankit Vidhyarthi and Dirk Draheim
11:05Paper #8 Localized Metric Learning for Large Multi-Class Extremely Imbalanced Face Database (PDF) (best paper award)
Seba Susan and Ashu Kaushik
11:25Paper #9  Top-k dominating queries on incremental datasets (PDF)
Jimmy Ming-Tai Wu, Ke Wang and Jerry Chun-Wei Lin

In particular, something special at this workshop is that we organized a collaborative paper (paper #4) with 7 well-established authors in pattern mining to talk about the current challenges and opportunities in pattern mining. You can watch the video of that talk on Youtube at: https://www.youtube.com/watch?v=CmtXcavZIgM&feature=youtu.be .

The PMDB workshop has been a success. Thus, we aim to organize it again next year and make it larger.

Day 2 – Conference opening

In the conference opening, the conference was presented. There was a lot of interesting information. As I missed one part of the opening, I would to thank Prof. P.K. Reddy for sharing the slides with me. I will provide a few screenshots of interesting content from the opening below.

dasfaa 2022 conferenc opening
dasfaa history
dasfaa acceptance rate
dasfaa tag cloud
dasfaa conference arangement
dasfaa program at a glance
dasfaa keynote talks
dasfaa best student paper
dasfaa best paper
dasfaa contribution award
dasfaa industry track
dasfaa demo track
dasfaa phd consortium

Keynote 1 : Fairness in Database Querying by Gautam Das

The first keynote of the conference was about fairness for database queries. This is an topic that has made the news in recent years with machine learning models that are for example deemed to be racist or unfair to some groups. Here are a few slides:

fairness in database querying by gautam das

Research paper presentations

There was numerous paper presentations. I was busy. Thus I will not specifically report on them.

Next year, DASFAA 2023

DASFAA2023 will be in Tianjin, China during April 2023

Conclusion

I have given a brief overview of the DASFAA 2022 conference. Hope that it has been interesting.


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Conference, Data Mining | Tagged , , , , , , , | Leave a comment

Brief report about the CACML 2022 conference

Today, I have attended the CACML 2022 conference (2022 Asia Conference on Algorithms, Computing and Machine Learning), which was held virtually from Hangzhou, China from March 25 to 27.

This is a new conference on a very timely topic of machine learning and computing. The website is : cacml.net. The conference was well organized. The proceedings of the conference are published by IEEE in the IEEE Xplore digital library (indexed by IE, INSTP etc), which provides a good visibility to the paper.

I have been involved in this conference as conference chair, and as keynote speaker.

Keynote talk by Prof. Witold Pedrycz

The first talk was by Witold Pedrycz from University of Alberta, Canada, a well-known researcher and editor of the Information Science journal. He talked about machine learning, granular computing and federated learning. Here are a few slides:

As illustrated in the above slide, in federated learning, each client is the owner of his data. Then, if a server wants to build a machine learning model, the server will interact with the clients to build a model. But the client will not disclose their data to the server. Instead the clients will provide knowledge to the server to help build the model. This is interesting idea to protect the privacy of users. Below, there is another slide that shows how a model is updated in the case of gradient descent with federated learning. Each client returns a gradient to the server rather than sharing the data.

Then, there was more details about how federated learning and granular computing can be used jointly to build rule-based models, and the issue of how to evaluate the quality of the models built using federated learning (since the server does not have access to all the data). I will not report all the details. But it was an interesting talk.

Keynote talk by Prof. Chin Chen Chang

The second keynote by Prof. Chin Chen Chang from National Tsing Hua University, Taiwan was about information hiding that is how to hide information, for example, to send a secret message by hiding it inside images.

Keynote talk by Prof. Philippe Fournier-Viger

The third keynote was given by me about “Advances and challenges for the automatic discovery of interesting patterns in data“.

In that talk I briefly reviewed early studies on designing algorithms for identifying frequent patterns that can be used for instance to identify frequent alarms or faults in telecommunication networks. Then, I gave an overview of recent challenges and advances to identify other types of interesting patterns in more complex data. I introduced the concepts of high utility patterns, locally interesting patterns, and periodic patterns. Lastly, the SPMF open-source software will be mentioned and opportunities such as how to combine pattern mining with machine learning.

Keynote talk by Prof. Jie Yang

The last keynote talk was by Prof. Jie Yang from Shanghai Jiaotong University about attacks on deep learning models. Here are a few slides:

As shown above, some attacks can be on unnecessary features while other attacks can be about missing features.

There was also some discussion of defense techniques against attacks on neural networks, and attacks on the attention of deep learning models.

Conclusion

This is a short overview of the CACML 2022 conference. It is a medium-size conference (which is understandable since it is a new conference), but there was some good speakers and invited guests. I have enjoyed it, and I would attend again.


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Conference, Machine Learning | Tagged , , , , , , | Leave a comment

A Tool to Generate Reviews of Academic Papers

Writing reviews is important but sometimes repetitive and time-consuming. Hence, today I built a tool to help automatize the process of review writing. You can try it at the website below:

The Review Generator ( http://philippe-fournier-viger.com/reviewgenerator/index.html )

This tool let you select some items and this will add predefined sentences to your review. Of course, this tool is not supposed to replace a human and generated reviews should be viewed as draft and edited by hand to add more details.

If you like the tool, you may boomark it and share it. And if you would like more features, please let me know. For example, if you would like that I add more content to the tool, please leave a comment below or send me an email.

** Credit ** That project is a modification of AutoReject (https://autoreject.org/)by Andreas Zeller, which was designed as a joke to automatically reject papers. I have reused the template but modified the text content to turn it into a serious tool. #review#academia#reviewgenerator#reviewprocess#journal#conference


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Academia, General, Research | Tagged , , , , , | 2 Comments

What I learned from being a journal editor for two years?

For over two years, I have worked as associate editor-in-chief for a major journal published by Springer. For this job, I have handled around 500 papers from all the steps of the review process. Recently, I have decided to stop this work as it was taking too much of my time and as time is limited in life, I wanted to give priority to other aspects of my life. I was very happy to be associate editor-in-chief for that big journal as this has allowed me to learn many things. I will discuss what I have learned in this blog post. The key things that I have learned are:

  • Being an editor can be a lot of work. For a big journal, the workload of an editor is quite high. I would spend about one hour every day to try to find reviewers for papers, reading the papers or do other related tasks. One hour may not seem a lot but when you have a research team, a family and other things in your life, it is a lot of time and you have to think about your priorities. With this 1 hour, I could instead do some sport, spend more time with my kid or spend more time on my own research.
tired professor editor
  • Finding reviewers for a paper is not so easy. Authors of research papers often think that reviewers are easy to find. But the truth is that often it is a hard task to find reviewers. The editor has to invite numerous reviewers. Each reviewer has one week or more to accept the invitation. And often the potential reviewers would not answer or decline. Thus, it can happen that it takes over one month to find suitable reviewers for a paper. Generally, if a topic is unpopular, it can be quite hard to find reviewers.
  • Reviewers are often late. Many authors do not know that it is quite common that reviewers submit their reviews late. Often, it is by a few days, but some reviewers can be late by up to a month in some cases. When a reviewer is late, the journal will automatically send reminders to the reviewer but still some reviewers can receive dozens of reminders and ignore them. In this case, the editor has to step in and find more reviewer(s).
  • The work of an editor can be repetitive. Although being an editor is interesting and it is also very important for academia, I feel that some work is repetitive such as clicking to find reviewers. Other people will have different opinions but rather than doing such tasks, I would rather spend time working on my own research.
  • When submitting a paper, if you can enter keywords or select topics from a taxonomy to describe your paper, choose them wisely. It is important to choose keywords that really describe your paper well as this will be used to find reviewers for your paper and can accelerate the review process.
  • Some reviewers are unethical. As editor, it is important to read carefully what the reviewers write as there are several reviewers that would act unethically such as by asking authors to cite many of their papers to increase their citation count. When i would see this, I would intervene to stop this, of course, but there are always a few reviewers that are trying to do this.
  • If authors do not do enough effort to revise their paper well (after a round of review), it is not uncommon that a reviewer will change his mind and suggest to reject the paper in the next round. The reviewers are doing their work for free. Some reviewers are very patient and will accept to review the same paper multiple times. But others will not be patient if the author does not make sufficient effort to address issues in the paper. Thus authors should always do their best to solve the problems in their paper.

That was just a short blog post to talk a little bit about the work of an editor and the review process of a journal. I hope it does not sound negative because actually, I have learned many things from this work and I do not regret accepting to do this work.

Hope that this has been interesting. If you have any comments or questions, please leave them in the comment section below!


Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Academia, Other, Research | Tagged , , , , , , , | 1 Comment

CFP: DSSBA 2022 @ IEEE DSAA

Special Session on Data Science for Social and Behavioral Analytics

This is to let you know that I co-organize a special session called DSSBA 2022 at the IEEE DSAA 2022 conference, and we need your papers ;-).

This special session is about social and behavior analytics. This includes topics such as:

•    Efficient and scalable algorithms for behavioral and social analytics
•    Evaluation of behavioral analytic models
•    Social computing and behavior analytics
•    Interpretation of data science models under cognitive theories
•    Business process analysis
•    Customer behavior analysis
•    Social network analysis
•    Abnormal behavior detection
•    Group behavior analysis
•    Behavior prediction
•    Models for targeted and personalized behavior services
•    Privacy-preserving behavior and social analytics
•    Security models and algorithms for behavioral analytic
•    Behavioral economics with data analysis
•    Case studies and real-world applications of the above

All the accepted papers will be published as regular papers in the proceedings of IEEE DSAA 2022. The papers are published by IEEE and EI-indexed.

The deadline is: 1st June 2022
The format is up to 10 pages in IEEE double-column format.

Website for more information about DSSBA :
http://philippe-fournier-viger.com/DSSBA_2022/index.html

IEEE logo

Posted in cfp, Conference, Data Mining, Data science | Leave a comment

The HUIM-Miner and FHM algorithms (video)


Today, I post one more new video to explain concepts about pattern mining. In the new video, I talk about high utility itemset mining, and explain the HUI-Miner and FHM algorithms. Those are two popular high utility itemset mining algorithms that have been used in hundreds of research papers.

high utility itemset mining video frame

Click here to watch the video: The HUI-Miner and FHM algorithm – 45 minutes

If you want to see more videos about pattern mining, you can check the video page of the SPMF website or my Youtube channel (which contains many of the same videos)

Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Pattern Mining, Utility Mining, Video | Tagged , , , , , , , , , | Leave a comment

The PrefixSpan algorithm (video)

I have posted a new video about pattern mining, explaining the PrefixSpan algorithm. It assumes that you know already what is sequential pattern mining. If you are not familiar with sequential pattern mining, you can first watch my video Introduction to sequential pattern mining.

Click here to watch the video: The PrefixSpan algorithm (30 min)

If you want to see more videos about pattern mining, you can check the video page of the SPMF website or my Youtube channel (which contains many of the same videos)

Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in Data Mining, Pattern Mining, Video | Tagged , , , , , | Leave a comment

Some upcoming features of SPMF 2.54

Today, I will reveal some upcoming features of SPMF, which I am currently testing and will be released in the next version (2.54), probably in about 1 week.

1) The first new feature is that it will be possible to launch an algorithm in a separated virtual machine instead of using a thread in the same virtual machine. This option is useful for running performance experiments as it ensures that a new virtual machine is used for each algorithm execution. This avoids the problem that the memory is not released by the Java Garbage collector.

2) The second new feature is a time limit. It will now be possible to set a maximum time limit and to kill an algorithm automatically after exceeding that time limit.

The features are already implemented and I have tested them on Windows. I will be testing them on Linux later to see if it also works before releasing them.

If you have any suggestions for other features, you may leave a comment in the comment section below. I am happy to receive any comments to improve the software 🙂

Philippe Fournier-Viger is a distinguished professor working in China and founder of the SPMF open source data mining software.

Posted in open-source, Pattern Mining, spmf | Tagged , , , | Leave a comment

How to call SPMF from another Java program as an external program?

This is a short blog post where I will explain how to call SPMF as an external program from another Java program to execute an algorithm.

Before we start, it should be said that there are multiple ways to use SPMF. It can be used as a standalone program with a graphical user interface and from the command line. Moreover, the code of SPMF can be directly integrated in other Java programs, and SPMF can also be called using unofficial wrappers from other languages such as Python and R.

If you want to use SPMF from a Java program, it can be desirable in some cases to call SPMF using its command line interface rather than integrating the code of SPMF directly in your Java program. The adavantage is that SPMF is then executed as a separated process on your computer and it may be easier to maintain. How to do this?

First, you should download spmf.jar from the download page of the SPMF website and put it in the same folder as your Java program.

Second, you should lookup which algorithm you want to use in the documentation webpage of the SPMF website. For this example, lets say that we want to call the Apriori algorithm on a file called “contextPasquier99.txt” with the parameter minsup = 40% and save the result in a file “output.txt“.

According to the documentation of Apriori, we should write a command like this to execute it from the command line:

java -jar spmf.jar run Apriori contextPasquier99.txt output.txt 40% 

From Java code, we can write do like this:

List commandWithParameters = new ArrayList();
commandWithParameters.add("java");
commandWithParameters.add("-jar");
commandWithParameters.add("spmf.jar");
commandWithParameters.add("run");
commandWithParameters.add("Apriori");  // Algorithm name
commandWithParameters.add("contextPasquier99.txt");  // input file
commandWithParameters.add("output.txt");  // output file
commandWithParameters.add("0.5%");  // parameter

ProcessBuilder pb = new ProcessBuilder(commandWithParameters);
pb.redirectOutput(Redirect.INHERIT);  // This will redirect the output of SPMF to the console
Process process = pb.start();  // Run SPMF in a separated process

Running this code, will execute the Apriori algorithm and write some statistics in the console:

=============  APRIORI - STATS =============
 Candidates count : 11
 The algorithm stopped at size 4
 Frequent itemsets count : 9
 Maximum memory usage : 7.322685241699219 mb
 Total time ~ 0 ms
===================================================

If you want your Java program to wait for the completion of SPMF before continuing, you can add this line:

int exitValue = process.waitFor();

It is also possible to stop the process using this:

process.destroy();

That is all for today!

Posted in Data Mining, open-source, spmf | Tagged , , , | Leave a comment