Hi all, this is to announce that a new textbook in Thai about pattern mining has been published, which includes many examples using the SPMF software. The textbook, named “Pattern Mining: Theory and Practice”, is written by Panida Songram, a teacher at Mahasarakham University (Thailand), and can be used for teaching or self-learning by students or practitioners. I have known the author for many years and I am very happy that she let me host a copy of the book, which you can download from this link: Pattern Mining: Theory and Practice (PDF, 14.2 MB).
The book gives good coverage of pattern mining. It explains algorithms but also contains many practical examples of how to use SPMF. Some key topics in the book are itemset mining, sequential pattern mining and multi-dimensional sequential pattern mining.
That is all I wanted to share for today. If you can read Thai, I highly recommend downloading this book. 😉
Today, I want to share with you the video presentation that I have prepared for my paper at PAKDD 2020. It presents a new problem where we want to discover locally trending high utility itemsets (LTHUIs). An LTHUI is a set of items purchased by customers that is trending, that is, it generates money following an upward or downward trend during some time periods that are not predefined. It is a variation of the popular high utility itemset mining problem.
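To make the idea of a utility trend more concrete, here is a minimal Python sketch. It assumes a hypothetical toy format where each transaction has an integer timestamp and maps each purchased item to the money it generated. It sums the utility of one itemset in consecutive time bins and fits a least-squares line: a positive slope suggests an upward trend and a negative slope a downward trend. This only illustrates the notion of a trend for a fixed itemset and fixed bins; it is not the algorithm from the paper, which must also find the itemsets and the time periods where such trends hold.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy format: (timestamp, {item: money generated by that item})
Transaction = Tuple[int, Dict[str, float]]


def itemset_utility(itemset: FrozenSet[str], items: Dict[str, float]) -> float:
    """Utility of the itemset in one transaction: the sum of its items' utilities,
    or 0 if the transaction does not contain every item of the itemset."""
    if not itemset.issubset(items):
        return 0.0
    return sum(items[i] for i in itemset)


def utility_per_bin(itemset: FrozenSet[str], transactions: List[Transaction],
                    start: int, bin_size: int, n_bins: int) -> List[float]:
    """Total utility of the itemset in each of n_bins consecutive time bins."""
    totals = [0.0] * n_bins
    for timestamp, items in transactions:
        b = (timestamp - start) // bin_size
        if 0 <= b < n_bins:
            totals[b] += itemset_utility(itemset, items)
    return totals


def slope(values: List[float]) -> float:
    """Least-squares slope of the values against their bin indices 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


transactions = [
    (0, {"a": 2, "b": 1}), (1, {"a": 1}),
    (2, {"a": 4, "b": 2}), (3, {"b": 5}),
    (4, {"a": 6, "b": 3}), (5, {"a": 2, "c": 1}),
]
bins = utility_per_bin(frozenset({"a", "b"}), transactions, start=0, bin_size=2, n_bins=3)
print(bins, slope(bins))  # [3.0, 6.0, 9.0] 3.0
```

Here the itemset {a, b} earns 3, 6 and 9 in three consecutive bins, so the slope is positive (an upward trend over that period).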
Many researchers and students want to be successful in their field. For this, they make many sacrifices, such as working long hours at the lab every day from morning to evening. This is important because, honestly, success comes with hard work. But it is also important to keep a good work-life balance to stay healthy. In this blog post, I will talk about the importance of having good life and work habits for researchers.
First, let me tell you a bit about my story. Since the start of my graduate studies, I have worked countless hours to improve myself. For example, during my master's and Ph.D. studies, I basically did not take any rest during the whole year and worked about 12 hours every day. That allowed me to be successful in my field, receive big grants during my studies, publish many papers, and then land some good jobs in academia. Nowadays, as I have a family, I cannot work as much as when I was a student, but I still work hard, and I am much more efficient than I was before thanks to the skills that I have gained. For example, I can write a paper much more quickly. I still work very late at night almost every day.
Health is important
Now, what I have learnt over the years is that work is not everything. Health is also very important. Working long hours at the lab can eventually cause several health problems, such as wrist and neck pain, back problems, and eye problems. Luckily, I do not have any major problems, but it is something to be aware of, as such problems typically appear later down the road.
My advice
First, it is important to eat healthy food.
Second, it is important to have a good posture while working. For example, it is worth finding a good chair, adjusting the height of the table and screen, and using an appropriate mouse and keyboard, so that you are comfortable.
Third, it is important to avoid sitting for too long and to rest your eyes from time to time. Several studies have shown that sitting for long periods of time may lead to various diseases. Thus, it is good, for example, to stand up every hour and walk for a few minutes.
Fourth, it is equally important to do some exercise every week. Even a few hours per week of running, swimming or playing badminton can make you feel better. I personally like to run for 30 minutes to an hour every day.
Also, if you are tired of always sitting on a chair, you may consider working in a standing position. I have recently started doing this, and it really feels great. I even wonder why I did not do this before! It is very good for the posture and the back. Here is a picture of my setup at home:
Some people recommend alternating between a standing and a sitting position to avoid getting tired. But personally, I have no problem working for several hours in a standing position. If you don't have a support like mine in the picture, you could also use some boxes to raise your computer.
Another good piece of advice is that, if you are working on a laptop, you should consider using an external screen or an external keyboard. The reason is that if you put your laptop low, the keyboard may be at an appropriate height, but the screen will be too low and you will have to bend your neck. On the other hand, if you put your laptop higher, the screen will be at an appropriate height for your eyes, but the keyboard will be too high. Using an external screen or keyboard solves this problem.
Conclusion
In this blog post, I have discussed the importance of having good life habits to be a healthy researcher and avoid health problems later in life. If you have other suggestions related to this, please post them in the comment section below!
Today, I will write a short blog post listing some common errors that I have recently observed in journal and conference research papers.
Using a reference number as the subject of a verb. For example, “[12] proposed an algorithm” should be written as “Smith et al. [12] proposed an algorithm”.
When there is a shorter way of writing something, it should be used. For example, “in order to” should be replaced by “to”. Another example: “this new type of algorithm is” can be replaced by “this new algorithm type is”. Similarly, “A is an extension of B” can be replaced by “A extends B”. In other words, we should write concisely.
The title of a paper is too long. I recommend using no more than 10 words, and preferably fewer. I recently read a paper that had a title with more than 20 words!
Using the word “we” too much. Generally, it is better to avoid using “we” as much as possible.
Using the words “you” or “I”. These words should never appear in a research paper.
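Some of these errors can be spotted mechanically. As an illustration, here is a small, hypothetical Python checker (not an existing tool) that uses regular expressions to flag a reference number used as the subject of a sentence, the phrases “in order to” and “is an extension of”, the words “you” and “I”, and titles longer than 10 words. The patterns are rough heuristics: they will miss some errors and flag some correct sentences, so every warning still needs a human decision.

```python
import re
from typing import List, Tuple

# (regex, flags, message) triples: rough heuristics for some of the errors listed above.
RULES: List[Tuple[str, int, str]] = [
    # "[12] proposed ..." at the start of a line or sentence, but not "et al. [12] proposed"
    (r"(?:^|(?<! al)[.!?]\s+)\[\d+\]\s+[a-z]+", re.MULTILINE,
     "reference number used as a subject"),
    (r"\bin order to\b", re.IGNORECASE, "'in order to' can usually be 'to'"),
    (r"\bis an extension of\b", re.IGNORECASE, "'A is an extension of B' can be 'A extends B'"),
    (r"\b(?:[Yy]ou|I)\b", 0, "avoid 'you' and 'I' in a research paper"),
]


def check_text(text: str) -> List[str]:
    """Return one warning per suspicious match found in the text."""
    warnings = []
    for pattern, flags, message in RULES:
        for m in re.finditer(pattern, text, flags):
            warnings.append(f"{message}: {m.group(0).strip()!r}")
    return warnings


def check_title(title: str, max_words: int = 10) -> List[str]:
    """Warn when a title has more than max_words words."""
    n = len(title.split())
    return [f"title has {n} words (limit {max_words})"] if n > max_words else []


sample = "[12] proposed an algorithm. In order to improve it, we extend it. I think A is an extension of B."
for w in check_text(sample):
    print(w)
```

The sample triggers four warnings, while “Smith et al. [12] proposed an algorithm.” triggers none because the lookbehind skips a period that follows “ al”.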
I could say much more about this, and you can look at my other blog posts about writing research papers for more information. But my goal here was just to remind you about some common errors!
Hi all, this is to let you know the good news that the UDML workshop on Utility Driven Mining and Learning will be back this year at IEEE ICDM 2020, for its third edition (UDML 2020).
This is a good venue to submit your papers about data mining and machine learning, especially given that all accepted papers will be published in the IEEE ICDM workshop proceedings, just like last year! Also, we are planning to have a special issue in a good SCI/EI journal for the best papers of the workshop (to be confirmed).
In particular, if you have some papers about high utility pattern mining (including topics such as high utility itemset mining, high utility episode mining or high utility sequential pattern mining), this is a perfect place to submit your papers 😉
But we are also looking for papers on other, more general topics related to the concept of utility, such as analyzing or learning the important factors (e.g., economic factors) in the data mining or machine learning process. Here is a non-exhaustive list of potential topics:
Theory and core methods for utility mining and learning
Utility pattern mining in large datasets, e.g., high-utility itemset mining, high-utility sequential pattern/rule mining, high-utility episode mining, and other novel patterns
Analysis and learning of novel utility factors in the mining and learning process
Predictive modeling/learning, clustering and link analysis that incorporate utility factors
Incremental utility mining and learning
Utility mining and learning in streams
Utility mining and learning in uncertain systems
Utility mining and learning in big data
Knowledge representations for utility patterns
Privacy preserving utility mining/learning
Visualization techniques for utility mining/learning
Open-source software/libraries/platforms
Innovative applications in interdisciplinary domains, like finance, biomedicine, healthcare, manufacturing, e-commerce, social media, education, etc.
New, open, or unsolved problems in utility-driven mining
Submissions are limited to 10 pages and must be formatted according to the IEEE 2-column format. Papers will be evaluated based on the evaluation criteria of the main IEEE ICDM 2020 conference for research papers. In particular, papers must present original research that is not under consideration at other journals, conferences or workshops.
In this blog post, I will share another talk that I have recorded recently. This time, I will explain a new paper from my team about discovering cost-effective patterns using some algorithms called CEPB and CEPN. Mining cost-effective patterns is a new topic in pattern mining that combines the concept of utility with that of cost.
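As a rough illustration of combining cost and utility, here is a simplified Python sketch. It assumes a hypothetical database where each sequence is a list of (event, cost) pairs plus one utility value for the whole sequence (for example, a grade obtained after a series of learning activities). For a sequential pattern, it computes the cost of its leftmost occurrence in each sequence that contains it, then the average cost, the average utility of those sequences, and the cost per unit of utility. These measures are only illustrative; the exact definitions used by CEPB and CEPN are given in the paper and may differ.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical format: (list of (event, cost) pairs, utility of the whole sequence)
SequenceRecord = Tuple[List[Tuple[str, float]], float]


def occurrence_cost(pattern: Sequence[str], events: List[Tuple[str, float]]) -> Optional[float]:
    """Cost of the leftmost occurrence of the pattern as a subsequence, or None if absent."""
    total, k = 0.0, 0
    for event, cost in events:
        if k < len(pattern) and event == pattern[k]:
            total += cost
            k += 1
    return total if k == len(pattern) else None


def cost_effectiveness(pattern: Sequence[str], database: List[SequenceRecord]):
    """Return (support, average cost, average utility, cost per unit of utility)."""
    costs, utilities = [], []
    for events, utility in database:
        c = occurrence_cost(pattern, events)
        if c is not None:
            costs.append(c)
            utilities.append(utility)
    if not costs:
        return 0, None, None, None
    avg_cost = sum(costs) / len(costs)
    avg_utility = sum(utilities) / len(utilities)
    ratio = avg_cost / avg_utility if avg_utility else None
    return len(costs), avg_cost, avg_utility, ratio


database = [
    ([("read", 2.0), ("quiz", 1.0), ("exam", 3.0)], 80.0),
    ([("read", 4.0), ("exam", 3.0)], 60.0),
    ([("video", 1.0), ("quiz", 1.0), ("exam", 3.0)], 90.0),
]
for pattern in [("quiz", "exam"), ("read", "exam")]:
    print(pattern, cost_effectiveness(pattern, database))
```

In this toy data, (“quiz”, “exam”) has a lower average cost (4.0) and a higher average utility (85.0) than (“read”, “exam”) (6.0 and 70.0), so it would be considered more cost-effective under these illustrative measures.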
Moreover, you can also download these algorithms, their source code and the dataset from the SPMF data mining library.
That is all for today.
Philippe Fournier-Viger is a professor, data mining researcher and the founder of the SPMF data mining software, which includes more than 150 algorithms for pattern mining.
Today, I will share a short keynote talk (28 min) about discovering interpretable high utility patterns in data, which I presented at the CCNS 2020 conference. This talk gives an overview of techniques for finding interesting and useful patterns that can help to understand data.
I hope you will enjoy this video! If you want to know more about how to find interesting and useful patterns in data, I have written a series of blog posts on this topic.
I have also published various videos that you can find on this blog. Moreover, to apply this in your projects, you can use the SPMF open-source data mining software (of which I am the founder). It provides more than 150 algorithms for identifying useful patterns in data.
The SPP-Growth algorithm and datasets for evaluating its performance are available in the SPMF software, which is open-source and programmed in Java.
Source code and datasets:
The source code of SPP-Growth and datasets are available in the SPMF software.
The research paper:
Fournier-Viger, P., Yang, P., Lin, J. C.-W., Kiran, U. (2019). Discovering Stable Periodic-Frequent Patterns in Transactional Data. Proc. 32nd Intern. Conf. on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA/AIE 2019), Springer LNAI, pp. 230-244.
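To clarify what “stable periodic” means, here is a brute-force Python sketch of the measures as we understand them from the paper: the periods of an itemset are the gaps between consecutive transactions containing it (including the gap from the start of the database to the first occurrence and from the last occurrence to the end); the lability grows by (period - maxPer) after each period and never drops below zero; and an itemset is stable periodic-frequent if its support is at least minSup and its maximum lability is at most maxLa. The function names are hypothetical, and this checks one itemset at a time; it is not SPP-Growth itself, which is designed to explore all itemsets efficiently. Please refer to the paper for the exact definitions.

```python
from typing import List, Set


def tids(itemset: Set[str], database: List[Set[str]]) -> List[int]:
    """1-based identifiers of the transactions containing every item of the itemset."""
    return [i for i, t in enumerate(database, start=1) if itemset <= t]


def periods(tid_list: List[int], db_size: int) -> List[int]:
    """Gaps between consecutive occurrences, including the gap from the start of the
    database (tid 0) to the first occurrence and from the last occurrence to the end."""
    bounds = [0] + tid_list + [db_size]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def max_lability(pers: List[int], max_per: int) -> int:
    """Lability increases by (period - max_per) and never goes below 0.
    The stability measure is the largest lability value reached."""
    la, worst = 0, 0
    for p in pers:
        la = max(0, la + p - max_per)
        worst = max(worst, la)
    return worst


def is_stable_periodic_frequent(itemset: Set[str], database: List[Set[str]],
                                min_sup: int, max_per: int, max_la: int) -> bool:
    t = tids(itemset, database)
    return len(t) >= min_sup and max_lability(periods(t, len(database)), max_per) <= max_la


db = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b"}, {"c"}, {"a", "b"}, {"a"}]
t = tids({"a", "b"}, db)
print(t, periods(t, len(db)))  # [1, 4, 6] [1, 3, 2, 1]
print(is_stable_periodic_frequent({"a", "b"}, db, min_sup=3, max_per=2, max_la=1))  # True
```

Here {a, b} has one period of 3 that exceeds maxPer = 2, so a strict maximum-period constraint would reject it, but its maximum lability is only 1, so it is accepted with maxLa = 1. This tolerance for occasional irregular gaps is what the stability measure adds.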
If you want to watch more videos about data mining algorithms that I have recorded, you can click on the “video” category of this blog.
In this blog post, I will talk about how to record a research talk as a video on a computer. This is an important topic for researchers for at least three reasons. First, sharing videos about your research can help to promote it. Second, a researcher may be invited to send a video of their talk to a conference if they cannot attend it because of issues such as not obtaining a travel visa. Third, recording a video of a talk is useful as a backup plan when giving a talk online.
The steps to record a presentation as a video on a computer are as follows.
Step 1. What kind of presentation do you want to give?
The first step is to decide on the type of presentation that you want to record. The most common types are:
A) Slides with voice-over: A person will record some slides with a voice-over.
B) Video of a talk: A person will record a video of himself talking without slides.
C) Complex presentation: A person will combine multiple elements such as slides, a video of himself, and audio.
D) Virtual whiteboard presentation: A person will write or draw live on a virtual whiteboard while being recorded.
Presentations of type A) or B) are easier to make than those of type C) or D). But a more complex presentation may sometimes appear more interesting.
Step 2. Make sure that you have the right equipment
Recording a presentation can be done using very basic equipment like a cellphone or the microphone and webcam of a laptop computer. However, the quality of built-in webcams and microphones is often poor. To record video presentations, I use:
A professional microphone. I bought one that is not so expensive, can be plugged in by USB, and comes with a tripod (the SAMSON C01UPRO, see below). Using such a microphone makes a huge difference in sound quality compared to the built-in microphone of my laptop. Some people also buy additional accessories for their microphone, like a pop filter and a shock mount. Also, it is important to plug the microphone directly into a USB port of the computer rather than using a USB hub, to avoid recording delays.
A good webcam. I have also bought a good webcam (Logitech c922 Pro Stream), which can record in high definition with good colors. A nice feature is that the webcam can also be mounted on a tripod and that it has a free background removal feature that I will talk more about later.
Light. A good lighting source is also important if you are going to record videos of yourself using a camera and want to look good. For example, some inexpensive LED lamps or LED panels can be purchased and installed on your desk.
The above are perhaps the most important pieces of equipment to increase the quality of recorded talks. Other equipment could also be added, like tripods, a green screen for shooting videos, good headphones, etc. Here is a picture of my relatively simple setup for recording videos. I use two LED lamps, an external webcam and an external microphone.
Step 3. Prepare your presentation
Before recording a talk, it is recommended to prepare it well and rehearse it a few times. This is true for any talk, so I will not discuss it here.
Step 4. Record the video
Depending on the type of presentation that you will make, it will be more or less complicated to record the presentation. I will discuss a few cases below.
For a presentation of type A) (slides + voice-over), it is quite simple. One can prepare the slides with software such as Microsoft PowerPoint and then use the “Record Slide Show” feature to add voice to the presentation. This is done by clicking on the button below:
The result is a PowerPoint presentation that can be played with audio on any computer equipped with PowerPoint. For more convenience, there is also software to convert a PowerPoint presentation to a video.
For a presentation of type B) (video from a camera), one can use basic software to record from a camera such as a webcam. Basic video recording software comes packaged with most operating systems (e.g. the Camera app in Windows 10). However, there also exist many other programs that let you record videos and also add special effects, transitions, text and other elements to your videos. Some video editing programs are easy to use yet quite powerful (e.g. Wondershare Filmora, Movavi Video Editor), while others are harder to learn but more powerful (e.g. Adobe Premiere).
For a presentation of type C) (slides + video of the person), recording is more complicated because it requires recording not only your slides but also a video of yourself at the same time, and then combining them into one video. Here is a picture of the result that we may want to achieve:
To do such a recording, I use the Camtasia software, which lets me record my screen or a PowerPoint presentation together with my webcam, and then edit the resulting video with effects, transitions, text, etc. This software is not free, but it is powerful and very easy to use. Other software could certainly be used instead.
I first open my slides with Microsoft PowerPoint.
Then I open the “Camtasia Recorder”. You can see its interface below:
There, I first select the part of the screen that I want to record by clicking the “Custom” button. Then, I choose to record using my webcam by clicking “Camera on” and using my microphone by clicking “Audio on”. Then, to start recording, I click the “rec” button.
Then, after I finish recording my presentation, I click “Stop”.
Clicking “Stop” opens the Camtasia editor, where I can edit the video that I have recorded. The interface of the Camtasia editor looks like this:
In the editor, it is possible to cut parts of the video, add effects, and do many other things. As you can see in the picture above, I have two different video tracks (at the bottom): one for the video recorded from the webcam (with a green background), and one for the presentation.
Then, since I have shot the video of myself with a green background, I can remove the background behind me. This is done by clicking on the video track showing me and adding the “Remove a color” visual effect, where I choose green as the color to be removed (see the screenshot below):
This effect, called “chroma key”, is very useful for making a nice presentation. It makes the background transparent so that I can overlay my video on top of my slides! If you also want to do this, you first need to shoot a video of yourself with a green background. There are two ways to do this. The traditional way is to shoot with a green screen behind you like this (source of the picture: Amazon).
However, buying a green screen is actually not necessary. A simpler solution is to use virtual webcam software like ChromaCam, which uses machine learning to automatically remove your background and put a green background behind you.
This is what I have done in the example above to avoid buying a real green screen. A real green screen would of course give a better effect, but it would require additional space and money. The virtual webcam software ChromaCam can be used for free, but in that case it adds a watermark to your videos. To remove the watermark, it is possible to buy a license. Alternatively, if you have a webcam like the C922 Pro Stream or Brio from Logitech, ChromaCam is free to use for the chroma key effect. This is one of the reasons why I chose to buy the C922 Pro Stream for my setup. There are other alternatives to ChromaCam, like XSplit VCam, but it is also not free and, worse, it is based on a subscription model that requires paying every month. There might be other free alternatives to ChromaCam, but I did not find one that is easy to use and gives good results. Here is a picture of the chroma key effect obtained using the ChromaCam software:
As you can see above, it can remove the background quite well, although it may cut a bit of my hair and shoulder sometimes 😉
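The basic idea behind a chroma key can be sketched in a few lines of Python. This is a simplified illustration, not how Camtasia's “Remove a color” effect or ChromaCam work internally: it assumes frames stored as lists of rows of (R, G, B) tuples and uses a hypothetical rule that a pixel belongs to the green screen when its green channel exceeds both red and blue by more than a margin. Such pixels are replaced by the corresponding pixel of another image (for example, a slide). Real tools also soften edges and remove green reflections, which is one reason why parts of hair or shoulders can be cut.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255
Image = List[List[Pixel]]     # rows of pixels


def is_key_green(pixel: Pixel, margin: int = 40) -> bool:
    """Hypothetical keying rule: green clearly dominates red and blue."""
    r, g, b = pixel
    return g - r > margin and g - b > margin


def chroma_key(frame: Image, background: Image, margin: int = 40) -> Image:
    """Replace green-screen pixels of frame with the same pixels of background."""
    return [
        [bg if is_key_green(px, margin) else px for px, bg in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


frame = [[(0, 255, 0), (200, 150, 120)],
         [(30, 200, 40), (90, 120, 100)]]
slide = [[(10, 20, 30)] * 2 for _ in range(2)]
print(chroma_key(frame, slide))
# [[(10, 20, 30), (200, 150, 120)], [(10, 20, 30), (90, 120, 100)]]
```

The margin is the design trade-off: a small margin also removes greenish parts of the person, while a large one leaves green fringes around them.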
Another important thing that I do using the Camtasia Editor is to add a “Cursor effect” so that my mouse pointer is highlighted in yellow in my videos. The result looks like this:
To do that, I click on the video track of my slides in the Camtasia Editor and select one of the cursor effects offered:
After recording and editing the video, the last step is to encode it in an appropriate format. I usually choose MP4 because it can be played by most browsers and devices. Then I publish the video. There are various websites for publishing videos, with different features. In my case, I already pay a hosting company to host my website, so I put my videos on that platform.
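Camtasia can produce MP4 files itself. If you ever need to re-encode a recording yourself, one common option (an assumption here, not the tool used in this post) is the free ffmpeg command-line program. The Python sketch below builds an ffmpeg command for H.264 video and AAC audio in an MP4 container, which most browsers and devices can play, and moves the file's metadata to the front (`-movflags +faststart`) so that playback in a browser can start before the whole file is downloaded. It only runs ffmpeg when called with run=True; the function and file names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def mp4_command(src: str, dst: str, crf: int = 23, audio_kbps: int = 128) -> List[str]:
    """Build an ffmpeg command that re-encodes src to a widely playable MP4."""
    return [
        "ffmpeg", "-y",              # overwrite dst if it already exists
        "-i", src,
        "-c:v", "libx264",           # H.264 video
        "-preset", "medium",
        "-crf", str(crf),            # lower = better quality, bigger file
        "-pix_fmt", "yuv420p",       # pixel format supported by most players
        "-c:a", "aac", "-b:a", f"{audio_kbps}k",
        "-movflags", "+faststart",   # metadata first, for web playback
        dst,
    ]


def encode(src: str, dst: str, run: bool = False) -> List[str]:
    """Return the command and, if run is True, execute it with ffmpeg."""
    cmd = mp4_command(src, dst)
    if run:
        if shutil.which("ffmpeg") is None:
            raise RuntimeError("ffmpeg is not installed or not on PATH")
        subprocess.run(cmd, check=True)
    return cmd


print(" ".join(encode("talk_raw.mov", "talk.mp4")))
```

A CRF around 23 is a reasonable default for talks with mostly static slides; lowering it gives sharper text at the cost of larger files.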
Lastly, for a presentation of type D) using a virtual whiteboard for writing and drawing on the screen, one can use a pen tablet or a computer that supports pen input. For example, I bought a Wacom Intuos pen tablet (see below) that I connected to my laptop. Using this tablet, I can draw inside software like Microsoft Paint, PowerPoint or OneNote.
Then, I can record what I am writing using screen recording software like Camtasia. This lets me create video presentations where I appear to be writing on a virtual whiteboard. For example, here is a little drawing that I made using the pen tablet:
This type of presentation is very useful for teaching topics such as mathematics that are typically taught using a whiteboard.
Update from 2023: a green screen and a better recording method
I have recently upgraded my setup to record with a green screen, and the result is much better, as you can see in the screenshot below:
This is my updated setup with a green screen:
Having a green screen gives better results, but it requires more space and careful adjustment of the lighting, which is more complex than my previous solution with ChromaCam.
Besides, I also recently improved my recording method to get better video quality. A problem that I had observed is that Camtasia is quite CPU-hungry, and because of this, the videos of myself that I recorded with my webcam tended to be a little choppy, with a low frame rate. Moreover, I observed that my new smartphone has better image quality than my webcam. Thus, I now use my smartphone to record the video of myself separately rather than using my webcam. This has two advantages: the video quality is no longer affected by the CPU usage of Camtasia, and the image quality is better thanks to the smartphone’s better camera. Then, after recording the video with my smartphone, I import it into Camtasia and synchronize it with the sound and the recorded presentation. As a result, the final video has better image quality and is also more fluid. If I had a more expensive camera, I could also use it in place of my smartphone.
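Camtasia's timeline lets you align the smartphone video with the other tracks by hand. If you want to estimate the offset automatically instead, a standard technique (an assumption here, not necessarily what is done in this post) is cross-correlation: slide one audio signal over the other and keep the shift with the largest sum of products. The pure-Python sketch below works on short lists of samples and uses a hypothetical function name; in practice, you would first decode the audio tracks and use an FFT-based correlation for speed, and perhaps normalize the signals if their volumes differ a lot.

```python
from typing import List


def best_lag(reference: List[float], other: List[float], max_lag: int) -> int:
    """Return the shift (in samples) that best aligns `other` with `reference`.

    A positive result means a sound occurs `lag` samples later in `other` than in
    `reference`, so `other` should be moved `lag` samples earlier to line up.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: sum of products of overlapping samples.
        score = 0.0
        for i, x in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += x * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


# A clap recorded by the microphone (reference) and by the smartphone (other).
mic = [0, 0, 1, 3, 1, 0, 0, 0]
phone = [0, 0, 0, 0, 0, 1, 3, 1]
lag = best_lag(mic, phone, max_lag=5)
print(lag)  # 3
sample_rate = 48000  # hypothetical sampling rate in Hz
print(lag / sample_rate, "seconds")
```

Clapping once in front of both devices at the start of a recording gives a sharp peak that makes this kind of alignment easy, whether it is done by hand or automatically.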
Conclusion
In this blog post, I have provided some tips about how to record a research talk on a computer. I hope this has been interesting and will be useful!
If you have some comments or other complementary advice, please leave a comment below.
Today, I present the CPT and CPT+ sequence prediction models in a video. Sequence prediction is an important task in data mining, which consists of predicting the next symbols of a sequence. It can be used, for example, to predict the next word that someone will type on a keyboard, or the next location where someone will go.
The official implementations of CPT and CPT+ models and datasets for evaluating their performance are available in the SPMF software, which is implemented in Java and open-source. There is also an unofficial implementation of CPT in Cython.
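To give an idea of how a CPT-style model works, here is a heavily simplified Python sketch. Like CPT, it stores training sequences in a prediction tree (a trie), keeps an inverted index from each item to the sequences containing it, and a lookup table from each sequence to its last node in the tree. To predict, it takes the last k items of the query (ignoring items never seen in training), intersects their inverted-index entries to find similar sequences, and counts the items that appear after all query items have been seen in each of them (each item counted once per sequence); the most frequent item is predicted. The class and method names are hypothetical, and the sketch omits the compression and noise-reduction strategies of CPT+ as well as CPT's exact scoring, so it should not be taken as the official algorithm, which is available in SPMF.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


class Node:
    """A node of the prediction tree (trie)."""

    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.children: Dict[str, "Node"] = {}


class SimpleCPT:
    """Simplified CPT-style sequence predictor (illustrative sketch only)."""

    def __init__(self) -> None:
        self.root = Node(None, None)             # prediction tree
        self.inverted: Dict[str, Set[int]] = {}  # item -> ids of sequences containing it
        self.lookup: Dict[int, Node] = {}        # sequence id -> last node of that sequence

    def train(self, sequences: List[List[str]]) -> None:
        for sid, seq in enumerate(sequences):
            node = self.root
            for item in seq:
                child = node.children.get(item)
                if child is None:
                    child = Node(item, node)
                    node.children[item] = child
                node = child
                self.inverted.setdefault(item, set()).add(sid)
            self.lookup[sid] = node

    def sequence(self, sid: int) -> List[str]:
        """Rebuild a training sequence by walking from its last node up to the root."""
        items, node = [], self.lookup[sid]
        while node.parent is not None:
            items.append(node.item)
            node = node.parent
        return items[::-1]

    def predict(self, query: List[str], k: int = 2) -> Optional[str]:
        suffix = {i for i in query[-k:] if i in self.inverted}
        if not suffix:
            return None
        similar = set.intersection(*(self.inverted[i] for i in suffix))
        counts = Counter()
        for sid in sorted(similar):
            seq, seen = self.sequence(sid), set()
            for pos, item in enumerate(seq):
                seen.add(item)
                if suffix <= seen:
                    # Consequent: what follows once all suffix items have been seen.
                    counts.update(set(seq[pos + 1:]))
                    break
        return counts.most_common(1)[0][0] if counts else None


training = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["B", "A", "C"], ["E", "B", "C", "D"]]
model = SimpleCPT()
model.train(training)
print(model.predict(["X", "A", "B"]))  # C
```

The tree lets training sequences with common prefixes share nodes, which is what makes the structure compact, and the inverted index turns the search for similar sequences into a set intersection.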
The CPT+ (Compact Prediction Tree+) model is described in this article: