UDML 2020 – Utility Driven Mining and Learning Workshop

Hi all, I am happy to announce that the UDML workshop on Utility Driven Mining and Learning will be back this year, at IEEE ICDM 2020, for its third edition (UDML 2020).

This is a good venue for your papers about data mining and machine learning, especially since all accepted papers will be published in the IEEE ICDM workshop proceedings, just like last year! We are also planning a special issue in a good SCI/EI journal for the best papers of the workshop (to be confirmed).


In particular, if you have some papers about high utility pattern mining (including topics such as high utility itemset mining, high utility episode mining or high utility sequential pattern mining), this is a perfect place to submit your papers 😉

But we are also looking for papers on more general topics related to the concept of utility, such as analyzing or learning the important factors (e.g., economic factors) in the data mining or machine learning process. Here is a non-exhaustive list of potential topics:

  • Theory and core methods for utility mining and learning
  • Utility pattern mining in large datasets, e.g., high-utility itemset mining, high-utility sequential pattern/rule mining, high-utility episode mining, and other novel patterns
  • Analysis and learning of novel utility factors in the mining and learning process
  • Predictive modeling/learning, clustering and link analysis that incorporate utility factors
  • Incremental utility mining and learning
  • Utility mining and learning in streams
  • Utility mining and learning in uncertain systems
  • Utility mining and learning in big data
  • Knowledge representations for utility patterns
  • Privacy preserving utility mining/learning
  • Visualization techniques for utility mining/learning
  • Open-source software/libraries/platform
  • Innovative applications in interdisciplinary domains, like finance, biomedicine, healthcare, manufacturing, e-commerce, social media, education, etc.
  • New, open, or unsolved problems in utility-driven mining

The website of the UDML 2020 workshop is here:
http://www.philippe-fournier-viger.com/utility_mining_workshop_2020/

Submissions are limited to 10 pages and must be formatted according to the IEEE 2-column format. Papers will be evaluated based on the evaluation criteria of the main IEEE ICDM 2020 conference for research papers. In particular, papers must present original research that is not under consideration at other journals, conferences or workshops.

==
Philippe Fournier-Viger is a professor, data mining researcher and the founder of the SPMF data mining software, which includes more than 150 algorithms for pattern mining.

(video) Mining Cost-Effective Patterns

In this blog post, I share another talk that I recorded recently. This time, I explain a new paper from my team about discovering cost-effective patterns using two algorithms called CEPB and CEPN. Mining cost-effective patterns is a new topic in pattern mining that combines the concept of utility with that of cost.

Hope you will enjoy this video! If you want more details about this topic, you can read this paper:

Fournier-Viger, P., Li, J., Lin, J. C., Chi, T. T., Kiran, R. U. (2019). Mining Cost-Effective Patterns in Event Logs. Knowledge-Based Systems (KBS), Elsevier

Moreover, you can also download these algorithms, the source code and dataset from the SPMF data mining library.

That is all for today.
==

(video) Discovering interpretable high utility patterns in databases

Today, I will share a short keynote talk (28 min) about discovering interpretable high utility patterns in data, which I presented at the CCNS 2020 conference. This talk gives an overview of techniques for finding interesting and useful patterns that can help to understand data.

Hope you will enjoy this video! If you want to know more about how to find interesting and useful patterns in data, I have written a series of blog posts on this topic.

I have also published various videos that you can find on this blog. Moreover, to apply this in your projects, you can use the SPMF open-source data mining software (of which I am the founder). It provides more than 150 algorithms for identifying useful patterns in data.

==

(video) Identifying Stable Periodic Frequent Patterns using SPP-Growth

Today, I present a video about finding stable periodic patterns in data, and discuss a new algorithm named SPP-Growth for this task.

The SPP-Growth algorithm and datasets for evaluating its performance are available in the SPMF software, which is open-source and programmed in Java.

Source code and datasets:

The source code of SPP-Growth and datasets are available in the SPMF software.

The research paper:

Fournier-Viger, P., Yang, P., Lin, J. C.-W., Kiran, U. (2019). Discovering Stable Periodic-Frequent Patterns in Transactional Data. Proc. 32nd Intern. Conf. on Industrial, Engineering and Other Applications of Applied Intelligent Systems (IEA AIE 2019), Springer LNAI, pp. 230-244

If you want to watch more videos about data mining algorithms that I have recorded, you can click on the “video” category of this blog.

==

How to record a research talk as a video?

In this blog post, I will talk about how to record a research talk on a computer as a video. This is an important topic for researchers for at least three reasons. First, sharing videos about your research can help to promote it. Second, a researcher may be invited to send a video of his talk to a conference if he cannot attend it because of issues such as not obtaining a travel visa. Third, recording a video of a talk is useful as a backup plan when giving a talk online.

The steps to record a presentation as video on a computer are as follows.

Step 1. What kind of presentation do you want to give?

The first step is to decide on the type of presentation that you want to record. The most common types are:

  • A) Slides with voice-over: A person will record some slides with a voice-over.
  • B) Video of a talk: A person will record a video of himself talking without slides.
  • C) Complex presentation: A person will combine multiple elements such as a presentation with slides, a video of himself, and audio.
  • D) Virtual whiteboard presentation: A person will do a type of presentation where we will see him writing or drawing live on a virtual whiteboard.

Doing a presentation of type A) or B) is easier than of type C) and D). But a more complex presentation may sometimes appear more interesting.

Step 2. Make sure that you have the right equipment

Recording a presentation can be done using very basic equipment like a cellphone or the microphone and webcam of a laptop computer. However, the quality of built-in webcams and microphones is often poor. To record video presentations, I use:

A professional microphone. I bought one that is not too expensive, plugs in via USB, and comes with a tripod (the SAMSON C01UPRO). Using such a microphone makes a huge difference in sound quality compared to the built-in microphone of my laptop. Some people also buy additional accessories for their microphone, like a pop filter and a shock mount. Also, it is important to plug the microphone directly into a USB port of the computer rather than into a USB hub, to avoid recording delays.


A good webcam. I have also bought a good webcam (Logitech C922 Pro Stream), which can record in high definition with good colors. A nice feature is that the webcam can be mounted on a tripod, and it comes with a free background removal feature that I will talk more about later.


Light. A good lighting source is also important if you are going to record videos of yourself using a camera and want to look good. Some cheap LED lamps or LED panels can for example be purchased and installed on your desk.

The above are perhaps the most important pieces of equipment for increasing the quality of recorded talks. Other equipment could also be added, like tripods, a green screen for shooting videos, good headphones, etc. Here is a picture of my relatively simple setup for recording videos. I use two LED lamps, and an external webcam and microphone.

Step 3. Prepare your presentation

Before recording a talk, it is recommended to prepare your talk well and rehearse it a few times. This is true for any talk, so I will not discuss it further here.

Step 4. Record the video

Depending on the type of presentation that you will make, it will be more or less complicated to record the presentation. I will discuss a few cases below.

For a presentation of type A) (slides + voice-over), it is quite simple. One can prepare his slides with software such as Microsoft PowerPoint and then use the “Record slide show” feature to add voice to the presentation. This is done by clicking on the button below:

The result is a PowerPoint presentation that can be played with audio on any computer equipped with PowerPoint. Then, for more convenience, some software can convert a PowerPoint presentation to a video.

For a presentation of type B) (video from a camera), one can use some basic software to record from a camera such as a webcam. Basic video recording software comes packaged with most operating systems (e.g. the Camera app in Windows 10). However, there also exist many other programs that let you not only record videos but also add special effects, transitions, text and other elements to them. Some video editing programs are quite powerful and easy to use (e.g. Wondershare Filmora, Movavi Video Editor), while others are harder to learn but more powerful (e.g. Adobe Premiere).

For a presentation of type C) (slides + video of the person), recording is more complicated because you must record not only your slides but also a video of yourself at the same time, and then put them together in one video. Here is a picture of the result that we may want to achieve:

To do such video recording, I use the Camtasia software, which lets me record my screen or a PowerPoint presentation together with a webcam, and then edit the resulting video with effects, transitions, text, etc. This software is not free, but it is very easy to use and powerful. Alternative software could certainly be used.

I first open my slides with Microsoft PowerPoint.

Then I open the “Camtasia Recorder”. You can see the interface below:

There, I first select the part of the screen that I want to record by clicking the “Custom” button. Then, I choose to record using my webcam by clicking “Camera on” and using my microphone by clicking “Audio on”. Then, to start recording, I click the “rec” button.

Then, after I finish recording my presentation, I click “Stop”.

Clicking “Stop” opens the Camtasia editor, where I can edit the video that I have recorded. The interface of the Camtasia editor looks like this:

In the editor, it is possible to cut parts of the video, add effects, and do many other things. As you can see in the picture above, I have two different video tracks (at the bottom), one for the video recorded from the webcam (with a green background), and one for the presentation.

Then, since I have shot the video of myself with a green background, I can remove the background behind me. This is done by clicking on the video track of me and adding the “Remove a color” visual effect where I choose “green” as the color to be removed. (see a screenshot below):

This effect, called “Chroma key”, is nice to have for making a good presentation. It makes the background transparent so that I can overlay my video on top of my slides! If you also want to do this, you first need to shoot the video of yourself with a green background. There are two ways to do this. The traditional way is to shoot with a green screen behind you like this (source of the picture: Amazon).

However, buying a green screen is actually not necessary. A simpler solution is to use virtual webcam software like ChromaCam, which uses machine learning to automatically remove your background and put a green background behind you.

This is what I have done in the example above to avoid buying a real green screen. The latter would of course give a better effect, but it would require additional space and money. The virtual webcam software ChromaCam can be used for free, but in that case it adds a watermark to your videos. To remove the watermark, it is possible to buy a license. Or, if you have a webcam like the C922 Pro Stream or Brio from Logitech, then ChromaCam is free to use for the chroma key effect. This is one of the reasons why I chose to buy the C922 Pro Stream for my setup. There are some alternatives to ChromaCam, like XSplit VCam, but it is also not free and, worse, it is based on a subscription model that requires paying every month. There might be some free alternatives to ChromaCam, but I did not find a good one that is easy to use and gives good results. Here is a picture of the chroma key effect obtained using the ChromaCam software:

As you can see above, it can remove the background quite well, although it may cut a bit of my hair and shoulder sometimes 😉

Another important thing that I do using the Camtasia Editor is to add a “Cursor effect” so that my mouse pointer is highlighted in yellow in my videos. The result looks like this:

To do that, I click on the video track of my slides in Camtasia Editor and select a Cursor effect from those offered:

After recording the video, the last step is to encode it in an appropriate format. I usually choose MP4 because it can be played by most browsers and devices. Then I publish the video. There exist various websites for publishing videos, with different features. In my case, I already pay a hosting company to host my website. Thus, I put my videos on that platform.

Lastly, for a presentation of type D) using a virtual whiteboard for writing and drawing on the screen, one can use a pen tablet or a computer that supports pen input. For example, I bought a Wacom Intuos pen tablet (see below) that I connected to my laptop. Using this tablet, I can draw in software like Microsoft Paint, PowerPoint or OneNote.

Then, I can record what I am writing using screen recording software like Camtasia. This lets me create video presentations where I appear to be writing on a virtual whiteboard. For example, here is a little drawing that I made using the pen tablet, just as an example:

This type of presentation is very useful for teaching topics such as mathematics that are typically taught using a whiteboard.

Conclusion

In this blog post, I have provided some tips about how to record a research talk on a computer. Hope this has been interesting and will be useful!

If you have comments or additional advice, please leave a comment below.

==

(Video) Sequence prediction with the CPT and CPT+ Models

Today, I present the CPT and CPT+ sequence prediction models in a video. Sequence prediction is an important task in data mining which consists of predicting the next symbols of a sequence. It can be used for example to predict the next word that someone will type on a keyboard, or the next location where someone will go.
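To give a rough idea of what sequence prediction means, here is a minimal Python sketch. Please note that this is NOT the CPT or CPT+ model: it is a much simpler baseline that I use only for illustration, which predicts the symbol that most often followed the last symbol of a sequence in the training data. The function names and the small location dataset are made up for this example.

```python
from collections import Counter, defaultdict

def train(sequences):
    """Count, for each symbol, which symbols follow it in the training sequences."""
    followers = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            followers[current][nxt] += 1
    return followers

def predict_next(followers, prefix):
    """Predict the symbol that most often followed the last symbol of the prefix."""
    if not prefix or prefix[-1] not in followers:
        return None  # nothing to base a prediction on
    return followers[prefix[-1]].most_common(1)[0][0]

# Hypothetical training data: sequences of visited locations
training = [
    ["home", "cafe", "office"],
    ["home", "cafe", "office", "gym"],
    ["home", "gym"],
]
model = train(training)
print(predict_next(model, ["home"]))          # cafe
print(predict_next(model, ["home", "cafe"]))  # office
```

Such a baseline only looks at the last symbol of the sequence, which is one of its main limitations.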

The official implementations of the CPT and CPT+ models and datasets for evaluating their performance are available in the SPMF software, which is implemented in Java and open-source. There is also an unofficial implementation of CPT in Cython.

The CPT+ (Compact Prediction Tree+) model is described in this article:

Gueniche, T., Fournier-Viger, P., Raman, R., Tseng, V. S. (2015). CPT+: Decreasing the time/space complexity of the Compact Prediction Tree. Proc. 19th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD 2015), Springer, LNAI 9078, pp. 625-636.

The original CPT algorithm was described in this paper:

Gueniche, T., Fournier-Viger, P., Tseng, V. S. (2013). Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction. Proc. 9th Intern. Conference on Advanced Data Mining and Applications (ADMA 2013) Part II, Springer LNAI 8347, pp. 177-188.

That is all for today. More data mining videos will be posted soon!

==

2020, a new year has started…

First, I would like to wish a happy new year to all readers of this blog. I wish you health, happiness and success in your research projects! I am also thankful to all those who have used and/or contributed to the SPMF data mining software, which I founded a decade ago! Time goes fast, but the project is still active, and I am preparing a new release with about 10 new algorithms that will be out in one or two weeks. The new algorithms have been contributed by various people. By the way, if you would like to contribute code to SPMF, you are welcome to do so.


Now, I want to talk a little bit about the new year. The new year is a good time to think about past achievements and update our goals or set new goals. Having clear goals and working hard towards these goals is key to being successful.

That is all I wanted to say for today!

==
Philippe Fournier-Viger is a full professor  and the founder of the open-source data mining software SPMF, offering more than 170 data mining algorithms. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.

How to prepare for attending an academic conference?

So you have a paper accepted for presentation at an academic conference and you wonder how to prepare for attending the conference? In this blog post, I will discuss this topic.

Making a travel plan

For an international conference, the first thing to do before attending is to check the travel requirements. Travelling to several countries or territories requires applying for a visa. Obtaining a visa can sometimes take a long time and require various documents, such as an invitation letter. Thus, it is better to start the visa application process early if needed. One may also need to obtain approval from his university or company to attend a conference. If one cannot attend the conference, he should also let the organizers know about it or arrange for someone else to replace him.

After ensuring that you can enter the country/territory where the conference is held, the second most important thing is to have a transportation plan. For international conferences or domestic conferences that are far away, one should reserve an airplane/bus/train ticket early, as prices may increase and fewer choices may be available over time. Generally, I recommend arriving in the city where the conference is held at least one day before it starts.

You may also want to buy travel insurance and check whether some vaccines are required. Travel insurance can sometimes be purchased with your airplane ticket.

Then, one should also book a hotel room early. When a conference is held in a famous city, the most affordable hotels or those closest to the conference may become fully booked quickly.

Preparing your talk, and giving a good talk

If you are planning to give a talk (a presentation of your research work) at a conference, you should prepare your presentation BEFORE the trip. I have previously written a blog post about how to give a good oral presentation at an academic conference and another one here. You may read these blog posts, which give a lot of advice about how to prepare and deliver a good talk. Then, after your presentation is ready, if you are using electronic slides such as PPT slides, you should put them on your laptop and on a USB drive, and perhaps also keep a copy in your e-mail to avoid any problem.

If one has to present a poster at an academic conference, he should also prepare the poster in advance and leave some time for printing it.

Preparing a networking plan

In my opinion, the most important reason for attending an academic conference is to meet other researchers, because all the papers presented at a conference can be read online anyway. To take advantage of the networking opportunities offered by a conference, you may look at the list of attendees beforehand and make a list of people that you would like to meet and talk with. Meeting other researchers is important for the career of a researcher, as it allows one to exchange ideas, develop collaborations, and look for opportunities such as a post-doctoral, researcher or faculty position.


At a conference, many people will ask you where you are from and what kind of research you are doing. It is good to have a short 30-second or 1-minute answer ready for these questions, as it may help to start a discussion. It is also good to bring your business cards if you have some. It is also useful to invite the people that you meet to connect on a professional social network like LinkedIn, so you may want to install its app on your phone. By the way, if you don’t already have a website, or a profile on LinkedIn or on academic social networks like ResearchGate, it is a good idea to create one for your career so that people can find you online.

When you participate in an academic conference, you should also look at the schedule and plan the activities that you want to attend, to use your time well. In particular, you should not miss networking activities like coffee breaks, the banquet, the reception, and poster sessions, where you can talk with other people. Also, don’t be shy. If you don’t know anyone, remember that most people attending the conference probably don’t know anyone either and will be happy to talk with you.

Taking the airplane

If you fly to a conference, it is important to prepare your luggage well, including what you will carry on the airplane. I generally prepare a suitcase and also a backpack or small bag that I bring with me on the airplane. In the latter bag, I carry:

  • My passport, a printed copy of airplane tickets (because you may have to show your return ticket when arriving in another country), visa or other required travel documents, and travel insurance.
  • Computer and accessories (USB drive, charger, laser pointer, mouse, adapter to connect the computer to a projector, etc.), cellphone.
  • Earplugs (for the noise in the airplane), headphones and an adapter for using them in an airplane (because headphones provided in airplanes are sometimes quite bad)
  • Pens (always useful for filling forms when arriving in another country)
  • International power plug adapter (you should check if needed before travelling) to be able to use your electronic equipment
  • Cash, debit cards, credit cards, and other valuable items (jewelry, etc.).
  • Medicines (if needed)
  • Book (if I want to read in the airplane)
  • I also bring a very thin sport jacket to wear in the airplane in case it is too cold (but you can also ask the flight attendant for a blanket).

I then put all other things in my luggage. For a conference, it is important to bring some nice clothes, but they do not need to be highly formal.

Before boarding the airplane, you should also choose your seat when checking in. In an airplane, there are some good seats and some bad seats. For a long flight, I prefer an aisle seat (a seat beside the aisle) because if I need to go to the washroom or walk a bit, I don’t need to ask other people to let me pass (they may be sleeping), and there is no one beside me on one side. The second best seat is the window seat, because there is also no one beside you on one side and you can lean on the window to rest. The worst seats are the middle seats, because you may be squeezed between two people, you can’t enjoy the window view, and you still need to ask other people to let you pass if you need to go to the washroom or walk around.

Arriving at the conference

When you arrive at the conference, the first thing to do is to register at the registration desk. Then, you can enjoy the various activities of the conference.

Conclusion

In this short blog post, I gave some advice about attending a conference that I hope will be useful, especially to those attending an academic conference for the first time. If you have some questions or if you think that I forgot to mention something important, then please leave a comment below!

==

What is the difference between data mining and machine learning?

In this short blog post, I will answer the question: what is the difference between machine learning and data mining? I will first explain what artificial intelligence, machine learning and data mining are. Then, I will answer the question.


What is artificial intelligence and machine learning?

Artificial intelligence is a field of research that aims at developing software that can do tasks that require intelligence. What counts as a task that requires intelligence is open to debate; examples include playing chess, translating documents, writing a novel, or choosing the best route to drive from one location to another. This broad definition of artificial intelligence is based on the behavior of a software program (what it can do rather than how it works). Some people define artificial intelligence in a stricter way, by requiring that an artificial intelligence also simulate the mechanisms that intelligent beings such as humans use for producing intelligent behavior. In other words, an intelligent program should not only appear to behave intelligently but should also mimic how the brain works, for example.

There exist many types of artificial intelligence techniques. Some early research on artificial intelligence proposed so-called expert systems, where a human expert would give knowledge to the system (for example, as a set of IF-THEN rules), which the system would then apply to behave intelligently. Such systems have also been called knowledge-based systems. A problem with this approach is that writing knowledge by hand is time-consuming and error-prone for complex tasks, and that it is not always easy for a human expert to encode his knowledge.
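To make the idea of IF-THEN rules more concrete, here is a small Python sketch of how a rule-based system could apply its rules. This is only a toy illustration under my own assumptions (the rules, the fact names and the function forward_chain are invented for this example), and real expert systems are much more elaborate.

```python
def forward_chain(facts, rules):
    """Apply IF-THEN rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are known facts
            if conclusion not in known and conditions <= known:
                known.add(conclusion)
                changed = True
    return known

# Each rule: (IF all these facts hold, THEN this fact holds)
rules = [
    (frozenset({"ground_wet", "freezing"}), "ground_icy"),
    (frozenset({"raining"}), "ground_wet"),
]
derived = forward_chain({"raining", "freezing"}, rules)
print(sorted(derived))  # ['freezing', 'ground_icy', 'ground_wet', 'raining']
```

Even in this tiny example, every rule had to be written by hand, which shows why this approach becomes costly for complex tasks.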

Another type of artificial intelligence system does not require hand-written knowledge or training data. This is the case, for example, of search algorithms such as A* (A-star), which are used for example in games. Consider a simple game like Tic Tac Toe. All the possible moves in this game can be viewed as leading to different states, including some states where one wins or loses. Because the number of possible states for such games is rather small, a simple algorithm to play such games can search through all the possible states or a subset of them to select the best move to perform.
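As an illustration of searching through the states of Tic Tac Toe, here is a short Python sketch. Note that it uses an exhaustive minimax search rather than A*, because minimax is a natural fit for a two-player game. The board encoding (a string of 9 characters) and the function names are my own choices for this example.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def score(board, player):
    """Value of a state for X (+1 win, 0 draw, -1 loss) when `player` moves next."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if " " not in board:
        return 0  # board full: draw
    other = "O" if player == "X" else "X"
    results = [score(board[:i] + player + board[i + 1:], other)
               for i, cell in enumerate(board) if cell == " "]
    return max(results) if player == "X" else min(results)

def best_move(board, player):
    """Try every empty cell and keep the one leading to the best reachable state."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]
    pick = max if player == "X" else min
    return pick(moves, key=lambda i: score(board[:i] + player + board[i + 1:], other))

board = "XX OO    "  # X has cells 0 and 1, O has cells 3 and 4
print(best_move(board, "X"))  # 2 (completes the top row)
```

This works because the game is small. For games with many more states, such as chess, searching all states is not feasible and only a subset can be explored.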

Other artificial intelligence systems are not preprogrammed and are designed to learn by themselves from data. The field of research aiming at designing such systems is machine learning. Some popular types of machine learning systems are artificial neural networks, which are very loosely inspired by the brain. Such systems are generally trained to do some specialized task using training data indicating the expected behavior in a given situation. The system then generalizes from this data to make decisions in new but similar situations. This process is called supervised learning. This is for example the case of a system for reading handwritten text. Such a system can be trained using handwritten letters where correct answers are provided by a human. After training the system with many examples of letters, the system can then recognize new letter drawings. There also exist some artificial intelligence systems that can learn from data without knowing the correct answers beforehand. This is called unsupervised learning. To summarize, machine learning is a subfield of artificial intelligence where a software program can learn from data.
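To illustrate supervised learning in a few lines, here is a minimal Python sketch. It is not a neural network: I use a much simpler technique, a nearest-neighbor classifier, only to show the idea of learning from examples labeled with correct answers. The data points and labels are invented for this example.

```python
def nearest_neighbor(train, x):
    """Return the label of the training example closest to x (squared Euclidean distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    _, label = min(train, key=lambda example: dist(example[0], x))
    return label

# Training data: (features, correct answer provided by a human)
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"), ((4.8, 5.3), "B"),
]
print(nearest_neighbor(training, (1.1, 0.9)))  # A
print(nearest_neighbor(training, (5.1, 4.9)))  # B
```

The two test points do not appear in the training data, but they are labeled correctly because they are similar to known examples.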

What is data mining?

Data mining has a different focus. As the name implies, data is key to data mining. Without data, one cannot do data mining. The goal of data mining is to analyze data by discovering knowledge hidden in the data. For example, a classic data mining task is frequent pattern mining, which consists of finding the sets of values that frequently appear in data (e.g. discovering that many people buy bread with cheese and a chocolate bar at a supermarket). This task is unsupervised and its only purpose is to discover something new in the data. Generally, such techniques can be used to understand the past or predict the future.
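Here is a small Python sketch of frequent pattern mining on a toy supermarket dataset. It is a naive brute-force approach that I wrote only for illustration (it counts every possible itemset of every transaction), so it is not one of the efficient algorithms used in practice. The transactions and the min_support threshold are made up.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Count every itemset of every transaction; keep those seen at least min_support times."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}

transactions = [
    ["bread", "cheese", "chocolate"],
    ["bread", "cheese"],
    ["bread", "chocolate", "milk"],
    ["bread", "cheese", "chocolate"],
]
result = frequent_itemsets(transactions, min_support=3)
for itemset, support in sorted(result.items()):
    print(itemset, support)
```

With a minimum support of 3, the itemsets {bread, cheese} and {bread, chocolate} are found to be frequent, while {bread, cheese, chocolate} is not because it appears in only two transactions.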

Some other data mining techniques are explicitly designed for extracting models from data that can then be used for making predictions. This is the case of techniques such as neural networks, decision trees, and regression models. Now, you probably remember that I already talked about neural networks as a machine learning technique. This is because data mining is actually overlapping with machine learning. In other words, some data mining techniques can also be called machine learning techniques.

What is the difference between machine learning and data mining?

Although machine learning and data mining overlap, and both require data, data mining traditionally focuses more on providing knowledge or models that are explainable or interpretable by humans, while machine learning studies are often more focused on what a model does. As a result, several machine learning models are designed to provide high accuracy for tasks such as handwritten character recognition, but appear to work like a black box to humans. There is thus currently an important need to build more interpretable or explainable machine learning models. The problem of black-box machine learning models is illustrated in this funny picture from XKCD (credit: https://xkcd.com/1838/ ):


Conclusion

That is all for this blog post. I just wanted to discuss differences and similarities between machine learning and data mining. If you would like to add something to this, you can post a message in the comment section, below.


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.

Call for chapters: Machine Learning and Data Mining for Emerging Trends in Cyber Dynamics

Dear readers of this blog,

I am co-editor of a new book to be published in 2020 by Springer about emerging technologies related to cyberspace. The title of the book is “Machine Learning and Data Mining for Emerging Trends in Cyber Dynamics”.


We are now looking for chapters, to be submitted no later than 30 March 2020. The format is Microsoft Word and the length should be between 20 and 30 pages.

More details about the book can be found in the call for chapters.

Looking forward to your contributions to the book!

