A brief report about the Big Data Analytics 2019 conference (BDA 2019)

This week, I attended the 7th Big Data Analytics conference (BDA 2019), which was held in Ahmedabad, India from the 17th to the 20th of December 2019. This was a great event with good keynote speeches, invited talks, research papers, tutorials, a workshop on IT for agriculture, a panel and social activities. In this blog post, I will give a brief report about the conference.

The Big Data Analytics (BDA) conference

The BDA conference is an international conference about Big Data Analytics, Data Mining, Machine Learning and related topics. This year is the 7th edition of the conference. BDA is held every year in a different city of India, but it attracts papers from several countries. This year, authors from 13 countries published papers, and the program committee, invited speakers and keynote speakers included experts from numerous countries, as well as local experts. About 150 to 200 people attended the conference.

The proceedings of the Big Data Analytics (BDA 2019) conference are published by Springer in the LNCS (Lecture Notes in Computer Science) series, which ensures good visibility for the published papers. The papers are indexed by EI, DBLP and other major indexes for computer science. This is the proceedings book, which is available electronically to attendees:

bda conference BDA 2019 book

It was a pleasure for me to work as Program Committee co-chair for the conference to help select papers and build the program. This year, there were about 53 submissions, of which 13 were selected for publication (an acceptance rate of about 25%), and five invited papers were also published, for a total of 18 papers. The idea of having invited papers from top researchers was a good one, as it brought in some really good papers.

Location of the BDA 2019 conference

The conference was held at Ahmedabad University. It is a relatively new university (10 years old). The university is located in the city of Ahmedabad, in the state of Gujarat, India.

Ahmedabad is famous, among other things, for being a place where Mahatma Gandhi lived. It also has some quite interesting historical buildings and structures in and around the city. People living in this city are mostly vegetarian, and in that state, all alcohol is prohibited (unlike in other parts of India). There is also a local language spoken by the population. It was interesting to visit the city.

Local organization

The local organization was very well done. Everything was well arranged. For example, an airport pickup service was offered to all international attendees, and e-mails were always answered very quickly by local organizers.

Day 1. Registration

On the first day, I registered and received a nice bag with a pen, notebook, schedule and other things inside.

bda conference bag

The conference badges are of good quality. They are made of a wood-like material into which names and affiliations appear to have been etched.

bda conference badge

Day 1. Tutorial and Workshop on IT in Agriculture

On the first day of the conference, there were tutorials. Moreover, there was a workshop on IT in agriculture. I listened to the keynote by Prof. P. Krishna Reddy, which was quite interesting. It was about how he has developed computer systems to provide advice to farmers in India, in various projects over more than 10 years. This is interesting as it is not just theory but has real practical applications that can change the lives of many people.

IT on agriculture workshop

Day 2, 3, 4 – Paper presentations

The paper presentations were quite interesting. I will not report the details of each paper, but the papers covered a wide range of topics, from pattern mining, information extraction, online review helpfulness prediction and urban tree type classification to data warehousing.

As I am a researcher working on pattern mining, I am particularly interested in this topic. There were three papers on pattern mining:

  • Fournier-Viger, P., Cheng, C., Lin, J. C.-W., Yun, U., Kiran, R. U. (2019). TKG: Efficient Mining of Top-K Frequent Subgraphs. Proc. of 7th Intern. Conf. on Big Data Analytics (BDA 2019), Springer, 20 pages, pp. 209-226. [ppt] [source code] (this is my paper, it presents a new algorithm for finding frequent subgraphs in a set of graphs)
  • Duong, H., Truong, T., Le, B., Fournier-Viger, P. (2019). An Explicit Relationship between Sequential Patterns and their Concise Representations. Proc. of 7th Intern. Conf. on Big Data Analytics (BDA 2019), Springer, pp. 341-361.
    (this is a paper about a new way of finding frequent sequential patterns using generator and closed sequential patterns).
  • P. P. C. Reddy, R. Uday Kiran, Koji Zettsu, Masashi Toyoda, P. Krishna Reddy, Masaru Kitsuregawa: Discovering Spatial High Utility Frequent Itemsets in Spatiotemporal Databases. 287-306
    (this is a paper about extending high utility itemset mining for spatial data)

Day 2 – Cultural performance and reception

On the evening of the second day, there was a music and dance show, performed by students of Ahmedabad University. Although the students may not be professionals, the show was quite good. It presented some traditional dances and Indian songs. The show was followed by a dinner.

bda 2019 cultural show

Day 3 – Panel: Big Data Analytics is not AI

On the third day, there was a panel titled “Big Data Analytics is not AI”, organized by Anirban Mondal, which sparked a lot of discussion. I was one of the panel members, along with Goce Trajcevski, Shashi Shekhar, Ladjel Bellatreche, Sanjay Madria and others. Here is a picture (some panel members not shown):

(credit: BDA 2019)

The topic was the relationship between machine learning and big data analytics. Four questions were asked to panel members, and then the audience asked additional questions.

  1. Should CS students learn theory and skills related to both BDA and ML?
    My answer: Artificial intelligence and big data analytics are popular. It is thus good for students to at least become familiar with these topics. Moreover, if one wants to become a user of these techniques, one should not only learn how to use the many easy-to-use libraries that are available but also understand the theory and the assumptions behind these techniques. This is important because someone who does not understand these assumptions or this theory may apply the techniques wrongly. Also, before learning big data analytics and machine learning, it is better to have a strong foundation in the core concepts behind them, such as databases, linear algebra and statistics.
  2. Should researchers work across both BDA and ML or specialize in any one of these areas?
    My answer: As researchers, we always tend to specialize in some area. This is reasonable because we are expected to publish state-of-the-art research, which requires knowing the research in a given field well. Having said that, I would like to talk about the relationship between big data analytics and machine learning. Generally, the goal of artificial intelligence is to build software that can perform tasks that are said to require intelligence. On the other hand, the goal of big data analytics or data mining is to discover useful information or build useful models from data to understand the past or predict the future. Thus, artificial intelligence and big data analytics have different goals. They also have points in common. The main one is that many techniques from artificial intelligence require data to train models. The artificial intelligence techniques that are not explicitly programmed but instead learn from data are called machine learning. For these techniques, the requirements for cleaning, preparing, transforming, storing and handling data may be the same as in big data analytics. But there also exist artificial intelligence techniques that do not require training data. This is the case, for example, of some traditional AI techniques such as theorem provers, path planners and logic reasoners. There are also some differences between machine learning and big data analytics. An important one is that machine learning tends to focus on building models that perform a task well or are accurate but are often black boxes (a model works, but the user does not know why or how it makes its predictions; this is the case of many deep learning models, for example). In contrast, many big data analytics techniques focus on discovering interpretable insights and on the visualization of results. For AI researchers, there is a lot to learn from data science and data mining about building explainable and interpretable models.
    It should also be said that machine learning and big data analytics/data mining are overlapping fields. Some techniques such as neural networks can be said to belong to both.
  3. In the future, will the industry have separate roles for BDA and ML specialists?
    My answer: In industry, it depends on the size of the company. Bigger companies tend to have people doing more specialized tasks, while in smaller companies a person may do many tasks. Recently, it has been interesting to see on websites such as LinkedIn that many specialized job titles have appeared, such as: data scientist, data engineer, data architect, data developer, data analyst, data warehouse software engineer, database engineer, statistician, business analyst, machine learning engineer, predictive modeler…
    I personally do not clearly know the differences between all these job titles, and I often see contradictory definitions of them.
  4. From a long-term perspective, do you see BDA and ML converging as a single research area or will they grow independently?
    My answer: No. As I said previously, big data analytics and machine learning have many things in common but also some different goals. Besides, in academia, there are some clearly defined communities, such as statistics, data mining and machine learning, and researchers tend to stay in their field and publish in the journals and conferences of their community. It would take some time and a major effort to redefine these communities.

Day 3 – Banquet

On the evening of the third day, there was a banquet outside. There were some tables serving Indian food and some chairs for those who wanted to sit. Others would eat standing and talk with others. As always, banquets are good for networking with other researchers. I had some good discussions with friends and met some other international and local researchers. Moreover, I was happy to talk with some local students who attended the conference and asked me some questions about how to learn about data science and machine learning. Besides, I was happy to meet some professors from some local universities who told me that they were using my SPMF data mining software for teaching data mining.

Group photo

Here is a group photo of BDA attendees:

(credit: BDA 2019)

Next year: BDA 2020

Next year, the BDA 2020 conference will be held in New Delhi, India. Then, BDA 2021 will be held in Allahabad, India.

Conclusion

In this blog post, I have given a brief report about the 7th Big Data Analytics conference (BDA 2019), from my perspective. Overall, it was a great conference, and I am very happy to have attended it. It was the first time that I went to India, and it was a good experience. The quality of the papers was quite high, and the invited talks, tutorials and keynote speeches were very interesting. I will try to attend it again next year.


Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 150 data mining algorithms.


Brief report about the ADMA 2019 conference

This week, I am also attending the 14th International Conference on Advanced Data Mining and Applications (ADMA 2019) in Dalian, China, from the 21st to the 23rd of November at Dalian Neusoft University of Information.

Dalian Neusoft University
Dalian China

About ADMA

The ADMA conference is focused on data mining and its applications, and is generally held in China. It has been held every year since 2005, except in 2015. I have attended ADMA 2011, ADMA 2012, ADMA 2013, ADMA 2014 and ADMA 2018, and now I am here for ADMA 2019. ADMA is a medium-size conference, but I like to attend it as it generally has some high-quality papers and it is convenient for me to attend as I am currently living in China.

Proceedings and acceptance rate

For the 14th edition of the conference, 170 submissions were received, of which 39 were accepted as full papers (an acceptance rate of 23%) and 26 more as short papers. This is a considerable increase in the number of submissions compared to last year, when 104 papers were submitted to ADMA 2018.

The proceedings are published by Springer in the LNAI series, which ensures good visibility to the papers.

Registration

On the first day, I registered and received the conference bag containing the program, a pen, a notebook and a guest conference badge. The proceedings book was available online. Although I would have enjoyed having a physical copy of the proceedings, I have to admit that online proceedings are more environmentally friendly.

ADMA 2019 proceedings bag program

Day 1

The conference started with the opening ceremony, where the founder of the conference, Prof. Xue Li, talked about the history of the conference.

ADMA 2019 opening ceremony

Then, there was a keynote speech by Chengqi Zhang about “AI for Social Good”. He first discussed the Turing test and the use of AI for different goals: functional simulation, perception and action. Then, he discussed three corresponding ways of doing AI: knowledge-based reasoning systems (symbolism), data-driven neural networks (connectionism), and behavior-based action systems (behaviorism). He also emphasized the importance of combining different aspects of AI such as perception, action, and image and language understanding. He then talked more about what AI is, and how AI can make us happier, healthier and wealthier. He discussed applications such as medicine and self-driving cars.

Then, there was a second keynote, by Guoren Wang, about “Big Data 2.0: Future Data Computing”. He first talked about the history of innovation in Big Data technology, from relational database systems relying on SQL/ACID, to distributed systems, to NoSQL databases, to real-time technologies. He also talked about the evolution of big data computing frameworks such as Hadoop, from Hadoop 0.0 (2007) to Hadoop 3.0 (2016), newer frameworks such as Apache Flink and Spark Streaming for stream processing, and frameworks such as Apache Beam that support both stream and batch processing. He also talked about trends such as geo-distributed data centers and edge computing.

Then, in the afternoon, there were several paper presentations. I presented a paper about a faster algorithm for high utility episode mining, named HUE-SPAN. In this paper, we first show that there is a problem in how the utility of episodes is calculated in previous work on high utility episode mining, and propose a solution to that problem. Then we present novel strategies and a tight upper bound for high utility episode mining that result in the more efficient HUE-SPAN algorithm. The PPT about HUE-SPAN is available here.

Also related to the topic of mining patterns in data, I enjoyed the paper presentation of Acquah Hackman et al. called “Mining Emerging High Utility Itemsets over Streaming Database”, which received the best student paper award.

I also enjoyed the presentation about discovering sequential rules in time series data by Benoit Vuillemin, “TSRuleGrowth: Mining Partially-Ordered Prediction Rules From a Time Series of Discrete Elements, Application to a Context of Ambient Intelligence”, which was inspired by some ideas of my TRuleGrowth algorithm but is designed for time series.

Then, there was a buffet in the evening to close the day.

Day 2

On the second day, there was a keynote by Prof. Vincent S. Tseng about deep learning and broad learning for medical AI. Broad learning means fusing multiple heterogeneous data sources to learn a model. To do broad learning, we can collect data from multiple data sources, devise a model to fuse the information from these heterogeneous sources, and mine information from each source to then build a global model. Prof. Tseng then discussed medical AI systems and some specific applications such as health prediction and disease risk prediction.

ADMA 2019 vincent tseng keynote

There was then a keynote on geo-social recommendation by Prof. Hongzhi Yin.

Then, there were more paper presentations, and finally the gala dinner, where the best paper award winners were announced.

ADMA 2019 dinner

I was very happy to see that the paper “Tourist’s Tour Prediction by Sequential Data Mining Approach” by Baccar, L. B., Djebali, S., Guérard, G. won an award, as the authors used my SPMF data mining software in their work.

Day 3 and 4

On the third day, there were more paper presentations, and on the fourth day, there was a workshop related to health data.

ADMA 2020

Next year, the 15th ADMA conference (ADMA 2020) will be held in Foshan, a city next to Guangzhou, in China.

Conclusion

I enjoyed the conference. It is not a very big conference but usually the paper quality is fine. I will probably submit a paper again next year.


Philippe Fournier-Viger is a full professor, working in China, and founder of the SPMF open-source data mining library.


Brief report about the MIWAI 2019 conference

In this blog post, I will report about the MIWAI 2019 conference (13th Multi-disciplinary International Conference on Artificial Intelligence), which was held from the 17th to the 19th of November 2019 at the EDC hotel in Kuala Lumpur, Malaysia.

MIWAI 2019 conference banner

About the MIWAI conference

This is the 13th edition of the MIWAI conference. The conference is called MIWAI because it originally started, in 2007, as a workshop called the Mahasarakham International Workshop on Artificial Intelligence. Initially, MIWAI was held every year in Thailand, and since 2011, it has been held as a conference, also in other countries:

  • Ho Chi Minh City, Vietnam (2012)
  • Krabi, Thailand (2013)
  • Bangalore, India (2014)
  • Fuzhou, China (2015)
  • Chiang Mai, Thailand (2016)
  • Brunei Darussalam (2017)
  • Hanoi, Vietnam (2018)
  • Kuala Lumpur, Malaysia (2019)

Registration and proceedings

On the first day, I first registered and received the conference bag and proceedings.

MIWAI 2019 conference proceedings

The proceedings of MIWAI 2019 are published by Springer in the Lecture Notes in Artificial Intelligence (LNAI) series, which ensures good visibility for the papers. This year, there were 53 submissions from 23 countries, and 25 papers were accepted, for an acceptance rate of 45%.

Day 1 – Opening ceremony, keynote talk and paper presentations

On the first day, there was the opening ceremony.

Then, there was a keynote talk by me (Prof. Philippe Fournier-Viger) entitled “Algorithms to Find Interesting and Interpretable High Utility Patterns in Symbolic Data” about techniques for discovering useful patterns in data. In particular, I talked about high utility itemset mining, which has become a popular area of research, and introduced some of my recent contributions.
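To give a rough idea of what high utility itemset mining computes, here is a minimal sketch in Python. The transactions, unit profits and function names are made up for this illustration, and the brute-force enumeration is only meant to show the definition of utility; it is not one of the algorithms presented in the talk, which avoid enumerating all itemsets.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its purchase quantity.
transactions = [
    {"a": 1, "b": 5, "c": 1},
    {"b": 4, "c": 3},
    {"a": 1, "c": 1},
    {"a": 2, "b": 1},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def utility(itemset, transaction):
    """Utility of an itemset in one transaction (0 if it is not fully contained)."""
    if not all(item in transaction for item in itemset):
        return 0
    return sum(transaction[item] * unit_profit[item] for item in itemset)


def high_utility_itemsets(min_utility):
    """Brute-force search: return every itemset whose total utility is >= min_utility."""
    items = sorted(unit_profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            total = sum(utility(itemset, t) for t in transactions)
            if total >= min_utility:
                result[itemset] = total
    return result


print(high_utility_itemsets(20))
```

In this toy data, the itemset {a, b} has a utility of 27, which is higher than that of {a} alone (20), while {a, b, c} only reaches 16. This shows that utility is neither monotone nor anti-monotone, which is why the pruning used in frequent itemset mining cannot be applied directly.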

Then, there were several paper presentations. In particular, I enjoyed the talk about associative classification, “Generation of Efficient Rules for Associative Classification” by Chartwut Thanajiranthorn and Panida Songram. They proposed a novel associative classification method that achieved high accuracy compared to other classifiers of that type.

MIWAI 2019 conference room

Another interesting paper that caught my attention applied sequential pattern mining to build an academic chatbot. This paper is “Identification of Conversational Intent Pattern Using Pattern-Growth Technique for Academic Chatbot” by Suraya Alias, Mohd Shamrie Sainin, Tan Soo Fun and Norhayati Daut.

Day 1 – reception

In the evening, there was a nice reception dinner at the hotel with a traditional Malaysian dance performance, and the best paper award was announced.

MIWAI 2019 conference banquet
MIWAI 2019 conference performance

Day 2 – keynote talk and other presentations

On the second day, there was a keynote by Prof. László T. Kóczy from Hungary about a novel Discrete Bacterial Memetic Evolutionary Algorithm (DBMEA) for solving hard problems such as the traveling salesman problem with time windows.

László T. Kóczy  keynote at MIWAI 2019

Then, it was followed by more paper presentations.

MIWAI 2020

Next year, the MIWAI 2020 conference will be held in Seoul, Korea. See the information below.

And I heard that MIWAI 2021 would be held in Japan.

Conclusion

I am happy to have attended the MIWAI 2019 conference. I met some researchers that I knew beforehand and met several interesting people that I did not know. The quality of the papers was good, and some papers were particularly interesting for my research interests. The conference was well-organized.

In the next blog post, I will talk to you about the ADMA 2019 conference, which I will attend later this week to present a paper about high utility episode mining.




Competitiveness in academia

In this blog post, I will talk about competitiveness in academia. I will discuss questions such as: What are the different forms of competition in academia? Is there too much competition in academia? and How to cope with competition?

The different forms of competition in academia

Generally, competition means that many people compete for access to a limited amount of resources and opportunities. In academia, competition happens at many levels:

  • Students competing against each other in courses. Students taking courses at an undergraduate or graduate level sometimes compete with each other to obtain the highest grades. This is especially true for courses where the teacher uses a normal curve for grading. For example, when I was a graduate student, some professors would give the highest grade (A+) to only the top 5% of students. Then, some students would work quite hard to be in that top 5%.
  • Being admitted to graduate school. The best students may be admitted to better research teams and research institutions for their master’s degree or PhD.
  • Competing for scholarships. The best students are often selected to receive scholarships.
  • Publishing papers in conferences and journals. Publishing research papers is a competitive process. This is especially true for conferences that only accept a limited number of papers and have a good reputation. Some journals are also very competitive because they receive many papers and only publish the best.
  • Competing for a post-doctoral researcher or faculty position. The job market in academia is also very competitive. Some universities receive hundreds of CVs for a single faculty position. In fact, in several countries, there are many more people with PhDs than there are faculty positions available. Thus, not all PhD graduates can continue working in academia.
  • Competing for research project funding. Obtaining funding is also a competitive process, as many researchers want to obtain funding.
  • Competing for research impact. Millions of research papers are published but many of them are never cited. Writing papers that have a major impact is difficult and is often a matter of publishing results first and doing better work than other researchers.
  • Competing for awards. Several awards are given to researchers based on the quality of their work, such as “best paper awards” at conferences. Only a few researchers receive them.

Is there too much competition in academia?

Hence, there is competition in academia. But is there too much? It is hard to say if it is too much, but there is certainly quite a lot of competition. For example, competing for publishing papers in top conferences or obtaining faculty positions in some countries can be very difficult. Some people certainly don’t like to have that much competition, while others are comfortable with it. A positive aspect of competition is that it can push researchers to work harder. But a negative aspect is that some people may be discouraged or fail to attain their goals due to the limited resources and opportunities.

Generally, I think that it is necessary to have at least some minimum level of competition. For example, it makes sense that some papers are not accepted in top conferences and journals because these papers are weak and contain major problems.

How to cope with competition?

Given that there is a high level of competition in academia, what should one do to be successful? Some people believe that they should solely focus on their own success and not contribute to the success of others. This is the mindset that some people have in sports, where helping other people would decrease your chances of winning. However, academia is not like that. The most successful researchers generally have many collaborations with other researchers. The reason is that collaboration can bring benefits to all the researchers who are cooperating (it is not a zero-sum game). For example, doing research projects with other researchers allows one to obtain ideas and comments from collaborators that can be very valuable. Collaborating can also result in producing more papers. Building strong connections with other researchers can also help in obtaining opportunities such as being invited to join conference committees. A good way to get to know more researchers is to attend academic conferences.

Inside a research team, there can be some competition sometimes. However, members of a research team should try to collaborate or at least support each other. This can benefit all members, and also the whole team.

Also, one should not feel discouraged by competition. If one really wants to achieve some goals, it is always possible. But it requires making these goals clear as early as possible and working hard to attain them. I think that working hard and working smart are among the most important skills in academia.

Conclusion

In this blog post, I talked about competitiveness in academia, as I think that it is a very important topic for researchers. I have shared a few ideas related to that. If you want to share your comments or your experience of competition in academia, or if you think that I forgot to talk about something important, please post a message in the comment section below! I will be happy to read it.




The SPMF data mining library v.2.40 is released!

Hi all, I am pleased to announce that a new version of SPMF has just been published (v 2.40). It contains 9 new algorithms, including:

  • the HUIM-ABC algorithm for approximate high utility itemset mining using Artificial Bee Colony Optimization (thanks to Wei Song and Chaoming Huang for contributing the code)
  • the TKG algorithm for mining the top-k frequent subgraphs in a graph database (thanks to Fournier-Viger, P. and Chao Cheng)
  • the gSpan algorithm for mining the frequent subgraphs in a graph database (thanks to Chao Cheng)
  • the SPP-Growth algorithm for mining stable periodic itemsets in a transaction database (by Peng Yang)
  • the MPFPS-BFS algorithm for mining periodic patterns common to multiple sequences (by Zhitian Li).
  • the MPFPS-DFS algorithm for mining periodic patterns common to multiple sequences (by Zhitian Li).
  • the NAFCP algorithm for mining frequent closed itemsets (thanks to Nader Aryabarzan et al.)
  • the OPUS-Miner algorithm for mining self-sufficient itemsets (thanks to Xiang Li for converting the original C++ code to Java)

It also includes some bug fixes and other minor improvements.

I have not released a new version of SPMF for a few months because I was quite busy recently. But the SPMF project is still very active. I am currently preparing a few more algorithms for release. I will try to make the next release in November.

Also, I would like to say thanks again to all the people who have contributed to, used, cited, and supported the software! This is really helpful! Moreover, all contributions are always welcome.




Brief report about the 2019 World Conference on the Virtual Reality Industry (WCVRI 2019)

This weekend, I am attending the 2019 World Conference on VR Industry (WCVRI 2019) as an invited guest and panel member (on Monday). In this blog post, I will talk about this event, held in Nanchang, China from the 19th to the 21st of October at the Primus Hotel.

WCVRI is an international event focused on the industry that has both an exhibition part, with booths from large companies, and various forums, speeches, and talks. This is the second year that the event is held, and it takes place every year in the city of Nanchang, as it is a hub for virtual reality technology in China.

This year, the main theme is “VR+5G for a new era of perception”. Some key topics of the conference program are Cloud, Industrial Ecology, AI, XR technology, Film and television, Manufacturing, Education and Training, 5G, Deep learning and mixed reality, Anime, Investment, Talent development, Virtual Simulation, and Security and production.

Nanchang is the capital of the Jiangxi province. There are about 6 million people living there. The Gan river flows through the city.

Nanchang, Jiangxi province, China

This is a major event in the city. The event is announced everywhere, which shows the strong support of the government for this event. Here is a sign in front of the Tengwang pavilion, a popular tourist spot.

Opening ceremony

The opening ceremony was held on the morning of the 19th of October. The governor of Jiangxi province was the first to speak during the opening ceremony. Other government representatives then spoke, including the secretary of the party of Jiangxi Province, Liu Qi, the secretary of the leading party members’ group of industry and information technology, Miao Wei, and the Vice Premier of China, Liu He.

It was said that the electronics industry grows by 30% every year in Nanchang, and VR is an important part of that. VR is a key project of the city that may contribute to many other industries such as manufacturing. The city is trying to attract world-famous talents and entrepreneurs to Nanchang, as well as projects and funds.

The organizers of the conference read a congratulatory letter that they received from the President of China, Xi Jinping, which highlights the support of the national government for the VR industry and this event.

Vice Premier Liu He mentioned that he was very pleased to attend the event and would like to take the opportunity to exchange with people from the industry. He mentioned the importance of VR in the gaming and movie industries and that it can be used in other areas such as manufacturing, medical services, and tourism. He also mentioned the importance of fundamental research to develop ground-breaking technologies, and of enhancing education to produce a greater number of talents. He mentioned that the Chinese economy has been moving in recent years from a phase of rapid development to a phase of quality development. Finally, he wished the industry and this event great success.

Vice Premier Liu He

There was also an official signing ceremony between the leaders of Huawei, Inspur Group and the government of Jiangxi Province.

Keynote speech by Guo Ping, rotating chairman of Huawei

There was a keynote talk by the rotating chairman of Huawei, Guo Ping. He said that VR provides a better experience than watching videos on mobile devices, and is more immersive than TV. Thus, VR may be the future of entertainment at home. He said that new generations of networks, such as those based on 5G, are important to support a good VR experience: latency must be low, data must not be lost, etc. Moreover, content delivery is important for VR, and thus a fast and reliable cloud environment is needed to deliver this content. Huawei has developed equipment and cloud computing technologies to support these requirements.

Talk by Guo Ping

Keynote speech by Martin Hellman, Turing Award Winner

Martin Hellman, a Turing Award winner, also gave a talk. He first reminded us of the importance of public key cryptography to protect financial transactions, and that this technology is key to e-commerce and the blockchain. He then explained the principles of public key cryptography. I will not explain this in detail here but it was quite interesting to listen to the presentation. Below is a picture where he explains that he wants to send a message to his collaborator but that his wife Dorothée may read the message if it is not encrypted (interesting example)!
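For curious readers, here is a minimal sketch of one classic public key technique, the Diffie-Hellman key exchange, which Hellman co-invented. This is my own toy illustration and not necessarily what was shown in the talk. The modulus P, the base G and the function names are assumptions chosen for readability, and the parameters are far too small for real use. The idea is that two people can agree on a shared secret while exchanging only public values.

```python
# Toy Diffie-Hellman key exchange (illustration only; real systems use
# much larger, standardized parameters and a vetted cryptography library).
import secrets

P = 2**127 - 1  # public prime modulus (assumption: far too small for real use)
G = 3           # public base (assumption)

def make_keypair():
    """Return (private, public) where public = G^private mod P."""
    private = secrets.randbelow(P - 3) + 2  # secret exponent in [2, P-2]
    public = pow(G, private, P)
    return private, public

def shared_secret(own_private, other_public):
    """Combine our private exponent with the other party's public value."""
    return pow(other_public, own_private, P)

# Each party keeps the private value and sends only the public value.
alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

alice_secret = shared_secret(alice_private, bob_public)
bob_secret = shared_secret(bob_private, alice_public)
print(alice_secret == bob_secret)  # both sides derive the same secret
```

An eavesdropper sees only P, G and the two public values. Recovering the secret from them requires solving a discrete logarithm problem, which is believed to be hard for well-chosen large parameters.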

Talk by Martin Hellman

During the first day, there were also several other keynote speeches, including by the CEO of HTC, the Vice President of SAP China, and the Chief Scientist of Inspur. I will not report the details of these talks here.

Industry Exhibition

In parallel to the sessions, forums and talks, there was an exhibition by the VR industry with booths from hundreds of VR-related companies. It was possible to try various VR-related products and also to see related applications and technologies such as 360-degree cameras, motion capture equipment, augmented reality software and devices, 5G equipment, and drones. There were many applications of VR related to gaming. A few pictures of the industry expo are below:

Talent development forum

On Monday, I participated as a speaker at a round table of the talent development forum, organized as part of the WCVRI 2019 conference. This forum was well-organized, with several invited experts giving presentations and taking part in discussions, and there was also an official signing ceremony. I talked for a few minutes about the importance of international collaboration and talent development for the virtual reality industry, and about programs related to talent development.

Conclusion

I am now back in Shenzhen, China. The event has ended and it was a great event that I would recommend to anyone interested in virtual reality and related technologies. I greatly appreciated the work of the organizers and volunteers, who were very helpful. I am looking forward to attending this event again in the future.


Philippe Fournier-Viger is a full professor, working in China, and founder of the SPMF open-source data mining library.


25 years of pattern mining

It is now 2019, and it has already been 25 years since Agrawal wrote his seminal papers on frequent itemset mining and association rule mining in 1994. Since then, thousands of papers have been published on this topic, some about algorithm design, some about new pattern mining problems, and others about applications in a multitude of fields. And there are still many research issues to work on!
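For readers who are new to the topic, the basic task of frequent itemset mining can be sketched in a few lines. The code below is a small level-wise (Apriori-style) search written for illustration only. The function name, the toy transactions and the support threshold are my own assumptions, and real implementations (such as those in SPMF) are far more optimized.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search.

    Returns a dict mapping each frequent itemset (a frozenset) to its
    support count, i.e. the number of transactions that contain it.
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while candidates:
        # Count the support of each candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and keep only those
        # whose k-subsets are all frequent (the Apriori pruning property).
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = [c for c in joined
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))]
    return result

# Hypothetical toy database of four shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
found = frequent_itemsets(baskets, min_support=2)
for itemset, count in sorted(found.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
```

With a minimum support of 2, the pair {milk, butter} appears in only one basket, so it is discarded, and the three-item set is pruned without ever being counted.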

After all these years, it is a good time to look back at what has been achieved to get a new perspective. This is what I did recently with colleagues in a survey paper called “Frequent Itemset Mining: a 25 Years Review”. If you are interested in frequent pattern mining, I encourage you to read the paper, as it makes some interesting observations. For example, it is found that some ideas used in recent algorithms for mining patterns in big data can be traced back to some of the early algorithms. Here is a picture from the paper showing a timeline of key algorithms and events in frequent pattern mining:

That is all I wanted to write for today!


Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 150 data mining algorithms.


Brief report about DAWAK 2019 / DEXA 2019

This week, I am attending the DAWAK 2019 and DEXA 2019 conferences in Linz, Austria from the 26th to the 29th August 2019. In this blog post, I will provide a report about these conferences.

About the DAWAK and DEXA conferences

DAWAK (International Conference on Data Warehousing and Knowledge Discovery) and DEXA (International Conference on Database and Expert Systems Applications) are well-established conferences related to data mining and database systems. This year is the 30th edition of DEXA and the 21st edition of DAWAK. These conferences are co-located and held in Europe.

It is not the first time that I have attended these conferences. You can also see my reports about DAWAK 2018 and DEXA 2018, and about DAWAK 2016 and DEXA 2016.

Proceedings

The proceedings of DEXA and DAWAK are published by Springer in the LNCS (Lecture Notes in Computer Science) series, which ensures that they are indexed in major databases such as EI.

DEXA 2019 received 157 submissions, and 32 were accepted as full papers (acceptance rate of 20%) and 34 as special research papers.

DAWAK 2019 received 61 submissions, and 22 were accepted as full papers (acceptance rate of 36%).

Location

The conferences were held at the Johannes Kepler University of Linz, Austria. The city of Linz has some old buildings and streets and some hills, and the Danube river passes through the city. Holding the conferences in a university is fine. However, the drawback is that the campus of the university is located about 5 km from the city center.

Registration

On the first day, I registered for the conference, and everything went smoothly. The registration started at 12:00 PM (noon), which gave plenty of time for arriving at the conference. Some drinks were served but there was no lunch. The conference bag contained the program, the proceedings on USB, as well as a few papers and tickets for lunch and other activities.

Keynote by Vladimir Marik “AI in manufacturing”

The first keynote was by Prof. V. Marik from Czech Technical University. He talked about how AI can be used in manufacturing. He mentioned that there have been high expectations about AI in recent years, and that AI has the potential to improve production efficiency and develop new business models. He talked about Industry 4.0, and concepts such as augmented reality, internet of things and services, multi-agent systems, and using robots in production facilities.

Welcome reception

On the first day, there was a welcome reception at the university where the conference was held.

Keynote by Axel Polleres about the semantic web and linked data

There was a keynote by A. Polleres about the Semantic Web. He first talked about how the concept of the Semantic Web has evolved from the idea of Tim Berners-Lee in the early 2000s. Initially, the main idea was to use description logics to annotate Web content with ontologies to perform reasoning about the Web content. One of the key results from 2000-2009 was that researchers found which logics are decidable and scalable. Other questions were how much reasoning we really need for the Web, and how one can publish knowledge on the Web. To publish data on the Web, it was proposed to use technologies such as URI and RDF to create what is called (open) linked data.

The speaker also mentioned that some of the lessons learned are that the OWL standard is perhaps too complicated for users (with which I agree), and that RDFS is among the most used standards. Also, in practice, ontologies may contain inconsistencies. The speaker then talked about a prototype semantic web search engine that was created, and how more and more open data is published by organizations such as governments, and how there are now open data portals to find open data.

The speaker talked about the Knowledge Graph of Google and how we don’t know exactly how it works but it may be related to work on Semantic Web and linked data, and it is used for question answering and showing related data to queries. Then, there was more discussion, but I will not report everything about the talk.

Keynote talk by Dirk Draheim “Future Perspectives of Association Rule Mining Based on Partial Conditionalization”

There was a keynote talk about association rule mining by Prof. Dirk Draheim from Estonia. He first indicated that data can often be misleading, and we may draw wrong conclusions if we don’t have enough data or don’t look at all the data. He mentioned Simpson's paradox and that if we have more data or more information about the context, we can better understand the data. For example, although the average salary in Seattle may be higher than the average salary in Boston, it does not mean that people in Seattle really earn more than those in Boston, because in Seattle more people may be working in the IT industry and have high salaries, which increases the average, but at the same time people in other industries in Seattle may be earning less than in Boston.
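To make the salary example concrete, here is a small sketch with made-up numbers (the head counts and salaries below are hypothetical and were not given in the talk). It shows a city whose overall average is higher even though each of its sectors pays less, simply because the mix of workers differs.

```python
# Hypothetical data: (number of workers, average salary in thousands of dollars)
salaries = {
    "Seattle": {"IT": (80, 120), "other": (20, 40)},
    "Boston":  {"IT": (20, 130), "other": (80, 50)},
}

def overall_average(city):
    """Average salary over all workers of a city, weighted by sector size."""
    total_pay = sum(n * avg for n, avg in salaries[city].values())
    total_workers = sum(n for n, _ in salaries[city].values())
    return total_pay / total_workers

for city in salaries:
    print(city, overall_average(city))
# prints: Seattle 104.0
# prints: Boston 66.0

# Within every sector, Boston pays more, despite its lower overall average.
for sector in ("IT", "other"):
    print(sector, salaries["Boston"][sector][1] > salaries["Seattle"][sector][1])
```

In this toy data, Seattle's overall average is much higher only because 80% of its workers are in the well-paid IT sector, which is the kind of hidden context that the speaker was warning about.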

Prof. Draheim then suggested that we need to use other interestingness measures and also consider probability theory. We can reformulate the problem of association rule mining using that theory and see a transaction database as a probability space. He then explained his idea, but I will not report all the details here. I think it is an interesting idea to use more statistics in pattern mining, and it is not the first work that goes in such a direction (e.g. work on self-sufficient itemsets by Webb et al. uses statistical testing in pattern mining).
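As a rough illustration of what it means to view a transaction database as a probability space, the sketch below computes the usual measures of a rule X → Y as probabilities: the support is P(X and Y), the confidence is the conditional probability P(Y | X), and the lift compares that to P(Y). This is only the standard textbook reading, not Prof. Draheim's partial conditionalization, and the function names and toy data are my own assumptions.

```python
def probability(itemset, transactions):
    """P(itemset): fraction of transactions that contain every item of it."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def rule_measures(antecedent, consequent, transactions):
    """Measures of the rule antecedent -> consequent.

    Assumes both the antecedent and the consequent occur at least once,
    otherwise the divisions below would fail.
    """
    p_x = probability(antecedent, transactions)
    p_y = probability(consequent, transactions)
    p_xy = probability(set(antecedent) | set(consequent), transactions)
    confidence = p_xy / p_x   # P(Y | X)
    lift = confidence / p_y   # above 1 suggests a positive association
    return {"support": p_xy, "confidence": confidence, "lift": lift}

# Hypothetical toy database of four shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
measures = rule_measures({"bread"}, {"milk"}, baskets)
print(measures)
```

Here the rule bread → milk has a support of 0.5 and a confidence of about 0.67, but a lift below 1, so buying bread does not actually make milk more likely in this toy data. This is a small example of why looking at a single measure can be misleading.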

Banquet

On the evening of the third day, the conference banquet was held on a boat on the Danube River.

This year, several papers about pattern mining

I was pleased to see that there were many papers on pattern mining (e.g. itemsets, sequential patterns, association rules) this year, such as:

  1. Philippe Fournier-Viger, Jiaxuan Li, Jerry Chun-Wei Lin, Tin Truong-Chi: Discovering and Visualizing Efficient Patterns in Cost/Utility Sequences. 73-88
  2. Hoang-Son Pham, Gwendal Virlet, Dominique Lavenier, Alexandre Termier: Statistically Significant Discriminative Patterns Searching. 105-115
  3. Philippe Fournier-Viger, Chao Cheng, Zhi Cheng, Jerry Chun-Wei Lin, Nazha Selmaoui-Folcher: Finding Strongly Correlated Trends in Dynamic Attributed Graphs. 250-265
  4. T. Yashwanth Reddy, R. Uday Kiran, Masashi Toyoda, P. Krishna Reddy, Masaru Kitsuregawa: Discovering Partial Periodic High Utility Itemsets in Temporal Databases. 351-361
  5. Hieu Hanh Le, Tatsuhiro Yamada, Yuichi Honda, Masaaki Kayahara, Muneo Kushima, Kenji Araki, Haruo Yokota: Analyzing Sequence Pattern Variants in Sequential Pattern Mining and Its Application to Electronic Medical Record Systems. 393-408
  6. Joe Wing-Ho Lin, Raymond Chi-Wing Wong: Frequent Item Mining When Obtaining Support Is Costly. 37-56
  7. Parul Chaudhary, Anirban Mondal, Polepalli Krishna Reddy: An Efficient Premiumness and Utility-Based Itemset Placement Scheme for Retail Stores. DEXA (1) 2019: 287-303
  8. P. Revanth Rathan, P. Krishna Reddy, Anirban Mondal: Discovering Diverse Popular Paths Using Transactional Modeling and Pattern Mining. DEXA (1) 2019: 327-337
  9. Raj Bhatta, Christie Ezeife, Mahreen Nasir Butt: Mining Sequential Pattern of Historical Purchases for E-Commerce Recommendation

Next year

The DAWAK 2020 and DEXA 2020 conferences will be held in Bratislava, Slovakia on September 14th to 17th 2020.

Conclusion

That is all for this blog post! Overall, it was an interesting conference. It is neither too big nor too small, but it is an established conference, and some excellent researchers attend it. The quality of the papers was good. I have attended DEXA and DAWAK a few times, and I am looking forward to the next one.

Update: You may also be interested to read my newer posts about DEXA and DAWAK 2021.


Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 150 data mining algorithms.


Brief report about the HPCC 2019 conference

In this blog post, I will write a short report about the HPCC 2019 conference (21st IEEE Conferences on High Performance Computing and Communications). The HPCC 2019 conference was held in Zhangjiajie, China from the 10th to the 12th August. It was co-located with DSS 2019 and SmartCity 2019, and organized by Hunan University.

Registration

I did the on-site registration and I received the conference bag, which contained the conference program, a notebook, a pen, and other information. However, I found that the conference bag did not contain the conference proceedings (neither printed nor on a USB drive). So, I checked the website of HPCC, which clearly says that:
each registrant will receive a copy of the conference proceedings.

Then, I asked the registration desk why I did not receive a copy of the proceedings since it is written on the website. But they did not want to give me one. I am not sure what the reason was, as they did not explain but just said that there were no proceedings. My guess is that it is because I paid the regular registration fee (about $550) rather than the author registration fee. But still, the website said that ALL registrants would receive the proceedings. After I talked with the registration desk, they only offered to copy the proceedings to my computer from their USB drive… which is not convenient, and it should not be that way. The proceedings should be provided in the bag or, in the worst case, be downloadable from the website.

One hour later, after talking with other participants, I found that some of them had received the proceedings on a USB drive… Thus, while attending the keynotes, I sent an e-mail to the organizers to ask why I did not receive the proceedings. After about one hour, they apologized and asked me to go back to the registration desk (for the third time) to receive a copy of the proceedings on USB. They did not give me a clear explanation but, from listening to them talking in Chinese, it seems that they did not have enough copies, so some people did not receive one. But there might also have been some misunderstanding.

Keynote by Bart Selman on the future of AI

This speaker said that he is excited about recent developments in AI research and its increasing real-world applications. He mentioned that machines are finally starting to “hear” and “see” after more than 50 years of research on AI. Some recent changes are that big sets of labelled data are now used to make AI understand our conceptualization of the world, and that there is a strong commercial interest in AI. The speaker said that by 2030, a $1000 computer will be as powerful as the human brain in terms of computing power and storage (see picture below). I think that this is a bold claim given that the brain has a very different architecture from a computer. I would be curious to know how they came up with the numbers, namely that the brain has a capacity of billions of megabytes and billions of MIPS.

About the future of AI, he mentioned that the next phase is further integration of perception, planning, inference, and learning. Moreover, we also need deeper semantics of natural language, such as common sense knowledge and reasoning. Common sense is also needed to handle extreme or unforeseen cases (for example, to ensure the safety of self-driving cars). Moreover, the speaker mentioned that non-human intelligence may be developed. Overall, the talk was interesting.

Other keynotes

There were also several other keynotes by some good speakers, including Prof. Witold Pedrycz, editor of Information Sciences and other journals. There was also a keynote by Yunhao Liu about the internet of things, and a talk by Xindong Wu, among others. I will not describe all of the keynotes since some of them are not so much related to my research (e.g. the keynote on sensor networks).

One keynote speaker had several videos but could not play them due to some technical problem. The talk was still very interesting, but it is a reminder that one should always do a test on the equipment before giving a talk especially when using videos.

Paper presentation

I came to the conference because I am co-author of the following paper (which was presented by the first author):

Win, K. N., Chen, J., Xiao, G., Chen, Y., Fournier-Viger, P. (2019). A Parallel Crime Activity Clustering Algorithm based on Apache Spark Cloud Computing Platform. Proc. of 21st IEEE Conferences on High Performance Computing and Communications (HPCC-2019), to appear.

This paper is about analyzing criminal activity data to discover interesting patterns (fuzzy clusters). The proposed algorithm is implemented on Apache Spark.

Conclusion

This was a brief report about the HPCC 2019 conference. It is a medium-sized conference (I would guess about 400 persons including the two co-located conferences), with many parallel sessions. The highlight of the conference for me was the keynotes, which were given by some good researchers. The conference proceedings are published by IEEE and included in the EI index, which is interesting. The location of the conference in Zhangjiajie, China was also great. There is a nice national park.


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.


What are the milestones in the career of an academic researcher?

Today, I will talk about the different milestones that a researcher may meet during his career. I will start from the first stage, which is graduate studies, and continue until the stage of being a permanent researcher working at a research institution or being a well-known researcher. I will give some advice about what is important at each stage of the career of a researcher.

Stage 1: Graduate student

The first stage is graduate studies. The goal of the master degree is to learn how to do research, by joining a research team. At that stage, one should learn how to read research papers about state-of-the-art research, develop ideas to solve some research problems, develop a solution, carry out experiments, and write papers.

During the master degree, the supervisor usually guides the student and helps him with some of the tasks (e.g. writing a paper). This is different from doing a PhD, where a student should do more tasks by himself. After completing a PhD, one should be an autonomous researcher. It means that someone who has completed a PhD should be able to find interesting research problems by himself (without help from others) and to perform all other steps of a research project by himself.

Normally a graduate student will initially need much help to do research. But after completing a few projects and writing papers, one will become more and more efficient and autonomous. It is important to have that as a goal.

What should one focus on during graduate school?

  • learn to write research papers well (writing is a key skill for a researcher),
  • publish several papers, and at least some in good conferences and journals (to convince other people of your research ability and then land a researcher job),
  • learn to find research problems and develop original research solutions,
  • improve your presentation skills (not only to present papers at conferences but because researchers who will work as lecturers or professors will be expected to teach well),
  • try to obtain grants and prizes during studies,
  • try to build a network of contacts in academia and have collaborations with other students or researchers,
  • try to publish some papers that may obtain citations (because citation count is sometimes considered as a performance indicator),
  • try to have some teaching experience such as teaching an undergraduate course, or being a teaching assistant,
  • try to have good grades (although this is less important than having good research output),
  • learn other useful research related skills such as finding papers online, using LaTeX for writing papers (especially for science papers), managing time well,
  • learn to identify limitations and weaknesses in the research of others when reading a paper or attending a presentation,
  • try to always ask at least one question when attending a presentation,
  • try to be involved in reviewing papers and other important academic activities.

Stage 2: Postdoctoral researcher

Many people become postdoctoral researchers after completing the PhD. Such a position may last one or two years and sometimes more, usually with the goal of then obtaining a position of professor or lecturer, or working in the industry.

Why do a postdoc? It gives the opportunity to explore new research topics, which are often different from those of the PhD, to write more papers, and to further improve research skills, and it gives some extra time to find a job. A postdoc will also generally be done with a research team that is not the same as that of the PhD, and sometimes even in another country. This allows one to learn other ways of doing research and to build contacts with other researchers.

What should one focus on during a postdoc?

  • Find a good team,
  • Write quality papers,
  • Be almost autonomous in finding research problems and doing research,
  • Try to participate in the research of other team members or researchers and perhaps even unofficially cosupervise students,
  • Try participating in funding applications,
  • Work on projects that will lead to papers in a relatively short time and have a relatively low chance of failure, as a postdoc is often short and one may need to show results to then apply for other jobs,
  • Don’t be a postdoc for too many years (ideally no more than two years) as more than that may be considered negative in some fields.

Stage 3: Faculty member / researcher

The next stage for an academic researcher is usually to become a faculty member or professional researcher, that is to work for a university or research center and perform research and perhaps also teach.

There are different ranks for faculty members in universities, which depend on the country. In North America and China, some typical ranks in a university are lecturer, assistant professor, associate professor and professor (also called full professor). Sometimes there are also some honorific ranks such as distinguished professor. Typically, the rank of lecturer consists of only teaching (no research), while the lowest rank that consists of doing research and teaching is assistant professor.

The goal of a new faculty member should be to climb ranks by:

  • Creating a research program that spans several years with a long-term vision (different from a graduate student, who typically does not think further than one paper at a time),
  • Writing research proposals that obtain significant research funding,
  • Writing high quality papers that have a significant impact,
  • Being an excellent teacher,
  • Obtaining awards, getting involved in international committees,
  • Supervising graduate students successfully, and learning to manage a team,
  • Having international collaborations and industry collaborations,
  • Being involved in university affairs,
  • Having other activities such as publishing books, organizing workshops, conferences, and being a journal editor.

Several young faculty members have problems developing a long-term research plan, and/or still have difficulty finding good research problems. This leads to an inability to obtain research funding and publish good papers, and is often caused by not having learned to become autonomous during the PhD. It is thus important to develop these skills as early as possible during one’s career. If one is unable to have a research plan or obtain funding, he may not be promoted and may even not have his contract renewed. I have seen this several times.

Besides climbing the ranks, one may aim at becoming influential and well-known in his field. This involves the same goals, but requires putting in extra effort and working strategically to achieve them.

For young faculty members, the most critical period is the first three to five years, where one needs to prove himself to become permanent or be promoted. This requires a huge amount of work because one not only needs to prepare new courses as a new faculty member, but also to teach and do well in terms of research.

Conclusion

This post has given an overview of the main steps in the career of an academic researcher. I hope it was interesting. If you think that I have missed something important, please post a comment below. I will be happy to read it.


Philippe Fournier-Viger is a full professor working in China and founder of the SPMF open source data mining software.
