This week, I attended the 7th Big Data Analytics conference (BDA 2019), which was held in Ahmedabad, India from the 17th to the 20th of December 2019. It was a great event with good keynote speeches, invited talks, research papers, tutorials, a workshop on IT for agriculture, a panel and social activities. In this blog post, I will give a brief report about the conference.
The Big Data Analytics (BDA) conference
The BDA conference is an international conference about Big Data Analytics, Data Mining, Machine Learning and related topics. This year was the 7th edition of the conference. BDA is held every year in a different city of India, but it attracts papers from several countries. This year, authors from 13 countries published papers, and the program committee, invited speakers and keynote speakers included experts from numerous countries, as well as local experts. About 150 to 200 people attended the conference.
The proceedings of the Big Data Analytics (BDA 2019) conference are published by Springer in the LNCS (Lecture Notes in Computer Science) series, which ensures good visibility for the published papers. The papers are indexed by EI, DBLP and other major indexes for computer science. This is the proceedings book, which is available electronically to attendees:
It was a pleasure for me to work as Program Committee co-chair for the conference to help select papers and build the program. This year, there were about 53 submissions, of which 13 were selected for publication (an acceptance rate of about 25%). Five invited papers were also published, for a total of 18 papers. The idea of inviting papers from top researchers was a good one, as it brought in some really good papers.
Location of the BDA 2019 conference
The conference was held at Ahmedabad University. It is a relatively new university (10 years old). The university is located in the city of Ahmedabad, in the state of Gujarat, India.
Ahmedabad is famous for being a place where Mahatma Gandhi lived, among other things. There are also some quite interesting historical buildings and structures in and around the city. People living in this city are mostly vegetarian, and in that state, all alcohol is prohibited (unlike in most other parts of India). The population also speaks a local language, Gujarati. It was interesting to visit the city.
The local organization was very well done. Everything was well arranged. For example, an airport pickup service was offered to all international attendees, and e-mails were always answered very quickly by local organizers.
Day 1. Registration
On the first day, I registered and received a nice bag with a pen, notebook, schedule and other things inside.
The conference badges are of good quality. They are made of a wood-like material into which names and affiliations appear to have been etched.
Day 1. Tutorial and Workshop on IT in Agriculture
On the first day of the conference, there were tutorials. Moreover, there was a workshop on IT in agriculture. I listened to the keynote by Prof. P. Krishna Reddy, which was quite interesting. He talked about how he has developed computer systems to provide advice to farmers in India, in various projects over more than 10 years. This is interesting as it is not just theory but has real practical applications that can change the lives of many people.
Day 2, 3, 4 – Paper presentations
The paper presentations were quite interesting. I will not report the details of each paper, but the papers covered a wide range of topics, from pattern mining, information extraction, online review helpfulness prediction and urban tree type classification to data warehousing.
As I am a researcher working on pattern mining, I am particularly interested in this topic. There were three papers on pattern mining:
- Fournier-Viger, P., Cheng, C., Lin, J. C.-W., Yun, U., Kiran, R. U. (2019). TKG: Efficient Mining of Top-K Frequent Subgraphs. Proc. of 7th Intern. Conf. on Big Data Analytics (BDA 2019), Springer, 20 pages, pp. 209-226.
(this is my paper; it presents a new algorithm for finding the top-k most frequent subgraphs in a set of graphs)
- Duong, H., Truong, T., Le, B., Fournier-Viger, P. (2019). An Explicit Relationship between Sequential Patterns and their Concise Representations. Proc. of 7th Intern. Conf. on Big Data Analytics (BDA 2019), Springer, pp. 341-361.
(this is a paper about a new way of finding frequent sequential patterns using generator and closed sequential patterns).
- Reddy, P. P. C., Kiran, R. U., Zettsu, K., Toyoda, M., Reddy, P. K., Kitsuregawa, M. (2019). Discovering Spatial High Utility Frequent Itemsets in Spatiotemporal Databases. Proc. of 7th Intern. Conf. on Big Data Analytics (BDA 2019), Springer, pp. 287-306.
(this is a paper about extending high utility itemset mining to spatial data)
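For readers who are not familiar with pattern mining, here is a minimal Python sketch of what "top-k frequent" means. It is not the TKG algorithm from the paper above: to stay short, it mines itemsets rather than subgraphs and simply enumerates all candidates by brute force. The function name `top_k_frequent_itemsets`, the `max_size` parameter and the toy transactions are made up for this illustration.

```python
from collections import Counter
from itertools import combinations


def top_k_frequent_itemsets(transactions, k, max_size=3):
    """Return the k itemsets with the highest support (illustrative brute force).

    The support of an itemset is the number of transactions containing it.
    Only itemsets of at most `max_size` items are considered.
    """
    support = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # remove duplicates, fix item order
        for size in range(1, min(max_size, len(items)) + 1):
            for itemset in combinations(items, size):
                support[itemset] += 1
    # Rank by decreasing support; ties are broken by the itemset itself
    # so that the result is deterministic.
    ranked = sorted(support.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


# Hypothetical toy database of five transactions.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]

result = top_k_frequent_itemsets(transactions, k=3)
print(result)
```

The point of the top-k formulation is that the user sets the number of patterns k instead of a minimum support threshold, which is often hard to choose in advance. Real algorithms avoid enumerating every candidate as this sketch does.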
Day 2 – Cultural performance and reception
On the evening of the second day, there was a music and dance show, performed by students of Ahmedabad University. Although the students are not professional performers, the show was quite good. It presented some traditional dances and Indian songs. The show was followed by a dinner.
Day 3 – Panel: Big Data Analytics is not AI
On the third day, there was a panel titled “Big Data Analytics is not AI”, organized by Anirban Mondal, which sparked a lot of discussion. I was one of the panel members, along with Goce Trajcevski, Shashi Shekhar, Ladjel Bellatreche, Sanjay Madria and others. Here is a picture (some panel members not shown):
The topic was the relationship between machine learning and big data analytics. The panel members were asked four questions, and then the audience asked additional questions.
- Should CS students learn theory and skills related to both BDA and ML?
My answer: Artificial intelligence and big data analytics are popular. It is thus good for students to at least become familiar with these topics. Moreover, if one wants to become a user of these techniques, one should not only learn how to use the many easy-to-use libraries that are available but also understand the theory and the assumptions behind these techniques. This is important because someone who does not understand these assumptions or this theory may apply the techniques wrongly. Also, before learning big data analytics and machine learning, it is better to have a strong foundation in the core concepts behind them, such as databases, linear algebra and statistics.
- Should researchers work across both BDA and ML or specialize in any one of these areas?
My answer: As researchers, we always tend to specialize in some area. This is reasonable because we are expected to publish state-of-the-art research, which requires knowing the research in a given field well. Having said that, I would like to talk about the relationship between big data analytics and machine learning. Generally, the goal of artificial intelligence is to build software that can perform tasks that are said to require intelligence. On the other hand, the goal of big data analytics or data mining is to discover useful information or build useful models from data to understand the past or predict the future. Thus, artificial intelligence and big data analytics have different goals.
The two fields nevertheless have things in common. The main one is that many techniques from artificial intelligence require data to train models. The artificial intelligence techniques that are not explicitly programmed but instead learn from data are called machine learning. For these techniques, the requirements for cleaning, preparing, transforming, storing and handling data may be the same as in big data analytics. But there are also artificial intelligence techniques that do not require training data. This is the case for some traditional AI techniques such as theorem provers, path planners and logic reasoners.
There are also some differences between machine learning and big data analytics. An important one is that machine learning tends to focus on building models that perform a task well or are accurate but are often black boxes (the model works, but the user does not know why or how it makes its predictions, as is the case for many deep learning models). In contrast, many big data analytics techniques focus on discovering interpretable insights and on the visualization of results. For AI researchers, there is a lot to learn from data science/data mining about building explainable and interpretable models.
That said, machine learning and big data analytics/data mining also overlap. Some techniques, such as neural networks, can be said to belong to both machine learning and big data analytics.
- In the future, will the industry have separate roles for BDA and ML specialists?
My answer: In the industry, it depends on the size of the company. Bigger companies tend to have people doing more specialized tasks, while smaller companies may have people doing many tasks. Recently, it has been interesting to see on websites like LinkedIn that many specialized job titles have appeared, such as data scientist, data engineer, data architect, data developer, data analyst, data warehouse software engineer, database engineer, statistician, business analyst, machine learning engineer and predictive modeler.
I personally am not clear about the differences between all these job titles, and I often see contradictory definitions of them.
- From a long-term perspective, do you see BDA and ML converging as a single research area or will they grow independently?
My answer: No, I do not see them converging into a single research area. As I said previously, big data analytics and machine learning have many things in common but also some different goals. Besides, in academia, there are some clearly defined communities, such as statistics, data mining and machine learning, and researchers tend to stay in their field and publish in the journals and conferences of their community. It would take some time and major effort to redefine these communities.
Day 3 – Banquet
On the evening of the third day, there was an outdoor banquet. There were some tables serving Indian food and some chairs for those who wanted to sit, while others ate standing and chatted. As always, banquets are good for networking with other researchers. I had some good discussions with friends and met some other international and local researchers. Moreover, I was happy to talk with some local students who attended the conference and asked me how to learn about data science and machine learning. I was also happy to meet professors from local universities who told me that they were using my SPMF data mining software for teaching data mining.
Here is a group photo of BDA attendees:
Next year: BDA 2020
Next year, the BDA 2020 conference will be held in New Delhi, India. Then, BDA 2021 will be held in Allahabad, India.
In this blog post, I have given a brief report about the 7th Big Data Analytics conference (BDA 2019), from my perspective. Overall, it was a great conference, and I am very happy to have attended it. It was the first time that I went to India, and it was a good experience. The quality of the papers was quite high, and the invited talks, tutorials and keynote speeches were very interesting. I will try to attend again next year.
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 150 data mining algorithms.