The top journals and conferences in data mining / data science

A key question for data mining and data science researchers is which journals and conferences are the top venues in the field, since publishing in well-regarded venues generally gives a paper more visibility. In this blog post, I will look at four rankings of data mining journals and conferences based on different criteria, and discuss them.


1) The Google Ranking of data mining and analysis journals and conferences

A first ranking is the Google Scholar ranking (https://scholar.google.com/citations?view_op=top_venues&hl=en&vq=eng_datamininganalysis). This ranking is automatically generated from the h5-index, which Google describes as “the h-index for articles published in the last 5 complete years. It is the largest number h such that h articles published in 2011-2015 have at least h citations each”. The top 20 conferences and journals are the following:

Rank Publication h5-index h5-median Type
1 ACM SIGKDD International Conference on Knowledge discovery and data mining 67 98 Conference
2 IEEE Transactions on Knowledge and Data Engineering 66 111 Journal
3 ACM International Conference on Web Search and Data Mining 58 94 Conference
4 IEEE International Conference on Data Mining (ICDM) 39 64 Conference
5 Knowledge and Information Systems (KAIS) 38 52 Journal
6 ACM Transactions on Intelligent Systems and Technology (TIST) 37 68 Journal
7 ACM Conference on Recommender Systems 35 64 Conference
8 SIAM International Conference on Data Mining (SDM) 35 54 Conference
9 Data Mining and Knowledge Discovery 33 57 Journal
10 Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 30 56 Journal
11 European Conference on Machine Learning and Knowledge Discovery in Databases (PKDD) 30 36 Conference
12 Social Network Analysis and Mining 26 37 Journal
13 ACM Transactions on Knowledge Discovery from Data (TKDD) 23 39 Journal
14 International Conference on Artificial Intelligence and Statistics 23 29 Conference
15 Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 22 30 Conference
16 IEEE International Conference on Big Data 18 30 Conference
17 Advances in Data Analysis and Classification 18 25 Journal
18 Statistical Analysis and Data Mining 17 30 Journal
19 BioData Mining 17 25 Journal
20 Intelligent Data Analysis 16 21 Journal
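
To make the two columns of the table above concrete, here is a minimal Python sketch of how an h5-index and an h5-median could be computed for one venue. It follows the definition quoted above for the h5-index. The h5-median is not defined in this post; the sketch assumes Google Scholar's usual definition (the median citation count of the articles that make up the h5-index). The function names and the sample data are hypothetical and only for illustration.

```python
from statistics import median


def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


def h5_metrics(articles, first_year, last_year):
    """Return (h5-index, h5-median) for (year, citations) pairs in the window."""
    # Keep only articles published inside the five-year window.
    window = [cites for year, cites in articles if first_year <= year <= last_year]
    h = h_index(window)
    # The h most cited articles form the core used for the median (assumed definition).
    core = sorted(window, reverse=True)[:h]
    return h, (median(core) if core else 0)


# Hypothetical venue: (publication year, citation count) for each article.
articles = [(2009, 50), (2011, 10), (2012, 8), (2013, 5),
            (2014, 4), (2015, 3), (2015, 0)]
print(h5_metrics(articles, 2011, 2015))
```

Note that the 2009 article is ignored even though it is the most cited one, because it falls outside the 2011-2015 window.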

 

Some interesting observations can be made from this ranking:

  • It shows that some conferences in the field of data mining actually have a higher impact than some journals. For example, the well-known KDD conference is ranked higher than all journals.
  • It appears strange that the KAIS journal is ranked higher than the DMKD/DAMI and TKDD journals, which are often regarded as better journals than KAIS. However, it may be that the field is evolving, and that KAIS has really improved over the years.

2) The Microsoft Ranking of data mining conferences

Another automatically generated ranking is the Microsoft ranking of data mining conferences (http://academic.research.microsoft.com/RankList?entitytype=3&topdomainid=2&subdomainid=7&orderby=1). This ranking is based on the number of publications and citations. Unlike Google, Microsoft provides separate rankings for conferences and journals.

Besides, Microsoft offers two metrics for ranking: the number of citations and the Field Rating. It is not very clear how the Field Rating is calculated by Microsoft. The Microsoft help center describes it as follows: “The Field Rating is similar to h-index in that it calculates the number of publications by an author and the distribution of citations to the publications. Field rating only calculates publications and citations within a specific field and shows the impact of the scholar or journal within that specific field”.

Here is the conference ranking of the top 30 conferences by citations:

Rank Conference Publications Citations
1 KDD – Knowledge Discovery and Data Mining 2063 69270
2 ICDE – International Conference on Data Engineering 4012 67386
3 CIKM – International Conference on Information and Knowledge Management 2636 28621
4 ICDM – IEEE International Conference on Data Mining 2506 18362
5 SDM – SIAM International Conference on Data Mining 708 9095
6 PKDD – Principles of Data Mining and Knowledge Discovery 994 8875
7 PAKDD – Pacific-Asia Conference on Knowledge Discovery and Data Mining 1255 6400
8 DASFAA – Database Systems for Advanced Applications 1251 4001
9 RIAO – Recherche d’Information Assistee par Ordinateur 574 3551
10 DMKD / DAMI – Research Issues on Data Mining and Knowledge Discovery 103 3264
11 DaWaK – Data Warehousing and Knowledge Discovery 503 2648
12 DS – Discovery Science 553 2256
13 Fuzzy Systems and Knowledge Discovery 4626 2171
14 DOLAP – International Workshop on Data Warehousing and OLAP 177 1830
15 IDEAL – Intelligent Data Engineering and Automated Learning 1032 1789
16 WSDM – Web Search and Data Mining 196 1499
17 GRC – IEEE International Conference on Granular Computing 1351 1434
18 ICWSM – International Conference on Weblogs and Social Media 238 1142
19 DMDW – Design and Management of Data Warehouses 70 993
20 FIMI – Workshop on Frequent Itemset Mining Implementations 32 849
21 MLDM – Machine Learning and Data Mining in Pattern Recognition 313 822
22 PJW – Workshop on Persistence and Java 41 712
23 ADMA – Advanced Data Mining and Applications 562 676
24 ICETET – International Conference on Emerging Trends in Engineering & Technology 712 376
25 WKDD – Workshop on Knowledge Discovery and Data Mining 527 342
26 KDID – International Workshop on Knowledge Discovery in Inductive Databases 70 328
27 ICDM – Industrial Conference on Data Mining 304 323
28 DMIN – Int. Conf. on Data Mining 434 278
29 MineNet – Mining Network Data 22 278
30 WebMine – Workshop on Web Mining 15 245

And here are the top 30 conferences by Field Rating:

Rank Conference Publications Field Rating
1 KDD – Knowledge Discovery and Data Mining 2063 122
2 ICDE – International Conference on Data Engineering 4012 104
3 CIKM – International Conference on Information and Knowledge Management 2636 67
4 ICDM – IEEE International Conference on Data Mining 2506 56
5 SDM – SIAM International Conference on Data Mining 708 45
6 PKDD – Principles of Data Mining and Knowledge Discovery 994 40
7 PAKDD – Pacific-Asia Conference on Knowledge Discovery and Data Mining 1255 33
8 RIAO – Recherche d’Information Assistee par Ordinateur 574 28
9 DMKD / DAMI – Research Issues on Data Mining and Knowledge Discovery 103 27
10 DASFAA – Database Systems for Advanced Applications 1251 26
11 DaWaK – Data Warehousing and Knowledge Discovery 503 22
12 DOLAP – International Workshop on Data Warehousing and OLAP 177 22
13 DS – Discovery Science 553 20
14 ICWSM – International Conference on Weblogs and Social Media 238 19
15 WSDM – Web Search and Data Mining 196 19
16 DMDW – Design and Management of Data Warehouses 70 19
17 PJW – Workshop on Persistence and Java 41 16
18 FIMI – Workshop on Frequent Itemset Mining Implementations 32 14
19 GRC – IEEE International Conference on Granular Computing 1351 13
20 IDEAL – Intelligent Data Engineering and Automated Learning 1032 13
21 MLDM – Machine Learning and Data Mining in Pattern Recognition 313 13
22 Fuzzy Systems and Knowledge Discovery 4626 11
23 ADMA – Advanced Data Mining and Applications 562 10
24 KDID – International Workshop on Knowledge Discovery in Inductive Databases 70 10
25 ICDM – Industrial Conference on Data Mining 304 9
26 MineNet – Mining Network Data 22 9
27 ESF Exploratory Workshops 17 8
28 TSDM – Temporal, Spatial, and Spatio-Temporal Data Mining 13 8
29 ICETET – International Conference on Emerging Trends in Engineering & Technology 712 7
30 WKDD – Workshop on Knowledge Discovery and Data Mining 527 7

Some observations:

  • The rankings by citations and by Field Rating are quite similar.
  • The KDD conference is still the #1 conference. This makes sense, as does the fact that the CIKM, ICDM and SDM conferences are among the top conferences in the field.
  • PKDD is ranked higher than PAKDD, as in the Google ranking, and both are ranked higher than DASFAA and DaWaK. I agree with this.
  • Some conferences, such as ICDE, were not in the Google ranking. This may be because Google puts the ICDE conference in a different category.
  • Microsoft ranks DMKD / DAMI as a conference, although it is a journal.
  • The FIMI workshop is also ranked high, although that workshop was only held in 2003 and 2004. Thus, it seems that Microsoft applies no restriction on time. Since the FIMI workshop has not been held since 2004, it should arguably not be in this ranking. The ranking would probably be better if Microsoft considered only the last five years, for example.
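
The observation that the two Microsoft conference rankings are "quite similar" can be quantified with a rank correlation. The following Python sketch computes Spearman's rank correlation for the top 10 conferences of each table above, which happen to be the same ten conferences in both. The function name `spearman_rho` is hypothetical, and the simple formula used here assumes there are no tied ranks, which holds because the order of the lists is taken directly from the tables.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items.

    Only items present in both orderings are compared, and they are
    re-ranked from 1 within each ordering. Assumes no tied ranks.
    """
    common = [name for name in order_a if name in order_b]
    rank_a = {name: i for i, name in enumerate(common, start=1)}
    kept_b = [name for name in order_b if name in rank_a]
    rank_b = {name: i for i, name in enumerate(kept_b, start=1)}
    n = len(common)
    squared_diffs = sum((rank_a[x] - rank_b[x]) ** 2 for x in common)
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Top 10 conferences, in the order given by each Microsoft table above.
by_citations = ["KDD", "ICDE", "CIKM", "ICDM", "SDM",
                "PKDD", "PAKDD", "DASFAA", "RIAO", "DMKD/DAMI"]
by_field_rating = ["KDD", "ICDE", "CIKM", "ICDM", "SDM",
                   "PKDD", "PAKDD", "RIAO", "DMKD/DAMI", "DASFAA"]

rho = spearman_rho(by_citations, by_field_rating)
print(round(rho, 3))
```

A value close to 1 means the two orderings nearly agree. Here only DASFAA, RIAO and DMKD / DAMI change places, so the correlation is high.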

3) The Microsoft Ranking of data mining journals

Now let’s look at the top 20 data mining journals according to Microsoft, by citations.

Rank Journal Publications Citations
1 IPL – Information Processing Letters 7044 62746
2 TKDE – IEEE Transactions on Knowledge and Data Engineering 2742 60945
3 CS&DA – Computational Statistics & Data Analysis 4524 24716
4 DATAMINE – Data Mining and Knowledge Discovery 584 19727
5 VLDB – The Vldb Journal 631 17785
6 Journal of Knowledge Management 747 9601
7 Sigkdd Explorations 491 9564
8 Journal of Classification 550 8041
9 KAIS – Knowledge and Information Systems 741 7639
10 WWW – World Wide Web 540 7182
11 INFFUS – Information Fusion 567 5617
12 IDA – Intelligent Data Analysis 477 4167
13 Transactions on Rough Sets 221 1653
14 JECR – Journal of Electronic Commerce Research 122 1577
15 TKDD – ACM Transactions on Knowledge Discovery From Data 110 716
16 IJDWM – International Journal of Data Warehousing and Mining 102 366
17 IJDMB – International Journal of Data Mining and Bioinformatics 132 256
18 IJBIDM – International Journal of Business Intelligence and Data Mining 124 251
19 Statistical Analysis and Data Mining 124 169
20 IJICT – International Journal of Information and Communication Technology 111 125

And here are the top 20 journals by Field Rating.

Rank Journal Publications Field Rating
1 TKDE – IEEE Transactions on Knowledge and Data Engineering 2742 109
2 IPL – Information Processing Letters 7044 80
3 VLDB – The Vldb Journal 631 61
4 DATAMINE – Data Mining and Knowledge Discovery 584 57
5 Sigkdd Explorations 491 50
6 CS&DA – Computational Statistics & Data Analysis 4524 49
7 Journal of Knowledge Management 747 46
8 WWW – World Wide Web 540 42
9 Journal of Classification 550 37
10 INFFUS – Information Fusion 567 36
11 KAIS – Knowledge and Information Systems 741 33
12 IDA – Intelligent Data Analysis 477 28
13 JECR – Journal of Electronic Commerce Research 122 21
14 Transactions on Rough Sets 221 20
15 TKDD – ACM Transactions on Knowledge Discovery From Data 110 13
16 IJDMB – International Journal of Data Mining and Bioinformatics 132 8
17 IJDWM – International Journal of Data Warehousing and Mining 102 8
18 IJBIDM – International Journal of Business Intelligence and Data Mining 124 7
19 Statistical Analysis and Data Mining 124 7
20 IJICT – International Journal of Information and Communication Technology 111 5

Some observations:

  • The rankings by citations and by Field Rating are quite similar.
  • The TKDE journal is again at the top of the ranking, just like in the Google ranking.
  • It makes sense that the VLDB journal is quite high. This journal was probably not in the Google ranking because it is more of a database journal than a data mining journal.
  • SIGKDD Explorations is also a good journal, and it makes sense for it to be in the list. However, I’m not sure that it should be higher than TKDD and DMKD / DAMI.
  • The KAIS journal is still ranked quite high. This time it is lower than DMKD / DAMI (unlike in the Google ranking) but still higher than TKDD. This is quite strange, since TKDD is arguably a better journal. As explained in the comment section of this blog post, a reason why KAIS is ranked so high may be that, in the past, the journal encouraged authors to cite papers from the KAIS journal. Besides, it appears that the Microsoft ranking has no restriction on time (for example, it does not consider only the last five years).
  • It is also quite strange that “Intelligent Data Analysis” is ranked higher than TKDD.
  • Some journals like WWW and JECR should perhaps not be in this ranking. Although they publish data mining papers, they do not exclusively focus on data mining. This is probably the reason why they are not in the Google ranking. Overall, the Microsoft ranking seems to be broader than the Google ranking.
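
Since raw citation counts favor journals that publish many papers, one simple way to look at the table above from another angle is to divide citations by publications. The Python sketch below does this for a few journals, using the numbers from the Microsoft table by citations. This ratio is not a metric offered by Microsoft; it is only an illustrative assumption, and like the Microsoft data it has no restriction on time.

```python
# (publications, citations) taken from the Microsoft journal table above.
journals = {
    "IPL": (7044, 62746),
    "TKDE": (2742, 60945),
    "DATAMINE": (584, 19727),
    "VLDB": (631, 17785),
    "KAIS": (741, 7639),
    "IDA": (477, 4167),
    "TKDD": (110, 716),
}

# Average number of citations per published paper (illustrative ratio only).
citations_per_paper = {
    name: citations / publications
    for name, (publications, citations) in journals.items()
}

# Sort journals from the highest to the lowest ratio.
ranked = sorted(citations_per_paper.items(), key=lambda item: item[1], reverse=True)

for name, ratio in ranked:
    print(f"{name}: {ratio:.1f}")
```

With this ratio, IPL drops well below TKDE, DATAMINE and VLDB, which illustrates how much the number of publications influences a ranking by total citations.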

4) Impact factor ranking

Another popular way of ranking journals is by their impact factor (IF). I have taken some of the top data mining journals above and obtained their impact factor for 2014/2015, or for 2013 when I could not find the information for 2015. Here is the result:

Journal Impact factor
DMKD/DAMI – Data Mining and Knowledge Discovery 2.714
IEEE Transactions on Knowledge and Data Engineering 2.476
Knowledge and Information Systems 1.78
VLDB – The Vldb Journal 1.57
TKDD – ACM Transactions on Knowledge Discovery From Data 1.14
Advances in Data Analysis and Classification 1.03
Intelligent Data Analysis 0.50

Some observations:

  • TKDE and DAMI/DMKD are still among the top journals.
  • As in the Microsoft ranking, DAMI/DMKD is above KAIS, which is above TKDD.
  • As pointed out in the comment section of this blog post, it is strange that KAIS is so high compared, for example, to TKDD, or to VLDB, which is a first-tier database journal. This shows that the IF is not a perfect metric.
  • Compared to the Microsoft ranking, the IF ranking at least puts the “Intelligent Data Analysis” journal much lower than TKDD. This makes sense, as TKDD is a better journal.

Conclusion

In this blog post, we have looked at rankings of data mining journals and conferences from three sources: the Microsoft ranking, the Google ranking, and the Impact Factor ranking.

None of these rankings is perfect. They are somewhat accurate, but they may not always correspond to the actual reputation of a venue in the data mining field. The Google ranking is more focused on the data mining field, while the Microsoft ranking is perhaps too broad, and seems to have no restriction on time. Also, as can be seen by comparing these rankings, different measures yield different rankings. However, there are still some clear trends, such as TKDE being ranked as one of the top journals in all rankings, and KDD being the top conference in both rankings that include conferences. The top journals and conferences are more or less the same in each ranking. But there are also some strange ranks, such as KAIS and Intelligent Data Analysis being ranked higher than TKDD in the Microsoft ranking.

Do you agree with these rankings?  Please leave your comments below!

Update 2016-07-19:  I have updated the blog post based on the insightful comments made by Jefrey in the comment section. Thanks!

==
Philippe Fournier-Viger is a full professor and the founder of the open-source data mining software SPMF, offering more than 110 data mining algorithms. If you like this blog, you can tweet about it and/or subscribe to my twitter account @philfv to get notified about new posts.


3 Responses to The top journals and conferences in data mining / data science

  1. H5 (the Google measure) is not useful to assess the average quality of papers, but it is a good proxy for total impact on the field. The reason for this is that H5 does not punish for having very many publications. This kind of information should not be used to determine which journal to submit your paper to.

    Perhaps you are also aware that KAIS conducted terribly unethical behaviour in the past; they required every publication to cite at least 4-5 papers from the journal, suggesting to authors that had fewer such references that their paper was ‘not sufficiently clearly in the scope of the journal’. They were banned from ISI then, and have since improved their behaviour and were recently indexed again. Whether this is still visible in the numbers above depends on how far back the data included in the scores goes.

    For #3, the Microsoft Journal ranking, you copied the ‘field ratings’ not the citation counts. I do not know what this field rating means, maybe you could find out?

    The most recent IFs put DMKD (or DAMI, as it is typically called in the field) just ahead again of TKDE: 2.714 vs. 2.476. My subjective view is that they are both top-tier, with DAMI slightly ahead in quality. I would put SIGKDD Explorations in the same group as well, but that really includes very few papers per year (10 or even fewer). Apparently TKDE publishes considerably more papers than DAMI, so its total impact on the field is higher, hence its field rating may be appropriate. Subjectively, I would rank TKDD and KAIS simply in tier two, just like IPL, CS&DA, SADM. VLDB is not data mining, but it is absolutely top tier in databases, so its IF is apparently a bad indicator. As you probably know IF includes only citations from venues that are indexed in Thomson Reuters’ Web of Science, which includes only a fraction of computer science publications. The Microsoft and Google data is much better for analysis of computer science publication venues.

    Of course there are computer science journals that also publish data mining that are (much) more prestigious: Pattern Analysis and Machine Intelligence (PAMI), Journal of the ACM (JACM), ACM Computing Surveys (for surveys only of course). Finally, there are occasionally papers in Science and PNAS. Recent evidence: http://science.sciencemag.org/content/353/6295/163.

    Thanks for taking the time and effort to write this!

    • Hi Jefrey,

      Thanks a lot for your insightful comments. I have updated the blog post based on your comments.

      The blog post now includes both the “citations” and “field rating” rankings by Microsoft. It is not very clear how Microsoft calculates the field rating. According to their Help Center, it is somewhat similar to the h-index, but they don’t give the exact formula. Here is their explanation: “The field rating is similar to h-index in that it calculates the number of publications by an author and the distribution of citations to the publications. Field rating only calculates publications and citations within a specific field and shows the impact of the scholar or journal within that specific field.”

      Based on your comments, I have also found that Microsoft has no time restriction in its formula for calculating the ranking. For example, the ranking includes the FIMI workshop, which has not been held since 2004. I think that this explains the high ranking of KAIS: it would mean that Microsoft considers the all-time citations of KAIS rather than just those of the last few years. I was not aware of the past problems of the KAIS journal with citations. Given your explanation, I think that we have found a plausible reason for the high ranking of KAIS.

      By looking again at the ranking, I also found it quite unusual that Intelligent Data Analysis is ranked higher than TKDD! IDA is in my opinion a second or third tier journal. It should clearly not be above TKDD.

      Thanks again for your discussion on this blog. It is quite interesting to read your comments.

      Philippe

      • Hi Philippe,

        I agree with you regarding IDA and TKDD. I did not even know that the IDA journal published that many papers (>400 according to Microsoft in your tables). Anyway, the Impact Factor for IDA is 0.631 (2016) while for TKDD it is 1.0 (2015, couldn’t find 2016), which aligns with our impressions.

        Best wishes,

        Jefrey
