I have often been asked which books are good for learning data mining. In this blog post, I will answer this question by discussing some of the top books for learning data mining and data science from a computer science perspective. These books are especially recommended for those who are interested in learning how to design data mining algorithms and who want to understand the main algorithms as well as some more advanced topics.
1. “Introduction to Data Mining” by Tan, Steinbach & Kumar (2006)
This is a very good introductory book on data mining that I have enjoyed reading. It discusses all the main topics of data mining: clustering, classification, pattern mining and outlier detection. Moreover, it contains two very good chapters on clustering by Tan & Kumar, who are specialists in this domain. What I like about this book is that the chapters explain the techniques in enough detail to give a good understanding of the techniques and their drawbacks, unlike some other books that do not go into details. Some free sample chapters of the book can be found here. Before buying this book, note that a 3rd edition has been announced for release soon, although it has been delayed for more than a year.
2. Data Mining: Concepts and Techniques, Third Edition by Han, Kamber & Pei (2013)
This is another great book that I like. I have also used it for teaching data mining. Like the previous book, it covers all the main topics that a good data mining course should cover. However, this book is more like an encyclopedia. It covers a lot of topics and gives a very broad view of the field but does not cover each topic in much detail. It is also designed for a computer science audience. Besides, it is written by some top data mining researchers (Han & Pei).
3. Data Mining and Analysis: Fundamental Concepts and Algorithms by Zaki & Meira (2014)
This is another great data mining book written by a leading researcher (Zaki) in the field of data mining. It also targets computer scientists. This book covers all the main topics of data mining but also has some very interesting chapters on advanced topics such as graph mining. A version of the book for personal use only is offered freely here. The algorithms in this book are described in enough detail that it is possible to implement them by reading the book. In general, a few algorithms are presented in each chapter. They are not always the best algorithms but are often the most popular ones (the classical algorithms).
4. Data Mining: The Textbook by Aggarwal (2015)
This is probably one of the top data mining books for computer scientists that I have read recently. It covers the basic topics of data mining and, being a very recent book, it is very up to date. It is also written by a top data mining researcher (C. Aggarwal). Moreover, it covers many recent and advanced topics such as time series, graph mining and social network mining, which are not covered in several other books.
5. “The Elements of Statistical Learning” by Hastie, Tibshirani & Friedman (2009)
This is a quite popular book that is a little more focused on statistics. It covers many data mining techniques such as neural networks, association rule mining, SVM, regression, clustering and other topics. What is interesting about this book is that, like the others, it is used in many university courses, and it can be downloaded for free here.
In this blog post, I have discussed some of the top books for learning data mining algorithms for computer scientists. I have tried to discuss general books that give a good foundation for learning data mining and that can also be interesting for advanced topics. However, note that if one is interested in specific topics such as recommender systems and text mining, there also exist some specialized books that cover only these topics in detail, which may also be interesting.
That is all I wanted to write for now. If you like this blog, you can tweet about it and/or follow my Twitter account @philfv to get notified about new posts.
Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 52 data mining algorithms.
How can the efficiency of the Apriori algorithm be improved?
Which algorithms are used to reduce its scanning time?
There are different ways to improve Apriori, such as implementing it using a hash-tree or using transaction id lists (tids) stored as bit vectors. However, if you care about efficiency, you should use an algorithm such as FPGrowth, which is much faster than Apriori. Apriori is an old algorithm that is very slow. It is better to use another algorithm than to optimize Apriori.
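To illustrate the tid-list idea mentioned above, here is a minimal Python sketch, not the SPMF implementation. It assumes each transaction is a list of items and uses a Python integer as the bit vector of each itemset (bit i is set when transaction i contains the itemset). The function names `build_tid_bitsets`, `support` and `apriori_tid` are hypothetical and chosen only for this example. The point is that the database is scanned once; after that, the support of a candidate is obtained by a bitwise AND of two bit vectors.

```python
from itertools import combinations


def build_tid_bitsets(transactions):
    """Scan the database once: map each item to an int used as a bit vector."""
    bitsets = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            bitsets[item] = bitsets.get(item, 0) | (1 << tid)
    return bitsets


def support(bits):
    """Number of transactions in a bit vector (count of bits set to 1)."""
    return bin(bits).count("1")


def apriori_tid(transactions, minsup):
    """Return {frozenset(itemset): support} for itemsets with support >= minsup."""
    item_bits = build_tid_bitsets(transactions)
    # Level 1: frequent single items with their bit vectors
    level = {
        frozenset([item]): bits
        for item, bits in item_bits.items()
        if support(bits) >= minsup
    }
    frequent = {}
    while level:
        for itemset, bits in level.items():
            frequent[itemset] = support(bits)
        next_level = {}
        # Join pairs of frequent k-itemsets that differ by exactly one item
        for (a, bits_a), (b, bits_b) in combinations(level.items(), 2):
            candidate = a | b
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: all k-subsets of the candidate must be frequent
            if any(candidate - {item} not in level for item in candidate):
                continue
            # No database scan: intersect the two tid bit vectors
            bits = bits_a & bits_b
            if support(bits) >= minsup:
                next_level[candidate] = bits
        level = next_level
    return frequent


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, sup in sorted(apriori_tid(db, 3).items(), key=lambda x: sorted(x[0])):
        print(sorted(itemset), sup)
```

This sketch keeps the level-wise candidate generation of Apriori, so it still explores many candidates. It only removes the repeated database scans, which is why a pattern-growth algorithm such as FPGrowth remains the better choice when efficiency matters.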
Thank you for sharing your opinions on data mining books. I totally agree that they are really useful and enjoyable to read. I read the book by Han, Kamber & Pei when I started doing my research on data mining. This book provides a good introduction and covers many topics of data mining. Recently, I read the book by Zaki and I really love it because it is easy to follow and I can learn the mathematical background used by the algorithms.
Hope that you continue writing many more posts about Data mining and Machine learning.