I have seen many people ask in data mining forums and on other websites how to choose a good thesis topic in data mining. In this post, I will address this question.
The first thing to consider is whether you want to design or improve data mining techniques, apply existing ones, or do both. Personally, I think that designing or improving data mining techniques is more challenging than using existing techniques. Moreover, you can make a more fundamental contribution by improving data mining techniques than by simply applying them. However, be aware that improving data mining techniques may require stronger algorithmic and/or mathematical skills.
The second thing to consider is what kind of techniques you want to apply, design or improve. Data mining is a broad field consisting of many techniques such as neural networks, association rule mining algorithms, clustering and outlier detection techniques. You should try to get an overview of the different techniques to see which ones interest you most. To get a rough overview of the field, you could read an introductory book on data mining such as Introduction to Data Mining by Tan, Steinbach & Kumar, or read websites and articles related to data mining (such as posts on this data mining blog!). If your goal is only to apply data mining techniques for some other purpose (e.g. analysing cancer data) and you don’t yet know which techniques to use, you can skip this question.
The third thing to consider is which problems you want to solve or what you want to improve. This requires more thought. A good approach is to look at recent proceedings of good data mining conferences (KDD, ICDM, PKDD, PAKDD, ADMA, DAWAK, etc.) and journals (TKDE, TKDD, KAIS, etc.), or to attend conferences, if possible, and talk with other researchers. This helps you see which topics are currently popular and what kinds of problems researchers are currently trying to solve. This does not mean that you need to work on the most popular topic. Working on a popular topic (e.g. social network mining) has some advantages: it can be easier to get grants, and in some cases to get your papers accepted in special issues, workshops, etc. However, some “older” topics are also interesting even if they are not the flavor of the day. The most important thing is to find a topic that you like and will enjoy working on for perhaps a few years of your life. Also remember that if a topic has never been studied, that does not mean it is a good topic. Sometimes, a topic has never been studied simply because it should not be (for example, it may have no real applications or may rely on unrealistic assumptions)! Finding a good problem can require reading several articles to understand the limitations of current techniques and decide what can be improved. So don’t worry: it is normal for it to take time to find a more specific topic.
Fourth, do not forget that helping students choose a thesis topic is also part of the job of the professor who supervises Master’s or Ph.D. students. Therefore, if you are looking for a thesis topic, it is good to talk with your supervisor and ask for suggestions. Your supervisor should help you. If you don’t have a supervisor yet, try to get a rough idea of what you like, and then meet and talk with professors who could become your supervisor. Some of them may have research projects and ideas that they could give you if you work with them. Choosing a supervisor is a very important and strategic decision that every graduate student has to make. For more information about choosing a supervisor, you can read this post: How to choose a research advisor for M.Sc. / Ph.D.?
Lastly, I would like to discuss the common request “please give me a Ph.D. topic in data mining”, which I read on websites and sometimes receive by e-mail. There are two problems with this request. The first problem is that it is too general. As mentioned, data mining is a very broad field. For example, I could suggest very specific topics such as detecting outliers in imbalanced stock market data or optimizing the memory efficiency of subgraph mining algorithms for community detection in social networks. But would you like them? It is best to choose something you like by yourself. The second problem is that choosing a topic is work that a researcher should do, or learn to do. In research, being able to find a good problem is as important as being able to find a good solution. Therefore, I highly recommend trying to find a research topic by yourself, since developing this skill is important to become a successful researcher. If you are a student, you can ask your research advisor to guide you while you search for a topic.
Also, just for fun, here is a Ph.D. thesis title generator.
If you like this blog, you can subscribe to the RSS Feed or my Twitter account (https://twitter.com/philfv) to get notified about future blog posts.