Skills needed for a data scientist? (comments on the HBR article)

Recently, I read an article on the Harvard Business Review (HBR) website about data science skills for businesses. The article proposes categorizing data-related skills in a 2×2 matrix, where each skill is labelled as useful or not useful, and as time-consuming or not time-consuming to learn. The author of that article drew such a matrix to illustrate the needs of his team (see below).

Obtained from Harvard Business Review

This matrix has received many negative comments online in the last few days. These comments mainly highlight two problems:

  • Why are mathematics and statistics viewed as useless?
  • Data science is viewed as useful while mathematics and statistics are viewed as useless, which is strange since math and stats are part of data science.

Having said that, I also don’t like this chart, and many people have asked why it was published in Harvard Business Review (a good magazine). But we should keep in mind that this chart illustrates the needs of one company. Thus, it does not claim that mathematics and statistics are useless for everyone. It is quite possible that this company does not see any benefit in taking mathematics and statistics courses or training. Following the negative comments, the author and the HBR editor reworded some parts of the article to try to make it clearer that it should be interpreted as a case study.

Part of the problem with this chart and article is that the term “data science” has always been very ambiguous. People with very different backgrounds, doing very different things, call themselves data scientists. This is one reason why I usually don’t use this term. It may also be part of the reason why this chart draws a distinction between data science, math and stats, which I would describe as overlapping.

From a more abstract perspective, this article highlights that some companies are not interested in investing in skills that take too much time to acquire and thus have no short-term benefits. For example, I know that some companies prefer to use code from open-source projects or ready-made tools to analyze data rather than spending time developing customized tools to solve problems. This is understandable, as the goal of companies is to earn money and there are many tools available for data analysis. However, one should not forget that using these tools often requires an appropriate background in mathematics, statistics or computer science to choose a suitable model given its assumptions and to correctly interpret the results. Thus, having those skills that take more time to acquire is also important.
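
To make this last point concrete, here is a small sketch in Python. It is only an illustration under my own assumptions: the data is made up, and the `pearson` helper is a hand-written version of the standard Pearson correlation coefficient rather than code from any particular tool. It shows how a ready-made measure can be misread by someone who does not know that it only captures linear relationships.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, which measures linear association only."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: y is fully determined by x, but the relationship is not linear.
xs = list(range(-5, 6))
ys = [x ** 2 for x in xs]

r = pearson(xs, ys)
print(f"Pearson r = {r:.3f}")  # close to zero despite the perfect dependence
```

Someone who reads a coefficient near zero as "x and y are unrelated" would draw the wrong conclusion here, since y is exactly x squared. Knowing the assumption behind the measure is what allows the result to be interpreted correctly.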

What is your opinion about this chart and the most important skills for a data scientist? Please share your opinion in the comment section below.

Philippe Fournier-Viger is a professor of Computer Science and also the founder of the open-source data mining software SPMF, offering more than 150 data mining algorithms.
