Previously posted on July 16, 2015, on The Data Incubator.
There’s a lot of good discussion about technology in big data, but not enough informed discussion about the talent in the field. We usually spend more time thinking about how to optimize our MapReduce jobs than we do thinking about how to motivate the data scientists who write them. We often use the term “data scientist” to encompass two very different types of roles: data scientists who produce analytics for humans, and data scientists who produce analytics for machines. It’s an important distinction, especially because the backgrounds and skill sets necessary for success in these two roles are quite different.
Lately, I have been seeing increasing awareness among employers of the importance of understanding data science and this division within the data science role. This certainly isn’t the only distinction among data scientists, but when it comes to formulating a successful big data strategy, it’s the most significant.
Here’s the difference, along with the kinds of backgrounds and motivations an employer should look for in each type of data scientist.
Analytics for Humans
In the case of data scientists who produce analytics for humans, another human is the final decision maker and consumer of the analysis. This type of data scientist often has to deliver a report on her findings and answer questions like what groups are using a product or what factors are driving user growth and retention.
Though this type of data scientist may sift through the same data sets as her analytics-for-machines counterparts, she delivers the results of her models and predictions to another human, who makes business or product decisions based on these recommendations. Often, that decision maker is not a data scientist, so the data scientist must be able to explain her results in a non-technical way, which adds a layer of complexity to the job.
The need to explain means that the data scientist might deliberately choose a more basic model over a more accurate but overly complex one. These data scientists must also be comfortable coming to higher-level conclusions – the “why” and “how” – that are a step removed from the raw data.
A typical background for this kind of role is that of a social or medical scientist (often at the Ph.D. level). Such scientists are trained to ask the deeper questions (the “how” and “why”), making them better suited to produce analytics for humans. They are often trained to employ “simple” models and convey the results to those without deep technical understanding, like management or sales. Data scientists with these sorts of backgrounds frequently thrive on the intellectual challenge of explaining a model to another human and drawing clarity from obscure data. They also love seeing the direct impact of their analysis on decision-making at their organization.
Analytics for Machines
The other major group of data scientists produces analytics for machines. In this instance, the final decision maker and consumer of the analysis is a computer. These data scientists build highly complex models that ingest vast data sets and try to extract subtle signals using machine learning and sophisticated algorithms. They tend to work in areas like algorithmic trading, online content and advertising targeting, or personalized product recommendations, to name a few. Once established, their models act on their own, making recommendations, choosing ads to display, or automatically trading in the stock market.
Data scientists who produce analytics for machines must have remarkably strong mathematical, computational, and statistical skills to construct models that can make quality predictions quickly. They can piece together an array of technical tricks to create sophisticated models that squeeze out the last drop of performance. They typically work toward easily measurable, unambiguous metrics set by management, such as clicks, profits, and purchases. Their value lies in applying their technical virtuosity at scale: even small gains, aggregated across millions of users and trillions of events, can lead to huge wins.
Data scientists who produce analytics for machines often have mathematics, natural science, or engineering backgrounds (again, often at the Ph.D. level) with the deep computational and mathematical knowledge necessary to do the high-powered work. They also have strong software engineering backgrounds that enable them to build robust large-scale systems to deploy their analyses. They thrive on the technical challenge of building these large-scale, complex systems.
Why the Distinction Matters
It’s rare to find someone who is well-suited for both roles, so employers would do well to figure out which role they need. An MIT-trained physicist hungry for a deep machine-learning challenge likely would not be the best fit for a role in which her models must be “simple” enough for management to understand. She also may not be as comfortable extrapolating the “why” and “how” from the data. Likewise, a Harvard-trained social scientist might be great at explaining and drawing deeper conclusions from data, but may not be as well suited to produce analytics for machines. If he lacks the necessary deep mathematical and computational skills, he may not be able to build robust systems or may engineer simplistic models that fail to capture the data’s full value.
Understanding your data science team – what makes them tick, what drives them up the wall – is just as important to the success of a big data strategy as understanding your technology stack. It’s important to figure out what you really need from a data scientist so that you can determine which backgrounds and temperaments would be best suited to getting the job done.