The Modern Data Scientist at Netflix: Modeling and Tools in Unstable Environments
This article is based on a recent podcast from Data Science in 30 Minutes. Listen to it here.
By Cali Frisbie, Pragmatic Institute
Netflix is not only one of the most recognized names in the world; it’s also one of the most recognized names in data science. Dr. Becky Tucker knows all too well the power of data and what it can tell us. Whether it’s a recommendation for our next big binge watch or an estimate of how people will most likely react in a crisis, data is one of the most valuable tools available for developing strategy, making decisions and deciphering your next move.
In times like these, uncertainty and instability are everywhere. It can seem as though fear, worry and despair have taken the reins, and rationality and sound judgement are nowhere to be found. But this is where data comes in. What can we learn in these unstable environments, and how can we pivot according to what our data models are telling us?
“Models are an abstraction of the world that we hope tells us something useful. Could be a prediction or an inference of how we think the world works,” Dr. Becky explains. This is where the environment comes into play. “If your model is an abstraction of a real-world problem, then your model is only as good as the data you can put into it.”
Data is produced directly by its environment. When dealing with unstable circumstances such as COVID-19, it’s imperative that we first detect and acknowledge the instability, and then look deeper into our training and testing sets to be sure nothing is missing and that they still resemble the data the model sees today.
But that doesn’t mean we won’t still see unexpected results or shifts.
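One way to make "detect the instability" concrete is to compare the data a model was trained on with the data arriving now. The sketch below is only an illustration and is not something described in the podcast: it computes a two-sample Kolmogorov-Smirnov statistic for a single numeric feature and flags a shift when the statistic exceeds the usual large-sample critical value. The function names `ks_statistic` and `has_shifted` are hypothetical, and the default `c_alpha=1.36` assumes roughly a 5% significance level.

```python
from bisect import bisect_right
import math


def ks_statistic(reference, recent):
    """Largest gap between the empirical CDFs of two samples."""
    ref = sorted(reference)
    rec = sorted(recent)
    # Both empirical CDFs are step functions, so the largest gap
    # occurs at one of the observed values.
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(rec, x) / len(rec))
        for x in ref + rec
    )


def has_shifted(reference, recent, c_alpha=1.36):
    """Return True when the two samples look like different distributions.

    Uses the large-sample approximation for the critical value;
    c_alpha=1.36 is an assumed default (about a 5% significance level).
    """
    n, m = len(reference), len(recent)
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > threshold
```

In practice, `reference` might hold a feature’s values from the training window and `recent` the same feature from the last few weeks. A flagged shift is a prompt to investigate with domain knowledge, not proof that the model is wrong.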
“Everyone’s data is being affected by quarantine, whether it be retail data, traffic pattern data, grocer’s data (think toilet paper). Even changes in behaviors at home, increase in Netflix (wink, wink), increase in home exercise equipment and fitness.” These shifts are a direct result of the environment people now find themselves in, and they reflect the actions people are taking because of it.
We can’t see the future or predict something unprecedented, but data can expose gaps in what we know, and we can use those findings to our advantage. Domain knowledge becomes increasingly important in unstable environments, because your ultimate goal is to create an abstraction and you need to know which parts of the real world it can safely leave out.
Knowing where you can take shortcuts and where you can’t will ultimately define the quality of your findings and help you best predict the next move in these continuing times of uncertainty.
Listen to the full podcast with Dr. Becky Tucker, as well as other episodes of Data Science in 30 Minutes, here.