3 DATA SCIENCE PREDICTIONS FOR 2021

Every new year, people like to think of it as a blank slate. But the start of 2021 will feel in many ways like a continuation of the year we’ve just left behind. It’s hard to think of a blank slate when the transition into a brave new world started the minute the COVID pandemic struck worldwide. We are not talking only about a new year, but about a new way of doing things that will affect pretty much every aspect of our lives, Data Science included.

How so? Businesses will need to adapt fast. Remote working, sustainability and the optimization of work processes, along with their impact on ROI, will play a leading role in how companies evolve into the “new normal”. And Data Science is going to become a key instrument in this evolution.

During my career I’ve overseen the implementation of Data Science in different companies and experienced first-hand how it can change the way they operate. For the last decade, most companies have been investing in Data Science initiatives, learning how it can benefit them and taking the initial steps toward incorporating this set of technologies and practices into their organizations. It was the exploratory era of Data Science for companies outside the Big Tech/A.I. bubble. But starting in 2021, with increased competition and trends accelerated by (but predating) the pandemic, the efficient application of Data Science practices and the incorporation of Machine Learning will become a competitive necessity.

With that in mind, here are three trends that will drive Data Science adoption in 2021.

1 – MACHINE LEARNING IN THE CLOUD: FROM DEVELOPMENT TO DEPLOYMENT

Traditionally, a Data Scientist (DS) or Machine Learning Engineer (MLE) would develop everything locally and only then upload their work to the cloud for deployment. But that model is limited to smaller samples of data and a small training search space because of the local resources the developer has. It also introduces issues around data privacy and presents barriers to effective collaboration, automation of workflows and a seamless transition between development and production environments.

Taking the whole process to the cloud will make it easier for companies to work in an integrated environment that shortens the path from development to deployment. It will increase reproducibility through automation, bring better security guarantees since data doesn’t leave the cluster, and set the stage for better collaboration across teams.

Tools like MLflow, Kubeflow and Snorkel are going to become more widespread as a way to glue together the DS workflow in the cloud.
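To make the reproducibility point a bit more concrete, here is a minimal sketch of the idea behind experiment tracking: every run records its parameters, a fingerprint of the data it used and the resulting metrics, so the run can be repeated and compared later. It uses only the Python standard library. The function name `run_experiment`, the `run.json` file and the random "score" are my own illustrative assumptions, not the API of any of the tools above, which handle this far more thoroughly.

```python
import hashlib
import json
import random
import tempfile
from pathlib import Path


def run_experiment(params, data_path, run_dir):
    """Run a stand-in "training" step and record what is needed to repeat it."""
    rng = random.Random(params["seed"])   # fixed seed -> repeatable result
    score = round(rng.random(), 4)        # placeholder for a real evaluation metric
    record = {
        "params": params,
        # Hash of the input file, so we know exactly which data was used.
        "data_sha256": hashlib.sha256(Path(data_path).read_bytes()).hexdigest(),
        "metrics": {"score": score},
    }
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "run.json").write_text(json.dumps(record, indent=2, sort_keys=True))
    return record


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    data.write_text("x,y\n1,2\n3,4\n")    # hypothetical toy dataset
    first = run_experiment({"seed": 42, "lr": 0.01}, data, Path(tmp) / "run_1")
    second = run_experiment({"seed": 42, "lr": 0.01}, data, Path(tmp) / "run_2")
    print(first == second)  # True: same parameters and same data give the same record
```

In a shared cloud environment the same record would live in a central store instead of a temporary folder, which is what lets a whole team compare and rerun each other's work.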

2 – LANGUAGE MODEL ADOPTION INCREASES

Since the creation of BERT (the transformer-based machine learning technique for NLP developed by Google, not the Muppet character) and the widespread adoption of the transformer architecture in Deep Learning, there has been a huge advance in language models and their potential applications for business.

A good example: by October 2020, almost every English-language Google query was being processed by BERT. And we are not just talking about indexing the words typed by users. The BERT model and its successors capture not only the words but also their context. They do more than predict the next word; they offer a compass, if not a map, for anticipating users’ next moves, behaviors and needs.
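To give a feel for what "using context" means, here is a deliberately tiny toy, assuming a made-up four-sentence corpus. It fills in a hidden word by looking at the words on both sides of it, which loosely mirrors the fill-in-the-blank objective BERT is pretrained on. It is a plain frequency count, not a neural network, and nowhere near what BERT actually does; the names `build_context_table` and `fill_mask` are hypothetical.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus, purely for illustration.
CORPUS = [
    "she sat on the river bank and watched the water",
    "he walked along the river bank at dawn",
    "she opened a bank account last week",
    "he closed his bank account yesterday",
]


def build_context_table(sentences):
    """Count which words occur between each (left, right) pair of neighbors."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens) - 1):
            table[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    return table


def fill_mask(table, left, right):
    """Return the word most often seen between `left` and `right`, or None."""
    candidates = table.get((left, right))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


table = build_context_table(CORPUS)
print(fill_mask(table, "river", "and"))   # bank
print(fill_mask(table, "bank", "last"))   # account
```

The toy only knows word pairs it has seen verbatim. A real language model generalizes to contexts it has never encountered, which is where the business value comes from.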

There are also the new possibilities that the GPT-3 model brings, with text generation of such high quality that it’s difficult to distinguish from text written by a human. The new forms of automated text interaction with users that this enables will become a key asset in customer experience and in the general way people interact with companies (from next-gen chatbots to improved text classification and tasks like Q&A and text summarisation). It’s awe-inspiring, but also a little scary when you start thinking about it.

The possibilities that this type of model brings to the table are endless when applied to pretty much every business need we could have. What’s more, it wouldn’t be science fiction for those same systems to predict what those business needs will be.

There isn’t a company that doesn’t work with huge amounts of text, and so there isn’t one that wouldn’t benefit from exploring the potential applications that these sophisticated language models allow.

3 – DATA ENGINEERING WILL BE KING

Everything points to this natural conclusion: Data Engineering will be of the utmost importance if companies are to implement and sustain their Data Science efforts in a scalable, resilient way.

If companies want to get ahead and start taking advantage of these new technologies, they will have to create a talent pipeline to acquire qualified Data Engineers who will not just implement pre-made solutions but also apply creative thought to develop custom products for their specific needs, bringing the full use of ML into each business process.


It’s a new year with a new set of rules, and companies will need this “new” profile to stay at the top of their game.

2 responses to “3 DATA SCIENCE PREDICTIONS FOR 2021”

  1. Hey Juan, I think this is right on the money and my one takeaway here is ‘more data engineers!’ Great blog. Keep posting.
