What is MLOps and do you need it?
TL;DR: immediately, no; at some point, yes.
Hey, you! Thanks for being here. Don’t forget to leave some feedback — let me know if there is something more you wish to know about MLOps!
More and more companies want to get ML models into production, and many have already tried AI in the past. Among the clients I used to work with, many deployed AI to generate profit, but it ended up being a very costly endeavour. In many cases, an MLOps approach would have mitigated many of the risks those companies faced.
I also hear from those of you who are frustrated with MLOps: as with any vague and flashy term, it gets misused. Some companies want you to buy a product you might not need (hey, doesn't your company hire salespeople to do just that, too?). And in many cases you can survive without MLOps. In my opinion, though, any successful ML-focused company must adopt the principles and approaches that MLOps advocates. Let me show you why.
What is MLOps?
MLOps is a set of tools and methodologies for managing ML models. Many people compare MLOps to DevOps, and in some sense they are right: models are to MLOps what code is to DevOps. If you think about it, models are also represented as code. This code can be used in many contexts, but it is useless by itself: it needs correct, valid data and has to run in a very specific environment. Moreover, most of the time it is written by non-engineers. In the most general sense, MLOps engineers should think about much more than the model, starting with the way people build models (making it easy for non-engineers, most likely working freely in Python and notebooks, to work with models reliably), through getting the source code of a model and running CI/CD/CT (continuous integration, delivery and training) pipelines against it, to serving, monitoring and retraining models at the right point in production.
And we haven't even touched on Feature Stores and Model Registries yet. You see, humans often have a very narrow perspective: when they see a problem, they tend to simplify it. Someone says ML, and people think of fancy models such as deep neural network architectures. The real challenge is not writing the model code, or even training it. To call something production-ready and actually base your revenue on it, you have to think about A TON of operational aspects.
This picture went viral over 6 years ago, and it becomes more accurate as more companies try to run ML in production. Writing the ML code is just a small piece of the whole landscape.
Who doesn't need an MLOps platform?
Does this mean you always need to bother with MLOps? Hardly any question in this world has a definitive answer. If you are a small startup that needs to iterate fast, it would be unwise to spend months setting up an MLOps platform just to serve a single model behind an API. You could consider a managed solution, but even that might be overkill. I remember my university days, when I was able to spin up a fully functional ML platform with a scalable API over a weekend. Of course, deploying a new version of a model was a daunting, manual task that required some downtime, but hey, it got the job done.
This doesn't mean that MLOps principles wouldn't have helped me. Did I think about ways to monitor my model? No. Would I have benefited from doing that? Of course. Did I need a complex governance workflow? Not so much.
So if you are running a handful of models and don't employ dozens of Data Scientists eager to run dozens of new experiments every week, I would suggest simply improving your existing platform. Take inspiration from MLOps tools: improve monitoring, try to introduce lineage, and run more assertions against your production data in your CI/CD/CT workflows.
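To make that last suggestion a bit more concrete, here is a minimal sketch of what "assertions against production data" could look like as a step in a CI/CT pipeline, written in plain Python without any MLOps tooling. Everything in it is hypothetical: the feature name `age`, the thresholds, and the helper names `FeatureCheck`, `validate_batch` and `mean_shift` are made up for illustration, and the mean-shift score is only a crude stand-in for a proper drift metric.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class FeatureCheck:
    """Hypothetical expectations for one numeric feature; thresholds are illustrative."""
    name: str
    min_value: float
    max_value: float
    max_null_fraction: float


def validate_batch(rows, checks):
    """Return human-readable failures for a batch of records (dicts); an empty list means pass."""
    if not rows:
        return ["batch is empty"]
    failures = []
    for check in checks:
        values = [row.get(check.name) for row in rows]
        # Too many missing values often signals a broken upstream job.
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction > check.max_null_fraction:
            failures.append(
                f"{check.name}: null fraction {null_fraction:.2f} > {check.max_null_fraction}"
            )
        # Values outside the expected range can point to schema or unit changes.
        out_of_range = [
            v for v in values
            if v is not None and not (check.min_value <= v <= check.max_value)
        ]
        if out_of_range:
            failures.append(
                f"{check.name}: {len(out_of_range)} value(s) outside "
                f"[{check.min_value}, {check.max_value}]"
            )
    return failures


def mean_shift(reference, current):
    """Crude drift signal: |mean(current) - mean(reference)| in reference standard deviations.

    Needs at least two reference values (a requirement of statistics.stdev).
    """
    ref_std = stdev(reference)
    delta = abs(mean(current) - mean(reference))
    if ref_std == 0:
        return 0.0 if delta == 0 else float("inf")
    return delta / ref_std


if __name__ == "__main__":
    # Hypothetical batch with one missing and one out-of-range value.
    checks = [FeatureCheck("age", 0, 120, 0.05)]
    batch = [{"age": 34}, {"age": 51}, {"age": None}, {"age": 130}]
    for failure in validate_batch(batch, checks):
        print("FAIL", failure)
    print("age mean shift:", round(mean_shift([30, 35, 40, 45], [60, 62, 65, 70]), 2))
```

In a pipeline, a non-empty failure list (or a shift above a threshold you pick, for example a few reference standard deviations) would fail the job before a model is retrained or promoted. Dedicated data validation and monitoring tools do this far more thoroughly; the point is only that even a small platform can start with checks like these.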
Who does need an MLOps platform?
Suppose your company hires 10 bright and meticulous Data Scientists, each with their own set of ideas. On average, each of them wants to run one model a week and put one model into production a month. Ask yourself:
- if you are an MLOps / DevOps engineer: how many sprints would you need to put a single model into production, let alone ten? Would you know if any of these models were affected by data drift?
- if you are a support engineer: do you have tools in place to monitor that many models?
- if you are a manager: how many more engineers would you need to hire to keep up with that pace?
- if you are a Data Scientist: do you believe you could provide more value if you could actually get that many models into production?
Of course, a company can hire more engineers to build the right platform. But I would argue that beyond a certain scale you simply cannot survive without a dedicated platform. There are just too many variables, and you will fall into a common pitfall…
After all, your company might not have the resources of the Netflixes of this world to build its own frameworks (BTW, parts of Netflix's are open-sourced). Of course, a managed MLOps platform might not suit your needs exactly, but it leverages years of experience running models for a wide range of customers.
A middle ground would be to build your own platform on top of Google's Vertex AI, Amazon SageMaker or Azure Machine Learning. These tools integrate seamlessly with other services from the same cloud provider. My friends at GetInData recently published an e-book showing an example of such a platform based on Vertex AI.
Conclusion
MLOps platforms and principles can provide immense value to companies and vastly simplify running more models in production. Reproducibility, governance, monitoring and continuous pipelines are all very valuable features, but architects should ask themselves: how many of these features do we really need right now? And wouldn't it be easier to start off with a simpler platform and migrate away from it once we become really successful with ML in production?