It’s all very well talking about MLOps in principle: why it’s useful, and what makes it valuable. It is another thing entirely to show how to implement it, and how to make it work in a real team doing real work.
Much has been written about why a course of action is sensible. But if you’re taking your first steps into the world of MLOps, a concrete example of some of the patterns and behaviors can make it much easier to consider how to implement such a change within your own work or organization.
This article series will reveal some of the processes and technologies we employ at Relative Insight, which allow us to deliver continuous, incremental improvement to our platform and provide a healthy and sustainable work environment for those who are developing it.
Applying agile principles to MLOps
If this sounds familiar, it’s because in my experience, a lot of the challenges around getting machine learning into production are similar in their symptoms and causes to those with other software.
Reams and reams of (virtual and physical) paper have been written about the agile manifesto. One way for us to think about “doing MLOps right” is therefore to find ways to work with agility, applying the principles of an agile methodology to the context of managing machine learning systems and artificial intelligence.
But before we get to the implementation detail, it’s important to understand the elements that motivate us to think so deeply about, and commit so diligently to, agile approaches at Relative Insight. These three directives all overlap and intersect with the tenets of agile and, hopefully, will aid you in understanding what we do, how we do it, and why we care.
Motivations behind Machine Learning Operations
In terms of prime principles, there are three main things that motivate us at Relative regarding MLOps. These are determined both by how we can create a safe and creative space for our technical team to thrive, and also by how we can best deliver novel functionality at the best price for our customers. They are:
- Communication
- Continuous improvement
- Conscientious pricing
These practices are key because they operate at multiple levels – they impact developer well-being, product health, customer experience and more.
So, now that we know what matters to us at Relative Insight, let’s explore it in greater detail. This article will discuss communication, and we’ll go in-depth on continuous improvement and conscientious pricing in subsequent weeks.
The power of communication in MLOps
Feedback is really about communication. Now, we could write a mountain on the merits, methods and manner of communication. But the key thing is to recognize that communication, whether automated and technical (in the form of process logging), verbal (in the form of meetings and discussions) or written (in the form of comments and documentation), is difficult to get right and requires trust between participants.
It’s not just about technical implementation details, but also cultural practices. It’s about ensuring that you’ve built an environment where questions will be asked, and listened to; but also about making sure there’s enough information to hand that the right questions are being asked. It’s also about making a space where you can learn from the decisions made and “get uncomfortable.” That means proactively avoiding letting our comfort zones stop us from trying new things or pushing technological boundaries.
With that understood, let’s explore the different levels of communication and aim to understand the distinction between synchronous and asynchronous communication – as well as how to maximize both.
Asynchronous communication is a label for any scenario where information is collected at a different point to the time at which it might be used. For day-to-day development, PR reviews are probably the most common form of async communication. Everyone is able to review a software change in their own time, and contribute concerns or approval where appropriate. But there are many other activities that fall under this bracket. Looking at a realistic scenario shows some common cases of async communication.
For example, if you have a machine learning model (say, a classifier) that detects the sentiment of an entire sentence, how do you know if it’s doing its job correctly? First, you’ll want to collect the predictions it makes and understand how their distribution differs from that of the training set.
You’ll also want to understand the confidence of this classifier (after all, it’s important to know if it suddenly becomes less confident in its predictions, as that can indicate that something in the data has changed), and you might want to know if the classifications the model predicts are correct. This means collecting the data it saw at inference time and the class label it predicted.
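As a rough illustration of the collection described above, here is a minimal Python sketch. It records each prediction with its input and confidence, then compares the live label distribution against the training one. Everything in it is an assumption made for illustration: the names `PredictionRecord`, `log_prediction` and `mean_confidence` are hypothetical, the `sink` is just a list standing in for a real log stream or table, and the population stability index is one common drift measure, not necessarily the one used at Relative Insight.

```python
import json
import math
from collections import Counter
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class PredictionRecord:
    """One inference event: what the model saw and what it said."""
    text: str          # the input seen at inference time
    label: str         # the predicted class
    confidence: float  # the model's probability for that class
    timestamp: str     # when the prediction was made (UTC, ISO 8601)


def log_prediction(text, label, confidence, sink):
    """Append one JSON record to `sink` (a list here, purely for illustration)."""
    record = PredictionRecord(
        text, label, confidence, datetime.now(timezone.utc).isoformat()
    )
    sink.append(json.dumps(asdict(record)))
    return record


def label_distribution(labels):
    """Turn an iterable of labels into a {label: proportion} mapping."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def population_stability_index(expected, observed, eps=1e-6):
    """PSI between two label distributions; 0 means they are identical."""
    psi = 0.0
    for label in set(expected) | set(observed):
        # Clamp to eps so a label missing from one side does not divide by zero.
        e = max(expected.get(label, 0.0), eps)
        o = max(observed.get(label, 0.0), eps)
        psi += (o - e) * math.log(o / e)
    return psi


def mean_confidence(records):
    """Average confidence over a batch of records; a sudden drop is worth a look."""
    return sum(r.confidence for r in records) / len(records)
```

In use, you would build the training distribution once with `label_distribution`, log every live prediction, and periodically compute the index and the mean confidence over a recent window. What counts as a worrying value depends on your model and data, so any threshold is a judgment call.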
Leveraging asynchronous methods
All these aspects can be covered under the bracket of asynchronous communication. In the case of the automated metrics above, this provides a valuable signal to data scientists and ML engineers to understand where the model is going wrong, and indeed where it is right.
This obsession with metrics allows us to rely on science and numbers, rather than hunches and guesswork when the time comes to ask questions like “Do we need to update our model?” and “Do we handle data from certain domains (for example, medicine) as accurately as we do from others?”
Crucially, this means that there’s evidence with which to measure the impact of the work we do. If we change the size of a model, distilling it into a smaller footprint, it is this automated data collection that can inform us about the reduction in inference time, the cost savings from lower hosting requirements, or the speed at which scaling can be done.
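To make the inference-time comparison concrete, here is a small Python sketch that summarizes two sets of recorded latencies and reports the relative reduction. It assumes the timings have already been collected by your monitoring; the function names and the choice of median and 95th percentile are illustrative, not a description of our actual tooling.

```python
import statistics


def latency_summary(latencies_ms):
    """Median and 95th-percentile latency for a list of recorded timings."""
    # With n=20, statistics.quantiles returns 19 cut points; the last is p95.
    p95 = statistics.quantiles(latencies_ms, n=20)[-1]
    return {"median_ms": statistics.median(latencies_ms), "p95_ms": p95}


def relative_reduction(before, after):
    """Fractional reduction from `before` to `after` (0.25 means 25% lower)."""
    return (before - after) / before


def compare_models(baseline_ms, candidate_ms):
    """Relative latency reduction of a candidate model against a baseline."""
    base = latency_summary(baseline_ms)
    cand = latency_summary(candidate_ms)
    return {key: relative_reduction(base[key], cand[key]) for key in base}
```

Calling `compare_models` with the timings of the original model and of the distilled one gives a per-metric fraction, so a positive value means the distilled model is faster on that metric. Reporting a tail percentile alongside the median helps, because a smaller model can improve typical latency without improving the slowest requests.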
Examples of automated communication
- Model monitoring
- Service monitoring
- App instrumentation
Asynchronous communication also includes acts like documentation. Model documentation helps to communicate to other stakeholders what a new ML system is capable of, and where its limitations lie. This helps ensure that the product fits well and identifies areas for innovation and development.
Outside of the world of model development, persona creation is another form of non-automated async communication. By allowing product-minded engineers to understand the end-use case, the system can cater to customer needs beyond API output and frontends.
If a model was designed for a particular persona or a product use case – and suddenly we detect (using the automated metrics) that there’s a misalignment – that is the start of a feedback cycle. Crucially, it is a cycle being driven by working software.
The second half of the cycle then kicks off, with a new iteration trying to ameliorate the problem. This process might necessitate a code change and consequently, a PR review. Once the change is made, you can then use the combination of metrics to see if the ML product created is better for the customer’s needs and your feedback cycle starts anew.
Of course, it is always possible that the source of error is in the identification of your customer persona. Maybe there’s been some detected model drift because users are starting to behave in a novel way.
As customers’ needs evolve, it is crucial for businesses to cater to those demands effectively. In this case, the feedback cycle might be directed from your Data Science team towards your Product team. This could result in further market research to re-evaluate existing personas and is a form of customer collaboration from the agile manifesto.
To complete the second part of your feedback loop in a timely way, you need synchronous, or sometimes ‘real-time’, communication.
Synchronous communication is incredibly powerful, but incredibly expensive when compared to async. The most common form of formal synchronous communication is a meeting. Meetings allow for everyone to get on the same page, and for key information to be distributed and queried in one spot. But when used ineffectively they can be disastrous.
Nearly everyone, at some point, has sat through a meeting and thought “This could have been an email”. Think of the hourly cost of 20 professionals in a room together in comparison to that email. But there are other kinds of synchronous communication.
Sometimes, when time is of the essence (imagine a really damaging software bug or an example of extreme bias in a model) it can make sense to do a real-time code review. Even development can be done synchronously, in the form of pair-programming.
An automated approach
A more automated form of synchronous communication is an alerting system. If an issue of suitable severity is detected in a system, an alert is sent to an internal messaging system like Slack, or an on-call engineer is pinged via text or phone to look into the issue. This allows for a rapid response to the most glaring issues and keeps the time to resolution to a minimum.
However, the downsides, again, are drastic. An out-of-hours alerting system can be detrimental to developer well-being. Being alerted via an on-call system is a high-pressure scenario, and it makes it incredibly difficult to switch off from work. This in turn contributes to burnout if not carefully managed.
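The routing idea in that paragraph can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: `Severity`, `Alert` and `route_alert` are hypothetical names, the two thresholds are arbitrary defaults, and `notify_chat` and `page_on_call` are callables you would supply yourself (for example, thin wrappers around your messaging tool and your paging service).

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class Alert:
    source: str        # which system raised the alert
    message: str       # human-readable description
    severity: Severity


def route_alert(alert, notify_chat, page_on_call,
                chat_threshold=Severity.WARNING,
                page_threshold=Severity.CRITICAL):
    """Send the alert to every channel whose threshold it meets.

    Returns the names of the channels that were used, which is handy
    for logging and for tests.
    """
    channels = []
    if alert.severity >= chat_threshold:
        notify_chat(f"[{alert.severity.name}] {alert.source}: {alert.message}")
        channels.append("chat")
    if alert.severity >= page_threshold:
        page_on_call(alert)
        channels.append("page")
    return channels
```

Passing the senders in as arguments keeps the routing rule separate from any particular vendor, and makes it easy to check in a test that only the most severe alerts reach a person’s phone.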
In addition, if your alerts are incorrectly configured, you can be left with a “boy who cried wolf” scenario. The first few times, you’ll drag an on-call dev out of bed at 3 a.m. to look into fixing an issue that doesn’t exist. If this persists, people will stop reacting at all, even if a real problem does come along. This has a calamitous knock-on effect on happiness, well-being, and of course, platform quality.
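One common way to reduce false alarms is to require a problem to persist before anyone is paged. The Python sketch below shows that idea only; it is not a description of our own alert configuration. The class name, the threshold and the default of three consecutive readings are all assumptions chosen for illustration.

```python
class ConsecutiveBreachAlert:
    """Fire only after `required` consecutive readings exceed `threshold`."""

    def __init__(self, threshold, required=3):
        self.threshold = threshold
        self.required = required
        self._streak = 0       # consecutive breaching readings so far
        self._firing = False   # whether we have already fired for this streak

    def observe(self, value):
        """Record one reading; return True only when the alert first fires."""
        if value > self.threshold:
            self._streak += 1
        else:
            # A healthy reading resets the streak and re-arms the alert.
            self._streak = 0
            self._firing = False
        if self._streak >= self.required and not self._firing:
            self._firing = True
            return True
        return False
```

A single noisy reading never wakes anyone up, and a sustained problem pages once rather than on every check. The trade-off is a slower response to real incidents, so the number of required readings should reflect how costly a delay is for the metric in question.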
Harnessing MLOps for improved communication
So, in all of these cases, it is clear that ensuring your processes are nailed down is the best way to facilitate great communication, and thus rapid feedback cycles.
If you’re able to make most of your communication async-by-default, you benefit from having stores of information that can be queried whenever they’re needed. You put data and metrics at the core of your operations, and allow decisions to be made autonomously with working production systems. This democratizes and empowers your team members.
It also means that when turning to the faster and more powerful synchronous methods of communication, you’ll get higher engagement and traction from those acts. The simple fact of their comparative rarity will garner more focus and attention – in turn providing the capacity for them to be more productive and to be able to respond to change whenever the situation calls for it.
Getting the best out of collective thought requires an environment of trust: trust in the team to work as a group, and trust that psychological safety will be valued and maintained.
This means that when mistakes happen, you can follow a path of zero-fault resolution. Trust that the product needs are well understood and that the customers are being listened to. Trust that collaborators will put aside ego, and facilitate and collect input from everyone. Trust that if help is needed, the act of asking will be seen as a sign of strength, not weakness. Trust that the metrics that are collected are the right ones and that the data is collected accurately.
And of course, trust that when an alert goes off in the middle of the night, and you’re pinging your manager at 3 a.m. to jump on a call to fix a major bug, it is indeed worth getting out of bed in the first place!