A Startup’s Guide to Agile Machine Learning Engineering

At TWOSENSE.AI, we build products that implement continuous biometric authentication. Our core technology relies on Machine Learning (ML) to learn user-specific behaviors and automate the user workload of authentication. Although this idea has been around for a while, the challenge is executing on it. An Agile approach was absolutely necessary, both to execute at speed and to get our product into the hands of customers early.

We are able to execute on this because we are an extremely experienced ML and product engineering team with a tremendous depth of knowledge around the problem we’re solving. However, as with anything in ML development, there is always a component of “you don’t know what you don’t know” involved, and you have to answer those questions to execute. The key to success is not knowing all the answers ahead of time, but being able to find answers and get them into the product quickly and repeatably. In other words, it’s about process. Agile Machine Learning Engineering (MLE) is all about answering those questions at speed by breaking things down into plannable, scalable, shippable chunks. The essence of the problem lies in reconciling the long-term, explorative nature of ML Research (MLR) with the short-term, iterative nature of Agile Software Engineering (SWE). Agile MLE is the combination of MLR and SWE, which we see as an admirable and attainable goal.

It has been a difficult process and we made many mistakes, but we’ve learned from them and have had success so far. We couldn’t find an actionable guide for this journey, although we did spend time looking and shared what we found. Now we’d like to share what we’ve learned so far, along with our approach to making MLE workable in production, hopefully in a way that’s actionable.

The first thing we’ve learned is that implementing Agile MLE requires organizational changes that go beyond how we divide work into tasks and execute on them. Our first attempt was to simply apply our SWE process to MLR, which was a bit of a disaster. The primary issue was that we slowed down (by a lot), which told us that the process needed changes to become workable and brought us to where we are now. We also found that strategic components were missing: SWEs have ingrained habits for executing on projects larger than a sprint, and those habits needed to be adapted for MLE. The slowdown was also very frustrating for the engineers involved, leading us to reevaluate how we looked at ourselves in the context of MLE.

We break down how we think about MLE into three aspects: Paradigm, Process, and People. Paradigm refers to the larger approaches and concepts that are outside the scope of a sprint, often strategic in nature and on longer timelines. Here we focus on aspects such as starting any new initiative by building the plumbing and infrastructure first, iterating on that infrastructure to try to beat the performance of the existing pipeline against carefully defined metrics, and never reinventing the wheel. Process contains our guidelines for integrating the complexities of MLR into our existing two-week SWE sprint in a tactical way. This includes how to scope, ticket, implement, and review research, and how to incorporate its uncertainty into the process while getting the results into production. People is about how humans fit into the context of the first two. The goal here is to find a way to be pragmatic and play to the strengths of the great people we have, without allowing research and engineering to get out of sync, all while keeping things scalable as we grow.

We go in depth on each of these aspects, in what we hope is an actionable way, on our blog:

PARADIGM: Machine Learning Engineering Strategy at TWOSENSE.AI

PROCESS: Machine Learning Engineering Tactics at TWOSENSE.AI

PEOPLE: Machine Learning Engineers at TWOSENSE.AI

Finally, it’s important to point out that we are aware that (optimistically) only 30% of what we’re doing now is wrong or at least suboptimal. We are all invested as a team in continuously evaluating and adapting this process as we go. We’re always working hard to make our space an idea meritocracy, and to create an environment of continuous improvement on an individual and team level.

We’re hiring Machine Learning Engineers. If this sounds like an environment you’d like to work in, please reach out!

John Tanios