At Twosense, we build enterprise products that implement continuous biometric authentication. Our core technology relies on Machine Learning (ML) to learn user-specific behaviors and automate the user workload of authentication. Although this idea has been around for a while, the challenge is executing on it. An Agile approach was absolutely necessary, both to execute at speed and to get our product into the hands of customers early. Our goal was to combine Agile Software Engineering (SWE) with Machine Learning Research (MLR) to create what we call Agile Machine Learning Engineering (MLE). To do this, we had to change our existing Software Engineering process on a tactical level (Process), on a strategic level (Paradigm), and in how we see ourselves in the process (People).
Here we’re going to share what we’ve learned on the longer-term strategies we needed for implementing Agile MLE. The term “Paradigm” refers to the way we see these changes as being implemented and having an impact across sprints, rather than the tactical aspects that we put under “Process.”
As ML and data science (DS) professionals, we are often tempted to jump straight into research and prototyping for any new product or feature. The problem becomes apparent when it’s time to port that initial chunk of research to production. In our experience, this creates a delta between research and production that’s too large, and the effort of bridging that gap balloons out of proportion because of the amount of refactoring and restructuring required. At TWOSENSE.AI we call this gap “death by a thousand production cuts.” We borrowed our solution to this problem from Google and others: start with the pipeline first, even if it’s just garbage-in-garbage-out or implements a simple heuristic. Get the infrastructure right first, and then iterate the research piecemeal off of that. Even if the pipeline isn’t perfectly architected for whatever comes next, the iterative delta of improving it while continuing research vastly reduces the chasm between research and production, which results in improved speed and quality. It also allows different people and teams to work on different aspects of the pipeline independently.
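As a rough illustration of “pipeline first,” here is a minimal sketch in Python. Everything in it is hypothetical and made up for this example (the `Pipeline` class, `extract_features`, `heuristic_scorer`, the 0.5 threshold); it is not our production code. The point is only that the end-to-end path exists from day one with a trivial heuristic in the model slot, so research can later swap in a better scorer without touching the rest.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical types for illustration only.
Features = List[float]
Scorer = Callable[[Features], float]


def extract_features(raw_events: Sequence[float]) -> Features:
    """Placeholder feature step: passes the raw values through unchanged."""
    return list(raw_events)


def heuristic_scorer(features: Features) -> float:
    """Trivial stand-in for a model: the mean of the inputs, clamped to [0, 1]."""
    if not features:
        return 0.0
    mean = sum(features) / len(features)
    return max(0.0, min(1.0, mean))


@dataclass
class Pipeline:
    """End-to-end path from raw input to a decision, with a swappable scorer."""
    scorer: Scorer
    threshold: float = 0.5  # assumed value, chosen only for the example

    def run(self, raw_events: Sequence[float]) -> bool:
        features = extract_features(raw_events)
        score = self.scorer(features)
        return score >= self.threshold


# Day one: the pipeline runs end to end with the heuristic in place.
pipeline = Pipeline(scorer=heuristic_scorer)
decision = pipeline.run([0.9, 0.8, 0.7])
```

Replacing the heuristic later is a one-line change, for example `Pipeline(scorer=my_trained_model_fn)`, where `my_trained_model_fn` is any function with the same signature. Each stage can also be improved by a different person without blocking the others.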
Similarly, even beyond the cold-start problem, there is a tendency to dive deep on the research side. As a researcher this feels right, asking all the relevant questions and finding answers and new questions, then getting those right until we have something that we’re confident in before peer review. The problem is that this puts us back into the realm of having a large delta between research and production, and bridging that execution gap repeatedly is again death by a thousand production cuts. Our solution is to keep research and production in lockstep. Our initial approach to this is a combination of keeping research tickets iterative (see Process), and making sure everyone on the ML team has a foot in both research and engineering (see People).
Strategically, it can be difficult to ensure that the ML efforts are moving the company and product in the right direction. With so many aspects in play when it comes to evaluating ML, it can be hard to avoid lateral moves across sprints, where we improve on one aspect but performance degrades on another, or where we keep working on one aspect well past the point of diminishing returns. To address this, we aim for a Challenger/Incumbent approach. This involves a few different components. The first is establishing a set of metrics that are important to the customer, the product and the overall progress of the company, and implementing those in some form of executable. The current status quo can be estimated by running those metrics on whatever is currently in production. This represents the incumbent. Now every feature, the challenger, can be evaluated in terms of its delta against the relevant metrics. This is particularly relevant for evaluating new models and methods, where you may see metric changes across the board, but it also applies to iterative research and production tickets. While we’re not yet at the point where this eval step is fully automated, our goal is to work towards full automation as a form of integration testing, with performance metrics and simulation as the KPIs.
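To make the Challenger/Incumbent idea concrete, here is a small Python sketch of what such an executable comparison could look like. The metric names (`accuracy`, `latency_ms`), the numbers, and the promotion rule (no metric regresses beyond a tolerance and at least one improves) are assumptions invented for this example, not our actual metrics or criteria.

```python
from typing import Dict


def metric_deltas(
    incumbent: Dict[str, float],
    challenger: Dict[str, float],
    higher_is_better: Dict[str, bool],
) -> Dict[str, float]:
    """Signed improvement per metric: positive means the challenger is better."""
    deltas = {}
    for name, incumbent_value in incumbent.items():
        diff = challenger[name] - incumbent_value
        # Flip the sign for metrics where a lower value is the better one.
        deltas[name] = diff if higher_is_better[name] else -diff
    return deltas


def challenger_wins(deltas: Dict[str, float], tolerance: float = 0.0) -> bool:
    """Assumed rule: nothing regresses beyond tolerance, and something improves."""
    no_regression = all(d >= -tolerance for d in deltas.values())
    some_gain = any(d > tolerance for d in deltas.values())
    return no_regression and some_gain


# Hypothetical numbers, for illustration only.
directions = {"accuracy": True, "latency_ms": False}
incumbent = {"accuracy": 0.90, "latency_ms": 120.0}

clear_win = {"accuracy": 0.93, "latency_ms": 110.0}
lateral_move = {"accuracy": 0.95, "latency_ms": 150.0}

win_result = challenger_wins(metric_deltas(incumbent, clear_win, directions))
lateral_result = challenger_wins(metric_deltas(incumbent, lateral_move, directions))
```

In this sketch the first challenger improves on both metrics and would be promoted, while the second trades better accuracy for worse latency and is flagged as a lateral move. A check like this could eventually run as part of integration testing, with the incumbent’s numbers refreshed from whatever is in production.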
Another strategic pitfall for ML and DS teams is to succumb to the self-indulgence of building things from scratch. Often we have the feeling that we can do better than what’s already been done, and with deep-tech startups it’s often true too! But there is a tremendous cost associated with not using established tools, repositories and libraries. First, we’re not learning from the mistakes others made along the way in that work, and will most probably repeat them. Second, even if our widget is better, community widgets benefit from the efforts of others and are always improving, which means our home-made widgets, though initially ahead of the curve, will eventually lag behind and become technical debt. As academics would say, one should always be “standing on the shoulders of giants.” The solution seems straightforward: never reinvent the wheel, and use whatever we can out of the box. But putting this solution into practice can be tricky. Part of this work is on an individual level, where everyone cultivates a spirit of inherent laziness, looking for ways to use tools and open-source repositories to get things done before we go off building new solutions. On a team level, it’s a bit trickier. If you want the whole team looking for open-source and public solutions, you need to provide the time to do so and incorporate it into planning. On the product side, this involves looking for tools and libraries that already do what we need done. On the research side, it’s about looking for research papers and repositories that answer similar questions, and then starting by using or reimplementing that work as a baseline, or as a first incumbent. What we’ve learned is that if we want this to happen, we need to set time aside for it, and work at it at both the individual and team levels.
Finally, it’s important to keep in mind that the process should never hinder progress. Every rule has an exception, every guideline should be crossed at some point. This applies especially to the trial-and-error and “not knowing what we don’t know” of research. As what we know changes, the process needs to change with it. This could be as small as dropping or adding a ticket mid-sprint, or changing some of these core strategies as we go. Our solution to this problem is to assume that 30% of everything is wrong, and there is no doctrine or dogma that is untouchable. If something feels like it’s getting in the way, or certain guidelines are repeatedly being skipped for an advantage, let’s talk about it at retro and drop what’s not working while keeping what is. Kaizen is doctrine, but everything else is subject to change.
We’re always hiring. If this sounds like an environment you’d like to work in, please reach out!