A Software Engineering process for a Machine Learning startup

At TWOSENSE.AI, we create continuous biometric authentication in the workplace, based on behavior. Human behavior, as measured by the systems we interact with, is time-series data, which is not easily interpreted by human senses. However, as we have proven, it is interpretable by machine intelligence. Machine learning is therefore an integral component of our product, and machine learning development and product development need to operate in lockstep. As a startup we need to aim small, ship small, miss small, and iterate, so we have no choice but to apply Agile engineering methodologies to both software engineering (SWE) and machine learning or data science (read more on that in the first post of this series). We define machine learning engineering (MLE) as data science with the applied rigor of SWE.

We’ve established our need to apply SWE rigor to ML. It’s tempting to wax philosophical and try to come up with a framework that’s universally implementable and covers all possible verticals.  And people have. But who’s got that kind of time? Surely not a startup that needs to be agile or die. We have taken a practical approach of limiting our design space for our MLE process to our existing SWE setup and going from there, so to start with we’d like to share our SWE process and the decisions behind it. 

For us, SWE comes down to two main components: process and tools. I’ll explain our process and tooling here, because that sets the tone for the rest of the conversation. Process is primary, and the tooling and technology are enablers for that process.

Our process was adopted, for the most part, from the process Scott Werner designed as CTO of SaySpring before they were acquired by Adobe.  Scott is brilliant, and if you ever want to talk about SWE process, just corner him at any of the nerdy meetups he frequents and mention that you’re thinking about it. He will love it. 

We looked for a process that would allow us to move faster against a 6-month event horizon. Early on we didn’t have much process at all. We were accomplishing amazing things on a month-to-month basis (a 1-month horizon), but we were heaping on technical debt that slowed us down in the long term, and to some extent we’re still paying it back now. Those short-term accomplishments were critical and we wouldn’t be here without them, so it was the right call at the time, but we needed long-term legs if we were going to hit our ambitiously disruptive long-term goals. Right now, anything more than 6 months out and we’re just guessing at what our universe will look like anyway, so 6 months feels like the sweet spot. The further along we get as a company, the longer our horizon needs to be, which feels right.

One of the most important components for achieving that 6-month-horizon goal is incorporating rigorous testing into the development process. We practice test-driven development where, as Scott says, “you’re only either writing a red [failing] test, or writing code to turn a red test green [passing].” Scott and his team take a pair-programming approach, which doubles up on manpower but obviates the need for code review, which is effort-intensive. We opted for code review instead, simply because the heterogeneity of our tech platform means we can’t afford to double up on every developer role, at least for the time being.
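To make the red/green loop concrete, here is a minimal sketch in Python using the standard-library unittest module. The function mean_interval and its behavior are hypothetical, chosen only for illustration; they are not taken from our codebase. In practice the two tests would be written first and would fail (red) until the function body exists, at which point they pass (green).

```python
import unittest


def mean_interval(timestamps):
    """Return the mean gap between consecutive timestamps, in seconds.

    Hypothetical example function: this body is the "green" step,
    written only after the tests below existed and were failing.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    # Pair each timestamp with its successor and measure the gap.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps)


class MeanIntervalTest(unittest.TestCase):
    """The "red" step: these tests are written before mean_interval."""

    def test_mean_of_evenly_spaced_events(self):
        self.assertEqual(mean_interval([0.0, 1.0, 2.0, 3.0]), 1.0)

    def test_rejects_too_few_events(self):
        with self.assertRaises(ValueError):
            mean_interval([5.0])
```

Saved as a test module, this could be run with python -m unittest. Each new behavior would start with another failing test rather than with more implementation code.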

We run our dev cycle in sprints and try to ship every sprint. The planning overhead of operating on weekly sprints ate into productivity too much, 4-week sprints were not agile enough, and 3-week sprints didn’t fit into the monthly deliverable deadlines we have for a government customer. For us, 2 weeks seemed to fit just right. On the first day of the sprint, we have a planning meeting where the stories we want to accomplish are presented. Having all hands in a meeting is expensive, so we then split up and divide the stories across the responsible teams. Each story is broken down into tickets for an individual contributor, each with a description, a definition of done, and an upper bound of 3 estimated workdays. We find 3 days a good general guideline. Any more than that and the tickets become too big and unwieldy to develop and review, and should be broken down further.
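The ticket rule can be sketched as a small data structure. This is only an illustration in Python: the Ticket class, its field names, and the oversized helper are assumptions made for this example and do not reflect how Jira stores tickets or any tooling we actually run.

```python
from dataclasses import dataclass

# Upper bound on estimated workdays per ticket, as described above.
MAX_ESTIMATE_DAYS = 3


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""

    description: str
    definition_of_done: str
    assignee: str
    estimate_days: float

    def needs_breakdown(self) -> bool:
        # A ticket estimated above the bound should be split further.
        return self.estimate_days > MAX_ESTIMATE_DAYS


def oversized(tickets):
    """Return the tickets that exceed the estimate bound."""
    return [t for t in tickets if t.needs_breakdown()]
```

A check like this could run during sprint planning to flag tickets that should be split before anyone starts work on them.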

One of the issues we caught early was that little misunderstandings in these tickets could lead to a lot of wasted effort that wasn’t noticed until code review, which is way too late.  To avoid this, there’s an open whiteboarding session to kick off a new ticket for dev where the author talks everyone interested on the team through their proposed approach, how they want to design the tests, API definitions, etc.  This is where misunderstandings evaporate. Doing work that’s unnecessary or goes in the wrong direction is incredibly frustrating and demoralizing for an engineer. Using process to avoid that is not wasted overhead, even if it seems heavy. We have standup every morning to give everyone visibility into what’s getting worked on, and a retro at the end of every week to find ways to improve.

Our Engineering dashboard. We need a bigger screen.

To enable that process, we use a suite of great tools. We’re an Atlassian shop and love their software, so we use Jira for sprint tracking and ticketing, Confluence as a knowledge base, and Bitbucket for repo hosting (it’s not as good as GitHub, but it’s fully integrated with the Atlassian stack, which is better for the process, and process is primary). We also adhere pretty strictly to git flow as our code-tracking paradigm, run CI/CD on Jenkins, Docker, and Firebase, and use SonarQube for linting, security, and static analysis. We use FunRetro to manage retros, and a Trello dev dashboard (Jira doesn’t support our needs) on a monitor in the office so everyone can see at a glance what the team is working on and what’s on their own plate. There are many other components to our stack, but they’re not immediately relevant to the SWE process, so we’ll leave them out.

So this is the specific SWE environment and tooling that we need to integrate MLE into.  The rest of this series will be focused on sharing our efforts to create an MLE process that synchronizes with this specific SWE process. I think of this post as a reference that you can use to see if our MLE process could be helpful for you. 

John Tanios