
Note: this article is part 9 of a series called Accelerated Velocity. This part can be read stand-alone, but I recommend reading the earlier parts first for the overall context.
Anyone who’s run a team knows there’s always more to do and pressure to do even more than that. Most managers respond by asking for more workers and/or making the existing workers put in more hours. I’ve been there myself, and I learned the hard way that there is a better, faster and cheaper way: focus on efficiency first. Let’s look at why.
For clarity and simplicity, I’ll frame this in a simple mathematical representation:
[ total output ] = [ # workers ] x [ hours worked ] x [ output / worker-hour (i.e. efficiency) ]
In other words, output is the product of the number of people working, the amount of time they work, and their efficiency. So if we want to increase total output, we can:
- Increase the number of workers.
- Increase hours worked.
- Increase efficiency.
Pretty straightforward so far, but when we start pulling at the threads we quickly discover that the three options are not equal in their potential impact. The problem with #1 is that it is expensive, has a long lead time and, unless the organization is very mature, generally results in a sub-linear increase in productivity (since it also adds management and communication load). #2 may be OK for a short time, but it’s not sustainable and will quickly lead to a significant loss in efficiency as people burn out, start making expensive mistakes and eventually quit (which exacerbates #1).
That leaves us with #3. Efficiency improvements are linear, have an immediate impact and are relatively cheap when compared to the other options. This is where the focus should be.
Really? It can’t be that simple…
It is. Let’s look at a real-world example. You have a 20-person software development team. To double the output you could hire an additional 22-25 FTE (Full Time Equivalents) and start seeing increased velocity in maybe 3-6 months. (Why not just 20? Because you also need to hire more managers, supporting staff, etc., and you have to account for the additional burden of communication. That’s why the increase is non-linear.)
You could ask them to work twice as many hours, but very quickly you’ll find yourself processing the flood of resignation letters. Let’s cross this off the list.
Or you could ask each person on the team to spend 10% of their time focusing on tools, techniques, frameworks and other boosts to efficiency. If done right, you’ll start seeing results right away, and in short order you can cut the development cycle time by as much as half. In effect you’ve doubled efficiency (and therefore output) for the equivalent of 2 FTE (20 people x 10%). In economic terms, that’s 10:1 leverage.
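To make the arithmetic concrete, here’s a toy sketch in Python. The model and the numbers are exactly the ones above; the function name and the 40-hour week are my own illustrative assumptions.

```python
def total_output(workers: float, hours: float, efficiency: float) -> float:
    """[ total output ] = [ # workers ] x [ hours worked ] x [ output / worker-hour ]"""
    return workers * hours * efficiency

# Baseline: 20 people, a (hypothetical) 40-hour week, efficiency normalized to 1.0.
baseline = total_output(20, 40, 1.0)

# Option 3: everyone spends 10% of their time on efficiency work (2 FTE of
# effort) and cycle time is cut in half, i.e. the efficiency factor doubles.
invested_fte = 20 * 0.10                                    # 2.0 FTE invested
improved = total_output(20, 40, 2.0)

extra_fte_equivalent = (improved - baseline) / (40 * 1.0)   # 20.0 FTE worth of extra output
print(extra_fte_equivalent / invested_fte)                  # 10.0 -> the 10:1 leverage
```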
Not bad. This is why I recommend always focusing on efficiency first.
(This isn’t to say that growing the team is the wrong thing to do – Google wouldn’t be doing what Google does with 1/10 the engineers – but this is an expensive and long-term strategy. Team growth is not a substitute for an intense focus on efficiency. Adding a bunch of people to an inefficient setup is a good way to spend a lot of money with low ROI.)
So what would the team focus on to boost efficiency? The list is long and includes both technical topics (reducing build times, Don’t Repeat Yourself (DRY), etc.) and non-technical ones (stop wasting time in stupid meetings). All are valid and should be addressed (especially stupid meetings and time management), but in this article I’m going to focus on leverage through automation and software-driven infrastructure; in other words: devops.
DevOps
Ask ten people and you’ll get ten different answers as to what devops is. I’m less interested in dogmatic purity and more in the foundational elements of devops that drive the benefits, which to me are:
- Tools: an intense focus on increased efficiency through DRY / automation.
- Culture: increased efficiency through eliminating arbitrary and stupid organization boundaries.
Bonial understood this. When I arrived in 2014, nearly all of the major build and deployment functions were fully scripted and supported by an integrated, aligned “devops” team (though it wasn’t called that at the time), which had even gone so far as to enable interactions with the scripts via a cool chat bot named Marvin. Julius, Alexander and others on the devops team were wizards with automation and were definitely on the cutting edge of this evolving field. For the most part we had a full CI capability in place.
Unfortunately, further gains were largely blocked by code and environmental constraints. No amount of devops can solve problems created by code monoliths and limited hardware. So, as described in other articles in this series, we invested heavily in breaking up the monolith. It was painful, but it opened up many pathways for team independence, continuous delivery and moving to cloud infrastructure.

After we’d broken apart and modularized the code monolith, we moved into AWS, which created further challenges. On one hand, we wanted everybody to make full use of the cloud’s speed and flexibility. On the other hand, it was important to ensure governance processes for e.g. cost control and security. We balanced those requirements with infrastructure as code (IaC): we standardized on a few automation platforms (Spinnaker, Terraform, etc.) but let teams customize their process to meet their needs. At this point, the central “devops” team became both a center of excellence and a training and mentoring group.

Our foundation in automation enabled us to very rapidly embrace IaaS and explore serverless and container approaches. It took some time to settle on which automation frameworks would best meet our needs, but once that was done we could spin up entire environments in minutes, starting with only some scripts and ending up with a fully running stack. Given the sheer speed of changes, adopting a “You Own It You Run It” (YOIYRI) approach and moving more responsibilities into the teams came naturally. All those changes took us to a whole new level.
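To give a flavor of what “starting with only some scripts and ending up with a fully running stack” can look like, here’s a minimal, hypothetical sketch of a Python wrapper around the Terraform CLI. This is not our actual tooling; the directory layout and the `env_name` variable are assumptions for illustration.

```python
"""Hypothetical sketch: one command in, one running environment out.
Not the actual Bonial tooling; directory layout and variables are invented."""
import subprocess

def run(args: list[str], cwd: str) -> None:
    # Fail loudly if any step breaks: IaC runs should be all-or-nothing.
    subprocess.run(args, cwd=cwd, check=True)

def spin_up(env_name: str, iac_dir: str = "./infrastructure") -> None:
    run(["terraform", "init"], cwd=iac_dir)
    # One Terraform workspace per environment keeps state isolated.
    try:
        run(["terraform", "workspace", "new", env_name], cwd=iac_dir)
    except subprocess.CalledProcessError:
        run(["terraform", "workspace", "select", env_name], cwd=iac_dir)
    run(["terraform", "apply", "-auto-approve", f"-var=env_name={env_name}"], cwd=iac_dir)

def tear_down(env_name: str, iac_dir: str = "./infrastructure") -> None:
    run(["terraform", "workspace", "select", env_name], cwd=iac_dir)
    run(["terraform", "destroy", "-auto-approve", f"-var=env_name={env_name}"], cwd=iac_dir)
```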
SDLC
The natural next step was to address one of the most painful and persistent bottlenecks in our previous software development lifecycle (SDLC): development and test environments (“stages”). Previously, all of the development teams had had to share access to a handful of “stage” environments, only one of which was configured to look somewhat like production. This created constant conflict between teams jockeying for stage access, and no-win choices for me on which projects to give priority access.
Automation on AWS changed the game for the SDLC. Now instead of a single chokepoint throttling every team, each team could spin up its own environment and we could have dozens of initiatives being developed and tested in parallel. (We also had to learn to manage costs more effectively, especially when people forgot to shut off their stage, but that’s another story…)
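On the cost point: a common guardrail for forgotten stages is a scheduled job that stops them overnight. Here’s a plausible (but hypothetical, and not necessarily what we ran) boto3 sketch, assuming stage instances carry an `env=stage` tag:

```python
"""Hypothetical nightly cost guardrail: stop running EC2 instances tagged
as stage environments. The env=stage tag and the region are assumptions."""
import boto3

def stop_forgotten_stages(region: str = "eu-central-1") -> list[str]:
    ec2 = boto3.client("ec2", region_name=region)
    # Pagination omitted for brevity; fine for a small fleet of stages.
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:env", "Values": ["stage"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    instance_ids = [
        instance["InstanceId"]
        for reservation in reservations
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids
```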
We still had the challenge of dependencies between teams – e.g., the web team might need a specific version of an API owned by another team in order to test a change. One of our engineers, Laurent, solved this by creating a central repository and web portal (“Stager”) that any developer could use to spin up a stage with any needed subsystems running inside.
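I won’t reproduce Stager itself here, but the core idea is easy to sketch: a stage request names the subsystems (and versions) it needs, and the tooling wires them up. Something like this, where the field names and values are invented rather than Stager’s actual API:

```python
"""Purely illustrative: the kind of request a Stager-style portal might
accept. Field names and values are invented, not Stager's actual API."""
from dataclasses import dataclass, field

@dataclass
class StageRequest:
    owner: str                                                 # team that gets the stage
    subsystems: dict[str, str] = field(default_factory=dict)   # subsystem -> version

# e.g. the web team testing a change against a pinned version of another
# team's API:
request = StageRequest(
    owner="web-team",
    subsystems={
        "web-frontend": "feature/new-checkout",  # branch under test
        "offers-api": "2.4.1",                   # dependency pinned to a release
    },
)
```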
As it stands today, the “Stager” subsystem has become a critical piece of our enterprise; when it’s down or not working properly, people make noise pretty quickly. As befits its criticality, we’ve assigned dedicated people to focus purely on ensuring that it’s up and running and continually evolving to meet our needs. Per the math above, the leverage is unquestionable and investing in this area is a no-brainer.
Closing Thoughts
- It’s simple: higher efficiency = more output
- Automation leads to efficiency
- Breaking down barriers between development and ops leads to efficiency
- Invest in devops
Many thanks to Julius for both his contributions to this article and, more importantly, the underlying reality.