Accelerated Velocity: Creating an Architectural Runway

Note: this article is part 8 of a series called Accelerated Velocity.  This part can be read stand-alone, but I recommend that you read the earlier parts so as to have the overall context.

Most startups are, by necessity and by design, minimalistic when it comes to feature development.  They build their delivery stack (website or API) and a few tools needed to manage delivery (control panel, CMS), then race to market and scramble to meet customer requests.  Long-term architecture thinking is often reduced to a few hasty sketches, and technical debt mitigation is a luxury buried deep in the “someday” queue.

At some point success catches up and the tech debt becomes really painful.  Engineers spend crazy amounts of time responding to production issues, time they could have used to develop new capabilities.  New features take longer and longer to implement.  The system collapses under new load.  At this point tweaks won’t save the day; an enterprise architecture strategy and runway are needed.

What is an architecture runway?  In short, it’s a foundational set of capabilities, aligned to the big-picture architecture strategy, that enables rapid development of new features.  (SAFe describes it well here.)  In plain English, it’s investing in foundational capabilities so that features come faster.

The anchor of the architecture runway is, of course, the architecture itself.   I’m not going to wade into the dogmatic debate about “what is software architecture”; rather, I’ll simply state that a good architecture creates and maintains order and adaptability within a complex system.  The architecture itself should be guided by a strategy and long-term view on how the enterprise architecture will evolve to meet the needs of the business in a changing market and tech-space.   

In developing an architecture strategy and runway, architects should start with the current state. At the very least, create a simple diagram that gives everyone on the team context on what pieces and parts are in the system and how they play together.  Once the “as is” architecture is identified and documented, the architects can roll up their sleeves and develop the “to be” picture, identify the gaps between the two states, and then develop a strategy for moving towards the “to be”.  The strategy can be divided into discrete epics/projects, and construction of the runway can begin.

Bonial’s Architecture Runway

Success had caught up to Bonial in 2014.  Given the alternative, I think everyone would agree that that’s the right problem to have, but it was a problem nonetheless.  The majority of the software was packaged into a single, huge executable called “Portal3,” which contained all of the business logic for the websites, mobile APIs, content publishing system and a couple dozen batch jobs.  There were a few ancillary systems for online marketing and some assorted scripts, but they were largely “rogue” projects that didn’t add to the overall enterprise coherence.  While this satisfied the immediate needs and had fueled impressive early growth and business success, it wasn’t ready for the next phase.

One of my first hires at Bonial was Al Villegas, an experienced technologist who I asked to focus on enterprise architecture.  He was a great fit, as he had the right mix of broad systems perspective and a roll-up-his-sleeves, lead-from-the-front mentality.  He and I collaborated on big-picture “as-is” and “to-be” diagrams that highlighted the full spectrum of enterprise domains and showed clearly where we needed to invest going forward.  Fortunately we versioned and saved the diagrams, so here are the originals:

Original 2014 “As Is” High Level Enterprise Architecture
Original “To Be” 2015 High Level Enterprise Architecture

These pictures served several purposes: (1) they gave us an anchor point for defining and prioritizing long-term platform initiatives, (2) they let us identify the domains that were misaligned, underserved or needed the most work, and (3) they gave every engineer additional context as they developed their solutions on a day-to-day basis.

Then the hard work started.  We would have loved to do everything at once, but given the realities of resource constraints and business imperatives we had to prioritize which runways to develop first.  As described in other articles of this series, we focused early on our monitoring frameworks and on breaking up the monolith.  In parallel we also started a multi-phase, long-term initiative to overhaul our tracking architecture and data pipelines.  Later we moved our software and data platforms to AWS in phases and adopted relevant AWS IaaS and SaaS capabilities, often modifying or greatly simplifying elements of the architecture in the process.  Across the span of this period, we continually refined and improved our APIs, moving from the dedicated/custom approach previously used to a REST-based, event-driven microservices model.  We also invested in an SDLC runway, building tools on top of the already mature DevOps capabilities to further accelerate the development process.

The end result is a massive acceleration effect.  For example, we recently implemented a first release of a complex new feature involving sophisticated machine-learning personalization algorithms, new APIs and major UI changes across iOS, Android and web.  The implementation phase was knocked out in a couple of sprints.  How?  In part because the cross-functional team had at its disposal a rich toolbox of capabilities that had been laid down as part of the architecture runway: REST APIs, a flexible new content publishing system, a massive data lake with realtime streaming, a powerful SDLC/staging system that made spinning up new production systems easy, etc.  The absence of any of these capabilities would have added immensely to the timeline.

The architecture continues to evolve.  We’ve recently added realtime machine learning and AI capabilities as well as integrations with a number of external partners, both of which have extended the architecture and brought new capabilities and new (and welcome) challenges.  We are continually updating the “as is” picture, adapting the architecture strategy to match the needs of the business, and investing in new runway.

And the cycle continues.

Closing Thoughts

  • Companies should start with a simple, single solution – that’s fine; it’s important to live to fight another day.  But eventually you’ll need a defined architecture and runway.
  • Start with a “big picture” to give everyone context and drill down from there.
  • Don’t forget the business systems: sales force automation, order management, CRM, billing, etc.  As much as everyone likes to focus on product delivery, it’s the enterprise systems that run the business.
  • Create a long-term architectural vision to help guide the big, long-term investments.

How we Plan at Bonial (part 3)

Collaborative digital stickies board that we use for planning.

Ok, after all that, how do we actually plan at Bonial?

The heart of our planning activities is the Quarterly Planning, which is loosely modeled on Program Increment (PI) Planning from SAFe.  During quarterly planning / PI planning, everyone in the product development organization – developers, designers, architects, testers, product managers, operations specialists, etc. – gets together for a couple of days to map out their next phase.  We do our planning during the previous quarter’s HIP (Hardening, Innovation and Planning) sprint, which is sprint 6 of each quarter.

Before I dive into the actual planning days, I should point out that the preparations start several weeks earlier, when the product teams actively work with stakeholders, customer-facing teams and the executive team to validate the backlogs against the current company priorities and business realities.  The prep phase looks something like this:

  • The senior management team and product strategy board review the overall strategy and primary business goals to assess if any change in focus is needed.  
  • Next we make sure that product and delivery management has the same level of clarity. We get the delivery leads and product owners together and communicate the company goals for the upcoming quarter to them, taking the time to answer questions about strategy, challenges, current market trends etc. Our goal here is to make sure that all our leaders are able to bring clarity to their teams so that local decisions are made with the right context.
  • 3-4 weeks before the planning event, the product management team starts curating the backlogs for the different product and system streams.  They create a “long list” of major features and work items and meet with stakeholders, customers and Bonial management to validate priorities. 
  • A week before planning the “long lists” are reduced to “short lists” of the highest priority items. This is probably the hardest part of the process and it requires saying “no” to things… we find that our stakeholders and customers all agree that discipline is needed, so long as it mostly impacts other stakeholders and customers.  Over the years we’ve tried various formal mechanisms for prioritization – Weighted Shortest Job First (WSJF), Feature Bucks, etc. (a minimal sketch of WSJF-style scoring follows this list) – but in the end we find that different tools are needed for different situations and that, with experience, people often intuitively know the order.
  • Over the next week the product team spends time working through open questions and details while architects and engineers do the same on the technical side.  There are also generally some intense discussions about “bubble” items – features that are right on the cusp of making the list – as well as hot items that didn’t make the list.
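
For readers who haven’t run into it, here’s a minimal sketch of how a Weighted Shortest Job First (WSJF) ranking works.  The feature names and scores below are invented purely for illustration – this is not our actual backlog or tooling – but the mechanic is the standard SAFe one: estimate a relative cost of delay, divide it by a relative job size, and work the highest scores first.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A backlog candidate scored with relative (1-10) estimates."""
    name: str
    business_value: int     # relative user/business value
    time_criticality: int   # how quickly the value decays
    risk_opportunity: int   # risk reduction / opportunity enablement
    job_size: int           # relative effort ("duration")

    @property
    def wsjf(self) -> float:
        # WSJF = cost of delay / job size
        cost_of_delay = self.business_value + self.time_criticality + self.risk_opportunity
        return cost_of_delay / self.job_size


# Entirely hypothetical backlog items.
backlog = [
    Feature("Personalized home feed", business_value=8, time_criticality=5, risk_opportunity=4, job_size=8),
    Feature("Checkout latency fix",   business_value=5, time_criticality=9, risk_opportunity=3, job_size=2),
    Feature("Partner data export",    business_value=4, time_criticality=3, risk_opportunity=6, job_size=5),
]

# Highest score first: small, urgent, valuable work floats to the top of the short list.
for feature in sorted(backlog, key=lambda f: f.wsjf, reverse=True):
    print(f"{feature.name:25s} WSJF = {feature.wsjf:.1f}")
```

In practice we treat the resulting numbers as conversation starters rather than final answers – as noted above, experienced people usually converge on much the same order anyway.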

I wish I could say that this process was easy.  The truth is that a great deal changes in three months – new opportunities and challenges, unexpected curveballs – so we’re constantly challenged to re-assess our priorities with each planning cycle.  On top of that there’s a lot we want to do, so we find ourselves often having hard discussions up until the planning day, especially around the “bubble items”.  It’s not clear to me that there’s a much easier way – we’re in a fast industry and a complex business – but we try to get better each quarter.

So the primary input to planning is a short, discrete, prioritized set of epic-sized initiatives for each team.  Most of these are functional, but there are usually some architectural or operational topics as well.  That brings us to the actual planning days (typically a Thursday/Friday):

  • On planning day 1, we start with a team breakfast at 0900 and then a kickoff presentation at 0930.  The kickoff presentation covers the big picture goals for the quarter and a quick review of each team’s focus and top items so everyone has context.  We also cover logistics – where they can find flip-charts and stickies, who’s in which rooms, etc.
  • Following the kickoff (and the kitchen cleanup), the teams go to their planning spaces and get started.  Basically, they start with the top priority item, plan it through to completion, and then repeat with the next item.  Once they get to the allocated capacity they stop planning.  The remaining items simply don’t get done.

Teams plan with flip charts for each sprint and colored stickies for tasks, milestones, etc.

  • “Full capacity” is an interesting and oft-debated question.  We have a loose agreement that teams should reserve ~20% of capacity for bugs and team discretion and another ~20% for refactoring and architecture work – so a team with, say, 100 points of capacity plans roughly 60 points of roadmap work.
  • As the teams are planning they’re also working with other teams on inbound and outbound dependencies.  We’ve organized the teams to minimize dependencies, but they’re still a fact of life.  The teams negotiate how to support each other based on overall priorities and goals (ref. the “context” from the breakfast).  Any unresolved conflicts are escalated or raised at the review meeting (below).
  • At 4PM on the first day the scrum masters and other delivery managers get together to share their current plans with the group.  We use a web-based collaboration tool that allows each team to put virtual stickies on their assigned row with different colors illustrating milestones, spikes, tasks, releases, etc.  Dependencies are made visible by connecting two stickies with a line.  

Teams gather to review the day 1 draft plan.

  • Putting everything together allows us to visualize the major streams, see what made the cut and what didn’t, and address any dependency challenges or conflicts.  Generally there are several to-dos coming out of the review, primarily around working through dependencies or going to business stakeholders for clarification.
  • The morning of day 2 is primarily for making adjustments from the previous day, collaborating with other teams where combined efforts are needed and tying up loose ends.  Most teams wrap this up pretty early and then get back to their HIP sprint; others need most or all of the day.
  • At 4PM on day two we grab a beer and get back together in front of the stickies board to review any changes from the previous day and discuss any unresolved conflicts.  This exercise typically goes much faster than the day 1 review.  At the end we check confidence and then head home for a much-needed break.

Here’s the final plan from last quarter.  

Q2 final plan

It looks complex, and it is complex.  Without developing our process, our teams and ourselves over the last couple of years, we’d be hard-pressed to manage this complexity effectively.

Following the planning we package up the plan and communicate a high-level, consumable version to the business and stakeholders.  We emphasize that these are our current targets and best estimates – this isn’t a contract.  We’ll do everything we can to stick to it, but we may be surprised or, in good agile fashion, we may decide to make changes as the situation evolves.

So that brings us nearly full circle.  I started this series during our last planning days and expected it to be a quick post.  As I pulled the thread, however, I realized how much work had gone into our evolution in this area.  I could also see that a high-level flyover would leave huge gaps in the journey, so I decided to fly lower.

You can see by now that undertaking a journey like this takes a fair amount of time, experience and honest self-evaluation, regardless of the specific methodology you choose.  That said, the investment is worth it, and a great deal of value can be realized even early in the process.

In Bonial’s case, we had a few advantages as we set off on the journey.  First, everyone was open to change, even when the change made them nervous.  The importance of this can’t be overstated.  I’ve lost count of the organizations I’ve worked with in which the teams had no motivation to improve (though, paradoxically, most of them complained constantly about the status quo).  In the end the team has to want to change, or at least be willing to give it a shot.  Which brings us to point two…

Second, we had good people and a healthy culture.  What we lacked in experience and skills, we more than compensated for with a team of smart, energetic professionals.  With good people, you can generally solve any problem.

Last, but not least, we have a skilled, SAFe-trained Release Train Engineer (RTE) to drive the process (though her role has evolved).  Even the finest orchestras in the world don’t play on their own – they have a conductor.  In our case the conductor/RTE ensures:

  • The stage is set: everybody knows the timing, their roles and the rules of the game, and all the needed supplies are in place and easily accessible to everybody.
  • A short (really short!) list of candidates for planning is finalized before we start.  The RTE ensures we’re observing Work in Progress (WIP) constraints, which are critical to maximizing throughput.  As she often says, “Let’s stop starting things and start finishing things instead.”
  • People know who to go to regarding priorities and impediments during planning.
  • The planning is properly wrapped up, all roadmaps and agreements are put together, and outcomes are properly communicated to all key stakeholders.
  • Solid retrospectives are done both on the quarter itself as well as the planning process so we can continue improving.

Whew!  That was a lot of writing for me and reading for you.  Kudos if you made it this far – I hope it was worth it.  So now you know how we do it – feel free to share your own stories about how you and your teams plan.  Best of luck in your own journey!

(Special thanks to Irina Zhovtobrukh (the mysterious RTE) for her contributions to this post as well as teaching us how to “conduct” better planning evolutions.)

How we Plan at Bonial (part 2: competence)

In the previous two posts I talked about the importance of clarity and control, but even perfect clarity and unlimited control will likely still lead to failure and frustration if the team isn’t ready to take on these new responsibilities. That’s where Competence comes in.

To build competence across the team we invested in experienced practitioners as well as training and mentoring. We hired a talented SAFe-trained development manager (“Release Train Engineer” in SAFe parlance) to both lead our transformation and provide training and mentoring.  We brought in agile and SAFe trainers for multi-day training sessions on team and enterprise agile (more on SAFe in later posts).  We started leadership and management training for our product owners, new team leads and lead developers. The more experienced members of the team actively coached others in best practices.

Why go through all this trouble?  Simple – a common source of failure I’ve seen over the years is this: the fantasy that calling something ‘agile’ somehow makes it agile.  Too often I’ve seen organizations slap on the label of “scrum teams,” appoint a newly hired Scrum Master or Agile Coach, tell them to have stand-ups and sprints, and then hope that “agile happens”… a.k.a. “fake it until you make it”.  Good luck.  Like it or not, you have to invest in training, excellent people and experienced leadership.

A word of advice: don’t skimp on the training. Our first training was a half-day session for key leaders only. As we quickly learned, that’s not training – that’s just a teaser.  Frankly, I was part of the problem – I needed to shift my attitude and accept that, unless the whole team was on board and up to speed, we’d never be able to run at full speed.  Yes, it was expensive in both time and money, but necessary.  We’ve since opened up both the breadth and depth of the training.

We also learned by doing. We built on a strong culture of open and honest retrospectives and we actively shared the learnings between teams. We experimented with new techniques and, when they worked, spread them throughout the organization. We actively cultivated a “low fear” environment so that people had space to learn and grow.

As a management team, we also worked hard to “specify goals, not methods” as part of the shift away from the Roadmap Committee described in the previous post. Why is this a competence topic? Because by forcing ourselves to stay out of the details we provided space for the teams to learn and grow. This also opened up room for lots of great ideas that may never have been voiced in a top-down approach.

Key takeaway: invest in training and regular, iterative experiential learning. Put your teams in positions where they need to stretch their knowledge and experience so that they have the context and confidence going forward to execute the mission (but actively support them as they learn).  And, as always, hire and retain great people.

One thing before we get back to the original topic – as I re-read these last three posts I can see how a reader might get the sense that we executed smoothly via a carefully orchestrated plan.  Not so.  There was trial-and-error, plenty of course adjustments and a mix of successes and failures.  That’s ok – it takes time.  What’s important is keeping your eye on the ultimate goal, being realistic and working together as a team to make it happen.

Ok, after a long detour through the background, back to the original topic…

How we Plan at Bonial (part 2: control)

Blue Angels – extreme control

As you read in the previous post, we shed some light on what we were (and weren’t) doing with some simple Clarity mechanisms with regard to planning our software development.  Now we needed to make sure everyone knew who should be doing what – a.k.a. Control.

We started with a new roadmap governance process.  We knew that if we wanted to scale the organization we had to fundamentally rationalize the “roadmap committee”.  To that end we developed the following decision flow chart:

Bonial’s first update to roadmap governance

Though it appears complex, it’s built around a single principle: push as many decisions to the teams as possible.  The “roadmap committee” would be responsible for major strategy and funding decisions and for monitoring progress; the teams would execute under the broad guidance from the committee.  

This shift to distributed control was fundamental to our later growth and success but the truth is that it took the better part of a year until we “got it right-ish”.  It was an iterative process of building trust on all sides – management had to trust the teams to make good decisions, the teams had to trust management to provide clear guidance and hold to it, and the stakeholders had to trust both.  But it was worth it.  

Most importantly, the teams began to “own” their mission which changed everything. 

The Roadmap Committee has long since been replaced with other, more focused and lighter-weight mechanisms, but the principles still hold true – executive management sets the goals, allocates resources and provides experience and mentoring; the teams decide how to achieve the goals and execute.  We continue to explore different organizations and alignments to optimize our software development and delivery, and we assume we’ll continue to experiment as we grow and our mission changes.

Another major step we took that impacted both control and clarity was to align our teams into Value Streams.  In our effort to improve how we applied Lean and Agile principles at the team and group levels, we decided to adopt best principles from the Scaled Agile Framework (SAFe) for software development at the enterprise level.  SAFe teams are built around “Programs” or “Value Streams” that allow teams to focus on a specific portion of the mission and operate as independently as possible.  We deviated quite a bit from pure SAFe and formed three streams around our user-facing efforts, our business systems and our operations initiatives.  Nevertheless, the benefits were immediate as we reduced “prioritization hell,” which is what I call the often fruitless act of trying to compare a revenue-generating topic with, for example, a cost-savings or security topic.

Key takeaway: it’s impossible to both scale and maintain central control.  Effective scaling requires creating semi-autonomous, fully-capable teams organized to be relatively independent and provided with the clarity needed to tackle their mission.  This can be a tough step, especially in organizations with a long history of central control, but it’s a step that must be taken.  (FWIW I’ve seen the opposite and it’s not pretty.)

So now we knew what we were doing and who should be doing what.  We were getting a lot closer, but we had one more big step…