Accelerated Velocity: Situational Awareness

“If a product or system chokes and it’s not being monitored, will anyone notice?”  Unlike the classic thought experiment, this tech version has a clear answer: yes.  Users will notice, customers will notice, and eventually your whole business will notice. 

No-one wants their first sign of trouble to be customer complaints or a downturn in the business, so smart teams invest in developing “situational awareness.” What’s that?  Simple – situational awareness is the result of having access to the tools, data and information needed to understand and act on all of the moving factors relating to the “situation.”  This term is often used in the context of crisis situations or other fast-paced, high-risk endeavors, but it applies to business and network operations as well.

Product development teams most definitely need situational awareness.  The product managers and development leads need to know what their users are doing and how their systems are performing in order to make wise decisions – for example, whether the next iteration should focus on features, scale or stability.  Sadly, these same product teams often see the tracking and monitoring that is needed for developing situational awareness as “nice-to-haves” or something to be added when the mythical “someday” arrives. 

The result?  Users having good or bad experiences and no-one knowing either way.  Product strategy decisions being made on individual bias, intuition and incomplete snippets of information.  Not good.

Sun Tzu put it succinctly:

“If you know neither the enemy nor yourself, you will succumb in every battle.”

Situational awareness is a huge topic, so in this series I’m going to limit my focus to data collection (tracking and monitoring) and insights (analytics and visualization) at the product team level.  For the purposes of this series I’ll define “tracking” as the data and tools that show what users/customers are doing and “monitoring” as the data and tools that focus on system stability and performance.  Likewise I’ll use “analytics” to refer to tools that facilitate the conversion of data into usable intelligence and “visualization” as the tools for making that intelligence available to the right people at the right time.  I’ll cover monitoring in this article and tracking in a later article.

At Bonial in 2014 there was a feeling that things were fine – the software systems seemed to be reasonably stable and the users appeared happy.  Revenue was strong and the few broad indicators viewed by management seemed healthy.  Why worry?   

From a system stability and product evolution perspective it turns out there was plenty of reason to worry.  While some system-level monitoring was in place, there was little visibility into application performance, product availability or user experience.  Likewise our behavioral tracking was essentially limited to billing events and aggregated results in Google Analytics.  Perhaps most concerning: one of the primary metrics we had for feature success or failure was app store ratings.  Hmmm.

I wasn’t comfortable with this state of affairs.  I decided to start improving situational awareness around system health so I worked with Julius, our head of operations, to lay out a plan of attack.  We already had Icinga running at the system level as well as DataDog and Site24x7 running on a few applications – but they didn’t consistently answer the most fundamental question: “are our users having a good experience?” 

So we took some simple steps like adding new data collectors at critical points in the application stack.  Since full situational awareness requires that the insights be available to the right people at the right time, we also installed large screens around the office that showed a realtime stream of the most important metrics.  And then we looked at them (a surprisingly challenging final step). 

The Bonial NOC Monitor Wall
One of my “go to” overviews of critical APIs, showing two significant problems during the previous day.

The initial results weren’t pretty.  With additional visibility we discovered that the system was experiencing frequent degradations and outages.  In addition, we were regularly killing our own systems by overloading them with massive online marketing campaigns (for which we coined the term: “Self Denial of Service” or SDoS).  Our users were definitely not having the experience we wanted to provide.

(A funny side note: with the advent of monitoring and transparency, people started to ask: “why has the system become so unstable?”)

We had no choice but to respond aggressively.  We set up more effective alerting schemes as well as processes for handling alerts and dealing with outages.  Over time, we essentially set up a network operations center (NOC) with the primary responsibility of monitoring the systems and responding immediately to issues.  Though exhausting for those in the NOC (thank you), it was incredibly effective.  Eventually we transferred responsibility for incident detection and response to the teams (“you build it you run it”) who then carried the torch forward.

Over the better part of the next year we invested enormous effort into triaging the immediate issues and then making design and architecture changes to fix the underlying problems.  This was very expensive as we tapped our best engineers for this mission.  But over time daily issues became weekly became monthly.  Disruptions became less frequent and planning could be done with reasonable confidence as to the availability of engineers.  Monitoring shifted from being an early warning system to a tool for continuous improvement. 

As the year went on the stable system freed up our engineers to work on new capabilities instead of responding to outages.  This in turn became a massive contributor to our accelerated velocity.  Subsequent years were much the same – with continued investment in both awareness and tools for response, we confidently set and measure aggressive SLAs.  Our regular investment in this area massively reduced disruption.  We would never have been able to get as fast as we are today had we not made this investment.

We’ve made a lot of progress in situational awareness around our systems, but we still have a long way to go.  Despite the painful journey we’ve taken, it boggles my mind that some of our teams still push monitoring and tracking down the priority list in favor of “going fast”.  And we still have blind spots in our monitoring and alerting that allow edge-case issues – some very painful – to remain undetected.  But we learn and get better every time.

Some closing thoughts:

  • Ensuring sufficient situational awareness must be your top priority.  Teams can’t fix problems that they don’t know about.
  • Monitoring is not an afterthought.  SLAs and associated monitoring should be a required non-functional requirement (NFR) for every feature and project.
  • Don’t allow pain to persist – if there’s a big problem, invest aggressively in fixing it now.  If you don’t you’ll just compound the problem and demoralize your team.
  • Lead by example.  Know the system better than anyone else on the team.
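
As a rough illustration of treating an SLA as a checkable requirement, here is a minimal sketch of an availability check. Everything in it is an assumption made for illustration: the names `HealthSample`, `availability` and `meets_sla`, the 99.9% target and the 500 ms latency budget are hypothetical and do not describe Bonial's actual monitoring setup. A slow response counts as a failure, since the question being asked is whether users are having a good experience, not merely whether the server answered.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HealthSample:
    """One health-check result (hypothetical structure for this sketch)."""
    ok: bool            # did the check succeed at all?
    latency_ms: float   # observed response time in milliseconds


def availability(samples: Iterable[HealthSample], max_latency_ms: float) -> float:
    """Fraction of samples that succeeded within the latency budget."""
    collected = list(samples)
    if not collected:
        # No data is itself a monitoring failure; refuse to report "healthy".
        raise ValueError("no samples collected; cannot judge availability")
    good = sum(1 for s in collected if s.ok and s.latency_ms <= max_latency_ms)
    return good / len(collected)


def meets_sla(samples: Iterable[HealthSample],
              target: float = 0.999,
              max_latency_ms: float = 500.0) -> bool:
    """True if the measured availability reaches the (assumed) SLA target."""
    return availability(samples, max_latency_ms) >= target


if __name__ == "__main__":
    # 998 fast successes, one outright failure, one success that was too slow.
    window = [HealthSample(True, 120.0)] * 998
    window += [HealthSample(False, 0.0), HealthSample(True, 900.0)]
    print(f"availability: {availability(window, 500.0):.3f}")
    print("SLA met" if meets_sla(window) else "SLA missed: raise an alert")
```

In practice the samples would come from whatever collectors are already in place; the point is only that the target is written down and evaluated automatically rather than judged by feel.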

 

In case you’re interested, here are some of the workhorses of our monitoring suite:

 

Accelerated Velocity: Building Great Teams

Note: this article is part 3 of a series called Accelerated Velocity.  This part can be read stand-alone, but I recommend that you read the earlier parts so as to have the overall context.

People working in teams are at the heart of every company.  Great companies have great people working in high performing teams.  Companies without great people will find it very difficult to get exceptional results. 

The harsh reality is that there aren’t that many great people to go around.  This results in competition for top talent, which is especially true in tech.  Companies and organizations use diverse strategies in addressing this challenge.  Some use their considerable resources (e.g. cash) to buy top talent though with dubious results – think big corporations and Wall Street banks.  Some create environments that are very attractive to the type of people they’re looking for – think Google and Amazon.  Some purposely start with inexperienced but promising people and develop their own talent – a strategy used by the big consulting companies.  Many drop out of the race altogether and settle for average or worse (and then hire the consulting companies to try to solve their challenges with processes and technology – which is great for the consulting companies).

But attracting talent is only half the battle.  Companies that succeed in hiring solid performers then have to ensure their people are in a position to perform, and this brings us to their teams.  Teams have a massive amplifying effect on the quantity and quality of each individual’s output.  My gut tells me that the same person working on two different teams may be 2-3X as productive depending on the quality of the team. 

So no matter how good a company is at attracting top talent, it then needs to ensure that the talent operates in healthy teams. 

What is a healthy team?  From my experience it looks something like this:

  • Competent, motivated people who are…
  • Equipped to succeed and operate with…
  • High integrity and professionalism…
  • Aligned behind a mission / vision

That doesn’t seem too hard.  So why aren’t healthy teams the norm?  Simple: because they’re fragile.  If any of the above pieces are missing, the integrity of the team is at risk.  Throw in tolerance for low performers, arrogant assholes, and whiners, mix in some disrespect and fear, and the team is broken.

(Note that the negative influences outweigh the positives – as the proverb says: “One bad apple spoils the whole bushel.”  If you play sports you know this phenomenon well – a team full of solid players can easily be undone by a single weak link that disrupts the integrity of the team.)

This leads me to a few basic rules I follow when developing teams:

  1. Provide solid leadership
  2. Recruit selectively
  3. Invest in growth and development
  4. Break down barriers to getting and keeping good people
  5. Aggressively address low-performance and disruption

Bonial had a young team with a wide range of skill and experience in 2014.  Fortunately many of the team members had a bounty of raw talent and were motivated (or desired to be motivated).  Unfortunately there were also quite a few under-performers as well as some highly negative and disruptive personalities in the mix.  The combination of inexperience, underperformance and disruption had an amplifying downward effect on the teams.

To build confidence and start accelerating performance we needed to turn this situation around.  We started by counseling and, if behavior didn’t change, letting go the most egregiously low performers and disruptive people – not an easy thing to do and somewhat frowned upon in both the company and in German culture.   But the cost of keeping them on the team, thereby neutralizing and demoralizing the high performers, was far higher than the pain and cost of letting them go. 

(A quick side note: there were concerns among the management that letting low-performers go would demoralize the rest of the team.  Not surprisingly, quite the opposite happened – the teams were relieved to have the burdens lifted and were encouraged to know that their leads were committed to building high performing teams.)

We started doing a better job of mentoring people and setting clear performance goals.  Many thrived with guidance and coaching; some didn’t and we often mutually decided to part ways.  Over time the culture changed to where low performance and negativity were no longer tolerated.

At the same time we invested heavily in recruiting.  We hired dedicated internal recruiters specifically focussed on tech recruits.  We overhauled our recruiting and interview process to better screen for the talent, mentality and personality we needed.  We added rigor to our senior hiring practices, focussing more on assessing what the person can do vs what they say they can do.  And we added structure to the six month “probation” period, placing and enforcing gates throughout the process to ensure we’d hired the right people.  Finally, we learned the hard way that settling for mediocre candidates was not the path to success; it was far better to leave a position unfilled than to fill it with the wrong person.

How did we attract great candidates?  We focussed on our strengths and on attracting people who valued those attributes: opportunities for growth, freedom to make a substantial impact, competent team-mates, camaraderie, a culture of respect, and exposure to cutting-edge technologies.  Why these?  Because year over year, through employee satisfaction surveys and direct feedback, we find these elements correlate very strongly with employee satisfaction, even more so than compensation and other benefits.  In short, we’ve worked hard to create an environment where our team-mates are excited to come to work every day.

(This is not to say we ignored competitive compensation; as I’ll describe in a later post, we also worked to ensure we paid a fair market salary and then provided a path for increasing compensation over time with experience.)

Over time, as our people became more experienced, our processes matured and our technology set became more advanced, Bonial became a great place for tech professionals to sharpen their skills and hone their craft.  New team members brought fresh ideas and at the same time had an opportunity to learn both from what we already had and from what they helped create.  The result is what we have today: a team of teams full of capable professionals who are together performing at a level many times higher than in 2014.

Some closing thoughts:

  • You’re only as good as the people on the teams.
  • Nurture and grow talented people. Help under-performers to perform. Let people go when necessary.
  • Get really good at recruiting.  Focus on what the candidate will do for you vs what they claim to have done in the past.
  • Don’t fall into the trap of believing process and tools are a substitute for good people.

Footnote: If you haven’t yet, I suggest you read about Google’s insightful research on team performance and how “psychological safety” is critical to developing high performing teams. 

Accelerated Velocity: Building Leaders

Note: this article is part 2 of a series called Accelerated Velocity.  This part can be read stand-alone, but I recommend that you read the earlier parts so as to have the overall context.

Positive changes require a guiding hand.  Sometimes this arises organically from a group of like-minded people, but far more often there’s a motivated individual driving the change.  In short – a leader.

Here’s the rub: the tech industry is notoriously deficient in developing leaders. Too often the first step in a leader’s journey starts when their manager leaves and they’re blessed with a dubious promotion… “Congratulations, you’re in charge now.  Good luck.”  If they’re fortunate their new boss is an experienced leader and has time to mentor them.  In a larger organization they may have access to some bland corporate training on how to host a meeting or write a project plan.  But the vast majority of people thrust into leadership and management roles in tech are largely left to their own devices to succeed.

Let me pause for a moment and highlight a subtle but important point: leadership and management are different skills.  Leadership is creating a vision and inspiring a group of people to go after the vision; management is organizing, equipping, and caring for people as well as taking care of the myriad details needed for the group to be successful.  There’s some overlap, and an effective leader or manager has competence in both areas, but they require different tools and a different mindset. This article is focussed on the leadership component.

So what does it take to develop competent and confident leaders?  When I look at some of the best-in-class “leadership-centric organizations” – militaries and large consulting companies for example – I see the following common elements:

  1. Heavy up-front investment in training
  2. High expectations of the leaders
  3. A reasonably structured environment in which to learn and grow
  4. A continuous cycle in which role models will coach and mentor the next generation

How did this look at Bonial?

Upon arriving I inherited a 40-ish strong engineering organization broken up into five teams, each headed by a “team lead.” The problem was that these team leads had no clear mandate or role, little or no leadership and management training, and essentially no power to carry out a mandate even if they’d had one.

This setup was intended to keep the organization flat and centralize the administrative burden of managing people so as to allow the team leads to focus on delivery. Unfortunately this put the leads in a largely figurehead role – they represented their teams and were somehow responsible for performance but had few tools to employ and little experience with which to effectively deploy them. They didn’t hire their people, administer compensation or manage any budgets. In fact, they couldn’t even approve vacations.  To this day it’s not clear to me, or them, what authority or responsibility they had.

This arrangement also created a massive chokepoint at the center of the organization – no major decisions were made without approval from “above”. The results were demoralized leads and frustrated teams.

Changing this dynamic was my first priority.  To scale our organization we’d need to operate as a federation of semi-autonomous teams, not as a traditional hierarchical organization.  For this we needed leads who could drive the changes we’d make over the coming years, but this would require a major shift in mindset.  After all, if I couldn’t trust them to approve vacations, why should I trust them with driving ROI from the millions of euros we’d be investing in their teams?  Engineers have the potential to produce incredibly valuable solutions; ensuring they have solid leadership is the first and most important responsibility of senior management.

We started with establishing a clear scope of responsibility and building our toolbox of skills.  I asked the leads if they were willing to “own” the team results and, though a little nervous, most were willing.  This meant they would now make the calls as to who was on the team and how those people were managed. They took over recruiting and compensation administration. They played a much stronger role in ensuring the teams had clarity on their mission and how the teams executed the mission. They received budgets for training, team events and discretionary purchases. And, yes, they even took responsibility for approving vacations.

We agreed to align around the leader-leader model espoused by David Marquet (https://www.davidmarquet.com/) in his book “Turn the Ship Around!”  We read the book together and discussed the principles and how to apply them in daily practice.  The phrase “I intend to…” was baked into our vocabulary and mentality.  We eliminated top-down systems and learned to specify goals, not methods.  We focussed on achieving excellence, not just avoiding errors.  The list goes on.

I also started a “leadership roundtable” – 30 minutes each week where we’d meet in a small group and discuss experiences and best practices around core leadership and management topics: motivating people and teams, being effective, basic psychology, communicating, coaching and mentoring, discipline, recruiting, personal organizational skills, etc.  Over time, dozens of people – ranging from prospective team leads to product managers to people simply interested in leadership and management – participated in the roundtables, giving us a common foundation from which to work.

As I’ll share in a future article, we also created a career growth model that fully supported a management track as well as a technical track and, most importantly, the possibility to move back and forth freely between the two.  We encouraged people to give management a try and offered mentoring and support plus the risk-free option of being able to switch back to their former role if they preferred.  In the early days this was a tough sell – “team lead” had the reputation of being mostly pain with little upside.  Nevertheless a few brave souls gave it a shot and, to their surprise, found it rewarding (and have since grown into fantastic leads).

It wasn’t easy – we had a fair share of mistakes, failures and redos – but the positive effects were felt almost immediately. Over time this first generation of leads grew their teams and created cultures of continuous improvement. As the teams grew, the original leads mentored new leaders to take over the new teams and the cycle continued. As it stands today we have a dozen or so teams/squads led by capable leaders that started as software engineers, quality assurance pros, system engineers, etc. 

For what it’s worth, I believe the number one factor driving Bonial’s accelerated velocity was growth in leadership maturity. If you’re looking to engineer positive change, start here.

Some closing thoughts:

  • Positive change requires strong leadership.
  • A single leader can start the change process, but large-scale and enduring change requires distributed leadership (e.g. “leader-leader”).
  • Formal training can be a great source of leadership and management tools, but mastering those tools requires time, a safe and constructive environment and active coaching and mentoring.
  • Growing a leadership team is not a linear or a smooth process.  The person driving and guiding the development must commit to the long game and must be willing to accept accountability for inconsistent results from first generation leads as they learn their trade.

Read part 3: Building Great Teams

Accelerated Velocity: How Bonial Got Really Fast at Building Software

My boss, Max (Bonial Group’s CEO), and I sat down recently for a “year-in-review” during which we discussed the ups and downs of 2017 as well as goals for the new year.  In wrapping up the conversation, I shared with him my gut feeling that velocity and productivity had improved over the past couple of years and were higher than they’d ever been at Bonial – perhaps as much as double when compared to 2014.  

He asked if I could quantify the change, so on a frigid Sunday a couple of weeks ago I sat down with a mug of hot tea and our development records to see what I could do. We’ve used the same “product roadmap” format since 1Q14 (described here), which meant I could use a “points” type approach to quantify business value delivered during each quarter.  As I was looking for relative change over time and I was consistent in the application, I felt this was a decent proxy for velocity.  

It took me a couple of hours but was well worth the effort.  Once I’d finished scoring and tabulating, I was pleasantly surprised to find that I’d significantly underestimated the improvements we’d made.  Here’s a high level overview of the results:

7X Velocity! Bonial team size, value delivered and productivity over time.

The net-net is that in 1Q 2018 we’ll be delivering ~630% more business value than we delivered in the first quarter of 2014, largely driven by the fact that each person on the team is ~250% more productive.  
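
For readers who want to check how the “7X” headline relates to those two percentages, here is a small sketch of the arithmetic. It assumes that value delivered is simply team size times per-person productivity; the helper name `multiplier` is made up for this example, and the implied team growth is derived from the two rounded figures above rather than being a number stated in this article.

```python
def multiplier(percent_more: float) -> float:
    """Convert 'N% more' into a multiplier, e.g. 100% more -> 2.0x."""
    return 1.0 + percent_more / 100.0


# Figures quoted in the text (both approximate).
value_x = multiplier(630)         # business value delivered vs 1Q14
productivity_x = multiplier(250)  # per-person productivity vs 1Q14

# Assumption: value = team size * productivity, so the ratio of the two
# multipliers gives the growth in team size implied by the rounded figures.
implied_team_x = value_x / productivity_x

print(f"value delivered: {value_x:.1f}x")         # 7.3x
print(f"per-person output: {productivity_x:.1f}x")  # 3.5x
print(f"implied team size: {implied_team_x:.1f}x")  # 2.1x
```

In other words, “630% more” is a 7.3X multiple, and under the stated assumption roughly half of that comes from a larger team and the rest from each person getting more done.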

Sweet.

The obvious next question: how did we do this?

The short answer is that there is no short answer.  There was no single magic button that we pushed to set us on this path to accelerated velocity; this was a long campaign that started small and grew, eventually spanning people, process, technology and culture.  Over time these learnings, improvements, changes and experiments – some large, some small, some successful, some not – built on each other and eventually created an environment in which the momentum sustained itself.  

Over the next few weeks I’ll summarize the major themes here in this blog for both myself as well as anyone who’s interested.  Along this journey I plan to cover (and will link when available):

  1. Building Leaders
  2. Building Great Teams
  3. Creating Situational Awareness
  4. Providing a Growth Path
  5. Clarifying Processes and Key Roles
  6. Enabling Independent Action
  7. Creating an Architecture Runway
  8. Optimizing the SDLC with DevOps
  9. Getting Uncomfortable
  10. Doing the Right Things
  11. Taking Ownership
  12. Building on What You’ve Got

Each of those topics could alone make for a small book, but I’ll try to keep the articles short and informative by focussing only on the most important elements.  If there’s an area in which you’d like me to dig deeper, let me know and I’ll see what I can do.  Assuming I get through all of those topics I’ll wrap things up with some final thoughts.

So let’s get started with part 2: Building Leaders

Special Forces Architecture

Architects scanning for serious design flaws

 

I’ve been spending some very enjoyable time recently with our architecture team working through some of the complexities that we’ll be facing in our next planning iteration.  Many of those topics make for interesting posts in their own right, but what I want to discuss in this post is the architecture team itself.  

Why?  Because I’m pretty happy with how we’ve evolved to “do” architecture here.  

And why is that noteworthy?  Because too many of the software architecture teams I’ve worked in, with or around have had operating models that sucked.  In the worst cases, the teams have been “ivory tower” prima donna chokepoints creating pretty diagrams with methodological purity and bestowing them upon the engineering teams. At the other end of the scale, I’ve seen agile organizations run in a purely organic mode with little or no architectural influence until they ran up against a tangled knot of incompatible systems and technologies burdened with massive architectural debt.  And everything in between.

So, how do we “do” architecture at Bonial?  I think it helps to start with the big picture, so I brainstormed with Al Vilegas (our chief architect) and we came up with the following ten principles that we think clearly and concisely articulate what’s important when it comes to architecture teams:

  1. Architects/teams should think strategically but operate tactically.  They should think about future requirements and ensure there is a reasonable path to get there.  On the flip side, they should build iteratively and develop only enough to meet the current requirements while leaving a reasonable runway. 
  2. Architects/teams should have deep domain expertise, deep technical expertise, and deep business context.   Yes, that’s a lot, but without all three it’s difficult to give smart guidance in the face of ambiguity – which is where architects need to shine.  It takes time to earn this experience and the battle scars that come with it; as such, I generally call BS when I hear about “architects” with only a few years of experience.
  3. Architects must be team players.  They should be confident but humble.  Arrogance has no place in architecture.  They should recognize that the engineering teams are their customers, not their servants and approach problems with a service-oriented mindset.  They should listen a lot.  
  4. Architects/teams should be flexible.  Because of their skills and potential for impact, they’ll be assigned to the most important and toughest projects, and those change on a regular basis.  
  5. Architects/teams should be independent and entrepreneurial.  They should always be on the lookout for and seize opportunities to add value.  They shouldn’t need much daily or weekly guidance outside of the mission goals and the existing/target enterprise architecture.  They should ask lots of questions and keep their finger on the pulse of the flow of technical information.
  6. Architects must practice Extreme Ownership.  They should embrace accountability for the end result and expect to be involved in projects from start to finish. This means more often than not that they will operate as part of the specific team for the duration of the project.  They may also assist with the implementation, especially the most complex or most strategic elements.  “You architect it, you own it.”
  7. Architects/teams should be solid communicators.  They need to be able, through words, pictures and sometimes code, to communicate complex concepts in a manner that is understood by technical and non-technical people alike.  
  8. Architects/teams should be practical.  They need to be pragmatic and put the needs of the business above technical elegance or individual taste.  “Done is better than perfect.”
  9. Architects/teams should be mentors.  They should embrace the fact that they are not only building systems but also the next generation of senior engineers and architects.  
  10. Architects/teams must earn credibility and demonstrate influence.  An architect that has no impact is in the wrong role.  By doing the above this should come naturally.

If you take the principles above and squint a little bit, you’ll see more than a few analogs to how military special forces teams structure themselves and operate, as illustrated below: 

| High-performing Architecture Teams | Military Special Forces Teams |
| --- | --- |
| Small | Small |
| Technical experts and domain specialists | Military experts and domain specialists |
| Extensive experience, gained by years of practice implementing, fixing and maintaining complex systems | Extensive experience, gained by years of learning through intense training and combat |
| Flexibly re-structure according to the mission | Flexibly re-structure according to the mission |
| High degree of autonomy under the canopy of the business goals and enterprise architecture | High degree of autonomy under the canopy of the mission and rules of engagement |
| Often join other teams to lead, support, mentor and/or be a force multiplier | Often embed with other units to lead, support, mentor and/or be a force multiplier |
| Accountable for the end results | Accountable for the mission success |

Hence the nickname I coined for this model (and the title of this post): “Special Forces Architecture.”  

How does this work in practice?  

At Bonial, our 120-person engineering team has two people with “architect” titles, but another half dozen or so are working in architecture roles and are considered part of the “architecture team.”  An even broader set of people, primarily senior engineers, regularly attend a weekly “architecture board” where we share plans and communicate changes to the architecture.  We recognize that almost everyone has a hand in developing and executing our architectural runway, so context is critical.  To paraphrase Forrest Gump: “Architect is as architect does,” so we try to be as expansive and inclusive as possible in communicating.

The members of the architecture team are usually attached to other teams to support major initiatives, but we re-assess this on a quarterly basis to make sure the right support is provided in the right areas.  In some cases, the architecture team itself self-organizes into a project team to develop a complex framework or evaluate a critical new capability.  

Obviously there’s a lot going on – we typically have 8-10 primary work streams each with multiple projects – so the core architecture team can’t be closely involved with every project.  To manage the focus, we use a scoring system from 1-5 (1 = aware, 3 = consulting, 5 = leading) for what level of support or involvement is provided to each team or initiative.  In all cases, the architects need to ensure there’s a “big picture” available (including runway) and communicated to all of the engineers responsible for implementing.
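
To make the scoring idea concrete, here is a minimal sketch of how such a scale could be represented and used to decide where architect time goes. Only the three anchor points (1 = aware, 3 = consulting, 5 = leading) come from the text; the names `Involvement`, `describe` and `needs_architect_time`, the rule that in-between scores round down to the nearest named level, and the placeholder initiative names are all assumptions of this sketch rather than a description of our actual tooling.

```python
from enum import IntEnum


class Involvement(IntEnum):
    """Named anchor points of the 1-5 scale; 2 and 4 sit in between."""
    AWARE = 1
    CONSULTING = 3
    LEADING = 5


def describe(score: int) -> str:
    """Label a 1-5 score with the nearest named level at or below it.

    Rounding down is an assumption of this sketch, not a stated rule.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    anchor = max(level for level in Involvement if level <= score)
    return anchor.name.lower()


def needs_architect_time(scores: dict[str, int],
                         threshold: int = Involvement.CONSULTING) -> list[str]:
    """Initiatives scored at or above the threshold, highest score first."""
    picked = [name for name, score in scores.items() if score >= threshold]
    return sorted(picked, key=lambda name: (-scores[name], name))


if __name__ == "__main__":
    # Placeholder initiatives and scores, purely for illustration.
    scores = {"initiative-a": 5, "initiative-b": 1,
              "initiative-c": 3, "initiative-d": 4}
    for name in needs_architect_time(scores):
        print(f"{name}: {scores[name]} ({describe(scores[name])})")
```

Keeping the scale as plain integers makes the quarterly re-assessment a matter of editing a handful of numbers, which fits the lightweight spirit of the model.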

For example, right now we have team members embedded in our critical new content management and publishing platform and our Kraken data platform.  We have one person working on a design and PoC updating the core user model.  Several members of the team are also designing and prototyping a new framework for managing machine learning algorithm lifecycles and testing.  And a few people have individual assignments to research or prepare for future runway topics.  In this way we expect to stay just far enough in front of the rest of the team to create runway without wasting time on phantom requirements.

Is this model perfect?  No.  But perfection isn’t our goal, so we optimize for the principles above – adaptability, autonomy, expertise, ownership, impact – and build around that.  Under this model, the Bonial platform has gone from a somewhat organic collection of monolithic apps running on a few dozen servers to a coherent set of domains consisting of APIs, micro-services, legacy systems and complex data stores running on hundreds of cloud instances across multiple continents.  I have my doubts that this would have happened in some of the more traditional models. 

I’m happy to answer questions about this model and talk about the good and the bad.  I’d also love to hear from you – what models have worked well in your experience?

Fear Factor: Guns vs Burgers

 

In my previous article I analyzed the statistics around terrorism vs gun deaths and found that, at least as of 2015,  Americans have a higher probability of dying at the hands of another American with a gun than a European has of being killed in a terror attack.  I also noted that the risk seemed to be inversely proportional to the fear.

Stepping back further, let’s look more broadly at how terrorism and gun deaths compare to other preventable causes of death:

At over 480,000 deaths per year, smoking dwarfs the deaths caused by either guns or terrorism in the US (even, it must be noted, when considering the ~3,000 deaths caused by the 9/11 attacks).  Obesity is rapidly catching up to smoking at 374,000 and rising.  It seems that Americans should fear Big Macs and Marlboros far more than terrorists.  

As with guns vs terror, I can’t help but note that the fear factor seems to be almost directly inverse to the risk – Americans seem to be terrified of terrorists, afraid of guns, and relatively indifferent to the rest.  Why so irrational?  I’m afraid that answering that question is likely out of the realm of data science and more in the realm of psychology or evolutionary biology.  It does call to mind, however, the argument made by Levitt and Dubner in Freakonomics when discussing the statistics around swimming pools vs guns.  If I recall correctly they use the term “dread” to describe the emotion that drives some irrational choices.  Perhaps the same thing is going on here – the idea of a truck slamming into a joyful Christmas market creates more dread than the somewhat abstract idea of dying from obesity or smoking.  

One final observation about guns in America – people are horrified when mass shootings happen but as a society they choose to do nothing to prevent the next massacre.  This is in marked contrast to terrorism in which the same people are willing to spend billions of dollars, close the borders and sacrifice civil liberties to prevent the next terrorist attack.  It seems to me the dread factor is high in both cases, but the response is highly asymmetrical.  Perhaps a topic for another day…

Fear Factor: Guns vs Terrorism

I’ve been pretty quiet here recently – some intense projects and my travel schedule haven’t left me much time to write.  I do have a few half-written posts that I’ll try to finish up soon.  In the meantime, here’s a short series that veers a bit from pure technology and into the interconnected realm of data analytics and social sciences…

A few weeks ago I was talking to a family member in the U.S. (I’m a U.S. citizen currently living in Germany) and we were discussing the recent spate of weather and other natural disasters that were hammering the states. When we were done he said, “Well as crazy as it is here I’d take this any day over what you’re dealing with.”

I was a bit confused, and asked what disaster he was referring to. He clarified, “No, I mean all of the terrorists driving trucks into crowds and setting off bombs on trains and stuff.”

Ah, right. I’ve heard similar statements several times since I moved to Europe and never quite understood them – after all, while horrific, the sheer number of terror related deaths in either Europe or the U.S. is in the dozens or low hundreds; I was pretty confident that the probability of being a victim of a terrorist is far lower than many other forms of violent crime or preventable death. I replied, “You know, there are more gun deaths each day in the US than terrorism deaths in Europe every year. What you should be afraid of is walking out your door.”

Not surprisingly, we agreed to disagree and the conversation ended cordially. However, it got me thinking: Was I right that someone in Europe is less at risk from an Islamic (or other radical) terrorist than an American is from another American with a gun?  If not, why is the fear factor from terrorism so much greater than gun violence?

The first question sounded like a straightforward data analytics exercise, so I busted out a Jupyter notebook to explore, grabbed some data and challenged the hypothesis.

To analyze terrorism I chose the Global Terrorism Database (GTD), a very comprehensive collection of worldwide terrorism data covering the last half century. The gun violence datasets were harder to come by, in part due to the successful lobbying efforts by the National Rifle Association (NRA), which blocks government research on gun violence, so I chose to work with the Centers for Disease Control and Prevention (CDC) Multiple Causes of Death dataset, which classifies all deaths in the US, including deaths by firearms. The latest year for which the GTD and CDC datasets fully overlap is 2015, so I chose that as the year to focus on.

Terrorism 

Let’s start by looking at terrorism.  Worldwide, there was a significant spike in terrorism over the most recent decade, with the vast majority of the increase coming from the Middle East, Africa, and South Asia.

 

If we zoom into this decade and look only at the US and Western Europe, this is what we see:

Look at the Y axis on both of the above graphs – it’s clear that it’s much safer to be in Europe or the US than in many other parts of the world (two orders of magnitude safer). While Europe has seen a relative spike in terrorism-related deaths since the end of 2015, it also has roughly double the population of the US, so to get a better picture of how this compares to US deaths we need to look at deaths per million residents. Here’s what we get:

2015 terror deaths EU: 171.0 total, or 0.23 per million residents
So in 2015 a European had roughly a 1 in 4,000,000 chance of dying in a terrorist attack. That sounds pretty small.  Just out of curiosity, I wonder how that compares to terrorist attacks on American soil:

2015 terror deaths US: 44.0 total, or 0.14 per million residents
I hate to write this because some knucklehead will quote it out of context, but on the surface Europeans have roughly twice the probability of being terror victims as Americans when adjusted for population (in 2015 at least). But that’s like saying a person is twice as likely to be killed by a bear as by a shark – both numbers are so low that doubling either still yields a low number.  (In fact, the odds of dying in a shark or bear attack aren’t too far off from those of dying at the hands of a terrorist, but that’s for another article.)
Let’s look at the other side of the problem.
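
For readers who want to check the arithmetic, here is a minimal sketch (not the code from my notebook) that converts the per-million rates quoted above into “1 in N” odds. The helper name one_in_n is made up for this illustration; the only inputs are the two rates from the text.

```python
def one_in_n(rate_per_million: float) -> float:
    """Convert a rate per million residents into '1 in N' odds."""
    return 1_000_000 / rate_per_million


# Rates per million residents, as quoted in the text above (2015).
EU_TERROR_RATE = 0.23
US_TERROR_RATE = 0.14

print(f"Europe: 1 in {one_in_n(EU_TERROR_RATE):,.0f}")
print(f"US:     1 in {one_in_n(US_TERROR_RATE):,.0f}")
print(f"Europe vs US: {EU_TERROR_RATE / US_TERROR_RATE:.1f}x")
```

Run as-is, the ratio comes out at about 1.6, which is what I’m rounding to “roughly twice”.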

Gun Deaths in the U.S. (round 1)

Ok, how does that compare to the risk of dying from a gun in the US? Here’s a high-level breakdown of US gun deaths in 2015:

suicide     22060
homicide    13018
accident      489
other         284

The rough numbers/ratios above have been quoted quite a bit over recent years – roughly 35K gun deaths per year with ~1/3 homicides and ~2/3 suicides – so no big surprises there. 
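
As a quick sanity check on those ratios, here is a small sketch that turns the four counts above into shares of the total. It uses only the numbers from the breakdown; the variable names are illustrative.

```python
# 2015 US gun deaths by intent, copied from the breakdown above.
gun_deaths_2015 = {
    "suicide": 22060,
    "homicide": 13018,
    "accident": 489,
    "other": 284,
}

total = sum(gun_deaths_2015.values())
shares = {intent: count / total for intent, count in gun_deaths_2015.items()}

print(f"total     {total}")
for intent, share in shares.items():
    # Left-align the label and print the share as a percentage.
    print(f"{intent:<9} {share:6.1%}")
```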
 
Since terror attacks are essentially homicides, let’s look at gun homicides per million so we can compare with the terrorist threat:

2015 gun homicides US: 13018 total, or 40.29 per million residents
So, at ~40 gun homicides per million residents, an American is ~175x more likely to die from a gun homicide in the US than a European is from a terrorist in Europe.  Hmm.
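
The comparison boils down to two divisions. Here is a rough sketch; note that the population figure is an assumption I back-calculated from the numbers in this post (13,018 homicides at 40.29 per million implies roughly 323.1 million residents), not a value taken from the datasets.

```python
def per_million(deaths: int, population: int) -> float:
    """Deaths per million residents."""
    return deaths / population * 1_000_000


# Assumption: ~323.1 million US residents, back-calculated from the
# figures quoted in this post rather than read from the CDC data.
ASSUMED_US_POPULATION = 323_100_000

us_gun_homicide_rate = per_million(13018, ASSUMED_US_POPULATION)
eu_terror_rate = 0.23  # per million, as quoted earlier in this post

ratio = us_gun_homicide_rate / eu_terror_rate
print(f"{us_gun_homicide_rate:.2f} per million, ~{ratio:.0f}x the EU terror rate")
```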
 
But… it could be argued that this isn’t a fair comparison.  I’ve heard several arguments that have gone something like this: “Terrorists tend to strike at random, killing innocent, unsuspecting victims.  U.S. gun violence mostly happens in places like Chicago, St Louis and Detroit and involves gangs and criminals.  In other words, U.S. gun violence is about ‘them’, and we’re not ‘them’.”
 
So how can we whittle the dataset down to “not them”?

Gun Deaths in the U.S. (round 2) 

Let’s see what we can find as we drill into the CDC data…

On an absolute basis, American men are ~6X more likely to be victims of gun violence, while on a percentage basis, men and women show a similar breakdown by intent, with suicide being the major contributor.

How about race?

The differences here are striking – blacks and hispanics are far more likely to die from homicide while whites are overwhelmingly likely to take their own life. To get a different perspective, let’s look at this on a percentage basis:

Again, some striking differences in intent between different racial groups.  (My gut tells me the homicide rate roughly correlates with average income level, but that’s an analysis for another day.)
 
Ok, maybe education plays a role, either directly or as a proxy for socio-economic status:

Again, a pretty strong correlation.

And now let’s look at age. Here are two views, one broken down by intent and the other by race:

(Note: the bump around 50 is due to a spike in white male suicide…  Remember, remember the month of Movember…)

While tragic, the suicides, accidents and undetermined cause events aren’t relevant to this analysis so we’ll exclude those to focus exclusively on homicides and revisit the age vs race graph in this light:

So, it appears that gun deaths skew heavily towards young black and hispanic males without college degrees.  It feels wrong removing men from the equation since most of the comments I’ve heard relating to this hypothesis have come from men, so let’s just filter on the other dimensions and look at whites over 30 with college degrees:

2015 gun homicides US (white, over 30, college degree): 392 total, or 1.21 per million residents

So even this limited demographic is still ~5X more likely to die from a gun in the US than a European is from a terrorist attack.
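
To make the filtering step concrete, here is a minimal sketch of the idea in plain Python. The field names and the three sample records are hypothetical and purely illustrative; they are not the CDC schema or real rows. Only the two rates at the bottom come from this post.

```python
from dataclasses import dataclass


@dataclass
class DeathRecord:
    # Hypothetical fields for illustration, not the real CDC column names.
    intent: str
    race: str
    age: int
    college_degree: bool


def in_narrow_demographic(record: DeathRecord) -> bool:
    """Homicide victims who are white, over 30 and hold a college degree."""
    return (
        record.intent == "homicide"
        and record.race == "white"
        and record.age > 30
        and record.college_degree
    )


# Synthetic sample rows, made up to exercise the filter.
sample = [
    DeathRecord("homicide", "white", 45, True),
    DeathRecord("suicide", "white", 52, True),
    DeathRecord("homicide", "white", 24, True),
]
subset = [r for r in sample if in_narrow_demographic(r)]
print(len(subset), "of", len(sample), "sample rows match")

# Ratio of the two rates quoted in this post (per million residents).
print(f"~{1.21 / 0.23:.1f}x")
```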

Conclusion

Ok, let’s review:

  • In 2015 a person in Europe had less than one in a million chance of being killed by a terrorist.
  • That same year, a person in the U.S. had a probability up to 40 in a million of being killed by another American with a gun.

At this point I think I can be pretty confident that my original hypothesis is correct: an American is at much higher risk of being killed by another American with a gun than a European is of being killed by a terrorist.

In the course of exploring this data I have to admit I was surprised at some of the things I found and want to explore them further – for example:

  • What’s going on with terrorism in the rest of the world?
  • How does the casualty rate from guns and terrorism compare with other preventable deaths?
  • Why is the fear factor orthogonal to the reality of the actual risks? 

Stay tuned.

(If interested, you can look at the code of the analysis on this Kaggle kernel.)

 

What I learned from writing an AI voice assistant and chat bot

I have a confession: despite being in management I still love to code.  Since I don’t get to program as much as I’d like or stay up on the latest trends and technologies, I set a goal for myself to learn at least one new technology every year (and more than one on a good year).  This learning hobby is how I made the leap from back-end to full-stack developer, how I learned iOS and Android, and how I stepped into the hallowed halls of Data Science.

This year I decided to explore chat bots and voice assistants.  As I learn best by doing, I generally think up a fun or useful project and then learn through building it.  For this project I decided to tackle an unending source of stress in my household: bickering and arguing over screen time for our kids.  

Enter ChronosBot

The idea behind ChronosBot is simple.  Parents set up screentime accounts for each child as well as an automatic allowance that puts time in the accounts.  After linking their account to Alexa, Google Assistant, Facebook Messenger, etc., they can say or write things like, “Alexa, ask ChronosBot to withdraw 30 minutes from Axel’s account” or “… what’s everyone’s balance?”

With the idea in place, I had to choose my tech stack.  Google has a robust platform built on API.AI.  API.AI supports a dozen or so chat integrations (Allo, Messenger, Telegram, Kik, etc.) as well as a voice interface for Google Home, allowing developers to (theoretically) write one interface for both voice and chat.  At the time I started, Amazon Alexa had a rudimentary platform for speech dialog development using structured text.  In both platforms the interface designer creates “intents” that match what the user says to something the bot can do and then provides appropriate responses, and both platforms hand off the business logic to a backend app using web hooks.

For the backend, I decided to sharpen my Python skills and implement in Django on top of Postgres.  For deployment I decided to give Heroku a try.  
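
To give a feel for the intent-to-backend handoff, here is a simplified sketch in plain Python. It is not the actual ChronosBot code and does not reproduce the API.AI or Alexa webhook payloads: the intent names, the parameter keys and the rule that a withdrawal cannot exceed the balance are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Account:
    child: str
    balance_minutes: int = 0


def handle_intent(intent: str, params: dict, accounts: dict) -> str:
    """Map a matched intent (hypothetical names) to business logic and a reply."""
    if intent == "withdraw":
        child = params["child"]
        minutes = int(params["minutes"])
        account = accounts.get(child)
        if account is None:
            return f"I couldn't find an account for {child}."
        # Assumed rule: never let a balance go negative.
        if minutes > account.balance_minutes:
            return f"{child} only has {account.balance_minutes} minutes left."
        account.balance_minutes -= minutes
        return f"Done. {child} has {account.balance_minutes} minutes left."
    if intent == "balance_all":
        return ", ".join(
            f"{a.child}: {a.balance_minutes} minutes" for a in accounts.values()
        )
    return "Sorry, I didn't understand that."


accounts = {"Axel": Account("Axel", balance_minutes=90)}
print(handle_intent("withdraw", {"child": "Axel", "minutes": 30}, accounts))
print(handle_intent("balance_all", {}, accounts))
```

In a real deployment a function like this would sit behind a web hook endpoint (a Django view in my case) that unpacks whatever request format the voice or chat platform sends.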

Development of the basic use cases took me a couple of weeks in the late evening and weekends.  I submitted to both Amazon and Google and waited for a week or so in each case for the review.  Both rejected my app, but for reasons that I hadn’t expected.  Amazon told me that my app violated the Alexa ToCs because it “targeted children” (huh?) and told me to not resubmit the app ever again (seems they relented).  Google gave me the boot because my invocation name couldn’t be recognized properly but a very helpful person from Google worked with me to resolve the issue and now it’s live.  

I’ve since continued development and added new features like “rewards” and “penalties” (requested by my wife) and “mystery bonus” (requested by the kids).  I’ve enabled Telegram and Messenger and have adapted the platform to support both visual and audio surfaces.  And the Alexa version was finally approved earlier this week.

Lessons Learned

So, what have I learned while navigating the ins and outs of the Google and Alexa development platform and publication process? 

1)  Amazon and Google have very different approaches.  Google has taken the bold approach of enabling all community developed actions and using an intent matching algorithm to route users to the correct action.  Amazon requires users to enable specific skills via a Skill Store.  In both cases, discovery is a largely unsolved challenge.

2)  Too early to tell who will be king.  Amazon Alexa has a crazy head start, but Google seems to have a more robust speech development platform.   With a zillion Android devices already on the market one certainly can’t count them out.  On the other hand not a month seems to go by without a new Alexa form factor hitting the market.  

3)  It’s early days.  Both platforms are being developed at a lightning fast pace.  Google had a big head start with API.AI.  The original Alexa interface was frustratingly primitive, but they’ve since upgraded to a new UI (which suspiciously bears a strong resemblance to API.AI) that has great promise.  

I have to take my hat off to both companies for creating a paradigm and ecosystem that makes voice assistant and natural language development accessible to the broader development community.  It’s so straightforward that even my kids gave it a try – my daughter (10) developed “The Oracle” that answers deeply profound questions like “Who’s awesome” (she’s awesome).  My son (12) wrote a math quiz game with which he is happy to challenge anyone to beat his top score. 

4)  Conversational UX is easy; good conversational UX is really hard.  I’ve known this since I was involved with Nuance and the voice web in the late 1990s (and I also happen to be married to an expert in the space).  Making it easy to build a conversational UX is a very different thing from helping developers build a high quality conversational UX (especially a Voice UX).  Both Amazon and Google have tried to address this with volumes of best practice documentation, but I expect most developers will ignore it.

5)  Conversational UX is limited.  There are some use cases that work for serial interactions (voice or chat) and some that work better in parallel interactions (visual).  Trying to force one into the other typically doesn’t make sense or only applies to “desperate users”.  You see the effect of this to some degree already in the Alexa Skill Store – there are some clear clusters evolving (home automation, information retrieval, quiz games).

6)  Multi-modal UX is the next natural step.  I’m very excited about the Amazon Echo Show as I expect that will unleash a wave of interesting multi-modal interaction paradigms.  

7)  It’s fun.  There’s just something about the natural language element of voice assistants that allows for a richer, more human interaction than what GUIs can provide.  

All in all I’m really excited about the potential of this space, and I’m not alone – just look at the growth of the Alexa Skills Store.  The tech press is also taking a critical look at these capabilities (e.g. a recent article featuring yours truly) and I expect most companies are at least thinking about how these capabilities will play in their business.  My company, Bonial, is investing in several actions/skills to explore the potential of voice and chat interfaces.  To date we’ve already launched a bot that allows users to search for local deals and will shortly launch a voice assistant interface to our shopping list app, Out of Milk.  We’ve learned a lot and we’ll share more on those projects in other posts.  

How we Plan at Bonial (part 3)

Collaborative digital stickies board that we use for planning.

Ok, after all that, how do we actually plan at Bonial?

The heart of our planning activities is the Quarterly Planning, which is loosely modeled on Program Increment (PI) Planning from SAFe.  During quarterly planning / PI planning, everyone in the product development organization – developers, designers, architects, testers, product managers, operations specialists, etc. – gets together for a couple of days to map out their next phase.  We do our planning during the previous quarter’s HIP (Hardening, Innovation and Planning) sprint, which is sprint 6 of each quarter.

Before I dive into the actual planning days, I should point out that the preparations start several weeks before, when the product teams actively work with stakeholders, customer facing teams and the executive team to validate the backlogs against the current company priorities and business realities.  The prep phase looks something like this:

  • The senior management team and product strategy board review the overall strategy and primary business goals to assess if any change in focus is needed.  
  • Next we make sure that product and delivery management has the same level of clarity. We get the delivery leads and product owners together and communicate the company goals for the upcoming quarter to them, taking the time to answer questions about strategy, challenges, current market trends etc. Our goal here is to make sure that all our leaders are able to bring clarity to their teams so that local decisions are made with the right context.
  • 3-4 weeks before the planning event, the product management team starts curating the backlogs for the different product and system streams.  They create a “long list” of major features and work items and meet with stakeholders, customers and Bonial management to validate priorities. 
  • A week before planning the “long lists” are reduced to “short lists” of the highest priority items. This is probably the hardest part of the process and it requires saying “no” to things… we find that our stakeholders and customers all agree that discipline is needed so long as it mostly impacts other stakeholders and customers.  Over the years we’ve tried various formal mechanisms for prioritization – Weighted Shortest Job First, Feature Bucks, etc. – but in the end we find that different tools are needed for different situations and that, with experience, people often intuitively know the order.
  • Over the next week the product team spends time working through open questions and details while architects and engineers do the same on the technical side.  There are also generally some intense discussions about “bubble” items – features that are right on the cusp of making the list – as well as hot items that didn’t make the list.
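
For anyone who hasn’t run into Weighted Shortest Job First (WSJF), the mechanism mentioned above divides an item’s cost of delay by its job size and works on the highest scores first. Here is a minimal sketch of that ranking; the feature names and scores are invented for the example and are not from our backlog.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    cost_of_delay: int  # relative score: higher means more costly to postpone
    job_size: int       # relative effort estimate

    @property
    def wsjf(self) -> float:
        """WSJF score: cost of delay divided by job size."""
        return self.cost_of_delay / self.job_size


def prioritize(features: list) -> list:
    """Return features ordered from highest to lowest WSJF score."""
    return sorted(features, key=lambda f: f.wsjf, reverse=True)


# Hypothetical long list, for illustration only.
long_list = [
    Feature("big platform rewrite", cost_of_delay=13, job_size=13),
    Feature("small checkout fix", cost_of_delay=8, job_size=2),
    Feature("new reporting view", cost_of_delay=5, job_size=5),
]
for feature in prioritize(long_list):
    print(f"{feature.wsjf:4.1f}  {feature.name}")
```

The small, urgent item floats to the top, which is the whole point of the technique and also roughly what experienced people tend to conclude without the math.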

I wish I could say that this process was easy.  The truth is that a great deal changes in three months – new opportunities and challenges, unexpected curveballs – so we’re constantly challenged to re-assess our priorities with each planning cycle.  On top of that there’s a lot we want to do, so we find ourselves often having hard discussions up until the planning day, especially around the “bubble items”.  It’s not clear to me that there’s a much easier way – we’re in a fast industry and a complex business – but we try to get better each quarter.

So the primary inputs to planning are a short, discrete, prioritized set of epic-sized initiatives for each team.  Most of these are functional but there are usually some architectural or operational topics as well.  That brings us now to the actual planning days (typically a Th/F):

  • On planning day 1, we start with a team breakfast at 0900 and then a kickoff presentation at 0930.  The kickoff presentation covers the big picture goals for the quarter and a quick review of each team’s focus and top items so everyone has context.  We also cover logistics – where they can find flip-charts and stickies, who’s in which rooms, etc.
  • Following the kickoff (and the kitchen cleanup), the teams go to their planning spaces and get started.  Basically, they start with the top priority item, plan it through to completion, and then repeat with the next item.  Once they get to the allocated capacity they stop planning.  The remaining items simply don’t get done.
Teams plan with flip charts for each sprint and colored stickies for tasks, milestones, etc.
  • “Full capacity” is an interesting and oft debated question.  We have a loose agreement that teams should reserve ~20% for bugs and team discretion and should reserve another ~20% for refactoring and architecture work. 
  • As the teams are planning they’re also working with other teams on inbound and outbound dependencies.  We’ve organized the teams to minimize dependencies but they’re still a fact of life.  The teams negotiate how to support each other based on overall priorities and goals (ref. the “context” from the breakfast).  Any unresolved conflicts are escalated or raised at the review meeting (below).
  • At 4PM on the first day the scrum masters and other delivery managers get together to share their current plans with the group.  We use a web-based collaboration tool that allows each team to put virtual stickies on their assigned row with different colors illustrating milestones, spikes, tasks, releases, etc.  Dependencies are made visible by connecting two stickies with a line.  
Teams gather to review the day 1 draft plan.
  • Putting everything together allows us to visualize the major streams, see what made the cut and what didn’t, and address any dependency challenges or conflicts.  Generally there are several to-dos coming out of the review, primarily around working through dependencies or going to business stakeholders for clarification.
  • The morning of day 2 is primarily for making adjustments from the previous day, collaborating with other teams where combined efforts are needed and tying up loose ends.  Most teams wrap this up pretty early and then get back to their HIP sprint, others need most or all of the day.  
  • At 4PM on day two we grab a beer and get back together in front of the stickies board to review any changes from the previous day and discuss any unresolved conflicts.  This exercise typically goes much faster than the day 1 review.  At the end we check confidence and then head home for a much needed break.
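
The core planning loop described above (reserve capacity, then take items in priority order until the team is full) can be sketched in a few lines. This is an illustration under my own assumptions, not a tool we use: capacity is expressed in abstract points, the two ~20% reserves are treated as exact, and planning stops at the first item that no longer fits rather than skipping ahead to smaller ones.

```python
def plannable_capacity(total_points: int, bugs_pct: int = 20, refactoring_pct: int = 20) -> int:
    """Capacity left for planned items after the two reserves are set aside."""
    return total_points * (100 - bugs_pct - refactoring_pct) // 100


def plan_quarter(prioritized_items: list, capacity: int) -> tuple:
    """Walk the list in priority order; stop at the first item that does not fit."""
    planned = []
    remaining = capacity
    for index, (name, estimate) in enumerate(prioritized_items):
        if estimate > remaining:
            # Everything from here down simply doesn't get done this quarter.
            not_planned = [item_name for item_name, _ in prioritized_items[index:]]
            return planned, not_planned
        planned.append(name)
        remaining -= estimate
    return planned, []


# Hypothetical short list of (item, estimate) pairs, highest priority first.
short_list = [("epic A", 30), ("epic B", 25), ("epic C", 20), ("epic D", 5)]
capacity = plannable_capacity(100)
planned, not_planned = plan_quarter(short_list, capacity)
print("capacity:", capacity)
print("planned:", planned)
print("not planned:", not_planned)
```

Stopping at the first item that doesn’t fit keeps the priority order honest; a variant that back-fills with smaller items would use more capacity but could let low-priority work jump the queue.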

Here’s the final plan from last quarter.  

Q2 final plan

It looks complex and it is complex.  Without developing our process, our teams and ourselves over the last couple of years we’d be hard pressed to effectively manage this complexity.

Following the planning we package up the plan and communicate a high level, consumable version to the business and stakeholders.  We emphasize that these are our current targets and best estimates – this isn’t a contract.  We’ll do everything we can to stick to it but we may be surprised or, in good agile fashion, we may decide to make changes as the situation evolves.

So that brings us nearly full circle.  I started this series during our last planning days and expected it to be a quick post.  As I pulled the thread, however, I realized how much work had gone into our evolution in this area.  I could also see that a high-level flyover would leave huge gaps in the journey, so I decided to fly lower.   

You can see by now that undertaking a journey like this takes a fair amount of time, experience and honest self-evaluation, regardless of the specific methodology you choose.  That said, the investment is worth it, and a great deal of value can be realized even early in the process.

In Bonial’s case, we had a few advantages as we set off on the journey.  First, everyone was open to change, even when the change made them nervous.  The importance of this can’t be overstated.  I’ve lost count of the organizations I’ve worked with in which the teams had no motivation to improve (though paradoxically most of them complained constantly about the status quo).  In the end the team has to want to, or at least be willing to, give it a shot.  Which brings us to point two…

Second, we had good people and a healthy culture.  Where we lacked in experience and skills, we more than compensated by having a team of smart, energetic professionals.  With good people, you can generally solve any problem. 

Last, but not least, we have a skilled, SAFe-trained Release Train Engineer (RTE) to drive the process (though her role has evolved).  Even the finest orchestras of the world don’t play on their own – they have a conductor.  In our case the conductor/RTE ensures:

  • The stage is set. Everybody knows the timing, their roles and the rules of the game, and all the needed supplies are in place and easily accessible to everybody.
  • Short (really short!) list of candidates for planning is finalized before we start.  The RTE ensures we’re observing Work in Progress (WIP) constraints, which are critical to maximizing throughput.  As she often says, “Let’s stop starting things and start finishing things instead.”  
  • People know who to go to regarding priorities and impediments during planning.
  • The planning is properly wrapped up, all roadmaps and agreements put together, and outcomes are properly communicated to all key stakeholders.
  • Solid retrospectives are done both on the quarter itself as well as the planning process so we can continue improving.

Whew!  That was a lot of writing for me and reading for you.  Kudos if you made it this far – I hope it was worth it.  So now you know how we do it – feel free to share your own stories about how you and your teams plan.  Best of luck in your own journey!

(Special thanks to Irina Zhovtobrukh (the mysterious RTE) for her contributions to this post as well as teaching us how to “conduct” better planning evolutions.)

How we Plan at Bonial (part 2: competence)

In the previous two posts I talked about the importance of clarity and control, but even perfect clarity and unlimited control will likely still lead to failure and frustration if the team isn’t ready to take on these new responsibilities. That’s where Competence comes in.

To build competence across the team we invested in experienced practitioners as well as training and mentoring. We hired a talented SAFe-trained development manager (“Release Train Engineer” in SAFe parlance) to both lead our transformation as well as provide training and mentoring.  We brought in agile and SAFe trainers for multi-day training sessions on team and enterprise agile (more on SAFe in later posts).  We started leadership and management training for our product owners, new team leads and lead developers. The more experienced members of the team actively coached others in best practices.

Why go through all this trouble?  Simple – a common source of failure I’ve seen over the years is this: the fantasy that calling something ‘agile’ somehow makes it agile.  Too often I’ve seen organizations slap on the label of “scrum teams,” appoint a newly hired Scrum Master or Agile Coach, tell them to have stand-ups and sprints, and then hope that “agile happens”… a.k.a. “fake it until you make it”.  Good luck.  Like it or not, you have to invest in training, excellent people and experienced leadership.

A word of advice: don’t skimp on the training. Our first training session involved a half-day session for only key leaders. As we quickly learned, that’s not training – that’s just a teaser.  Frankly I was part of the problem – I needed to shift my attitude and accept that, unless the whole team is on-board and up-to-speed, we’d never be able to run at full speed.  Yes, it was expensive in both time and money, but necessary.  We’ve since opened up both the breadth and depth of the training.

We also learned by doing. We built on a strong culture of open and honest retrospectives and we actively shared the learnings between teams. We experimented with new techniques and, when they worked, spread them throughout the organization. We actively cultivated an environment of “low fear” so that people had space to learn and grow.

As a management team, we also worked hard to “specify goals, not methods” as part of the shift away from the Roadmap Committee described in the previous post. Why is this a competence topic? Because by forcing ourselves to stay out of the details we provided space for the teams to learn and grow. This also opened up room for lots of great ideas that may never have been voiced in a top-down approach.

Key takeaway: invest in training and regular, iterative experiential learning. Put your teams in positions where they need to stretch their knowledge and experience so that they have the context and confidence going forward to execute the mission (but actively support them as they learn).  And, as always, hire and retain great people.

One thing before we get back to the original topic – as I re-read these last three posts I can see how a reader might get the sense that we executed smoothly via a carefully orchestrated plan.  Not so.  There was trial-and-error, plenty of course adjustments and a mix of successes and failures.  That’s ok – it takes time.  What’s important is keeping your eye on the ultimate goal, being realistic and working together as a team to make it happen.

Ok, after a long detour through the background, back to the original topic…