Accelerated Velocity: Situational Awareness

“If a product or system chokes and it’s not being monitored, will anyone notice?”  Unlike the classic thought experiment, this tech version has a clear answer: yes.  Users will notice, customers will notice, and eventually your whole business will notice. 

No-one wants their first sign of trouble to be customer complaints or a downturn in the business, so smart teams invest in developing “situational awareness.”  What’s that?  Simple – situational awareness is the result of having access to the tools, data and information needed to understand and act on all of the moving factors relating to the “situation.”  The term is most often used in the context of crisis response and other fast-paced, high-risk endeavors, but it applies to business and network operations as well.

Product development teams most definitely need situational awareness.  The product managers and development leads need to know what their users are doing and how their systems are performing in order to make wise decisions – for example, should the next iteration focus on features, scale or stability?  Sadly, these same product teams often see the tracking and monitoring needed for developing situational awareness as “nice-to-haves” or something to be added when the mythical “someday” arrives. 

The result?  Users having good or bad experiences and no-one knowing either way.  Product strategy decisions being made on individual bias, intuition and incomplete snippets of information.  Not good.

Sun Tzu put it succinctly:

“If you know neither the enemy nor yourself, you will succumb in every battle.”

Situational awareness is a huge topic, so in this series I’m going to limit my focus to data collection (tracking and monitoring) and insights (analytics and visualization) at the product team level.  For the purposes of this series I’ll define “tracking” as the data and tools that show what users/customers are doing and “monitoring” as the data and tools that focus on system stability and performance.  Likewise I’ll use “analytics” to refer to tools that facilitate the conversion of data into usable intelligence and “visualization” for the tools that make that intelligence available to the right people at the right time.  I’ll cover monitoring in this article and tracking in a later article.

At Bonial in 2014 there was a feeling that things were fine – the software systems seemed to be reasonably stable and the users appeared happy.  Revenue was strong and the few broad indicators viewed by management seemed healthy.  Why worry?   

From a system stability and product evolution perspective it turns out there was plenty of reason to worry.  While some system-level monitoring was in place, there was little visibility into application performance, product availability or user experience.  Likewise our behavioral tracking was essentially limited to billing events and aggregated results in Google Analytics.  Perhaps most concerning: one of the primary metrics we had for feature success or failure was app store ratings.  Hmmm.

I wasn’t comfortable with this state of affairs.  I decided to start improving situational awareness around system health so I worked with Julius, our head of operations, to lay out a plan of attack.  We already had Icinga running at the system level as well as DataDog and Site24x7 running on a few applications – but they didn’t consistently answer the most fundamental question: “are our users having a good experience?” 

So we took some simple steps like adding new data collectors at critical points in the application stack.  Since full situation awareness requires that the insights be available to the right people at the right time, we also installed large screens around the office that showed a realtime stream of the most important metrics.  And then we looked at them (a surprisingly challenging final step). 

[Image: The Bonial NOC monitor wall]
[Image: One of my “go to” overviews of critical APIs, showing two significant problems during the previous day.]
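
To make “adding new data collectors at critical points in the application stack” a bit more concrete, here’s a minimal sketch of the kind of probe I mean: something that periodically hits a critical API, records availability and latency, and forwards the result to whatever metrics backend you already use.  The endpoints and the send_metric helper below are hypothetical placeholders, not our actual setup.

import time
import requests  # assumes the third-party "requests" library is installed

# Hypothetical critical endpoints -- substitute your own.
CRITICAL_ENDPOINTS = {
    "offers_api": "https://api.example.com/offers/health",
    "search_api": "https://api.example.com/search/health",
}

def probe(name, url, timeout=5.0):
    """Hit one endpoint and return a simple availability/latency measurement."""
    start = time.time()
    try:
        ok = requests.get(url, timeout=timeout).status_code == 200
    except requests.RequestException:
        ok = False
    latency_ms = (time.time() - start) * 1000.0
    return {"service": name, "ok": ok, "latency_ms": latency_ms}

def send_metric(measurement):
    """Placeholder: forward the measurement to your metrics backend
    (DataDog, Graphite, a homegrown collector, ...)."""
    print(measurement)

if __name__ == "__main__":
    while True:
        for name, url in CRITICAL_ENDPOINTS.items():
            send_metric(probe(name, url))
        time.sleep(60)  # one measurement per endpoint per minute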

The initial results weren’t pretty.  With additional visibility we discovered that the system was experiencing frequent degradations and outages.  In addition, we were regularly killing our own systems by overloading them with massive online marketing campaigns (for which we coined the term: “Self Denial of Service” or SDoS).  Our users were definitely not having the experience we wanted to provide.

(A funny side note: with the advent of monitoring and transparency, people started to ask: “why has the system become so unstable?”)

We had no choice but to respond aggressively.  We set up more effective alerting schemes as well as processes for handling alerts and dealing with outages.  Over time, we essentially set up a network operations center (NOC) with the primary responsibility of monitoring the systems and responding immediately to issues.  Though exhausting for those in the NOC (thank you), it was incredibly effective.  Eventually we transferred responsibility for incident detection and response to the teams (“you build it, you run it”), who then carried the torch forward.
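
For what it’s worth, the core of an “alerting scheme” doesn’t have to be fancy.  Here’s a toy sketch of the kind of rule we mean – watch a sliding window of check results and page someone when the failure rate crosses a threshold.  The thresholds and the notify_on_call function are made up for illustration.

from collections import deque

WINDOW_SIZE = 30               # look at the last 30 probe results
FAILURE_RATE_THRESHOLD = 0.05  # alert once more than 5% of them failed

recent_results = deque(maxlen=WINDOW_SIZE)

def record_result(ok):
    """Feed each probe result (True/False) into the sliding window."""
    recent_results.append(ok)

def should_alert():
    """True once the window is full and the failure rate crosses the threshold."""
    if len(recent_results) < WINDOW_SIZE:
        return False  # not enough data yet
    failures = sum(1 for ok in recent_results if not ok)
    return failures / float(len(recent_results)) > FAILURE_RATE_THRESHOLD

def notify_on_call(message):
    """Placeholder: page whoever is on duty (email, SMS, chat webhook, ...)."""
    print("ALERT:", message)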

Over the better part of the next year we invested enormous effort into triaging the immediate issues and then making design and architecture changes to fix the underlying problems.  This was very expensive as we tapped our best engineers for this mission.  But over time daily issues became weekly became monthly.  Disruptions became less frequent and planning could be done with reasonable confidence as to the availability of engineers.  Monitoring shifted from being an early warning system to a tool for continuous improvement. 

As the year went on, the stable system freed up our engineers to work on new capabilities instead of responding to outages.  This in turn became a massive contributor to our accelerated velocity.  Subsequent years were much the same – with continued investment in both awareness and tools for response, we confidently set and measured aggressive SLAs.  Our regular investment in this area massively reduced disruption.  We would never have been able to get as fast as we are today had we not made this investment.

We’ve made a lot of progress in situational awareness around our systems, but we still have a long way to go.  Despite the painful journey we’ve taken, it boggles my mind that some of our teams still push monitoring and tracking down the priority list in favor of “going fast”.  And we still have blind spots in our monitoring and alerting that allow edge-case issues – some very painful – to remain undetected.  But we learn and get better every time.

Some closing thoughts:

  • Ensuring sufficient situational awareness must be your top priority.  Teams can’t fix problems that they don’t know about.
  • Monitoring is not an afterthought.  SLAs and associated monitoring should be a required non-functional requirement (NFR) for every feature and project.
  • Don’t allow pain to persist – if there’s a big problem, invest aggressively in fixing it now.  If you don’t you’ll just compound the problem and demoralize your team.
  • Lead by example.  Know the system better than anyone else on the team.

 

In case you’re interested, the workhorses of our monitoring suite include the tools mentioned above – Icinga at the system level, plus DataDog and Site24x7 at the application level.

 

What I learned from writing an AI voice assistant and chat bot

I have a confession: despite being in management I still love to code.  Since I don’t get to program as much as I’d like or stay up on the latest trends and technologies, I set a goal for myself to learn at least one new technology every year (and more than one on a good year).  This learning hobby is how I made the leap from back-end to full-stack developer, how I learned iOS and Android, and how I stepped into the hallowed halls of Data Science.

This year I decided to explore chat bots and voice assistants.  As I learn best by doing, I generally think up a fun or useful project and then learn through building it.  For this project I decided to tackle an unending source of stress in my household: bickering and arguing over screen time for our kids.  

Enter ChronosBot

The idea behind ChronosBot is simple.  Parents set up screen-time accounts for each child as well as an automatic allowance that puts time in the accounts.  After linking their account to Alexa, Google Assistant, Facebook Messenger, etc., they can say or write things like, “Alexa, ask ChronosBot to withdraw 30 minutes from Axel’s account” or “… what’s everyone’s balance?”

With the idea in place, I had to choose my tech stack.  Google has a robust platform built on API.AI.  API.AI supports a dozen or so chat integrations (Allo, Messenger, Telegram, Kik, etc.) as well as a voice interface for Google Home, allowing developers to (theoretically) write one interface for both voice and chat.  At the time I started, Amazon Alexa had a rudimentary platform for speech dialog development using structured text.  In both platforms the interface designer creates “intents” that match what the user says to something the bot can do and then provides appropriate responses, and both platforms hand off the business logic to a backend app using web hooks.
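
To make the “intent” idea concrete, here’s a rough, platform-neutral sketch of what an intent definition and the webhook hand-off boil down to.  The intent names, utterances and slot names are made up for illustration – the real Alexa and API.AI schemas each have their own JSON formats.

# Illustrative only: a platform-neutral view of what an "intent" carries.
INTENTS = {
    "WithdrawTimeIntent": {
        "utterances": [
            "withdraw {minutes} minutes from {child}'s account",
            "take {minutes} minutes from {child}",
        ],
        "slots": ["child", "minutes"],
    },
    "GetBalanceIntent": {
        "utterances": ["what's everyone's balance", "how much time does {child} have"],
        "slots": ["child"],
    },
}

def withdraw_time(child, minutes):
    # Placeholder: debit the child's screen-time account in the database.
    return "Okay, I withdrew {} minutes from {}'s account.".format(minutes, child)

def get_balance(child):
    # Placeholder: look up the balance(s) in the database.
    return "Here are the current balances..."

def handle_webhook(intent_name, slots):
    """The platform resolves the user's speech to an intent plus slot values,
    then calls our backend via a web hook; this is where the business logic starts."""
    if intent_name == "WithdrawTimeIntent":
        return withdraw_time(slots["child"], int(slots["minutes"]))
    if intent_name == "GetBalanceIntent":
        return get_balance(slots.get("child"))
    return "Sorry, I didn't understand that."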

For the backend, I decided to sharpen my python skills and implement in Django on top of Postgres.  For deployment I decided to give Heroku a try.  
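
As a rough idea of the backend, the data model behind something like ChronosBot can stay very small.  This is a hypothetical Django sketch – the model and field names are illustrative, not the actual ChronosBot schema.

from django.conf import settings
from django.db import models

class Child(models.Model):
    """One screen-time account per child, owned by a parent's user account."""
    parent = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    name = models.CharField(max_length=50)
    balance_minutes = models.IntegerField(default=0)
    weekly_allowance_minutes = models.IntegerField(default=0)

class Transaction(models.Model):
    """Every deposit or withdrawal is recorded so balances can be audited."""
    child = models.ForeignKey(Child, on_delete=models.CASCADE)
    minutes = models.IntegerField()  # positive = deposit, negative = withdrawal
    reason = models.CharField(max_length=100, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)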

Development of the basic use cases took me a couple of weeks of late evenings and weekends.  I submitted to both Amazon and Google and waited for a week or so in each case for the review.  Both rejected my app, but for reasons that I hadn’t expected.  Amazon told me that my app violated the Alexa ToCs because it “targeted children” (huh?) and told me to never resubmit the app again (it seems they relented).  Google gave me the boot because my invocation name couldn’t be recognized properly, but a very helpful person from Google worked with me to resolve the issue and now it’s live.  

I’ve since continued development and added new features like “rewards” and “penalties” (requested by my wife) and “mystery bonus” (requested by the kids).  I’ve enabled Telegram and Messenger and have adapted the platform to support both visual and audio surfaces.  And the Alexa version was finally approved earlier this week.

Lessons Learned

So, what have I learned while navigating the ins and outs of the Google and Alexa development platforms and publication processes? 

1)  Amazon and Google have very different approaches.  Google has taken the bold approach of enabling all community developed actions and using an intent matching algorithm to route users to the correct action.  Amazon requires users to enable specific skills via a Skill Store.  In both cases, discovery is a largely unsolved challenge.

2)  Too early to tell who will be king.  Amazon Alexa has a crazy head start, but Google seems to have a more robust speech development platform.  With a zillion Android devices already on the market, one certainly can’t count Google out.  On the other hand, not a month seems to go by without a new Alexa form factor hitting the market.  

3)  It’s early days.  Both platforms are being developed at a lightning fast pace.  Google had a big head start with API.AI.  The original Alexa interface was frustratingly primitive, but they’ve since upgraded to a new UI (which suspiciously bears a strong resemblance to API.AI) that has great promise.  

I have to take my hat off to both companies for creating a paradigm and ecosystem that makes voice assistant and natural language development accessible to the broader development community.  It’s so straightforward that even my kids gave it a try – my daughter (10) developed “The Oracle,” which answers deeply profound questions like “Who’s awesome?” (she’s awesome).  My son (12) wrote a math quiz game with which he is happy to challenge anyone to beat his top score. 

4)  Conversational UX is easy; good conversational UX is really hard.  I’ve known this since I was involved with Nuance and the voice web in the late 1990s (and I also happen to be married to an expert in the space).  Making it easy to build a conversational UX is a very different thing from helping developers build a high-quality conversational UX (especially a Voice UX).  Both Amazon and Google have tried to address this with volumes of best practice documentation, but I expect most developers will ignore it.

5)  Conversational UX is limited.  There are some use cases that work for serial interactions (voice or chat) and some that work better in parallel interactions (visual).  Trying to force one into the other typically doesn’t make sense or only applies to “desperate users”.  You see the effect of this to some degree already in the Alexa Skill Store – there are some clear clusters evolving (home automation, information retrieval, quiz games).

6)  Multi-modal UX is the next natural step.  I’m very excited about the Amazon Echo Show as I expect that will unleash a wave of interesting multi-modal interaction paradigms.  

7)  It’s fun.  There’s just something about the natural language element of voice assistants that allows for a richer, more human interaction than what GUIs can provide.  

All in all I’m really excited about the potential of this space, and I’m not alone – just look at the growth of the Alexa Skills Store.  The tech press is also taking a critical look at these capabilities (e.g. a recent article featuring yours truly) and I expect most companies are at least thinking about how these capabilities will play in their business.  My company, Bonial, is investing in several actions/skills to explore the potential of voice and chat interfaces.  To date we’ve already launched a bot that allows users to search for local deals and will shortly launch a voice assistant interface to our shopping list app, Out of Milk.  We’ve learned a lot and we’ll share more on those projects in other posts.  

The Micro-service Conundrum

 

Micro-services have been the rage in software circles over the past couple of years.  A natural evolution of service-oriented architectures (SOA), popularized by successful implementations at companies like Spotify, SoundCloud and many others, micro-services have become the “must-have gadget this holiday season”: if you aren’t doing them, you must be doing something wrong.  

But is that true?  As much as people (and especially engineers) love black and white, the answer here is a firm “maybe.”  Here are some of the positives and negatives from one CTO’s perspective.

On the plus side, micro-service architectures provide an excellent canvas for rapid development and continuous integration.  Hard dependencies are minimized, business logic is localized, and the resulting services are typically cloud-ready.  Developers tend to like micro-services because they allow for a great deal of independence.  It’s hard to overstate the potential pain savings and optimizations – people, process and technology – that can be driven by moving to this type of architecture.

But it doesn’t come for free.  For starters, you’ll likely have a lot more moving pieces in terms of individual components and running executables.  A few weeks ago I wrote a post on the architectural heuristic: Simplify Simplify Simplify in which I posited that simple is better when it comes to minimizing TCO.  In that vein, one must ask if micro-services follow the rule.  Yes, each individual service itself is simpler than a bloated monolith as a result of the small size and tight boundaries.  But the total business logic in your enterprise hasn’t changed, and now you may have hundreds or thousands of additional code modules to manage and executables to orchestrate.  The good news is that cloud hosting providers like AWS provide an ever increasing set of tools to help with managing micro-service architectures (e.g. Lambda, Container Services), but it still requires a good deal of cultural and process change.

Another side effect of the proliferation of executables is a potential increase in cost – many hosting providers and software vendors (e.g. APM providers) still price based on the number of processes or agents.  If you take the same processing load and 10X the number of running processes, you might find yourself in a world of hurt pretty quickly.

Finally, in moving to micro-services, you’ll find yourself needing to address a host of new challenges that you may not have had to previously – service discovery, versioning, transactions and eventual consistency, event tracing, security, etc.  At a minimum, the upside benefits you’ll realize will be offset by developing competency and code to solve those new challenges.

So, what does this mean for the typical company?  If you have applications that are bloated monoliths, those are fantastic candidates for breaking down into smaller components or micro-services.  On the other hand, if you have a reasonably well architected system with decent boundaries already in place, I’d carefully weigh the costs and benefits – maybe run a few trial projects to get a better sense of how it would fit into your platform.  Just realize that in many ways you’re “squeezing the balloon” – trading one set of problems for another.  So long as you’re happier with the new problems (and the corresponding benefits), you win.

In closing, whether you move to micro-services or not, I do think there are great lessons to be learned from applying the discipline that micro-services require – namely, enforcing clear boundaries around business logic and using “API thinking” to serve a variety of clients.  I wonder if there isn’t a compromise to be had in which you use micro-service principles to develop and organize the code but still deploy in a more constrained manner – “Code Micro, Deploy Macro.”  But that’s a discussion for another time. 
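
As a teaser for that discussion, here’s a minimal sketch (with made-up domains) of what “Code Micro, Deploy Macro” could look like in Python: service-like boundaries in the code, a single deployable at runtime.

class UsersService:
    """'Micro' boundary: the rest of the app may only call these public methods."""

    def get_user(self, user_id):
        # Placeholder lookup; a real implementation would hit a database.
        return {"id": user_id, "name": "example user"}

class BillingService:
    """Depends on UsersService only through its public interface."""

    def __init__(self, users):
        self.users = users

    def charge(self, user_id, amount_cents):
        user = self.users.get_user(user_id)  # no reaching into users' internals
        return {"user": user["id"], "charged_cents": amount_cents}

# "Deploy Macro": both services run in one process / one deployable,
# but they could later be split into separate services with minimal code change.
users = UsersService()
billing = BillingService(users)
print(billing.charge(42, 999))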

Conversations with Amazon Alexa

(Warning: this article will delve into technical design and code topics – if you’re not in the mood to geek-out you might want to skip this one.)
 
I’m excited about Alexa and its siblings in the voice assistant space – the conversational, hands-free model will facilitate “micro moment” interactions to a degree that even mobile apps couldn’t match.  These new apps and interactions can be quite powerful, but as the saying goes – “with great power comes great responsibility.”  In this case the responsibility is to build voice interfaces that don’t suck, and that’s not trivial.  We’ve all used bank or airline automated systems that infuriated us – by being confusing, by wasting our time, or by leaving us stuck in “IVR hell,” unable to make ourselves understood or to get where we want to be.
 
Fortunately there are solutions.  First, there is a UX specialty known as Voice User Interface Design (VUI Design), whose practitioners are highly skilled in the art, science, psychology, sociology and linguistics required to craft quality speech interactions.  Unfortunately they are rare and will likely be in extremely high demand as voice assistant skills blossom.
 
Second, there are online frameworks for developing speech interactions that fill much the same role as bumpers at the bowling alley – they won’t make you a better bowler, but they’ll protect you from some of the most egregious mistakes.  Perhaps the best tool on the market today is API.AI, which is primarily a natural language interpretation engine that can be the brains behind a variety of conversation interfaces – chat bots like Facebook Messenger and Telegram, voice assistants like Google Home, etc.
 
The Alexa Skills Kit (ASK) also comes with an online tool for developing interactions, but it’s quite primitive and cumbersome to use for anything but the simplest of skills.  Probably the biggest gap in the ASK is the lack of support for “slot filling.”  Slot filling is what speech interfaces do when they don’t get all the info needed to complete a task.  For example, let’s say you’re developing a movie ticket purchase skill.  In a perfect world every user would properly say, “I’d like two adult tickets to the 5:00 PM showing of Star Wars today.”  Given that our users will be rude and not behave the way we want them to, it’s likely they’ll say something like, “I want two tickets to Star Wars.”  It’s our skill’s responsibility to discover the [ ticket type ], [ showtime ], and [ show date ].  Our skill would likely next ask the user: “How many tickets do you want to buy?” and so on.  That’s slot filling.
 
Alexa provides no native tools for managing slot filling, so it’s left to the developer to implement the functionality in their own service (which Alexa calls via “web hooks”).  Here’s an approach we use here at Bonial:
 
  • Create a Conversation object (AlexaConversation) that encapsulates the current state of the dialog and the business logic for determining next steps.  The constructor takes the request model from Alexa, which includes a “Session” context.   Conversations expose three methods:
    1. get_status() – whether the current dialog is complete or not
    2. get_next_slot() – if the dialog is not complete, which slot needs to be filled next
    3. get_session_context() – the new session context JSON to be sent back to Alexa (and then returned to the app on the next call) – basically the dialog state
from abc import ABCMeta, abstractmethod


class Conversation:
    # Python 2 idiom; with Python 3 you would instead declare `class Conversation(ABC)`.
    __metaclass__ = ABCMeta

    model = None
    status = None
    type = None

    # pass in the underlying model or data needed to assess the current state of the dialog
    def __init__(self, model):
        self.model = model

    @abstractmethod
    def get_status(self):
        """Whether the current dialog is complete or not."""
        raise NotImplementedError

    @abstractmethod
    def get_next_slot(self):
        """If the dialog is not complete, which slot needs to be filled next."""
        raise NotImplementedError

    @abstractmethod
    def get_session_context(self):
        """The new session context (dialog state) to send back to Alexa with the response."""
        raise NotImplementedError
  • When a request from Alexa arrives, we simply create an AlexaConversation with the request JSON and ask whether the current dialog is complete or not.  If it is complete, we then pass the dialog to the business logic layer for interpretation and processing (more on this later).  If not complete, we respond to Alexa with a prompt asking for the next slot.  Repeat.  A simplified sketch of this flow follows below.
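
To illustrate, here’s a simplified, hypothetical version of how the pieces can fit together.  The slot names, prompts and helper functions are illustrative only – not our production AlexaConversation – and the request/session structure is reduced to the bits that matter for slot filling.

REQUIRED_SLOTS = ["child", "minutes"]   # illustrative slots for a "withdraw time" dialog
PROMPTS = {
    "child": "Whose account should I use?",
    "minutes": "How many minutes?",
}

class AlexaConversation(Conversation):
    """model is the request JSON from Alexa; previously collected slots ride along in the session."""

    def _slots(self):
        # merge slots remembered in the session with slots in the current request
        remembered = self.model.get("session", {}).get("attributes", {})
        incoming = self.model.get("request", {}).get("intent", {}).get("slots", {})
        merged = dict(remembered)
        for name, slot in incoming.items():
            if slot.get("value"):
                merged[name] = slot["value"]
        return merged

    def get_status(self):
        return "COMPLETE" if self.get_next_slot() is None else "INCOMPLETE"

    def get_next_slot(self):
        filled = self._slots()
        for slot in REQUIRED_SLOTS:
            if slot not in filled:
                return slot
        return None

    def get_session_context(self):
        return self._slots()  # sent back to Alexa and returned to us on the next call

def ask(prompt, session_context):
    # Placeholder: build the Alexa response JSON carrying the prompt and session attributes.
    return {"prompt": prompt, "sessionAttributes": session_context}

def fulfill(slots):
    # Placeholder: hand the completed dialog to the business logic layer.
    return {"prompt": "Done.", "sessionAttributes": {}}

def handle_request(alexa_request):
    conversation = AlexaConversation(alexa_request)
    if conversation.get_status() == "COMPLETE":
        return fulfill(conversation.get_session_context())
    next_slot = conversation.get_next_slot()
    return ask(PROMPTS[next_slot], conversation.get_session_context())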
 
So far it’s working well and reduces the complexity of the processing code.  Unfortunately both the dialog rules (how many slots, which are required, in which order) and the slot prompts currently live in the code.  Our next step will be to move both of these into a declarative format so the VUI designers have the flexibility to edit them without involving the coders.
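
That declarative format might end up as simple as a data structure like the one below (sketched here as Python, though YAML or JSON would work just as well).  It’s hypothetical, but it shows the idea: the dialog rules and prompts become data that a VUI designer can edit, driven by one generic slot-filling routine.

# Hypothetical declarative dialog definition, editable without touching handler code.
WITHDRAW_DIALOG = {
    "slots": [
        {"name": "child",   "required": True, "prompt": "Whose account should I use?"},
        {"name": "minutes", "required": True, "prompt": "How many minutes?"},
    ],
}

def next_unfilled_slot(dialog, filled_slots):
    """Generic driver: return the first required slot that is still missing."""
    for slot in dialog["slots"]:
        if slot["required"] and slot["name"] not in filled_slots:
            return slot
    return None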
 
We assume this will be a stop-gap until the ASK and other resources have proper slot-filling capabilities.  We’d also love to hear how you’re approaching this challenge.

What a difference a decade makes…

I frequently fly transatlantic as part of my job.  Over the past few years I’ve been excited to see airlines (Delta, Lufthansa, Air Berlin) begin to offer two things: (1) in-seat AC power and (2) internet access throughout the flight.  Now I can run my laptop the entire flight and, on daytime flights, stay connected with my team back in Berlin.
 
Last week I was fortunate to be rerouted from a Delta codeshare KLM flight (no power, no internet) onto Lufthansa (power, internet).  On the daytime flight from Frankfurt to Chicago I spent nine hours of blissful time catching up on a ton of work that required online access.  I was able to Slack with my team the whole time, send emails, and work on shared documents.  At one point I was working on a prototype of a voice assistant project – the IDE was running on my laptop and deploying code to Heroku, I was using API.AI to develop the natural language interface, and I used the Amazon Alexa Skills Kit to generate sample Alexa calls.  Traffic was constantly flowing between all of the nodes.  All from my seat on the plane.
 
Ten years ago we didn’t have smart phones.  We were just a few years past modems.  Streaming media was mostly a dream. There certainly wasn’t wifi on planes.
 
The jury is still out on whether I’ll miss the eight hours of uninterrupted quiet time on planes binge-watching movies that I probably didn’t want to see – there’s certainly something to be said for being unplugged.  But I sure as heck like the option to stay connected.
 
What a difference a decade makes.  It makes me wonder what the next decade will bring.  Can’t wait – should be a wild ride.

Here to Stay

As we slide into 2017, speech recognition is all the rage – it was the darling of CES and you can’t pick up a business or tech journal without reading about the phenomenon.  The sudden explosion of interest and coverage could lead one to assume this is yet another hype bubble waiting to burst.  I don’t think so, and I’ll explain why below.  But first let’s roll back the calendar and look at the evolution of the technology… 2016… 2015… 2014… 2010… 2005…

In the late 1990’s and early 2000’s, speech recognition was a hot topic.  Powered by early versions of machine learning algorithms and improvements in processing power, and boosted by the advent of VoiceXML which brought web methods to voice, pundits preached of the day when a “voice web” would rise to rival the dot-com bubble.

It never happened.

Implementors quickly found the Achilles’ heel of speech interfaces: the single-dimensional information stream provided by audio was no match for the two-dimensional visual presentation of the web and apps – it was simply too cumbersome to consume large quantities of information via audio.  This relegated speech to “desperation” scenarios where a visual interface simply wasn’t an option, or to scenarios automating human agents (e.g. call centers).

Fast forward a decade and a half to Siri, which, for all that it’s been maligned, was a watershed moment for speech.  It came with reasonable accuracy and with a set of natural use cases (hands-free driving, message dictation, cold-weather hands-free operation, etc.).  It took speech mainstream.

What Siri started, Amazon Echo took to the next level.  Rather than requiring the user to interrupt the natural flow of their lives to fiddle with their phone, Alexa is always on and ready to go (so long as you’re near it, of course).  This means Alexa enables Micro-Moments and integrates into one’s normal flow of life.  The importance of that can’t be overstated.

Over the last six months the other tech giants have started falling over themselves to respond to the market success of Echo and the surprising stats coming in from the market: 20% of mobile searches via speech, 42% of people using voice assistants, etc.  Google recently released “Home” and is plugging Assistant into its Pixel phone and other devices.  Facebook and others are trailing close behind.  And Apple is working to regain its early lead by freeing Siri from the confines of the phone and laptop.

So where’s it all going?

To speculate on that we should probably look at why consumer speech recognition is succeeding this time.  First, improvements in processing power and neural network / “deep learning” algorithms dropped the cost and radically improved the accuracy of speech recognition.  This has allowed speech + AI to subtly creep into more and more user-facing apps (e.g. Google Translate), which both conditioned users and helped train the speech engines.  The technology is still limited to single-dimensional streams, but the enormous popularity of chat and, more recently, bots shows that there is plenty of attraction to this single dimension.

But speech is still limited – for example, the best recognition engines need a network connection to use cloud resources, and the noisy environments common to cityscapes continue to confound recognition engines.  This is why the Echo approach is genius – an always-on device with a great microphone in a (somewhat) quiet environment.  But will speech move beyond the use cases of playing music or getting the weather?

Yes.  Advanced headphones like the Apple AirPods will expand “always on” beyond the home.  Improved algorithms will handle noisy environments.  But perhaps most important – multi-modal interfaces are now eminently possible.

What’s multi-modal?  Basically an interaction that spans multiple interfaces simultaneously.  For example, you may start an interaction via voice but complete it via a mobile device – like asking your voice assistant to read your email headers but then forwarding an email of interest to the nearest screen to be read and responded to.  Fifteen years ago there simply weren’t many options for bouncing between speech and graphical interfaces.  Today, the ubiquity of connected smartphones changes the equation.

Will this be the last big wave of speech?  No.  Until the speech interface is backed by full AI it can’t reach its full potential.  Likewise there’s still a lot of runway in terms of interface devices.  But this time it’s here to stay.