Conversations with Amazon Alexa

(Warning: this article will delve into technical design and code topics – if you’re not in the mood to geek-out you might want to skip this one.)
 
I’m excited about Alexa and its siblings in the voice assistant space – the conversational hands-free model will facilitate “micro moment” interactions to a degree that even mobile apps couldn’t.  These new apps and interactions can be quite powerful, but as the saying goes – “with great power comes great responsibility.”  In this case the responsibility is to build voice interfaces that don’t suck, and that’s not trivial.  We’ve all used a bank’s or airline’s automated system that has infuriated us, either by being confusing, by wasting our time, or by leaving us stuck in “IVR hell”, unable to understand us or get us to where we want to be.
 
Fortunately there are solutions.  First, there is a UX specialty known as Voice User Interface Design (VUI Design) whose practitioners are highly skilled in the art, science, psychology, sociology and linguistics required to craft quality speech interactions.  Unfortunately they are rare and will likely be in extremely high demand as voice assistant skills blossom.
 
Second, there are online frameworks for developing speech interactions that fill much the same role as bumpers at the bowling alley – they won’t make you a better bowler, but they’ll protect you from some of the most egregious mistakes.  Perhaps the best tool on the market today is API.AI, which is primarily a natural language interpretation engine that can be the brains behind a variety of conversation interfaces – chat bots like Facebook Messenger and Telegram, voice assistants like Google Home, etc.
 
The Alexa Skills Kit (ASK) also comes with an online tool for developing interactions, but it’s quite primitive and cumbersome to use for anything but the simplest of skills.  Probably the biggest gap in the ASK is the lack of support for “slot filling”.  Slot filling is what speech interfaces do when they don’t get all the info needed to complete a task.  For example, let’s say you’re developing a movie ticket purchase skill.  In a perfect world every user would properly say, “I’d like two adult tickets to the 5:00 PM showing of Star Wars today.”  Given that our users will be rude and not behave the way we want them to, it’s likely they’ll say something like, “I want two tickets to Star Wars.”  It’s our skill’s responsibility to discover the [ ticket type ], [ showtime ], and [ show date ].  Our skill would likely next ask the user: “What type of tickets do you want to buy?” and so on.  That’s slot filling.
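To make the idea concrete, here is a minimal sketch of the core check behind slot filling: compare what the user has said so far against what the task needs.  The slot names and their order are made up for this movie-ticket example and are not part of any Alexa API.

```python
# Hypothetical slot names for the movie-ticket example, in the order
# we would want to ask for them.
REQUIRED_SLOTS = ["movie", "ticket_count", "ticket_type", "showtime", "show_date"]


def missing_slots(filled):
    """Return the required slots that have no value yet, in prompt order."""
    return [name for name in REQUIRED_SLOTS if not filled.get(name)]


# "I want two tickets to Star Wars."
heard = {"movie": "Star Wars", "ticket_count": "2"}
remaining = missing_slots(heard)
```

Here `remaining` holds the three slots the skill still has to ask about, starting with the ticket type.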
 
Alexa provides no native tools for managing slot filling, so it’s left to the developer to implement the functionality in their own service (which Alexa calls via “web hooks”).  Here’s an approach we use here at Bonial:
 
  • Create a Conversation object (AlexaConversation) that encapsulates the current state of the dialog and the business logic for determining next steps.  The constructor takes the request model from Alexa, which includes a “Session” context.   Conversations expose three methods:
    1. get_status() – whether the current dialog is complete or not
    2. get_next_slot() – if the dialog is not complete, which slot needs to be filled next
    3. get_session_context() – the new session context JSON to be sent back to Alexa (and then returned to the app on the next call) – basically the dialog state
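from abc import ABC, abstractmethod


class Conversation(ABC):
    """Base class for a dialog; subclasses hold the state and the next-step logic."""

    # Defaults; model is set per instance, status and type are left to subclasses.
    model = None
    status = None
    type = None

    def __init__(self, model):
        # Pass in the underlying model or data needed to assess the current state of the dialog.
        self.model = model

    @abstractmethod
    def get_status(self):
        """Return whether the current dialog is complete or not."""

    @abstractmethod
    def get_next_slot(self):
        """Return the slot to fill next if the dialog is not complete."""

    @abstractmethod
    def get_session_context(self):
        """Return the new session context (the dialog state) to send back to Alexa."""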
  • When a request from Alexa arrives, we simply create an AlexaConversation with the request JSON and ask whether the current dialog is complete or not.  If it is complete, we then pass the dialog to the business logic layer for interpretation and processing (more on this later).  If it is not complete, we respond to Alexa with a prompt to ask for the next slot.  Repeat.
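
The two steps above could look roughly like the sketch below.  It is an illustration under assumptions, not our production code: the slot names, the prompts, the status strings and the `process_order` callback are all hypothetical, and the class is written standalone (rather than extending Conversation) so the example runs on its own.  The request and response shapes follow the Alexa custom-skill JSON format (`session.attributes`, `request.intent.slots`, `sessionAttributes`, `outputSpeech`).

```python
# Hypothetical prompts, keyed by slot name in the order we ask for them.
PROMPTS = {
    "movie": "Which movie would you like to see?",
    "ticket_count": "How many tickets do you want to buy?",
    "ticket_type": "What type of tickets do you want?",
    "show_date": "Which day would you like to go?",
    "showtime": "Which showtime would you like?",
}


class AlexaConversation:
    """Dialog state rebuilt from one Alexa request (would extend Conversation)."""

    def __init__(self, model):
        self.model = model
        # Slots collected on earlier turns come back in the session attributes.
        session = model.get("session") or {}
        self.slots = dict(session.get("attributes") or {})
        # Merge in whatever slots Alexa recognized on this turn.
        intent = (model.get("request") or {}).get("intent") or {}
        for name, slot in (intent.get("slots") or {}).items():
            if slot.get("value"):
                self.slots[name] = slot["value"]

    def get_status(self):
        return "complete" if self.get_next_slot() is None else "incomplete"

    def get_next_slot(self):
        # First slot in prompt order that still has no value, or None.
        for name in PROMPTS:
            if name not in self.slots:
                return name
        return None

    def get_session_context(self):
        return dict(self.slots)


def handle_request(request_json, process_order):
    """Build the Alexa response: either the final answer or the next prompt."""
    conversation = AlexaConversation(request_json)
    complete = conversation.get_status() == "complete"
    if complete:
        # process_order stands in for the business logic layer.
        text = process_order(conversation.get_session_context())
    else:
        text = PROMPTS[conversation.get_next_slot()]
    return {
        "version": "1.0",
        "sessionAttributes": conversation.get_session_context(),
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": complete,
        },
    }
```

Because the collected slots travel back and forth in the session attributes, the service itself stays stateless between turns.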
 
So far it’s working well and reduces the complexity of the processing code.  Unfortunately both the dialog rules (how many slots, which are required, in which order) and the slot prompts live in the code.  Our next step will be to move both of these into a declarative format so the VUI designers will have the flexibility to edit them without involving the coders.
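
As a rough idea of what such a declarative format might look like, the sketch below keeps the rules and prompts in a JSON document and leaves only a small generic lookup in code.  The schema (`intent`, `slots`, `required`, `prompt`) is invented for illustration; we have not settled on a format.

```python
import json

# Hypothetical dialog definition a VUI designer could edit without touching code.
# In practice this would live in its own file rather than in a string.
DIALOG_SPEC = json.loads("""
{
  "intent": "BuyTickets",
  "slots": [
    {"name": "movie",        "required": true,  "prompt": "Which movie would you like to see?"},
    {"name": "ticket_count", "required": true,  "prompt": "How many tickets do you want to buy?"},
    {"name": "show_date",    "required": true,  "prompt": "Which day would you like to go?"},
    {"name": "showtime",     "required": true,  "prompt": "Which showtime would you like?"},
    {"name": "ticket_type",  "required": false, "prompt": "What type of tickets do you want?"}
  ]
}
""")


def next_prompt(spec, filled):
    """Return (slot name, prompt) for the first required slot still missing, else None."""
    for slot in spec["slots"]:
        if slot["required"] and slot["name"] not in filled:
            return slot["name"], slot["prompt"]
    return None
```

With this split, reordering the questions or rewording a prompt becomes an edit to the JSON rather than a code change.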
 
We assume this will be a stop-gap until the ASK and other resources have proper slot-filling capabilities.  We’d also love to hear how you’re approaching this challenge.

What a difference a decade makes…

I frequently fly transatlantic as part of my job.  Over the past few years I’ve been excited to see airlines (Delta, Lufthansa, Air Berlin) begin to offer two things: (1) in-seat AC power and (2) internet access throughout the flight.  Now I can run my laptop the entire flight and, on daytime flights, stay connected with my team back in Berlin.
 
Last week I was fortunate to be rerouted from a Delta codeshare KLM flight (no power, no internet) onto Lufthansa (power, internet).  On the daytime flight from Frankfurt to Chicago I spent nine blissful hours catching up on a ton of work that required online access.  I was able to Slack with my team the whole time, send emails, and work on shared documents.  At one point, I was working on a prototype of a voice assistant project – the IDE was running on my laptop and deploying code to Heroku, I was using API.AI to develop the natural language interface, and I was using the Alexa Skills Kit (ASK) to generate sample Alexa calls.  Traffic was constantly flowing between all of the nodes.  All from my seat on the plane.
 
Ten years ago we didn’t have smart phones.  We were just a few years past modems.  Streaming media was mostly a dream. There certainly wasn’t wifi on planes.
 
The jury is still out on whether I’ll miss the eight hours of uninterrupted quiet time on planes binge-watching movies that I probably didn’t want to see – there’s certainly something to be said for being unplugged.  But I sure as heck like the option to stay connected.
 
What a difference a decade makes.  It makes me wonder what the next decade will bring.  Can’t wait – should be a wild ride.