
(Warning: this article will delve into technical design and code topics – if you’re not in the mood to geek out you might want to skip this one.)
I’m excited about Alexa and its siblings in the voice assistant space – the conversational hands-free model will facilitate “micro moment” interactions to a degree that even mobile apps couldn’t. These new apps and interactions can be quite powerful, but as the saying goes – “with great power comes great responsibility.” In this case the responsibility is to build voice interfaces that don’t suck, and that’s not trivial. We’ve all used a bank’s or airline’s automated system that has infuriated us, either by being confusing, by wasting our time, or by leaving us stuck in “IVR hell” with a system that can’t understand us or get us where we want to be.
Fortunately there are solutions. First, there is a UX specialty known as Voice User Interface Design (VUI Design) whose practitioners are highly skilled in the art, science, psychology, sociology and linguistics required to craft quality speech interactions. Unfortunately they are rare and will likely be in extremely high demand as voice assistant skills blossom.
Second, there are online frameworks for developing speech interactions that fill much the same role as bumpers at the bowling alley – they won’t make you a better bowler, but they’ll protect you from some of the most egregious mistakes. Perhaps the best tool on the market today is API.AI, which is primarily a natural language interpretation engine that can be the brains behind a variety of conversation interfaces – chat bots like Facebook Messenger and Telegram, voice assistants like Google Home, etc.
The Alexa Skills Kit (ASK) also comes with an online tool for developing interactions, but it’s quite primitive and cumbersome to use for anything but the simplest of skills. Probably the biggest gap in the ASK is the lack of support for “slot filling”. Slot filling is what speech interfaces do when they don’t get all the info needed to complete a task. For example, let’s say you’re developing a movie ticket purchase skill. In a perfect world every user would properly say, “I’d like two adult tickets to the 5:00 PM showing of Star Wars today.” Given that our users will be rude and not behave the way we want them to, it’s likely they’ll say something like, “I want two tickets to Star Wars.” It’s our skill’s responsibility to discover the [ ticket type ], [ showtime ], and [ show date ]. Our skill would likely next ask the user: “What type of tickets would you like?” and so on. That’s slot filling.
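To make the idea concrete, here is a minimal sketch of slot filling for the movie ticket example. The slot names, the prompts, and the `next_prompt` helper are all assumptions made up for illustration; none of them come from an Alexa API.

```python
# Hypothetical slot names, in the order we would ask for them.
REQUIRED_SLOTS = ["movie", "ticket_count", "ticket_type", "showtime", "show_date"]

# Hypothetical prompts, one per slot.
PROMPTS = {
    "movie": "Which movie would you like to see?",
    "ticket_count": "How many tickets do you want to buy?",
    "ticket_type": "What type of tickets would you like?",
    "showtime": "Which showtime would you like?",
    "show_date": "Which day would you like to go?",
}


def next_prompt(filled_slots):
    """Return the prompt for the first missing slot, or None if all are filled."""
    for name in REQUIRED_SLOTS:
        if filled_slots.get(name) is None:
            return PROMPTS[name]
    return None


# "I want two tickets to Star Wars" fills only two of the five slots.
heard = {"movie": "Star Wars", "ticket_count": 2}
print(next_prompt(heard))  # prints: What type of tickets would you like?
```

Each answer from the user adds one more entry to `heard`, and the loop repeats until `next_prompt` returns `None`.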
Alexa provides no native tools for managing slot filling, so it’s left to the developer to implement the functionality on their own service (which Alexa calls via “web hooks”). Here’s an approach we use here at Bonial:
- Create a Conversation object (AlexaConversation) that encapsulates the current state of the dialog and the business logic for determining next steps. The constructor takes the request model from Alexa, which includes a “Session” context. Conversations expose three methods:
- get_status() – whether the current dialog is complete or not
- get_next_slot() – if the dialog is not complete, which slot needs to be filled next
- get_session_context() – the new session context JSON to be sent back to Alexa (and then returned to the app on the next call) – basically the dialog state
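    from abc import ABC, abstractmethod


    class Conversation(ABC):
        """Abstract dialog: holds the current state and decides the next step."""

        model = None
        status = None
        type = None

        # pass in the underlying model or data needed to assess
        # the current state of the dialog
        def __init__(self, model):
            self.model = model

        @abstractmethod
        def get_status(self):
            """Return whether the current dialog is complete or not."""

        @abstractmethod
        def get_next_slot(self):
            """Return the slot that needs to be filled next, if any."""

        @abstractmethod
        def get_session_context(self):
            """Return the new session context to send back to Alexa."""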
- When a request from Alexa arrives, we simply create an AlexaConversation with the request JSON and ask whether the current dialog is complete or not. If it is complete, we then pass the dialog to the business logic layer for interpretation and processing (more on this later). If it is not complete, we respond to Alexa with a prompt to ask for the next slot. Repeat.
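The flow above could look roughly like the sketch below. This is not our production code: the slot names, prompts, status strings, `get_prompt`, `handle_request`, and the `process_order` callback are hypothetical, and the request and response dictionaries assume the usual shape of Alexa's custom skill JSON (`session.attributes`, `request.intent.slots`, `sessionAttributes`, `outputSpeech`).

```python
from abc import ABC, abstractmethod


class Conversation(ABC):
    def __init__(self, model):
        self.model = model

    @abstractmethod
    def get_status(self): ...

    @abstractmethod
    def get_next_slot(self): ...

    @abstractmethod
    def get_session_context(self): ...


class AlexaConversation(Conversation):
    # Hypothetical dialog rules: required slots in asking order, with prompts.
    SLOTS = [
        ("ticket_type", "What type of tickets would you like?"),
        ("showtime", "Which showtime would you like?"),
        ("show_date", "Which day would you like to go?"),
    ]

    def __init__(self, model):
        super().__init__(model)
        # Start from the slots remembered in the session context...
        session = model.get("session", {})
        self.filled = dict(session.get("attributes") or {})
        # ...then add any slots the user supplied in this request.
        intent = model.get("request", {}).get("intent", {})
        for name, slot in (intent.get("slots") or {}).items():
            if slot.get("value") is not None:
                self.filled[name] = slot["value"]

    def get_next_slot(self):
        for name, _prompt in self.SLOTS:
            if name not in self.filled:
                return name
        return None

    def get_status(self):
        return "complete" if self.get_next_slot() is None else "incomplete"

    def get_session_context(self):
        return dict(self.filled)

    def get_prompt(self, slot_name):
        return dict(self.SLOTS)[slot_name]


def handle_request(request_json, process_order):
    """Build an Alexa response: finish the order or ask for the next slot."""
    conversation = AlexaConversation(request_json)
    if conversation.get_status() == "complete":
        # Hand the filled slots to the (hypothetical) business logic layer.
        text, end = process_order(conversation.get_session_context()), True
    else:
        text, end = conversation.get_prompt(conversation.get_next_slot()), False
    return {
        "version": "1.0",
        "sessionAttributes": conversation.get_session_context(),
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end,
        },
    }
```

Because the filled slots travel back and forth in the session context, the service itself stays stateless between calls.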
So far it’s working well and reduces the complexity of the processing code. Unfortunately, both the dialog rules (how many slots, which are required, in which order) and the slot prompts live in the code. Our next step will be to move both of these into a declarative format so the VUI designers will have the flexibility to edit them without involving the coders.
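We haven't settled on a format yet, but as a rough sketch of what a declarative version might look like, the rules and prompts could sit in a small JSON document that the code only interprets. The field names (`intent`, `slots`, `required`, `prompt`) and both helper functions are assumptions for illustration.

```python
import json

# Hypothetical dialog definition; in practice this might live in a separate
# file that VUI designers can edit without touching the code.
DIALOG_SPEC = """
{
  "intent": "BuyTickets",
  "slots": [
    {"name": "movie", "required": true, "prompt": "Which movie would you like to see?"},
    {"name": "ticket_type", "required": true, "prompt": "What type of tickets would you like?"},
    {"name": "showtime", "required": true, "prompt": "Which showtime would you like?"},
    {"name": "seat_preference", "required": false, "prompt": "Do you have a seat preference?"}
  ]
}
"""


def load_required_slots(spec_text):
    """Parse the spec and keep the required slots in their declared order."""
    spec = json.loads(spec_text)
    return [slot for slot in spec["slots"] if slot["required"]]


def next_prompt(required_slots, filled):
    """Return the prompt for the first unfilled required slot, or None."""
    for slot in required_slots:
        if slot["name"] not in filled:
            return slot["prompt"]
    return None


required = load_required_slots(DIALOG_SPEC)
print(next_prompt(required, {"movie": "Star Wars"}))
# prints: What type of tickets would you like?
```

Changing the order of questions or the wording of a prompt then becomes an edit to the JSON rather than a code change.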
We assume this will be a stop-gap until the ASK and other resources have proper slot-filling capabilities. We’d also love to hear how you’re approaching this challenge.