A pioneer's guide to Alexa
Have you seen Star Trek? If you haven’t, you should. One of the pieces of future tech that is quietly on display throughout the show is the ability to talk to the computer. Whether it’s asking the computer where someone is or ordering a cup of Earl Grey tea, the computer has no problem understanding the questions it is asked, and who’s asking them.
Amazon’s Alexa products are pitched as this shining vision of the future! Not only that, you can write your own apps for the platform. Amazing! Right? Well, we’re not quite there yet. So, what are the challenges and limitations? Is there anything we can do to hack our way through the tough parts? Can we achieve our dreams even if we emerge a bit bloody and beaten? Let's find out.
Creating your own Apollo
I’m going to focus on the technical challenges of creating rich Alexa Skills.
An Alexa Skill definition has a few core components:
- Intents: Actions users can do with your Skills
- Utterances: Words and phrases that invoke the Intents
- Slots: A selection of possible words to pick out of an Utterance
- Logic Service: Takes the above, does a think, and returns speech. Usually an AWS Lambda
Let’s say we want to define a skill that turns a light on and off. What would that look like?
LightIntent turn light {state}
We state the Intent first, then the Utterance, and we use the Slots mechanism to allow Alexa to pick out the state. In a separate place we define that the {state} Slot can be either the word On or Off, restricting Alexa to listening out for just those two words.
Our Lambda then receives the Intent, as well as any Slot data, which here will either be on or off. This is similar to a Redux action, where you have an Action Type and a Payload of optional data.
There are a few more bits and pieces you’d need to set up, like an Intent Schema.
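To make the Lambda side concrete, here’s a minimal sketch of a handler for the light example, working with the raw Alexa request and response JSON rather than any SDK. The set_light_state helper is hypothetical -- whatever actually flips your light would go there.

```python
# Minimal sketch of the Lambda behind the LightIntent example.
# It reads the raw Alexa Skills Kit request JSON directly; set_light_state()
# is a hypothetical stand-in for whatever actually controls the light.

def set_light_state(on):
    """Hypothetical helper: swap in your real light-switching logic."""
    print("Light on" if on else "Light off")


def lambda_handler(event, context):
    request = event.get("request", {})
    speech = "Sorry, I didn't catch that."

    if request.get("type") == "IntentRequest" and request["intent"]["name"] == "LightIntent":
        # The {state} Slot arrives alongside the Intent, much like a Redux payload.
        state = request["intent"]["slots"]["state"].get("value", "").lower()

        if state in ("on", "off"):
            set_light_state(state == "on")
            speech = "Okay, turning the light {}.".format(state)
        else:
            speech = "I can only turn the light on or off."

    # Standard custom-skill response envelope: plain-text speech, session ends.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```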
So, what’s the problem? Turns out, there are quite a number. Below are a few scary chasms you might face when developing your Skills, as well as a few tips and tricks we’ve developed in response to help you across them.
Alexa, why are you hard to develop for?
The Alexa platform is still very immature, and nothing makes this more evident than the developer experience for creating new Skills for the Alexa platform.
We all know how to deploy AWS Lambdas, and luckily you can use your normal preferred methods to do so. Maybe if you’re looking to try something new, this could be an excellent excuse to try out Serverless, which now supports Alexa Skill trigger events.
The "Skill definition" component, however, causes extensive developer experience issues. These definitions don’t live in AWS at all, but in an entirely separate "Amazon Developer Services" (ADS) ecosystem. As this is a separate ecosystem, you cannot programmatically create, update, read, or manage any Alexa Skill with the AWS SDK, or even through any sort of endpoint. In addition, your IAM users and roles don’t apply!
This means your delivery pipeline is unfortunately a case of writing your intents, slots, and utterances and… copy-pasting them in.
Luckily, when it comes to collaborating without IAM roles, I can help you out. There’s a well-hidden feature to invite other developers with Amazon accounts to view and edit your Skill definitions within ADS. You’ll find it under the Settings option, then under the User Permissions link in the new grey bar just below the main navigation. There are only four levels of permissions though, so I’m afraid you don’t get to retain the same level of granularity you might get in IAM!
Alexa, order me some Earl Grey Biscuits
Alexa has this concept of Slots, a selection of predefined items that can fit into sections of a particular Utterance. What this means, though, is that you have to match a Slot option exactly. Imagine you wanted to be able to order from your favorite supermarket -- do you normally describe what butter you want with the exact brand and weight?
This kind of fuzzy matching is not something Alexa seems to be able to do by itself and is a big hurdle in making talking to her feel "natural." When we say, "Can you pick up some butter?" to a friend, assumptions can be made from context and common sense. It’s unlikely we meant a kilo of butter, for example. Or as above, when we want Earl Grey Biscuits, we want a form of biscuit and not Earl Grey tea.
The Slots mechanism really isn’t well suited to that; it wants exact matches. What we found worked quite well, though, is ElasticSearch. It’s a search engine that excels at full-text search, letting you build complex weighting systems and language-aware analysis on top of high-performance queries.
This does mean putting quite a lot of software complexity behind the Alexa layer, but it worked well for us. Additionally, hosting your ElasticSearch cluster on AWS, possibly fed from a DynamoDB table, gives you control over dynamic data in a way you could not have with the static copy-and-paste Slots system alone.
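To make that a little more concrete, here’s a rough sketch of resolving whatever free text Alexa captured in a Slot against a catalogue with a fuzzy full-text query. The "products" index, its "name" field, and the cluster URL are assumptions for illustration, and it uses the pre-8.x style of the Python ElasticSearch client.

```python
# Sketch: resolve a loosely-worded Slot value ("earl grey biscuits") against a
# product catalogue with ElasticSearch instead of demanding an exact Slot match.
# Index name, field name, and cluster URL are all assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch(["https://my-search-domain.eu-west-1.es.amazonaws.com"])


def resolve_product(spoken_text):
    """Return the best catalogue match for whatever Alexa heard, or None."""
    result = es.search(
        index="products",
        body={
            "query": {
                "match": {
                    # Fuzzy full-text match, so "earl grey biscuits" can find
                    # "Earl Grey flavoured biscuits 200g" rather than the tea.
                    "name": {"query": spoken_text, "fuzziness": "AUTO"}
                }
            },
            "size": 1,
        },
    )
    hits = result["hits"]["hits"]
    return hits[0]["_source"] if hits else None
```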
Alexa, I’m pretty sure I didn’t ask you to pour me a glass of 'shoes'
Validation is a huge part of using Alexa. She will often make a best guess at what she thinks you said within a particular Utterance structure -- with no real indication of whether that’s sensible.
Rejection of odd things is, again, unfortunately entirely up to you. This can partly be mitigated by strong Language User Experience, but in software terms, you may well end up having to write a bunch of validation and verification libraries.
My suggestion is to keep this logic in separate Lambdas that the Alexa Lambda can invoke. This keeps your code more manageable and a lot more testable.
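As a rough sketch of that split, the Alexa-facing Lambda can call a dedicated validation Lambda synchronously with boto3. The function name and the payload shape here are made up for illustration.

```python
# Sketch: the Alexa-facing Lambda asks a separate validation Lambda whether a
# Slot value is sensible. Function name and payload shape are assumptions.
import json

import boto3

lambda_client = boto3.client("lambda")


def is_sensible_request(slot_value):
    """Synchronously invoke the validation Lambda and return its verdict."""
    response = lambda_client.invoke(
        FunctionName="skill-validate-order",       # hypothetical function name
        InvocationType="RequestResponse",          # wait for the result
        Payload=json.dumps({"value": slot_value}),
    )
    result = json.loads(response["Payload"].read())
    return result.get("valid", False)
```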
Has the Alexa hype train derailed?
At the introduction of nearly all tech there is a huge level of expectation for how it’s going to change the world. Fueled by slick demos and wild dreams, no one notices the pragmatic challenges and limitations something might have.
Then the disillusionment sets in. I guess the new thing is nothing to get excited about after all. Let's forget all about it.
However, it improves over time, people refine some solid use cases for the new thing, and it gradually settles into our lives.
Take the iPhone. When the App Store was first released in iOS 2, people’s minds exploded. Think of all the things you could do on your phone now. It’s a supercomputer in your pocket! It’ll take pictures of your fridge and learn what you eat and automatically order more! It’ll drive your car!
None of this happened, of course. Not at first. The iOS platform was very limited and for a long time we had nothing but the most basic apps. Over time though, after the disillusionment had hit, Apple expanded its APIs, better phones appeared with stronger capabilities, and, more importantly, people built or applied existing tech to fill in the gaps left by iOS.
This is where Alexa is now.
Amazon is growing the platform, letting you do more with it soon (like push notifications to Alexa) and expanding its developer offerings to ease the creation of applications with things like the Alexa SDK.
While the use cases are simple now, tech like Alexa has gained a real foothold with large sections of users that are often quite different from the kind that read tech blogs. Children and the elderly, for example, are some of the primary users of voice interfaces, and they get an immense amount of value from Alexa, simple as she might be right now.
My tip for now, if you have an Alexa dream, is go for it, because most things are just about possible. Push hard and realize you may have to do quite a bit of work around the edges, but remember, you do have access to AWS’ vast selection of services to augment your Skills. Treat Alexa more as a fantastic audio interface than something that’ll solve your problem for you, and trust that she’ll get better.
Be a pioneer.
Marcel Cutts, software engineer, Red Badger.
Published under license from ITProPortal.com, a Future plc Publication. All rights reserved.