How to create your first Alexa skill
For the last couple of weeks, Graham, Marcel, Sinem and I, from Red Badger, have been experimenting with Amazon’s Echo Dot, an electric hockey puck that uses voice recognition powered by the Amazon Alexa voice assistant.
In this post, I’d like to explain how one goes about creating their first Alexa skill.
Unboxing
The first thing we need to do after unboxing is to download the Alexa app from our phone’s app store and follow its instructions to connect the device to WiFi. Once connected, Alexa should be ready and listening for requests, questions or commands.
One caveat
At the time of writing, if you want to run a custom Alexa skill on your own device, you’ll need to set the device’s language to US English. That took us some hard googling to find out, so you don’t have to.
Designing a voice user interface
Amazon publishes best practices for designing a voice user interface for Alexa, and I’d recommend following them. You can find them, with examples, in the documentation.
I’ll just list two here that I consider crucial:
- Make it clear that the user needs to respond - after you present options to the user, ask a question so they know they are expected to say something
- Don’t assume users know what to do - clearly present the available options so users know how to answer and control your Alexa skill
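As a rough illustration of both points, the sketch below builds a prompt that lists the options and ends with a question. The helper name buildPrompt and the wording of the prompt are our own inventions for this example, not something from Amazon’s guidelines.

```javascript
// Hypothetical helper: lists the available options and ends with a question,
// so the user knows both what they can say and that a reply is expected.
function buildPrompt(options) {
  if (options.length === 0) {
    return 'What would you like?';
  }
  const list = options.length === 1
    ? options[0]
    : `${options.slice(0, -1).join(', ')} or ${options[options.length - 1]}`;
  return `I can make ${list}. Which would you like?`;
}

console.log(buildPrompt(['pancakes', 'waffles', 'toast']));
```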
Defining the voice interface
The Amazon Developer Portal (ADP) is where we set up our skill. It’s separate from the AWS console and, as far as we know, there isn’t a way of updating an Alexa skill programmatically. (Which makes us quite sad, because we really like to automate our deployments.)
When creating a skill in the developer portal, the first things we need to define are the Name and the Invocation Name of our skill. The Invocation Name is what users say to address the skill. In our case, we put “Jarvis” into both fields and proceed to the next step.
The interaction model
Alexa’s Interaction model consists of three elements:
Intent Schema
What is an intent? Intents are the actions that users can perform with your skill. An intent schema is a simple JSON definition of those intents.
This is the schema for our Jarvis skill:
{
  "intents": [
    {
      "intent": "MakeFood",
      "slots": [
        {
          "name": "food",
          "type": "AMAZON.Food"
        }
      ]
    },
    {
      "intent": "AnswerFood",
      "slots": [
        {
          "name": "food",
          "type": "AMAZON.Food"
        }
      ]
    }
  ]
}
We have two arbitrary intent definitions:
- MakeFood -- this intent is triggered when the user asks Jarvis to make food straight away, for example by saying "Alexa, tell Jarvis to make pancakes," as we’ll see in the Utterances definition
- AnswerFood -- this intent represents a simple answer to Jarvis’s question, like "pancakes", and can only be triggered if the session has already been initialized, as we’ll see later in our code
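One way to picture how intents map to behaviour is a lookup from intent name to a function. This is only a sketch: intentHandlers and routeIntent are hypothetical names, and the replies are placeholders, not what Jarvis actually says.

```javascript
// Hypothetical handlers keyed by the intent names from the schema.
const intentHandlers = new Map([
  ['MakeFood', (slots) => `Making ${slots.food.value}.`],
  ['AnswerFood', (slots) => `You answered ${slots.food.value}.`],
]);

// Pick the handler for the incoming intent, or fall back to a generic reply.
function routeIntent(intent) {
  const handle = intentHandlers.get(intent.name);
  if (!handle) {
    return 'Sorry, I did not understand that.';
  }
  return handle(intent.slots || {});
}

console.log(routeIntent({
  name: 'MakeFood',
  slots: { food: { name: 'food', value: 'pancakes' } },
}));
```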
Slot / (Custom) slot types
Think of a slot as an argument to an intent. An intent can have multiple slots. For both of the intents above we use Amazon’s built-in slot type AMAZON.Food, which is a predefined list of foods. We could define a custom slot type instead, but then we’d have to list the food options manually. As far as I can tell, Alexa only uses these definitions to decide which intent should be triggered and what value to feed into each slot, so it will also accept things that are not in the list, like shoes for example.
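To make the "argument" idea concrete, here is a trimmed sketch of the request part of an IntentRequest and a small helper that reads a slot value from it. The helper getSlotValue is a hypothetical name of ours, and the sample request leaves out fields a real request carries (such as requestId and timestamp).

```javascript
// Trimmed example of event.request for "tell Jarvis to make pancakes".
const sampleRequest = {
  type: 'IntentRequest',
  intent: {
    name: 'MakeFood',
    slots: {
      food: { name: 'food', value: 'pancakes' },
    },
  },
};

// Hypothetical helper: returns the spoken value for a slot, or null when the
// slot is missing or was not filled.
function getSlotValue(request, slotName) {
  const slots = (request.intent && request.intent.slots) || {};
  const slot = slots[slotName];
  return slot && slot.value ? slot.value : null;
}

console.log(getSlotValue(sampleRequest, 'food'));
```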
Utterances
Utterances are the phrases people say to interact with our skill. If you prefer, think of them as a voice-to-intent mapping: each line starts with an intent name, followed by a phrase that triggers it.
Again, this is the setup for Jarvis:
MakeFood to cook {food}
MakeFood to make {food}
AnswerFood I'm thinking {food}
AnswerFood I'd like {food}
AnswerFood I want {food}
AnswerFood {food}
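The slot names in curly braces refer to the slots declared in the intent schema. The sketch below checks that every slot used in an utterance is declared for its intent. It assumes names are compared exactly, including case; findUnknownSlots is a hypothetical helper of ours, not part of any Amazon tooling.

```javascript
// Hypothetical check: report utterances that use an intent or slot name
// that the intent schema does not declare.
function findUnknownSlots(schema, utterances) {
  const slotsByIntent = {};
  schema.intents.forEach(({ intent, slots = [] }) => {
    slotsByIntent[intent] = slots.map((slot) => slot.name);
  });

  const problems = [];
  utterances
    .split('\n')
    .map((line) => line.trim())
    .filter(Boolean)
    .forEach((line) => {
      const intent = line.split(/\s+/)[0]; // first word is the intent name
      const known = slotsByIntent[intent];
      if (!known) {
        problems.push(`${intent}: unknown intent`);
        return;
      }
      // Collect the names between curly braces, e.g. "{food}" -> "food".
      const used = (line.match(/\{[^}]+\}/g) || []).map((token) => token.slice(1, -1));
      used.forEach((name) => {
        if (!known.includes(name)) {
          problems.push(`${intent}: unknown slot {${name}}`);
        }
      });
    });
  return problems;
}

const exampleSchema = {
  intents: [{ intent: 'MakeFood', slots: [{ name: 'food', type: 'AMAZON.Food' }] }],
};
console.log(findUnknownSlots(exampleSchema, 'MakeFood to cook {Food}\nMakeFood to make {food}'));
```

An empty result means every utterance only uses slots that its intent declares.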
Once we finish configuring our Interaction Model, we get to the Configuration step -- this is where we connect Alexa to our Lambda endpoint.
Build and host code
Log in to the AWS Console and navigate to AWS Lambda. Click the region drop-down and select either US East (N. Virginia) or EU (Ireland), as Lambda functions for Alexa skills must be hosted in one of these two regions.
Our Lambda needs to return a JSON response. The one below is the output we need when Jarvis is invoked without a command, e.g. “Alexa, open Jarvis” or “Alexa, ask Jarvis”.
{
  version: '1.0',
  sessionAttributes: {},
  response: {
    outputSpeech: {
      type: 'PlainText',
      text: 'Jarvis can cook food for you, what would you like?'
    },
    reprompt: {
      outputSpeech: {
        type: 'PlainText',
        text: 'What did you say you would like to eat?'
      },
    },
    shouldEndSession: false,
  }
}
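The sessionAttributes field is empty in our response, but it can carry state between turns: whatever the skill returns there comes back on the next request under session.attributes. The sketch below shows that round trip. The helper names and the lastQuestion attribute are hypothetical and only for illustration; Jarvis itself does not use them.

```javascript
// Hypothetical helper: a response that keeps the session open and stores
// some state for the next turn.
function makeSessionResponse(text, sessionAttributes) {
  return {
    version: '1.0',
    sessionAttributes,
    response: {
      outputSpeech: { type: 'PlainText', text },
      shouldEndSession: false,
    },
  };
}

// Hypothetical helper: read the stored state back from an incoming event.
function readAttributes(event) {
  return (event.session && event.session.attributes) || {};
}

// First turn: remember which question we asked.
const firstResponse = makeSessionResponse('What would you like?', { lastQuestion: 'food' });

// Next turn (trimmed event): the attributes arrive under session.attributes.
const nextEvent = {
  session: { new: false, attributes: firstResponse.sessionAttributes },
  request: { type: 'IntentRequest', intent: { name: 'AnswerFood', slots: {} } },
};
console.log(readAttributes(nextEvent).lastQuestion);
```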
The full source code for our Jarvis looks like this. (We still need to compile it with Babel before shipping it to Lambda.)
// Build an Alexa response. Pass a reprompt text and shouldEndSession = false
// to keep the session open and wait for the user's answer.
const makeResponse = (text, reprompt = false, shouldEndSession = true) => ({
  version: '1.0',
  sessionAttributes: {},
  response: {
    outputSpeech: {
      type: 'PlainText',
      text
    },
    reprompt: reprompt ? {
      outputSpeech: {
        type: 'PlainText',
        text: reprompt
      },
    } : {},
    shouldEndSession,
  }
});

export const handler = function (event, context, callback) {
  // The session is a property of the event itself, not of event.request.
  const { session, request } = event;
  const { type } = request;
  if (type === 'LaunchRequest') {
    // Invoked without a command: present the options and ask a question.
    context.succeed(makeResponse(
      'Jarvis can cook food for you, what would you like?',
      'What did you say you would like to eat?',
      false
    ));
  } else if (type === 'IntentRequest') {
    const { name, slots = {} } = request.intent;
    // The slot can be present without a value when Alexa did not catch one.
    const food = slots.food && slots.food.value;
    if (name === 'AnswerFood' && !session.new && food) {
      // AnswerFood only makes sense as a reply within an existing session.
      // make your call to a cooking service here
      context.succeed(makeResponse(`${food}, that's great. I'm on it sir.`));
    } else if (name === 'MakeFood' && food) {
      // make your call to a cooking service here
      context.succeed(makeResponse(`${food}, that's great. I'm on it sir.`));
    } else {
      context.succeed(makeResponse(
        'I did not understand your request. For now I can only cook, what would you like to eat?',
        'What did you say you would like to eat?',
        false
      ));
    }
  } else if (type === 'SessionEndedRequest') {
    // The session is already over, so there is nothing to say back.
    context.succeed();
  }
};
Testing the skill
Once our Lambda is live and ready, we go back to the Amazon developer portal, enter our Lambda’s ARN (its identifier) into the Configuration form, and hit the "Next" button, which brings us to the Testing section. Type in one of our defined utterances and click "Ask Jarvis". Does it work? If so, your skill should also be available on your Amazon Echo / Echo Dot and you should be able to test it straight away.
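Before testing in the portal, it can help to call the handler locally with a hand-made event. The sketch below is one way to do that, under some assumptions: invokeLocally and makeIntentEvent are hypothetical helpers, the event is trimmed to the fields our handler reads, the handler is assumed to respond synchronously, and echoHandler is a stand-in so the example runs on its own. In practice you would pass the compiled Jarvis handler instead.

```javascript
// Hypothetical helper: call a handler with a fake context and return whatever
// it passes to context.succeed or to the callback. Works for handlers that
// respond synchronously.
function invokeLocally(handler, event) {
  let result;
  const context = {
    succeed: (value) => { result = value; },
    fail: (error) => { throw error; },
  };
  const callback = (error, value) => {
    if (error) throw error;
    result = value;
  };
  handler(event, context, callback);
  return result;
}

// Hypothetical helper: build a trimmed IntentRequest event.
function makeIntentEvent(intentName, slotValues, isNewSession) {
  const slots = {};
  Object.keys(slotValues).forEach((name) => {
    slots[name] = { name, value: slotValues[name] };
  });
  return {
    version: '1.0',
    session: { new: isNewSession, attributes: {} },
    request: { type: 'IntentRequest', intent: { name: intentName, slots } },
  };
}

// Stand-in handler for this example; replace it with the real one.
const echoHandler = (event, context) => {
  context.succeed(event.request.intent.slots.food.value);
};

console.log(invokeLocally(echoHandler, makeIntentEvent('MakeFood', { food: 'pancakes' }, true)));
```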
Roman Schejbal, software engineer, Red Badger.
Published under license from ITProPortal.com, a Future plc Publication. All rights reserved.