In this article, we will discuss ASK development from 10,000 feet: the core concepts, syntax, and hardware, filling in any gaps in between.

 What is it?


Alexa Skills Kit (ASK):

The Alexa Skills Kit is an SDK that allows developers to build “skills” for any device hosting the Alexa Voice Service. Skills allow developers to react and respond to a user’s audio input.

Alexa Voice Service (AVS):

The engine installed on devices that controls voice polling, translation, audio, and the response to audio.

High-Level Concept:

From a high level, ASK is just an API that you communicate with. Because of that, all you need to do is spin up a web service and expose it publicly. Once you have that, you need to register it on Amazon’s website. ASK then sends a message to your web service, and your service does some work and packages up a response.
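
As a rough sketch of that round trip, the following Python uses only the standard library to accept a JSON POST and send back a spoken reply. The envelope fields (`version`, `request`, `response.outputSpeech`, `shouldEndSession`) follow the JSON format ASK uses for custom skills; the `SkillHandler` class, the `serve` helper, the port, and the reply text are assumptions made for illustration. A real endpoint would also need to be served over HTTPS and should verify that requests actually come from Alexa, both of which are left out here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_response(text, end_session=True):
    """Package plain text into the JSON envelope Alexa expects back."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(body):
    """Do some work for one incoming request and package up a response."""
    request_type = body.get("request", {}).get("type", "")
    # Placeholder "work": report which kind of request arrived.
    return build_response("Your service received a " + request_type)


class SkillHandler(BaseHTTPRequestHandler):
    """Hypothetical endpoint: ASK POSTs JSON here and reads JSON back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        payload = json.dumps(handle_request(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json;charset=UTF-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the sketch locally; the port is an arbitrary choice."""
    HTTPServer(("", port), SkillHandler).serve_forever()
```

Keeping `handle_request` separate from the HTTP plumbing means the skill logic can be exercised with plain dictionaries, without a device or a running server.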

Rough Syntax:

What Can/Can’t It Do?


With ASK You Can:

  • Create phrases to handle responses
  • Provide a “callback” for each response
  • Hook into Alexa’s slot types to provide extensive data context on user-provided arguments
  • Create custom slot types to handle contextual data
  • Respond directly to an utterance triggered by Alexa
  • Perform some action(s) (API calls, saving data)
  • Save any user data explicitly provided by the app
  • Send images and audio files associated with responses
  • Essentially, anything you can think of that a RESTful endpoint can do in response to a request
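
Two of the items above, reading a slot value and sending an image along with a response, can be sketched as follows. The `intent.slots` layout and the `Standard` card fields follow ASK's JSON format for custom skills; the intent name, the `Model` slot, and the image URL are hypothetical.

```python
def slot_value(request_body, slot_name):
    """Return the value Alexa filled in for a slot, or None if absent."""
    intent = request_body.get("request", {}).get("intent", {})
    slots = intent.get("slots") or {}
    return slots.get(slot_name, {}).get("value")


def response_with_card(speech, title, image_url):
    """Build a reply that speaks text and shows a card with an image."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "card": {
                "type": "Standard",
                "title": title,
                "text": speech,
                "image": {
                    "smallImageUrl": image_url,
                    "largeImageUrl": image_url,
                },
            },
            "shouldEndSession": True,
        },
    }


# Hypothetical request: the user named a car model in a custom slot.
example_request = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "StartCarIntent",
            "slots": {"Model": {"name": "Model", "value": "truck"}},
        },
    }
}

model = slot_value(example_request, "Model")
reply = response_with_card(
    "Starting the " + model, "Car Starter", "https://example.com/car.png"
)
```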

With ASK You Can’t:

  • React to passive responses; a skill must be triggered through Alexa
  • Respond to state changes within Alexa
  • Send a direct message from the developer’s endpoint that was not directly initiated by a user at that time
  • Pull or push data to/from other skills
  • Provide ongoing notifications or messaging for your skill

How Does It Work?


Workflow Example

ASK requires you to register your “utterance”, your “skill name”, and a couple of sets of phrases to accept. Say you have a skill that can start a car, and your skill is named “Car Starter.” The user says, “Alexa, tell Car Starter to start the car.” At that point, your web service is notified of the utterance, and it is up to the developer how to handle it. Each utterance is typically scripted as a conditional branch of some sort.

Ex:

if “Start Car” was called then …
if “Stop Car” was called then …
etc…

You can have any number of individual conditional branches to develop against. The biggest limitations have to do with the flow control of a skill: a skill can only travel in one direction and respond directly to its requester.
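
A minimal sketch of that branching in Python might look like the following. The intent names `StartCarIntent` and `StopCarIntent` and the spoken replies are hypothetical; the `request.type` and `request.intent.name` fields follow ASK's JSON request format. Each call returns exactly one response to the requester, which mirrors the one-direction flow described above.

```python
def speak(text):
    """Wrap text in the response envelope sent back to Alexa."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def route(body):
    """Pick one conditional branch based on which intent was called."""
    request = body.get("request", {})
    if request.get("type") != "IntentRequest":
        # For example, the skill was opened without a specific command.
        return speak("Welcome to Car Starter.")

    intent_name = request.get("intent", {}).get("name")
    if intent_name == "StartCarIntent":
        return speak("Starting the car.")
    elif intent_name == "StopCarIntent":
        return speak("Stopping the car.")
    else:
        return speak("Sorry, Car Starter does not know how to do that.")
```

As the number of branches grows, a dictionary that maps intent names to handler functions is a common alternative to a long `if`/`elif` chain.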

 

 

Where is the Machine Learning?

The majority of the ML capabilities are in the engine itself: how it understands utterances, recognizes voices, and interprets responses. You, as a developer, can leverage Alexa’s smarts to a degree with the Amazon-provided slots. Behind the scenes, Amazon regularly improves the smarts of its provided slots so that they can interpret a wider variety of potential inputs. For example, a date slot could interpret “today”, “this weekend”, or “January the Second”, and additionally work in multiple languages. Amazon’s baked-in slots make your life easier and should be used wherever possible.
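
To make the date example concrete: the built-in date slot type, `AMAZON.DATE`, hands your service a normalized string rather than the words the user spoke. The sketch below assumes the format documented for that slot type, where a specific day arrives as a string such as `2015-11-24` and a weekend as something like `2015-W49-WE`. The function name and the `kind` labels are made up for illustration, and forms other than these two are simply passed through.

```python
from datetime import date


def interpret_date_slot(value):
    """Classify a normalized date-slot string from Alexa.

    Returns a (kind, parsed) pair; parsed is a date object only when
    the value names a specific day.
    """
    if value.endswith("-WE"):
        # For example "2015-W49-WE": the weekend of a numbered week.
        return ("weekend", value)
    try:
        return ("day", date.fromisoformat(value))
    except ValueError:
        # Other forms (weeks, months, years) are passed through untouched.
        return ("other", value)
```

The point is that your code never has to understand “today” or “January the Second” itself; it only has to handle the normalized values Alexa produces.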

User Experience:

As a user, the verbal syntax is “tell” or “ask”, followed by the name of the skill and any arguments. If this were a terminal script, the help would look something like the following:

Alexa [ask|tell] {Skill_Name} {args[]}
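
Purely to illustrate that grammar, here is a toy parser in Python. Alexa does this work itself long before your service is called, so nothing like this belongs in a real skill; the function name and the list of known skill names are assumptions for the example.

```python
import re


def parse_invocation(phrase, skill_names):
    """Split a phrase into (verb, skill, args), or return None.

    Mirrors the help text: Alexa [ask|tell] {Skill_Name} {args[]}
    """
    text = phrase.strip().lower()
    for skill in skill_names:
        pattern = (
            r"^alexa,?\s+(ask|tell)\s+"
            + re.escape(skill.lower())
            + r"\b\s*(.*)$"
        )
        match = re.match(pattern, text)
        if match:
            return (match.group(1), skill, match.group(2).split())
    return None


parsed = parse_invocation(
    "Alexa tell Car Starter to start the car", ["Car Starter"]
)
```

Here `parsed` holds the verb `tell`, the skill name `Car Starter`, and the remaining words as the argument list.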

Hardware Requirements:

You can develop with ASK without an Echo by using tooling like Echosim.io or a homebrewed Echo built on a Raspberry Pi, but at some point you should consider actually purchasing a device, if for nothing else than to better understand what the majority of your users are experiencing. The more affordable device is the Dot, which doesn’t cost nearly as much; cheaper devices can range from $35-$50 USD used/new.

 

In Closing:

We’ve covered a number of common discussion points I often get asked about regarding Alexa. Specifically:

  • What is it?
  • How does it work?
  • What can and can’t it do?
  • User Experience
  • Hardware

In a future article, we will cover the nitty-gritty bits: what does development for ASK look like? If you have a question about anything, or know of something that would add to this reference, feel free to add it in the comments below.

 

For a reference of useful Alexa resources, check out this article.