For a recent hackfest, I experimented with Alexa, Amazon's voice UX software. Specifically, I tried to add a skill to Alexa. For an idea of what you can do with skills, you can check out the Amazon Alexa Blueprints. However, I approached this from a more technical perspective and followed this tutorial. I completed the tutorial first, which covers setting up an AWS Lambda function and the interface that Alexa uses to integrate with your application. After that, I modified it to pull from custom data that we provide rather than the tutorial's example color data.
I think that because Alexa skills, and voice UX in general, are so new, setting goals is important. My goals for the Alexa skill included:
- run in the simulator
- pull data from a remote service
- take input from a user
- have a multi step interaction
The multi-step interaction is a bit clunky, but I think it's a great way to avoid collisions between different skills. Basically, the user calls out an 'invocation' like 'open color picker'. After that, interactions with Alexa are sent directly to that particular skill until an end point in the interaction tree is reached. Each of these interactions is triggered by a different voice command and is handled by something called an 'intent'. An intent can have multiple triggering commands ('what is my favorite color' vs. 'what is my color', for example). There's also lightweight, session-level storage that lasts for the whole invocation, which means you can easily pass data between intents without reaching out to a more persistent data store. There's a lot more in the documentation and the voice design guide.
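To make the invocation, intent, and session flow concrete, here is a minimal sketch of a Lambda handler in Python (the language of the tutorial mentioned above). It works directly with the Alexa Skills Kit JSON request and response format rather than an SDK. The intent names `MyColorIsIntent` and `WhatsMyColorIntent` and the slot name `Color` are assumptions for illustration; they would need to match whatever your own interaction model defines. The first intent stores a value in the session attributes and keeps the session open; the second reads it back and ends the session.

```python
# Minimal sketch of a multi-step Alexa custom skill handler.
# ASSUMPTION (hypothetical names): the interaction model defines an intent
# "MyColorIsIntent" with a slot "Color", and an intent "WhatsMyColorIntent".


def build_response(speech, session_attributes, end_session):
    """Wrap plain-text speech in the JSON envelope Alexa expects."""
    return {
        "version": "1.0",
        "sessionAttributes": session_attributes,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }


def handle_intent(intent, attributes):
    """Dispatch on the intent name; each intent is one step in the tree."""
    name = intent["name"]
    if name == "MyColorIsIntent":
        color = (intent.get("slots") or {}).get("Color", {}).get("value")
        if not color:
            return build_response("I didn't catch that color. Please try again.",
                                  attributes, False)
        # Stash the value in session attributes so a later intent in the
        # same session can read it without a database.
        attributes = dict(attributes, favoriteColor=color)
        return build_response(
            f"I now know your favorite color is {color}. "
            "You can ask me, what's my favorite color?",
            attributes,
            False,  # keep the session open for the next step
        )
    if name == "WhatsMyColorIntent":
        color = attributes.get("favoriteColor")
        if color:
            # End point of this branch of the interaction tree.
            return build_response(f"Your favorite color is {color}. Goodbye.",
                                  attributes, True)
        return build_response("I don't know your favorite color yet. What is it?",
                              attributes, False)
    return build_response("Sorry, I didn't understand that.", attributes, False)


def lambda_handler(event, context):
    """Entry point Lambda calls for every request in the skill session."""
    request = event["request"]
    attributes = event.get("session", {}).get("attributes") or {}
    if request["type"] == "LaunchRequest":
        # Sent when the user says the invocation, e.g. "open color picker".
        return build_response("Welcome. Tell me your favorite color.",
                              attributes, False)
    if request["type"] == "IntentRequest":
        return handle_intent(request["intent"], attributes)
    # SessionEndedRequest (or anything unexpected): the session is over,
    # so return an empty response body with no speech.
    return {"version": "1.0", "response": {}}
```

Alexa sends the `sessionAttributes` from one response back as `session.attributes` on the next request in the same session, which is what lets `WhatsMyColorIntent` see the color that `MyColorIsIntent` stored.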
I also learned that you should avoid acronyms, as Alexa often misunderstands them. I ended up changing my invocation to avoid this issue. I imagine there is a bit of a land grab among Alexa skill invocations, since they are like .com domain names: once someone has 'open color picker', no one else can have that invocation for Amazon Alexa. This means that if you think voice UX on Alexa might be useful to your business, it's worth experimenting sooner rather than later. (This only applies to 'custom skills' that you develop. See here for more.)
From a development perspective, everything was done in the browser, though I could definitely have used an IDE for the Lambda function I was working on. The developer experience in the browser was good; I didn't run into any weird bugs or glitches. The example code was well commented and easy to extend. Both the Lambda function and the skill definition could be exported as text and therefore version controlled. I also noticed that a skill can respond from any HTTPS endpoint rather than a Lambda function, which I can see being useful if you want to leverage existing code or data that isn't in the AWS cloud.
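For the HTTPS-endpoint option, a bare-bones sketch using only Python's standard library might look like the following. It is an illustration under loud assumptions, not production code: the handler names and port 8080 are hypothetical, TLS is assumed to be terminated by something in front of it (Alexa only calls HTTPS endpoints), and it skips the request-signature and timestamp verification that Amazon requires of skills hosted on your own web service.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_alexa_request(body: bytes) -> bytes:
    """Turn a raw Alexa request body into a raw JSON response body."""
    event = json.loads(body)
    request_type = event["request"]["type"]
    if request_type == "SessionEndedRequest":
        # Alexa does not expect speech once the session has ended.
        return json.dumps({"version": "1.0", "response": {}}).encode("utf-8")
    if request_type == "LaunchRequest":
        speech = "Hello from a skill hosted outside Lambda."
    else:
        speech = "Sorry, this sketch only handles the launch request."
    response = {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
    return json.dumps(response).encode("utf-8")


class AlexaHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: Alexa POSTs each request as JSON."""

    def do_POST(self):
        # WARNING: a real endpoint must verify the request signature and
        # timestamp before trusting the body; this sketch omits that step.
        length = int(self.headers.get("Content-Length", 0))
        payload = handle_alexa_request(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json;charset=UTF-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the endpoint over plain HTTP (assumes TLS is handled upstream)."""
    HTTPServer(("", port), AlexaHandler).serve_forever()
```

Calling `serve()` starts listening on the chosen port. The response JSON has the same shape a Lambda-hosted skill returns, so the request-handling logic can move between the two hosting options with little change.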
It was a hackfest and I worked for about six hours, but at the end of the day I had a demo I could run on the Alexa Simulator. At the bottom of the Python tutorial mentioned above, there is a reference to echosim.io, an online Alexa skill simulator run by a third party. I had intermittent issues with that simulator, so once I found the simulator built into the developer console, I stuck with that one.
The overarching goals were to have fun, learn something, become more familiar with Alexa, and see whether it made sense for any of our clients to build Alexa skills to help their businesses. Alexa is mainstream enough that anyone who works with timely, textual information should evaluate building a skill, especially since a prototype can be built relatively quickly. For instance, a newspaper site should absolutely have an Alexa skill, but one makes less sense for an ecommerce store (unless you want to offer something like 'latest deals'), because voice navigation more than a few levels deep is problematic. In general, I think Alexa skills are worth exploring as this new way of interacting with computers becomes more prevalent.