What do we call this industry anyway?

I recently joined the Open Voice Network Ethical Use Task Force (makes it sound like I’m going to be dropping out of helicopters and kicking down doors in search of people using conversational AI for nefarious means) and have already been exposed to some fascinating discussions among some of the most knowledgeable and thoughtful people in “our industry”.

One of these discussions came back to one of my favourite subjects: terminology! 

(Yes, I’m a nerd.)

We’re developing an ethical standard (basically a set of rules) for implementing conversational AI projects in a way that emphasises the positive benefits to society and mitigates harms.

But that led us to a discussion about how we should refer to the industry we work in. 

If we call it the “Voice industry” then we’re excluding people who are building conversational experiences using chat that we think the standards are relevant to. 

If we call it the “Conversational AI industry” then we could be ignoring people doing things in areas fraught with ethical considerations like voice cloning and gathering metadata from speech patterns. 

And what if we go with something like the “Digital assistant industry”, which could cover all sorts of things to do with storing and using both implicit and explicit user data, plus a whole load of questions about who should own these digital chokepoints?

I think we all agree that we can’t give up and just call ourselves the “AI industry”. Apart from it taking in a load of things we don’t currently have a view on such as the efficacy of self-driving cars, the term AI is so poorly defined that it should never be used in anything other than films about robots overthrowing the human race.

So how do we square the circle?

Usually when you’re having difficulty defining a term it’s because the word (or words) are being loaded with more meaning than they can bear. 

And perhaps that’s a sign to us that our industry has grown to the point where it’s not really one thing anymore.

The ethical considerations if you’re using Large Language Models to automatically generate responses for your bank chatbot are quite different from those involved in building an app that lets users mimic their friends’ voices.

Instead we need to focus on the practical. These standards will help you avoid ethical missteps if you are doing these activities. Rather than a single overarching standard maybe it’s better to be granular. 

Or is this just a cop out? Let me know what you think.

Some thoughts on the top Alexa Live ’22 announcements from a skill developer perspective

Ah Alexa Live, every year it feels just like Christmas did when you were a kid: sitting around while the adults have long boring conversations (hello, over-extended keynote), a mixture of presents you want, those you don’t but have to pretend you’re grateful for, ones whose batteries weren’t included so you can’t play with them yet, and the crushing disappointment at not receiving the thing you’ve constantly pestered your parents for over the last 6 months.

For those of you who don’t know me, I run the London Chatbots & Voice Assistants Meetup and I said that I would share my thoughts as someone who’s been building Alexa skills for over 5 years(!) on the announcements from Alexa Live at the next meetup. To help me sort the wheat from the chaff I thought it would be useful to write everything down as a blog post, with the side benefit that people who can’t attend the meetup can read it too.

All-in-all there were 20 announcements relating to skill developers but some of them were definitely a bit more niche so I’ll keep this list to the ones that I think are the most important and put them in order of most to least important so they’re easy to find.

Hopefully you find this helpful, if I’ve missed anything please let me know.

Amazon’s cutting the percentage of your revenue that it takes

This is the big one for most developers (at least if you’re in the US) — if you’re making less than $1m a year from your skills then Amazon is going to cut the amount of your revenue it takes from 30% to 20% and they’re going to give you another 10% in ‘credits’ that you can spend on promoting your skill on the Alexa platform (more on this later).

During their presentation Amazon claimed that this was ‘industry leading’ but that completely depends on whether those credits are actually worth anything, as both Apple and Google only take 15% of revenue in their equivalent mobile app stores (which, in Google’s case, also applied to Google Actions) for the first $1m in a year. Personally, seeing as the credits can only be spent with Amazon and on something that you kind of get for free today, I don’t think you can unambiguously say this is a better deal than what the other platforms provide.
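To put some rough numbers on that, here is a minimal sketch of the comparison. It assumes the 10% in credits is calculated on the same revenue figure as the commission, and the `credit_worth` parameter (how much a dollar of promotion credit is really worth to you) is my own invention for illustration, not something Amazon has published.

```python
def developer_value(revenue, platform_cut_pct, credit_pct=0, credit_worth=1.0):
    """Rough value a developer ends up with for a given amount of revenue.

    revenue          -- gross revenue in dollars
    platform_cut_pct -- percentage of revenue the platform keeps
    credit_pct       -- percentage of revenue handed back as promotion credits
                        (assumption: calculated on the same revenue figure)
    credit_worth     -- hypothetical value of $1 of credit to the developer,
                        from 0.0 (worthless) to 1.0 (as good as cash)
    """
    cash = revenue * (100 - platform_cut_pct) / 100
    credits = revenue * credit_pct / 100 * credit_worth
    return cash + credits


if __name__ == "__main__":
    revenue = 10_000
    # 15% commission and no credits, as described for the mobile app stores
    print("15% cut, no credits:", developer_value(revenue, 15))
    # 20% commission plus 10% credits, valued at different rates
    for worth in (0.0, 0.5, 1.0):
        value = developer_value(revenue, 20, credit_pct=10, credit_worth=worth)
        print(f"20% cut, credits worth {worth:.0%} of face value:", value)
```

On these assumptions the new Alexa terms only match a flat 15% commission if you value the credits at 50 cents on the dollar, and only beat it if you value them at more than that.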

Unfortunately, but not unsurprisingly based on Amazon’s previous approach to rolling out features, this is US only for the moment but with plans to expand at some point. If you’re in a lower priority locale like Italy, Canada or Brazil I wouldn’t hold your breath! I was told this is due to ‘legal reasons’ but I don’t think it should be beyond Amazon to get their ducks in a row everywhere before announcing something.

Update: 5 August 2022 — Andy Whitworth reports that the new percentage has been rolled out to the UK and I can see it in Germany as well.

I haven’t seen an official announcement yet so I’m not sure whether all the locales are covered or whether it’s just UK and Germany. I’ll try to update this again when I know what’s what.

It is nice that Amazon is letting developers keep more of their money, and those behind it should certainly be applauded. That being said, this can probably be read as a tacit admission by the Alexa team that the Skill developer ecosystem isn’t as healthy as it should be and can’t currently bear the full weight of Amazon trying to generate an ROI from Alexa.

I still believe that the problem is as much on the customer acceptance side as the developer side (a disproportionately large number of 1-star reviews are people complaining that they bought an Alexa device and therefore expect the content to be free). I’ve long advocated that the developer rewards scheme should be replaced by giving the most active Alexa users credits to spend on ISPs thereby teaching them how to purchase and better aligning developer incentives.

Maybe next year.

Amazon is going to pay you to improve your skills

While the reduced Amazon commission is nice for those developers who are making a meaningful amount (10% extra on $100 isn’t much), most skill developers struggle to make anything via In-Skill-Purchasing so the new Skill Quality Incentives will probably be the main way they’ll earn something from their Alexa skill.

As far as I can tell this seems to just be a more explicit version of the current Developer Rewards program (which, although the exact calculation method is a secret, supposedly takes into account the same kind of things such as engagement, retention and customer feedback) and I expect that it will replace that when it launches next year.

The nice thing is that Amazon is providing a new Skill Quality Coach which shows you your current quality score (out of 5) and suggestions on how to improve it. Initially Amazon is intending to pay incentives to skills that have a score over 3, with the money increasing when your skill gets over 4.

I will say, though, that the Skill Quality Coach still needs a bit of work. It claimed that my skills didn’t have multimodal when they do (I use WebAPI rather than APL), and my only other recommendation was to set up a beta test pool to increase my score, when I already use a separate staging skill for that.

I’ve got a feeling that it’s probably going to be more helpful to people who are just starting out than those who have been developing for a while, and I really hope good skills aren’t penalised if they don’t fit into the one-size-fits-all box-ticking exercise.

You’re going to be able to sell Amazon products through your Skills and earn commission

Personally, this might be the one I’m most excited about as it opens up a whole new way to monetize your Skills. Unfortunately, like most of the announcements, it’s not actually available yet and is scheduled for ‘later in 2022’ which probably means just before Christmas.

As well as being a route for independent developers to earn more from their Alexa skills I think this may well prove interesting to forward-thinking brands who can see how building a presence on Alexa could drive additional sales as part of their content marketing efforts. I’m sure a lot of the Alexa agencies were very happy with this announcement and the good news for them is that the non-commission version is available right away.

One thing that strikes me as slightly weird is that you’re going to be able to push notifications to the user’s Alexa device with product recommendations. This feels like it has the potential to get super spammy but the Alexa team assured me that there are ‘guardrails’ and I expect that when the implementation details are released it won’t be all that easy to send a notification.

(Aside: the number of things you can send a notification for are pretty limited and although the Alexa team supposedly invite suggestions for other topics I don’t know anyone who has ever managed to get a response from the submission form so I wouldn’t plan on a use case that isn’t supported by the current rules).

Technical details on how to implement the non-commission purchasing flow.

Paid-for promotion is about to become a thing

I feel a bit uneasy about this one — there’s no denying that discovery has long been a problem for Skill developers and there’s no doubt this will help, it’s just that I can’t shake the feeling that before you know it basically what little organic discovery exists will be completely subsumed by paid promotion similar to how the most popular product searches on Amazon lead to a results list that is more than 50% ads.

If you’re a brand looking for a new channel this is potentially huge (depending on the price). For the cost of building a relatively simple Skill you’ll be able to run an ad campaign showcasing your brand on every screen-based Echo device out there. Expect to see lots of dishwasher tablet skills appearing shortly!

For indie developers like myself I suspect how supportive of this feature you are will depend on how much you’re getting in those free promotion credits I mentioned earlier. If you’re making reasonable revenue from your skills and the cost of the promotion isn’t too high then I can see it being a good deal. My internal pessimist thinks that the market rate is likely to be driven by the aforementioned dishwasher tablet skills and your credits won’t go very far.

One thing I’d like to see Amazon improve with this is to add some kind of quality / engagement factor to the bid price algorithm, similar to Google Adwords, so skills that users like are able to pay less for the same promotion. Perhaps another opportunity to tie in the new Skill Quality Score?

Account linking is getting less bad

I think we can all agree that the current process for getting a user to link their account in your Alexa skill with an existing account you have elsewhere is truly terrible.

Amazon have recognised this and have spent a bunch of time and money to make it slightly less bad while not actually making it something anyone would want to use. Why? Who knows? I swear sometimes the Alexa product team like to find the most obtuse way to solve a problem just so they can say that no-one else has ever done it before.

According to the presentation the new ‘voice-forward’ account linking is 3 times better than the current flow (as in 3x the number of people manage to complete it) but I don’t think that’s as impressive as it sounds when you consider just how terrible the current experience is.

You may have noticed the slightly awkward ‘voice-forward account linking’ name and that’s because it’s not ‘voice account linking’. That’s right folks, you still need your phone to complete the process. It’s just that now you read out a numeric code that you get sent in a text message rather than having to go into the app and enter your username and password.

Why it’s so difficult for them to let Alexa ask “{skill name} would like to access your email so it can connect to your account, is that ok?” I don’t know! If you’re worried about privacy you could put a PIN on it like you can with voice shopping. Even on its own terms it seems obtuse to send you a text message with a code to read out rather than a mobile notification that you can tap to confirm in the way plenty of other 2 factor systems work.

Anyway, it’s better than it was before so it’s probably worth implementing if you’re already using account linking. Just don’t expect the problem of knowing who your users actually are to be radically solved this year.

There’s a couple of new tools that make it easier to improve your Skills

I always feel a bit sorry for the Alexa PMs who are working on developer tooling as there’s just such a wide range of Alexa developers that whichever tool you build is only going to be suitable for a relatively small segment. This leads to lots of tools being announced, most of which might not seem relevant to you.

If you’re still relatively new to Alexa the new Alexa Learning Lab, APL best practices guide, APL accessibility guide, Multimodal Response Builder, APLA templates and Code Sandbox are probably all worth looking at, though I really wish they had rationalised these down into a single tool as it seems there’s a fair amount of duplication (I guess it’s just a lot more fun to be a PM of your own product than a feature of someone else’s).

For me, though, the most interesting are the new Dialog Testing Tool API, Alexa Skill Deals and an update to the A/B testing tool which lets you roll out to a smaller percentage of your audience.

The A/B testing tool is pretty self explanatory. My only question with it is whether any of the developers who have enough traffic to do successful A/B testing are happy with the accuracy and granularity of the metrics you can get from the Developer Console. I know that personally I’d only ever use it for testing interaction model changes that can’t be tested any other way, otherwise I’d rather use an A/B testing solution from the web / mobile world and manage it on the backend.
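For anyone wondering what managing it on the backend might look like, here is a minimal sketch of deterministic, hash-based bucketing. The function name, the experiment name and the `exposure_pct` parameter are all hypothetical; it only assumes your backend has some stable identifier for each user (for example the user ID Alexa sends with each request).

```python
import hashlib


def assign_variant(user_id, experiment, exposure_pct=10):
    """Deterministically place a user in 'treatment', 'control' or 'excluded'.

    user_id      -- any stable identifier for the user
    experiment   -- experiment name, mixed into the hash so that different
                    experiments split the audience differently
    exposure_pct -- share of the audience (0-100) taking part in the test
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # The first 8 hex characters decide whether the user takes part at all
    bucket = int(digest[:8], 16) % 100
    if bucket >= exposure_pct:
        return "excluded"
    # A separate slice of the hash splits participants roughly 50/50
    return "treatment" if int(digest[8:16], 16) % 2 == 0 else "control"


if __name__ == "__main__":
    for uid in ("user-a", "user-b", "user-c"):
        print(uid, assign_variant(uid, "intro-price-test", exposure_pct=20))
```

Because the assignment is a pure function of the user ID and experiment name, the same user always lands in the same group without you having to store anything, and raising `exposure_pct` later only adds new participants rather than reshuffling existing ones.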

Still, being able to deploy a test to a smaller segment of your audience is nice, assuming you have enough users to do so, which I suspect most Alexa developers don’t. It’s an incremental improvement, probably not worthy of an actual announcement, but there you go.

On a related note, Amazon also announced a feature called Alexa Skill Deals which lets you add introductory discounts for your ISPs. I suspect that a lot of skills aren’t making as much revenue as they could because they’re priced incorrectly, and pricing is almost impossible to test thanks to Amazon’s rules around changing prices. This looks easy to implement and is an interesting way to experiment in the hope of driving more revenue.

The Dialog Testing Tool API, on the other hand, looks more substantial but it’s still in private beta so I’ll have to wait until I get a chance to play with it before I cast judgement. As far as I can tell, it doesn’t do anything that you can’t do with existing 3rd party tools such as Bespoken but hopefully it will be quicker, easier and free!

What I really want to see though, is Alexa pre-building these tests for you based off of usage data (or your manual testing if there isn’t any) that you can then approve to be used as regression tests. I think the single biggest thing that could be done to improve skill quality is making it much easier to test whether your skill is working in the way you expect.

An additional plus side of this would be that Amazon could run these regression tests before updating the overall Alexa language model so they stop breaking 3rd party Skills unintentionally!

Amazon wants you to switch to using Alexa conversations, it just can’t explain why

Every year there’s an Alexa Live session or two extolling the halcyon future that is Alexa Conversations where dialog is written in a couple of lines and we don’t have to worry about intents anymore.

I don’t know. I want to like it. I’m a big believer in the future of statistical dialog management. But every time I watch one of these presentations it leaves me feeling cold. Maybe I’m just not smart enough to get it, but when the speaker says something like “I bet by now you can’t wait to get your hands on it” I usually find myself thinking I’d rather do anything but.

I’ve heard from other developers that it can be good for certain use cases, and I’m sure that once they get the technical kinks (ie the ~15 min build times for your language model) worked out it will be a big improvement but I really think that between now and next year Amazon needs to find someone who can explain Alexa Conversations to me like the ADHD 5 year old I am on the inside!

In the meantime, if you’re interested in the future of statistical dialog management I recommend you check out this video on end-to-end dialog training from the team over at RASA, which is also easily replicable if you’re ready to get your hands dirty.

What was missing

As I mentioned at the beginning, Alexa Live is a bit like Christmas and often the thing you notice the most is the present you didn’t get (human nature, right?).

I was involved in a back channel discussion with some other seasoned voice developers and there was certainly a fair amount of criticism being levelled at the Alexa team for not fixing basic issues after many years (it’s worth saying that these criticisms applied to the now defunct Actions on Google programme even more so).

From my perspective, a lot of these issues are due to the way the Alexa product team are organised where product managers seem to be incentivised to launch new products rather than polish and iterate on existing things.

Areas where I would dearly like to see improvement are things like:

  • the clunky copy used in the Alexa payment flow

  • the unnecessary step a user is forced through when initiating a skill connection

  • more topics for proactive notifications (in my case, a way to let people who have finished all the content know when new content is added)

  • an actual voice only account linking process

  • no longer having to submit every locale to certification at the same time, so you could just submit the ones where changes are made rather than your submission getting held up for 6 weeks while the Italy team are waiting for a policy clarification from someone in the US

  • fixing the invocation problems on Fire TV

  • making APL / multimodal available in the Alexa mobile app

  • allowing 3rd party developers to run ads in their skills

I think the biggest problem for the Alexa ecosystem is that Amazon isn’t sure which kind of developers it’s targeting. Does it want Alexa developers to be hobbyists who build the odd skill as a weekend project or professional developers building cutting edge experiences? At the moment it seems like Amazon is focussed on the former without realising that the success of the ecosystem depends on the creation of the latter.

But I appreciate it’s hard to build a platform with a vibrant ecosystem. Having seen how even Google can’t make it work perhaps it’s churlish to be too critical of Amazon when they do at least seem to be trying.

Here’s hoping they succeed and that we all get that elusive, long-wished-for gift at Alexa Live 2023!

In the meantime, let me know what you think and what you’re hoping the Alexa team will announce next.

For completeness, there were a few other things announced that I think are only relevant to a smaller niche of developers so I haven’t included them in my main list:

  • Alexa Routines Kit which lets you make parts of your skill available to be included in people’s routines rather than the whole thing (useful if you’re a content publisher with multiple sections within your skill).

  • Alexa for Apps SDK which lets you add Alexa control to your existing mobile app (I doubt this is relevant to most Alexa developers and unfortunately for Amazon I think Google Assistant and Siri have got this one sewn up thanks to their inbuilt advantage on their respective platforms).

  • Something to do with the Food Skills APIs which mainly reminded me that 9am PT is dinnertime in London and caused me to miss that segment in favour of grabbing a pizza!