Friday, June 3, 2022

Designing Voice Experiences

Co-authored by Jennifer Segal, Erica Dahle, Hannah Bishop, and Mike Hanson

Interaction has always been about communication and staying connected – and Human-Computer Interaction (HCI) was never intended solely for graphical user interfaces. That is why voice experiences are poised to play an important role in the next step in HCI design.

Jennifer Segal, Interaction Design Lead

“Human language is the new user interface layer.” – Satya Nadella, CEO, Microsoft

This Magic Moment

An Evolution Back to Voice

We are currently on the cusp of a fundamental shift in how we interact with digital experiences. Over the past 50 years, we have been on a journey that has simplified our interactions, from cumbersome desktop interfaces to fluid, real-time mobile apps. Now we are returning to the most foundational human interaction, voice, and somehow it feels futuristic.

[Image: Timeline of voice experiences]

While the history of speech recognition dates back to the 1950s, Internet of Things (IoT) “voice technology” arrived in 2009 with the launch of Google Voice Search. Siri followed in October 2011, and a slew of voice technology competitors came after it, including Amazon Alexa, Google Home, and Apple’s HomePod.

Since then, voice technology has continued to expand and has become part of our everyday lives. According to Statista, the voice recognition tech market reached close to 11 billion US dollars in 2019 and is expected to grow by nearly 17% per year through 2025. More broadly, over 30% of searches happen by voice, and Google estimated that number would surpass 50% by 2021. An Invoca survey of 1,000 people in the U.S. found that 42% have a voice assistant in the family room. The survey also found that voice is replacing certain types of digital interactions: what once was a swipe or tap is now a request to Alexa. Research suggests that voice user interfaces will grow exponentially, and that expectations for voice will increase in ways we can’t yet imagine.

Raising Your Voice Experience Game: Opportunities to Be Heard

Voice technology resonates with consumers because voice is a natural means of communication. Users can interact with and relate to the experience more intuitively than with a screen. Unlike other digital experiences, voice experiences rely on natural language, Q&A formats, and the spoken word. Ideally, voice technology enhances consumers’ engagement with content through fluid, natural language interactions, makes contextual recommendations, and personalizes the overall experience.

[Image: Graph of how often people use voice devices]

The emergence of voice experiences presents a unique opportunity for brands to be heard by their audience. However, this opportunity comes with a challenge: how do you create multi-modal experiences that are compatible with voice interactions? To get ahead of expectations for voice experiences, companies need to account for users interacting with their content off-screen. Read on for guidance from a multidisciplinary group of experience professionals on optimizing digital experiences for voice devices.

Optimizing Experiences for Voice

With voice experiences, words stand alone. There are no user-friendly widgets to guide users to the information they’re seeking, and no eye-catching visuals to aid understanding. So how do you craft spoken, rather than written or visual, content that is strong enough to stand on its own?

[Image: Voice device usage statistics]

To ensure your content is extensible to new channels, you’ll need an omnichannel content strategy. An omnichannel content strategy is a plan for creating content that can be distributed across many channels. To create an omnichannel content strategy, you must first identify what channels are relevant for your content. Then, you’ll need to understand the differences in how users consume your content on each of the relevant channels.

Let Your Content Speak for You

The unique challenge for voice experiences is that, unlike other digital experiences, content is consumed through the spoken word rather than the written word. To ensure your written content will translate to voice, focus on these key principles:

  1. Content structure: structuring and labeling content impacts discoverability — especially for voice experiences

  2. Content legibility: simplifying text can make for an easier translation from written to spoken word

  3. Content usefulness: creating content that directly answers users’ questions is invaluable to voice experiences

Content Structure

[Image: Content structure for voice search]

Structuring and labeling content is important for on-page scannability as well as off-page discoverability, especially for voice search. According to an analysis by Ahrefs, 40.7% of voice search answers come from the Featured Snippet on search engine results pages (SERPs). A Featured Snippet is a search result, displayed at the top of a SERP, that directly answers a searcher’s question. To rank for this type of result and gain traction with voice searches, content needs to be highly structured and clearly labeled. Use the following tactics to optimize the structure of your content:

  1. “Chunk” content into concise, logical paragraphs

  2. When applicable, use bulleted/ordered lists to organize content

  3. Use descriptive headers and subheads to label content

  4. When applicable, include rich media

  5. Create content that directly answers users’ questions

  6. Use schema markup to classify content. Common schema markup types for voice search include FAQ, How To, Blog and Article
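As a minimal illustration of tactic 6, the Python sketch below builds FAQ markup as schema.org JSON-LD, a common way to add structured data to a page. The types and properties it uses (FAQPage, Question, Answer, mainEntity, acceptedAnswer) come from the schema.org vocabulary; the faq_jsonld helper and the sample answer text are hypothetical, and each search engine decides for itself which markup it actually uses.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Hypothetical Q&A content; in practice, use the questions your users actually ask.
faq = faq_jsonld([
    ("What is a credit limit?",
     "A credit limit is the maximum balance you can carry on a credit account."),
])

# Wrap the JSON-LD in a script tag for embedding in the page's HTML.
snippet = '<script type="application/ld+json">\n' + json.dumps(faq, indent=2) + "\n</script>"
print(snippet)
```

The resulting script tag would typically sit in the page’s HTML alongside the visible Q&A content it describes.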

A few additional things to consider: listeners cannot see or select a link within voice content, so links and calls to action need special attention. Link phrases like “learn more about these programs” or “read more” would be very confusing to hear as voice content. It is also important to reduce the number of calls to action, which confuse screenless users. Finally, voice interfaces can lose context and introduce ambiguity, for example when content refers back to something that appeared earlier on the page, so it is critical that each piece of content can stand on its own.
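One way to find links that won’t survive the move to voice is to audit link text. The sketch below assumes your content is available as HTML and uses Python’s standard html.parser module to collect the text of each <a> element and flag phrasing that only makes sense on screen. The VAGUE_PREFIXES list and the sample HTML are hypothetical and would need tuning for your own content.

```python
from html.parser import HTMLParser

# Hypothetical link phrases that lose their meaning when read aloud.
VAGUE_PREFIXES = ("learn more", "read more", "click here")

class LinkTextAuditor(HTMLParser):
    """Collect the text of each <a> element and flag screen-only phrasing."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._buffer = []
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            # Normalize whitespace and case before comparing.
            text = " ".join("".join(self._buffer).split()).lower()
            if text.startswith(VAGUE_PREFIXES):
                self.flagged.append(text)
            self._in_link = False

# Hypothetical page fragment.
html = (
    '<p>New programs are open. <a href="/programs">Learn more about these programs</a></p>'
    '<p><a href="/contact">Call the agency at its main phone number</a></p>'
)
auditor = LinkTextAuditor()
auditor.feed(html)
auditor.close()
print(auditor.flagged)  # ['learn more about these programs']
```

Flagged links are candidates for rewording so the link text itself says what the listener would get.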

Once content is optimized for voice search discovery, you’ll want to make sure that it’s delivered to users effectively. Understanding how your content sounds to users will be critical for effective delivery of written words in a spoken format.

Content Legibility: I Can’t Hear You…

One of the most important questions to consider when creating a voice experience is: will written content be legible when it’s translated to voice? The answer lies in simple, concise writing. Voice experiences are best supported by straightforward language that’s free of marketing “fluff” and other unnecessary words that may add confusion, such as idioms. This poses a unique challenge for taking web-based content beyond the screen and ensuring that it’s understandable for listeners rather than readers. To be sure that web-based content is legible for conversational channels, consider the following:

  1. Voice device users have a low tolerance for verbosity: be concise with your writing

  2. Long-form content doesn’t lend itself well to voice experiences: create “bite-size” versions of long-form content where possible

  3. Read your content out loud: this is the best way to know how your content will sound to voice users

Another consideration when writing content for voice is brand personality. Designers have a unique opportunity to convey brand personality through voice. Gender, age, inflection, tone, accent, cadence, and pace are all elements that can be used to craft a unique customer experience for a brand. For example, Disney might craft a friendly Mickey tone that kids can talk to, while a news publication might opt for a smart, assertive voice when breaking the latest story.

Voice interfaces are the most human of interfaces, and there is currently a large opportunity to break away from the sameness of today’s voice personas and, by doing so, reach a broader audience. “Voice interfaces must represent the richness of human language,” says Preston So, author of Voice Content and Usability. We don’t tend to personify visual and physical interfaces, but we do personify voice interfaces. The problem is that the Alexa and Siri personas so pervasive today present as cisgender, white, American women, and there is currently little representation in voice experiences for other human dialects and communities. When crafting your brand’s voice persona, consider the audience you wish to reach.

Content Usefulness: Answering Your Users’ Questions

For voice, good design depends on good writing. However, high-quality writing is only valuable if it’s supported by intent. To create an optimal experience on any channel, content must address your users’ needs. The key question to ask here: is the content discoverable for users? According to a 2019 study conducted by Search Engine Land, 50% of users engage in voice experiences by asking their smartphones questions. The types of questions voice device users commonly ask include:

  • Quick facts (68%)

  • Directions (65%)

  • Searching for a business (47%)

  • Researching a product or service (44%)

[Image: Google search results for “rental car”]

The types of questions will vary slightly based on the type of business, but a good starting point for all voice content strategies is answering, ‘Who,’ ‘What,’ ‘Where,’ ‘When,’ ‘How,’ and ‘Why’ questions.

When crafting Q&A content for conversational interfaces, use language that is common to your users. There are multiple resources that can be tapped to find commonly used words, such as:

  • Organic and paid search terms

  • On-site search terms

  • Call center recordings (and other documented customer feedback)

  • Search engine autocomplete suggestions
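Once you’ve gathered queries from sources like these, a quick first pass is to sort them by the ‘Who,’ ‘What,’ ‘Where,’ ‘When,’ ‘How,’ and ‘Why’ question words mentioned earlier. This Python sketch assumes the queries are already exported as plain strings. The group_by_question_word helper is hypothetical, and it only looks at the first word, so phrasings like “what’s” or “can I” land in the “other” bucket.

```python
from collections import defaultdict

QUESTION_WORDS = ("who", "what", "where", "when", "how", "why")

def group_by_question_word(queries):
    """Group search queries by their first word if it is a question word."""
    groups = defaultdict(list)
    for query in queries:
        words = query.lower().split()
        first = words[0] if words else ""
        groups[first if first in QUESTION_WORDS else "other"].append(query)
    return dict(groups)

# Hypothetical export of search terms (several echo example queries in this article).
search_terms = [
    "what is the closest airport to yellowstone national park?",
    "how much is a rental car per day?",
    "how to defrost a freezer",
    "rental car deals",
]
grouped = group_by_question_word(search_terms)
print({word: len(items) for word, items in grouped.items()})
# {'what': 1, 'how': 2, 'other': 1}
```

Each bucket can then serve as a starting point for Q&A content written in your users’ own words.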

In the age of instant gratification, it’s important that voice technology can decipher what’s relevant to the user with quick interactions and summaries. Although this technology is in its infancy, we expect to see voice technology prioritize our needs as customers, get to know us, and make personalized recommendations on products and services. The ideal approach would be to summarize the information based on preferences, and then deliver an answer — much like the cadence of a natural conversation.

Local Intent and SEO for Voice

Two in five adults use voice search once a day (Location World). Voice searches tend to have informational or transactional search intent, meaning users are seeking information about a brand, product, or service, or looking to complete a transaction with one. To meet user expectations, you must first understand the intent of the inquiry and how it applies to your brand, product, or service.

Google voice search results tend to be very concise: the average voice search result is only 29 words long. To optimize for voice search SEO, aim to keep your answer snippets to approximately 29 words, since Google prefers short, concise answers to voice search queries. Simple, easy-to-read content may also help with voice search SEO; the average Google voice search result is written at a 9th-grade reading level.
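To check draft answer snippets against these benchmarks, you could count words and estimate reading level with the Flesch-Kincaid grade formula, 0.39 × (words ÷ sentences) + 11.8 × (syllables ÷ words) − 15.59. The Python sketch below is only a rough approximation: the syllable counter is a simple vowel-group heuristic rather than a dictionary lookup, the thresholds reuse the 29-word and 9th-grade averages cited above, and the sample answer is hypothetical.

```python
import re

# Benchmarks taken from the averages cited in the text.
TARGET_WORDS = 29
TARGET_GRADE = 9

def count_syllables(word):
    """Rough syllable estimate: count vowel groups, ignore a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def check_snippet(text):
    """Return word count, estimated Flesch-Kincaid grade, and pass/fail flags."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("snippet contains no words")
    # Treat each run of ., ! or ? as one sentence end; assume at least one sentence.
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    grade = 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
    return {
        "words": len(words),
        "grade": round(grade, 1),
        "within_length": len(words) <= TARGET_WORDS,
        "within_grade": grade <= TARGET_GRADE,
    }

# Hypothetical answer snippet for "what is the perm press cycle?"
answer = ("The permanent press cycle uses warm water and a cool-down rinse "
          "to reduce wrinkles in synthetic fabrics.")
print(check_snippet(answer))
```

Treat the grade as a rough signal for where to simplify wording, not as a precise score.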

[Image: Google search results for “perm press cycle”]

Here are a few suggestions for improving your voice SEO:

  1. Use an SEO tool like Ahrefs, BrightEdge, SEMRush or Moz to identify search terms related to your website that return featured snippets.

  2. Answer search questions with unbiased, factual language. Use bullets or numbered steps when possible.

  3. Use an SEO tool like the ones mentioned above to track keyword rank and featured snippet visibility. When successful, you should see rankings improve to positions 1 and 2, impressions increase, and clicks might actually decrease.

  4. Communicate reporting, expectations and progress to team members and stakeholders. Use real life examples from your phone or speaker to showcase success.

Voice and Accessibility

Voice design also tackles one of the most important topics: accessibility. “Accessibility” refers to providing access to all people, regardless of ability, and there are multiple ways in which voice interaction can help people with disabilities. According to the World Health Organization, about 15% of the world’s population lives with some form of disability.

For common voice technologies to work on a site or an application, its content must be available to text-to-speech (TTS). Semantic HTML markup makes this possible for screen readers, which allow users to comprehend content at a faster pace. According to Preston So, author of Voice Content and Usability, voice interfaces can and should exist in parallel with screen readers; well-designed voice interfaces can get users to their goals faster than long-winded screen readers because they shift the focus from a visual to a verbal experience.

Testing Voice Interactions

When creating voice experiences, it is important to facilitate cross-channel interactions. Users often move between multiple devices, such as a phone, laptop, smart speaker, and smart headphones. If someone is accessing information through a voice interface, make it easy for them to continue on the website or, for example, to get a phone number to call an agency.

It is also critical to conduct very robust usability testing. Content auditing is, in a sense, usability testing for content. Usability testing can encompass omni-channel and multi-modal experiences, and give a comprehensive idea of how similar or interchangeable the experiences are with each other. Testing will also be critical for learning new ways to optimize the experience.

Prioritizing Voice and Crafting an Omni-Channel Strategy

Now that you know more about the evolution of voice devices and the key elements that impact a voice experience, it’s time to create a strategy. Creating an omni-channel strategy should be a multi-team effort that can include the following deliverables:

  1. Conversational content audit

  2. Help Desk / Call Center stakeholder interviews

  3. Keyword research

  4. Schema markup audit

  5. Entity based content models

  6. Taxonomy and tagging strategies

  7. Voice content guidelines

  8. User flows

  9. Usability testing

  10. Omni-channel content audit to examine cross-channel legibility

Making voice part of your product design roadmap will help ensure that your experiences work across multiple platforms.

Industry Examples of Enhanced, Multi-Modal Experiences

Below are a few inspiring industry examples of how voice can enhance a user’s interaction with a product or service.

Capital One (financial services)

Capital One leads the financial sector in voice-enabled services. Tasks such as checking your balance, tracking your spending, and paying your bill become simple and instantaneous. Setup only requires linking your account, with the option to create a personal key.

“What is the difference between APY and APR?” “What is a credit limit?”

Samsung SmartThings (home services)

SmartThings teaches your home a few tricks to simplify life, such as setting alarms for when you wake up, turning on lights, locking doors, and turning down the thermostat. It can also set up video alerts for unexpected activity or a water leak.

“How to use Smart Things app”

Kayak Booking Assistance (travel and hospitality)

Kayak’s new Alexa skill allows consumers to check the price of flights and hotels. Instead of reading and comparing prices online to find the best deals, they receive a curated verbal response.

“what is the closest airport to yellowstone national park?” “how much is a rental car per day?”

Whirlpool Smart Countertop Oven (home appliances)

The smart oven recognizes food and adjusts cooking time and temperature. It also lets you use voice commands to control the oven, such as preheating it, and sends status notifications when food is ready.

“how long should your washer run?” “how to defrost a freezer” “what is the perm press cycle?”

Taking Action: Next Steps for Exploring Voice

Spoken content is organic, arises from human interactions and needs, and is inherently messier than written and visual content. Companies can turn this challenge into an opportunity by following these principles:

  • Ensure the action enhances the experience

  • Imbue voice interactions with your brand’s personality

  • Ensure voice interactions get smarter and more responsive

  • Prioritize voice on digital platforms

  • Explore opportunities with inclusivity and accessibility

Make voice experiences a part of your product roadmap and take the next steps in your omni-channel strategy with these actions:

  1. Consider how voice could augment touchpoints on your current customer journey. How could voice add value to your customer interactions?

  2. Learn the rules of engagement in conversational commerce. Deepen your understanding of the types of conversations your customers want to have: build a chatbot or create surveys to learn more.

  3. Review whether your search activity is optimized for voice. Do you have a keyword strategy to capitalize on conversational search terms?

  4. Experiment with voice user interfaces. Test and learn through voice apps, such as Alexa skills, to explore how you can provide value to your customers and help them engage in new behaviors.

  5. Identify your customers’ deciding factors. Determine what’s at stake for your customers: what do they care about, and what would make them eager to interact with voice technology?