
Dos and Don’ts: Designing an AI Virtual Assistant
Are you interested in designing an AI virtual assistant? There are so many ways to design one that it can be hard to know where to start. A good way to start designing an AI virtual assistant is to follow these steps:

  • Decide on the assistant’s role/functions 
  • Decide on a development approach
  • Decide which tools you are going to use
  • When designing, stick to the Dos and Don’ts of virtual assistant creation

Let’s go over these steps in more detail.

Choosing a Role
To begin with, let's make sure we understand what a digital virtual assistant is, and what one does. 

A virtual assistant is an application that can interpret voice commands and carry out tasks for its user. Digital virtual assistants make use of microphones and software to capture spoken commands and translate them into actions. A digital assistant can carry out just about any task that computer software can accomplish, but the tasks expected of a digital assistant will depend on the context in which it is used. A digital assistant designed for use in schools will not require the same abilities as a digital assistant designed for use by banks.

What Functions Are Required Of Digital Assistants?
There are certain core functions that a digital assistant is expected to have, like the ability to search the web, send emails, open apps, and so on. Digital assistants for banking, for example, will be expected to carry out tasks like searching databases, bringing up records of past transactions, viewing routing and account numbers, paying bills, managing debit and credit cards, and connecting customers with specialists or customer support agents.

How Can You Choose The Right Abilities For Your Digital Assistant?
First, decide on a role for your digital assistant. In what context will the assistant be used? The role you choose will shape the functions your assistant needs.

Ask yourself this: Is my digital assistant going to be voice-based or text-based? More likely, it will be some combination of the two. 

It is extremely common for customers to use both voice commands and text/touch commands when interacting with a virtual digital assistant. Because digital assistants are increasingly multimodal, you will need to decide in what capacity you will use voice input and text input, as well as how you will give feedback for both of these methods.

After you've decided what role your digital assistant will fulfill, and the best methods of handling inputs and outputs to fulfill that role, you can begin thinking about ways to carry out the assistant’s desired functions.

Methods For Designing A Virtual Assistant
When considering how to design a virtual assistant, there are various strategies you can implement. You can program all the logic for the assistant from scratch, doing things like designing a neural network to handle the recognition of audio data, designing the logic that will carry out tasks like sending emails and searching the web, and designing a user interface for the digital assistant. 

If you have a genuine need to customize every aspect of the digital assistant, this approach may prove worthwhile. In all likelihood, though, using libraries that other people have created will not only make implementing your digital assistant faster and easier, but those libraries will often perform better than code you write yourself. 

There are many libraries out there for handling tasks like speech recognition and making HTTP requests. For instance, for handling HTTP requests and similar tasks, Python has libraries like http.client (the Python 3 successor to httplib), urllib, and requests.

Each library has its pros and cons, and it is worth researching the features of a library you are thinking about working with before committing to it.
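
As an illustration, here is a minimal sketch of fetching data over HTTP with the requests library; the URL and query parameter here are placeholders rather than a real service.

import requests

# request a (hypothetical) web resource, passing a query parameter
response = requests.get("https://api.example.com/weather", params={"city": "Hanoi"})
if response.status_code == 200:
    data = response.json()  # parse the JSON body into a Python dictionary
    print(data)
else:
    print("Request failed with status", response.status_code)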

The Dos And Don’ts Of AI Virtual Assistant Design
Let’s go over some general pointers about designing a virtual assistant. Keep these design considerations in mind as you move into the actual programming phase of development.

Do: Make useful functions
This may come as no surprise, but the functions your digital assistant is capable of should be useful. This means tailoring the functions to the target audience. 
The ability to recognize images of flowers may be of use to a botanist, but is likely to be useless to a bank, for example.

Don’t: Add too many features
Avoid adding features simply for the sake of having more features. Feature bloat makes interacting with the assistant needlessly complex, burying the functions users actually need under ones they don’t. 

Even worse, taking time to invest in features that your digital assistant doesn’t need means you are losing time that could be spent perfecting relevant features.

Do: Make your assistant easy to use and interface with 
The user interface for your assistant should be simple, easy to navigate and understand. The user should feel like interacting with your digital assistant is natural and painless, with desired functions easy to activate. 

If something goes wrong and a desired function can’t be activated, the assistant should provide relevant information about why the desired task couldn’t be carried out, so that the user can amend their actions/instructions.

Do: Acknowledge ambiguity when receiving input from the user
If there are multiple ways that a command can be interpreted, ask for clarification rather than simply carrying out the associated action, or you risk doing something the user doesn’t want.
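
For example, a minimal sketch of this pattern, assuming talk() and listen() helpers like the ones shown later in this article, might look like:

def clarify(possible_actions):
    # if more than one action matches the command, ask the user to choose
    if len(possible_actions) > 1:
        talk("I can do that in more than one way: " + ", ".join(possible_actions))
        talk("Which one did you mean?")
        return listen()
    return possible_actions[0] if possible_actions else None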

Don’t: Give ambiguous feedback
At the same time, don’t provide ambiguous statements when giving feedback to the user; make feedback/communication as clear as possible. 

Don’t: Try to make your digital assistant communicate entirely through voice commands
Most people tend to use some combination of text/touch inputs and voice inputs when interacting with a digital assistant, because some requests are simply easier to make one way than the other. Don't force your users to give requests in only one way. 

Don’t: Be inconsiderate or rude when designing responses to queries
Most people do not intend to be rude, but a response can accidentally come off as abrasive if you don’t consider how another person might interpret it. For this reason, use language that is courteous and kind when programming responses to inquiries.

Don’t: Reinvent the wheel
Finally, try to avoid reinventing the wheel. As previously mentioned, many libraries and other resources already exist to assist you in carrying out the functions you want for your digital assistant. 

The functions these libraries provide are typically very robust and supported with constant updates, making it wise to use them instead of designing these functions yourself.

Useful Libraries
After you’ve chosen a direction for your digital assistant, you can look into libraries that may be of use.

There are a number of different libraries for Python that you can use to expedite the design of a digital virtual assistant. These libraries include Pyttsx3, SpeechRecognition, Pyaudio, gTTS, WolframAlpha, and Selenium.

Pyttsx3 
Pyttsx3 is a text-to-speech package that lets you synthesize text into audio. The package is a free and convenient way to convert text to audio so that your digital assistant can speak to your user. Pyttsx3 works offline, supports the native speech engines on Windows and Mac, and is also compatible with espeak.

SpeechRecognition
SpeechRecognition is a powerful, free library designed to recognize speech, enabling you to convert voice commands into a format your program can work with. The SpeechRecognition package supports many different engines and APIs, like Wit.ai, CMU Sphinx, Microsoft Bing Voice Recognition, and the Google Cloud Speech API.

PyAudio
PyAudio is a Python package that provides bindings for PortAudio, a cross-platform audio I/O library, enabling Python programs to record and play audio on various platforms.
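
As a quick illustration, here is a minimal sketch of recording a few seconds of audio with PyAudio; it assumes a default input device, and the sample rate and clip length are arbitrary choices.

import pyaudio

pa = pyaudio.PyAudio()
# open a mono, 16-bit input stream at 16 kHz
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=1024)
frames = []
for _ in range(int(16000 / 1024 * 3)):  # roughly 3 seconds of audio
    frames.append(stream.read(1024))
stream.stop_stream()
stream.close()
pa.terminate()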

gTTS
gTTS is Google Text-to-Speech, a Python library that uses Google Translate’s text-to-speech function. Like Pyttsx3, gTTS converts text to speech, but it has some different features: it saves the synthesized audio to a file that can be manipulated and transformed in various ways, and it supports text tokenizing as well as other forms of pre-processing.

WolframAlpha
WolframAlpha is an answer engine, capable of retrieving factual answers to user questions from large databases. The WolframAlpha API can be used along with Python to enable your users to ask a variety of factual questions and get accurate answers.
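
For instance, here is a minimal sketch using the wolframalpha Python client; you would need your own App ID from the WolframAlpha developer portal (the "YOUR_APP_ID" string below is a placeholder).

import wolframalpha

client = wolframalpha.Client("YOUR_APP_ID")  # placeholder App ID
result = client.query("What is the population of Hanoi?")
# take the plain-text answer from the first result pod
print(next(result.results).text)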

Selenium
Selenium is an open source Python package capable of issuing commands to various web browsers. Not only can it navigate to pages, it can be used to fill in forms, move windows, drag and drop objects, and more. 
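
As an example, here is a minimal sketch of driving a browser with Selenium; it assumes a Chrome driver is installed and on your PATH, and that Wikipedia's search box is still named "search".

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://www.wikipedia.org")
search_box = driver.find_element(By.NAME, "search")  # the site's search field
search_box.send_keys("virtual assistant")
search_box.send_keys(Keys.RETURN)  # submit the search form
driver.quit()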

Google Cloud Speech-to-Text API
If you are looking for a cloud-based solution for handling speech and text, Google’s Cloud Speech-to-Text API makes it easy to use powerful machine learning models to transcribe speech into text. The API supports many different languages, and it can handle real-time input or prerecorded input.
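
A minimal sketch of transcribing a prerecorded clip with the google-cloud-speech client library might look like the following; it assumes you have Google Cloud credentials configured, and the file name and audio settings are placeholders.

from google.cloud import speech

client = speech.SpeechClient()
with open("audio.wav", "rb") as f:  # placeholder file name
    audio = speech.RecognitionAudio(content=f.read())
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)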

Sample Virtual Assistant Structure And Abilities
We won’t cover every way of designing a digital assistant here; instead, we’ll look at a couple of different ways you can implement a digital assistant and carry out simple functions with it. Please note that these are just examples of how you can structure a digital assistant.

Let’s take a look at an example using SpeechRecognition and pyttsx3, and then compare it with gTTS. When using pyttsx3, we specify an “engine” (speech driver) for the program to use.

import pyttsx3
import speech_recognition

# 'sapi5' is the Windows speech driver; pyttsx3 picks a sensible default
# if you call init() with no arguments
tts_engine = pyttsx3.init('sapi5')
tts_engine.setProperty('rate', 150)  # speaking rate in words per minute

After importing the libraries and setting the engine properties, creating functions to handle your assistant’s listening and talking is fairly simple. Listening can be handled by using the Recognizer class from speech_recognition.

recog = speech_recognition.Recognizer()

def listen():
    with speech_recognition.Microphone() as source:
        recog.adjust_for_ambient_noise(source)
        audio = recog.listen(source)
    try:
        # you can use your chosen speech recognition engine here
        return recog.recognize_google(audio)
        # return recog.recognize_sphinx(audio)
    # to handle errors
    except speech_recognition.UnknownValueError:
        print("Couldn't make sense of audio")
    except speech_recognition.RequestError as e:
        print("Recognition error; {0}".format(e))
    return ""

To handle speech, it’s as easy as using the “say” command from the engine.

def talk(text):
    tts_engine.say(text)
    tts_engine.runAndWait()

Creating the talk function with gTTS is comparatively simple.

import os
from gtts import gTTS

def speak(audio_string):
    print(audio_string)
    text_2_speech = gTTS(text=audio_string, lang='en')
    text_2_speech.save("audio.mp3")
    # mpg321 is a command-line MP3 player; any audio player will do
    os.system("mpg321 audio.mp3")

So here’s how you could structure a digital assistant and have it carry out simple commands.

  1. Define functions to handle commands and responses. The response function that converts text into spoken audio, for your assistant’s feedback, could look something like this:

import os

def assistant_response(audio):
    print(audio)
    for line in audio.splitlines():
        # uses the macOS "say" command; on other platforms you could call
        # the talk() or speak() functions defined above instead
        os.system("say " + line)
  2. Define a function to understand the user’s voice commands.

def chosen_command():
    r = speech_recognition.Recognizer()
    with speech_recognition.Microphone() as source:
        print("Please give instructions...")
        # wait for a one-second pause before treating the phrase as complete
        r.pause_threshold = 1
        # to adjust for the ambient audio
        r.adjust_for_ambient_noise(source, duration=1)
        audio = r.listen(source)
    try:
        command = r.recognize_google(audio).lower()
        print("I think you said: " + command + '\n')
    # if speech was unrecognized, ask again
    except speech_recognition.UnknownValueError:
        print("Sorry, I didn't catch that. Please repeat your command.")
        command = chosen_command()
    return command

With these functions created, we can handle web-related commands. Here’s how you could potentially handle a command that asks the assistant to open a website.

import re
import webbrowser

if 'open' in command:
    # use the regular expressions library to pull the site name out of the command
    reg_ex = re.search('open (.+)', command)
    if reg_ex:
        domain = reg_ex.group(1)
        print(domain)
        url = 'https://www.' + domain
        # the webbrowser module opens the URL in your default browser;
        # you could also use Selenium here
        webbrowser.open(url)
        assistant_response('The webpage you requested has been opened.')
    else:
        pass
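
To tie these pieces together, one possible sketch of a main loop is shown below; it assumes the assistant_response() and chosen_command() functions defined above and wraps the "open" logic in a hypothetical handle_command() helper.

def handle_command(command):
    if 'open' in command:
        reg_ex = re.search('open (.+)', command)
        if reg_ex:
            webbrowser.open('https://www.' + reg_ex.group(1))
            assistant_response('The webpage you requested has been opened.')

def run_assistant():
    assistant_response('How can I help you?')
    while True:
        command = chosen_command()
        if 'stop' in command or 'goodbye' in command:
            assistant_response('Goodbye!')
            break
        handle_command(command)

run_assistant()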

As you can see, there are many tools and multiple methods of designing an AI virtual assistant at your disposal. You will have to decide what is best for your industry, product, and/or service. No matter what tools and methods you use, try to stick to the Dos and Don’ts of virtual assistant design to help you design a virtual assistant that works harmoniously with the user.

The Vietnam AI Grand Challenge is On!
After hosting seminars and workshops in three cities around Vietnam, we are excited to host the first hackathon in the Vietnam AI Grand Challenge series this weekend. Developers will compete to win up to $40,000 in cash and KAT as they design the Ultimate AI Virtual Assistant across various sectors.

Community members are encouraged to vote for their favorite projects. Four winning teams will advance to the next round -- three teams selected by AI Grand Challenge judges and one team voted in by the Community. Stay tuned for more information about Community Voting and check out the Kambria platform Bounty page to keep track of new hackathons and K-prize challenges as they are announced. May the best teams win!
