Network World

VoiceXML lets you talk to computers

By James Larson, Network World, 03/22/2004
This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

There are many interactive voice response applications that let users listen to computers and respond by pressing the buttons on touchtone phones. However, callers often get lost traversing long, time-consuming sequences of menus. It's also difficult for callers to juggle between listening and searching for the right buttons to press on the small keypads of their cell phones. What's needed are IVR user interfaces that let users listen and speak to computers.

VoiceXML 2.0 is a markup language for building speech interfaces - the voice equivalent of HTML. A voice browser is like a Web browser - it interprets VoiceXML 2.0 scripts to present spoken information to users and accept spoken requests from them.

The World Wide Web Consortium last week made VoiceXML 2.0 a full recommendation, which is commonly understood to be a Web standard. The standard adds a speech-recognition grammar format - for the words and phrases that users can speak in response to prompts - that was not included in the previous version.

Call components

Because telephones, including many cell phones, don't have the computation capability to host a voice browser, the voice browser resides on the network in a speech server. The speech server may be located in a corporate data center or off-site at a hosting provider.

Users dial a speech server, which downloads VoiceXML 2.0 scripts, grammar formats and audio files from an application server.

The voice browser interprets the VoiceXML 2.0 script by presenting users with a voice message, such as:

System: "Welcome to Ajax. Do you want to speak with sales, accounting or repairs?"

The voice message could be prerecorded voice or text that is routed through a text-to-speech synthesizer.
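As a rough illustration of the two options, a VoiceXML 2.0 prompt can name a recording and carry fallback text inside the same element. The sketch below assumes a hypothetical recording called welcome.wav on the application server; if the voice browser cannot fetch or play that file, the text inside the audio element is sent to the text-to-speech synthesizer instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="greeting">
    <block>
      <prompt>
        <!-- welcome.wav is a hypothetical file name used only for illustration -->
        <audio src="welcome.wav">
          <!-- Fallback text, synthesized only if the recording is unavailable -->
          Welcome to Ajax. Do you want to speak with sales, accounting or repairs?
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
```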

The voice browser invokes an automatic speech recognizer (ASR), which uses the grammar format to recognize words users speak:

User: "Repairs."

The ASR recognizes the user's spoken response. In this case the grammar format consists of only three words: "sales," "accounting" and "repairs." This type of grammar-driven ASR performs more accurately than dictation ASRs, which attempt to recognize most of the words in English or whatever language a user is speaking.
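The exchange above might be scripted roughly as follows. This is a minimal sketch, not a production dialog: the form and field names (route_call, department) are made up for illustration, and the inline grammar lists only the three words mentioned in the text. Without semantic tags in the grammar, most voice browsers are expected to fill the field with the recognized word itself, though the details can vary by platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="route_call">
    <field name="department">
      <prompt>
        Welcome to Ajax. Do you want to speak with sales, accounting or repairs?
      </prompt>

      <!-- Inline speech grammar: the recognizer listens only for these three words -->
      <grammar type="application/srgs+xml" version="1.0" root="dept" mode="voice">
        <rule id="dept">
          <one-of>
            <item>sales</item>
            <item>accounting</item>
            <item>repairs</item>
          </one-of>
        </rule>
      </grammar>

      <!-- The caller said nothing: play the prompt again -->
      <noinput>
        <reprompt/>
      </noinput>

      <!-- The caller said something outside the grammar -->
      <nomatch>
        Sorry, I did not understand.
        <reprompt/>
      </nomatch>

      <!-- Runs once the field has been filled by a recognized word -->
      <filled>
        <prompt>Connecting you to <value expr="department"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Keeping the grammar this small is what gives the recognizer its accuracy advantage: it only has to tell three words apart rather than search a full dictation vocabulary.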

Sometimes, users might respond by pressing keys on a touchtone phone, which sends dual-tone multi-frequency (DTMF) signals. DTMF is useful in noisy environments or when the user wants to reply confidentially.
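A keypad alternative can be expressed with a grammar whose mode is set to dtmf. The sketch below is illustrative only: the key assignments (1, 2 and 3) and the names are assumptions, and the field would simply hold the digit that was pressed, leaving the mapping from digit to department to the application.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="route_call_by_key">
    <field name="department_key">
      <prompt>
        For sales, press 1. For accounting, press 2. For repairs, press 3.
      </prompt>

      <!-- DTMF grammar: accepts a single key press of 1, 2 or 3 -->
      <grammar type="application/srgs+xml" version="1.0" root="dept_keys" mode="dtmf">
        <rule id="dept_keys">
          <one-of>
            <item>1</item>
            <item>2</item>
            <item>3</item>
          </one-of>
        </rule>
      </grammar>

      <!-- Any other key falls outside the grammar -->
      <nomatch>
        That key is not an option.
        <reprompt/>
      </nomatch>
    </field>
  </form>
</vxml>
```

A field may carry both a voice grammar and a DTMF grammar, so the same prompt can accept either a spoken word or a key press.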
