Imagine a salesman talking to your Web-based CRM system from
his cell phone. He asks for the latest sales figures for a client he's
flying to meet, and the CRM system e-mails them to him moments later.
Such "multimodal" applications are the idea behind
Speech Application Language Tags (SALT), an emerging technology that provides a speech interface
to Web information. An extension to existing Web programming models and markup
languages such as HTML, XHTML and XML, SALT helps create speech interfaces
that will reside alongside traditional I/O modes such as text, audio, video
and graphics. Rather than dealing with cumbersome custom code, SALT developers
should be able to work with familiar tools and techniques.
Developers who try SALT will find it to be a pretty solid
specification, says Alexander Rudnicky, a senior systems scientist at Carnegie
Mellon University's School of Computer Science. Rudnicky and his students
hope to complete development of an open source SALT browser by year-end. "Our
vision is that once it is available and distributed, tens of thousands of
people can download our browser and start playing with it," he says.
Although promising, SALT is still in its infancy. It will
be a year or more before network executives can consider adding SALT-based
capabilities to their internal or external Web applications.
"The exciting thing about SALT is that it's a
necessary first step for concurrent multimodal applications to become mainstream,"
says Dan Hawkins, managing analyst for voice business at Datamonitor. "But
that's some years off."
Where SALT gets its flavor
SALT has been under development for a year by the SALT Forum,
an industry group led by Microsoft, Cisco and Intel that has grown to include
50 companies. The SALT Forum wants to create an open, royalty-free standard
that supports speech access to Web content through a variety of devices, including
telephones, desktop and tablet PCs, and PDAs. The SALT Forum in July released
Version 1.0 of the specification to the public and submitted it to the World
Wide Web Consortium (W3C) for consideration by working groups developing standards
for voice browsers and multimodal applications.
This fall, the W3C working groups will begin discussing how to
move forward with the SALT specification and how SALT might interrelate
with the W3C's VoiceXML standard, says Dave Raggett, a W3C fellow
who leads these two working groups. While SALT focuses on speech-enabling
Web pages and creating multimodal applications for wireless users,
VoiceXML targets telephony applications such as directory access,
call routing and call centers. But whether these two standards
will be complementary or competitive is unclear as SALT also can
be used to create telephony applications.
Putting SALT on the table
Enterprise use of SALT will likely be for the development
of multimodal applications for areas such as Web-based self service, call
centers, CRM and sales force automation systems. Or, companies could give
SALT-capable tablet PCs to workers such as insurance adjusters, real estate
agents and aircraft mechanics who prefer hands-free input to fill out Web-based
forms.
Industry segments expected to adopt SALT are those that already
use speech recognition and interactive voice response systems: banking, finance,
telecommunications, travel and other customer-service-intensive industries,
says Peter Gavalakis, a marketing manager at Intel.
SALT-enabling a Web site will require Web development tools
that support the specification, and SALT-capable browsers and Web server software.
SALT browsers are available from HeyAnita and Philips, and Kirusa offers a
SALT-based multimodal platform. InterVoice-Brite says it will add SALT support
to its Media Gateway, which supports VoiceXML, by year-end.
But Microsoft's support for SALT is most critical, industry
observers say.
"SALT is important because Microsoft is involved," Datamonitor's
Hawkins says, adding that Microsoft is focusing exclusively on
SALT. The company is not involved with VoiceXML development and
doesn't support VoiceXML in its products. (SALT co-developers
Cisco and Intel, on the other hand, do support VoiceXML.)
Microsoft in May began shipping the beta version of its .Net
Speech SDK, a Web development toolkit that supports SALT.
The toolkit includes add-ons to Microsoft's Internet
Explorer and Pocket Internet Explorer browsers to support speech
input. And by year-end, Microsoft plans to release a beta version
of its .Net speech platform, which is server software that will
support speech-enabled Web applications, says James Mastan, director
of marketing for .Net speech technologies at Microsoft.
SALT supporters face the challenge of finding service providers
and companies willing to invest in speech-enabled Web applications during
these tough economic times. Other hurdles include spreading awareness of the
technology among Web developers and continuing to develop industry partnerships
with companies such as AT&T and IBM, which back VoiceXML and have yet
to join the SALT Forum.
"There are tens of thousands of developers who are doing
stuff in VoiceXML," Hawkins says. "The challenge is getting all
those developers to work on SALT."
Carnegie Mellon's Rudnicky notes that SALT does a good
job of hiding much of the complexity involved with developing speech-enabled
systems. But, he says, SALT lacks a set of libraries or reusable code that
developers can adapt to their own purposes.
Still, Rudnicky remains optimistic about SALT's potential
because, he says, it will let people who aren't intimately familiar
with speech recognition, voice synthesis and other underlying technologies
create working applications. "SALT is a big deal."
SALT vs. VoiceXML
The W3C must sort out whether SALT and VoiceXML are more complementary
or competitive. Here's a side-by-side comparison of the two.

| SALT | VoiceXML |
| --- | --- |
| W3C work initiated in July 2002, with a final specification 12 to 18 months away. | Work began in March 1999. Final specification due in the fall of 2002. |
| Handful of early and beta products available. | Dozens of shipping products. |
| Designed for speech-enabling Web pages and multimodal applications, but can be used to create telephony applications. | Designed for telephony applications. |
| An extension to HTML, XHTML and XML, which should lower development costs vs. using a brand-new language. | A brand-new language for telephony applications. |