Look Who's Talking
A short review of Text-to-Speech technology, including a list of commercially available products and freeware packages.

By Denis Susac


Lucent's Text-to-Speech Engine is another natural-sounding and intelligible text-to-speech system. It offers adjustable speed, volume, pitch, and vocal tract size; text input is unrestricted, and both male and female voices are available. The engine also offers context-sensitive abbreviation expansion and an open architecture, as well as rich preprocessing options for email and Web text. It supports American English, Continental and Canadian French, Latin and Castilian Spanish, German, and Italian. Other utilities include a custom dictionary editor, a custom audio object, and an ActiveX control for generating .wav files from text.
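
To make "context-sensitive abbreviation expansion" concrete, here is a minimal Python sketch of the general idea. It is not Lucent's implementation: the rule table and the function name expand_abbreviations are hypothetical, and a real engine would rely on far richer context than the two cues used here.

```python
import re

# Hypothetical rule table, for illustration only.
# Title rules run first: "Dr." or "St." directly before a capitalized word.
# Street rules run second: the same abbreviation directly after another word.
RULES = [
    (re.compile(r"\bDr\.(?=\s+[A-Z])"), "Doctor"),
    (re.compile(r"\bSt\.(?=\s+[A-Z])"), "Saint"),
    (re.compile(r"(?<=\w\s)Dr\."), "Drive"),
    (re.compile(r"(?<=\w\s)St\."), "Street"),
]


def expand_abbreviations(text: str) -> str:
    """Expand ambiguous abbreviations according to their neighbors."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(expand_abbreviations("Dr. Smith lives at 12 Elm St."))
```

The same two letters expand differently depending on what surrounds them, which is the behavior a speech engine needs before it can decide how to pronounce the text.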

Implemented as a Windows DLL, SoftVoice TTS is one of the few engines that use a formant synthesis approach. Taking advantage of this technology, the programmer can alter any of the voices in virtually limitless ways to create totally new voices. A comprehensive set of over 30 different commands can be embedded in the text to control the speech output, including extensive singing support! Accurate mouth shapes can be animated from data supplied by the synthesizer. The SoftVoice system uses letter-to-sound rules, a numeric preprocessor, and a dictionary to determine proper word pronunciation. Programmers or users can also use the SoftVoice exceptions dictionary editor to create their own dictionaries for words or abbreviations that are not pronounced properly. It supports English and Spanish.
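
The three-stage approach described above (numeric preprocessor, dictionary lookup, letter-to-sound rules) can be sketched in a few lines of Python. This is only a toy illustration, not SoftVoice code: the phoneme symbols, the rule tables, and every function name are assumptions made for the example, and the number speller covers 0 to 999 only.

```python
import re

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical exceptions dictionary: words the rules below would get wrong.
EXCEPTIONS = {"colonel": "K ER N AH L", "one": "W AH N"}

# Toy letter-to-sound rules: two-letter patterns first, then single letters.
DIGRAPHS = {"sh": "SH", "ch": "CH", "th": "TH", "ee": "IY", "oo": "UW"}
LETTERS = dict(zip(
    "abcdefghijklmnopqrstuvwxyz",
    "AE B K D EH F G HH IH JH K L M N AA P K R S T AH V W KS Y Z".split()))


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else TENS[tens] + " " + ONES[ones]
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def letter_to_sound(word: str) -> str:
    """Fallback: map letters to phonemes, preferring two-letter rules."""
    phones, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            phones.append(DIGRAPHS[word[i:i + 2]])
            i += 2
        else:
            if word[i] in LETTERS:
                phones.append(LETTERS[word[i]])
            i += 1  # characters without a rule are skipped
    return " ".join(phones)


def pronounce(text: str) -> list:
    """Numbers to words, then the dictionary, then letter-to-sound rules."""
    text = re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group())), text)
    words = re.findall(r"[a-z]+", text.lower())
    return [EXCEPTIONS.get(w) or letter_to_sound(w) for w in words]


print(pronounce("Colonel 42"))
```

The ordering matters: the exceptions dictionary is consulted before the rules, so a user-supplied entry always overrides whatever the letter-to-sound fallback would have produced.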

e-Language from Elan Informatique is a new range of sophisticated software tools designed to bring a new generation of interfaces to PDAs, smart phones, car navigation systems, and similar devices. It includes Prosel, a module that extracts natural prosody and applies it to synthetic speech, and Lexitool, a module that produces a personalized lexicon database for exceptions and abbreviations specific to the application. Their Speech Cube and Proverbe products provide software and hardware support for multilingual (US English, British English, Spanish, German, French, Russian, Brazilian Portuguese, and Italian) and multi-channel text-to-speech server applications under Windows NT, SCO, Linux, QNX, and Solaris. Desktop solutions include a speech engine SDK (DLL version), a SAPI speech engine, and a speech engine for OS/2. Elan also offers extensive support for embedded systems, including Windows CE.

The Microsoft Speech SDK is a natural choice for most developers on Windows platforms. The new release, 5.0, includes a number of improvements, updated development tools, samples, and documentation, as well as enhanced versions of the Microsoft continuous speech recognition engine (MCSR) and the Microsoft concatenative speech synthesis engine (TTS). You can combine speech capabilities with the Telephony API (TAPI) to produce advanced telephony applications. Most of the engines described here can be used in conjunction with the Speech SDK; you can install a number of TTS products and select the desired engine using the control panel applet.

As for the lesser-known but also high-quality systems, TCTS Lab's EULER 2.00 is a freely available (GNU C++), easy-to-use, and easy-to-extend generic TTS system for Windows 95/98/NT. French is currently supported; other languages will follow. A Mac port is in the works, as well as a Unix/Linux port.

Another great project from TCTS, MBROLA, aims to obtain a set of speech synthesizers for as many languages as possible and to provide them free for noncommercial applications. Central to the project is MBROLA itself, a multiplatform speech synthesizer based on the concatenation of diphones. It does not accept raw text as input, so it is not a general TTS system. On the other hand, it currently supports 24 (!) languages, which has helped it gain acceptance all over the world.

The Festival Speech Synthesis System is a general and very powerful multilingual speech synthesis system, now regarded as a standard in TTS research. It offers a full text-to-speech system with various APIs, as well as an environment for development and research of speech synthesis techniques. It is written in C++ with a Scheme-based command interpreter for general control. Festival can be connected to MBROLA, so it can also support a very large number of languages.
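
Since MBROLA takes a phonetic description rather than raw text, it may help to see what such input looks like. The Python sketch below builds text in MBROLA's .pho layout, where each line holds a phoneme name, a duration in milliseconds, and optional pairs of position (percent of the duration) and pitch (Hz), with "_" standing for silence. The function names, the "a-b" diphone notation, and the sample phoneme symbols are assumptions for illustration; the actual symbol set depends on the voice database in use.

```python
from typing import List, Sequence, Tuple

# (phoneme name, duration in ms, [(position in percent, pitch in Hz), ...])
Phone = Tuple[str, int, Sequence[Tuple[int, int]]]


def to_diphones(phonemes: List[str]) -> List[str]:
    """List the phoneme-to-phoneme transitions a diphone synthesizer
    would concatenate, padding the utterance with silence ("_")."""
    padded = ["_"] + phonemes + ["_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def to_pho(phones: List[Phone]) -> str:
    """Format phonemes, durations, and pitch points as .pho text."""
    lines = []
    for name, duration_ms, pitch_points in phones:
        fields = [name, str(duration_ms)]
        for position_pct, pitch_hz in pitch_points:
            fields += [str(position_pct), str(pitch_hz)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Illustrative symbols only; check the documentation of the chosen voice.
utterance: List[Phone] = [
    ("_", 100, []),
    ("h", 60, []),
    ("@", 80, [(50, 120)]),
    ("l", 70, []),
    ("@U", 200, [(10, 125), (90, 100)]),
    ("_", 100, []),
]

print(to_diphones(["h", "@", "l", "@U"]))
print(to_pho(utterance), end="")
```

Producing the phoneme names, durations, and pitch contour from ordinary text is the job of a front end such as Festival, which is why the two systems pair naturally.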

