Review of mobile Text-To-Speech engines

    If your native language is not English and you write applications for more than just the iPhone, finding the right tools for developing so-called voice-enabled mobile applications will be difficult.

    This review classifies mobile TTS engines and describes the most noteworthy of them.

    I do research on the design of mobile device interfaces for people with visual impairments. One of my projects required a speech synthesis engine with multilanguage support (at least two languages: English and Russian). That is what prompted my search for a speech synthesizer.

    For convenience, we divide the TTS engines into three classes:
    • commercial;
    • free (solutions licensed under the GPL, LGPL and more permissive licenses such as the BSD License or the wxWindows License, which allow commercial products to be developed);
    • built-in (tools provided by the operating system itself).

    Commercial engines


    SVOX Mobile TTS
    Price: n / a
    Languages: 26, including Russian
    Subjective assessment of sound quality: high
    Mobile OS: Android, Symbian, Windows CE / Windows Mobile, BREW
    Possibility of developing commercial products: yes


    SVOX has the most technically appealing product, SVOX Mobile TTS. However, since the company operates mainly in the B2B segment, my two e-mails asking for a price went unanswered.

    Acapela TTS
    Price: 2800 € plus a so-called run-time license, which costs at best 49 € for each distributed application
    Languages: 23, including Russian
    Subjective assessment of sound quality: high
    Mobile OS: Symbian, Windows CE / Windows Mobile, Embedded Linux, iOS
    Possibility of developing commercial products: yes


    Acapela Group employees turned out to be much more responsive: they replied within half an hour of my filling out their request form.

    The price given in the header applies to operating systems such as Windows Mobile and Symbian; however, the Acapela business model varies with the chosen OS. For example, they promote iOS most actively and maintain a separate site for it. There you can register and get an evaluation version of their engine for free. The bare SDK for iOS (formerly iPhone OS) costs 250 €. In addition, a considerable percentage is taken from every application you sell in the App Store.

    Note that Acapela also offers "cloud" speech synthesis, as well as porting of the SDK to any platform.

    Loquendo Embedded TTS
    Price: 3000 € plus a percentage of each mobile application you sell
    Languages: 26, including Russian
    Subjective assessment of sound quality: high
    Mobile OS: Android, Symbian, Windows CE / Windows Mobile, Embedded Linux, iOS, Maemo, Moblin, MeeGo, PalmOS
    Possibility of developing commercial products: yes


    The Loquendo engine supports special tags that make speech sound more natural by mixing in non-speech sounds such as coughing, laughter and so on.

    The engine conforms to the SSML 1.0 specification, a W3C Recommendation.
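
    To give an idea of what SSML 1.0 input looks like, here is a minimal sketch that builds a small SSML document with Python's standard library. It uses only elements defined by the W3C specification (`speak`, `break`, `emphasis`); it does not show Loquendo's proprietary tags, and the helper name `build_ssml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.0 Recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(lead_text, pause_ms, emphasized, lang="en-US"):
    """Return an SSML 1.0 string: plain text, a pause, then emphasized text.

    Hypothetical helper for illustration; not part of any engine's SDK.
    """
    speak = ET.Element(
        "speak",
        {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang},
    )
    speak.text = lead_text + " "

    # <break time="..."/> asks the synthesizer to insert a pause.
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})

    # <emphasis level="strong"> asks for a stressed rendering of its content.
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = emphasized

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your number is", 300, "five"))
```

    Whether a particular engine honours every element is engine-specific, so a document like this should be checked against the vendor's documentation before relying on it.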

    Sakrament TTS
    Price: 1500 € per language for one OS; the two-language package comes with a 25% discount, i.e. 2250 € instead of 3000 €
    Languages: English, Russian
    Subjective assessment of sound quality: average
    Mobile OS: Symbian, Windows Mobile
    Ability to develop commercial products: yes


    The quality of Sakrament TTS speech synthesis is sufficient for voicing short phrases such as phone numbers or application names. A description of all SDK versions can be found here.

    Free engines


    Flite
    Price: free
    Languages: English, plus the ability to compile in FestVox languages
    Subjective assessment of sound quality: low
    Mobile OS: Android, Windows CE / Windows Mobile, iOS, PalmOS
    Possibility of developing commercial products: yes (CMU license)


    The Festival speech synthesizer is well known in the world of desktop systems. It has a port for mobile devices and embedded systems called Flite, distributed under its own X11-like license, which lets you freely redistribute the software and build both commercial and free applications on top of it. There are ports for Windows CE / Windows Mobile, PalmOS, Android and iOS.

    eSpeak
    Price: free
    Languages: 39, including Russian
    Subjective assessment of sound quality: average
    Mobile OS: Android, Windows CE / Windows Mobile
    Ability to develop commercial products: no for closed-source products (GNU GPL)


    Instructions for compiling the engine for Windows Mobile are included in the distribution, but on this platform eSpeak has one significant limitation: speech can only be generated into a WAV file. The compiled TTS engine for Windows Mobile can be obtained here.
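
    The WAV-only mode described above corresponds to eSpeak's file output option. A minimal sketch, assuming the desktop `espeak` command-line tool is installed and on the PATH (the Windows Mobile build may be invoked differently); the helper names `espeak_wav_command` and `synthesize_to_wav` are hypothetical.

```python
import shutil
import subprocess


def espeak_wav_command(text, wav_path, voice="ru"):
    """Build the argument list for writing speech to a WAV file.

    -v selects the voice (language), -w writes the audio to a file
    instead of playing it.
    """
    return ["espeak", "-v", voice, "-w", wav_path, text]


def synthesize_to_wav(text, wav_path, voice="ru"):
    """Run espeak if it is available; return True when a file was requested.

    Assumption: the `espeak` binary is on the PATH. If it is missing,
    nothing is executed and False is returned.
    """
    if shutil.which("espeak") is None:
        return False
    subprocess.run(espeak_wav_command(text, wav_path, voice), check=True)
    return True


if __name__ == "__main__":
    if not synthesize_to_wav("Привет, мир", "hello.wav"):
        print("espeak not found; nothing was synthesized")
```

    The resulting file can then be played back with whatever audio API the target platform provides.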

    eSpeak has also been ported to Android. The easiest way to try it out is to install the TTS Service Extended application from the Android Market, which lets you switch between the built-in engine and eSpeak. This TTS engine is distributed under the terms of the GNU GPL.

    Built-in solutions


    Built-in solutions exist only in Symbian and Android. For some unknown reason, Microsoft has stripped the corresponding programming interface (MS SAPI) out of its mobile OS.

    Symbian
    Price: free
    Languages: English
    Subjective assessment of sound quality: extremely low.
    Possibility of developing commercial products: yes.


    The built-in TTS from the Symbian Foundation is hidden in the CMdaAudioPlayerUtility class. Although its documentation says nothing about this, the class does let you synthesize speech. Unfortunately, Russian is not supported, and the quality of English speech is very low: without some practice it is quite difficult to understand what is being said.

    Additional language packs can be downloaded here. However, the list of supported phones is extremely small. Installing the Russian language packs on a device running Symbian OS S60 5th Edition did not give the expected result: the built-in TTS still did not speak Russian.

    Note that there is a fairly convenient extension of the API called the NSS TTS Utility API, a description of which can be found here.

    Android
    Price: free
    Languages: English, French, German, Italian, Spanish
    Subjective assessment of sound quality: average
    Possibility of developing commercial products: yes

    Built-in speech synthesis has been available in Android since version 1.6. A great introduction to the topic can be found on the developer blog. The Android TTS API is essentially a wrapper around SVOX Pico, which, unfortunately, does not support Russian.

    Conclusion


    Everyone will have to draw their own conclusions based on the requirements of the product being developed. For commercial solutions, the quality of speech synthesis is extremely important, so the choice comes down to two engines: Acapela TTS and Loquendo Embedded TTS. When choosing an engine for an open source project, the list of target OSs plays the decisive role.

    Personally, I chose eSpeak, because my project is academic and I can afford to use a product licensed under the GNU GPL.
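
    The selection criteria used throughout this review (target OS, Russian support, licensing) can be sketched as a simple filter. The data below is transcribed from the engine descriptions above; the `Engine` type and the `candidates` function are hypothetical names introduced only for this illustration, and sound quality and price are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    russian: bool           # Russian is among the supported languages
    platforms: frozenset    # mobile OS names as listed in the review
    closed_source_ok: bool  # usable in a closed-source commercial product


def _engine(name, russian, platforms, closed_source_ok):
    return Engine(name, russian, frozenset(platforms.split(", ")), closed_source_ok)


ENGINES = [
    _engine("SVOX Mobile TTS", True, "Android, Symbian, Windows Mobile, BREW", True),
    _engine("Acapela TTS", True, "Symbian, Windows Mobile, Embedded Linux, iOS", True),
    _engine("Loquendo Embedded TTS", True,
            "Android, Symbian, Windows Mobile, Embedded Linux, iOS, "
            "Maemo, Moblin, MeeGo, PalmOS", True),
    _engine("Sakrament TTS", True, "Symbian, Windows Mobile", True),
    _engine("Flite", False, "Android, Windows Mobile, iOS, PalmOS", True),
    _engine("eSpeak", True, "Android, Windows Mobile", False),
    _engine("Symbian built-in", False, "Symbian", True),
    _engine("Android built-in", False, "Android", True),
]


def candidates(platform, need_russian, commercial):
    """Return names of engines that satisfy all three requirements."""
    return [
        e.name
        for e in ENGINES
        if platform in e.platforms
        and (e.russian or not need_russian)
        and (e.closed_source_ok or not commercial)
    ]


if __name__ == "__main__":
    # An academic Android project that needs Russian, as in my case.
    print(candidates("Android", need_russian=True, commercial=False))
```

    For an academic Android project that needs Russian, the filter leaves SVOX Mobile TTS, Loquendo Embedded TTS and eSpeak, of which only eSpeak is free.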
