Fog Creek Software
Discussion Board

Speech Recognition: alternative to bulky M$ ?

Microsoft includes Speech Recognition (and synthesis) with Windows 98 and above, though it's not preinstalled by default.  You can distribute it freely.

But it's 150 MB.

Are there any good alternatives for developers (for Speech Recognition & Synthesis) that are much smaller (i.e., <10 MB) ?

This may be wishful thinking, but I was just wondering.

Free (from M$) is a good price, but too big for a downloadable app. (And most of my customers couldn't find their Windows CD if their life depended on it <g>)

Mr. Analogy
Wednesday, June 30, 2004

I had a g/f whose brother had terrible carpal-tunnel problems. He used IBM's ViaVoice and thought it was great, but I have no idea how big or small it is, just that it's roughly $50.

Greg Hurlman
Wednesday, June 30, 2004

To clarify:  I'm looking for a redistributable component or library that I can buy, include with my app, and redistribute.

I looked all over and can't find anything.

I suspect that the free Microsoft SAPI makes it hard to SELL such a component.

Mr. Analogy
Wednesday, June 30, 2004

> I suspect that the free Microsoft SAPI makes it hard to SELL such a component.

I don't think so: Automatic Speech Recognition and Text To Speech are relatively "deep magic", *and* a "killer app". There are several vendors (and implementations for every language, not only American English).

My guess (and this is a complete guess) is that historically the ones that actually worked were neither cheap nor small nor simple, and that they expect to sell in large volumes to large companies (for example telcos, multinationals with semi-automated customer service centres, OEMs and resellers).

I think (I'm not sure) that traditionally the various vendors' ASR and TTS engines may use proprietary APIs.

Apart from MS SAPI, the other standard API is "VoiceXML". I worked for a VoiceXML vendor, briefly; they supported (and resold) engines from several vendors. The following is a link to the page on which they describe the ASR and TTS engines and vendors that they support:

I don't know how many of these vendors also sell retail: Scansoft is one.

So, "VoiceXML" (as well as SAPI) gives you something else to Google for: e.g. a Google search for "VoiceXML Sourceforge" found a project, but of course that's probably for Unix ...

... note that you may discover hardware issues too: e.g. ASR is no better than the microphone you're using (the microphone built into my laptop isn't good enough for ASR).

Christopher Wells
Wednesday, June 30, 2004

VoiceXML and the leading speech engines like ScanSoft are designed for enterprise telephony apps, not standalone PC apps. I'm not familiar with Microsoft's speech rec, but it's probably the easiest way to proceed.
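
For anyone who hasn't seen VoiceXML: it is a W3C markup for spoken dialogs that a "voice browser" (typically on a telephony server) interprets, rather than a library you link into a desktop app. A minimal sketch in Python that just builds such a document is below. The function name build_vxml_prompt is made up for illustration, and nothing here speaks or recognizes anything; an engine would still be needed to run the document.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 specification.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_vxml_prompt(text: str) -> str:
    """Return a minimal VoiceXML 2.0 document that speaks `text` once.

    Hypothetical helper for illustration only: it produces the markup,
    but a separate voice browser / TTS engine is needed to render it.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")      # a dialog
    block = ET.SubElement(form, "block")    # executable content in the dialog
    prompt = ET.SubElement(block, "prompt") # text to be synthesized
    prompt.text = text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_vxml_prompt("Hello from a voice browser."))
```

The point of the sketch is the shape of the thing: the app hands a dialog description to a server-side interpreter, which is why it suits call centres better than a small downloadable program.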

Thursday, July 1, 2004

Scansoft bought Dragon, and are retailing it for $99.

Christopher Wells
Thursday, July 1, 2004

Christopher -  happily that sourceforge project is cross platform.

a cynic writes...
Thursday, July 1, 2004

I use Scansoft's Dragon NaturallySpeaking Essentials, v. 7. On my hard drive it takes up 239MB, including overhead.

But speech recognition is not a simple app.  Dragon was started by a couple of IBM dropouts who had some pretty good ideas about how to approach the problem.  When they got close to the mark, IBM put the money and manpower into overtaking them, and succeeded; since then Scansoft bought out Dragon, but they also market ViaVoice and other speech technologies, including a small footprint speech recognition engine by SpeechWorks.

Have a look at their website.  Unbelievable stuff there.

Call me James
Thursday, July 1, 2004

Try the Speereo speech engine.
It's a very small engine.
And (it's not a joke) it really works.

Konstantin Lamine
Monday, July 26, 2004
