Fog Creek Software
Discussion Board




Suggestions for Speech Synthesis and Recognition

Hi,

I'm starting to research adding Speech Synth and Recognition to our programs.

These are for people with language difficulties, so we've not used either SS or SR before because they were not "up to snuff".  (If you can't talk, then having HAL the computer demonstrate how to say "Dave, I'm feeling much better now. Dave? Dave?" isn't very helpful).

BUT, I thought I'd revisit this and see what the state of the art is.

I need something I can use within a win32 program (VB, Delphi, .net, etc.).

What third-party (or Windows OS) solutions have you used that are very good?

Clay N.
Wednesday, June 04, 2003

I've only played with it a bit, but the Microsoft speech stuff seems OK for giving commands.  And you can't beat the price.

http://www.microsoft.com/speech/

-
Wednesday, June 04, 2003

Not exactly win32 specific, but close enough...

I had some good experiences with the IBM implementation of the Java Speech API, http://www.alphaworks.ibm.com/tech/speech

That was quite a while ago now, so I have little or no clue how well it performs against other win32 TTS/STT engines.

Chris Davies
Thursday, June 05, 2003

I have used the MS SAPI stuff for speech synthesis.

The clarity of speech for individual words or syllables can be very poor, but sentences as a whole are very understandable.  i.e., it is fine for flowing speech / dialogue, but not so good for finer details of the spoken word.

If you're selling speech therapy software, the clarity may not be good enough.  I use it in remedial reading software (for children).  I am looking to use it in spelling software, but I think that we will have to voice record letters and syllables.  I don't think SAPI synthesis is good enough for slowly and clearly speaking individual words.

SAPI comes installed on Win2k and WinXP (probably also Me).  Win2k only has the Microsoft Sam voice as default, which is inferior to the Microsoft Mary voice.  The SAPI installer for 9x machines is several tens of megabytes.

I haven't used the speech recognition.
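
For anyone who wants to test how SAPI copes with single words before committing to recorded audio, here is a rough sketch. It assumes SAPI 5 with its XML TTS tags (rate, silence, spell) and the pywin32 package for COM access from Python. The function names and the default rate and pause values are made up for illustration, and whether the result is clear enough for therapy or spelling software is exactly the thing you would need to judge by ear.

```python
from xml.sax.saxutils import escape

# Value of SVSFIsXML in SAPI's SpeechVoiceSpeakFlags enumeration.
SVSF_IS_XML = 8


def slow_word_markup(word, rate=-6, pause_ms=400):
    """Build SAPI 5 XML that says a word slowly, pauses, then spells it.

    rate is the absolute speed (SAPI accepts roughly -10 to 10);
    the defaults here are illustrative guesses, not tuned values.
    """
    safe = escape(word)  # protect &, < and > inside the markup
    return (
        f'<rate absspeed="{rate}">{safe}</rate>'
        f'<silence msec="{pause_ms}"/>'
        f'<rate absspeed="{rate}"><spell>{safe}</spell></rate>'
    )


def speak(markup):
    """Speak the markup through SAPI. Windows with pywin32 only."""
    import win32com.client  # imported here so the builder works anywhere

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.Speak(markup, SVSF_IS_XML)
```

Calling speak(slow_word_markup("cat")) on a Windows machine should read the word slowly and then letter by letter, using whichever voice is the system default.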


Thursday, June 05, 2003

The best text-to-speech I've heard is AT&T's Natural Voices system:

[ http://www.research.att.com/projects/tts/ ]

Don't have any experience with speech recognition systems.

He-who-would-not-be-named
Thursday, June 05, 2003

I have used Dragon NaturallySpeaking. It seems quite good. I'm willing to bet that with a decent noise-free microphone it would work very well indeed; my microphone was noisy and crackly, but still it was quite usable for English text. (I tried using it for programming; it seems to use context a lot so recognition of discrete words is a hit-or-miss affair. Still, after training, it worked OK. I suspect this is my crappy microphone :)

It has a proper SDK that, as far as I vaguely remember, works using COM. You can see a program that links with it here; it's a Python-based macro system for DNS written by one of the Dragon people:

http://tinyurl.com/dju9

Also search for the DNS plugin for EMACS -- that also comes with source code.

Tom
Thursday, June 05, 2003

Aargh. Forgot to provide a URL for the Dragon NaturallySpeaking program. Here it is:

http://www.scansoft.com/

Tom
Thursday, June 05, 2003

Nuance rules the roost for speech rec.

Clutch Cargo
Saturday, June 07, 2003
