Fog Creek Software
Discussion Board




Working with audio files

I'm writing a small elementary-level educational app for my daughter.  The app will quiz her on the 300 most common words in the English language.

My problem is the quality of the audio files.  When I record using Creative WaveStudio and the microphone that came with the system, the recordings have a lot of static.

I'd like to keep the file sizes to a minimum, but don't know the reasonable trade-offs between quality and recording frequency, bits, compression format, etc.

Without spending loads for studio-quality equipment, any audiophiles out there have advice on creating crisp, clear voice recordings? 

Nick Hebb
Thursday, January 16, 2003

Cheap microphones are definitely cheap in quality.  Speech recognition suffers from similar problems, and the usual recommendation there is to buy a decent mic (about $50).  Check out Dragon at scansoft.com for their mic recommendations.

IanRae
Thursday, January 16, 2003

Make sure you've got as much signal as possible coming in. The noise floor tends to stay at a fairly constant level, so if you record with as much gain as possible (without clipping, obviously) the signal sits further above the noise and you'll get the most out of your equipment.
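One rough way to check whether a take is recorded hot enough without clipping is to measure its peak level. This is a minimal sketch that assumes a 16-bit PCM WAV file; the function name `peak_dbfs` is made up for illustration. A peak near 0 dBFS means the recording is at or near clipping, while a peak far below that suggests there is gain to spare.

```python
import array
import math
import sys
import wave

def peak_dbfs(path):
    """Return the peak level of a 16-bit PCM WAV file in dB relative to full scale."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h", frames)   # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()               # WAV data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")             # pure silence (or an empty file)
    return 20 * math.log10(peak / 32768)
```

Calling `peak_dbfs("word.wav")` on a take (the file name is only a placeholder) and getting something like -20 would suggest raising the input gain and recording again.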

If this is just speech then I would be inclined to use a sampling rate of 22.05 kHz. 11 kHz sounds like a telephone, and you don't need the full range of 44.1 kHz and up.

For bit-depth, there's not much reason not to just use 16-bit; 8-bit is a bit lo-fi, and you can't record at 24-bit without some fairly serious outlay.
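To put numbers on the trade-off, here is a small sketch of the size of uncompressed PCM audio at a few settings. The one-second-per-word figure is purely an assumption for illustration, and `pcm_bytes` is a hypothetical helper, not part of any library.

```python
def pcm_bytes(seconds, sample_rate_hz, bits_per_sample, channels=1):
    """Size of raw (uncompressed) PCM audio in bytes, ignoring the WAV header."""
    return round(seconds * sample_rate_hz * (bits_per_sample // 8) * channels)

# Assumption for illustration: 300 words at roughly one second each.
TOTAL_SECONDS = 300

settings = [
    ("11.025 kHz,  8-bit, mono", 11025, 8, 1),
    ("22.05 kHz,  16-bit, mono", 22050, 16, 1),
    ("44.1 kHz,   16-bit, stereo", 44100, 16, 2),
]

for label, rate, bits, channels in settings:
    size = pcm_bytes(TOTAL_SECONDS, rate, bits, channels)
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Under that assumption the 22.05 kHz, 16-bit, mono setting comes to roughly 13 MB for the whole word list, a quarter of what CD-quality stereo would need.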

I wouldn't tend to use compression at all, unless you really need it (say web access). In which case:

- Lossless compression schemes will restore the audio to its full glory, but not save much space (~12%). ADPCM is not one of these: it is lossy, storing roughly 4 bits for each 16-bit sample.

- Lossy compression (i.e. /reduction/), like mp3, will incur some loss of quality, but this will not be noticeable at sensible bit-rate settings. I would experiment with 64kbps (for mono, 128 for stereo) and see what you think of the results.
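As a rough guide to what those bit rates buy, the sketch below compares a constant-bit-rate stream against uncompressed PCM. It assumes 1 kbps means 1000 bits per second and ignores headers and frame padding; both helper names are invented for this example.

```python
def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Data rate of uncompressed PCM audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels

def cbr_bytes_per_second(kbps):
    """Data rate of a constant-bit-rate stream (1 kbps taken as 1000 bits/s)."""
    return kbps * 1000 / 8

def reduction_factor(sample_rate_hz, bits_per_sample, channels, kbps):
    """How many times smaller the compressed stream is than the PCM source."""
    pcm = pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels)
    return pcm / cbr_bytes_per_second(kbps)

# 22.05 kHz, 16-bit mono speech encoded at 64 kbps
print(round(reduction_factor(22050, 16, 1, 64), 2))

# 44.1 kHz, 16-bit stereo encoded at 128 kbps
print(round(reduction_factor(44100, 16, 2, 128), 2))
```

The saving depends heavily on the source settings: the lower the sample rate of the original, the less a fixed bit rate gains you.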

There are other, better, reduction schemes out there (AAC, for instance) but I'm not sure how widespread decoders for these are on the desktop yet...

Hardware-wise the soundcard (or onboard sound?) could be as much of a noise source as the mic if it's a cheap one. Rubbishy sound cards typically skimp on 'details' such as proper shielding around the analogue-digital converters, which leads to problems in a big noisy box :)

Decent soundcards, with balanced inputs etc. are getting a lot cheaper; Terratec, for example, have a good range of basic cards that quote (and, IME, provide) decent signal-to-noise ratios, and for not too much cash.

For mics, a decent dynamic mic would do the trick splendidly. The standard workhorse is the Shure SM-58 (or the 57), which costs about 60 GBP, but these things are usually significantly cheaper in the US. Phonic make a decentish SM-58 clone which (over here) is about half the price.

Finally, you can do a fair bit at the mic. If you're getting lots of sibilants and plosives causing unwanted noise and transients, try making a pop-shield out of an old coat hanger and pair of tights (denier 10, I believe, gets good results :). If you're getting too much low frequency boom, try angling the mic away from the source (assuming that it's not omnidirectional).

Owen Green
Thursday, January 16, 2003

Nick:

Your noise problem may be due to using a mic directly plugged into the *mic* input of your computer (thus amplified only by the sound card, which may be quite cheap and therefore noisy).

Much better results, noise-wise, can be had by plugging your mic into a good quality amplifier or mixer, which in turn is plugged into the *line* input of your computer.  You can even use a good quality tape recorder in place of the amplifier, with satisfactory results.

Don't let anyone talk you out of compression. It's possible to reduce file size by a factor of ten or so, even if the software is deployed on CD-ROM, and for voiceover the quality does not suffer much, provided you use an adequate sampling rate and bit depth.  Excellent voice quality is obtained at 11 kHz or better with 16 bit sampling.  And monaural is of course only half the file size of stereo.
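If a take was accidentally recorded in stereo, halving its size is straightforward. This is only a sketch using Python's standard `wave` module; it assumes a 16-bit stereo WAV file, and the function name `downmix_to_mono` is made up for the example. It averages the left and right channels into one.

```python
import array
import sys
import wave

def downmix_to_mono(src_path, dst_path):
    """Write a mono copy of a 16-bit stereo WAV file by averaging both channels."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("this sketch expects 16-bit stereo input")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    samples = array.array("h", frames)   # interleaved: L, R, L, R, ...
    if sys.byteorder == "big":
        samples.byteswap()               # WAV data is little-endian

    left, right = samples[0::2], samples[1::2]
    mono = array.array("h", ((l + r) // 2 for l, r in zip(left, right)))
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)           # sample rate is left unchanged
        dst.writeframes(mono.tobytes())
```

The sample data in the output is exactly half the size of the input, since each stereo frame of two samples becomes one.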

For recording and editing I use the latest version of Total Recorder Standard Edition (www.highcriteria.com).  It's dirt cheap and allows you to record from an application as well as mic and line sources.

working stiff
Thursday, January 16, 2003

Oops! Meant to say 22 kHz or better.

By the way, the LAME .mp3 codec is still available at no charge, and Total Recorder also supports Ogg Vorbis (an open source audio format).

working stiff
Thursday, January 16, 2003

A lot of good advice - thanks.

I'll skip the $50 mic, though.  For that price, I'll drill the words into her head the old-fashioned way.

Nick Hebb
Thursday, January 16, 2003
