Fog Creek Software
Discussion Board




Display the sin graph of a sound file

I have to develop an application which would receive a .WAV file path as an input and display the sin graph with the frequency, wave-length and amplitude for the sound in the file. I have some time to research the idea and I need input as to the Win32 libraries I would need to use to depict the graph. Anyone who's done this earlier, any ideas please.

Sathyaish Chakravarthy
Thursday, December 18, 2003

You mean something like http://www.vanaeken.com/images/scope.html ?

It is included for free in http://www.vanaeken.com/products.html , but I do not think it is available as a standalone product. Maybe you could ask the developer for it.

Just me (Sir to you)
Thursday, December 18, 2003

Yeah, a simplified version of that. Thanks for the link.

Sathyaish Chakravarthy
Thursday, December 18, 2003

Guys, I am really feeling lost. I have to develop an application that loads an audio file (mp3, mid or wav) and displays its frequency graph. It then allows the user to edit the graph by cutting or copying a portion of the wave and manipulating the whole sound as such. I don't even know where to start looking. There's still lots of time at hand but still, I want to start with my homework early. Those of you who have some experience in working with sounds, please help me soon. Provide me some place to get started.

Sathyaish Chakravarthy
Thursday, December 18, 2003

I'd recommend browsing through some open source stuff.  Rummaging around on SourceForge I found:

http://audacity.sourceforge.net/
and
http://www.musickit.org/

There's probably a lot more.

Lee
Thursday, December 18, 2003

Is this a homework assignment or a real project? (No insult meant.)


Thursday, December 18, 2003

This is supposed to be a real project sometime later. For now, it is a proposal.

Sathyaish Chakravarthy
Thursday, December 18, 2003

Am I missing something? It sounds like you're describing a sound editor, with which the market is flooded. Why would someone want to propose to make a new one?

Brad Wilson (dotnetguy.techieswithcats.com)
Thursday, December 18, 2003

You also need to get your terminology straight before you go any further. What's a "sin graph"? What do you mean by "showing the frequency graph"? Do you mean you want to show the sound in the frequency domain (via an FFT) and edit it there? That'll be next to impossible to do an interactive editor.

If you don't understand my questions, you need to go learn some stuff about signal processing before you go any further.

Chris Tavares
Thursday, December 18, 2003

Yeah, I need to make something like a sound editor, but a much scaled down version with only the following functionality:

This will be a small part of a major application. I have over 2 months for this simplistic sound editor.

By a sin graph, I meant the wave representation of the sound measured on the time line, like the graph of the sine function in trigonometry. But unlike the sine theta representation, the sound wave has varying amplitude and wavelength. For lack of knowledge on the subject of phonetics and multimedia programming, I tried to be vernacular and probably sounded a bit irresolute on the jargon. And you are correct, I am a complete neophyte in the order of multimedia programming and signal processing. I also do not understand what a frequency domain that you mentioned is and nor do I know what the FFT is. However, I am very comfortable with programming in VB, Win32 API, and also a little bit of C.

Would you think it would be enough for me to spend a month learning this stuff or would it require more time? And particularly, how do I get started? Where do I look for the appropriate resources?

Sathyaish Chakravarthy
Thursday, December 18, 2003

Sathyaish, is this an offshored project?

me
Thursday, December 18, 2003

Oops! I forgot to mention the function points in the previous post. The intended functionality in my sound editor would be to a bare minimum which would allow for:

(1) Loading a wav, mid or mp3 file.
(2) Display its wave graph
(3) Allow the user to play the file by pressing a button captioned Play.
(4) Allow the user to stop the playing of the file by pressing a button called Stop.
(5) Let the user mark two points on the graph so as to select a portion of the wave for playing. To do this, the user will, say, click a button called Trim. Then he would click once on the wave form. This would mark a line indicating the starting point of the subset of the wave he wants to select. He would once again click on another place on the wave representation and that would draw the second mark on the wave thereby defining a chunk of the wave as the selected subset. He would then hit the play button to play this subset.
(6) Reset the graph. This would deselect any previous selection the user made from the graph and render the whole sound content to be played upon hitting the Play button.
(7) Close the application/sound editor.
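
A minimal sketch of how step (5) might map the two mouse clicks to a playable range of samples. The function names (`x_to_frame`, `selection`) and the idea of a graph that is `width` pixels wide are assumptions for illustration, not part of the proposal itself.

```python
def x_to_frame(x, width, total_frames):
    """Convert a pixel column on the graph to a sample-frame index."""
    x = max(0, min(x, width))          # clamp clicks outside the graph
    return x * total_frames // width

def selection(x1, x2, width, total_frames):
    """Return (start, end) frame indices for two clicks, in either order."""
    a = x_to_frame(x1, width, total_frames)
    b = x_to_frame(x2, width, total_frames)
    return (a, b) if a <= b else (b, a)

def selection_seconds(start, end, rate_hz):
    """Length of the selected portion in seconds."""
    return (end - start) / rate_hz
```

Playing the subset then amounts to handing frames `start` up to (but not including) `end` to whatever playback code is used, and Reset simply restores the range to `(0, total_frames)`.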

Sathyaish Chakravarthy
Thursday, December 18, 2003

Nope, this is NOT an offshored project.

Sathyaish Chakravarthy
Thursday, December 18, 2003

A PCM (uncompressed) sound file is just a list of numbers. Each number represents the position of the speaker at a certain point in time. The time distance between each sample is 1/Hz seconds, where Hz is the sampling rate of the file.

X is time. Now for each X, you can find Y, because you just look up the number for that point.

You don't need to worry about different sine curves added together. That's how you can decompose sounds (I think!), but you don't need that for this.
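
As a rough illustration of the "list of numbers" idea, here is a sketch in Python using the standard `wave` and `struct` modules. It assumes a 16-bit mono uncompressed file; the helper names `read_pcm16_mono` and `value_at` are made up for this example.

```python
import struct
import wave

def read_pcm16_mono(source):
    """Return (sample_rate, samples) for a 16-bit mono PCM WAV.

    `source` is a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # 16-bit WAV samples are signed little-endian integers.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return rate, samples

def value_at(samples, rate, seconds):
    """Look up Y (the sample value) for X (a time in seconds)."""
    index = int(round(seconds * rate))
    return samples[index]
```

Plotting the waveform is then just drawing `samples[i]` against `i / rate` for whichever stretch of the file is on screen.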

Insert half smiley here.
Thursday, December 18, 2003

Hi, Insert Half Smiley Here (Cor, what a name?)!

Thank you for your little tutorial. I have to tell you that I re-read each of those three or four lines a few times and understood almost everything you said. You put it in very easy terms. Then, I also read a few code segments from planet source code and they related to what you said - jotting down the numeric frequency values from byte 44 onwards in the WAV file. However, I have a few questions for you and would be grateful for you answering them.

(1) What is this PCM?
(2) Just like you explained it in such easy terms, I am getting more curious and zealous for more knowledge. Is there some place on the web you know, or maybe some book, that would quench my thirst for knowing more?
(3) I did not get this line of yours
>The time distance between each sample is 1/Hz seconds, where Hz is the sampling rate of the file.

Could you please explain this again?

(4) >You don't need to worry about different sine curves added together. That's how you can decompose sounds (I think!), but you don't need that for this.

Yes, I will have only single track sounds to parse, so I would not worry about sound decomposition, if I understand what you are saying correctly.

Sathyaish Chakravarthy
Thursday, December 18, 2003

PCM stands for Pulse Code Modulation. That's all I know about it -- it just seems to be the standard name for uncompressed sound data in the format I described.

As for the 1/Hz thing... "Hz" means, pretty much, "ticks per second". (It's probably got an official definition, but I'm not sure exactly what it might be.)

This means that each second's-worth of sound consists of (Hz) values. So, each value comes 1/(Hz)th of a second after the previous, if you get what I mean.

I say "Hz" rather than give a specific value because you will find various sampling rates in common use. Usually they range from 8,000Hz (sounds like a telephone) to 48,000Hz (slightly higher than that used on a CD).
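
To make the 1/Hz arithmetic concrete, here is a small sketch (the function names are just for illustration). At 8,000Hz each sample lasts 125 microseconds; at 44,100Hz it is roughly 22.7 microseconds.

```python
def sample_period(rate_hz):
    """Seconds between one sample and the next."""
    return 1.0 / rate_hz

def time_of_sample(index, rate_hz):
    """Time in seconds at which sample number `index` occurs."""
    return index / rate_hz

for rate in (8000, 44100, 48000):
    microseconds = round(sample_period(rate) * 1e6, 1)
    print(rate, "Hz ->", microseconds, "microseconds per sample")
```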

This page might help a bit with all that, as well as explaining why higher sampling rates sound closer to the original sound:

http://www.fortunecity.com/emachines/e11/86/synth5.html

One word of warning -- byte 44 of the WAV file is not necessarily where the data starts. The WAV file is divided into chunks that you must parse correctly if you are to ensure you find the sample data correctly. More details about that here:

http://www.borg.com/~jglatt/tech/wave.htm
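
A hedged sketch of what "parsing the chunks" could look like, assuming the whole file has been read into a bytes object. It only walks the top-level RIFF chunks and does not interpret the "fmt " contents; `find_chunks` and `data_offset` are names invented for this example.

```python
import struct

def find_chunks(data):
    """Yield (chunk_id, payload_offset, payload_size) for each top-level chunk."""
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12                                   # skip "RIFF", size, "WAVE"
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        yield chunk_id, pos + 8, size
        pos += 8 + size + (size & 1)           # chunks are padded to an even length

def data_offset(data):
    """Return (offset, size) of the sample data, wherever it actually starts."""
    for chunk_id, offset, size in find_chunks(data):
        if chunk_id == b"data":
            return offset, size
    raise ValueError("no data chunk found")
```

In the simplest files the "data" payload does land at byte 44, but any extra chunk written before it (a "LIST" chunk with tags, for example) pushes it further along, which is why scanning for the chunk is safer than hard-coding the offset.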

I found those links on google -- if you do a search for "sampling introduction" or "pcm introduction" there are plenty of links.

I don't have any other good links, or suggestions for books, I'm afraid. This is something I just picked up by some weird kind of osmosis.

Insert half smiley here.
Thursday, December 18, 2003

I doubt that it is practical to handle MIDI files that way; they only consist of notes and other control data, so in order to draw their amplitude curve you would have to render their content.

Martin
Thursday, December 18, 2003

Pulse Code Modulation is basically the name of the sampling process that you described.

For a 16-bit, 44.1kHz sound, you get 44,100 samples per second. The value at each moment is a 16-bit value. These values are stored sequentially based on time. You can adjust the sampling rate and the sample size for more or less information.
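
The trade-off between sampling rate, sample size and amount of data can be worked out directly. A minimal sketch, with a made-up function name and CD-quality stereo as the assumed defaults:

```python
def pcm_byte_count(seconds, rate_hz=44100, bits=16, channels=2):
    """Size in bytes of raw PCM audio of the given length."""
    frames = int(seconds * rate_hz)        # one frame = one sample per channel
    return frames * (bits // 8) * channels

one_second_mono = pcm_byte_count(1, channels=1)    # 44,100 samples * 2 bytes
five_minutes_cd = pcm_byte_count(5 * 60)           # 16-bit stereo at 44.1kHz
print(one_second_mono, five_minutes_cd)
```

By this arithmetic, five minutes of 16-bit stereo at 44.1kHz comes to 52,920,000 bytes, or roughly 50 MB of sample data before any header.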

Now, a WAV file can be compressed or not. Windows supports audio (and video) CODECs. Generally speaking, though, WAV files are straight uncompressed PCM. That makes them very easy to read.

MP3 files, on the other hand, are compressed (lossy compressed, which means some of the data is lost during the compression, and can never be recovered; the idea is to select data to "lose" that the user would never otherwise hear anyway).

Reading and writing MP3 files is going to require a library, or someone who really knows what they're doing. It's probably also going to require you to license from the patent holder, since the compression algorithm is patented. In theory, any MP3 library should be able to reconstruct PCM from an MP3 input, and write an MP3 output from PCM data. The application would always deal exclusively in PCM.

Brad Wilson (dotnetguy.techieswithcats.com)
Thursday, December 18, 2003

Oh, and MIDI files aren't sampled sounds at all. They're directions for keyboards to play specific sounds for specific intervals. The actual sounds in question vary based on the MIDI device (in the case of the PC, generally the sound card's built-in MIDI support, or Microsoft's software-based MIDI support).

MIDI and WAV editing would never be done in the same way. They're fundamentally different. Even display is different... you display WAV data as a continuous waveform, but MIDI data as specific notes on a staff played with specific instruments (which looks like traditional sheet music).

Brad Wilson (dotnetguy.techieswithcats.com)
Thursday, December 18, 2003

Sathyaish Chakravarthy,

To be frank, it sounds like this project is beyond your abilities. Your best bet is to hire someone who has done it if it is really for a commercial application.

Skeptical
Thursday, December 18, 2003

SC often posts intriguing queries about software for which he is writing proposals. I wonder if he ever wins any, and if so, what is the success rate for those jobs?

No wonder outsourcing is getting a bad smell in a hurry.

HeWhoMustBeConfused
Friday, December 19, 2003

Yeah confused, I've noticed the same thing.
It's kind of crazy that he wants us to do his legwork for him.

Hey, SC, are you planning on paying us if you get one of these contracts or what?

Skeptical
Friday, December 19, 2003

Hey Sathyaish,

I've written a few wave editors and other audio file processing tools before, even some crazy near-realtime audio effects as well.  Coincidentally enough, I'm currently between paid employments, so if you are representing a serious commercial project you might find my services useful.  If you're trying to do some $100 project you bid for on RentACoder, then it probably wouldn't be worth my getting involved - but I still would be happy to advise you, although without payment I couldn't really see myself contributing working code (sorry about that).  Unless this is an open-source project - I'd be happy to contribute to that.  So if you pay me, or if it's open source, I'd be happy to rework or write some code for you.

Anyway, as people have already mentioned WAV, MID, and MP3 are very different file formats to deal with!  To handle WAV and WAV-like formats, you might just want to go and use the libsndfile library - it runs on Win32 and POSIX perfectly well, and can read or write most PCM formats at various sample rates and sample depths (8-bit, 16-bit, 24- or 32-bit, as well as 32-bit and 64-bit floating point) in a very simple and easy to use manner.  You'll have to watch the licensing on this library - it is under the LGPL (the GNU Lesser GPL, formerly the Library GPL), requiring the author of the application to keep it in its own DLL file and to distribute the library source code on request - but still it is usable in a commercial project.  Do your clients object to use of LGPL code (some people do, simply because of the GPL's reputation)?  Although I have plenty of home-brewed C and C++ code for reading and writing WAV files, I have found libsndfile to be much easier to use and more robust than anything I've written myself.  When I say "robust", I mean that there are lots of different apps out there with lots of different ideas of what a WAV file format really is - some of them in blatant contradiction of the Microsoft spec - it's good for compatibility to use a library which has been widely used and tested over time.

As far as mp3 support, there are a few open-source libraries around for reading and writing them, but I don't know of any LGPL ones yet.  It is very likely that, as the mp3 format is patented, nobody wants to take the risk of making a library which could easily be used with commercial software as Fraunhofer would probably sue for unauthorized commercial use.  I'd say that you really need to purchase a mp3 encoding/decoding library from a vendor who pays patent royalties to be clear of legal issues for a commercial project.  Again, if this is a $100 RentACoder type project, I'd say it won't be worth it to worry about mp3, you'd either lose money on the deal or you'd be breaking the law in most countries.  If it's GPL, you could use part of LAME or mpg123 to do the job.

As previously mentioned by other posters, MID is fundamentally different from WAV or MP3.  A MIDI file is a list of notes and instrument settings, and will sound different depending upon what sound card the user has or what soundbank presets are loaded into the card or on what software synth (if any) is running.  I may be wrong, but as far as I know there is no general way to convert a MIDI file into a signal/time graph resembling WAV or MP3 data without actually playing it through the sound card and recording the resulting output, which makes it impossible to just transparently import them into an audio editor.  Worst of all, some older sound cards have trouble with recording and playing at the same time, or even can't do it at all!  MIDI editors are their own software.  Maybe you really want to add MIDI editing to your program, but it's a whole different can of worms.  You wouldn't be dealing with waveforms at all - you'd be editing a list of notes.

Playing the sound, that's another task entirely from that of loading an audio file.  There are many libraries for playing audio, using the Windows WAVE API or the DirectSound API or both, with many different licenses and capabilities.  Microsoft provides lots of sample code, too, with various limitations that make it a bad idea to actually use any of it.  After much frustration, I've written my own DirectSound audio playing code which seems to work pretty well.

You mention Visual Basic.  I've developed an OCX control which wraps the libsndfile audio input/output, as well as an OCX control that plays audio through the DirectSound API and fires events to its VB host when it requires another buffer of audio.  I originally developed these for a shareware project (which I'd show you, but my webhost died recently... oh well) - maybe they're what you need.  I'm sure they're not the most original work ever, but I couldn't find any similar controls for sale anywhere so I had to write my own.  The tricky part is that VB isn't the greatest multithreaded programming language, but you really do need multiple threads to play audio - so wrapping the player thread in an OCX seemed like the best solution for me.  Really hard to do it natively in VB, at least too hard for me.

Drawing a graph of the audio - seems like an easy problem, but you must be aware of certain pitfalls.  A lot of WAV files are absolutely huge.  Suppose you've got a 5-minute song in CD-quality audio, it'll be about 50 Megs long.  Now suppose that you want to draw a graph of this file on the screen - you aren't likely to want to show more than 1000 points along your axis.  But how do you pick those 1000 points?  Just uniformly sampling and plotting every Nth point could cause you to miss peaks of the signal.  If you're drawing this previously-mentioned 50 Meg file (suppose it's CD-quality stereo, so about 15 million samples) over 1000 points, you'd be drawing every 15,000th point.  A loud noise that takes up less than 15,000 samples (about 1/3 of a second) might well be completely INVISIBLE on your graph window, rendering your display misleading and useless!  (not to mention the performance hit from scanning so many widely placed blocks of memory/disk)  But if you scan every one of the 15 million samples for your plot so as not to miss any loud signals between plot points, you'll be looking at 30 second window refresh times - not good.  The solution is that you need to write a caching scheme which stores peak amplitudes (largest positive and largest negative) for chunks of a few hundred or thousand samples that can easily be used for a zoomed-out graph.  Some software, such as N-Track Studio, actually creates and stores these "peak files" on disk in the same directory as the input WAV files.  Other software just holds them in memory.  Some people will use a tree structure for holding these peaks - loudest signal in 100 points, loudest signal in 100000 points, a whole hierarchy in between.  The best choice depends upon how long your typical input file will be and upon what sort of editing the user will be performing with your software.
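
A minimal in-memory sketch of such a peak cache, in Python for brevity. It is a single-level cache rather than the tree described above, and the names `build_peaks` and `columns_from_peaks` are invented for this example; a real editor would also need to update the cache after edits.

```python
def build_peaks(samples, block_size):
    """Scan the samples once and keep (lowest, highest) for each block."""
    peaks = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        peaks.append((min(block), max(block)))
    return peaks

def columns_from_peaks(peaks, n_columns):
    """Merge cached peaks into one (lowest, highest) pair per screen column."""
    if not peaks:
        return []
    n = len(peaks)
    columns = []
    for c in range(n_columns):
        lo = c * n // n_columns
        hi = max(lo + 1, (c + 1) * n // n_columns)   # always at least one block
        group = peaks[lo:hi]
        columns.append((min(p[0] for p in group), max(p[1] for p in group)))
    return columns
```

Each column is then drawn as a vertical line from its lowest to its highest value, so a short loud click still shows up when zoomed out. The expensive full scan happens once in `build_peaks`; redraws only touch the much smaller cache.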

These are just the general issues I've dealt with - there may well be more specific problems to handle for any specific application.  Good luck, and feel free to contact me for advice by the email address attached to this post.  Again, I'd be happy to contribute to an open-source project, and I wouldn't turn down paying work - if this is not possible, then I probably won't send you code but I'd still be happy to help.

Trollumination
Friday, December 19, 2003

Hello People,

This was a very informative discussion. I found it while looking for something of the same sort. You're right... there are tons of companies in India, subscribed to project auctions, getting freshers to write proposals on them. One contradiction though... The salaries in India totally justify even the $100 Rent-A-Coder projects. Just economies of scale in effect here.

I am working on a research project for spirometry, where lung function is measured through an external transducer (turbine type) and the resulting data is plotted against time. I need to make an addition of sound functionality to it. That is, assume I am speaking into the transducer (which measures air flow) and there's a microphone attached to the transducer as well (let's not get into where exactly... wind noise and all that). When I speak, the transducer is measuring air flow and the microphone is recording what I am saying. Essentially, I need to plot the sound and the air flow against the same x-axis, with the sound and air flow as two separate y-axes. The idea is to diagnose speech problems in people.

Having said that, I'm looking for a method to capture uncompressed sound from the microphone, as a series of numbers and plot them. We'll handle saving the sound later. BTW, I'm using VB. So I'm probably looking for either an OCX, DLL or API call.

Would anyone have any idea on how to do it?

Sid
Wednesday, May 26, 2004
