
The line between anti-alias, blurring and scaling
Forgive the slightly confused style...
Start with an actual example, then I'll try to generalize
Imagine a small 4X4 pixel color image. Each letter in the following corresponds to a pixel:
ABCD
EFGH
IJKL
MNOP
Now suppose you want to scale that to a 2X2 pixel color image
QR
ST
Now Q corresponds to A,B,E,F
R to C, D, G, H
etc
You have two main options, either (1) pick any one of the source pixels as the value for Q/R/etc, or (2) calculate average value of the 4 source pixels to get Q/R/etc.
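A rough sketch of the two options in Python/NumPy (the 4x4 values and the 2x2 block grouping are just placeholders for illustration):

    import numpy as np

    # 4x4 source image; each value stands in for one of the pixels A..P
    src = np.arange(16, dtype=float).reshape(4, 4)

    # Option (1): pick one source pixel per 2x2 block (here the top-left one)
    picked = src[::2, ::2]

    # Option (2): average each 2x2 block so every source pixel contributes
    averaged = src.reshape(2, 2, 2, 2).mean(axis=(1, 3))

    print(picked)    # keeps whatever detail happens to survive; 3 of every 4 pixels are dropped
    print(averaged)  # every pixel has a say, at the cost of some softening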
Right? (this is a question, other realistic options gratefully appreciated)
My understanding is the Windows API StretchBlt would correspond to option (1) if you use the COLORONCOLOR mode. Again, right? (again this is a question, I think I'm correct, not 100% sure)
The question is which is better...
With option (1) you lose some data. For example, when you scale down a picture containing thin lines, some lines might disappear.
With option (2) you also lose some data, but in a different way: while you lose detail (because of averaging), all the data in the source contributes to the output (e.g. a thin line would become fainter).
Think about my last paragraph and you should hopefully see that option (2) in some ways corresponds to blurring [and in a more general example it's the same principle as anti-aliasing - for example, using a greyscale pixel to represent something which is partway between black and white]
Now, users don't like blurring in pictures. What about sound?
What I'm curious about is: is there an approach which retains the value of option (2) [all data contributes to the output] while also retaining the value of option (1) [no blurring]?
What about fields other than pictures, e.g. sound? I'm not sure what the audio equivalent of blurring is - but would it be muddied sound? Would a CD player with anti-aliasing blur the sound? Could you notice?
S.Tanna
Monday, February 16, 2004
Two image questions one after another? Are the current and past posters the same person asking us to do your homework?
Li-fan Chen
Monday, February 16, 2004
Answering your question: you can't blur audio the same way as you would pictures. With pictures, you look at neighboring pixels and average them out.
With sound you have to delta it with white noise. And the white noise is up to you. White noise is not a flat line. And I don't think it actually uses samples from earlier or later on the timeline in the same way you would use neighboring pixels with pictures.
Li-fan Chen
Monday, February 16, 2004
If I understand right, you're saying the output is a mix of input and white noise. In that event, wouldn't it be blurring towards white noise? How can that be better?
And BTW, I'm a bit old to be doing homework.
S.Tanna
Monday, February 16, 2004
With audio you can't really blur it in that way because it's one-dimensional, not two-dimensional.
If you take a graph of a sound wave and dither it to a smaller resolution using your averaging method, you'd end up with some columns whose values fall between two rows. How can you have values between two rows if the value needs to be a whole number?
I.e. +1, +5, +6, +7 divided by 3 can't become +0.33, +1.67, +2, +2.33; it can only be 0, +2, +2, +2, or something like that.
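A quick sketch of that rounding problem (just the sample values from above, nothing audio-library specific):

    import numpy as np

    samples = np.array([1, 5, 6, 7])          # integer sample values
    scaled = samples / 3                       # ideal result: 0.33, 1.67, 2.0, 2.33
    quantized = np.round(scaled).astype(int)   # forced back to whole numbers: 0, 2, 2, 2
    error = scaled - quantized                 # this rounding error is what dither tries to mask
    print(scaled, quantized, error)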
There's plenty of literature on how mp3s are made from uncompressed sound waves.
www.MarkTAW.com
Monday, February 16, 2004
Oh, and IMHO the averaging method almost always looks better and is more legible, unless it's 1/2, 1/4, 1/8, etc., and you can remove pixels on an odd/even schedule. Removing every 3rd or every 4th pixel and so on would distort the image.
www.MarkTAW.com
Monday, February 16, 2004
S. Tanna, I was making a joke earlier about the forum in general, nothing against you man.
Li-fan Chen
Monday, February 16, 2004
In audio the techniques are different. Good resampling converters are non-trivial. However, if you just want to drop the dynamic range, then just chopping the bits off works well.
Various other artifacts will come out with audio; zipper noise is the main one. However, the ear is much more sensitive than the eye.
Many digital mixing desks have a final stage that is completely analog, because most studios tend to have amps without a volume knob[1]. What happened initially was that the designers just put a digital attenuator in the final output (i.e. multiply the output by <1). Sadly this meant that because the amps were set up to deliver a couple of kW of power, the desk output was turned right down, giving just a few bits of audio, thus turning a nice 16-bit audio path into a 3-4 bit path.
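A toy illustration of that effect (the numbers are made up, not taken from any particular desk): attenuate a roughly full-scale 16-bit signal heavily in the digital domain, re-quantize, and count how many distinct levels survive.

    import numpy as np

    full_scale = 32767
    signal = np.sin(np.linspace(0, 2 * np.pi, 1000)) * full_scale  # roughly full-scale sine

    attenuation = 1 / 4096                       # output turned "right down" in the digital domain
    attenuated = np.round(signal * attenuation)  # re-quantized to whole sample values after the gain

    # 4096 = 2^12, so roughly 16 - 12 = 4 bits' worth of levels are left
    print(len(np.unique(attenuated)))            # only a handful of distinct levels: a 3-4 bit path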
[1] I've forgotten[2] the trade term for this.
[2] I've realised quite how much of this stuff I have forgotten now. Once upon a time I was close to world expert level on some of this stuff and worked with folks who were world experts.
Peter Ibbotson
Monday, February 16, 2004
The audio stuff is fascinating.
The image stuff is something that I have a special interest in....
I actually have an app where (some) users are complaining about "blurring" because of the average type scaling - but I know that switching to the other kind of scaling will cause them (or other users) to have a different complaint ... loss of detail. So I'm wondering if an intermediate type of scaling exists.
P.S.
Li-fan no prob
S.Tanna
Monday, February 16, 2004
"However if you just want to drop the dynamic range then just chopping the bits off works well."
Unless you fade into -infinity.
Google "The Great Dither Shootout" for a fascinating example of an extreme test for multiple audio dithering tools. The truncated version is what a "dropping the last xx bits" does if you fade into infinity.
www.MarkTAW.com
Monday, February 16, 2004
S. Tanna,
I don't know how familiar you are with looking at data in the frequency domain versus the time domain; I'm assuming you're not, so I'll try a very rough approximation (the usual abstraction-leakage disclaimer applies). Others, please do not bring this thread down to "your analogy sucks because blahboingblah".
Look at a chessboard and think of the black squares as the tops of waves and the white squares as the troughs. Now, when you consider a sharp transition or edge (for example pure black to pure white), this translates to a lot of high-frequency components if you look at the data in the frequency domain. The basic analogy is: look at a wave; if you want sharper edges/slopes, the tops of successive waves have to be closer together, in other words the frequency needs to be higher.
Am I making sense to you?
So blurring, or fuzzier edges, means the transition or edge is smeared out over a larger distance, which you could compare to a gentler slope, and thus a lower frequency.
Thus, the "equivalent" of blurring for sounds would be cutting off higher frequencies. A bit like listening to music through a telephone.
Now, what happens when you sample one out of every x data points ("option 1")? You keep the edges, but some might be stronger than in your original picture. E.g. your original picture was: 100% 75% 50% 0% black; now it becomes 100% 50%. This is a steeper slope; you are in fact _adding_ high-frequency components to your data. Think of the screeching you hear when a cellphone is on an unstable connection, or the screeching of badly recorded mp3s.
To answer your other question (is there a compromise between 1 and 2): yes. First you apply a transformation, then you resample/decrease the resolution. An example transformation would be: new pixel value = old pixel value/2 + (sum of the four neighbouring pixel values)/8. The /2 and /8 are just there for 'normalisation': an image with a uniform colour should remain untouched after such a transformation.
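A minimal sketch of that compromise (edge handling is ignored for brevity, np.roll simply wraps around, and the 8x8 random image is just a stand-in):

    import numpy as np

    def gentle_blur(img):
        # Keep half of each pixel and take the rest from its four
        # neighbours (1/8 each), so a uniform image is left untouched.
        up = np.roll(img, -1, axis=0)
        down = np.roll(img, 1, axis=0)
        left = np.roll(img, -1, axis=1)
        right = np.roll(img, 1, axis=1)
        return img / 2 + (up + down + left + right) / 8

    def decimate_2x(img):
        # Plain "option (1)" decimation: keep one pixel per 2x2 block
        return img[::2, ::2]

    # Light blur first, then decimate: every pixel gets some say in the
    # output (unlike pure decimation) with less softening than full averaging.
    small = decimate_2x(gentle_blur(np.random.rand(8, 8)))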
Wow. That was a lot, I hope it helped a bit.
Yves
Monday, February 16, 2004
More on the time-frequency domain stuff:
See this link: http://amath.colorado.edu/courses/3310/2001fall/Improc/Webpages/fourier/
Basically, you can represent a signal (either a 1-D time series signal representing sound, a 2-D signal representing an image, or 3-D, 4-D etc...) in the time (or spatial) domain, or the frequency domain. Any signal can be described by a summation of sinusoidal waves of varying magnitude with an associated phase. The key to moving between these two domains is the Fourier transform.
In down-sampling a signal (taking averages, for example), you lose the high frequency content -- information is lost. You can play tricks with the down-sampled signal to make it appear more pleasing to the eye/ear/boss, but you've still lost information. The tricks one might play are problem domain-specific.
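A small sketch of that loss (the sample rate and test frequencies are arbitrary): average-and-decimate a two-tone signal and look at which frequencies the FFT still shows.

    import numpy as np

    fs = 1000                                    # original sample rate in Hz
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 400 * t)   # low + high tone

    # Down-sample by 2 via averaging adjacent pairs (new rate 500 Hz, new Nyquist 250 Hz)
    y = x.reshape(-1, 2).mean(axis=1)

    def peak_freqs(sig, rate):
        spec = np.abs(np.fft.rfft(sig))
        freqs = np.fft.rfftfreq(len(sig), d=1 / rate)
        return freqs[spec > 0.1 * spec.max()]

    print(peak_freqs(x, fs))      # ~50 Hz and ~400 Hz are both there
    print(peak_freqs(y, fs / 2))  # ~50 Hz survives; the 400 Hz tone is mostly gone, and the
                                  # residue that leaks through folds down (aliases) to ~100 Hz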
If you work in the frequency domain, you know nothing of the spatial location of frequency components. An interesting blend of time (spatial) domain and frequency domain representations is wavelets: http://www.mathworks.com/access/helpdesk/help/toolbox/wavelet/wavelet.shtml
C Rose
Monday, February 16, 2004