Fog Creek Software
Discussion Board




Help optimize this code?

void DrawMonochromeBitmap(HDC hdc, int x, int y, int width, int height, PBYTE pBitmap, COLORREF color)
{    
  int dx = 0, dy = 0;
  int BitMask[8] = { 128, 64, 32, 16, 8, 4, 2, 1 };
  int BytesPerRow = ((width + 31) / 32) * 4;    
 
  for (dy = 0; dy < height; dy++)
  {
    for (dx = 0; dx < width; dx++)
    {
      if (pBitmap[(dy * BytesPerRow) + ((int)(dx / 8))] & BitMask[(dx % 8)])
    {
      SetPixelV(hdc, x + dx, y + dy, color);
    }
  }
  }
}

Can anyone help me optimize this code?  It draws a monochrome bitmap.  No it's not homework it's for my new game.  I can't use BitBlt because it's too slow and it won't do what I want.

P-Brain
Monday, July 12, 2004

You could "unroll the inner loop" to virtually eliminate the

"dy * BytesPerRow) + ((int)(dx / 8))] & BitMask[(dx % 8)])"

expression.

for (dx = 0; dx < (width/8); dx++)
{
  int offset = (dy * BytesPerRow) + dx;
  int x_plus_dx = x + (8*dx);
  int y_plus_dy = y + dy;
  if (pBitmap[offset + 0] & 1)
    SetPixelV(hdc, x_plus_dx + 0, y_plus_dy, color);
  if (pBitmap[offset + 1] & 2)
    SetPixelV(hdc, x_plus_dx + 1, y_plus_dy, color);
  if (pBitmap[offset + 2] & 4)
    SetPixelV(hdc, x_plus_dx + 2, y_plus_dy, color);
  if (pBitmap[offset + 3] & 8)
    SetPixelV(hdc, x_plus_dx + 3, y_plus_dy, color);
  ...etc...
}
if (width%8)
{
  //code to handle the odd bits
}

-----

You could add a comment to say what you're doing, in case there's another algorithm (e.g. a different WINAPI function) that could be used instead.

-----

My guess is that a lot of the time is spent in the SetPixel function (not in your code) ... which you could eliminate, iff you could change your algorithm to calculate the lines and solid areas that you're drawing, and use a SetLine or SetRect API instead.

Christopher Wells
Monday, July 12, 2004

And in the above, replace "pBitmap[offset + 0]" with ...

BYTE pbyte = &pBitmap[offset] -1;

... so that your if statements look like ...

if (*(++pbyte) & 2) {}

... and then continue to refine. I would bet however that SetPixel is likely to be the most expensive thing you're doing.

Christopher Wells
Monday, July 12, 2004

Thanks Christopher.  I'll try that.  I'm trying to render the monochrome bitmap returned from the GetGlyphOutline function.  It seems that BitBlt is slow for some reason so I have to optimize this naive routine.

P-Brain
Monday, July 12, 2004

> I can't use BitBlt because it's too slow and it won't do what I want.

Instead of SetPixel, try creating a bitmap in your own memory ... use your loop to set specific bits in your bitmap ... and then write the entire bitmap to GDI using one API call.

Christopher Wells
Monday, July 12, 2004

a slight (but easy) optimization would be to use prefix ++ instead of postfix ++ in your counters in the for loops.

Timothy Flechtner
Monday, July 12, 2004

First what exactly are you trying to optimize. It is one thing to just optimize the code buthave you thought about these optimizations?

1) For WM_PAINT Windows provides co-ordinates of a rectangle which needs updating. Clipping your bitmap and updating only that part is faster and also provides less flicker. See GetClipRgn for details.

2) See the ScrollWindow/ScrollDC API

3) If your image is a repeatable pattern and it can be accomodated in a 8x8 matrix try the PatBlt API

4)  Understand how ROP codes are used in Blt operations...a good place to start is http://enchantia.com/software/graphapp/doc/tech/wintrans.html

And BTW SetPixel is almost always slower than BitBlt for even the simplest of images. See http://www.winprog.org/tutorial/transparency.html

Code Monkey
Monday, July 12, 2004

> I'm trying to render the monochrome bitmap returned from the GetGlyphOutline function.

I refactored it all wrong: it should be more like ...

BYTE bytevalue = pBitmap[offset];
if (bytevalue & 1)
    SetPixelV(hdc, ..., color);
  if (bytevalue & 2)
    SetPixelV(hdc, ..., color);
  if (bytevalue & 4)
    ...etc...

And you may be able to replace this with a table of size 256.

Christopher Wells
Monday, July 12, 2004

The best you can do is to re-implement your algorithm to use the Windows functions MoveToEx and LineTo to plot runs of pixels instead of SetPixelV.  SetPixelV doesn't return the color of the previous pixel so it is faster than SetPixel but it's still too slow because it's called too many times.

Window's functions are optimized to use assembly instructions like REP STOSD which stores a whole 32-bit double word into memory.  You would load ECX with the number of DWORDS to store and EAX with the value and then call REP STOSD.  This is how the MoveToEx and LineTo functions operate.  They most likely use the Bresenham algorithm or a modified version of it.  It's efficiency comes with the ability to plot 'runs' of pixels.  See Michael Abrashs 'Zen of Graphics Programming'.

I don't know why BitBlt would be slow but a monochrome bitmap is normally used as a mask.  It seems you are drawing it as one color though so that would be tricky to implement using bitblt.  Try using the SRCPAINT - SRCAND transparency trick with BitBlt if the run length thing doesn't work. Two or three bitblts may be worse than the run length lines in terms of speed.

Dave B.
Monday, July 12, 2004

If mono bitblt is slow, why not convert your mono bitmaps into something for which bitblt isn't slow, and then use those?

Tom
Monday, July 12, 2004

Use DirectX, if speed is an issue.

If it's just one or two bitmaps, and you insist on using GDI, look into GetDIBits, modify the bits directly in RAM, then bitblt all at once. (Which I think somebody suggested above).

Game Programmer Lou
Monday, July 12, 2004

Other problems aside, I suspect your SetBitV (or whatever) API call deep inside the loop is burning 99% of your CPU time. If you want to plot bitmaps fast either use an API that lets you plot the entire bitmap and lets you do this fast enough (eg DirectX as someone suggested), or, if you're going to reinvent the wheel and write this routine yourself, do it in assembler, and access the screen at as low a level as possible.

I've written fast scrolling routines for games when writing for old machines back in the day (well, 6 or 7 years ago) - done in assembler using all the optimisation tricks you can shake a stick at, and accessing the screen buffer directly. Just writing individual bytes, or whole words where possible (almost certainly faster, depending on your CPU architecture!), or even multiple words at a time (the ARM had special instructions for this using many of it's 16 registers if I remember right) direct to the screen buffer in memory.

Matt
Monday, July 12, 2004

> Other problems aside, I suspect your SetBitV (or whatever) API call deep inside the loop is burning 99% of your CPU time.

Yes, as a first approximation I'd guess that the CPU time for this algorithm is proportional to the number of times you call a[ny] Win32 API.

Before unrolling any loops (which makes it more tedious to maintain), change the algorithm to write lines or bitmaps instead of pixels.

> accessing the screen buffer directly

I think he's using Win32. You can't do that in protected mode, and the format of the screen bufer is display-adapter-specific.

Christopher Wells
Monday, July 12, 2004

At the moment there are a lot of replies which say "my guess", and "I suspect". Please run your existing code through a profiler and find out where your code is slow! Then you'll know what exactly is slowing it up...


Tuesday, July 13, 2004

...and then when you make changes you'll also know if you've improved things or made them worse.


Tuesday, July 13, 2004

The call to SetPixelV is definitely the bottleneck.

Take a look at this article from MS:

http://support.microsoft.com:80/support/kb/articles/Q97/3/40.asp&NoWebContent=1

They are using CreateBitmapIndirect to create a Windows bitmap and then BitBlt to draw it. It looks like a good approach to me.

Also, off the top of my head, you will have to do use the SRCPAINT raster op (may be MERGEPAINT - I don't know what colors the GetGlyphOutline bitmap has) with the BitBlt. This will keep the background of the Glyph bitmap from being copied to the destination bitmap.

  The other high-performance alternative is to use a DIB (Device Independant Bitmap). This is a Windows data structure that gives you access to the memory of the bitmap. You still have to do a BitBlt at the end of the day to get it to the screen. See CodeGuru.com or CodeProject.com for examples.

  It is pretty rare that you can out-perform a Windows BitBlt because much of the work is being done by the display driver and hardware where possible. Good Luck.

A Graphics Programmer
Tuesday, July 13, 2004

*  Recent Topics

*  Fog Creek Home