Fog Creek Software
Discussion Board




Bitblt

I've got a problem that's got me stumped.  Essentially I'm trying to get a screenshot in Windows, and it's important to get this down to under 100ms on all machines it runs on.

On machine A, a 1.7 GHz Pentium 4 running Win XP, 1280x1024x32bits, the BitBlt takes about 30ms.

On machine B, a 1.7 GHz Pentium 4 running Win XP, 1280x1024x32bits, the BitBlt takes about 700ms.

What can account for this discrepancy?  I am guessing that it's the video card bus.  I am not sure what Machine B has, but Machine A has a 64MB NVidia GeForce2 MX card on an AGP port.

Still, even if machine A had a PCI card, that has about 128MB/s bandwidth, and 1280x1024x32bits is about 5MB if you do the math ... so I'd expect the transfer to take 40ms at most.
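The arithmetic behind that estimate can be written out as a minimal sketch. The function names frame_bytes and transfer_ms are made up for illustration, and the 128MB/s figure is simply the bus bandwidth assumed in the post; real transfers carry overhead, so this is a lower bound rather than a prediction.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of one uncompressed frame. */
uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return (uint64_t)width * height * (bits_per_pixel / 8);
}

/* Ideal transfer time in milliseconds at a sustained bandwidth given in
   MB/s (1 MB = 1024 * 1024 bytes). Bus overhead is ignored. */
double transfer_ms(uint64_t bytes, double mb_per_second)
{
    assert(mb_per_second > 0.0);
    return (double)bytes / (mb_per_second * 1024.0 * 1024.0) * 1000.0;
}
```

For 1280x1024 at 32 bits, frame_bytes gives 5,242,880 bytes (exactly 5MB), and transfer_ms at 128MB/s comes out at roughly 39ms.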

Can anyone who knows Windows internals or graphics cards help me out?

Alyosha`
Thursday, May 01, 2003

It might help us if you told us what graphics card machine B is running, and gave us a better understanding of how your code captures the screen and performs the bitblt.

Adrian Gilby
Thursday, May 01, 2003

Reading back from vid mem is, and 'always' has been, slow.

Depending on your card/drivers, readback ranges from hideously slow, to worse than hideously slow.

-
Thursday, May 01, 2003

Yeah, what he said. There's probably very little you can do to improve the read performance from video memory - it's not a design parameter that manufacturers optimize for.

Unless you're doing something really stupid in your program (reading a pixel at a time, or something), of course.

Mark Bessey
Thursday, May 01, 2003

Is the problem that you need to "capture" a screen inside 100ms, or you need to "write" the capture to disk inside 100ms?

Geoff Bennett
Friday, May 02, 2003

Sorry. Reread it. The actual blit is taking 700ms?

Geoff Bennett
Friday, May 02, 2003

How long does a "PrtScr" take on each?
Graphics hardware can seriously stink on this kind of operation. However 700 ms sounds seriously wrong. Are your bitmaps DIB or not?

Peter Ibbotson
Friday, May 02, 2003

How are you doing the blit?
Are you using DirectX?
CreateBitmapX?
Hitting hardware?

If you're using DirectX then I'd expect it to be reasonably similar in speed. It sounds like a driver issue with the slow machine, and that it's defaulted down to some drip drip method.

If you know exactly the hardware then you can do it directly and manage the double buffering yourself.  It's a while since I did any of this but I'd have expected a blit between the video pages to be available as an op on the video card.

Simon Lucy
Friday, May 02, 2003

Machine B is using an NVIDIA Vanta card.  It's fairly low end, but I agree ... 700ms seems seriously wrong.  It's not converting to a DIB, either, as far as I know ...

The BitBlt is done with Win32 calls ... it's fairly straightforward:

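  hDC = GetWindowDC(GetDesktopWindow());              // DC for the whole desktop
  hMemDC = CreateCompatibleDC(hDC);                   // memory DC to receive the copy
  hBMP = CreateCompatibleBitmap(hDC, width, height);  // device-dependent bitmap
  SelectObject(hMemDC, hBMP);
  BitBlt(hMemDC, 0, 0, width, height, hDC, 0, 0, SRCCOPY);  // screen -> bitmap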

Alyosha`
Friday, May 02, 2003

Have you tried blitting into a DIB, for example one created with CreateDIBSection? That might make a difference, especially if the DIB is the same color depth as the screen.
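If the DIB route is tried, the pixels end up in a plain memory buffer whose rows are padded to 4-byte boundaries, so it helps to know the row stride and total size up front. The following is only a sketch of that layout calculation; dib_stride and dib_image_size are hypothetical helper names, not Win32 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per scanline of a DIB: each row is padded to a multiple of 4 bytes. */
uint32_t dib_stride(uint32_t width, uint32_t bits_per_pixel)
{
    assert(width > 0 && bits_per_pixel > 0);
    return ((width * bits_per_pixel + 31u) / 32u) * 4u;
}

/* Total bytes of pixel data for a DIB with the given dimensions. */
size_t dib_image_size(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return (size_t)dib_stride(width, bits_per_pixel) * height;
}
```

At 32 bits per pixel there is no padding, so a 1280x1024 DIB is the same 5MB as the raw screen contents; at 24 bits the padding can matter.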

Frederik Slijkerman
Friday, May 02, 2003

I doubt it will help, and I don't know whether you need the desktop itself or the desktop plus all the overlaid windows, but have you tried using CAPTUREBLT instead of SRCCOPY? Might speed things up, might slow them down.

Adrian Gilby
Friday, May 02, 2003

...and of course you have already determined that it is in fact the BitBlt that is slow, and not the other 4 lines of code...

Big B
Friday, May 02, 2003

How are you testing?  How is the capture triggered?  Like some other posters I'm wondering whether you've made absolutely sure that the blit is really responsible for the performance discrepancy... eg. even in the 4 other lines you've shown us, the CreateCompatibleBitmap call might be eating up time if for some reason the underlying memory allocation is happening through some non-standard function... eg. some dev environment tool has its hooks in to help debug mem leak probs etc. (on one machine but not the other)?

Otherwise I'd guess driver problems.  I don't have the expertise in video guts to know what's likely to be going wrong, but if it was me I'd just try switching to another driver, even a very generic non-optimized one, to see what the impact was.

John Aitken
Friday, May 02, 2003

Alyosha, blt speed is hardware dependent and some cards used to have poor implementations.

Why do you need a fast blt? Are you looping?

You could try:

1. caching the bitmap. You don't need to create it each time

2. tracking the update rect and only bltting the section that changes

If you cache, make sure you delete it on exit.
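A rough sketch of the second suggestion, under the assumption that the application somehow learns which rectangles changed (how it learns that is not covered here). DirtyRect, rect_accumulate and copy_dirty are hypothetical names, and the buffers are plain 32-bit pixel arrays standing in for the cached bitmap and the screen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rectangle in pixels; right and bottom are exclusive, as in a Win32 RECT. */
typedef struct { int left, top, right, bottom; } DirtyRect;

int rect_is_empty(const DirtyRect *r)
{
    return r->right <= r->left || r->bottom <= r->top;
}

/* Grow *acc to the smallest rectangle covering both *acc and *r. */
void rect_accumulate(DirtyRect *acc, const DirtyRect *r)
{
    if (rect_is_empty(r)) return;
    if (rect_is_empty(acc)) { *acc = *r; return; }
    if (r->left   < acc->left)   acc->left   = r->left;
    if (r->top    < acc->top)    acc->top    = r->top;
    if (r->right  > acc->right)  acc->right  = r->right;
    if (r->bottom > acc->bottom) acc->bottom = r->bottom;
}

/* Copy only the dirty region from src to dst. Both buffers hold
   width * height 32-bit pixels with no row padding. */
void copy_dirty(uint32_t *dst, const uint32_t *src,
                int width, int height, const DirtyRect *dirty)
{
    DirtyRect r = *dirty;
    assert(width > 0 && height > 0);
    /* Clamp to the buffer bounds. */
    if (r.left < 0) r.left = 0;
    if (r.top < 0) r.top = 0;
    if (r.right > width) r.right = width;
    if (r.bottom > height) r.bottom = height;
    if (rect_is_empty(&r)) return;

    for (int y = r.top; y < r.bottom; ++y) {
        size_t offset = (size_t)y * (size_t)width + (size_t)r.left;
        memcpy(dst + offset, src + offset,
               (size_t)(r.right - r.left) * sizeof(uint32_t));
    }
}
```

With GDI the accumulated rectangle would instead supply the origin and size arguments of a smaller BitBlt into the cached bitmap, so less data has to be read back per capture.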

Must be a manager
Saturday, May 03, 2003

Bitblit speed, especially reading from video memory (as opposed to writing it), is *extremely* dependent on video card and driver.

NVIDIA have a pretty solid driver at this point. Just to be sure, try updating to their latest revision. (however, note that NVIDIA often chooses to cripple the performance of their low-end cards in order to differentiate from their high-end products - it could very well be the case that Vanta cards have crippled video memory read performance).

Be careful about interpreting profiles of your code because the BitBlt transfer is probably performed asynchronously. (i.e. one API call starts the transfer, then it proceeds in parallel with the CPU until another API call forces the CPU to wait for the transfer to finish).

Would it be possible for you to BitBlt into an offscreen bitmap, and then transfer the offscreen bitmap back to your application later on? Blitting from one piece of video memory to another is *much* faster than blitting into system RAM. If you just need to capture a single snapshot of the screen, you might benefit from splitting the operation into two parts like this. (my knowledge of the Win32 terminology for these things is a bit fuzzy - I *think* a "Bitmap" means it's in system RAM, whereas a "DIB Section" is somehow shared between system RAM and video RAM - it could be the other way around though...)

Dan Maas
Saturday, May 03, 2003

Dan, it's closer to being the other way round, but not exactly.  The code Alyosha posted is almost certainly already blitting from one piece of video memory to another, but the abstraction level of the GDI (Graphics Device Interface) API is such that you aren't supposed to know or care.

Bitmaps are device dependent (match the memory layout of the real device).  DIBs (device independent bitmaps) are mostly an intermediary more portable format... allowing you to render an image with some other calls to either, say screen1, or to screen2 (monochrome maybe), or to your printer.  There is some sneaky way to map DIB sections directly to real bitmaps, but I've never had cause to figure it out.

There are other graphics libraries you can use rather than the regular basic GDI if you need to get closer to the machine to really optimize performance, but I don't think that should be called for here.  I don't think there's any way the blit should be taking 700ms even on a classic Pentium... seems to me that something is going wrong.

John Aitken
Sunday, May 04, 2003

Thanks to all those who have responded.  It gets better from here ...

I went over to DirectDraw -- DirectDraw gives you a mechanism where you can get the frame buffer.  So I grab that and memcpy the screen ... now it takes 300ms.  On my machine.  Which used to take 30ms.  They also give me a Blt mechanism which theoretically should use hardware acceleration ... but it takes the same amount of time, 300ms.
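For reference, a minimal sketch of the copy step when reading a locked frame buffer. A locked DirectDraw surface reports a row pitch (lPitch) alongside the surface pointer, and that pitch can be larger than width times bytes per pixel, so a single memcpy of the whole frame is only safe when the two match. The function name copy_surface_rows and its parameters are hypothetical; this only makes the copy correct and does nothing for read-back speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a locked surface into a tightly packed buffer, one row at a time.
   src_pitch is the distance in bytes between the starts of consecutive
   source rows and may exceed width * bytes_per_pixel. */
void copy_surface_rows(uint8_t *dst, const uint8_t *src, size_t src_pitch,
                       size_t width, size_t height, size_t bytes_per_pixel)
{
    size_t row_bytes = width * bytes_per_pixel;
    assert(src_pitch >= row_bytes);
    for (size_t y = 0; y < height; ++y)
        memcpy(dst + y * row_bytes, src + y * src_pitch, row_bytes);
}
```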

Release build, too ...

So I'm guessing that Windows may cache the screen contents in system memory on machine A and not machine B.  According to DirectDraw, there's a flag (DDCAPS_GDI) -- "Display hardware is shared with GDI".  On machine A, this is 0.  I can't find any further documentation on this flag, tho.

If this is the case, maybe I could turn said caching on on the other machine?

I wish I knew what the update rectangle was between invocations, but I doubt there's a way to find out.

Yes, the Blt is the problem, since my code executes almost instantaneously if I comment out the BitBlt call.

As far as I can figure, 700ms is the "speed of light" which cannot be bypassed for copying the full video memory to system RAM.

Alyosha`
Monday, May 05, 2003

I've seen that flag before but I can't remember quite where... it was as part of some or another device info API call... GetDevCaps or something maybe... I'm not on my own machine so my lookup ability is impeded.  Anyhow I was never sure quite what the flag meant, but my impression was that it wasn't a switch but rather just straight up info: "this is the way this driver & the underlying hardware work".

As to the update rect / region, I don't think you want to go there.  As best as I know or remember:  You could probably work out the updateRect but it would be ugly and would require system-wide hooks.  I think(?) normally you'd get the updateRect via GetUpdateRect or embedded in a PaintStruct parameter within a Paint msg, but that this gives you window-centric rather than screen-centric info.

Anyhow, I don't really have much to offer without doing research except that I remember doing blits similar to this lickety-split, and I think the problem is probably something stupid and simple that we've overlooked.

John Aitken
Thursday, May 08, 2003
