Fog Creek Software
Discussion Board




Why can't windows just

Why can't it just unload my crashed nVidia graphics driver and load a generic one, instead of bluescreening and needing a reboot?

I know from limited use that Linux has a shell command to load and unload drivers, so what's stopping MS? It would make my user experience a lot better.

Or is it one of those things that anything that crashes in ring 0 will take down the whole system and no software bases solution around it? As I don't get it, just unload the device that causes the kernel panic and reload it.

somedude
Wednesday, July 14, 2004

Last paragraph should be:

Or is it one of those things that anything that crashes in ring 0 will take down the whole system and no software based solution can work around it? As I don't get it, just unload the device that causes the kernel panic and reload it.

somedude
Wednesday, July 14, 2004

Until recently, changing your IP address required a reboot.  You really expect these jokers to update a driver without one?

Sounds like troll bait, not intended to be.  I'm also curious why Windows always wants a reboot.

Snotnose
Wednesday, July 14, 2004

Because it can.


Wednesday, July 14, 2004

Exactly.  Windows OWNS you.  Windows has a 90% market share (or more) and hasn't got any truly viable competitors.  Some people think Linux is catching up and posing a threat.  It's not.  Linux has the tiniest toe-hold, finally, after years.  It's poised to catch up quickly if the OSS community can get their act together and code for END USERS, but that's a dubious proposition.

In the meantime.  Microsoft owns you, your mom, and your dog.  They can require a reboot if they want to because they have no real reason to make Windows more user friendly for you, or easier for you to use, or more convenient to use.  They're MUCH more interested in expending effort to design more ways to lock you in.  Even more so now that Linux has finally gotten significant mainstream media recognition.

muppet
Wednesday, July 14, 2004

You can load/unload [some] device drivers at run-time.

Perhaps not the graphics driver, because it's always being used.

And graphics drivers are part of the kernel (they're ring 0), so if they crash then they have crashed the kernel.

Christopher Wells
Wednesday, July 14, 2004

+++Perhaps not the graphics driver, because it's always being used.+++


This is a weak argument.  Why must video drivers be part of the kernel, anyway?  Why not stop the driver (which will momentarily kill the video) and reload it (restoring the video) ?

I never understood the real benefit in truly huge kernels...

muppet
Wednesday, July 14, 2004

The video subsystem was moved into the kernel for performance reasons (since NT4, AFAIR).

Max Belugin (http://belugin.newmail.ru)
Wednesday, July 14, 2004

all right, but now we have all these high end systems.  Why not break it back out?

muppet
Wednesday, July 14, 2004

Yes but at most it should be considered a loadable kernel module. A kernel must expect bad modules and so if the module crashes the kernel shouldn't crash.

Say if a plugin of an application crashes should the whole application be taken down?

However, if it's a ring 0 issue then it's a hardware (Intel's fault) problem.  And Intel, AMD and any x86 CPU maker should find a way to work around it, and not only try to make the "internet" faster.

somedude
Wednesday, July 14, 2004
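
The plugin analogy above can be sketched in user space. The following is a minimal illustration, not how any real OS loads drivers: each "driver" is a hypothetical snippet run in its own child process, so when the primary one dies the host survives and loads a generic fallback. The names `FANCY_DRIVER`, `GENERIC_DRIVER`, `run_isolated` and `load_with_fallback` are invented for this example.

```python
import subprocess
import sys

# Hypothetical stand-ins for a vendor driver and a generic fallback.
# Each "driver" is just a snippet of Python run in its own process.
FANCY_DRIVER = "import os; os._exit(3)"       # simulates a crash
GENERIC_DRIVER = "print('640x480x256 ready')"  # simulates a working fallback


def run_isolated(source: str) -> subprocess.CompletedProcess:
    """Run a plugin in a child process so a crash cannot take down the host."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )


def load_with_fallback(primary: str, fallback: str) -> str:
    """Try the primary plugin; if it exits abnormally, load the fallback."""
    result = run_isolated(primary)
    if result.returncode != 0:
        result = run_isolated(fallback)
    return result.stdout.strip()
```

Calling `load_with_fallback(FANCY_DRIVER, GENERIC_DRIVER)` returns the fallback's output. This only works because the process boundary keeps the crashed code away from the host's memory; a ring 0 driver shares the kernel's address space, which is the difficulty discussed in the rest of the thread.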

It lives in the process space w/ the kernel for performance reasons. You want it isolated? Great. Bear in mind that every transition to the video driver will incur a cross-process context switch.

If you want to run the generic video driver, then just run it all the time. If you think you don't need the nVidia driver anyway, why did you even bother to load it?

Brad Wilson (dotnetguy.techieswithcats.com)
Wednesday, July 14, 2004

Brad:

I think the desire to load the generic driver in case of a video driver crash is to allow us engineers the opportunity to save the code we've just been working on for 4 hours before switching off and rebooting.

muppet
Wednesday, July 14, 2004

You could consider using a graphics card/driver combination that doesn't blue screen instead of bitching about Microsoft.  I haven't had a video driver blue screen since NT 4.0

free(malloc(-1))
Wednesday, July 14, 2004

"If you think you don't need the nVidia driver anyway, why did you even bother to load it?"

Reductio ad absurdum

I think the point is if my nice driver giving me 1280 x 1024 and a gagillion f'ing colors fails, I'd accept 640 x 480 with 256 colors so I could save my work before rebooting.

.net, the equivalent of MS Bob.
Wednesday, July 14, 2004

Neither have I, but the broader issue isn't about video drivers, it's about architecture.

muppet
Wednesday, July 14, 2004

I always tell people to set their 'save frequency' to a value compatible with their tolerance for redoing work. I've yet to meet someone whose tolerance is 4 hours.

Having said that, I do feel your pain. I don't always save as frequently as I should. I've come to expect a certain amount of dependability, an expectation that seems to get me into trouble sometimes.

Ron Porter
Wednesday, July 14, 2004
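
As a rough sketch of the "save frequency" idea, an editor could write a recovery copy on a timer without touching the user's real file. The helper below is hypothetical (the name `autosave` and the `.autosave-` prefix are assumptions); it writes to a temporary file and then renames it over the recovery file, so a crash in the middle of a save should leave the previous copy intact.

```python
import os
import tempfile


def autosave(path: str, text: str) -> None:
    """Write a recovery copy so a crash mid-write does not leave a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".autosave-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the bytes to disk
        # Swap the finished file into place; on POSIX this rename is atomic.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temporary file
        raise
```

Because the recovery copy is a separate file, a scheme like this does not need to go through the application's normal save path.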

personally, I save every 5 minutes :)  It's an obsessive compulsive thing.  But I understand how someone might get a bit upset if their machine goes up in blue smoke while they're in the middle of something.

muppet
Wednesday, July 14, 2004

If you're at risk of losing 4 hours worth of work because the computer crashes, maybe you shouldn't be using the title "engineer."

Caffeinated
Wednesday, July 14, 2004

Windows XP does actually make an attempt to do what you're asking. I've had the system lock solid, then come back up about 20 seconds later in 640x480x256 with a message telling me that the graphics drivers had crashed and I should save my work and reboot. It's helped me occasionally.

Adrian
Wednesday, July 14, 2004

"personally, I save every 5 minutes :)  It's an obsessive compulsive thing. "

Ah. A former Mac user. Took me years to shake the habit.

Just me (Sir to you)
Wednesday, July 14, 2004

Yes, I just want to know whose fault it is that a monolithic kernel can't just reload a driver or even catch a kernel module's errors. Is it MS or Intel? I mean, if Linux can load and unload drivers then a reload should be simple, but apparently that’s not so in a crashed-driver situation.

Everyone is talking about user experience but something like this would be important to enhance it.
---

I mean if the OS has time to print a BSOD for me then it should have time to unload the driver and at least load a (s)VGA driver. Printing a BSOD needs a graphics driver. I don't care if it goes through the BIOS to do this. Just use the same routine for drawing the desktop. Modern graphics cards can be switched to (s)VGA mode via the BIOS, so no drivers are needed. Just don't expect hardware acceleration of graphics.
---

Basically I’m asking to crash with grace. Not a go f*** your self crash.

somedude
Wednesday, July 14, 2004

BTW: Don't get me wrong this is not an anti MS rant. It’s just that MS Windows is the OS that I use 99% of the time.

I'm googling now to see how xfree86 (or what ever the graphics server of Linux is called) handles drivers and in what mode they run (ring 0 or other) and how it handles graphics drivers crashes.

My gut feeling says that a Linux OS will revert to console mode and that will allow you to reload the graphics server and drivers. This of course can be handled by a script. 

somedude
Wednesday, July 14, 2004

Also what I read is the reason that Windows NT and higher have graphics in the kernel is due to compatibility with Win9x not speed.

Anyone who has used BeOS should know that a microkernel can be snappy. And why, in the past, were all the big 3D workstations Unix based?

somedude
Wednesday, July 14, 2004

Ok google groups gave me an answer: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=20030314183019%242472%40gated-at.bofh.it&rnum=11

It's even from a graphics driver developer:

> First - the bios isn't always able to fix the screen - the program may
> have programmed the video hardware in odd ways the bios don't know
> about.  Bioses aren't a magic fix.

Exactly what facts do you base the above statement on? Have you seen this
in practice?

I can tell you from my own personal experience that the BIOS on the
graphics card can nearly always restore the screen no matter what state
you have put the graphics hardware into. I have done development in DOS
for years writing the UniVBE and SciTech Display Doctor products, and we
have always used the BIOS to restore the screen when one of our graphics
drivers crashed during development.

somedude
Wednesday, July 14, 2004

SciTech Display Doctor, eh?  I remember I used to use a pirated copy of that for my Super Nintendo emulators..

..err what I mean is...

muppet
Wednesday, July 14, 2004

"It lives in the process space w/ the kernel for performance reasons. You want it isolated? Great. Bear in mind that every transition to the video driver will incur a cross-process context switch."

Actually, not.  Framebuffers are commonly (and can be in Win32) memory mapped into user space.  Access to memory-mapped data in user space has no context switch penalty.

As an example, DirectDraw programs (typically games) run in user space and have (almost) full access to all the goodies underneath.

Back to the original poster's question.  When a kernel module runs off into the weeds, it's unlikely that there is any saving the OS data structures.  This would happen under Linux as well (should a kernel module die, and when debugging a new module, I often wind up rebooting).

Linux (XFree86) has user space graphics drivers.  All register space and memory gets mapped into the X server user process.

hoser
Wednesday, July 14, 2004
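
The memory-mapping point above can be illustrated with a small sketch. An anonymous mapping stands in for video memory here (an assumption: a real driver would map the card's framebuffer instead, which is not shown), and the 640x480, one-byte-per-pixel layout is chosen only to match the fallback mode mentioned in the thread. Once the region is mapped, each pixel write is a plain memory store rather than a call into another process.

```python
import mmap

WIDTH, HEIGHT = 640, 480  # 256-color mode: one byte per pixel

# An anonymous mapping stands in for video memory; it starts zero-filled.
framebuffer = mmap.mmap(-1, WIDTH * HEIGHT)


def put_pixel(x: int, y: int, color: int) -> None:
    """Store one palette index directly into the mapped memory."""
    framebuffer[y * WIDTH + x] = color


def fill_row(y: int, color: int) -> None:
    """Fill a whole scanline with a single slice assignment."""
    start = y * WIDTH
    framebuffer[start:start + WIDTH] = bytes([color]) * WIDTH
```

For example, `put_pixel(10, 20, 7)` followed by `fill_row(1, 255)` changes the buffer without any further mapping or setup calls.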

"I mean if the OS has time to print a BSOD for me then it should have time to unload the driver and at least load a (s)VGA driver."

Any kernel error handler would not necessarily have a way of knowing the cause of a BSOD, since ring 0 modules may call each other. So how would it know what to replace to fix it?

Ron
Wednesday, July 14, 2004

And how frequently have you had this happen?  I can count on one hand the number of times I've had a blue screen with Windows 2000 or XP in the last two years outside of a buggy video driver that crashed when playing a particular video game.  Seriously, Windows 2000 and above pretty much don't crash.  NT 4 was a different story.

Jeremy
Wednesday, July 14, 2004

Windows XP will actually try to do what you ask.  If your video driver crashes (gracefully), XP tries to load the VgaSave driver.  If I remember correctly, this was deliberately implemented because nVidia drivers are responsible for something like 30% of all Windows blue screens.  I don't remember the exact figure, but it was something amazingly high.

Unfortunately, drivers (and software in general) don't always fail gracefully.  If one goes off stomping on the rest of the OS's memory, simply killing the driver isn't enough.  The OS itself is hosed.  Blue screening is actually the safest option.  If it kept running, it could potentially do things like hose the IDE driver's buffers, which would ruin the data on your hard drive, etc.

Drivers are in kernel space for performance reasons.  You could isolate drivers like user-mode applications, but you have to remember that drivers are accessed a LOT, so any extra access overhead will add up very quickly.  Nowadays, computers are very fast, so user-mode drivers aren't quite as painful (witness the new WDF, which allows for user-mode drivers).  You'll probably start to see non-critical drivers showing up in user mode over the next few years.  However, don't expect to see anything performance-critical, like a Video/Ethernet/Disk driver, in user mode for the next decade.

A lot of people have argued that stability is more important, therefore drivers should be moved to user mode.  Remember though that even in user mode, a hosed driver can have some devastating consequences.

Suppose that your IDE or SCSI driver was user mode.  Now it can fail without bringing the whole system down, right?  Nope!  If your disk driver crashes, file I/O stops, and I don't know any OS that runs for more than a few seconds without doing SOME kind of file I/O.  If you have a network filesystem, the consequences of an Ethernet driver failing are just as dire.  USB and FireWire have the same issues.  Are you using a USB storage device?  Guess what happens if the USB driver fails?

Graphics cards are maybe the one thing that you could kill and restart without a problem.  Their drivers are also perhaps the single most performance-critical ones (from the end-user's POV, anyway).  Almost all of your perceptions about a system's speed come from looking at the screen.

Another thing, with the complexity of today's video chips, it's not unheard of for the chip to go off into never-never land (hardware bug, not a software one).  No amount of driver reload will solve that.  The only thing to do is reset the chip (reboot). 

Furthermore, now that you have video cards with enough RAM to rival PCs, you're starting to see glitchy video memory becoming an issue (which can manifest itself in some pretty bizarre ways).

Myron A. Semack
Wednesday, July 14, 2004

Now, my question to the OP:

Was your video driver WHQL certified?

Myron A. Semack
Wednesday, July 14, 2004

"If you're at risk of losing 4 hours worth of work because the computer crashes, maybe you shouldn't be using the title "engineer.""

Don't be an ass.  Microsoft says their warez are reliable. Although I do question anyone using windows calling themselves an engineer.

.net, the equivalent of MS Bob.
Wednesday, July 14, 2004



Even as far back as Win95a, changing IPs never required a reboot.

Use winipcfg and then do a "Release" and a "Renew".

On the command line, do an "ipconfig /release" and an "ipconfig /renew".

KC
Wednesday, July 14, 2004

Many nVidia driver crashes these days are caused not by driver problems, but by overheating.  Make sure your machine has adequate cooling / ventilation.

BadgerBadgerBadger
Wednesday, July 14, 2004

For muppet:
"They can require a reboot if they want to because they have no real reason to make Windows more user friendly for you"

You may have noticed that you have to reboot XP less than you had to reboot Win2k, which you had to reboot less than 98 (which you had to reboot if you changed the IP or video resolution)

You know that "annoying" error reporting dialog box? Those reports go back to engineering where they're analyzed and the highest-occurring crashes can be addressed (see the comment above about nVidia drivers causing a lot of crashes - how do you think they figured that out?).

One of the top goals for the platform teams is reducing the need for reboots. They're getting there (whether you think they have a reason to or not)

Philo

Philo
Wednesday, July 14, 2004

A few comments.

Like other posters, I NEVER have a video-card related BSOD unless I'm playing a video game.

It's also been an nVidia driver crashing.

And yes, the driver is WHQL certified.

Brad Wilson (dotnetguy.techieswithcats.com)
Wednesday, July 14, 2004

The previous posts reminded me of a joke.

One day Satan was having a discussion with Jesus about who was better at dealing with modern times.  Jesus knew he was the man, but was willing to let Satan argue (kind of like the Job incident).  Satan challenged Jesus to a writing contest to see who could write the best paper for their followers using MS Word.

For hours, both Jesus and Satan type away, arguing their positions.  Suddenly both of their computers crash.  When they reboot their machines, Jesus's work comes back up on the screen, but all of Satan's work is gone.

"NO!" cried Satan.  "Why did all my work disappear, and YOURS didn't?"

Jesus replied, "Jesus Saves."

Steamrolla
Wednesday, July 14, 2004

Jesus Saves Sinners
  ...and redeems them for valuable prizes!

Jesus Saves
  ...but Shaq scores on the rebound

WWJD?
JWRTFM

+
Wednesday, July 14, 2004

Actually, the canonical version of that joke is:

Jesus Saves!
Gretzky steals... HE SCORES!

Ronk!
Wednesday, July 14, 2004

From a PowerPoint on Longhorn Display drivers from WinHEC 2004:

“Longhorn” Display Drivers
Basic model seamless hang recovery

- Evolution in fault-tolerant engineering
-- GPU hangs in Windows 2000 resulted in system hangs
-- In Windows XP, bug check EA was introduced
-- In Windows XP SP1, VGA recovery and callback
-- In “Longhorn”, complete recovery
- In “Longhorn”, driver is notified if hang is detected
-- No longer spinning in driver waiting for GPU
- Driver can reset hardware
-- Applications receive a device lost, but system remains intact
-- Diagnostic data is still created
- Legacy VGA recovery still present as fallback


http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/DW04018_WINHEC2004.ppt

http://www.microsoft.com/whdc/driver/ldk/default.mspx

Chris Altmann
Wednesday, July 14, 2004

That info about Longhorn sounds interesting, but it seems like it is more to address hardware failures than driver failures.  Maybe those are the more common error, though

As for whoever mentioned Linux using user-mode video drivers... I'm not sure if that is true, and I know X uses a totally different architecture than Windows, but the user-mode video driver could contribute to the poor video performance I see pretty regularly on Linux, even doing mundane tasks like switching windows.

Mike McNertney
Wednesday, July 14, 2004

From what I understand, as Linux uses a monolithic kernel the drivers run in kernel mode; however, the graphics subsystem, called XFree86, runs in userspace. Its speed issues, I think, are due to its architecture and not because it runs in userspace.

---
About Windows running without fs I/O. Sometimes I hear Linux people brag that in Linux you can unmount the system hard drive and it still runs. I also once read a story about a BSD box doing so, where they only found out that the hard drive was broken when they wanted to replace the old machine.

---
BTW: I have a home-built nForce2 system with an AMD 2500 XP, and yes, I also have heating / cooling problems and I must use CPU throttling to keep my CPU from overheating. And the crashes mostly occur when I'm running the seti@home client.

somedude
Wednesday, July 14, 2004

Lots of amazing stuff here!

Maybe one of those 'XP is stable' folks could give me a hint as to why my XP Home box roughly once a week decides to do a 'warm boot' on me? No BSOD just sudden restart. The helpful MS message on reboot suggests that 'a driver' has issued a STOP command. Buggered if it can tell me which one. I've no idea either.

Les C
Thursday, July 15, 2004

Les

Right Click "My Computer"
Choose "Properties"
Choose the  "Advanced Tab"
In the "Startup and Recovery" section, choose "Settings"
Uncheck the box "Automatic Restart"

This should at least give you the blue screen and some more info

Dan G
Thursday, July 15, 2004

>I always tell people to set their 'save frequency' to a value compatible with their tolerance for redoing work

In many (most?) applications that will reset the undo/redo stack, meaning you had better be damn sure that the change you made is something that really works.


Thursday, July 15, 2004

Win9x may be able to acquire a new DHCP lease without a reboot, but just try changing a manually assigned address.

A.T.
Thursday, July 15, 2004

The video drivers were put in ring zero because NT3.51 desktop performance was awful.

The Linux GUI has rightly been described as a house of cards, with first X windowing, then your windows manager all basically on top of the shell. For a desktop system XP or 2000 are definitely better.

Strange things cause bluescreens. On my laptop it was the modem driver - updated it and the problem disappeared.

What I would like to know though is why after a couple of hours of using Netscape on XP the whole system can become unresponsive with the screen just not redrawing. Never happens with Win 2K.

Stephen Jones
Thursday, July 15, 2004

Must be Netscape or your video driver because I've never seen that on my XP system...

Chris Nahr
Thursday, July 15, 2004

More helpfully, I suspect that Netscape is hogging GDI resources and not releasing them.

Chris Nahr
Thursday, July 15, 2004

Something is hogging the GDI resources. Might be Outlook, which on occasion appears to be using 60MB of memory!

Where would I go to find out?

Incidentally I get the same problem with XP at work, where I don't use Netscape, and not on the home desktop, where I use Netscape but with Win 2K.

Stephen Jones
Thursday, July 15, 2004

The free Process Explorer tool by Sysinternals shows all system handles used by each process, including GDI and User objects:
http://www.sysinternals.com/ntw2k/freeware/procexp.shtml

Lots of other options, too. You could poke around a bit and see if Netscape allocates some unearthly amount of any resource. Right-click on a process and choose Properties -> Performance to see the full list.

For instance, I see that my Avant Browser (on top of IE) is currently using 227 User objects, 472 GDI objects, and 416 other system handles, making it the greediest process in terms of system handles I'm currently running.

Chris Nahr
Thursday, July 15, 2004

Thanks. Those columns weren't enabled by default in Task Manager; I've enabled them now. Netscape is using 432 GDI objects and for some reason Zone Alarm is using 268, nearly as many as Explorer's 299.

svchost is actually using a stunning 1442 handles! I'll keep records for the next few days, and see if I can find the culprit.

Stephen Jones
Thursday, July 15, 2004
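
Keeping records over a few days, as proposed above, amounts to looking for a process whose handle count only ever climbs. A minimal sketch of that check follows; the process names and every number in `history` are made up purely to exercise the functions, and the threshold is an arbitrary assumption rather than a known limit.

```python
from typing import Dict, List


def growth_per_sample(counts: List[int]) -> float:
    """Average change in handle count between consecutive samples."""
    if len(counts) < 2:
        return 0.0
    return (counts[-1] - counts[0]) / (len(counts) - 1)


def likely_leakers(history: Dict[str, List[int]], threshold: float) -> List[str]:
    """Processes whose count never drops and grows faster than the threshold."""
    suspects = []
    for name, counts in history.items():
        never_drops = all(b >= a for a, b in zip(counts, counts[1:]))
        if never_drops and growth_per_sample(counts) > threshold:
            suspects.append(name)
    return sorted(suspects)


# Hypothetical samples, one reading per process per observation.
history = {
    "leaky_app": [400, 600, 850, 1100],   # climbs every time
    "steady_app": [268, 270, 265, 268],   # wobbles around a fixed level
}
```

With this data, `likely_leakers(history, 50)` flags only `leaky_app`. A process that starts with many handles but stays flat, like the svchost case discussed in the thread, would not be flagged.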

"About Windows running without fs I/O. Sometimes I hear Linux people brag that in Linux you can unmount the system hard drive and it still runs. I also once read a story about a BSD box doing so, where they only found out that the hard drive was broken when they wanted to replace the old machine."

I don't know how much faith I would put in this.  I could see that happening in a server application, where the server service is resident in memory, and there's little/no disk I/O (meaning not a file server, or a web server). 

Maybe the BSD system you mention had something like the root FS was decompressed to a ramdisk.  That could insulate the system from a disk failure (the drive is completely in RAM).  But that wouldn't insulate you from a DRIVER failure.  Ramdisk accesses go through a driver too.  That driver can crash just as easily.

In day to day use, you run lots of programs.  You open files.  You close files.  You close the application.  All of those things cause filesystem hits.  Granted, the disk cache can handle some of this, but not all of it.  Just think of all the files created in your TEMP (or /tmp) directory.  Almost all of those are function calls to the disk driver.

And I haven't even started to talk about the pagefile.  Remember, the Windows pagefile backs the RAM used by your currently running programs.  It's probably one of the most frequently accessed files in all of Windows.

Myron A. Semack
Thursday, July 15, 2004

Stephen, svchost is a host process that contains all kinds of small tasks that aren't worth having their own dedicated process. One svchost process has ~1200 handles on my system, too -- I didn't consider that when I said Avant was the greediest process.

If Netscape isn't usually much worse than what you posted the slowdown must have some other reason, though I can't imagine what it might be.

Chris Nahr
Friday, July 16, 2004

Murphy's Law being what it is, I haven't had the slowdown since I started monitoring (or perhaps that's the Heisenberg principle). svchost does start up with all those handles claimed, so it is unlikely to be the culprit.

Two other points. Does XP mirror all memory to the page file, as has been suggested, and is that specific to XP? And why does it do it? Will clearing the page file on exit take away the reason for this?

Stephen Jones
Sunday, July 18, 2004

Interesting bit of posts here - I was googling to see if anyone read my WinHEC presentation (even we "evil" MS employees aren't above the curiosity to see if anyone is reading our stuff =)  I'll try and address some of the more common questions I've seen on this thread.

1) Hardware hangs are the single most common cause for video driver instability - these hangs are caused by both faulty hardware and software programming the hw incorrectly.

2) Prior to WinXP, hw hangs did not cause bugchecks, they just hung the machine, and no one ever knew if it was video card related or not.

3) In WinXP, we bugcheck the machine so that we can at least attempt to collect diagnostic data.

4) In WinXP SP1, we attempt to "recover" to VGA.  The idea is that 90% of the time, the hang occurs in the engine and that  the unaccelerated VGA portion of the chip is unaffected.  We will use the VGA portion if all else is gone.  It is still a bad experience, but at least lets you save your work.

5) Sending in "report this error to microsoft" helps fix bluescreens.  We've significantly reduced the overall percentage of system crashes caused by video drivers by analysing the information we receive as well as providing it to the chip vendors and encouraging them to fix their crashes.

6) We have been working with hardware vendors to make the hardware "Resetable" so that in the next version of windows, we will be able to recover the entire GPU from hangs instead of just the VGA portion.  This should allow properly constructed applications to continue running after a hardware hang. 

7) In the next version of Windows, most but not all of the graphics driver will move to user mode.  This will result in software errors in most of the driver bringing down only one process instead of the entire system.  Without going into too many details, this should not have an adverse effect on performance since the overall amount of user->kernel transitions should be roughly the same as XP.

8) We're making significant investments in rearchitecting the entire low level graphics subsystem.  The result should be a cleaner, more robust and consistent graphics experience across the board.  This is especially important  given the emphasis on multiple applications sharing the GPU in our new Platform.

I hope this helps - it's great to see people interested in how the graphics subsystem actually works.  Believe it or not, the Microsoft graphics team is _extremely_ concerned about providing a stable graphics experience to our customers.  We're doing everything we can to achieve this - it is simply a very difficult problem given the amount of hardware and applications that must work properly together.  We've been fairly successful at advocating stability to graphics vendors and will continue to do so until the problem is solved.

Bryan Langley
Saturday, July 24, 2004

One other thing - to answer the original poster's question, graphics drivers will be dynamically load/unloadable in the next version of windows.  They were not designed to do this in the current model.

Bryan Langley
Saturday, July 24, 2004
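
The hang detection described in the points above (notice that the hardware has stopped responding, reset it, and report a lost device instead of taking the system down) can be loosely imitated in user space with a watchdog timeout. This is only an analogy under stated assumptions: the workloads `HEALTHY` and `HUNG`, the function `run_with_watchdog` and the "device lost" string are all invented for the example and say nothing about how Windows implements recovery.

```python
import subprocess
import sys

# Hypothetical workloads: one finishes promptly, the other spins forever,
# like a driver waiting on a GPU that will never answer.
HEALTHY = "print('frame done')"
HUNG = "import time\nwhile True: time.sleep(1)"


def run_with_watchdog(source: str, timeout_s: float) -> str:
    """Return the job's output, or 'device lost' if it exceeds the timeout."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the hung job is
        # already gone; the caller is told to recreate whatever it lost.
        return "device lost"
    return result.stdout.strip()
```

Here `run_with_watchdog(HEALTHY, 10)` returns the job's output, while `run_with_watchdog(HUNG, 1)` gives up after one second and reports the loss, leaving the caller running.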
