Sunday, October 28, 2007

Windows Crash Dump Analysis - Pinpointing Faulty Drivers with Driver Verifier and WinDbg.

I've been having problems with my system lately: random freezes and crashes (BSODs such as DRIVER_IRQL_NOT_LESS_OR_EQUAL). Since the system passes memtest86+ and the Vista memory tests, and it is updated to the latest patch level, I'm pretty sure the problems are caused by third-party drivers.

I'm going to use an often overlooked Windows tool: Driver Verifier. It has been part of Windows ever since Windows 2000, and it's invaluable for debugging faulty drivers.

You'll need to run the tool manually: Start - Run - verifier.exe. A wizard will pop up.

Choose "Create custom settings", then "Select individual settings from a full list", and pick everything but "Low resource simulation". You can then either verify unsigned drivers (the usual suspects, since signed drivers have usually been tested) or select driver names from a list (pick the ones you suspect most, such as drivers installed shortly before the crashes appeared). Make sure you don't pick ALL the drivers on your machine; verifying all of them is quite painful.
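
If you'd rather skip the wizard, verifier.exe also takes its settings on the command line. Here's a minimal Python sketch that only builds that command line (it doesn't run anything). The bit values are the eight classic /flags options as I understand them from Microsoft's documentation; treat them as an assumption and check "verifier /?" on your own system, since newer Windows versions add more options. The function name and dictionary keys are made up for this example.

```python
# Bit values for verifier.exe /flags (assumption: the eight classic options;
# confirm against "verifier /?" on your Windows version).
VERIFIER_FLAGS = {
    "special_pool": 0x01,
    "force_irql_checking": 0x02,
    "low_resources_simulation": 0x04,
    "pool_tracking": 0x08,
    "io_verification": 0x10,
    "deadlock_detection": 0x20,
    "enhanced_io_verification": 0x40,
    "dma_checking": 0x80,
}


def verifier_command(drivers, skip=("low_resources_simulation",)):
    """Build a verifier.exe command line for the given driver file names.

    Every option in VERIFIER_FLAGS is enabled except the ones named in skip.
    """
    if not drivers:
        raise ValueError("name at least one driver; verifying everything is painful")
    flags = 0
    for name, bit in VERIFIER_FLAGS.items():
        if name not in skip:
            flags |= bit
    return ["verifier.exe", "/flags", hex(flags), "/driver", *drivers]


# The two drivers examined in this post.
print(" ".join(verifier_command(["iaStor.sys", "iaNvStor.sys"])))
```

The resulting command has to be run from an elevated prompt, followed by a reboot. "verifier /query" shows the current state and "verifier /reset" turns verification off again.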

In my case, I've picked drivers not provided by Microsoft. To be more specific, the Intel Turbo Memory (Robson) driver, since this is a Santa Rosa platform laptop, and Turbo Memory isn't really known for its stability or performance boosts. In fact, some laptop vendors like HP have said NO to Turbo Memory.

iaStor.sys is the Intel Matrix Storage Manager driver and iaNvStor.sys is the Intel Turbo Memory (Robson) driver. Both are installed by the Intel Turbo Memory driver package.

In case you're more worried about its actual usefulness than about security issues (people reading data from the solid state device) or stability issues, take a look at this article:

Investigating Intel's Turbo Memory: Does it really work?

Once you're done rebooting, make sure you check the results tab:

After this change you'll need to reboot your system and wait for a crash to occur. In case your system freezes instead of crashing, it's generally a good idea to boot in Debugging Mode (F8 at startup) and make sure you have full memory dumps and CrashOnCtrlScroll enabled.
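
Both settings live in the registry. As a rough sketch, the Python below produces the text of a .reg file that turns them on. The key and value names (CrashDumpEnabled under CrashControl, CrashOnCtrlScroll under the i8042prt service for PS/2 keyboards) are the ones Microsoft documents as far as I know, but treat them as assumptions and double-check them for your Windows version; USB keyboards use a different service key. The helper name reg_file_text is made up for this example.

```python
# Assumed registry locations; verify them for your Windows version.
CRASH_SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl": {
        "CrashDumpEnabled": 1,   # 1 = complete (full) memory dump
    },
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\i8042prt\Parameters": {
        "CrashOnCtrlScroll": 1,  # PS/2 keyboards only
    },
}


def reg_file_text(settings):
    """Render {key: {value_name: dword}} in the .reg file format."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, dword in values.items():
            lines.append(f'"{name}"=dword:{dword:08x}')
        lines.append("")
    return "\r\n".join(lines)


print(reg_file_text(CRASH_SETTINGS))
```

Save the output as a .reg file, import it, and reboot for the keyboard setting to take effect.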

Once you've managed to obtain a crash dump (either from a BSOD, by connecting a debugger to a hung system and typing .dump, or by crashing the system yourself in case of a hang: hold the right Ctrl key and press Scroll Lock twice) you'll need to analyze it. For this task, you'll need the Windows Debugger (WinDbg) and debugging symbols, either downloaded or fetched from Microsoft's symbol server. You can grab them off Microsoft's website. Just make sure they're the right ones for your system.

Once you've got yourself a debugging environment set up, open WinDbg, pick File - Open Crash Dump (Ctrl - D) and open the fresh memory dump (C:\Windows\MEMORY.DMP by default - check the Startup and Recovery Tab to make sure).

As we can see in the image, it did not find any debugging symbols for the iaStor.sys and iaNvStor.sys drivers (the Intel Turbo Memory Drivers) since they are 3rd party drivers.

We're going to type "!analyze -v" to get more details on the error.
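
The same analysis can be scripted with kd.exe, the command-line kernel debugger that ships alongside WinDbg. The sketch below only assembles the command line: -z names the dump file, -y the symbol path and -c the commands to run. The symbol cache directory (C:\symbols) and the helper name kd_command are assumptions for illustration, and kd.exe has to be on your PATH for the result to be usable.

```python
# Assumed symbol path: a local cache in C:\symbols backed by Microsoft's
# public symbol server. Adjust the cache directory to taste.
DEFAULT_SYMBOLS = r"srv*C:\symbols*https://msdl.microsoft.com/download/symbols"


def kd_command(dump_path, commands, symbol_path=DEFAULT_SYMBOLS):
    """Build a kd.exe command line that opens a dump, runs commands and quits."""
    # Debugger commands are separated by semicolons; "q" ends the session.
    script = "; ".join([*commands, "q"])
    return ["kd.exe", "-z", dump_path, "-y", symbol_path, "-c", script]


cmd = kd_command(r"C:\Windows\MEMORY.DMP", ["!analyze -v", "lm kv"])
print(cmd)
```

Passing the list to subprocess.run with capture_output=True would collect the debugger's text output for later inspection.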

Looking at the stack, we can see:

WARNING: Stack unwind information not available. Following frames may be wrong.
88fafd00 80673529 85f2b0e8 8593d194 b92b8f00 iaStor+0x3cf51

That's the Intel TurboMemory driver (well, the Intel Matrix Storage Manager to be more exact).

Since this is a Core 2 system, we have CPU 0 and CPU 1. To run commands in the context of CPU 1, we need to switch to it using the ~1 command.

We're going to examine the stack on both CPUs:

0: kd> ~0
0: kd> k
ChildEBP RetAddr
a9563bb0 81ce8651 hal!KeReleaseQueuedSpinLock+0x26
a9563c14 81ce86f6 nt!ExFreePoolWithTag+0xae7
a9563c24 81dec9ae nt!ExFreePool+0xf
a9563c60 81d3949b nt!ObOpenObjectByName+0x47b
a9563d2c 81d39258 nt!CmOpenKey+0x1b1
a9563d50 81c8c92a nt!NtOpenKey+0x39
a9563d50 77ae0f34 nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
0097f0c4 765d5fc9 0x77ae0f34
0097f120 7566548e 0x765d5fc9
0097f214 75664e62 0x7566548e
0097f274 75665581 0x75664e62
0097f2a4 75665f46 0x75665581
0097f450 75665f5f 0x75665f46
0097f474 75664b10 0x75665f5f
0097f4d0 75664a05 0x75664b10
0097f51c 76686d7e 0x75664a05
0097f548 767003a2 0x76686d7e
0097f974 766ff44c 0x767003a2
0097f990 766873cb 0x766ff44c
0097f9cc 76687279 0x766873cb
0: kd> ~1
1: kd> k
ChildEBP RetAddr
88fafb64 828059c1 nt!KeBugCheckEx+0x1e
88fafb94 82805d01 crcdisk!VerifyOrStoreSectorCheckSum+0x111
88fafbc4 8280521a crcdisk!VerifyCheckSum+0xa9
88fafc00 82805570 crcdisk!CompleteXfer+0x16a
88fafc14 81ecec69 crcdisk!CrcScsiReadCompletion+0x20
88fafc4c 81caca3b nt!IovpLocalCompletionRoutine+0xcc
88fafc80 81eceb53 nt!IopfCompleteRequest+0x13d
88fafcf0 8297ef51 nt!IovCompleteRequest+0x11c
WARNING: Stack unwind information not available. Following frames may be wrong.
88fafd00 80673529 iaStor+0x3cf51
88fafd10 8067a8b5 iaNvStor+0x16529
88fafd30 8067b1f1 iaNvStor+0x1d8b5
88fafd4c 80679740 iaNvStor+0x1e1f1
88fafd7c 81e25472 iaNvStor+0x1c740
88fafdc0 81c9141e nt!PspSystemThreadStartup+0x9d
00000000 00000000 nt!KiThreadStartup+0x16
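
Frames from drivers without symbols are easy to spot: they read module+0xoffset, with no "!" and no function name. A small sketch that pulls those frames out of saved "k" output follows; the regular expression and the function name are my own, and it assumes the two-column ChildEBP/RetAddr layout shown above.

```python
import re

# One frame of "k" output: ChildEBP, RetAddr, then module!symbol+0xoffset.
# Frames from modules without symbols look like module+0xoffset (no "!").
FRAME = re.compile(
    r"^[0-9a-f]{8} [0-9a-f]{8} "
    r"(?P<module>\w+)(?:!(?P<symbol>[\w:]+))?\+0x(?P<offset>[0-9a-f]+)$"
)


def frames_without_symbols(stack_text):
    """Return (module, offset) for every frame that has no symbol name."""
    hits = []
    for line in stack_text.splitlines():
        match = FRAME.match(line.strip())
        if match and match.group("symbol") is None:
            hits.append((match.group("module"), int(match.group("offset"), 16)))
    return hits


# A few frames copied from the CPU 1 stack above.
STACK = """\
88fafc80 81eceb53 nt!IopfCompleteRequest+0x13d
88fafcf0 8297ef51 nt!IovCompleteRequest+0x11c
WARNING: Stack unwind information not available. Following frames may be wrong.
88fafd00 80673529 iaStor+0x3cf51
88fafd10 8067a8b5 iaNvStor+0x16529
88fafdc0 81c9141e nt!PspSystemThreadStartup+0x9d
"""

for module, offset in frames_without_symbols(STACK):
    print(f"{module}+{offset:#x}")
```

A missing symbol doesn't prove a driver is guilty, but on this stack the only modules without symbols are the two Intel drivers.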

We can use lm kv to list currently loaded drivers and version information. From this list, I've selected the two drivers we're interested in:

8065d000 80699000 iaNvStor (no symbols)
Loaded symbol image file: iaNvStor.sys
Image path: \SystemRoot\system32\DRIVERS\iaNvStor.sys
Image name: iaNvStor.sys
Timestamp: Sun Mar 11 10:11:01 2007 (45F3B995)
CheckSum: 000423B0
ImageSize: 0003C000
Translations: 0000.04b0 0000.04e0 0409.04b0 0409.04e0

82942000 82a00000 iaStor (no symbols)
Loaded symbol image file: iaStor.sys
Image path: \SystemRoot\system32\DRIVERS\iaStor.sys
Image name: iaStor.sys
Timestamp: Mon Feb 12 22:46:47 2007 (45D0D237)
CheckSum: 0004966D
ImageSize: 000BE000
Translations: 0000.04b0 0000.04e0 0409.04b0 0409.04e0

This tells us the driver version and timestamp, to let us know if an update is in order. Usually old drivers are suspect.
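
The numbers in this listing can be cross-checked by hand. The hex value in parentheses after the timestamp is the driver's build time in seconds since 1970 (UTC), and the two addresses at the start of each entry are the range the driver is loaded at, so any return address from the stack can be mapped back to module+offset. A short sketch, with the values copied from the listing above and helper names of my own:

```python
from datetime import datetime, timezone

# name: (start address, end address, timestamp), copied from the lm kv output.
MODULES = {
    "iaNvStor": (0x8065D000, 0x80699000, 0x45F3B995),
    "iaStor": (0x82942000, 0x82A00000, 0x45D0D237),
}


def link_time(stamp):
    """Convert a driver timestamp (seconds since 1970) to a UTC datetime."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


def resolve(address):
    """Map an address to (module, offset), or None if no listed module holds it."""
    for name, (start, end, _stamp) in MODULES.items():
        if start <= address < end:
            return name, address - start
    return None


for name, (start, end, stamp) in MODULES.items():
    print(name, hex(end - start), link_time(stamp).isoformat())

# Return addresses taken from the CPU 1 stack.
print(resolve(0x8297EF51))
print(resolve(0x80673529))
```

The decoded times come out in UTC, two hours behind what the debugger printed, presumably because the debugger shows local time. The dates match: both drivers were built in early 2007.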

Useful commands:

  • ~N where N is the CPU number (count starts from 0) changes to that CPU
  • !analyze -v gives a detailed heuristic analysis of the problem, and looks for 3rd party drivers that might be responsible for the crash.
  • lm kv lists loaded drivers and gives details on them
  • !deadlock shows the deadlocks found by Driver Verifier's deadlock detection
  • !vm prints memory usage. If pool usage is close to pool maximum, then a driver might have a memory leak.
  • !poolused, if pool tagging is on (the default on Windows Server 2003 and later), displays kernel pool usage by pool tag, which lets you map allocations back to drivers. Mapping tags to third-party drivers requires you to grep for printable strings in the driver. See "!poolused c".
  • !thread examines the current thread (run it on each CPU). If a driver interrupted a running thread, this may not show the cause of the crash.
  • !process 0 0 lists active processes. Look for suspect processes that shouldn't be running, or common third-party processes that show up in multiple crashes.

Since all results point at the Turbo Memory drivers, and I got a couple more crashes while working, with similar results (Probably caused by : iaNvStor.sys ( iaNvStor+3523 )), I have two choices here: either look for updates, or remove the driver and stop using Turbo Memory. Since most benchmarks say it doesn't provide an actual performance boost, and may provide only a mild battery boost, I concluded it's safer to just remove the driver and stop using Turbo Memory on this machine.
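
When several dumps pile up, it helps to tally which driver !analyze -v blames in each one. A minimal sketch, assuming you've saved the text output of each analysis; the regular expression is my own guess at the "Probably caused by" line format, based on the single line quoted above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "Probably caused by : iaNvStor.sys ( iaNvStor+3523 )"
CAUSE = re.compile(
    r"Probably caused by\s*:\s*(?P<image>\S+)\s*\(\s*(?P<where>[^)]*?)\s*\)"
)


def tally_suspects(analyze_outputs):
    """Count how often each driver image is blamed across analysis outputs."""
    tally = Counter()
    for text in analyze_outputs:
        match = CAUSE.search(text)
        if match:
            tally[match.group("image")] += 1
    return tally


# The one line quoted in this post; real use would pass one text per dump.
line = "Probably caused by : iaNvStor.sys ( iaNvStor+3523 )"
print(tally_suspects([line]).most_common())
```

A driver that tops the tally across unrelated crashes is a much stronger suspect than one named in a single dump.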

The Windows Experience Index on this machine is 4.7 with Turbo Memory enabled and 4.7 with Turbo Memory caching disabled. No change in performance (sure, the Experience Index may be no proper benchmark, but it's more than relevant in this case).

Seeing how disabling the caching didn't affect performance, I simply removed the software from my machine. Big mistake. Uninstalling the Turbo Memory software also gave this neat little error when booting Windows:

Please insert the Windows recovery CD.

Windows failed to load because a critical System driver is missing or corrupt.

Well, that was fun. At least restoring BOOTSECT managed to fix things.

Final Words:

The vast majority of Windows crashes are caused by either:

Unstable hardware:
  • Entry level memory with no ECC or Chipkill, sometimes running at very high speeds it wasn't designed for
  • Broken memory: Make sure you run memtest86+ or Vista's Memory Tester (boot the DVD and pick Memory Test).
  • Overclocked processors and high temperatures

Buggy 3rd party drivers:

  • Binary blob drivers that aren't properly tested, verified and updated for all versions of Windows
  • Drivers for small time hardware like keyboards with "special" keys or 8 button mice and so on.

My advice is simple: avoid such drivers at all costs. The functionality they add is minor, and the risks and stability problems are not worth it. If your keyboard's "Email" button requires a kernel module, forget about it; the idea is broken by design.

And remember, if you have problems, make sure you obtain a memory dump for analysis, and always perform a hardware test before blaming it on software: a full POST, a memtest86+ test, a CPU Prime95 test, monitor system temperatures and disk S.M.A.R.T. data with HD Tune, check your hardware cabling and such, then look at the software. Mainly, look at the drivers: remove unneeded third-party drivers, software (use Sysinternals Autoruns) and hardware, install the latest driver versions, and update to the latest Windows patch level.


Anonymous said...

Especially avoid Logitech's itouch product, it's broken by design.