Thursday, December 30, 2010

Windows 7 BSODs

OK, this one really got on my nerves. And it still does, to be honest.
At home my computer gives me strange, random BSODs. One time the stop error points at my disk, the next one is about power, and I even had several about memory. Since errors this random look like some kind of hardware conflict, I tried the following:

- Changed disks
- Changed disk settings in the BIOS
- Changed voltage settings in the BIOS
- Overclocked/underclocked several settings in the BIOS
- Changed the PSU (from 350 W to 550 W)
- Ran several disk checks
- Ran several memory checks
- Changed SATA cables
- Installed new/different drivers
- Reinstalled Windows 7 about 10 times
- Installed Windows Server 2008 R2

And nothing helped. The time between BSODs stayed random. Overclocking from the BIOS seemed to help the most, but it only pushed the next BSOD a little further out.

So, time to take it to the test bench... if I had one ;-)
This is the hardware I am using:
- ASRock 890GX Motherboard
- AMD Phenom x4 925
- Recom 550watt PSU
- ATI Radeon HD4850 pci-e video card
- 2x2gb Mushkin memory
- 3 SATA disks, 2x80gb and 1x640gb (in IDE mode in the BIOS, because in AHCI or RAID mode they are not recognized properly; maybe this is because I am using ports 4, 5 and 6 of my SATA controller instead of 1, 2 and 3, I still have to look into this)

That's it, that is the whole setup. It could be several things, of course. One thing it certainly is not is temperature. Why? I monitored my temperature sensors, because at first I thought this was the issue. I got the idea from a co-worker who put a vacuum cleaner on his laptop's fan after it had powered down several times. It worked for him, so maybe this was my problem too. I cleaned my PSU (which was DIRTY!), my CPU fan and my video card fan. It looked like it worked, but it didn't: temperatures went down immediately, yet the BSODs kept coming. So if the temperature on your system is high without any noticeable reason, maybe it is a dirty fan ;-)

So, I have some BSOD dumps, why not dig into those?

OK, so you have to do a few things before you can read a BSOD memory dump. I use some tooling, starting with the Windows SDK, so we have all the debugging components. You can find them here :
Once you have installed the debugging tools, you will have a "c:\Program Files\Debugging Tools for Windows (x64)" directory that contains the file "cdb.exe". That is the only thing we need.

I used it with the following options :

-logo c:\debuglog.txt
-c "!analyze -v;r;kv;lmtn;.logclose;q"
-y SRV*c:\symbols*
-i C:\Windows;C:\Windows\system32;C:\Windows\system32\drivers
-z "C:\Windows\Minidump\122910-17409-01.dmp"

When you run "cdb.exe" with these options, you'll get a "c:\debuglog.txt" (the -logo parameter). The -c parameter passes a command string to the debugger, so the whole session runs unattended. For a closer look at these commands, check the included help file, it will sort things out!
The symbols flag (-y) is pretty self-explanatory, and so is the -i flag, which points to the location(s) of the images (the binaries) the debugger has to use.
The last one, -z, is the minidump we got from the BSOD itself.

So the complete command will be :
"C:\Program Files\Debugging Tools for Windows (x64)\cdb.exe" -logo c:\debuglog.txt -c "!analyze -v;r;kv;lmtn;.logclose;q" -y SRV*c:\symbols* -i C:\Windows;C:\Windows\system32;C:\Windows\system32\drivers -z "C:\Windows\Minidump\122910-17409-01.dmp"
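
If you have a whole folder of minidumps, typing that command for each one gets old fast. Below is a minimal Python sketch that builds the same argument list per dump and writes one log per minidump. The function names, the log folder and the idea of naming each log after its dump are my own assumptions, not something the debugging tools prescribe. The symbol path is passed in as a parameter so you can use whatever value works on your machine.

```python
import subprocess
from pathlib import Path, PureWindowsPath

# Install location and debugger commands as used in this post.
CDB = r"C:\Program Files\Debugging Tools for Windows (x64)\cdb.exe"
DEBUGGER_COMMANDS = "!analyze -v;r;kv;lmtn;.logclose;q"
IMAGE_PATH = r"C:\Windows;C:\Windows\system32;C:\Windows\system32\drivers"


def build_cdb_args(dump_path, log_dir, symbol_path):
    """Return the cdb.exe argument list for one minidump.

    The log file is named after the dump (hypothetical convention),
    so every dump gets its own log instead of overwriting one file.
    """
    dump = PureWindowsPath(dump_path)
    log_file = PureWindowsPath(log_dir) / (dump.stem + ".log")
    return [
        CDB,
        "-logo", str(log_file),      # overwrite/create the log file
        "-c", DEBUGGER_COMMANDS,     # commands run automatically
        "-y", symbol_path,           # symbol path
        "-i", IMAGE_PATH,            # image (binary) search path
        "-z", str(dump),             # the minidump to open
    ]


def analyze_all(minidump_dir, log_dir, symbol_path):
    """Run cdb.exe once for every .dmp file in minidump_dir (Windows only)."""
    for dump in sorted(Path(minidump_dir).glob("*.dmp")):
        subprocess.run(build_cdb_args(str(dump), log_dir, symbol_path), check=False)
```

Calling analyze_all(r"C:\Windows\Minidump", r"C:\debuglogs", symbol_path) would then leave one .log file per dump in C:\debuglogs, assuming that folder already exists. Passing the arguments as a list also sidesteps the quoting problems you get when gluing everything into one string.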

Make a debug log for every minidump you have and start comparing them. When you open these logs you'll see a lot, and I mean a lot, of information. The first part is about your OS: build numbers and some symbol information. After that you'll get the in-depth analysis.
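
Comparing a pile of logs by eye is tedious, so here is a rough Python sketch that pulls a few summary fields out of each log and counts how often each value shows up. It assumes the logs contain lines such as "BUGCHECK_STR:", "IMAGE_NAME:", "MODULE_NAME:" and "Probably caused by :"; the exact field names can differ between debugger versions, so check one of your own logs first and adjust FIELDS. The function names are made up for this example.

```python
import re
from collections import Counter

# Field names assumed to appear in the "!analyze -v" output; adjust as needed.
FIELDS = ("BUGCHECK_STR", "IMAGE_NAME", "MODULE_NAME")
FIELD_RE = re.compile(
    r"^(%s):[ \t]*(.+?)[ \t]*$" % "|".join(FIELDS), re.MULTILINE
)
CAUSE_RE = re.compile(r"^Probably caused by\s*:\s*(\S+)", re.MULTILINE)


def summarize_log(text):
    """Extract the first occurrence of each field of interest from one log."""
    summary = {}
    for key, value in FIELD_RE.findall(text):
        summary.setdefault(key, value)  # keep only the first hit per field
    cause = CAUSE_RE.search(text)
    if cause:
        summary["PROBABLY_CAUSED_BY"] = cause.group(1)
    return summary


def tally(summaries, key):
    """Count how often each value of `key` occurs across all log summaries."""
    return Counter(s.get(key, "unknown") for s in summaries)
```

Read each log file into a string, feed it to summarize_log, and then call tally(summaries, "BUGCHECK_STR") or tally(summaries, "PROBABLY_CAUSED_BY"). If the counts are spread over many different values, as they were for me, that is a hint that no single driver is to blame.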

This analysis taught me really nothing. How sad is that :( Well, it taught me that anything could be the problem, from disk reads/writes to memory reads/writes (some buffer on the motherboard, maybe??). Heat, low wattage, corrupt disks, you name it: none of them was the issue. And then we had a real bright moment, a REAL bright moment: BIOS updates. And yeah, two were available. I installed them both, and WOW. It solved everything.

So next time you dive into Windows debugging to solve a really annoying problem, make sure you have all updates installed first, Windows AND BIOS, and only then look further. Thank god for release notes in BIOS updates (do they even exist??)

1 comment:

  1. And always Windows gets the blame, but mostly it's faulty hardware. I once had an experience like this where the power's earthing (is that even a word? I mean "aarde") was the problem. Glad you went through the logs and found out the BIOS was the problem....

    Gr., roland