New system I built is crashing [Archive] - Glock Talk


Drjones
08-29-2011, 19:34
Hey guys.

So the new PC I built myself is randomly crashing.

Here's the thread with my specs: http://glocktalk.com/forums/showthread.php?t=1353613

It has been randomly shutting down, then when it reboots, it says "Reboot and select proper device...or insert boot media in selected device."

Today when I came home, it had a BSOD, said Kernel_Data_Inpage_Error and a bunch of other numbers I'm too lazy to type. :supergrin:

I have had a mild overclock on it, perhaps I'm not doing that right? In any case, I just cleared my CMOS and am in the process of creating a system image so that I can just reinstall Windows from scratch and then recover from that image.

Any thoughts as to what could be going on?? Again, totally random shutdowns; it usually happens when the PC is unattended.

I've stressed it a bit (not too much because I need better cooling) with Prime95 & some other programs, it holds up fine. But then I leave it alone, and kaboom.

Any ideas are much appreciated.

Oh, and yes, all Windows updates have been installed, all my drivers are 100% up to date.

Detectorist
08-29-2011, 19:52
Sounds like a HW issue. Recheck cabling from controller to HD and re-seat memory or check memory one module at a time.

Run CHKDSK.

Check your error logs.
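
For anyone who wants to script the disk check, here is a minimal Python sketch of how the CHKDSK call could be wrapped. The function names are made up for illustration; the only real pieces are the `chkdsk` command and its `/f` (fix file system errors) and `/r` (locate bad sectors, which also covers `/f`) switches. With no switch, CHKDSK only reports and changes nothing.

```python
import subprocess
import sys


def build_chkdsk_command(drive="C:", fix=False, scan_bad_sectors=False):
    """Return the argument list for a CHKDSK run on a drive letter such as 'C:'."""
    if len(drive) != 2 or not drive[0].isalpha() or drive[1] != ":":
        raise ValueError("expected a drive letter such as 'C:'")
    cmd = ["chkdsk", drive.upper()]
    if scan_bad_sectors:
        cmd.append("/r")  # /r includes the work done by /f
    elif fix:
        cmd.append("/f")
    return cmd


def run_chkdsk(drive="C:", **options):
    """Run CHKDSK and return the completed process (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("CHKDSK is only available on Windows")
    return subprocess.run(
        build_chkdsk_command(drive, **options),
        capture_output=True,
        text=True,
    )
```

On Windows, `run_chkdsk("C:")` from an elevated prompt would do a read-only pass. A repair of the system drive generally has to be scheduled for the next boot.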

srhoades
08-29-2011, 20:37
Is it shutting down, or rebooting? That makes a difference. If it is just shutting down, then you either have a heat problem or a power problem. If it is just rebooting, then I would burn the latest memtest ISO and let it run for 20 minutes or so. Also, go into System Properties > Advanced > Startup and Recovery and uncheck "Automatically restart" under System failure, so that if it is rebooting we can see the BSOD code in its entirety.
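
The shutdown-versus-reboot split above can be written down as a tiny lookup. This is only a sketch of the reasoning in this post: the dictionary, the function name and the wording of each step are illustrative, not a complete diagnostic procedure.

```python
# First things to check, keyed by what the machine does when it dies.
NEXT_STEPS = {
    "shutdown": [
        "check temperatures under load (heat problem)",
        "check the power supply and its connections (power problem)",
    ],
    "reboot": [
        "turn off automatic restart so the BSOD code stays on screen",
        "boot the latest memtest ISO and let it run",
    ],
}


def next_steps(symptom):
    """Return the suggested checks for 'shutdown' or 'reboot'."""
    key = symptom.strip().lower()
    if key not in NEXT_STEPS:
        raise ValueError("symptom must be 'shutdown' or 'reboot'")
    return list(NEXT_STEPS[key])
```

For example, `next_steps("reboot")` returns the two reboot-related checks in the order they are listed.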

Pierre!
08-29-2011, 22:36
I dunno, SRHoades - with 12GB of RAM I would definitely run that one overnight...

+1 on the check error logs before you reformat and reinstall.

Then, run at stock speed and see what you get.

CPUID (http://www.cpuid.com/) has HWMonitor, which will tell you what your temps are. Get a baseline on temps with no overclock and a stable system.

You may have been a little light on RAM or CPU voltage with your overclock. Sometimes the RAM will write bad data and the system will BSOD on you.

You can always go to EventID.net and see if you can decode any messages in your error log.

This is the part where overclocking can be a real P.I.T.A. I used to burn-in test with Prime95 and SuperPI and mix that with MemTest86 for a couple of weeks. All this was on AMD systems, and when AMD systems fail an overclock they will often scramble the MBR, forcing a reinstall... :steamed: Nothing like being an expert installer! :rofl:

So - try it stock for a while and see what happens.

Good Luck!

Patrick
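
On decoding the blue screen itself: the "bunch of other numbers" on a BSOD start with a bug check code, and KERNEL_DATA_INPAGE_ERROR is code 0x0000007A. Microsoft documents it as a page of kernel data that could not be read in from the paging file, which tends to point at the disk, its cabling or controller, or RAM. Below is a minimal Python sketch of a lookup for a handful of codes. The table is deliberately tiny and the function name is made up for illustration.

```python
# A few Windows bug check (stop) codes and their symbolic names.
STOP_CODES = {
    0x0000001A: "MEMORY_MANAGEMENT",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x00000077: "KERNEL_STACK_INPAGE_ERROR",
    0x0000007A: "KERNEL_DATA_INPAGE_ERROR",
    0x000000F4: "CRITICAL_OBJECT_TERMINATION",
    0x00000124: "WHEA_UNCORRECTABLE_ERROR",
}


def decode_stop_code(text):
    """Accept '0x7A', '0x0000007A' or '7a' and return the bug check name."""
    value = int(text.strip(), 16)
    return STOP_CODES.get(value, "unknown (0x%08X)" % value)
```

Calling `decode_stop_code("0x0000007A")` gives back the name seen on the blue screen in the first post. Codes missing from the table come back as "unknown" with the padded hex value.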

Drjones
08-29-2011, 23:12
Well, too late.
I created a system image using Win7's built-in utility, formatted and reinstalled. I was in the process of doing that while I posted this thread from the PC that my new rig replaced.

And I absolutely cannot restore from the image I created. :steamed:

Not a HUGE deal, as I double-backed up all my data before doing all of this (and I can mount the Image as a VHD) but it's a PITA nonetheless.

On the other hand, I had a problem where my system was taking FOREVER to connect to my NAS; that problem seems to be gone. :)

At least now I know I have a fresh, clean system. I'll keep it at stock speeds and also run the new version of Memtest....

CitizenOfDreams
08-29-2011, 23:31
When you build a new machine - especially if you "mildly overclock" it - you should thoroughly (24 hours or more) test your RAM. Memtest86+ is a good tool for that.

Also, it's a good idea to check the cooling system. Heat the CPU with 100% load and see how high the temperature will rise.

srhoades
08-30-2011, 11:01
I dunno, SRHoades - with 12GB of RAM I would definitely run that one overnight...


Patrick

For the complete test, yes, but faulty RAM shows up rather quickly.

McJohnny
08-30-2011, 11:06
1) Back off the overclocking, run memtest86 overnight. If that passes, then
2) find yourself a CPU torture test and run that overnight.

Have you ensured that all components are properly seated? When you installed the processor, did you use the manufacturer-supplied thermal paste, or clean it off and apply a good coating? Is the heat sink on the processor 100% functional?

Have you done a complete & thorough scan & repair of your boot drive?

This is what I'd do to begin debugging.

McJohnny
08-30-2011, 11:22
I guess I was too late to this party. :)

CitizenOfDreams
08-30-2011, 12:00
I guess I was too late to this party. :)

Your computer must be slow, try overclocking it. :rofl:

Drjones
08-30-2011, 21:36
Your computer must be slow, try overclocking it. :rofl:


:rofl:

Drjones
08-30-2011, 21:40
Ok, computer has been running perfectly since last night, no overclock. It has been on all day, no issues.

My mobo came with two 6Gbps SATA cables & two standard ones - I switched to the other cable for my HD. Also, the cables seem a bit loose where they plug into the mobo - usually SATA cables fit fairly snugly, but not these....they do have the little clip on them that holds them in place though.

I also haven't installed ANY windows updates, nor updated any of the drivers.

Drjones
08-30-2011, 21:54
When you build a new machine - especially if you "mildly overclock" it - you should thoroughly (24 hours or more) test your RAM. Memtest86+ is a good tool for that.

Also, it's a good idea to check the cooling system. Heat the CPU with 100% load and see how high the temperature will rise.


I will download the latest version of MemTest & get crackin'....

With Prime95 and the OC I had it at, temp rises up to about 90C, then I freak out & stop the program.

I gotta get an aftermarket cooler....or should this thing even be getting that hot with the factory cooler??

Now with no OC, it's only getting up to 80C.

I really need a better cooler, I know...and another case fan....

CitizenOfDreams
08-30-2011, 23:17
Here are my random thoughts on the subject...

Maximum core temperature of 90C is OK. Above 90 would get me worried.

Inexpensive aftermarket coolers are no better than the original OEM cooler.

Modern CPUs are failsafe and will not (should not) burn up if the cooling system fails.

Pierre!
08-31-2011, 01:02
WoW - those i7 processors cook! 100C max!

That's crazy! Yer lookin at a Corsair closed loop system for sure!

Daaaannng...

And I thought my 125W Quad Core AMD was *HOT* :rofl:

When Dr Jones said 80C and Citizen said 90C, I thought y'all were drinking too much tonight!!! But nope, 100C is the max thermal load...

Just Crazy. Gonna be quite handy in the Winter Time! :cool:

Patrick

Drjones
08-31-2011, 09:36
Here are my random thoughts on the subject...

Maximum core temperature of 90C is OK. Above 90 would get me worried.

Inexpensive aftermarket coolers are no better than the original OEM cooler.

Modern CPUs are failsafe and will not (should not) burn up if the cooling system fails.


Yeah, but you shouldn't be running it that hot though, right? Everything I've read says you have to keep your parts as cool as possible for longevity.

And this is the cooler I was looking at, has received pretty rave reviews on every website I've seen: http://www.amazon.com/Cooler-Master-Hyper-Sleeve-RR-B10-212P-G1/dp/B002G1YPH0/ref=sr_1_1?ie=UTF8&qid=1314804861&sr=8-1

Drjones
08-31-2011, 09:41
1) Back off the overclocking, run memtest86 overnight. If that passes, then
2) find yourself a CPU torture test and run that overnight.

Have you ensured that all components are properly seated? When you installed the processor, did you use the manufacturer-supplied thermal paste, or clean it off and apply a good coating? Is the heat sink on the processor 100% functional?

Have you done a complete & thorough scan & repair of your boot drive?

This is what I'd do to begin debugging.


*SEEMS* perfectly stable without the OC, and not sure if switching SATA cables helped or not....sure hasn't seemed to hurt anything.

I was pretty darn careful when I seated the CPU, and have checked the RAM several times.

I *think* I might have used a little TOO MUCH thermal paste; the stock CPU fan came with a tiny bit, and I applied a little more....may have put too much, based on what I'm reading.

As for the faulty RAM, in my experience, when RAM is bad, it's usually pretty clear; pretty frequent system freezing, etc. I ran a diagnostic tool that I downloaded from MS direct - not sure how accurate it will be since it gives some message about not being fully compatible with over 4GB RAM & I have 12.

As for scan & repair the HD, what tools do you recommend? Windows' built-in utility?

Thank you!

CitizenOfDreams
08-31-2011, 12:29
Yeah, but you shouldn't be running it that hot though, right? Everything I've read says you have to keep your parts as cool as possible for longevity.

If your computer runs near idle most of the time (like mine), I would be perfectly happy with 90C under maximum load.

If you run a busy server or donate your spare CPU time to distributed computing, I would improve the cooling system to knock off 10-15 degrees.
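
That rule of thumb is easy to apply to logged temperatures. Here is a minimal Python sketch, assuming you have a list of core temperature readings in Celsius from a full-load run. The 90C limit comes from the posts above. The 10-degree margin for always-busy machines is an assumed reading of "knock off 10-15 degrees", and the function name is made up.

```python
def assess_load_temp(samples_c, heavy_duty=False, limit_c=90.0, heavy_margin_c=10.0):
    """Judge the hottest reading from a full-load run against a comfort limit.

    limit_c=90 follows the posts above; heavy_margin_c=10 is an assumption
    for machines that sit at full load most of the day.
    """
    if not samples_c:
        raise ValueError("need at least one temperature sample")
    peak = max(samples_c)
    target = limit_c - heavy_margin_c if heavy_duty else limit_c
    verdict = "ok" if peak <= target else "improve cooling"
    return peak, target, verdict
```

With the numbers from this thread, a stock-speed peak of 80 passes for a mostly idle desktop, while a 90 peak on a machine that is busy all day would be flagged.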

CitizenOfDreams
08-31-2011, 12:59
As for the faulty RAM, in my experience, when RAM is bad, it's usually pretty clear; pretty frequent system freezing, etc.


Not always. I have seen faulty RAM that ran stable most of the time. Then it would glitch in the worst possible moment... and the user would blame Bill Gates.


I ran a diagnostic tool that I downloaded from MS direct - not sure how accurate it will be since it gives some message about not being fully compatible with over 4GB RAM & I have 12.


Memtest86+ does not have any problems with my 24 gigs.


As for scan & repair the HD, what tools do you recommend? Windows' built-in utility?


If you need to check the file system, use the Windows utility. If you suspect your disk has physical problems, check the SMART status (a good program for that is Crystal Disk Info). Then try diagnostic utilities from the drive manufacturer.
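
Crystal Disk Info is a GUI tool, so for a scripted check this sketch swaps in `smartctl` from smartmontools, whose `smartctl -A <device>` command prints the SMART attribute table as text. The Python below only parses that table. The sample text and its values are invented for illustration, the function names are made up, and attribute names vary by drive (SSDs in particular often report a different set).

```python
# Invented sample in the layout that "smartctl -A" prints for ATA drives.
SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       412
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""

# Attributes whose raw value should normally stay at zero.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def parse_smart_attributes(text):
    """Map attribute name to raw value for each row of the attribute table."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(parts) >= 10 and parts[0].isdigit():
            raw = parts[9]
            attrs[parts[1]] = int(raw) if raw.isdigit() else raw
    return attrs


def worrying_attributes(attrs):
    """Return the watched attributes that have a non-zero raw value."""
    return [
        name for name in WATCHED
        if isinstance(attrs.get(name), int) and attrs[name] > 0
    ]
```

A rising UDMA_CRC_Error_Count is commonly associated with a bad or loose SATA cable rather than a failing drive, so it is worth a look alongside the cable swap discussed in this thread.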

Drjones
08-31-2011, 23:10
Well, I was delighted to see that it crashed again today...:steamed::steamed:

Running MemTest for a little over 2 hours now, no errors yet.

Anything special I need to do, or just boot & let it go?

Any other ideas on what could be causing this? I already pulled the HD the other day & tested it a bit with a few tools, didn't see any errors....will test it again for sure.

WTF else could be causing this? Should I update the BIOS?

Drjones
08-31-2011, 23:11
Is it shutting down, or rebooting? That makes a difference. If it is just shutting down, then you either have a heat problem or a power problem. If it is just rebooting, then I would burn the latest memtest ISO and let it run for 20 minutes or so. Also, go into System Properties > Advanced > Startup and Recovery and uncheck "Automatically restart" under System failure, so that if it is rebooting we can see the BSOD code in its entirety.


Yes, it's rebooting, I did uncheck the reboot automatically box.

Drjones
08-31-2011, 23:16
So, I was pulling the HD out of an old HP tower tonight, and man, the SATA cables are plugged TIGHTLY into the ports - remember that I said the SATA cables that came with my MOBO seem rather loose in the ports....could it be something that simple??

Any recommendations for higher-quality cables?

Detectorist
08-31-2011, 23:56
So, I was pulling the HD out of an old HP tower tonight, and man, the SATA cables are plugged TIGHTLY into the ports - remember that I said the SATA cables that came with my MOBO seem rather loose in the ports....could it be something that simple??

Any recommendations for higher-quality cables?

I have seen many issues with HDD cables. When I sent out a new HDD to a customer for a supposedly bad drive, I also included a new cable. I'd say about 15%-20% of the time the Sata cable or connection was the culprit.

Drjones
09-01-2011, 01:19
I have seen many issues with HDD cables. When I sent out a new HDD to a customer for a supposedly bad drive, I also included a new cable. I'd say about 15%-20% of the time the Sata cable or connection was the culprit.


I just upgraded the Firmware on my SSD. Was 2.06, upgraded to 2.11. Supposed to be much better from a performance and stability standpoint.

We'll see....I have a feeling it's gotta be either the RAM or more likely the SSD.

This might well be the last OCZ drive I buy, especially in light of what an enormous PITA it was to do the upgrade.....holy cow....

CitizenOfDreams
09-01-2011, 04:37
Here is what I would do to troubleshoot the instability... Slightly increase the RAM clock frequency and see if that makes it fail. In my last computer I actually had to downclock the memory to make it stable.

As for SATA cables, I've seen them as loose as [insert your favorite reference here]. Never caused a problem. But it won't hurt to try a different cable anyway.

StarvinMarvin
09-01-2011, 23:43
This might well be the last OCZ drive I buy, especially in light of what an enormous PITA it was to do the upgrade.....holy cow....

If memtest passes then imo, this is your issue. I dislike ocz very much. I hope the best for you.

Before you OC again get some better cooling. This can wait b/c your system has to be super fast at stock.

Drjones
09-01-2011, 23:58
If memtest passes then imo, this is your issue. I dislike ocz very much. I hope the best for you.

Before you OC again get some better cooling. This can wait b/c your system has to be super fast at stock.


Well, I ran it for about 2hrs. 15 min....does it really need to go longer than that? I do have 12GB RAM, but have no clue how those tests work, other than they tell me whether RAM is bad or not.:supergrin:

My mobo manufacturer (asus) has some utility that was on the driver disc that is supposed to stress my RAM; should I give that a shot too?

I upgraded my OCZ Drive firmware, and am not sure if I'm imagining things or not, but it does seem to perform a little better now; programs seem to snap open even faster than before. I did not do any before/after benchmarks; I only upgraded because the updated firmware is supposed to help with stability.

The system has been up and running for I think over 24 hours now, but I still need to leave it a few days & see....

This is probably going to be my last OCZ drive; the first one I got for this new system was DOA, and upgrading the firmware on my current drive was stupidly difficult and unintuitive. And I'm a tech.

Anyway, I'm keeping my fingers crossed...and yes, gonna get a better CPU fan and an additional case fan or two soon.

StarvinMarvin
09-02-2011, 10:47
I would run memtest overnight, but after looking through the ocz support forums your bsod issue has to be from the ssd.

OCZ is aware of firmware issues that have been reported in the field that are potentially causing bluescreens on all SF2000 based drives, this issue affects a very small percentage of Vertex 3 and Agility 3 SSDs, and currently less than 1% of all our customers are affected. This hard to replicate issue is a completely different issue than what some other drive manufacturers are experiencing, which may have similar symptoms but is caused by a hardware issue. Unlike other brands OCZ does not use a reference design, and we design and manufacture our SSDs in-house, and are NOT affected by the hardware problems which are unique to other drive vendors.

OCZ is working diligently with our customers and SandForce to quickly resolve the outstanding firmware issues and we will be releasing a firmware update that addresses the bluescreen issue as soon as it becomes available. In the meantime we encourage any customers that are experiencing any bluescreen issues to contact our customer service team for immediate support.

And they offer some recommendations

We recommend any customers that have experienced the BSOD issue do the following:



1. Please ensure that the SSD is connected to your system with a high quality SATA cable that is rated for SATA 6Gbps; these cables are often supplied with motherboards. Do not use SATA cables that are intended for use with previous generation drives.

2. For Intel platform users we recommend using the latest release of the Intel RST driver (10.1.0.1008)

3. If after both of these implementations you are still experiencing any BSODs, we have a firmware patch with SATA timing optimizations that we believe will address the issue. Currently, this new firmware is only recommended for customers that have observed the BSOD issue. This firmware version slightly affects sequential write performance as adjustments were made in the timings. OCZ anticipates future optimizations to the base code to minimize any performance delta associated with this temporary workaround.

Once you find a firmware that makes the bsod go away, keep it. If you still get them with 2.11 then maybe try out 2.09 next or wait till they release a new one. Good luck

Drjones
09-02-2011, 19:13
Yep, I'm gonna leave my system on as long as I can and see how it goes.

So far, it's been on a couple days and no crash....we'll see.....I agree it has to be the hard drive, because I frequently get the "insert boot media" or whatever message, which would indicate to me that the computer isn't seeing the HD.

It's really funny that they should mention that you should have the latest Intel RST drivers, because when you install those, it guarantees that you will NOT be able to update the drive's firmware using the factory OCZ Toolbox utility. :upeyes::steamed:

This fact is also all over their support forums.

In any case, my SSD was on 2.06 and I upgraded to the latest, which seems to be 2.11, and so far so good, but I'm not throwing a party yet....

Drjones
09-02-2011, 19:15
Yeah, but you shouldn't be running it that hot though, right? Everything I've read says you have to keep your parts as cool as possible for longevity.

And this is the cooler I was looking at, has received pretty rave reviews on every website I've seen: http://www.amazon.com/Cooler-Master-Hyper-Sleeve-RR-B10-212P-G1/dp/B002G1YPH0/ref=sr_1_1?ie=UTF8&qid=1314804861&sr=8-1


Any comments on this CPU cooler I'm considering?

CitizenOfDreams
09-02-2011, 19:33
Any comments on this CPU cooler I'm considering?

I tried one very similar to it (Cooler Master Hyper N520, see picture below), and it performed a few degrees worse than the OEM Intel cooler. IMO, you would have to step up to water cooling to get a noticeable improvement over the regular heat pipe cooler. Heat pipe is an amazing technology, but it can never outperform forced coolant circulation.

http://images10.newegg.com/NeweggImage/productimage/35-103-057-03.jpg

StarvinMarvin
09-02-2011, 21:07
Any comments on this CPU cooler I'm considering?

The 212+ is very nice because it offers great value. It's a solid performer and priced well.

I'd get that or this (http://www.newegg.com/Product/Product.aspx?Item=N82E16835181011) right now.

If you want to go max performance, the Noctua NH-D14 or Thermalright Silver Arrow are great, but very $$ and huge in size. Those turnkey watercooling units from corsair seem gimmicky to me because they cost more and perform worse than those two air coolers i mentioned above.

I tried one very similar to it (Cooler Master Hyper N520, see picture below), and it performed a few degrees worse than the OEM Intel cooler.

If his cooler is this one
http://www.cooltechpc.com/ctpc/images/intel_775_fan.jpg

Anything will perform better, that cooler above is terrible.

There is an Intel cooler that is decent but if you have that one above and your performance decreased with aftermarket, I'd suggest you re-install it.

CitizenOfDreams
09-02-2011, 22:13
If his cooler is this one
http://www.cooltechpc.com/ctpc/images/intel_775_fan.jpg

Anything will perform better, that cooler above is terrible.

Well, on my machine the Cooler Master did not perform better than the OEM cooler pictured above. :dunno:

I used the same exact thermal paste on both coolers. And I don't think I did anything wrong - because I re-installed both coolers several times and got the same temperature readings each time.

StarvinMarvin
09-03-2011, 01:08
Well, on my machine the Cooler Master did not perform better than the OEM cooler pictured above. :dunno:

I used the same exact thermal paste on both coolers. And I don't think I did anything wrong - because I re-installed both coolers several times and got the same temperature readings each time.

Very strange

What program do you use to monitor temps? Are you talking about load temps or idle temps? The sensors in these chips aren't temp sensors exactly and aren't very accurate at idle.

CitizenOfDreams
09-03-2011, 07:11
Very strange

What program do you use to monitor temps? Are you talking about load temps or idle temps? The sensors in these chips aren't temp sensors exactly and aren't very accurate at idle.

I was using SpeedFan to monitor the temperature of the cores. Both idle temperature and 100% load temperature were a couple of degrees lower with the OEM cooler. :dunno:

Drjones
09-05-2011, 14:28
Hey guys.....so I posted this thread a week ago, which means that it's been a week and no crashes!

So, it had to be the SSD firmware causing my problem.

This is probably going to be the last OCZ drive I buy. Between the first one being DOA and OCZ making it such a royal PITA to upgrade the firmware, I'm done.

It's either not possible, or OCZ purposely engineers it so that you can't flash the firmware on your drive while it's running windows, and you also can't do it with any bootable tools OCZ could provide.

I attached the drive to my old system using a USB/SATA connector, and the OCZ Toolbox that you use to flash the firmware wouldn't recognize the drive.

I ended up flashing the firmware successfully using some sort of bootable linux ISO that I burned to a flash drive. Not sure where it came from or who made it (don't think it was OCZ) but it WORKED.

Anyway, waaaaaay too much effort. OCZ seems like they're trying to make this difficult for me.

Anyway, thanks again for your help. Hope I haven't declared victory too soon.....we'll see! But this thing has been running constantly (even overnight) for almost a week.....
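
One way to judge whether victory has been declared too soon is to compare the current crash-free streak with the gaps between earlier crashes. Here is a minimal Python sketch. The timestamps in the usage note are illustrative, loosely following the dates in this thread; in practice they could be read from the Windows System log, where unexpected shutdowns show up as Event ID 41 (Kernel-Power) or 6008 (EventLog). The function names and the "twice the longest gap" rule are arbitrary assumptions.

```python
from datetime import datetime, timedelta


def crash_free_streak(crash_times, now):
    """Return (time since the last crash, longest gap between earlier crashes)."""
    ordered = sorted(crash_times)
    if not ordered:
        return None, None
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    longest_gap = max(gaps) if gaps else None
    return now - ordered[-1], longest_gap


def looks_fixed(crash_times, now, factor=2):
    """Heuristic: the current streak is at least `factor` times the longest earlier gap."""
    since_last, longest_gap = crash_free_streak(crash_times, now)
    if since_last is None or longest_gap is None:
        return False  # not enough history to say anything
    return since_last >= factor * longest_gap
```

With crashes at `datetime(2011, 8, 29, 19, 0)` and `datetime(2011, 8, 31, 18, 0)` and `now` set to `datetime(2011, 9, 5, 14, 28)`, the streak is a bit under five days against a longest earlier gap of just under two, so `looks_fixed` returns True with the default factor. Another week of uptime would be more convincing.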

Pierre!
09-05-2011, 17:11
Hmmm... I would not be so hasty... :cool: You may just be the resident OCZ Firmware Flash Guru! :supergrin:

Always take notes on what works... it's a new tool in your 'Computer Services' Tool Kit... and that stuff is worth $$$ at times.

Glad you are up and running again! (http://SeeberComputerServices.com)

Patrick

Drjones
09-05-2011, 23:01
Good points....thanks, Patrick!