I have a client having trouble with her RAID controller. She reports that she keeps getting an error on boot that her raid set is out of synchronization. It takes close to an hour just to boot when this happens. When I've looked at it the problem did not occur, but this can happen with intermittent problems. I ran hard drive diagnostics, CPU, motherboard, RAM, and everything works fine. I have a hard drive controller diagnostic from Seagate, but it isn't designed for RAID. Her motherboard is an Asus K8V, I think it's K8V-MX, she has 2 SATA hard drives mirrored, and she runs Windows XP. I have mentioned that she may have to replace her motherboard, but I don't want to tell her she has to do this unless I can confirm it is necessary. If she replaced the motherboard and the problem continues, I would hear about it!
Thanks, Rob Dyck
On 2010-03-14 Robert Dyck wrote:
I have a client having trouble with her RAID controller. She reports
You didn't mention what RAID controller. I assume some sort of onboard, like Intel MATRIX?
that she keeps getting an error on boot that her raid set is out of synchronization.
This can happen if her comp resets/crashes or shuts off abnormally. The real issue may be diagnosing that, not the after-effect of the raid rebuild.
It takes close to an hour just to boot when this happens.
Hmm, in XP with Matrix RAID (I don't use any other ob ones) I see reasonable boot times (maybe 40% slowdown max) even when in rebuild mode. Only in Vista do I see horrendous boot times during rebuild.
she may have to replace her motherboard, but I don't want to tell her she has to do this unless I can confirm it is necessary. If she replaced the motherboard and the problem continues, I would hear
First thing to do when having weird mobo issues is look for bad caps (even slightly puffy with no goo == bad caps). See my site: http://www.tecnopolis.ca/tecnopolis/leakycaps.html I fix bad cap mobos.
According to the ASUS website that motherboard has VIA VT8237R southbridge.
I did visually examine the caps the last two times she called me about this. I didn't see any bulging caps. I worked as a PC technician at a retail store from October 2008 through November 2009. We all have to do grunt jobs sometimes. I was a computer programmer or systems admin since February 1981, but when you're unemployed you take what you can get. I saw more damaged computers each week at that job than I had previously seen in my life. The manager didn't like us to replace bad caps, because that might not fix it and customers at a retail store got pissed off at even slightly high bills. The point is I have seen enough bulging capacitors now to recognise them.
I have so many diagnostic programs now that I can pin-point just about anything. But nothing for RAID. Is there any RAID diagnostic out there?
Thanks for the tip about RAID rebuild times. I haven't used RAID myself on a PC. Slowly puttering away with a server at home, but that's a new install and I'm trying to make it minimal. I have no way of knowing how long a RAID resync is supposed to take. She has 2 drives shadowed.
I did ask her about shutdown procedures, and she did claim she always shuts down properly. She's been using computers long enough that she should know that. But then again I'm reminded of a customer I wrote a system for in the mid 1980s. She complained about database damage. She also claimed she always shut down the application properly. Back then Windows didn't buffer writes, so all she had to do was exit the application then shut off the power. After repairing the database by hand 3 times, I "updated" her application to flush all write buffers after each transaction. I did warn both her and the manager who hired me that this would slow the application somewhat, but after having to repair the database 3 times it was necessary. The database damage mysteriously went away after that change.
Before I start making nasty accusations, is it possible her mobo RAID controller isn't flushing during the Windows shutdown procedure?
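For anyone curious what "flush all write buffers after each transaction" looks like in practice, here is a minimal Python sketch. It is not the 1980s application, just an illustration of the same idea; the function names and the record strings are made up for the example. Note that os.fsync only asks the operating system to push its cache to the device; whether a drive or an onboard RAID layer honours that promptly is outside the program's control.

```python
import os
import tempfile


def append_transaction(path, record):
    """Append one record and force it toward stable storage before returning."""
    with open(path, "ab") as f:
        f.write(record.encode("utf-8") + b"\n")
        f.flush()             # push Python's own buffer to the operating system
        os.fsync(f.fileno())  # ask the OS to write its cached data to the device


def read_transactions(path):
    """Return all records currently in the file, one string per line."""
    with open(path, "rb") as f:
        return [line.decode("utf-8").rstrip("\n") for line in f]


def demo():
    """Write two hypothetical records to a scratch file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ledger.log")
        append_transaction(path, "debit 10")
        append_transaction(path, "credit 10")
        return read_transactions(path)
```

The trade-off is the one I warned that customer about: every transaction now waits for the disk, so the application is slower, but a power-off right after a transaction no longer loses it.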
Rob Dyck
On 2010-03-14 Robert Dyck wrote:
According to the ASUS website that motherboard has VIA VT8237R southbridge.
(My personal opinions about VIA southbridges omitted...) :-)
take what you can get. I saw more damaged computers each week at that job than I had previously seen in my life. The manager didn't like us to replace bad caps, because that might not fix it and customers at a retail store got pissed off at even slightly high bills. The point is I have seen enough bulging capacitors now to recognise them.
I'm curious, was the bad caps issue widespread when you were in that job? Any trends you noticed in brand, year, etc? I certainly am noticing trends (like the badcaps issue did not stop in 2003) and would love to hear someone else's experiences in the field.
I have so many diagnostic programs now that I can pin-point just about anything. But nothing for RAID. Is there any RAID diagnostic out there?
If you're using the ob VIA raid then there really is no "raid controller" as we know it. As with (mostly) all ob raid, it's just some firmware bits and bios tweaks to make the system raid capable. There really is no "raid controller" to test. The "controller" is really the main CPU. So, no, you won't find any diag tools.
Now, there will be HD diag tools you can use to test just the HD's, but generally you have to somehow disable the RAID first (risky as when you re-enable, there's a question of it knowing the array is still there!), or take the drives out onto another system (1 at a time) to test.
If she isn't reporting any crashing/rebooting, and the RAID is always degrading, then the problem is almost invariably hard drives dying. Run full scan diags on them. Even with zero errors, drives can kick from arrays. I see a lot of this lately, esp with Seagate drives (what I sell most of). Perhaps the HD vendors are making their non-raid drives more raid-unfriendly to push the raid-version drives, which are now relatively exorbitantly priced? :-)
that's a new install and I'm trying to make it minimal. I have no way of knowing how long a RAID resync is supposed to take. She has 2 drives shadowed.
Good raid systems (like linux md raid) will take a LONG time to rebuild when the system is busy (so it won't bog down your system). Windows fake raid drivers sometimes make the mistake of rebuilding too fast and result in horrible interactive performance for the duration.
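As a rough yardstick for how long a resync "should" take: a full mirror resync generally has to copy the whole drive, so the time is about capacity divided by sustained throughput, stretched further if the driver only uses a share of the bandwidth. A back-of-envelope sketch follows; the 250 GB size, the 50 MB/s rate, and the 25% share are assumed example figures, not measurements from the client's machine.

```python
def estimate_resync_minutes(capacity_gb, rate_mb_s, share=1.0):
    """Estimate minutes for a full mirror resync.

    capacity_gb: drive size in decimal gigabytes (1 GB = 1000 MB)
    rate_mb_s:   sustained sequential throughput in MB/s
    share:       fraction of that throughput the rebuild is allowed to use
    """
    if capacity_gb <= 0 or rate_mb_s <= 0 or not 0 < share <= 1:
        raise ValueError("capacity and rate must be positive, share in (0, 1]")
    seconds = (capacity_gb * 1000) / (rate_mb_s * share)
    return seconds / 60


# Assumed example figures only:
full_speed = estimate_resync_minutes(250, 50)            # rebuild gets all bandwidth
throttled = estimate_resync_minutes(250, 50, share=0.25)  # rebuild gets a quarter
```

With those assumed numbers the full-speed case comes to roughly 83 minutes and the throttled case to four times that, so a rebuild on the order of an hour is not by itself a sign of a fault.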
Before I start making nasty accusations, is it possible her mobo RAID controller isn't flushing during the Windows shutdown procedure?
Nearly impossible, though in the past I have seen some incredibly braindead VIA southbridge behavior.
Trevor Cordes wrote:
I'm curious, was the bad caps issue widespread when you were in that job?
Any trends you noticed in brand, year, etc? I certainly am
noticing trends (like the badcaps issue did not stop in 2003) and would
love to hear someone else's experiences in the field.
I didn't notice a trend in brand, they all appeared to have issues. I did notice that leaking caps caused the motherboard to completely fail; bulging caps caused strange behaviour rather than complete failure, but it was in the process of dying quickly. Motherboards older than 2003 did have more problems than newer ones, but the new ones also had bad caps.
There was a strong trend with LCD monitors. We got a number of monitors in for repair, but were not able to repair most of them. The usual problem was blown caps on the power supply. This occurred with Dell, Gateway, and NEC monitors. Most of the monitors brought in were Dell. An occasional monitor had a cracked display; you can't fault the manufacturer if someone sits on a monitor. We couldn't swap parts because they all had the same problem: blown caps in the power supply. I noticed all monitors with this problem had an LCD display manufactured by HannStar. The monitors have 3 parts: LCD display, power supply, and signal processing board. The two boards are not made by HannStar, so you can't fault HannStar, but the power supplies are crap.
I had a bit of a problem there. The owner wanted a technician who was able to repair LCD monitors, but the service department foreman did not want anyone to do so. So I was stuck between the owner and department foreman. I did replace power supply capacitors for one monitor, but as soon as power was applied the replacement caps blew as well. I had checked voltages, and they were correct briefly but kept dropping out. Voltage would be Ok for a fraction of a second, then drop, then come back. I thought replacing the caps would fix it, but whatever blew the caps in the first place burnt out the used replacements as well. Possibly a bad voltage regulator.
One customer had blown capacitors for his onboard motherboard audio. He wanted to connect sound from his computer to his stereo. Unfortunately he connected line out from his computer to line out for his stereo. He said as soon as he connected it, he heard a pop. Sound on his computer hadn't worked since. I opened the case while he was still at the counter; yup, 2 blown caps. I told him we could replace the caps but the surge may have damaged circuitry on his motherboard, and the service department foreman doesn't like us to replace caps, so I recommended a new motherboard. Besides, that store charges $30/hour for normal desktop service work, but $60/hour for soldering work. It would cost him less to replace the motherboard. Unfortunately the foreman convinced him to buy a used computer. If he had just replaced his motherboard he would have had a brand new motherboard and all the stuff he had before. The used computer cost more than a new motherboard, and was older than his computer. The customer's father said he knows how to solder so may replace the caps himself. I cautioned him that he has to replace the caps with an exact match for capacitance and voltage, as well as electrolytic vs. ceramic, and has to be careful when soldering a multi-layer board. Don't overheat the board, you can burn it easily.
The guys talked about bad power supplies in Antec cases, but I thought Antec were premium cases. The computer I have at home has an Antec case, and I've never had problems with it. The Antec cases we got through recycling while I was there worked fine, their power supplies could be used as replacements for customer power supplies. Some technicians also talked about AMD processors not lasting, but again I didn't see that. Again, my computer at home uses an AMD K7 Athlon Thunderbird processor. I've never had problems; but then I was cautioned when I got it that AMD processors require good cooling. I got a very good CPU heat sink. The only problem I've had with my home computer was the video card. I got a GeForce3 v8200 when it was brand new, top-of-the-line. The board died, but the retail store I worked for had an exact match as a used card. I bought the used one at employee discount. When I got home I found the used card had a seized GPU fan, but my old card had a working fan. So I "Frankensteined" it together: took the working fan from the dead card. I now have a program to monitor sensors, and the GPU is running hottest of all. I guess that's why that's the component that failed. As for computers through the store: most computers had Intel CPU chips, but that was due to volume of sales. I didn't see any trend with repairs: Intel vs AMD.
That store sells 2 brands of hard drives: Western Digital and Seagate. The Western Digital drives last perfectly for 3 years, but can fail after that. Seagate drives are 50% more expensive, but more reliable.
Maxtor had manufactured hard drives, but were bought out by Seagate. In their last days they replaced the metal top of their hard drives with a heavy foil. The foil top drives failed a lot. IBM Deskstar hard drives also had a reputation for failing, and I did see a number of them come in. Some technicians called them "IBM Deathstar".
Rob Dyck
Robert Dyck wrote:
I have a client having trouble with her RAID controller. She reports that she keeps getting an error on boot that her raid set is out of synchronization. It takes close to an hour just to boot when this happens. When I've looked at it the problem did not occur, but this can happen with intermittent problems. I ran hard drive diagnostics, CPU, motherboard, RAM, and everything works fine. I have a hard drive controller diagnostic from Seagate, but it isn't designed for RAID. Her motherboard is an Asus K8V, I think it's K8V-MX, she has 2 SATA hard drives mirrored, and she runs Windows XP. I have mentioned that she may have to replace her motherboard, but I don't want to tell her she has to do this unless I can confirm it is necessary. If she replaced the motherboard and the problem continues, I would hear about it!
Thanks, Rob Dyck
Dumb thought... If the problem is the onboard RAID controller... Wouldn't it be cheaper for your client to get a RAID card instead of replacing the motherboard? There were a number of suggestions last year and the year before detailing which ones would be best. Of course there are implications I haven't even thought of since I've never been able to get a RAID system working let alone administered one.
Later Mike