I love linux software ("md") raid. I use md raid1 on a zillion systems,
and I've never had issues. Until today...
I get a call that a customer has lost about a month's worth of email,
and their apps' data appears to be old and/or missing. Strange.
I log in to the linux server and see:
cat /proc/mdstat
Personalities : [raid1]
md122 : active raid1 sda1[0]
409536 blocks [2/1] [U_]
md123 : active raid1 sda2[0]
5242816 blocks [2/1] [U_]
md124 : active raid1 sda3[0]
1939865536 blocks [2/1] [U_]
md125 : active raid1 sdb1[1]
409536 blocks [2/1] [_U]
md126 : active raid1 sdb2[1]
5242816 blocks [2/1] [_U]
md127 : active raid1 sdb3[1]
1939865536 blocks [2/1] [_U]
That's not right. These systems should have 3 md arrays, not 6. Ah,
md has done something really goofball with this pathological case: it has
split each mirror into its two halves and assembled each half as a
separate, single-disk array! Woah!
They said they had an accidental reboot today (kid hitting the reset
button), and the machine booted/rooted off the wrong half of this schizo
split: the stale sda set.
There appears to have been a drive failure/kick a month ago:
Apr 4 10:10:32 firewall kernel: [1443781.218260] md/raid1:md127: Disk failure on sda3, disabling device.
Apr 4 10:10:32 firewall kernel: [1443781.218262] <1>md/raid1:md127: Operation continuing on 1 devices.
And the system hadn't rebooted between that failure and today.
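If you ever hit this yourself, the quickest way I know to confirm which
half of a split mirror is stale is to compare the event counters and
update times in the component superblocks. A rough sketch, using the
partition names from my box:

mdadm --examine /dev/sda3 | grep -E 'Update Time|Events'
mdadm --examine /dev/sdb3 | grep -E 'Update Time|Events'

The member with the higher event count / newer update time is the live
half; the one that stopped counting a month ago is the stale mirror.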
It gets stranger... I rebooted the system (remotely; I'm off-site) to try
out a few recovery ideas. On that next reboot it came up using the
good/current sdb drive for boot/root! Huh? It's like it's picking which
half to use at random! It still shows 6 md arrays, but it's using the
proper 3 this time.
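One way to check which half a freshly booted system is actually rooted on
(assuming findmnt is available and root sits directly on an md device;
cat /proc/mounts works too) is to ask which md device backs / and then
which partition backs that array:

mdadm --detail "$(findmnt -n -o SOURCE /)"

findmnt -n -o SOURCE / prints the /dev/mdXXX mounted as root, and
mdadm --detail lists the sda or sdb partition sitting behind it.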
So is all this a bug?
1. Shouldn't the system have marked sda as failed/bad PERMANENTLY, so
that on the next reboot it would be ignored? OK, I can understand that if
md thought the whole drive was bad, it couldn't write that fact to sda's
own superblock so it would survive the reboot. But couldn't it have
recorded the failure in sdb's superblock? If a system can't remember what
has failed, then I don't see how this behaviour can be avoided.
2. Why did linux md bring up both sets of arrays? It can see they are the
same array. Why on earth would it ever split them? That seems majorly
screwy to me.
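And it really can see that: both halves of each mirror carry the same
array UUID in their superblocks, which you can confirm with something
like:

mdadm --examine /dev/sda3 | grep -i uuid
mdadm --examine /dev/sdb3 | grep -i uuid

Same array UUID on both, yet they got assembled as two different md
devices.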
Still, thank God it didn't try to sync the stale set over the good set!
We had backups, but restoring from them is a pain. In the end, just
rebooting until luck gave us the current set was all it took. I'll head
on-site to replace the bad disk and do a proper resync.
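For anyone following along, a proper resync in this situation typically
looks something like the sketch below (assuming sda is the disk being
replaced and the md12x names are still the ones /proc/mdstat showed
above): stop the stale sda-only arrays, make sure the old superblocks can
never auto-assemble again, then add the replacement's partitions into the
good arrays and let md copy everything over.

mdadm --stop /dev/md122 /dev/md123 /dev/md124           # stop the stale sda-only arrays
mdadm --zero-superblock /dev/sda1 /dev/sda2 /dev/sda3   # old disk: never auto-assemble again
# ...swap in the new disk and recreate the same partition layout on it...
mdadm /dev/md125 --add /dev/sda1
mdadm /dev/md126 --add /dev/sda2
mdadm /dev/md127 --add /dev/sda3                        # resync starts; watch /proc/mdstat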
I have had hardware RAID systems (ARAID99) in this exact situation go into
a schizo state where the disks were out of sync yet both were being used
for writes! The problems always seem to revolve around a disk going "soft"
bad and then coming back to life after a reboot.