I love linux software ("md") raid. I use md raid1 on a zillion systems. I've never had issues. Until today...
I get a call that a customer has lost all their emails for about a month and their apps' data appears to be old and/or missing. Strange.
I log in to the linux server and see:
cat /proc/mdstat
Personalities : [raid1]
md122 : active raid1 sda1[0]
      409536 blocks [2/1] [U_]

md123 : active raid1 sda2[0]
      5242816 blocks [2/1] [U_]

md124 : active raid1 sda3[0]
      1939865536 blocks [2/1] [U_]

md125 : active raid1 sdb1[1]
      409536 blocks [2/1] [_U]

md126 : active raid1 sdb2[1]
      5242816 blocks [2/1] [_U]

md127 : active raid1 sdb3[1]
      1939865536 blocks [2/1] [_U]
That's not correct. This system should have 3 md arrays, not 6. Ah, md has done something really goofball with this pathological case: it has separated each mirror into its two halves and assembled each half as its own degraded array! Whoa!
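For comparison, a healthy assembly of the same three mirrors would look something like this (the md names below are just illustrative; the point is 3 arrays, each with both members and [2/2] [UU]):

Personalities : [raid1]
md125 : active raid1 sdb1[1] sda1[0]
      409536 blocks [2/2] [UU]

md126 : active raid1 sdb2[1] sda2[0]
      5242816 blocks [2/2] [UU]

md127 : active raid1 sdb3[1] sda3[0]
      1939865536 blocks [2/2] [UU]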
They said they had an accidental reboot today (kid hitting the reset button), and it booted/rooted off the wrong schizo set (sda).
There appears to have been a drive failure/kick a month ago:

Apr 4 10:10:32 firewall kernel: [1443781.218260] md/raid1:md127: Disk failure on sda3, disabling device.
Apr 4 10:10:32 firewall kernel: [1443781.218262] <1>md/raid1:md127: Operation continuing on 1 devices.
And the system hadn't been rebooted between then and today.
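A quick way to confirm which half of a mirror is stale (just a sketch of what I'd check, using plain mdadm): compare the superblock event counter and update time on the two members of one of the arrays. The member that got kicked stops receiving updates, so its numbers lag behind:

mdadm --examine /dev/sda3 | egrep 'Update Time|Events'
mdadm --examine /dev/sdb3 | egrep 'Update Time|Events'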
It gets stranger... I rebooted the system to test out a few recovery ideas (working offsite). On the next reboot it came up using the good/current sdb drive for boot/root! Huh? It's like it's picking which one to use at random! It still shows 6 md arrays, but it's using the proper 3 this time.
So is all this a bug?
1. Shouldn't the system have marked sda as failed/bad PERMANENTLY, so that on the next reboot it would ignore it? OK, I can understand that if md thought the whole drive was bad, it wouldn't be able to write to sda's superblock for that to survive the reboot. But couldn't it have written the info to sdb's superblock? If a system can't remember what has failed, then I don't see how this behaviour can be avoided.
2. Why did linux md bring up both sets of arrays? It can see they are the same array. Why on earth would it ever split them? That seems majorly screwy to me.
Still, thank God it didn't try to start syncing the stale set onto the good set! We had backups, but it's a pain to recover from them. In the end, just rebooting until luck gave us the current set was all it took. I'll head on-site to replace the bad disk and do a proper resync.
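For the record, here's roughly the cleanup I expect to do on-site (a sketch only, assuming sda is the disk being replaced and that md125/md126/md127 are still the good sdb-backed arrays as in the listing above):

# stop the three stale half-arrays that were assembled from the old sda
mdadm --stop /dev/md122
mdadm --stop /dev/md123
mdadm --stop /dev/md124

# copy the partition table from the good disk to the replacement disk
sfdisk -d /dev/sdb | sfdisk /dev/sda

# add the new partitions into the surviving arrays; md does a full resync
mdadm /dev/md125 --add /dev/sda1
mdadm /dev/md126 --add /dev/sda2
mdadm /dev/md127 --add /dev/sda3

# (if re-using the old disk instead of replacing it, I'd first clear its
# stale superblocks with mdadm --zero-superblock on each old partition)

# then watch the rebuild
cat /proc/mdstat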
I have had hardware RAID systems (ARAID99) in this exact situation go into a schizo state where the disks were unsynched yet both were being used for writes! The problems always seem to revolve around a disk going "soft" bad and then coming alive after reboot.
I don't know much about the inner workings of Linux software RAID, but I'm wondering if some ID got borked that makes them look like halves of two separate arrays.
[root@bob ~]# mdadm --detail /dev/md0 | grep UUID
           UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8
[root@bob ~]# mdadm --query --examine /dev/sdb1 | egrep '(Magic|UUID)'
          Magic : a92b4efc
           UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8
[root@bob ~]# mdadm --query --examine /dev/sda2 | egrep '(Magic|UUID)'
          Magic : a92b4efc
           UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8
Just poking around with the mdadm command doesn't show anything specific to a single device. My guess would be that there's some algorithm that reconstructs the array based on what's found on the attached drives: a device belongs to a given RAID set as long as it has the same UUID (which is also repeated in /etc/mdadm.conf on my system). Then it would look at device-specific metadata to figure out the sync status. Browsing the source to dm-raid1.c and some other files shows there's a notion of a primary device in the RAID set and some sync tables.
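For what it's worth, here's roughly what the mdadm.conf side of that looks like on my box (a representative excerpt, not the literal file); note the ARRAY line only carries the array-level UUID, nothing that distinguishes one member disk from the other:

# /etc/mdadm.conf (excerpt)
DEVICE partitions
ARRAY /dev/md0 UUID=f84b1e9d:5cac2742:d382826c:eabfdbf8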
Sean
On Mon, May 14, 2012 at 2:55 PM, Trevor Cordes <trevor@tecnopolis.ca> wrote: