[RndTbl] weird md raid situation

Mon May 14 15:37:09 CDT 2012

I don't know much about the inner workings of Linux software RAID, but I'm
wondering if some ID got borked that makes them look like halves of two
separate arrays.

[root@bob ~]# mdadm --detail /dev/md0 | grep UUID
           UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8
[root@bob ~]# mdadm --query --examine /dev/sdb1 | egrep '(Magic|UUID)'
          Magic : a92b4efc
           UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8
[root@bob ~]# mdadm --query --examine /dev/sda2 | egrep '(Magic|UUID)'
          Magic : a92b4efc
           UUID : f84b1e9d:5cac2742:d382826c:eabfdbf8

Poking around the mdadm command doesn't show anything specific to a
single device. My guess is that some algorithm reconstructs the array
from what it finds on the controllers, so devices belong to the same RAID
set as long as they carry the same UUID (which is also repeated in
/etc/mdadm.conf on my system). It would then look at device-specific
metadata to figure out the sync status. Browsing the source of dm-raid1.c
and some other files shows there's a notion of a primary device in the
RAID set and some sync tables.
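
To make that guess concrete, here is a rough Python sketch of the idea: group
devices by the UUID reported by `mdadm --examine`, then compare each member's
"Events" counter to spot the one that fell behind. This is only an
illustration of the reasoning above, not how md or mdadm is implemented. The
function names are made up for this example, and the handling of the "Events"
field (either a plain integer or an older "hi.lo" form) is an assumption about
mdadm's output.

```python
from collections import defaultdict


def parse_events(value):
    """Turn an Events value into an integer.

    Assumption: some mdadm versions print 0.90 counters as "hi.lo",
    others print a single integer. Both are accepted here.
    """
    if "." in value:
        hi, lo = value.split(".", 1)
        return (int(hi) << 32) | int(lo)
    return int(value)


def parse_examine(text):
    """Extract (uuid, events) from the text of `mdadm --examine <dev>`."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    # 1.x metadata labels the field "Array UUID"; 0.90 uses plain "UUID".
    raw_uuid = fields.get("UUID") or fields.get("Array UUID")
    uuid = raw_uuid.split()[0] if raw_uuid else None
    raw_events = fields.get("Events")
    events = parse_events(raw_events) if raw_events else None
    return uuid, events


def group_members(examine_by_device):
    """Map each array UUID to {device: events} for devices sharing it."""
    arrays = defaultdict(dict)
    for dev, text in examine_by_device.items():
        uuid, events = parse_examine(text)
        if uuid is not None:
            arrays[uuid][dev] = events
    return dict(arrays)


def stale_members(members):
    """Return devices whose event counter is behind the newest member."""
    known = [e for e in members.values() if e is not None]
    if not known:
        return []
    newest = max(known)
    return sorted(d for d, e in members.items()
                  if e is not None and e < newest)
```

Feeding it the captured `mdadm --examine` output for each partition (keyed by
device name) would show which members share a UUID, and which of them carries
an older event counter and is therefore the likely stale half.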

Sean

On Mon, May 14, 2012 at 2:55 PM, Trevor Cordes <trevor at tecnopolis.ca> wrote:

> I love linux software ("md") raid.  I use md raid1 on a zillion systems.
> I never have issues.  Until today...
>
> I get a call that a customer has lost all their emails for about a month
> and their apps' data appears to be old and/or missing.  Strange.
>
> I login to the linux server and see:
>
> cat /proc/mdstat
> Personalities : [raid1]
> md122 : active raid1 sda1[0]
>      409536 blocks [2/1] [U_]
>
> md123 : active raid1 sda2[0]
>      5242816 blocks [2/1] [U_]
>
> md124 : active raid1 sda3[0]
>      1939865536 blocks [2/1] [U_]
>
> md125 : active raid1 sdb1[1]
>      409536 blocks [2/1] [_U]
>
> md126 : active raid1 sdb2[1]
>      5242816 blocks [2/1] [_U]
>
> md127 : active raid1 sdb3[1]
>      1939865536 blocks [2/1] [_U]
>
>
> That's not correct.  These systems should have 3 partitions, not 6.  Ah,
> md has done some really goofball things with this pathological case.  It's
> separated the raid into duplicates and assembled each separately!  Whoa!
>
> They said they had an accidental reboot today (kid hitting reset button).
> And it booted/rooted off the wrong schizo set (sda).
>
> There appears to have been a drive failure/kick a month ago:
> Apr  4 10:10:32 firewall kernel: [1443781.218260] md/raid1:md127: Disk
> failure on sda3, disabling device.
> Apr  4 10:10:32 firewall kernel: [1443781.218262] <1>md/raid1:md127:
> Operation continuing on 1 devices.
>
> And it hadn't rebooted since then, before today.
>
> It gets stranger... I rebooted the system trying to test a few recovery
> ideas (offsite) out.  On the next reboot it came up using the good/current
> sdb drive for boot/root!  Huh?  It's like it's picking which one to use at
> random!  It still shows 6 md arrays, but it's using the proper 3 this
> time.
>
> So is all this a bug?
>
> 1. Shouldn't the system have marked the sda as failed/bad PERMANENTLY so
> on next reboot it would ignore it?  OK, I can understand that if it
> thought the whole drive was bad, it wouldn't be able to write to the sda
> superblock to survive the reboot.  But couldn't it have written the info
> to sdb's superblock?  If a system can't remember what has failed, then I
> don't see how this behaviour can be avoided.
>
> 2. Why did linux md bring up both sets of arrays?  It can see they are the
> same array.  Why on earth would it ever split them?  That seems majorly
> screwy to me.
>
>
> Still, thank God it didn't try to start syncing the stale set to the good
> set!  We had backups, but it's a pain to recover.  In the end, just
> rebooting until luck gives us the current set was all it took.  I'll head
> on-site to replace the bad disk and do a proper resync.
>
> I have had hardware RAID systems (ARAID99) in this exact situation go into
> a schizo state where the disks were unsynched yet both were being used for
> writes!  The problems always seem to revolve around a disk going "soft"
> bad and then coming alive after reboot.
> _______________________________________________
> Roundtable mailing list
> Roundtable at muug.mb.ca
> http://www.muug.mb.ca/mailman/listinfo/roundtable
>

-- 
Sean Walberg <sean at ertw.com>    http://ertw.com/