I'm soliciting opinions on whether this is a bug or not.
I had this wacky setup on my Fedora 21 (don't ask why!):
                 RAID1
               on top of
    1st HALF:                 2nd HALF:
    partitions                partitions
    on top of                 on top of
    RAID0                     raw disk
    on top of
    partitions
    on top of
    raw disk
And it all worked great. Until I rebooted. Then the RAID1s would come
up, but degraded, with only their non-RAID0 half ("2nd HALF" above) in
the array and the RAID0 parts "gone". I could then re-add the RAID0
half, reboot, and get the same thing.
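(The re-add itself is just the standard mdadm incantation; something like
this, with the partition number only illustrative:)

  # after the degraded boot, md9 and its partitions do exist, so the
  # missing half can be put back by hand and it resyncs:
  mdadm /dev/md126 --re-add /dev/md9p2   # or --add if --re-add refuses
  cat /proc/mdstat                       # watch the resync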
No matter what I did (messing with mdadm, dracut, grub2, etc.), I couldn't
get these things to assemble properly on boot.
There are lots of bugs on the net about simpler cases of nested RAID
hitting the same problem, and many bz's regarding this were fixed years
ago. I checked, and their fixes are in my distro.
I got some help from the guys on the old bugs, and when I redid my setup
to be layered like this instead:
                 RAID1
    1st HALF:                 2nd HALF:
    RAID0                     partitions
    partitions                raw disk
    raw disk
... it all magically worked, and they came up on boot, and the bug
disappeared.
The only difference is whether I partition my one big RAID0 array, or
make the partitions first and then put multiple separate RAID0s directly
into each RAID1 array. (The reason I didn't want to do it that way in the
first place is that I was making 5 of these groupings and wanted to manage
only 6 arrays, not 10. And it was only temporary anyhow. I know the buggy
setup is bad design.)
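To make the difference concrete in mdadm terms, it's roughly this (device
and array names are illustrative, loosely matching the log below, not my
literal commands):

  # buggy layout: one big RAID0, partitioned; its partitions feed the RAID1s
  mdadm --create /dev/md9 --level=0 --raid-devices=2 /dev/sdb2 /dev/sdc2
  # ...partition /dev/md9 into md9p1..md9p4, then for each grouping e.g.:
  mdadm --create /dev/md126 --level=1 --raid-devices=2 /dev/md9p1 /dev/sda2

  # working layout: one small RAID0 per grouping, used whole as a RAID1 member
  # (sdb3/sdc3 are hypothetical partitions from the re-done partitioning)
  mdadm --create /dev/md10  --level=0 --raid-devices=2 /dev/sdb3 /dev/sdc3
  mdadm --create /dev/md126 --level=1 --raid-devices=2 /dev/md10 /dev/sda2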
My question is: is this a bug I should report? In theory, in my mind,
RAID1 on top of a partitioned RAID0 should work fine. The fact that mdadm
and the kernel happily support it leans me towards answering "yes". If I
can have it live in the kernel, why not have it survive reboots? I
thought you could nest arbitrary combinations and levels (to X depth) of
md, lvm, partitions, etc. (And, yes, I really tried everything to make
it boot, from insanely detailed mdadm.conf and grub2 boot lines down to
minimal configs, and yes, I have partition type fd.) However, I want to
make sure I'm not doing something here that is completely insane and
shouldn't be supported.
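To give an idea of the "detailed" end of that spectrum, it was roughly
this kind of thing (a sketch, UUIDs elided, not my exact config):

  # /etc/mdadm.conf, regenerated into the initramfs with "dracut -f"
  ARRAY /dev/md9   metadata=1.2 UUID=<uuid-of-md9>
  ARRAY /dev/md126 metadata=1.2 UUID=<uuid-of-md126>
  ARRAY /dev/md127 metadata=1.2 UUID=<uuid-of-md127>

  # extra grub2 kernel command line arguments
  rd.auto rd.md.uuid=<uuid-of-md9> rd.md.uuid=<uuid-of-md126> rd.md.uuid=<uuid-of-md127>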
It appears dracut, udev and mdadm are responsible for all of this. That's
where the other similar bugs were fixed.
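If anyone wants to poke at this, the relevant pieces are easy to inspect
with the standard tooling, e.g.:

  lsinitrd | grep -iE 'mdadm|mdraid|md-raid'    # mdadm + its udev rules in the initramfs?
  lsinitrd -f /etc/mdadm.conf                   # which ARRAY lines made it in?
  udevadm info --query=all --name=/dev/md9p1    # how udev has classified one of the md9 partitions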
Details:
Here's what it looked like when it was buggy. md9 was the big RAID0
array that was then partitioned into 4. Note how md9 does come up and
get recognized, but the boot-time assembly never "recurses" into those
partitions to see that they are themselves array components. So md126
(root, by the way) only comes up with 1 of its 2 components, after an
annoyingly long delay.
Oct 23 02:37:57 pog kernel: [ 19.542533] sd 8:0:3:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 23 02:37:57 pog kernel: [ 19.542624] sd 8:0:2:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 23 02:37:57 pog kernel: [ 19.553021] sdb: sdb1 sdb2
Oct 23 02:37:57 pog kernel: [ 19.553835] sda: sda1 sda2 sda3 sda4
Oct 23 02:37:57 pog kernel: [ 19.554991] sdc: sdc1 sdc2
Oct 23 02:37:57 pog kernel: [ 19.556918] sd 8:0:2:0: [sdb] Attached SCSI disk
Oct 23 02:37:57 pog kernel: [ 19.558970] sd 8:0:3:0: [sdc] Attached SCSI disk
Oct 23 02:37:57 pog kernel: [ 19.559332] sd 8:0:0:0: [sda] Attached SCSI disk
Oct 23 02:37:57 pog kernel: [ 19.610787] random: nonblocking pool is initialized
Oct 23 02:37:57 pog kernel: [ 19.737894] md: bind<sdb1>
Oct 23 02:37:57 pog kernel: [ 19.742379] md: bind<sdb2>
Oct 23 02:37:57 pog kernel: [ 19.744213] md: bind<sdc2>
Oct 23 02:37:57 pog kernel: [ 19.748375] md: raid0 personality registered for level 0
Oct 23 02:37:57 pog kernel: [ 19.748619] md/raid0:md9: md_size is 285371136 sectors.
Oct 23 02:37:57 pog kernel: [ 19.748623] md: RAID0 configuration for md9 - 1 zone
Oct 23 02:37:57 pog kernel: [ 19.748625] md: zone0=[sdb2/sdc2]
Oct 23 02:37:57 pog kernel: [ 19.748631] zone-offset= 0KB, device-offset= 0KB, size= 142685568KB
Oct 23 02:37:57 pog kernel: [ 19.748633]
Oct 23 02:37:57 pog kernel: [ 19.748650] md9: detected capacity change from 0 to 146110021632
Oct 23 02:37:57 pog kernel: [ 19.752284] md9: p1 p2 p3 p4
Oct 23 02:37:57 pog kernel: [ 19.776482] md: bind<sda1>
Oct 23 02:37:57 pog kernel: [ 19.786343] md: raid1 personality registered for level 1
Oct 23 02:37:57 pog kernel: [ 19.786643] md/raid1:md127: active with 2 out of 2 mirrors
Oct 23 02:37:57 pog kernel: [ 19.786669] md127: detected capacity change from 0 to 419364864
Oct 23 02:37:57 pog kernel: [ 19.823529] md: bind<sda2>
Oct 23 02:37:57 pog kernel: [ 143.320688] md/raid1:md126: active with 1 out of 2 mirrors
Oct 23 02:37:57 pog kernel: [ 143.320733] md126: detected capacity change from 0 to 36350984192
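For anyone digging in, the superblocks on those md9 partitions can be
examined directly to confirm they really are RAID1 members (a sketch; any
of p1..p4):

  mdadm --examine /dev/md9p1   # member superblock, incl. the parent array's UUID
  mdadm --detail /dev/md9      # the RAID0 itself, which assembles fine
  cat /proc/mdstat             # md9 is up, md126 sits degraded after the failed boot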