Just some observations, tangential ramblings from a recent experience:
For the first time in a long time I ran out of space on my home file server. It was an 8 x 2TB drive Linux software (md) RAID6, 12TB usable, with a tweaked XFS on top. It was 99.5% full and I was running out of things I could painlessly delete.
Yesterday, I added a 2TB 7200rpm WD SE ("enterprise") drive that is actually quite affordable. A very solid, heavy drive. Built like a SCSI.
Crossed my fingers and did an md RAID6 grow/reshape, which is quite easy to do, but took 24 hours 10 mins to complete. And I did it while the fs was live, mounted, and nfs-exported! (For my sanity I did turn off some heavy-data daemons.) Hooray, no errors!
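For anyone wanting the recipe, the reshape boils down to something like the following sketch; /dev/md4 is my array, but the new partition name and the backup-file location are just placeholders:

  mdadm --add /dev/md4 /dev/sdX1                 # add the new disk as a spare first
  mdadm --grow /dev/md4 --raid-devices=9 \
        --backup-file=/root/md4-reshape.backup   # reshape from 8 to 9 devices
  cat /proc/mdstat                               # watch the (very long) reshape progress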
Read up on XFS grows and did, while mounted, an xfs_growfs -d and (surprisingly?) it *instantly* returned, and df showed the array was magically 2TB bigger. Hooray #2.
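The grow itself is essentially a one-liner against the mount point (mine is /data):

  xfs_growfs -d /data    # grow the data section to fill the now-larger /dev/md4
  df -h /data            # confirm the extra 2TB showed up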
Had to adjust my mdadm.conf (I manually specify some parameters) and luckily remembered to also do a dracut --force (Fedora system) for next boot.
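Roughly what that step looks like; the ARRAY line mdadm prints is what needs to match the new 9-device layout (plus whatever parameters you specify by hand):

  mdadm --detail --scan    # shows the current ARRAY line for /dev/md4
  vi /etc/mdadm.conf       # update the array definition to match
  dracut --force           # rebuild the initramfs so the next boot assembles it correctly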
Had to adjust fstab to tweak sunit/swidth (s == stripe) so the remount uses new, sane values for performance. I'm a bit confused after reading the sparse docs and the nearly non-existent google results from people doing precisely this: md grow + xfs grow. Of course, it's not imperative, unless you care about performance.
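For reference, the fstab change is just mount options; the numbers below assume a 512KiB md chunk (yours may differ) and 7 data drives, and XFS wants both values in 512-byte sectors:

  # sunit  = chunk size in 512B sectors        = 512KiB / 512B = 1024
  # swidth = sunit * number of data drives (7) = 7168
  /dev/md4  /data  xfs  defaults,sunit=1024,swidth=7168  0 0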
It appears sunit is not changeable unless you increase it by a factor of a power of 2. For instance, doubling your drives would allow you to double your sunit. That kind of sucks. Some articles I read said you should set sunit = RaidChunkSize * 2, which mine is set to. But reading the XFS man pages in more detail, that strikes me as incorrect unless you have only 2 usable-data drives (i.e. a 4-drive RAID6 or 3-drive RAID5). I'm not sure who to believe.
Regardless, if my reading of the XFS docs is correct, it looks like my sunit was never configured properly, and is set to perfectly match 8 data drives, not my old 6 and certainly not my new 7. I may, in short order, get a 10th drive for my array, and run perf tests before and after to see if sunit should indeed be a multiple of usable-drives or not.
swidth is easier. Just make it sunit * usable-data-drives, and it appears it can be any valid multiple, changed at will. swidth was set correctly for my old setup and is now set correctly for my new setup.
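A quick way to sanity-check the math (same assumptions as above: 512KiB chunk, 7 data drives); note xfs_info reports these in filesystem blocks, 4KiB by default, not 512-byte sectors:

  xfs_info /data | grep -E 'sunit|swidth'
  # expect sunit=128 blks   (128 * 4KiB = 512KiB = one chunk)
  # and    swidth=896 blks  (896 * 4KiB = 3.5MiB = 7 data drives * 512KiB)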
Bottom line is, if my new reading of sunit is correct, and it needs to be tailored to usable-disk count and can only be changed by factors of 2, then md+XFS is de facto limited to only ever doubling your usable-disk count, NOT to adding 1 disk here and 1 disk there, as I first thought. *IF* you care about performance, that is. Nothing stops you from doing any of this, but if your FS striping does not match your RAID striping you could see insane performance decreases.
Neat:
        Filesystem     1K-blocks        Used  Available Use% Mounted on
before: /dev/md4     11716644864 11499740512  216904352  99% /data
after : /dev/md4     13669767872 11499740704 2170027168  85% /data
Final note: After hearing ZFS (on Linux at least) cannot grow by 1 disk when using RAID5 or 6, and after nearly 10 years using XFS on md on huge arrays, I say give some hero cookies to md/XFS. It's withstood some pretty strange events on my server, and has never blown up. If I'm wrong about the sunit being a problem, then md/XFS is a great option for those who want to gradually add space as they need it.
On 14-09-10 03:51 AM, Trevor Cordes wrote:
> Final note: After hearing ZFS (on Linux at least) cannot grow by 1 disk when using RAID5 or 6, and after nearly 10 years using XFS on md on huge arrays, I say give some hero cookies to md/XFS. It's withstood some pretty strange events on my server, and has never blown up. If I'm wrong about the sunit being a problem, then md/XFS is a great option for those who want to gradually add space as they need it.
The fixed-topology limitation is endemic to ZFS' design, not just the Linux port.
However, there are many use cases where it's not a limitation: almost exactly the type of system we're trying to build. If you only have (say) 8 drive bays, and you fill all 8 drive bays on day 1, you never need to grow the array by a single drive; the only two growth scenarios possible are dictated by your hardware:
a) replace all 8 drives with larger drives - ZFS supports this;
b) add an external drive shelf with another (say) 8 drives - ZFS supports this, as a second sub-volume (vdev), effectively creating a RAID n+0 (typically RAID60) volume.
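A rough sketch of both paths with zpool, assuming a hypothetical pool called "tank" (the device names are placeholders too):

  # (a) swap every disk for a bigger one, letting each resilver finish in between
  zpool set autoexpand=on tank
  zpool replace tank disk1 bigger-disk1    # ...repeat for disk2..disk8

  # (b) add a whole second raidz2 vdev (the external-shelf case)
  zpool add tank raidz2 disk9 disk10 disk11 disk12 disk13 disk14 disk15 disk16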
ZFS has three major advantages for most people: 1) no RAID write-hole behaviour; 2) automatic resilvering; 3) integrated, effectively infinite, snapshots with built-in replication.
RHEL 7 (and thus CentOS 7, SL 7, etc.) defaults to XFS for the root file system. Obviously you're not the only one who likes XFS!
I've generally found that the filesystem stripe width doesn't make a whole lot of difference on modern hardware; the worst case I can recall encountering turned out to be due to block misalignment, not stripe width. I do recall it making a measurable difference on slow 5400rpm IDE drives with a controller that didn't do useful caching, about 10 years ago.
On 2014-09-10 Adam Thompson wrote:
> ZFS has three major advantages for most people: 1) no RAID write-hole behaviour; 2) automatic resilvering; 3) integrated, effectively infinite, snapshots with built-in replication.
Hey, I've loved the ZFS idea since the day I heard about it. Until now it appeared to be the be-all-end-all kitchen sink. Now that I find out it has no +1 capability, it loses its luster a little.
Outside of the enterprise space, I can see lots of scenarios where you want to add a less-than-double amount of disks to an existing array. Especially for home use, where you only want to do the big $1500 outlay on disks every 3-6 years (generally when capacities per $ have doubled).
If you find yourself in the "shoot, we should have bought 1 more disk" after your array is full, then having no option but buying another X drives when your budget is spent kind of sucks. The ability to solve a full-array problem by spending just $150 on one disk at any time is very attractive.
I have a funny feeling that md and the FS's on top of it will slowly add many ZFS features to be almost as rich as ZFS. If ZFS has intentional limitations such as the above, then ZFS will never allow +1. Things will balance out in the end. A month ago I would have told you that ZFS would have taken over the world eventually. I for one am happy it won't, because I love the array/FS choices Linux gives: the more the merrier. And those kernel md RAID guys are *really* on the ball (I read the mailing list).
On 2014-09-10 16:59, Trevor Cordes wrote:
> Outside of the enterprise space, I can see lots of scenarios where you want to add a less-than-double amount of disks to an existing array. Especially for home use, where you only want to do the big $1500 outlay on disks every 3-6 years (generally when capacities per $ have doubled).
Yup. Although even in the home-user situation, there are a lot of 4-bay enclosures where adding one more disk can't happen anyway.
> If you find yourself in the "shoot, we should have bought 1 more disk" after your array is full, then having no option but buying another X drives when your budget is spent kind of sucks. The ability to solve a full-array problem by spending just $150 on one disk at any time is very attractive.
I definitely agree with you on that point.
ZFS has three critical flaws from my perspective:
1) needs way too much RAM for deduplication
2) license (although FreeBSD and Linux both work around this quite well)
3) inability to alter raid volume topology after creation
> I have a funny feeling that md and the FS's on top of it will slowly add many ZFS features to be almost as rich as ZFS.
Well... yes and no. The development is mostly happening on btrfs, which is intended to be a head-to-head competitor to ZFS, including the ability to integrate the md layer into the filesystem layer directly. (This allows for much more intelligent - i.e. faster - failed-drive rebuilds, if the array isn't close to being full.)
DragonFlyBSD has also done some amazing work on the HammerFS filesystem, which was designed as a "better-than-ZFS" option from the outset. It's not yet ported to any other OS (AFAIK) and lacks true built-in RAID functions, but in other ways is a very exciting filesystem.
I guess you could say that ZFS is already a "legacy" filesystem, in the sense that it's well-established, widely-adopted, and it already has competitors nipping at its heels. XFS is in pretty much the same boat, but is a little older, and relies on a RAID block-layer device like most filesystems. Not to say these aren't valid choices today - they most certainly are!
Most of the coming-just-around-the-corner filesystem work appears to be happening on btrfs and Hammer... and based on personal experiences, I wouldn't want to run btrfs in production yet, and Hammer apparently also still has some quirks for the unwary.