Just some observations, tangential ramblings from a recent experience:
For the first time in a long time I ran out of space on my home file server. It was an 8 x 2TB drive Linux software (md) RAID6, 12TB usable, with a tweaked XFS on top. It was 99.5% full and I was running out of things I could painlessly delete.
Yesterday, I added a 2TB 7200rpm WD SE ("enterprise") drive that is actually quite affordable. A very solid, heavy drive. Built like a SCSI.
Crossed my fingers and did an md RAID6 grow/reshape, which is quite easy to do, but took 24 hours 10 mins to complete. And I did it while the fs was live, mounted, and nfs-exported! (For my sanity I did turn off some heavy-data daemons.) Hooray, no errors!
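For anyone wanting the recipe, the reshape boils down to something like the following sketch; /dev/md4 is my array, but the new partition name and the backup-file location are just placeholders:

  mdadm --add /dev/md4 /dev/sdX1                 # add the new disk as a spare first
  mdadm --grow /dev/md4 --raid-devices=9 \
        --backup-file=/root/md4-reshape.backup   # reshape from 8 to 9 devices
  cat /proc/mdstat                               # watch the (very long) reshape progress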
Read up on XFS grows and did, while mounted, an xfs_growfs -d and (surprisingly?) it *instantly* returned, and df showed the array was magically 2TB bigger. Hooray #2.
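The grow itself is essentially a one-liner against the mount point (mine is /data):

  xfs_growfs -d /data    # grow the data section to fill the now-larger /dev/md4
  df -h /data            # confirm the extra 2TB showed up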
Had to adjust my mdadm.conf (I manually specify some parameters) and luckily remembered to also do a dracut --force (Fedora system) for next boot.
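Roughly what that step looks like; the ARRAY line mdadm prints is what needs to match the new 9-device layout (plus whatever parameters you specify by hand):

  mdadm --detail --scan    # shows the current ARRAY line for /dev/md4
  vi /etc/mdadm.conf       # update the array definition to match
  dracut --force           # rebuild the initramfs so the next boot assembles it correctly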
Had to adjust fstab to tweak sunit/swidth (s == stripe) so the remount uses new, sane values for performance. I'm a bit confused after reading the sparse docs and the nearly non-existent google results from people doing precisely this: md grow + xfs grow. Of course, it's not imperative, unless you care about performance.
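For reference, the fstab change is just mount options; the numbers below assume a 512KiB md chunk (yours may differ) and 7 data drives, and XFS wants both values in 512-byte sectors:

  # sunit  = chunk size in 512B sectors        = 512KiB / 512B = 1024
  # swidth = sunit * number of data drives (7) = 7168
  /dev/md4  /data  xfs  defaults,sunit=1024,swidth=7168  0 0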
It appears sunit is not changeable unless you increase it by a factor of a power of 2. For instance, doubling your drives would allow you to double your sunit. That kind of sucks. Some articles I read said you should set sunit = RaidChunkSize * 2, which mine is set to. But reading the XFS man pages in more detail, that strikes me as incorrect unless you have only 2 usable-data drives (i.e. a 4-drive RAID6 or 3-drive RAID5). I'm not sure who to believe.
Regardless, if my reading of the XFS docs is correct, it looks like my sunit was never configured properly, and is set to perfectly match 8 data drives, not my old 6 and certainly not my new 7. I may, in short order, get a 10th drive for my array, and run perf tests before and after to see if sunit should indeed be a multiple of usable-drives or not.
swidth is easier. Just make it sunit * usable-data-drives, and it appears it can be any valid multiple, changed at will. swidth was set correctly for my old setup and is now set correctly for my new setup.
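A quick way to sanity-check the math (same assumptions as above: 512KiB chunk, 7 data drives); note xfs_info reports these in filesystem blocks, 4KiB by default, not 512-byte sectors:

  xfs_info /data | grep -E 'sunit|swidth'
  # expect sunit=128 blks   (128 * 4KiB = 512KiB = one chunk)
  # and    swidth=896 blks  (896 * 4KiB = 3.5MiB = 7 data drives * 512KiB)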
Bottom line is, if my new reading of sunit is correct, and it needs to be tailored to usable-disk count and can only be changed by factors of 2, then md+XFS is de facto limited to only ever doubling your usable-disk count, NOT to adding 1 disk here and 1 disk there, as I first thought. *IF* you care about performance, that is. Nothing stops you from doing any of this, but if your FS striping does not match your RAID striping you could see insane performance decreases.
Neat:
        Filesystem     1K-blocks        Used  Available Use% Mounted on
before: /dev/md4     11716644864 11499740512  216904352  99% /data
after : /dev/md4     13669767872 11499740704 2170027168  85% /data
Final note: After hearing ZFS (on Linux at least) cannot grow by 1 disk when using RAID5 or 6, and after nearly 10 years using XFS on md on huge arrays, I say give some hero cookies to md/XFS. It's withstood some pretty strange events on my server, and has never blown up. If I'm wrong about the sunit being a problem, then md/XFS is a great option for those who want to gradually add space as they need it.
On 14-09-10 03:51 AM, Trevor Cordes wrote:
> Final note: After hearing ZFS (on Linux at least) cannot grow by 1 disk when using RAID5 or 6, and after nearly 10 years using XFS on md on huge arrays, I say give some hero cookies to md/XFS. It's withstood some pretty strange events on my server, and has never blown up. If I'm wrong about the sunit being a problem, then md/XFS is a great option for those who want to gradually add space as they need it.
The fixed-topology limitation is endemic to ZFS' design, not just the Linux port.
However, there are many use cases where it's not a limitation: almost exactly the type of system we're trying to build. If you only have (say) 8 drive bays, and you fill all 8 drive bays on day 1, you never need to grow the array by a single drive; the only two growth scenarios possible are dictated by your hardware:
a) replace all 8 drives with larger drives - ZFS supports this;
b) add an external drive shelf with another (say) 8 drives - ZFS supports this, as a second sub-volume (vdev), effectively creating a RAID n+0 (typically RAID60) volume.
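A rough sketch of both paths with zpool, assuming a hypothetical pool called "tank" (the device names are placeholders too):

  # (a) swap every disk for a bigger one, letting each resilver finish in between
  zpool set autoexpand=on tank
  zpool replace tank disk1 bigger-disk1    # ...repeat for disk2..disk8

  # (b) add a whole second raidz2 vdev (the external-shelf case)
  zpool add tank raidz2 disk9 disk10 disk11 disk12 disk13 disk14 disk15 disk16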
ZFS has three major advantages for most people: 1) no RAID write-hole behaviour; 2) automatic resilvering; 3) integrated, effectively infinite, snapshots with built-in replication.
RHEL 7 (and thus CentOS 7, SL 7, etc.) defaults to XFS for the root file system. Obviously you're not the only one who likes XFS!
I've generally found that the filesystem stripe width doesn't make a whole lot of difference on modern hardware; the worst case I can recall encountering turned out to be due to block misalignment, not stripe width. I do recall it making a measurable difference on slow 5400rpm IDE drives with a controller that didn't do useful caching, about 10 years ago.
On 2014-09-10 Adam Thompson wrote:
> ZFS has three major advantages for most people: 1) no RAID write-hole behaviour; 2) automatic resilvering; 3) integrated, effectively infinite, snapshots with built-in replication.
Hey, I've loved the ZFS idea since the day I heard about it. Until now it appeared to be the be-all-end-all kitchen sink. Now that I find out it has no +1 capability, it loses its luster a little.
Outside of the enterprise space, I can see lots of scenarios where you want to add a less-than-double amount of disks to an existing array. Especially for home use, where you only want to do the big $1500 outlay on disks every 3-6 years (generally when capacities per $ have doubled).
If you find yourself in the "shoot, we should have bought 1 more disk" after your array is full, then having no option but buying another X drives when your budget is spent kind of sucks. The ability to solve a full-array problem by spending just $150 on one disk at any time is very attractive.
I have a funny feeling that md and the FS's on top of it will slowly add many ZFS features to be almost as rich as ZFS. If ZFS has intentional limitations such as the above, then ZFS will never allow +1. Things will balance out in the end. A month ago I would have told you that ZFS would have taken over the world eventually. I for one am happy it won't, because I love the array/FS choices Linux gives: the more the merrier. And those kernel md RAID guys are *really* on the ball (I read the mailing list).
On 2014-09-10 16:59, Trevor Cordes wrote:
> Outside of the enterprise space, I can see lots of scenarios where you want to add a less-than-double amount of disks to an existing array. Especially for home use, where you only want to do the big $1500 outlay on disks every 3-6 years (generally when capacities per $ have doubled).
Yup. Although even in the home-user situation, there are a lot of 4-bay enclosures where adding one more disk can't happen anyway.
> If you find yourself in the "shoot, we should have bought 1 more disk" after your array is full, then having no option but buying another X drives when your budget is spent kind of sucks. The ability to solve a full-array problem by spending just $150 on one disk at any time is very attractive.
I definitely agree with you on that point.
ZFS has three critical flaws from my perspective:
1) needs way too much RAM for deduplication
2) license (although FreeBSD and Linux both work around this quite well)
3) inability to alter raid volume topology after creation
> I have a funny feeling that md and the FS's on top of it will slowly add many ZFS features to be almost as rich as ZFS.
Well... yes and no. The development is mostly happening on btrfs, which is intended to be a head-to-head competitor to ZFS, including the ability to integrate the md layer into the filesystem layer directly. (This allows for much more intelligent - i.e. faster - failed-drive rebuilds, if the array isn't close to being full.)
DragonFlyBSD has also done some amazing work on the HammerFS filesystem, which was designed as a "better-than-ZFS" option from the outset. It's not yet ported to any other OS (AFAIK) and lacks true built-in RAID functions, but in other ways is a very exciting filesystem.
I guess you could say that ZFS is already a "legacy" filesystem, in the sense that it's well-established, widely-adopted, and it already has competitors nipping at its heels. XFS is in pretty much the same boat, but is a little older, and relies on a RAID block-layer device like most filesystems. Not to say these aren't valid choices today - they most certainly are!
Most of the coming-just-around-the-corner filesystem work appears to be happening on btrfs and Hammer... and based on personal experiences, I wouldn't want to run btrfs in production yet, and Hammer apparently also still has some quirks for the unwary.