I'm reading a book that talks about RAID 5. For writes (only), it describes one case like this:
"The write changes exactly one disk. The parity context doesn't need to be recovered, since we can compute it just as if the disk had failed. The write involves one write and one parity write. Performance is about 50% of a single disk."
("Parity context" in this book means the contents of that entire stripe; so saying it doesn't need to be recovered means that you don't need to read the other n-2 disks (+1 parity disk) that hold that stripe.)
I'm trying to figure out how you "can compute it just as if the disk had failed". This makes no sense to me. I don't see how you could possibly get away with only 2 writes and no reads when changing just a single disk's block in a RAID 5 array. What am I missing??
(This book does have errors, and I am wondering if this is one of them.)
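To make the question concrete, here are the only two ways I know of to keep parity correct when overwriting a single chunk. This is my own sketch in Python (toy values, nothing from the book), and note that both paths need reads:

# Minimal sketch of RAID 5 parity maintenance for a single-chunk write.
# Disk layout and chunk size are made up purely for illustration.

CHUNK = 4  # bytes per chunk, tiny for demo purposes

def xor(*chunks):
    """Byte-wise XOR of equal-length chunks."""
    out = bytearray(CHUNK)
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

# One stripe on a 4-disk array: 3 data chunks + 1 parity chunk.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor(d0, d1, d2)

new_d1 = b"XXXX"  # we want to overwrite only d1's chunk

# Reconstruct-write ("as if d1 had failed"): must READ d0 and d2.
parity_rw = xor(d0, new_d1, d2)

# Read-modify-write: must READ the old d1 and the old parity.
parity_rmw = xor(parity, d1, new_d1)

assert parity_rw == parity_rmw  # same result, but neither path is read-free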
I thought I replied to this, but can't find it, so...
I think the author screwed up. The write changes exactly one "stripe", not exactly one disk. Parity blocks must be recalculated on every write. Also, write performance is not necessarily any percentage of a single disk - it depends on the speed of the parity calculations. RAID5 can be anywhere from 0% to infinity% as fast as a single disk depending on the array.
The parity context (stripe) does in fact need to be read back into working memory prior to being written back out to disk - this is part of the phenomenon known as the RAID 5 "write hole"[1]. Beware, "write hole" is used to describe both a reliability issue and a performance issue - same words, different phenomena. The performance phenomenon is better referred to as a "partial-stripe write". Oracle has a biased (duh!) but technically accurate description of it[2].
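To illustrate the reliability flavour of the write hole, here's a toy example of my own (not from either link): if a crash lands between the data write and the parity write, parity no longer matches the data, and a later disk failure quietly reconstructs garbage:

# Toy illustration of the RAID 5 "write hole" (reliability flavour).
# Made-up 4-byte chunks, same xor() idea as above.

def xor(*chunks):
    out = bytearray(len(chunks[0]))
    for c in chunks:
        for i, b in enumerate(c):
            out[i] ^= b
    return bytes(out)

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor(d0, d1, d2)

# Intended update: d1 -> "XXXX" plus a matching parity update.
d1 = b"XXXX"
# ...crash here: the new parity never reaches disk, old parity survives.

# Later, d0's disk dies and we reconstruct it from the survivors:
reconstructed_d0 = xor(d1, d2, parity)
print(reconstructed_d0)        # not b"AAAA" -- silently wrong data
assert reconstructed_d0 != d0  # parity and data are out of sync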
I think the author read something about RAID 5 partial-stripe writes and misinterpreted it. OTOH, if he's talking about ZFS "RAIDZ", which is analogous to RAID 5, then he's sort-of correct (but still not 100%).
-Adam
[1] http://www.raid-recovery-guide.com/raid5-write-hole.aspx
[2] https://blogs.oracle.com/bonwick/entry/raid_z
On 2015-04-24 Adam Thompson wrote:
I think the author screwed up. The write changes exactly one "stripe", not exactly one disk.
Ya, but (allow me to elaborate) he was trying to make a distinction between 3 different RAID 5 write scenarios:
1. write to exactly one stripe width (so exactly one chunk on each disk)
2. write to 2 to n-1 chunks in the same stripe (write hole)
3. write to just 1 chunk on just 1 disk in 1 stripe
It's #3 that I was describing in my initial email. He seems to think that #3 is somehow different than #2. This author isn't stupid, so I'm just trying to see if there's something I am missing: maybe there is some way to write just 1 chunk and 1 parity chunk without reading the whole stripe (assuming nothing is cached). Normally when an author gets it wrong, you can intuit the thought/editing process and figure out exactly where they made a wrong turn, but in this case I can't make any sense of it at all. How could #3 ever not just be the same as #2?
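In case it helps anyone else follow along, here's my own rough tally (standard textbook update strategies, nothing cached, nothing from the book) of the disk I/Os each scenario needs on an n-disk array; #3 really does look like #2 with k=1:

# Rough disk-I/O tally for one stripe on an n-disk RAID 5 array
# (n-1 data chunks + 1 parity chunk), assuming no cache. My numbers,
# standard textbook strategies, not anything from the book.

def ios_for_write(n, k):
    """Reads/writes needed to update k data chunks in one stripe."""
    if k == n - 1:
        # Scenario 1: full-stripe write -- parity comes from the new
        # data alone, no reads at all.
        return {"reads": 0, "writes": n}
    # Scenarios 2 and 3: partial-stripe write. Pick the cheaper of:
    reconstruct = (n - 1) - k     # read the untouched data chunks
    read_modify = k + 1           # read old data chunks + old parity
    return {"reads": min(reconstruct, read_modify), "writes": k + 1}

n = 5
print(ios_for_write(n, n - 1))  # scenario 1: {'reads': 0, 'writes': 5}
print(ios_for_write(n, 2))      # scenario 2: {'reads': 2, 'writes': 3}
print(ios_for_write(n, 1))      # scenario 3: {'reads': 2, 'writes': 2}
# Scenario 3 is just scenario 2 with k=1: the reads never go away.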
Parity blocks must be recalculated on every write.
Ya, that's what I thought/think.
I just have a strange feeling he's outsmarted all of us and just done a poor job explaining it. But if there was a supersmart algorithm for scenario #3 we'd probably know about it via discussions regarding its implementation in Linux, etc. I guess I can hit the source... unless it's just some lone weird hardware RAID adapter that has this feature. Before I post an erratum I just wanted to make sure I wasn't missing something semi-obvious.
Also, write performance is not necessarily any percentage of a single disk - it depends on the speed of the parity calculations. RAID5 can be anywhere from 0% to infinity% as fast as a single disk depending on the array.
He's assuming parity calc / CPU is irrelevant (mostly true these days) and that slow disk (assuming rust, for simplicity) is the bottleneck. Also, it's a "performance" book, so he has to put numbers on everything. He seems to pull many of them out of thin air. Maybe it's a "rule of thumb" type thing.
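For what it's worth, I can at least see where his 50% would come from if the no-read claim were true: 1 data write + 1 parity write = 2 disk ops to land 1 chunk of user data, i.e. half the throughput of a single disk doing 1 op per chunk. Do the same back-of-envelope math with the usual read-modify-write (2 reads + 2 writes = 4 ops per chunk) and you get about 25%, which is the small-write number you normally see quoted for RAID 5. So the 50% is internally consistent with his premise; it's the premise I can't swallow.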
I think the author read something about RAID 5 partial-stripe writes and misinterpreted it. OTOH, if he's talking about ZFS "RAIDZ", which is analogous to RAID 5, then he's sort-of correct (but still not 100%).
This is years before ZFS (it's an older book) so definitely just plain jane RAID 5.