Look back in the dmesg output for the RAID module speed tests, note which algorithm was selected, and divide that number by 2. There's your theoretical bottleneck on the CPU. Take the minimum sustained disk/channel/controller throughput, factor in interrupt latency, device driver efficiency, etc., and make a rough guess at the overall throughput. Consider that the md code seems to have a lot of write barriers for safety - so even a rebuild will spend much of its time waiting for the disk to sync(). All in all, I think your numbers are probably reasonable. -Adam
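A hedged aside on where those numbers live: on a kernel of that era the checksumming benchmark, the current resync rate, and the md throttling knobs can be inspected roughly like this (the speed_limit values are in KB/s, and raising the minimum is only worth trying if the resync is actually being throttled by competing I/O):

# dmesg | grep -i raid                                        # per-algorithm speed tests and the algorithm chosen
# cat /proc/mdstat                                            # current rebuild speed and ETA
# cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
# echo 50000 > /proc/sys/dev/raid/speed_limit_min             # raise the guaranteed resync floor (KB/s)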
Kevin McGregor kevin.a.mcgregor@gmail.com wrote:
I installed Ubuntu Server 10.04.2 LTS AMD64 on a HP ProLiant ML370 G3 (4 x dual-core/hyperthreaded Xeon 2.66 GHz, 8 GB RAM) and I used the on-board SCSI controller to manage 8 x 300 GB 15K RPM SCSI drives in a software RAID 5 set up as a 7-drive array with 1 hot-spare drive. All drives are the exact same model with the same firmware version.
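As a hedged aside, an array like the one described would typically have been created with something along these lines; the md device name, member device names, and chunk size here are assumptions rather than details taken from the actual system:

# mdadm --create /dev/md0 --level=5 --raid-devices=7 --spare-devices=1 \
        --chunk=256 /dev/sd[b-i]1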
It's currently rebuilding the array (because I just created it), and /proc/mdstat is reporting "finish=165.7min speed=25856K/sec". Does that sound "right", in the sense that it's the right order of magnitude? I thought it should be higher, but I haven't set up such an array before, so I don't have anything to compare it to.
If it's slow, does anyone have a suggestion for speeding it up?
Kevin
raid6: using algorithm sse2x2 (2883 MB/s). So, 1% of that is reasonable? :-) Oh well, I guess I can wait until tomorrow for the rebuild to finish.
Well, I did say there was more than one variable involved! 25 MB/sec is a bit slow, but I don't know how efficient that LSI chip is at managing bus contention, or how efficient the kernel driver for that chip is. I do know from experience that 8-disk arrays are well into the land of diminishing returns from a speed perspective: RAID-1 on two disks or RAID-10 on four seems to be the sweet spot for speed.
Once your rebuild has finished, I would recommend doing some throughput tests (both reading and writing) on the array; something perhaps like:
# sync; time sh -c 'dd if=/dev/zero of=/mnt/raidarray/BIGFILE.ZERO bs=1M count=1024; sync'
followed by
# time dd if=/dev/md0 of=/dev/null bs=1M count=1024
Those are both very naïve approaches, but should give you a feel for the maximum read and write speeds of your array. I strongly suspect those numbers will be much higher than the raid re-sync rate, again, mostly due to write-flush barriers in the md code.
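A hedged variant of the same tests, assuming GNU dd: conv=fdatasync makes dd flush before reporting the write rate, and dropping the page cache keeps the read pass from being served out of RAM instead of off the disks.

# dd if=/dev/zero of=/mnt/raidarray/BIGFILE.ZERO bs=1M count=1024 conv=fdatasync
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/mnt/raidarray/BIGFILE.ZERO of=/dev/null bs=1M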
I’m interested in knowing how this ends, personally… please let us know.
-Adam
(P.S. does anyone know how to avoid top-posting in Outlook 2010?)
Instead of /dev/zero, use another device to read from blindly; there is no guarantee /dev/zero will not produce a sparse file.
Generate a realistic data file from a source volume (say the boot volume, or a well-used/full USB key or drive). A better data set means better entropy, which keeps caching from ruining the results.
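A hedged sketch of that; <bootdisk> is a placeholder for whatever source disk is handy, and /mnt/raidarray is just the mount point from the earlier example:

# dd if=/dev/<bootdisk> of=/mnt/raidarray/realistic.bin bs=1M count=4096   # ~4 GB of real, already-mixed data
# sync; echo 3 > /proc/sys/vm/drop_caches                                  # so the read pass isn't served from cache
# dd if=/mnt/raidarray/realistic.bin of=/dev/null bs=1M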
I'm interested in the results too!
So far I made this:

md_d0 : active raid10 sdn1[12](S) sdm1[13](S) sdl1[11] sdk1[10] sdj1[9] sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      1757804544 blocks 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]

and got this:

# dd if=/dev/zero of=/srv/d0/bigzerofile bs=1M count=32768
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 281.034 s, 122 MB/s

# dd of=/dev/null if=/srv/d0/bigzerofile bs=1M
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 126.21 s, 272 MB/s
I'm wondering if 12 drives would over-saturate one Ultra-320 channel. Doesn't Ultra-320 top out at a theoretical 320 MB/s per channel? I could try setting up a stripe set/RAID0 of varying numbers of drives and compare. What do you think?
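A hedged sketch of that kind of comparison; the device names are placeholders, and creating a stripe set this way would of course destroy whatever is currently on those partitions:

# mdadm --create /dev/md9 --level=0 --raid-devices=2 /dev/sda1 /dev/sdb1
# dd if=/dev/md9 of=/dev/null bs=1M count=8192    # sequential read off a 2-drive stripe
# mdadm --stop /dev/md9
# mdadm --create /dev/md9 --level=0 --raid-devices=4 /dev/sd[a-d]1
# dd if=/dev/md9 of=/dev/null bs=1M count=8192    # if this doesn't scale up, the channel is likely the limit
# mdadm --stop /dev/md9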
I don't think the external enclosure (HP MSA30?) allows for splitting the drives into two groups. Only one cable can be connected to it, although there may have been an option for a second at purchase.
My RAID6 8x 2TB-drive SATA XFS gives:
# dd if=/dev/zero of=/new/test bs=1M count=32768
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 152.032 s, 226 MB/s

# dd of=/dev/null if=/new/test bs=1M
32768+0 records in
32768+0 records out
34359738368 bytes (34 GB) copied, 57.7027 s, 595 MB/s   (wow!!)
During the whole write time both CPU cores were 90-100%, mostly 95-99%! Glad to see RAID/XFS code is multi-core aware. For reading it was 1 core at 100% and the other around 20%. The write limiting factor appears to be my piddly Pentium D on my file server. Still, this is 3-4X the speed my old (1TB drives, crappy PCI SATA cards) array was giving me. The read is quite interesting in that the 100% CPU indicates it is probably doing parity checks on every read.
I think a big part of the good speed is my new 8-port SATA card, an Intel PCI-Express x8 card in an x8 slot. If your SCSI controller is plain PCI, then the PCI bus's MB/s ceiling is what's killing you. Even PCI-X may be limiting. And the Intel card was pretty cheap, under $200.
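A hedged way to check what the controller is actually sitting on, assuming lspci is available; the bus address below is just an example:

# lspci | grep -i -e scsi -e raid    # find the controller and note its bus address
# lspci -vv -s 06:01.0               # PCIe cards report LnkCap/LnkSta (link width and speed);
                                     # plain PCI and PCI-X cards report the bus capability instead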
BTW, I got stuck with two spare SATA card expander cables (1 card port to 4 SATA drives) if anyone wants some cheap. I can get in the Intel cards too, if anyone wants a complete package.
Yup, a separate channel for each drive plus having them all connected to a PCIex8 controller would make a big difference.
For yet another comparison, my RAID10 6x 750 GB SATA XFS gives:

# dd if=/dev/zero of=bigfile bs=1M count=16384
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 79.5334 s, 216 MB/s

# dd of=/dev/null if=bigfile bs=1M
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 31.4869 s, 546 MB/s
Maybe I should switch to RAID6. ;-)
Kevin
Okay, here's (maybe) the last word on my RAID issues at work, if anyone's still reading these. :)
My RAID5 array is 8x 300 GB 15K RPM U320 drives, 7 active with 1 hot spare:

md_d1 : active raid5 sdu1[6] sdv1[7](S) sdt1[5] sds1[4] sdr1[3] sdq1[2] sdp1[1] sdo1[0]
      1757804544 blocks level 5, 256k chunk, algorithm 2 [7/7] [UUUUUUU]
Just one drive from this array shows:

/dev/sdr:
 Timing cached reads:   1438 MB in 2.00 seconds = 718.54 MB/sec
 Timing buffered disk reads:  384 MB in 3.00 seconds = 127.97 MB/sec
/srv/d1# dd if=/dev/zero of=bigtestfile bs=2M count=16384
16384+0 records in
16384+0 records out
34359738368 bytes (34 GB) copied, 414.181 s, 83.0 MB/s   [writing]

/srv/d1# dd of=/dev/null if=bigtestfile bs=2M
16384+0 records in
16384+0 records out
34359738368 bytes (34 GB) copied, 126.176 s, 272 MB/s   [reading]
My RAID10 array is 14x 300 GB 10K RPM U320 drives, 12 active with two hot spares:

md_d0 : active raid10 sdn1[12](S) sdm1[13](S) sdl1[11] sdk1[10] sdj1[9] sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
      1757804544 blocks 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]
Just one drive from this array shows:

/dev/sdl:
 Timing cached reads:   1440 MB in 2.00 seconds = 719.94 MB/sec
 Timing buffered disk reads:  254 MB in 3.00 seconds = 84.63 MB/sec
/srv/d0# dd if=/dev/zero of=bigtestfile bs=2M count=16384
16384+0 records in
16384+0 records out
34359738368 bytes (34 GB) copied, 280.52 s, 122 MB/s   [writing]

/srv/d0# dd of=/dev/null if=bigtestfile bs=2M
16384+0 records in
16384+0 records out
34359738368 bytes (34 GB) copied, 126.134 s, 272 MB/s   [reading]
Or one could say that reading is the same speed from either array, and pretty much at the maximum practical for an Ultra-320 bus; writing is faster (for my configuration) by 50% on the RAID10 array as compared to the RAID5 array, despite the RAID5 array sporting drives which are ~50% faster (15K RPM vs. 10K RPM).
Curiously, copying the 'bigtestfile' from d0 to d1 or d1 to d0 results in ~80 MB/s either way. I can't think of an explanation off the top of my head.
There is a lot more testing which could be done, and I'm not saying one configuration is better than the other. I think I may leave it as is for now.
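If anyone wants to take the testing further, a hedged sketch of a mixed random-I/O run with fio; the directory, size, runtime, and job count here are placeholders:

# fio --name=randrw --directory=/srv/d0 --rw=randrw --rwmixread=70 \
      --bs=4k --size=4G --numjobs=4 --ioengine=libaio --direct=1 \
      --runtime=60 --time_based --group_reporting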
Kevin