On 2016-04-11 Trevor Cordes wrote:
I'm getting bizarrely low FS performance.
262144000 bytes (262 MB) copied, 26.1413 s, 10.0 MB/s
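(That figure came from a dd write test; the exact invocation was given earlier in the thread, but it would have been something along these lines, with the output path being just an example. 250 * 1MiB is exactly the 262144000 bytes shown above:)

  # write 250 MiB and fdatasync before reporting, so the number reflects the disk, not the page cache
  dd if=/dev/zero of=/root/ddtest bs=1M count=250 conv=fdatasync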
My only remaining thoughts are a firmware problem, a BIOS setting problem, or a cable problem. I need to go onsite to check all three.
Problem was...
Drum roll please.....
cable!
But there are lessons to learn here. First, after looking long and hard at the cable that came with the Intel server board (the one I had used), I concluded it was probably only a 3Gb/s cable. They supplied two weird cables, each of which was two SATA cables taped together. Since this is a one-year-old board and no other cables were included, I figured they had given me 6Gb/s cables. Nah, let's confuse the system builders and ship no 6Gb/s cables with this board!
Second mistake: I had plugged this 3Gb/s cable into the two 6Gb/s ports. I think the rust drive is too old to be 6Gb/s, but the SSD surely is.
Here's where it gets interesting: apparently SATA doesn't autodetect cable capability the way IDE did with 40- vs 80-conductor cables. I know this because SMART confirmed the link was still running in 6Gb/s mode over the bad cable. I'm not terribly surprised, but still, one would have hoped it would auto-negotiate down to *the cable's capacity*. Lesson learned.
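(For anyone who wants to check the same thing: smartctl from smartmontools reports the negotiated link speed, and SMART attribute 199 counts link CRC errors, which should climb on a bad cable. The device name here is just an example:)

  # current vs. supported link speed, e.g. "SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)"
  smartctl -i /dev/sda | grep -i 'SATA Version'
  # CRC errors on the link; a marginal cable usually makes this counter grow
  smartctl -A /dev/sda | grep -i CRC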
Even weirder is that this worked at all, producing a relatively stable, non-data-corrupting setup with a consistent 7-14MB/s. It's like it was shooting electrons down the wire at way too high a speed and only the odd one would get through OK. It must be the per-frame CRC checking and retries on the SATA link that were saving the day: garbled frames get detected and retransmitted, so you lose speed instead of data. Must be robust! I actually find this quite amusing.
Stranger still was the asymmetric wonkiness: my read tests were showing ~400MB/s while writes were still ~10MB/s. Huh? Still puzzled on that one. Maybe the drive speaks to the controller with slightly more voltage than the controller does, due to manufacturing tolerances? Who knows. Or maybe it's down to the cable itself: SATA uses separate differential pairs for each direction, so damage to just the host-to-drive pair could plausibly wreck writes while leaving reads intact, say if one pair runs along the outside edge of the cable and took more abuse.
Finally, I think I've guessed why the write test to /boot, which was a non-degraded RAID-1 (1 SSD + 1 rust), was faster than the test to just / (1 SSD): dd with fdatasync on top of the RAID-1 layer must have been waiting for the RAID layer to say things were synced, and the RAID layer must have been satisfied once only *one* drive completed. That would make sense, though in my mind I would have thought it would demand both be synced. (As far as I know, stock md RAID-1 does wait for every member, unless one is flagged write-mostly with a write-behind window, in which case writes complete as soon as the fast member has them; maybe that's what's happening here.) Just a guess.
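(Checking that guess is easy for an md array; /dev/md0 is just an example name. /proc/mdstat marks write-mostly members with a (W):)

  # write-mostly members show up as e.g. "sdb1[1](W)"
  cat /proc/mdstat
  # full array state, member roles and flags
  mdadm --detail /dev/md0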
At the end of the day I'm now getting 400MB/s reads on big files, and 500MB/s writes, using the previously discussed tests. Woohoo!
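(For completeness, the read side of such a test usually goes something like this: drop the page cache first so you're measuring the disk and not RAM. Paths are just examples:)

  # drop cached pages so the read actually hits the disk (needs root)
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/some/big/file of=/dev/null bs=1M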