I don't run any workload as heavy as this one, but I would not be surprised if that fixes your problem for good. I had quite a few issues with Intel NICs on Linux, especially with the 3.x kernel, that were solved by disabling GSO/GRO/TSO.
Kind regards,
Alberto Abrao 204-202-1778 204-558-6886 www.abrao.net
On 2020-08-10 10:57 a.m., Gilbert E. Detillieux wrote:
On 2020-08-04 1:28 p.m., Gilbert E. Detillieux wrote:
On 2020-08-04 12:55 p.m., Adam Thompson wrote:
I can't remember, did you try disabling HW offload on both sending and receiving ends already? (Either end could trigger the SSH abort.)
I hadn't yet. (I was trying a few other things first, such as changing MAC algorithms, and rebooting with the older kernel, neither of which seemed to affect things.)
I've now disabled both rx and tx checksum offloading. We'll see if that makes a difference.
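For anyone wanting to try the same change, here is a minimal sketch of how the offload settings could be switched off with `ethtool -K`. The interface name `eth0` and the helper names are assumptions for illustration, not details from this thread; the script only prints the command unless `dry_run=False` is passed, and actually applying it needs root. Settings changed this way do not persist across a reboot unless they are also added to the distribution's network configuration.

```python
import subprocess


def ethtool_offload_cmd(interface, features, state="off"):
    """Build the argv for `ethtool -K <interface> <feature> <state> ...`.

    `features` uses ethtool's short names, e.g. "rx" and "tx" for
    checksum offload, or "gso", "gro" and "tso" for segmentation offloads.
    """
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    argv = ["ethtool", "-K", interface]
    for feature in features:
        argv += [feature, state]
    return argv


def apply_offload(interface, features, state="off", dry_run=True):
    """Print the command (dry run) or execute it (requires root)."""
    argv = ethtool_offload_cmd(interface, features, state)
    if dry_run:
        print(" ".join(argv))
    else:
        subprocess.run(argv, check=True)
    return argv
```

Calling `apply_offload("eth0", ["rx", "tx"])` prints the command for disabling rx and tx checksum offloading; the same helper with `["gso", "gro", "tso"]` covers the segmentation offloads mentioned earlier in the thread. Running `ethtool -k eth0` (lowercase k) afterwards lists the current settings.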
So, after almost 6 days running with rx and tx checksum offloading disabled, not a single "Corrupted MAC on input" error! My overnight rsync now runs to completion.
I hope this isn't premature, but I think we found the problem! (Who would have thought it could make such a difference?!)
It also doesn't seem to have caused a noticeable performance hit. I'm thinking we were disk I/O bound on the remote (receiving) end, anyway, so if the network I/O is a bit slower, we wouldn't see it.
Thanks, everyone, for your suggestions!
Gilbert