It may be worth having a look at the switch interface statistics for that server. Such repeatable errors must be showing up somewhere other than the servers. Also, can you replicate the problem locally?
On Aug 4, 2020, at 12:55, Adam Thompson <athompso@athompso.net> wrote:
It's bizarre that you're getting this so consistently - between them, the Ethernet FCS and the TCP checksum should trigger TCP retries long before the errors percolate up to the application layer. I can't remember: did you already try disabling HW offload on both the sending and receiving ends? (Either end could trigger the SSH abort.)
-Adam
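For anyone following along, here is a rough sketch of what those two layers of checking actually compute. It is only an illustration, not anything from the affected hosts: the payload is made-up sample data, and `internet_checksum` is a hypothetical helper written for this example. `zlib.crc32` uses the same CRC-32 polynomial as the Ethernet FCS. The point is that the 16-bit TCP checksum is a simple sum, so it cannot see reordered 16-bit words, while the CRC can.

```python
import struct
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement sum of the kind TCP uses (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


# Made-up sample data; any even-length payload would do.
payload = b"rsync test payload bytes"

# Swap the first two 16-bit words: the bytes change, the sum does not.
reordered = payload[2:4] + payload[0:2] + payload[4:]

same_tcp_checksum = internet_checksum(payload) == internet_checksum(reordered)
same_crc32 = zlib.crc32(payload) == zlib.crc32(reordered)

print("TCP-style checksum unchanged:", same_tcp_checksum)
print("CRC-32 unchanged:", same_crc32)
```

If the FCS is regenerated somewhere along the path after the data has already been damaged (for example by a faulty NIC or offload engine), only the weaker sum is left to catch it, which is one reason the offload question is worth answering.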
On 2020-08-04 11:14, Gilbert E. Detillieux wrote:
Just FYI, I had an rsync fail just a while ago, about 17 seconds (and 253 MB) into a file transfer. Same error on the remote side. After restarting, it died again, this time after 3m07s (and 2.6 GB). So, it is fairly random, yet consistent! :P
Gilbert
On 2020-08-04 9:56 a.m., Gilbert E. Detillieux wrote:
On 2020-07-31 11:42 p.m., Trevor Cordes wrote:
Gilbert, to confirm, your bug hits after you have transferred lots of data, right? It's not giving this error right at the beginning upon connection, right? Do you have any stats on approx how much data goes across each time before the error hits? Is it consistent or all over the map?
Yes, these are on large file transfers. Yes, they are occasional, and random. But I had to restart a large (41-ish GB?) file transfer 3 times last week due to repeated errors. A typical nightly backup results in 160 GB or so to transfer, and lately, the rsync fails (somewhere along the way) more often than not.
Gilbert
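Until the root cause is found, the manual restarts described above could be automated with something like the sketch below. This is a workaround, not a fix, and it is built on assumptions: the rsync options actually used for the nightly backup are not given in this thread, so `--archive` and `--partial` are only illustrative, and `rsync_with_retries`, its parameters, and the paths in the usage note are hypothetical names made up for this example.

```python
import subprocess
import time


def rsync_with_retries(src, dest, max_attempts=5, delay_s=30, run=subprocess.run):
    """Re-run rsync until it exits 0 or the attempts run out.

    Returns the number of attempts used on success, or None on failure.
    The `run` parameter exists only so the function can be tested
    without invoking rsync.
    """
    # --partial keeps partially transferred files, so a restart can
    # reuse what already arrived instead of starting from scratch.
    cmd = ["rsync", "--archive", "--partial", src, dest]
    for attempt in range(1, max_attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_s)  # brief pause before trying again
    return None
```

Usage would look something like `rsync_with_retries("/data/", "backuphost:/backup/data/")`, with both paths being placeholders. Logging how far each attempt got before failing would also help answer the "consistent or all over the map" question.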
Roundtable mailing list Roundtable@muug.ca https://muug.ca/mailman/listinfo/roundtable