On 2020-07-31 Hartmut W Sager wrote:
Oops, *sorry Gilbert*, I looked at this thread again, and it was *your* position on checksums that I'm supporting. Maybe I have some bad/failing Chinese capacitors in my head. :)
Your creator should have sprung for the 5c better caps! ;-)
Looks like you guys are right. TCP only has a 16-bit checksum: a simple one's-complement sum, then complemented, over the TCP header, the payload, and a pseudo-header borrowed from the IP header.
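To see why that kind of sum lets some corruption through, here's a minimal Python sketch of the Internet checksum in the style of RFC 1071. The function name is mine, and it skips the TCP pseudo-header. Because addition doesn't care about order, swapping two 16-bit words, or flipping bits in two words so the changes cancel, leaves the checksum unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum, RFC 1071 style (no pseudo-header)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry keeps it 16 bits
    return ~total & 0xFFFF                         # one's complement of the sum


# Two 16-bit words swapped in transit: the checksum can't tell.
print(internet_checksum(b"\x12\x34\x56\x78") == internet_checksum(b"\x56\x78\x12\x34"))

# Two bits flipped in different words (0x0003,0x0000 -> 0x0002,0x0001): still the same sum.
print(internet_checksum(b"\x00\x03\x00\x00") == internet_checksum(b"\x00\x02\x00\x01"))
```

A CRC, by contrast, weights each bit by its position, so neither of those two errors would cancel out.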
One post quotes Microsoft as saying (paraphrased): "Basically transmit 100MB+ over a typical Internet connection and you are very likely to see a silent failure."
I don't know about that! But, yes, even if only 1 error slips through per 1GB of TCP traffic, that's pretty awful to contemplate.
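For a rough sense of scale, here's a back-of-envelope Python sketch. Every input is hypothetical: the 1460-byte payload is just a common size on a 1500-byte MTU link, the corruption fraction is made up, and the 1-in-65536 miss rate assumes corruption looks uniformly random, which real errors often don't.

```python
def expected_undetected(total_bytes: int, payload_per_segment: int,
                        corrupt_fraction: float, miss_rate: float = 1 / 65536) -> float:
    """Expected corrupted segments that still pass the 16-bit TCP checksum.

    Assumes (hypothetically) that corruption is independent per segment and
    that the checksum misses roughly 1 in 2**16 corrupted segments.
    """
    segments = total_bytes / payload_per_segment
    return segments * corrupt_fraction * miss_rate


gib = 1 << 30     # 1 GiB transferred
payload = 1460    # hypothetical TCP payload per segment
corrupt = 1e-4    # hypothetical: 1 in 10,000 segments corrupted past the link-layer CRC
print(f"~{expected_undetected(gib, payload, corrupt):.4f} undetected errors per GiB")
```

Working backwards under the same uniform-error assumption, 1 undetected error per GiB would need roughly 9% of segments to arrive corrupted. That's one reason to be skeptical, though non-random error patterns can make the checksum do worse than this.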
I guess rsync detects/corrects for this automatically, but unfortunately Gilbert is seeing errors in the ssh wrapper layer, in a place where ssh is sensitive to errors and barfs instead of retrying. It would almost be better if ssh just passed the junk up to rsync and let rsync deal with it.
Gilbert, to confirm, your bug hits after you have transferred lots of data, right? It's not giving this error right at the beginning upon connection, right? Do you have any stats on approx how much data goes across each time before the error hits? Is it consistent or all over the map?