On 20 Feb, Bill Reid wrote:
What is odd is that the problem started immediately after a Shaw network upgrade. Of course I have been unable to get a handle on what the upgrade entailed.
Mine started a few months ago. Charleswood area, N of Roblin between Perim and Charleswood Rd. If they upgraded my area a few months ago, when my problem started, and upgraded your area recently, when yours started, then their upgrade most likely has something to do with this.
The reason I think it is a Linux issue is that the connection comes back when the interface is reset. Since I am not using DHCP, nothing is happening on the Shaw side. Of course, that still leaves the question of what causes the interface to shut down. Because it only happens on the Internet side, I am assuming that Shaw is perhaps doing some kind of polling, for example multicast, which causes Linux to choke. Why Linux
I didn't do too much testing when it went down before. Ping failed, but I don't think I tried pinging my next hop.
starts up on its own after 1-2 hours is also a puzzle.
I think mine fixed itself a few times before I wrote the script. But some days I'd notice it had been down from 2am-6am or something, and I'd wake up and have to manually ifdown/ifup. So I don't think you can rely on it auto-fixing.
Watching how often my script emails me the past couple of weeks, this problem seems to occur quite often!
Thu Feb 8 04:09:05 2007
Thu Feb 8 11:06:32 2007
Fri Feb 9 04:03:12 2007
Fri Feb 9 16:08:18 2007
Mon Feb 12 12:09:24 2007
[ Feb 13, lots of outages, unrelated ]
Fri Feb 16 02:02:51 2007
Fri Feb 16 09:55:32 2007
Fri Feb 16 14:53:50 2007
Fri Feb 16 17:05:44 2007
Fri Feb 16 17:06:56 2007
Fri Feb 16 17:08:05 2007
Sat Feb 17 18:09:45 2007
Mon Feb 19 18:59:04 2007
Tue Feb 20 02:42:52 2007
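To make clusters easier to spot, the times can be tallied per day. This is only a rough sketch in Python: the function name restarts_per_day is made up for illustration, and it assumes the timestamps keep the exact format shown above.

```python
from collections import Counter
from datetime import datetime

# Restart times copied from the list above (the unrelated Feb 13 outages are left out).
RESTARTS = """\
Thu Feb 8 04:09:05 2007
Thu Feb 8 11:06:32 2007
Fri Feb 9 04:03:12 2007
Fri Feb 9 16:08:18 2007
Mon Feb 12 12:09:24 2007
Fri Feb 16 02:02:51 2007
Fri Feb 16 09:55:32 2007
Fri Feb 16 14:53:50 2007
Fri Feb 16 17:05:44 2007
Fri Feb 16 17:06:56 2007
Fri Feb 16 17:08:05 2007
Sat Feb 17 18:09:45 2007
Mon Feb 19 18:59:04 2007
Tue Feb 20 02:42:52 2007
""".splitlines()


def restarts_per_day(lines):
    """Count how many restarts fall on each calendar date."""
    counts = Counter()
    for line in lines:
        stamp = datetime.strptime(line.strip(), "%a %b %d %H:%M:%S %Y")
        counts[stamp.date()] += 1
    return counts


if __name__ == "__main__":
    for day, n in sorted(restarts_per_day(RESTARTS).items()):
        print(day.isoformat(), n)
```

Days with several restarts close together stand out immediately in the per-day counts.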
The times (like Feb 16) when it's down for a while are probably due to other Shaw outages. Also, my script may be imperfect in how long it waits for replies and how many tests it runs.
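For anyone who wants to tune the patience and the number of tests, here is a minimal sketch of that kind of check in Python. It is not the script discussed in this thread. The function names, default values, and the idea of probing more than one host are all assumptions, and the ping flags (-c, -W) are the Linux iputils ones.

```python
import subprocess
import time


def ping_once(host, timeout_s):
    """Return True if one ICMP echo to host is answered within timeout_s seconds."""
    # Linux iputils ping: -c 1 sends a single packet, -W is the reply timeout in seconds.
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def link_is_down(hosts, attempts=3, timeout_s=5, pause_s=2,
                 probe=ping_once, sleep=time.sleep):
    """Declare the link down only if every host fails on every attempt."""
    for attempt in range(attempts):
        if any(probe(host, timeout_s) for host in hosts):
            return False  # one reply is enough to call the link up
        if attempt < attempts - 1:
            sleep(pause_s)  # wait a little before the next round of tests
    return True


def restart_interface(iface):
    """Bounce the interface with ifdown/ifup (needs root)."""
    subprocess.run(["ifdown", iface], check=False)
    subprocess.run(["ifup", iface], check=True)
```

Run from cron or a loop, something like "if link_is_down([next_hop, outside_host]): restart_interface(iface)" would do the job, where the two addresses and the interface name are whatever applies to your setup. Testing the next hop as well as an outside address should help tell a hung interface apart from an outage further upstream.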
I was going to drop in a D-Link router, but before doing that I think I will try running Wireshark with a ring buffer and, using your script, kill Wireshark before restarting the interface. Hopefully this would show the last packet that hit the interface and provide some clue.
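A ring-buffer capture like that could be driven from a script roughly as follows. This is a sketch only: it assumes dumpcap (the capture tool shipped with Wireshark) is installed and uses its -b filesize: and -b files: ring-buffer options. The function names, file sizes, and file count are made up.

```python
import subprocess


def ring_capture_cmd(iface, out_file, file_kb=1024, num_files=5):
    """Build a dumpcap command line that captures into a ring of files."""
    return [
        "dumpcap",
        "-i", iface,                   # interface to capture on
        "-w", out_file,                # base name for the capture files
        "-b", f"filesize:{file_kb}",   # switch to the next file after this many kB
        "-b", f"files:{num_files}",    # overwrite the oldest file after this many files
    ]


def start_capture(iface, out_file):
    """Start the capture in the background and return the process handle."""
    return subprocess.Popen(ring_capture_cmd(iface, out_file))


def stop_capture(proc, wait_s=10):
    """Ask the capture to stop, and force it if it does not exit in time."""
    proc.terminate()  # sends SIGTERM
    try:
        proc.wait(timeout=wait_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
```

The watchdog would call stop_capture() just before bouncing the interface, so the newest file in the ring ends with the last packets seen before the restart.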
Keep in mind that routers may have some funky intelligence much like my script and may auto-restart interfaces that appear hung, without ever letting you know. That is why they may appear to solve the problem. XP may do something similar, which would explain why all Shaw customers aren't going crazy with this problem.
I think I will integrate some of Sean's ideas into my script and log it all. I'll post my updated script here. I'll also post the log results (or a link to them) once I get another hit.