I am seeing a problem with Shaw Business Internet that has me puzzled.
I have a Linux server connected to the Shaw DOCSIS cable modem. It has been running without problems until Shaw did some "network upgrades" on Jan 11. Since then the Internet goes away for 1-2 hours every day or so. Initially I thought it was Shaw's problem and they sent techs out twice. They adjusted the signal levels but the problem still persisted. Power cycling the modem does not get the interface working. Doing a network restart on Linux does get it going again. The box has two interfaces and the problem only occurs with the Internet interface. I have swapped the interface without any change. The network interfaces also use different drivers.
When the interface is down Shaw can still connect to the modem. It would appear that the stopping and starting of the interface is a Linux issue.
I remember a discussion on the list a while ago about a similar problem with the thought that it was related to Shaw's DHCP server. I am using a static IP so maybe the problem is something else.
The symptoms would suggest that some network activity is causing the Ethernet to shut down. How it gets going again is not clear. There are no messages in syslog when the interface stops or starts.
My next thing to try is to use something like a Linksys router.
-- Bill
On 19 Feb, Bill Reid wrote:
... They adjusted the signal levels but the problem still persisted. Power cycling the modem does not get the interface working. Doing a network restart on Linux does get it going again. The box has two interfaces and the problem only occurs with the Internet interface.
Yes, this is the exact same problem myself and a few others have reported here. On my end, it's gotten worse. Nearly every day it goes down. It's so bad I wrote a cheesy script run from inittab, see below. Seems to work well.
Even if you're static, you could still be handled by DHCP. So you can't rule that out.
Interesting that you suggest there's a possible linux-side aspect to this... I suppose there could have been some obscure bug introduced into the kernel or dhclient code a few revs ago. I'm on FC5, newest updates. Still, I'm more confident the problem lies with Shaw.
I have 2-5 interfaces on my boxes and, as you said, it's only the external Shaw iface that has the problem.
Maybe if enough of us networking gurus (not your average Shaw customers!) put our complaints to Shaw someone will actually look into this.
# cat /usr/local/sbin/internet-keep-up
#!/bin/bash
#
# hack b/c Shaw's dhcp is farked up and hoses me

# don't thrash on initial boot, easiest to just sleep a long time
sleep 300

while :
do
    # try a hostname first, then fall back to a raw IP
    ping -c 1 -i 3 -w 15 anothershawdomain.ca >/dev/null 2>&1 || {
        ping -c 1 -i 3 -w 15 130.179.16.8 >/dev/null 2>&1 || {
            # could try more pings before declaring downage
            ifdown eth0 >/dev/null 2>&1
            sleep 2
            ifup eth0 >/dev/null 2>&1
            sleep 2
            # insert other housekeeping here
            mail -s 'internet-keep-up: had to restart internet' yourself@you.ca </dev/null >/dev/null 2>&1
        }
    }
    # wait before the next check, whether or not the pings succeeded
    sleep 30
done
Trevor Cordes wrote:
On 19 Feb, Bill Reid wrote:
Interesting that you suggest there's a possible linux-side aspect to this... I suppose there could have been some obscure bug introduced into the kernel or dhclient code a few revs ago. I'm on FC5, newest updates. Still, I'm more confident the problem lies with Shaw.
I have 2-5 interfaces on my boxes and, as you said, it's only the external Shaw iface that has the problem.
I am also on FC5 with the latest updates.
What is odd is that the problem started immediately after a Shaw network upgrade. Of course I have been unable to get a handle on what the upgrade entailed.
I think it is a Linux issue because resetting the interface on the Linux side gets it going again. Since I am not using DHCP, nothing is happening on the Shaw side. Of course, that leaves the question of what causes the interface to shut down. Because it only happens on the Internet side, I am assuming that Shaw is perhaps doing some kind of polling (multicast, for example) that causes Linux to choke. Why Linux starts up on its own after 1-2 hours is also a puzzle.
I was going to drop in a D-Link router, but before doing that I will try running wireshark with a ring buffer and, using your script, kill wireshark before restarting the interface. Hopefully this will show the last packet that hit the interface and provide some clue.
Thanks, Bill
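A minimal sketch of the ring-buffer capture idea, using tcpdump in place of wireshark (a swapped-in tool: its -C and -W options together rotate through a fixed set of files). The interface name eth0 comes from the script in this thread; the function names, the /var/tmp/shaw-ring path and the sizes are assumptions for illustration only.

```shell
# Hypothetical helpers; names, path and sizes are illustrative assumptions.
CAP_PREFIX=${CAP_PREFIX:-/var/tmp/shaw-ring}

start_capture() {
    iface=${1:-eth0}
    # -C 10: start a new file after roughly 10 million bytes
    # -W 5:  keep at most 5 files, overwriting the oldest (ring buffer)
    # -n:    no name lookups, which would stall once the link is dead
    tcpdump -n -i "$iface" -C 10 -W 5 -w "$CAP_PREFIX" >/dev/null 2>&1 &
    CAP_PID=$!    # PID of the background tcpdump
}

stop_capture() {
    # Stop the capture before ifdown/ifup so the newest file ends with
    # the last packets seen before the restart.
    if [ -n "$CAP_PID" ]; then
        kill "$CAP_PID" 2>/dev/null || true
        wait "$CAP_PID" 2>/dev/null || true
    fi
    CAP_PID=
}
```

In a watchdog like the one posted earlier, start_capture would run once before the loop and stop_capture just before the ifdown/ifup pair, followed by another start_capture once the interface is back.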
On 20 Feb, Bill Reid wrote:
What is odd is that the problem started immediately after a Shaw network upgrade. Of course I have been unable to get a handle on what the upgrade entailed.
Mine started a few months ago. Charleswood area, N of Roblin between Perim and Charleswood Rd. If they did that area a few months ago, and my problem started, then did your area recently, and your problem started, then their upgrade most likely has something to do with this.
I think it is a Linux issue because resetting the interface on the Linux side gets it going again. Since I am not using DHCP, nothing is happening on the Shaw side. Of course, that leaves the question of what causes the interface to shut down. Because it only happens on the Internet side, I am assuming that Shaw is perhaps doing some kind of polling (multicast, for example) that causes Linux to choke.
I didn't do too much testing when it went down before. ping failed, but I don't think I tried pinging my next hop.
Why Linux starts up on its own after 1-2 hours is also a puzzle.
I think mine fixed itself a few times before writing the script. But some days I'd notice it'd been down from 2am-6am or something and I'd wake up and have to manually ifdown/ifup. So I don't think you can rely on it auto-fixing.
Watching how often my script emails me the past couple of weeks, this problem seems to occur quite often!
Thu Feb 8 04:09:05 2007
Thu Feb 8 11:06:32 2007
Fri Feb 9 04:03:12 2007
Fri Feb 9 16:08:18 2007
Mon Feb 12 12:09:24 2007
[ Feb 13, lots of outages, unrelated ]
Fri Feb 16 02:02:51 2007
Fri Feb 16 09:55:32 2007
Fri Feb 16 14:53:50 2007
Fri Feb 16 17:05:44 2007
Fri Feb 16 17:06:56 2007
Fri Feb 16 17:08:05 2007
Sat Feb 17 18:09:45 2007
Mon Feb 19 18:59:04 2007
Tue Feb 20 02:42:52 2007
The times (like Feb 16) when it's down for a while are probably due to other Shaw outage issues. Also, my script may be imperfect in how long it waits for replies and how many tests it runs before declaring an outage.
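One way the patience question could be handled, sketched under assumptions: declare the link down only after several consecutive failed probes. The function name link_is_down and its argument order are hypothetical and not part of the script posted earlier.

```shell
# Hypothetical helper: returns 0 only if the probe command fails on
# every one of TRIES attempts.
# usage: link_is_down TRIES PAUSE_SECONDS COMMAND [ARGS...]
link_is_down() {
    tries=$1
    pause=$2
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Any single success means the link is up; stop probing.
        if "$@" >/dev/null 2>&1; then
            return 1
        fi
        i=$((i + 1))
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$tries" ]; then
            sleep "$pause"
        fi
    done
    return 0
}
```

A call such as `link_is_down 3 5 ping -c 1 -w 15 130.179.16.8` would then gate the ifdown/ifup step, so a single lost ping no longer triggers a restart.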
I was going to drop in a D-Link router, but before doing that I will try running wireshark with a ring buffer and, using your script, kill wireshark before restarting the interface. Hopefully this will show the last packet that hit the interface and provide some clue.
Keep in mind that routers may have some funky intelligence much like my script and may auto-restart interfaces that appear hung, without ever letting you know. That is why they may appear to solve the problem. XP may do something similar, which would explain why all Shaw customers aren't going crazy with this problem.
I think I will integrate some of Sean's ideas into my script and log it all. I'll post my updated script here. I'll also post the log results (or a link to them) once I get another hit.
I have a friend who lives a few blocks from me in the same neck of the woods. He was using a Linksys version 5 router and had problems similar to the ones you are experiencing. He was using DHCP, as it's residential. His connection would get terribly slow until he rebooted his router (it would stay up fine for a day or two max), which leads one to think it's a router thing. Upgrading to current images and checking the settings etc. made no change ... but I guess anything's possible. He stuck in a TRENDnet router (the tew43*BRP* type of thing) and he has not had problems with his connection since.
I run dual Linksys routers in parallel off a switch to my Shaw modem and have never experienced this problem; they've all been up for months with no downtime. Not sure if it's relevant or not, but could it be that Shaw is poking around the services, seeing who is running what and doing "something", etc.?
Just my 2c
Dan.
Bill Reid wrote:
... I remember a discussion on the list a while ago about a similar problem with the thought that it was related to Shaw's DHCP server. I am using a static IP so maybe the problem is something else. The symptoms would suggest that some network activity is causing the Ethernet to shut down. How it gets going again is not clear. There are no messages in syslog when the interface stops or starts. My next thing to try is to use something like a Linksys router.
Dan & Michele wrote:
I have a friend who lives a few blocks from me in the same neck of the woods. He was using a Linksys version 5 router and had problems similar to the ones you are experiencing. He was using DHCP, as it's residential. His connection would get terribly slow until he rebooted his router (it would stay up fine for a day or two max) ...
Probably not the same problem. Network activity is not slow it just stops completely.
-- Bill
Do you have an arp entry for your next hop gw when it's down? What's the link status reported by ifconfig? Any errors? If you tcpdump the interface, do you see anything at all? (Normally you should see a lot of ARP traffic, at least on the residential side.)
Sean
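These checks could be gathered in one pass while the link is down. A rough sketch, assuming the net-tools commands of the era (arp, ifconfig) and tcpdump are installed; the function name link_snapshot and the 10-second capture window are assumptions, not anything from the thread.

```shell
# Hypothetical helper: print ARP, interface and traffic state for one
# interface. usage: link_snapshot IFACE GATEWAY_IP
link_snapshot() {
    iface=$1
    gw=$2

    echo "== arp entry for $gw =="
    # -n avoids reverse DNS lookups, which stall when the link is dead.
    arp -n | grep -F -w -- "$gw" || echo "(no arp entry for $gw)"

    echo "== ifconfig $iface =="
    # Shows the UP/RUNNING flags plus RX/TX packet and error counters.
    ifconfig "$iface"

    echo "== tcpdump $iface =="
    # Capture up to 20 packets; stop after 10 seconds if fewer arrive.
    tcpdump -n -c 20 -i "$iface" &
    pid=$!
    ( sleep 10; kill "$pid" 2>/dev/null ) >/dev/null 2>&1 &
    timer=$!
    wait "$pid" || true
    kill "$timer" 2>/dev/null || true
}
```

Run as root during an outage, for example `link_snapshot eth0 GATEWAY_IP >> /tmp/snapshot.log 2>&1`, where GATEWAY_IP stands for whatever the next hop is on your connection.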
_______________________________________________
Roundtable mailing list
Roundtable@muug.mb.ca
http://www.muug.mb.ca/mailman/listinfo/roundtable
Sean Walberg wrote:
Do you have an arp entry for your next hop gw when it's down? What's the link status reported by ifconfig? Any errors? If you tcpdump the interface, do you see anything at all? (Normally you should see a lot of ARP traffic, at least on the residential side.)
The arp table is empty with the entry for the gw incomplete.
ifconfig shows everything normal. no errors.
I can not remember if I tried tcpdump but pings certainly did not work. I think ifconfig showed no activity but I can not be sure.
-- Bill
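An incomplete gateway entry means ARP requests for the next hop are going unanswered. A small sketch for detecting that state from a script, assuming net-tools `arp -n` output in which unresolved entries are shown as "(incomplete)"; the function name arp_incomplete is hypothetical.

```shell
# Hypothetical helper: reads `arp -n` output on stdin and succeeds if
# the entry for the given IP is marked "(incomplete)".
# usage: arp -n | arp_incomplete GATEWAY_IP
arp_incomplete() {
    awk -v ip="$1" '
        # Match on the exact address in column 1, not a substring.
        $1 == ip && /\(incomplete\)/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

Logging the result of this test alongside the ping results would show whether every outage looks the same at the ARP level.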
As per Sean's ideas, here's my updated script that will give some juicy details next time the bug hits. Forgive the perl everywhere, I'm waaay more comfortable in perl than bash.
I'll report back when I get a hit.
internet-keep-up

#!/bin/bash
#
# hack b/c Shaw's dhcp is farked up and hoses me

# don't thrash on initial boot, easiest to just sleep a long time
sleep 300

log=/var/log/internet-keep-up.log

while :
do
    ping -c 1 -i 3 -w 15 anothershawhost.likeyourfriendshouse.ca >/dev/null 2>&1 || {
        ping -c 1 -i 3 -w 15 130.179.16.8 >/dev/null 2>&1 || {
            date >>$log 2>&1

            # dump routing table
            netstat -rn >>$log 2>&1

            # can we ping our next hop?
            nexthop=$(netstat -rn | perl -ne 'print($1),$f++ if /^0\.0\.0\.0\s+(\S+)/; END { print "127.0.0.1" if !$f }')
            ping -c 3 -i 3 -w 10 $nexthop >>$log 2>&1

            # -n: skip DNS lookups, which stall while the link is down
            arp -n | grep $nexthop >>$log 2>&1

            ifconfig eth0 >>$log 2>&1

            # grab up to 5 packets, but give up after 30 seconds
            tcpdump -n -c 5 -i eth0 >>$log 2>&1 &
            tcpdump_pid=$!    # $! is the PID of the last background job
            sleep 30
            kill $tcpdump_pid >>$log 2>&1

            ifdown eth0 >/dev/null 2>&1
            sleep 2
            ifup eth0 >/dev/null 2>&1
            sleep 2

            mail -s 'internet-keep-up: had to restart internet' you@yourhost.ca </dev/null >/dev/null 2>&1
        }
    }
    # wait before the next check, whether or not the pings succeeded
    sleep 30
done
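For anyone who would rather not depend on perl, the next-hop lookup might be sketched with awk instead. This is an alternative under assumptions, not the version used in the script above: the function name default_gw is hypothetical, and it prints only the first default route it finds.

```shell
# Hypothetical helper: reads `netstat -rn` output on stdin and prints the
# gateway of the default route (destination 0.0.0.0), or a fallback
# address if there is none.
# usage: netstat -rn | default_gw [FALLBACK]
default_gw() {
    awk -v fallback="${1:-127.0.0.1}" '
        # First default route wins; exit still runs the END block.
        $1 == "0.0.0.0" { print $2; found = 1; exit }
        END { if (!found) print fallback }
    '
}
```

It would be used as `nexthop=$(netstat -rn | default_gw)`, with 127.0.0.1 as the fallback to match the perl one-liner.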