MUUG.ca's HTTP service has been down for the last 3 hours or so (but the host is pingable). I've noticed the HTTP service has been down more and more lately as well. I'm wondering if that's actually Apache being down, or if it's a lack of server threads able to handle client requests at that particular time?
Theodore Baschak - AS395089 - Hextet Systems https://ciscodude.net/ - https://hextet.systems/ http://mbix.ca/
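One rough way to tell those two cases apart from the outside is to look at how the TCP connection behaves. The sketch below is only an illustration, not something anyone in the thread ran: the function name probe_http and its return labels are made up for this example. It assumes that a refused connection means nothing is listening (Apache is down), while a connection that is accepted but never answered suggests the daemon is up and its workers or listen backlog are saturated.

```python
import socket


def probe_http(host, port=80, timeout=5.0):
    """Return 'refused', 'unreachable', 'no_response' or 'responding'."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing is listening on the port: the daemon itself is likely down.
        return "refused"
    except OSError:
        # Connect timeouts, DNS failures, routing problems and similar.
        return "unreachable"
    with sock:
        try:
            request = "HEAD / HTTP/1.0\r\nHost: {}\r\n\r\n".format(host)
            sock.sendall(request.encode("ascii"))
            reply = sock.recv(64)
        except OSError:
            # Includes socket.timeout: connected, but no reply in time (or reset).
            return "no_response"
    return "responding" if reply.startswith(b"HTTP/") else "no_response"
```

Calling probe_http("muug.ca") during an outage and getting "no_response" rather than "refused" would point toward exhausted workers, though a firewall or proxy in the path could blur the picture.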
We were actually just talking about this on the board list. I haven't had time to look into it yet, but I think the first place to check will be apache. As far as I know we're just running default worker settings, so there may be some tuning to do.
-- Wyatt Zacharias (mobile)
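For a sense of what "default worker settings" would allow, here is a small sketch that estimates the thread ceiling from a worker-MPM config. It assumes the Apache 2.4 documented defaults for the threaded MPMs (ServerLimit 16, ThreadsPerChild 25, MaxRequestWorkers 400) and that MaxRequestWorkers is capped at ServerLimit times ThreadsPerChild. The function name worker_capacity is hypothetical, the sample config is invented, and the parsing is deliberately naive: it ignores Include files, IfModule scoping and Apache's rounding to a multiple of ThreadsPerChild.

```python
# Assumed Apache 2.4 defaults for the threaded MPMs (worker, event).
DEFAULTS = {"serverlimit": 16, "threadsperchild": 25}
DEFAULT_MAX_REQUEST_WORKERS = 400


def worker_capacity(conf_text):
    """Return (ceiling, effective) thread counts for a worker-MPM config."""
    seen = dict(DEFAULTS)
    requested = DEFAULT_MAX_REQUEST_WORKERS
    for raw in conf_text.splitlines():
        parts = raw.split()
        # Skip comments and anything that is not "Directive <number>".
        if raw.lstrip().startswith("#") or len(parts) != 2 or not parts[1].isdigit():
            continue
        name, value = parts[0].lower(), int(parts[1])
        if name in seen:
            seen[name] = value
        elif name in ("maxrequestworkers", "maxclients"):
            # MaxClients is the pre-2.4 name for MaxRequestWorkers.
            requested = value
    ceiling = seen["serverlimit"] * seen["threadsperchild"]
    return ceiling, min(requested, ceiling)


SAMPLE = """
ServerLimit 32
ThreadsPerChild 25
MaxRequestWorkers 1000
"""
print(worker_capacity(SAMPLE))
```

In the sample, the requested 1000 workers cannot be honoured because 32 processes of 25 threads only allow 800, which is the kind of mismatch worth checking before raising any single directive.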
On 26 Jan 2017 4:56 pm, "Theodore Baschak" theodore@ciscodude.net wrote:
Roundtable mailing list Roundtable@muug.ca https://muug.ca/mailman/listinfo/roundtable
I try an "apachectl graceful" on the server, and it has no effect. I try "apachectl restart" and it's OK for a few seconds, responding to http requests, but after a while it gets jammed up again. Not sure what's going on, but I have to catch a bus soon...
Gilbert
On 26/01/2017 5:00 PM, Wyatt Zacharias wrote:
On 2017-01-26 Gilbert E. Detillieux wrote:
I try an "apachectl graceful" on the server, and it has no effect. I try "apachectl restart" and it's OK for a few seconds, responding to http requests, but after a while it gets jammed up again. Not sure what's going on, but I have to catch a bus soon...
I'll check it out tonight. I'm nearly positive Adam had already tweaked the worker settings because ps shows waaaay more workers than apache usually does by default. We may need even more. There might be load limits being hit too.
I also have some thoughts about using iptables and/or QoS (tc) controls to give priority to "local" connections (like Shaw, MTS, LES, UofM). Perhaps even create 3 tiers: local (likely to be MUUGers/Manitobans), normal, and wtf-are-you-using-us (China, etc). No one will be blocked; we'll just give more TCP SYNs and/or egress bandwidth to the people we are supposed to be serving first. We can discuss here or at a board meeting.
Prioritizing/limiting would also allow us to tune down the load so that the box doesn't become overall useless for everyone (like we're experiencing today, sort of).
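The three-tier idea boils down to mapping a client address to a tier before any iptables or tc rule is applied. The sketch below shows only that classification step. Everything in it is an assumption made for illustration: the prefixes are reserved documentation ranges standing in for real ISP allocations, and the tier names, weights and the classify function are hypothetical.

```python
import ipaddress

# Placeholder prefixes from the documentation ranges, NOT real ISP allocations.
TIER_PREFIXES = {
    "local": ["192.0.2.0/24", "2001:db8:1::/48"],
    "low": ["203.0.113.0/24", "2001:db8:ffff::/48"],
}

# Hypothetical relative shares of SYN rate or egress bandwidth per tier.
TIER_WEIGHT = {"local": 6, "normal": 3, "low": 1}

_NETWORKS = [
    (tier, ipaddress.ip_network(prefix))
    for tier, prefixes in TIER_PREFIXES.items()
    for prefix in prefixes
]


def classify(addr):
    """Map a client address to 'local', 'low' or the default 'normal'."""
    ip = ipaddress.ip_address(addr)
    for tier, net in _NETWORKS:
        # Only compare addresses and networks of the same IP version.
        if ip.version == net.version and ip in net:
            return tier
    return "normal"


for client in ("192.0.2.10", "198.51.100.7", "203.0.113.5"):
    tier = classify(client)
    print(client, tier, TIER_WEIGHT[tier])
```

Because unmatched addresses fall through to "normal", nobody is blocked outright, which matches the intent described above; the weights would still have to be translated into actual firewall marks or traffic classes.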
Why not just use nginx or some other web server that can handle the higher load? The mirror, at least, could probably be moved over to it.
On 2017-01-26 5:25 PM, Trevor Cordes wrote:
On 2017-01-26 Robert Keizer wrote:
Why not just use nginx or some other web server that can handle the higher load? The mirror, at least, could probably be moved over to it.
It may not be a load-causes-kernel-not-to-run-things issue but more of an apache-is-set-to-deny-connections-beyond-certain-load issue.
Anyhow, let's move this discussion to [board] with the interested parties cc'd rather than fill up [RndTbl]. I'll reply there in a second with the relevant cc's.
Also, I need a clear reboot policy for that box. When the auto-updates email me saying one needed a reboot (i.e. kernel), can I reboot? Best time to reboot? Does Adam need to be pre-informed in case it doesn't come back up? etc.
If it's safe, I'd like to have the authority to reboot in the wee hours (like 3am) whenever the box reports a kernel security update. Right now we're running on a kernel that is about 3 security updates old, with an uptime of several months.
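Whatever policy gets agreed on, the "reboot only in the wee hours, and only when needed" rule is simple to express. This is a sketch under stated assumptions, not a description of how the box is set up: it assumes a Debian-style flag file at /var/run/reboot-required (other distributions signal this differently), a hypothetical 03:00 to 04:00 window, and it only makes the decision without rebooting anything.

```python
import os
from datetime import datetime, time

# Assumption: Debian-style flag file; other distros signal this differently.
REBOOT_FLAG = "/var/run/reboot-required"

# Hypothetical maintenance window in local time.
WINDOW_START = time(3, 0)
WINDOW_END = time(4, 0)


def in_window(now, start=WINDOW_START, end=WINDOW_END):
    """True if 'now' falls inside the maintenance window."""
    return start <= now.time() < end


def should_reboot(now, flag_path=REBOOT_FLAG):
    """True only when a reboot is pending and we are inside the window."""
    return os.path.exists(flag_path) and in_window(now)


print(should_reboot(datetime.now()))
```

Run from cron a few times an hour, a check like this would leave the decision to notify Adam first, or to reboot automatically, as a separate step.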