MUUG http down last 3 hours

Theodore Baschak

26 Jan 2017

10:46 p.m.

MUUG.ca http has been down for the last 3 hours or so (but the host is pingable). I notice the HTTP service has been down more and more lately as well. I'm wondering if that's actually apache being down, or if it's a lack of server threads able to handle client requests at that particular time?

Theodore Baschak - AS395089 - Hextet Systems https://ciscodude.net/ - https://hextet.systems/ http://mbix.ca/
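
The two possibilities raised above (apache not running at all versus no free server threads) tend to look different from the client side, so a small probe can help tell them apart. The following is a minimal sketch, not something used in the thread: the function name `probe_http` and the result labels are hypothetical, and it assumes a plain HTTP listener on port 80. A refused connection suggests nothing is listening, while a connection that is accepted but never answered is more consistent with all workers being busy.

```python
import socket


def probe_http(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Classify how an HTTP endpoint responds (hypothetical helper).

    Returns one of:
      "refused"          nothing is listening (server process likely down)
      "connect-timeout"  the TCP handshake never completed
      "stalled"          TCP connected, but no response arrived in time
      "closed"           the peer closed the connection without sending data
      "ok"               at least some response bytes came back
    Other errors (DNS failure, no route to host) are left to propagate.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "connect-timeout"

    with sock:
        sock.settimeout(timeout)
        request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
        try:
            sock.sendall(request)
            data = sock.recv(1024)
        except socket.timeout:
            return "stalled"
        except ConnectionResetError:
            return "closed"
    return "ok" if data else "closed"
```

Calling `probe_http("muug.ca")` during an outage would return one of the labels above. A "stalled" or "connect-timeout" result alongside a successful ping would point toward worker exhaustion or a full listen backlog rather than a dead daemon, though it is only a hint and not proof.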

Wyatt Zacharias

26 Jan

11 p.m.

We were actually just talking about this on the board list. I haven't had time to look into it yet, but I think the first place to check will be apache. As far as I know we're just running default worker settings, so there may be some tuning to do.

-- Wyatt Zacharias (mobile)
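
For reference when tuning, here is a rough sketch of how the main worker limits relate to each other. It assumes the server runs Apache 2.4 with the worker MPM, which the thread does not confirm (in 2.2 the `MaxRequestWorkers` directive was called `MaxClients`). The class and function names are hypothetical; the default values are the documented 2.4 defaults, not values read from the MUUG box.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkerMpmSettings:
    """Hypothetical container for three worker MPM directives."""
    server_limit: int = 16          # ServerLimit, documented default
    threads_per_child: int = 25     # ThreadsPerChild, documented default
    max_request_workers: int = 400  # defaults to ServerLimit * ThreadsPerChild


def check_worker_settings(cfg: WorkerMpmSettings) -> List[str]:
    """Return a list of inconsistencies between the three directives."""
    problems = []
    ceiling = cfg.server_limit * cfg.threads_per_child
    if cfg.max_request_workers > ceiling:
        problems.append(
            f"MaxRequestWorkers {cfg.max_request_workers} exceeds "
            f"ServerLimit * ThreadsPerChild = {ceiling}; raise ServerLimit too"
        )
    if cfg.max_request_workers % cfg.threads_per_child != 0:
        problems.append(
            f"MaxRequestWorkers {cfg.max_request_workers} is not a multiple "
            f"of ThreadsPerChild {cfg.threads_per_child}"
        )
    return problems
```

With the defaults the check returns an empty list. The point of the sketch is that raising `MaxRequestWorkers` alone is not enough under this MPM: `ServerLimit` has to grow with it, or Apache lowers the value at startup.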

Roundtable mailing list Roundtable@muug.ca https://muug.ca/mailman/listinfo/roundtable

Gilbert E. Detillieux

11:08 p.m.

I try an "apachectl graceful" on the server, and it has no effect. I try "apachectl restart" and it's OK for a few seconds, responding to http requests, but after a while it gets jammed up again. Not sure what's going on, but I have to catch a bus soon...

Gilbert

--
Gilbert E. Detillieux          E-mail: gedetil@cs.umanitoba.ca
Dept. of Computer Science      Web:    http://www.cs.umanitoba.ca/~gedetil/
University of Manitoba         Phone:  (204)474-8161
Winnipeg MB CANADA  R3T 2N2    Fax:    (204)474-7609

Trevor Cordes

11:25 p.m.

On 2017-01-26 Gilbert E. Detillieux wrote:

> I try an "apachectl graceful" on the server, and it has no effect. I try "apachectl restart" and it's OK for a few seconds, responding to http requests, but after a while it gets jammed up again. Not sure what's going on, but I have to catch a bus soon...

I'll check it out tonight. I'm nearly positive Adam had already tweaked the worker settings because ps shows waaaay more workers than apache usually does by default. We may need even more. There might be load limits being hit too.
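
As an alternative to counting workers in ps output, the machine-readable mod_status page includes a `Scoreboard:` line with one character per worker slot. The sketch below summarizes such a line. It assumes mod_status is enabled on the server, which the thread does not say, and the function name `summarize_scoreboard` is hypothetical. The state characters are the ones documented for mod_status.

```python
from collections import Counter

# Scoreboard characters as documented for mod_status.
SCOREBOARD_KEYS = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}


def summarize_scoreboard(status_text: str) -> dict:
    """Summarize the Scoreboard line from a mod_status '?auto' response."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            break
    else:
        raise ValueError("no Scoreboard line found")

    counts = Counter(board)
    idle = counts["_"]
    open_slots = counts["."]
    busy = len(board) - idle - open_slots
    return {
        "busy": busy,
        "idle": idle,
        "open_slots": open_slots,
        "by_state": {SCOREBOARD_KEYS.get(ch, ch): n for ch, n in counts.items()},
    }
```

If the status handler were exposed at a path such as `/server-status?auto` (a common choice, assumed here), the response body could be passed straight to this function. An idle count of zero with no open slots would support the "we may need even more workers" theory.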

I also have some thoughts about using iptables and/or QoS (tc) controls to give priority to "local" connections (like shaw, mts, les, uofm). Perhaps even create 3 tiers: local (likely to be muugers/manitobans), normal, and wtf-are-you-using-us (China, etc). No one will be blocked, we'll just give more TCP SYNs and/or egress bandwidth to the people we are supposed to be serving first. We can discuss here or at a board meeting.

Prioritizing/limiting would also allow us to tune down the load so that the box doesn't become overall useless for everyone (like we're experiencing today, sort of).
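
The three-tier idea can be sketched as a simple prefix lookup. This only illustrates the classification step, not the iptables or tc rules that would enforce it. All names here are hypothetical, and the prefixes are placeholders from the RFC 5737 documentation ranges, not real Shaw, MTS, LES or UofM address space.

```python
import ipaddress

# Placeholder prefixes (RFC 5737 documentation ranges), NOT real ISP ranges.
TIERS = [
    ("local", ["192.0.2.0/24", "198.51.100.0/24"]),
    ("low", ["203.0.113.0/24"]),
]
DEFAULT_TIER = "normal"

# Parse the prefixes once so each lookup is only a membership test.
_COMPILED = [
    (name, [ipaddress.ip_network(prefix) for prefix in prefixes])
    for name, prefixes in TIERS
]


def classify(addr: str) -> str:
    """Return the tier name for a client address; first match wins."""
    ip = ipaddress.ip_address(addr)
    for name, networks in _COMPILED:
        if any(ip in net for net in networks):
            return name
    return DEFAULT_TIER
```

Anything that matches no list falls into the "normal" tier, which fits the "no one will be blocked" goal: the lists only decide who gets more or less priority, never who is refused.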

Robert Keizer

11:38 p.m.

Why not just use nginx or some other web server that can handle the higher load? At least the mirror might be able to move over to it.

Trevor Cordes

27 Jan

12:07 a.m.

On 2017-01-26 Robert Keizer wrote:

> Why not just use nginx or some other web server that can handle the higher load? The mirror might be able to be moved over to use it at least..

It may not be a load-causes-kernel-not-to-run-things issue but more of an apache-is-set-to-deny-connections-beyond-certain-load issue.

Anyhow, let's move this discussion to [board] with the interested parties cc'd rather than fill up [RndTbl]. I'll reply there in a second with the relevant cc's.

Trevor Cordes

26 Jan

11:28 p.m.

Also, I need a clear reboot policy for that box. When the auto-updates email me saying one needed a reboot (i.e. kernel), can I reboot? Best time to reboot? Does Adam need to be pre-informed in case it doesn't come back up? etc.

If it's safe, I'd like to have the authority to reboot in the wee hours (like 3am) whenever the box reports a kernel security update. Right now we're running on a kernel that is about 3 security updates old, with an uptime of several months.
