Accurate Monitoring of system load


John Lange

16 Dec 2005

5:33 p.m.

Does anyone have any suggestions for accurate monitoring of system load on web servers?

In the situation where you have a server which is hosting multiple applications, I've found it next to impossible to determine with any accuracy which applications, or which pages within those applications, are causing load.

At any given moment it's easy to see what is using CPU and memory. But in the case of a web server these are brief spikes that don't mean anything. What's needed is a way to see what has caused the most load over a given time period (say the last 10 minutes).

Does anyone have any suggestions for tools that can help determine load?

-- John Lange OpenIT ltd. www.Open-IT.ca (204) 885 0872 VoIP, Web services, Linux Consulting, Server Co-Location


Tim Lavoie

16 Dec 2005

6:02 p.m.

"John" == John Lange john.lange@open-it.ca writes:

John> Does anyone have any suggestions for accurate monitoring of system load on web servers?

John> In the situation where you have a server which is hosting multiple applications, I've found it next to impossible to determine with any accuracy which applications, or which pages within those applications, are causing load.

John> At any given moment it's easy to see what is using CPU and memory. But in the case of a web server these are brief spikes that don't mean anything. What's needed is a way to see what has caused the most load over a given time period (say the last 10 minutes).

John> Does anyone have any suggestions for tools that can help determine load?

Hey John,

There are a couple of things which you can do to figure out what is going on. I tend to use a couple of additional Apache log directives to get more info for each request. Adding "%O %I %D" gets you bytes out, in (say, for big posts) and the request duration in microseconds. This gives you enough to run an analysis script, or just use the Mark I Eyeball when things have been acting up. It might also be useful to log the query string as well, but be careful if you have apps which pass things like passwords in query parameters; you don't want those floating around in your logs.
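As a rough sketch of the kind of analysis script mentioned above: the Python below assumes the three fields were appended to the common log format, giving LogFormat "%h %l %u %t \"%r\" %>s %b %O %I %D". That exact format is an assumption, not something stated in this thread, and %I and %O also need mod_logio loaded. The names LINE_RE and summarize are made up for illustration. It totals request time per URL path, so a page that is hit rarely but runs slowly still floats to the top, and it drops the query string before aggregating.

```python
import re
from collections import defaultdict

# Assumed log format (hypothetical, adjust to your own LogFormat line):
#   %h %l %u %t "%r" %>s %b %O %I %D
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'\d{3} \S+ (?P<bytes_out>\d+) (?P<bytes_in>\d+) (?P<usec>\d+)$'
)


def summarize(lines):
    """Return [(path, totals)] sorted by total request time, largest first."""
    totals = defaultdict(
        lambda: {"hits": 0, "usec": 0, "bytes_out": 0, "bytes_in": 0}
    )
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        # Aggregate on the path only, so query parameters never reach the report.
        path = match.group("url").split("?", 1)[0]
        entry = totals[path]
        entry["hits"] += 1
        entry["usec"] += int(match.group("usec"))
        entry["bytes_out"] += int(match.group("bytes_out"))
        entry["bytes_in"] += int(match.group("bytes_in"))
    return sorted(totals.items(), key=lambda kv: kv[1]["usec"], reverse=True)
```

Feeding it the lines of one access log (for example `summarize(open(path))`, with path pointing at your own log file) gives a list you can print or trim to the top few entries.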

If possible, benchmarking your app or overall site is a helpful exercise, especially if the tool used allows you to simulate multiple user sessions. LoadRunner is a commercial (and pricy) tool to do this, but OpenSTA looks like a reasonable, free alternative. Running a bunch of concurrent users will help find areas which work OK in development, but which don't scale well.

Cheers, Tim

John Lange

6:16 p.m.

On Fri, 2005-12-16 at 12:02 -0600, Tim Lavoie wrote:

> There are a couple of things which you can do to figure out what is going on. I tend to use a couple of additional Apache log directives to get more info for each request. Adding "%O %I %D" gets you bytes out, in (say, for big posts) and the request duration in microseconds. This gives you enough to run an analysis script, or just use the Mark I Eyeball when things have been acting up. It might also be useful to log the query string as well, but be careful if you have apps which pass things like passwords in query parameters; you don't want those floating around in your logs.

There is a tool called apache-top which tails the log files and gives you something similar to top. However, this is far from the ideal solution when you have a web server with 100 sites on it, each with its own log file.

It also doesn't tell you anything if the load is in a MySQL query. The web server process can be idle but waiting for a MySQL query.
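To illustrate the idle-but-waiting case: the %D value in an Apache log is wall-clock time, so it includes time spent blocked on a database, while top-style CPU figures for the web server process do not. A minimal Python sketch of that gap, with time.sleep standing in for a slow query (the helper names measure and fake_query are hypothetical):

```python
import time


def measure(work):
    """Return (wall_seconds, cpu_seconds) spent running work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def fake_query():
    # Stand-in for waiting on the database: the process is blocked, not computing.
    time.sleep(0.2)


wall, cpu = measure(fake_query)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s")
```

The wall-clock figure comes out far larger than the CPU figure, which is the signature of a request whose cost sits in another process.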

> If possible, benchmarking your app or overall site is a helpful exercise, especially if the tool used allows you to simulate multiple user sessions. LoadRunner is a commercial (and pricy) tool to do this, but OpenSTA looks like a reasonable, free alternative. Running a bunch of concurrent users will help find areas which work OK in development, but which don't scale well.

BTW, a great load testing tool is:

http://jakarta.apache.org/jmeter/

But load testing isn't what I need. I don't care how much load the applications cause when stressed; what I need to know is which applications are using the most resources in real life.

By way of a better example let me give more detail.

Let's say you have a server with 100 clients on it. Some of them are busy, some of them are not.

One site may have a very load-intensive application on it that only gets 1 hit every 10 minutes.

Another site could be getting 1 hit per second, but the application is very light.

The only thing you can tell is that the server is under load. There is next to no way to determine who is responsible. Just because a site is getting more traffic doesn't make it load-intensive, and vice versa.

What you need is a tool like top, but one that keeps history and shows average load over a given time span.
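A minimal sketch of that idea in Python, assuming a Linux /proc filesystem and assuming each site runs under its own user ID (for example via suexec); neither assumption comes from this thread, and the function names are hypothetical. It records cumulative CPU seconds per UID at intervals and reports the difference across a time window. Processes that start and exit between two samples are missed entirely, so this undercounts short-lived work.

```python
import os
from collections import defaultdict


def read_cpu_by_uid(proc_root="/proc"):
    """Snapshot of cumulative CPU seconds (user + system) per owning UID."""
    ticks = os.sysconf("SC_CLK_TCK")
    totals = defaultdict(float)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                stat = handle.read()
            uid = os.stat(os.path.join(proc_root, entry)).st_uid
        except OSError:
            continue  # the process exited while we were looking at it
        # The command name may contain spaces, so split after the last ")".
        fields = stat.rsplit(")", 1)[1].split()
        # utime and stime are fields 14 and 15 of the stat line;
        # the list starts at field 3, hence indexes 11 and 12.
        totals[uid] += (int(fields[11]) + int(fields[12])) / ticks
    return dict(totals)


def usage_over_window(snapshots, window):
    """CPU seconds used per key over the last `window` seconds.

    snapshots: list of (timestamp, {key: cumulative_cpu_seconds}), oldest first.
    Returns [(key, cpu_seconds)] sorted with the heaviest user first.
    """
    if len(snapshots) < 2:
        return []
    end_time, end = snapshots[-1]
    # Oldest snapshot that still falls inside the window.
    start = next(snap for ts, snap in snapshots if ts >= end_time - window)
    usage = {}
    for key, cpu in end.items():
        delta = cpu - start.get(key, 0.0)
        if delta > 0:  # totals can shrink when processes exit; ignore those
            usage[key] = delta
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
```

In use, something like a cron job or a small loop would append (time.time(), read_cpu_by_uid()) to a list every minute, and usage_over_window(snapshots, 600) would then answer the "last 10 minutes" question.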

Regards,

-- John Lange OpenIT ltd. www.Open-IT.ca (204) 885 0872 VoIP, Web services, Linux Consulting, Server Co-Location

Tim Lavoie

6:48 p.m.

How about something like atop, with process accounting enabled? You might need to enable this in the kernel if it isn't already, but it should track all the info you might want. If you don't run it interactively, atop spits out flat ASCII without looking for the interactive commands.

There are also a couple of kernel patch options specific to atop, which get you top-style info for disk and network usage on a per-process level.

About JMeter, it looks a bit awkward for the sorts of benchmarking I end up doing, as you configure all requests pretty much manually. The other tools I mentioned are nice for dealing with a complex app, in that they record a session by proxying your browser requests. The script is something you edit afterwards to get it to capture elements at run-time, which you play back as part of the script. As an example, you might get an ID for some value during the session, and you want to use that one, not the one you recorded initially. Naturally, this is only if you need to do this sort of thing.

Tim

Stuart Williams

7:16 p.m.

John Lange writes:

Subject: [RndTbl] Accurate Monitoring of system load


Does anyone have any suggestions for accurate monitoring of system load on web servers?

I've found sar/sadc/sysstat to be very helpful (on Solaris 8 years ago). It's nice that sadc records the data, not just the output of reports, so you can change your reporting focus later.

Stuart.

Theodore

7:25 p.m.

Not sure what kind of application you are trying to monitor; however, if it's PHP there are some excellent profilers out there. I used one once to figure out that 99% of my buddy's code execution time was spent in DNS lookups. I forget which one it was; I did an "apt-cache search php profiler" to find it anyways.

Theodore Baschak



roundtable@muug.ca
