Figure 1 is a screenshot of the Resolver control panel showing the
CPU usage pattern for a WebSphere application server process (java4493).
The top chart plots CPU usage as captured once every ten seconds,
which is typically the best resolution of most performance data capture tools
on the market. The bottom chart plots CPU usage as captured
once per second, which is still an order of magnitude slower than the fastest
capture rate that the WHAM collection tools offer. Clearly there is a
significant loss of detail at the slower sample rate. The question to be
asked is: does it matter?
Figure 1
Figure 1 shows spikes to 100% CPU usage on a regular basis for several
seconds at a time. As it turns out, this is an application that is running
garbage collections frequently because it is starved of memory. So
what is that doing to the response time and throughput of the application?
Figure 2
In Figure 2 we can see that the throughput of the application server,
as represented by the top graph with a sample rate of once per second, goes
to 0 at the same time as the CPU spikes to 100%. This is to be expected during
garbage collection, as the JVM is frozen while the garbage collection process
takes place. The graph below it, which is sampled at a ten-second rate, shows
the transaction rate varying between 15 and 30 requests per second but never
dipping to 0. This is a result of the averaging effect that slower sampling
has on the data. Finally, we want to see what this is doing to the response
time, which is shown in Figure 3.
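
The averaging effect described above can be illustrated with a minimal sketch. The trace below is hypothetical data invented for illustration, not taken from the figures, and the helper name downsample_mean is an assumption. It models a server that handles 30 requests per second for seven seconds and then stalls for three seconds, and compares the one-second view with the ten-second view of the same data.

```python
def downsample_mean(samples, window):
    """Average consecutive, non-overlapping groups of `window` samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]

# Hypothetical 1-second throughput trace (requests per second):
# 7 seconds at 30 req/s, then a 3-second stall at 0 req/s, repeated 3 times.
per_second = ([30] * 7 + [0] * 3) * 3

# The same trace as a tool sampling once every ten seconds would report it,
# assuming the tool reports the mean rate over each interval.
per_ten_seconds = downsample_mean(per_second, 10)

print(min(per_second))    # 0: the stalls are visible at 1-second resolution
print(per_ten_seconds)    # [21.0, 21.0, 21.0]: the stalls are averaged away
```

In this sketch every ten-second sample reports 21 requests per second, so the three-second stalls never appear in the coarser series even though throughput reaches 0 in the underlying data.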