Refractor -- (WebSphere Example)

Figure 1 is a screen shot of the Refractor control panel showing the response time for a short load test that was run against a WebSphere application server. The top chart plots response time by WebSphere component as each URL is processed, measured from the HTTP server. It is displayed as a series of bars, one per unit of time, each showing the average time spent in each tier during that time slice. In this picture the bars are one second wide: the Web Server response time is shown in green, the Java or WebSphere Application Server (WAS) response time in blue, and the database, which is DB2, in red. Response time alone isn't a complete characterization of the behavior of the application, so the bottom half of the Refractor panel plots the throughput in requests per second at the WAS. On the left hand side of the panel is a detailed listing of the processes in each tier. This is the workload that Refractor identified automatically, so the analyst doesn't have to know which PIDs are associated with which function.

Figure 1
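The per-slice averaging behind the bars in the top chart can be sketched as follows. This is only an illustration of the idea, not Refractor's implementation: the record layout (timestamp, tier, elapsed milliseconds), the function name average_by_slice and the sample values are all assumptions made for the example.

```python
from collections import defaultdict


def average_by_slice(records, slice_seconds=1.0):
    """Average elapsed time per tier within each time slice.

    records: iterable of (timestamp_s, tier, elapsed_ms) tuples.
    This layout is hypothetical; it is not Refractor's data format.
    Returns a dict keyed by (slice_index, tier) with the mean elapsed_ms.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of ms, sample count]
    for timestamp_s, tier, elapsed_ms in records:
        key = (int(timestamp_s // slice_seconds), tier)
        totals[key][0] += elapsed_ms
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up measurements, for illustration only.
records = [
    (0.1, "web", 2.0),
    (0.6, "web", 4.0),
    (0.4, "was", 10.0),
    (1.2, "db2", 30.0),
]
bars = average_by_slice(records)
```

Each (slice, tier) entry corresponds to one colored segment of one bar in a chart like the one in Figure 1.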

In order to better understand the behavior of a tier in the application, we need to look at how its response time is broken down. We also want to identify whether a given tier is scaling and, if not, where the lack of scalability is occurring. In Figure 2 we have plotted data specific to Tier 2.

Figure 2

The left hand side of the chart panel shows two charts that address the application scalability as a function of throughput. The upper left is the standard scatter plot of throughput vs. response time. In a load test, the response time will remain fairly constant until some queuing begins to occur in the application. Once queuing starts, response time rises rather sharply while throughput increases only marginally. This is seen in the WAS at about 200 requests per second. Any attempt to increase throughput above 200 requests per second results in substantially higher response times.

The charts on the right hand side of the chart panel are plotted against time and show throughput vs. time and the components of the WAS response time vs. time. In the bottom right we see that throughput is increasing as the load is ramped up for the first 100s of the load test. The main component of WAS response time for the first minute is CPU usage, plus some client latency, which is just the time it takes to access DB2 for supporting functions in the application. After the first minute, the DB2 response time increases substantially. At about 150s a new component of response time shows up in the WAS: Server Serialization, or queuing. This increases dramatically between 150s and 200s. During this timeframe we can also see that the throughput of the WAS actually drops. That is typical of saturation in a server and can be seen in the folding back pattern of the upper left scatter plot.

The black line shown is an annotation that we have made to highlight the progression of throughput vs. response time. Response time more than doubles, from 10ms to 25ms, as the throughput increases from 0 to 200 requests per second. This is also clear in the chart of throughput and response time vs. time from 0 to 150s. Above 200 requests per second, response time rises sharply and throughput actually drops. This leads to the foldback curve annotated on the chart.
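The foldback pattern described above can be picked out of time-ordered samples with a few lines of code. The following is a minimal sketch, not the method Refractor uses: the function name find_foldback and the sample values are assumptions, chosen only to resemble the shape of the curve (response time rising from 10ms to 25ms up to 200 requests per second, then throughput falling while response time keeps climbing).

```python
def find_foldback(samples):
    """Locate the throughput peak and the samples that fold back after it.

    samples: time-ordered list of (throughput_rps, response_ms) tuples.
    Returns (peak_index, folded), where folded holds the later samples
    whose throughput is below the peak while response time is above the
    response time observed at the peak.
    """
    peak_index = max(range(len(samples)), key=lambda i: samples[i][0])
    peak_rps, peak_ms = samples[peak_index]
    folded = [
        (rps, ms)
        for rps, ms in samples[peak_index + 1:]
        if rps < peak_rps and ms > peak_ms
    ]
    return peak_index, folded


# Made-up samples shaped like the curve in the text; not measured data.
samples = [(50, 10), (100, 12), (150, 18), (200, 25), (190, 60), (170, 120)]
peak_index, folded = find_foldback(samples)
```

With these illustrative samples the peak is the 200 requests per second point, and the two samples after it are reported as folded back, which is the signature of a saturated server.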

The second indicator of scalability is the Throughput vs. CPU chart shown for the WAS. This is a scatter plot that shows the percent of a CPU used vs. throughput by the WAS. To understand this, consider that if a single request costs 0.5s of CPU time, then 2 requests per second should consume 1 CPU, 4 requests per second should consume 2 CPUs, and so on. Therefore, if the cost per request is consistent across the range of throughputs, this plot should be very close to linear, which it is. A curve that tapers off would indicate inefficiencies at higher load. In this plot, there are two outlier points, which are associated with a garbage collection interval that can be seen in the throughput chart at around 120s. The increased response time therefore comes from both internal queuing and increased response time from the database server.
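The arithmetic behind the Throughput vs. CPU chart is simply CPUs consumed = throughput x CPU seconds per request. A minimal sketch of that relationship, and of a rough linearity check, is shown here. The function names, the 15% tolerance and the sample points are assumptions made for illustration; only the 0.5s per request example comes from the text.

```python
def cpus_consumed(throughput_rps, cpu_seconds_per_request):
    """CPUs kept busy at a given throughput and per-request CPU cost."""
    return throughput_rps * cpu_seconds_per_request


def cost_per_request(samples):
    """CPU seconds per request for each (throughput_rps, cpus_used) sample."""
    return [cpus / rps for rps, cpus in samples if rps > 0]


def is_roughly_linear(samples, tolerance=0.15):
    """True if every sample's cost per request is within `tolerance`
    (as a fraction) of the mean cost. The threshold is an arbitrary
    choice for this sketch."""
    costs = cost_per_request(samples)
    mean_cost = sum(costs) / len(costs)
    return all(abs(cost - mean_cost) <= tolerance * mean_cost for cost in costs)


# The example from the text: 0.5s of CPU per request.
one_cpu = cpus_consumed(2, 0.5)
two_cpus = cpus_consumed(4, 0.5)

# Made-up points: the first set scales linearly; in the second, the last
# point costs far more CPU per request than the others.
linear = [(2, 1.0), (4, 2.0), (8, 4.1)]
with_outlier = [(2, 1.0), (4, 2.0), (8, 8.0)]
```

A sample that fails this kind of check, like the two garbage collection outliers in the chart, costs more CPU per request than the rest of the run.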






WHAM Engineering & Software
Austin, Texas: (888) 852-9426    info@wham.com
  Copyright © 2000-2003 WHAM Engineering & Software