ROI on Six Case Studies of WHAM Customers

William R. Sullivan

 

 

 

Abstract

This note presents six case studies of projects that WHAM has undertaken in the last three and one half years. For each, it computes the ROI that resulted either from performance tuning or from purchasing more hardware. In the tuning cases the ROI is very high and positive; where more hardware was purchased instead, the ROI is very negative.

 

Case 1 A Web Application (Aug 1998)

This is a case study of a Web application infrastructure that today serves over 1 million requests per day. At the time of the case it was in pre-production and was intended to support up to 10,000 requests per hour within four months. The application infrastructure was being deployed in one trial application that serviced law schools. The customer is LEXIS-NEXIS and the current application can be accessed at http://www.lexis.com. When this application was originally designed, it was a client-server application that was accessed only through dedicated connections. In 1997 it was decided to deploy it to the web, with law students as the first audience. The initial deployment was completed in late July 98 and almost immediately there were substantial performance problems. One thing that was clear was that the system (a Sun ES6000 with 18x333MHz CPUs) on which the application had been deployed, along with quite a few other business-critical applications, became over 80% utilized at times during the day.

 

The problems were traced to the application, and most disturbing was the fact that the application was servicing only about 13 users at the time of these incidents. WHAM was engaged to solve this problem. The initial PO for the engagement was $50k. The final cost was around $40k. The initial assessment of the capacity and performance of the application was based on a baseline measurement of one instance, which serviced 1.17 requests per second with a response time of 5.7 seconds and a CPU cost of 3.5s per request. This worked out to 35,000 CPU seconds per hour to process 10,000 requests per hour, or 9.72 CPUs. Since each instance could process only 4212 requests per hour, at least three instances would be needed. The cost of a pair of CPUs on this system was $26,500, so with 9.72 CPUs needed, the cost of that application would be $128,790 in CPUs alone.
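
The sizing arithmetic above can be sketched in a few lines of Python. This is only an illustration of the calculation; the function names are made up for this sketch, and the CPU count is rounded to two decimals because that is how the figure is quoted in the text.

```python
import math

SECONDS_PER_HOUR = 3600


def cpus_needed(requests_per_hour, cpu_seconds_per_request):
    """CPUs kept fully busy by the given request rate."""
    return requests_per_hour * cpu_seconds_per_request / SECONDS_PER_HOUR


def instances_needed(requests_per_hour, instance_requests_per_second):
    """Whole application instances needed to carry the request rate."""
    per_instance_hourly = instance_requests_per_second * SECONDS_PER_HOUR
    return math.ceil(requests_per_hour / per_instance_hourly)


def cpu_cost(cpus, cost_per_cpu_pair):
    """Dollar cost of a (possibly fractional) CPU count, priced per pair."""
    return cpus * cost_per_cpu_pair / 2


# Baseline figures from the text: 10,000 requests/hour, 3.5 CPU seconds per
# request, 1.17 requests/second per instance, $26,500 per pair of CPUs.
baseline_cpus = round(cpus_needed(10_000, 3.5), 2)  # rounded as in the text
print("CPUs needed:", baseline_cpus)
print("Instances needed:", instances_needed(10_000, 1.17))
print("CPU cost ($):", round(cpu_cost(baseline_cpus, 26_500)))
```

Pricing a fractional CPU count follows the note's own approach; an actual purchase would be made in whole CPU pairs.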

 

The project to improve the performance and capacity of the application took three weeks and was based on WHAM measurements and analysis, with about 40 man hours of support from various members of the project team (testers and developers). The results were on the order of a 10x improvement. The final measurement of one application instance produced a throughput of 9.1 requests per second at a CPU cost of .35s per request and a response time of .36s. The number of CPUs needed to support this application at a rate of 10,000 requests per hour is slightly less than one, so the CPU costs were cut by an order of magnitude. The ROI can be calculated as (Return – Investment)/Investment, which in this case leads us to 221% on the WHAM cost of around $40k alone. If we take $45,000 invested, which includes both the time of the LEXIS-NEXIS employees and the cost of WHAM services, then the ROI is 186% in just three weeks. The return doesn’t stop there, however, because that application infrastructure now supports in excess of 1,000,000 requests per day, which is ten times greater than what was projected for the initial deployment. Over the last three years, then, the ROI for this project would be 1860%.
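
The ROI formula above can be checked with a short Python sketch. Two assumptions are made here that the note does not state outright: the return is taken to be the $128,790 baseline CPU cost, and the lower investment figure is taken to be the roughly $40k final cost of the WHAM engagement. Both are inferred because they reproduce the quoted percentages.

```python
def roi_percent(return_amount, investment):
    """ROI as a percentage: (Return - Investment) / Investment."""
    return (return_amount - investment) / investment * 100


# Assumption: the return is the baseline CPU cost from the sizing estimate.
RETURN = 128_790

# Assumption: $40,000 is the WHAM fee alone; $45,000 adds customer staff time.
for label, investment in (("WHAM fee only", 40_000),
                          ("WHAM fee plus staff time", 45_000)):
    print(f"{label}: {roi_percent(RETURN, investment):.1f}%")
```

The first figure comes out a fraction of a point under 222%, which the note quotes as 221%.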

 

Case 2  DotCom Three Tier Web Application (May 1999)

This was a very simple and short assignment. This customer had been contacted in Feb 99 about using WHAM tools and methodology prior to deploying their new site. The new site was built using Sun servers with Solaris 2.6, Vignette StoryServer and Oracle. The customer contacted WHAM one week before they needed to go live. The problem was that the system wasn’t performing 1/10th as well as the old system built from CGI scripts and implemented on a four-way SGI server. The initial configuration was a two-way E250 with 2GB main memory for the Netscape Enterprise Web Server and the Vignette StoryServer, with an E450 with 4 CPUs and 4GB main memory for the database.

 

The customer was in a small crisis when we contacted them in May 99.  They happened to be running a performance test that morning and invited us over to load the WHAM DRM and measure the systems under real load.  This customer had the advantage of a working site with live users that they were able to switch over to the new site in order to test the new site.  Now this isn’t the recommended approach to load testing a new deployment but it was their approach.  They switched the load over and within ten minutes we had enough information to make a good determination of the problem.  A configuration change of the StoryServer was needed in order to reduce memory contention on the system.  The change was a reduction in the number of available StoryServer process instances which resulted in a reduction in response time from 30 seconds down to 3 seconds. 

 

The cost per operation in CPU was reduced by 15% by eliminating unnecessary memory contention that would not have been addressed by adding memory. The problem was the result of too many process address spaces sharing the address translation hardware and cache across the two CPUs. With only ten minutes of collected data, using the high resolution and additional metrics that only WHAM can collect, we were able to solve a problem that had plagued them for more than a month. The plan had been to increase the size of the web server from an E250 with 2 CPUs to an E450 with 4 CPUs. The data showed us that their mistake was a common one: assuming that more is better. In this case, less was more; that is to say, fewer processes could do more work in the same time using the same CPUs. This was a tremendous saving because the Vignette StoryServer was licensed at a cost of $25,000 per CPU, so the incremental cost in hardware would have been $25,000 but the software licensing cost would have been twice that, for a total savings of $75,000. The total cost to the customer for the consulting engagement plus a report informing them how to configure and scale their systems was only $2500.

 

The conservative ROI on this engagement can be based on 15% of the cost of the two CPUs and the software ($9300). This would be an ROI of 272% in a time frame of one day. However, knowledge is persistent, which meant they were able to carry this configuration knowledge forward to all of the subsequent purchases they made as the load on the site grew. The benefit of two hours of work was 15% of every two CPUs that they needed for running the web site. After one year, the total number of CPUs dedicated to the StoryServer and Netscape applications was 12, so the savings were six times as large, which made the ROI after one year 1632% (six times 272%). This case clearly demonstrates how a little time and money expended can save a lot and keep on saving.
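
As a rough check, the Case 2 figures can be reproduced as follows. The sketch mirrors the way the note arrives at its one-year number, which is to scale the one-day ROI percentage by the growth in CPU pairs; the variable names are illustrative only.

```python
def roi_percent(return_amount, investment):
    """ROI as a percentage: (Return - Investment) / Investment."""
    return (return_amount - investment) / investment * 100


ENGAGEMENT_COST = 2_500    # consulting engagement plus report
FIRST_SAVING = 9_300       # 15% of the two CPUs and software, per the text
CPUS_PER_SAVING = 2        # the saving applies to every pair of CPUs
CPUS_AFTER_ONE_YEAR = 12   # StoryServer and Netscape CPUs after one year

one_day_roi = roi_percent(FIRST_SAVING, ENGAGEMENT_COST)

# The note scales the one-day ROI by the number of CPU pairs in service.
growth_factor = CPUS_AFTER_ONE_YEAR / CPUS_PER_SAVING
one_year_roi = one_day_roi * growth_factor

print(f"One-day ROI:  {one_day_roi:.0f}%")
print(f"One-year ROI: {one_year_roi:.0f}%")
```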

 

Case 3a  DotCom Number 2 (Feb 2000)

This dotcom was initially interested in retaining WHAM to figure out why their front page download times were about four times slower than the rest of the industry as shown by Keynote. They subscribed to the service and regularly compared their front-page download time of 32 seconds to the average of the top 40 business front pages measured by Keynote, which was about 8 seconds. WHAM was hired to investigate and within two days was able to discover the source of the problem and correct it. The WHAM DRM is able to measure the time taken to send data to the other end of a TCP connection; this is called the send time. The data from the DRM indicated that the send times for this customer’s web servers were about 4 times as long as the processing time to generate the page. The customer had suspected their ISP as the source of the slow download times experienced by their customers.

 

Because the customer had Solaris web servers, the WHAM consultant decided to make sure that they were properly configured for use on the internet before blaming the ISP. By default, Solaris systems are configured at the TCP level for best performance on a LAN, where packet loss is unlikely. The consistently long send times over the WAN were a tip-off that the server might be improperly configured. If the ISP were at fault, the long send times would not be so consistent, and the data collected by the WHAM DRM showed that they were consistent rather than intermittent. It turned out that the Solaris servers were not properly configured for operation over the internet or with Windows clients. When the configuration parameters were changed to suit operation over the WAN and Windows clients, the front page download time was cut to an average of 15 seconds. There is no particular ROI to compute here, but this problem had been dogging this company for about six months.

 

Case 3b DotCom Number 2 (July 2000)

In this instance the customer had been so pleased with the initial results obtained by using the WHAM DRM that they had purchased it for use on all their servers. Here is a description from the customer of this particular problem and its solution.

 

This dotcom environment initially consisted of a Windows NT system for the commerce site and the Vignette StoryServer application running on Solaris/Oracle systems for the content site. An integrated system was developed so that all functions would be served by a single platform. The target environment was two dual-processor E250s providing web services and a Sun E4500 equipped with six processors, running Oracle 8i, as the database server. Initial calculations indicated this was sufficient capacity for the combined functions. Some stress testing was performed in a scale-reduced test environment prior to a Friday evening deployment and the application performed acceptably during the weekend.

 

As the load increased to the peak during the first business day, performance deteriorated below the acceptable level. CPU occupancy was high and user sessions were timing out. Performance returned to normal as traffic dissipated. The initial assessment was that additional database server hardware would be needed to handle the peak conditions.

 

Adding two processors to the E4500 would cost $40,000 and require a week to arrive. However, that was much less than the incremental database license fee. At that time, Oracle’s license for Internet databases was based on a power unit: the number of processors times their clock rate (in megahertz). Licensing two additional processors would have cost an additional $120,000 plus $40,000 maintenance. A little due diligence was in order before spending $200,000.

 

The WHAM DRM tool was already installed, but the data had not yet been analyzed in the frenzy of initial deployment. With WHAM’s assistance, the database was diagnosed as actually performing quite well. A key measurement, transaction latency, indicated that the server was responding to queries from the web servers well within acceptable limits. Furthermore, the number of idle database connections was observed to grow over time, leading to some of the high CPU occupancy observed on the database.  Eliminating the idle connections took care of the CPU issues but didn’t change the database transaction latency.  Time to look at the web servers.

 

In the StoryServer architecture, a master process receives responses from the database server and places them in a queue for the appropriate slave process. Analysis of the performance data from the StoryServer hosts indicated that the master process was losing responses under load. The vendor was alerted and presented with the data; they quickly provided a patch. The system began to perform as designed.

 

The cost here was $25,000 for the WHAM DRM on all of the servers and the savings was $200,000 so the ROI was 700%.  This problem took about two weeks to resolve from the first analysis until the patch was delivered by the vendor.
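
The $200,000 figure is the sum of the three avoided costs quoted in the customer's description, and the 700% follows from the same ROI formula used in Case 1. A minimal Python sketch of the tally is below. The `power_units` helper only restates the licensing rule described above (processors times clock rate in megahertz); the note gives neither the clock rate nor a price per power unit, so the helper is defined but not evaluated.

```python
def power_units(processors, clock_mhz):
    """Licensing power units as described in the text: processors x MHz."""
    return processors * clock_mhz


# Avoided costs for two additional E4500 processors, from the text.
AVOIDED_COSTS = {
    "hardware": 40_000,
    "database_license": 120_000,
    "maintenance": 40_000,
}
DRM_COST = 25_000  # WHAM DRM on all of the servers

total_avoided = sum(AVOIDED_COSTS.values())
roi = (total_avoided - DRM_COST) / DRM_COST * 100

print(f"Avoided cost: ${total_avoided:,}")
print(f"ROI: {roi:.0f}%")
```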

 

Case 4 A WebSphere Deployment  (July 2000 – April 2001)

This case also involves the customer from a previous case (LEXIS-NEXIS), but in a situation where they are now interested in evaluating the claims of the marketers of Web-based application software development environments such as WebLogic and WebSphere. One of the touted benefits is improved time to market through the reduction of programming costs via the use of pre-programmed and reusable components such as Enterprise Java Beans or EJBs. In order to determine whether a product that makes such claims can also support 1,000,000 requests per day, a prototype was developed. WHAM has participated in the development of this prototype in the hardware evaluation phase as well as the software evaluation phase through the provision of both products and services. This case is a little more nebulous than the first, as it was an eight-month process and involved many different people and organizations. The initial configuration required to service a specified number of simultaneous transactions, a number based on current production measurements, was priced at $1,625,000 for the combination of hardware and software licensing.

 

Through a process of measurement and analysis, working with the vendor to get the WebSphere flaws ironed out, and improvements to the prototype applications, the ultimate cost of the configuration was only $200,000. The savings was $1.42m, which is substantial and represents the needs of only one of the current production systems. As time goes on, the gift of performance optimization continues to give, which results in an increasing ROI over time. The estimated cost of the consulting, tuning of the prototypes and purchase of software for the purpose of tuning is $250,000 for this case. Hence the ROI is 468% realized over a period of eight months. When this software is placed into production the ROI will continue to grow from the initial tuning as more transactions are completed at this lower cost.

 

Case 5 A Merchandise Planning Application (Jun 2001 – Nov 2001)

This application from i2 is named TradeMatrix and is implemented by our customer to optimize the purchase of retail stock. The customer is a Fortune 100 retailer. The customer was planning to have up to 400 users of the software running on a single IBM s80. The initial system was a 6 CPU system with 12GB of main memory. The application was to be used during the day by financial planners and buyers at various corporate levels. Each user accesses part of the overall plan for their input and modification during the working day. At the end of the day the database is updated and reorganized, and the batch jobs that consolidate the day’s changes are run. The entire plan with changes is committed and reanalyzed, then stored back out to the database prior to the start of the next day. The first problem encountered with the application was that it wasn’t able to complete the entire plan consolidation overnight. This became clear from running the batch portion on a part of the plan smaller than the full scope that would include all of the users.

 

The DRM was used to measure the behavior of the application.  It was determined from the additional metrics only available through the DRM, that the application had an implementation flaw causing it to run slowly.  This flaw was evident from the data and when presented to the i2 development team, allowed them to identify and eliminate the source of the slowness.  The application was corrected in a subsequent release and the execution time of the plan consolidation was reduced by 80%.

 

The next phase of the project dealt with sizing the systems for up to 400 users with varying computational needs.  Prior to having the WHAM DRM, the IBM development team had been using the available system tools and BMC Patrol Perform for AIX.  The sizing that the development team had determined necessary for full deployment was three s85s with 24 CPUs each.  One s80 would be used for the database, one would be used for the planning application and the third would serve as a backup and development system.  The quote for the systems was $2.2m.  Using information provided by the business areas about the workload estimates for each different user type and WHAM DRM measurements, the Capacity Planning team came to a different conclusion.   The Capacity Planning team was able to confidently suggest that the current two systems could be upgraded to 12 CPUs from 6 and 24GB from 6GB.  An additional offline 6 CPUs per system were purchased that will be charged for if needed.  The total cost of the upgrade was $220,000.  The saving in this case was $1,980,000 and the cost of the tools, consulting and analysis as well as customer time was $100,000.  The ROI here is 1880% and the time frame of the project was four months.  This customer has already purchased additional licenses and is ready to use the same approach on the next corporate project that is intended to leverage IT as a productivity enhancer for the corporation.

 

Case 6 Hardware Purchase with Negative ROI (Feb 2001)

This case represents a customer that decided to purchase hardware as a result of relying on the measurement tools available on the system (vmstat). The customer had informed WHAM that they wished to engage WHAM to come measure their systems after the purchase of an additional 12 CPUs per system (ES6500). The purpose of the engagement was to determine how much spare capacity was available. WHAM asked the customer how they knew that CPU capacity was the problem. The customer responded that the users complained of slow response times when more than three of them tried to use the system at the same time. The systems had 14 CPUs originally with 29GB of main memory (the applications were fairly CPU intensive and multi-threaded). The system administrators had used vmstat to find that the CPU utilization of the system was 100% during the times that the users were complaining. In the customer’s mind this was obviously a problem that could be solved with more CPUs, so the request for $500,000 was made and approved for purchase.

 

Two months later WHAM was invited in to measure the systems and the results were rather astounding. In this case we are going to show the three charts that we presented to management at the completion of the one-week engagement. In Figure 1 CPU Cost and Execution Time vs. # of CPUs, we plot the speedup in execution time for a specific processing request vs. the number of CPUs used to process the request. This is important to note because each user of the system was allowed to select the number of CPUs that they wanted to use in processing their request. All of the users were selecting 10 CPUs by default. Looking at this chart we can see that our measurements of the actual application behavior show that using 10 CPUs only resulted in a reduction in computing time of about 12 minutes vs. using 4 CPUs. However, the total CPU seconds used by 10 CPUs is slightly more than twice that used by one CPU and about 1.9 times as much as for four CPUs. An additional noteworthy fact that can be drawn from Figure 1 is that the optimum choice for number of processors is 4. This is the intersection of Execution Time vs. # CPUs and Normalized Cost vs. # CPUs. This intersection represents the greatest speedup for the least cost. Going up in number of CPUs will cost more CPU time without a commensurate execution time reduction; going down will give up execution time reduction that could be obtained by using extra CPUs.

 

Figure 1 CPU Cost  and Execution Time vs. # of CPUs
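
The two curves in Figure 1 can be derived from three measurements per run: the number of CPUs selected, the elapsed execution time, and the total CPU seconds consumed. The Python sketch below shows one way to compute speedup and normalized CPU cost relative to the single-CPU run. The sample measurements are hypothetical placeholders, not the data behind the figure; they were chosen only to be roughly consistent with the ratios quoted in the text.

```python
def scaling_table(measurements):
    """Compute speedup and normalized CPU cost relative to the 1-CPU run.

    measurements: list of (cpus, elapsed_seconds, cpu_seconds) tuples,
    which must include a run with cpus == 1 as the baseline.
    """
    baseline = next(m for m in measurements if m[0] == 1)
    _, base_elapsed, base_cpu_seconds = baseline
    rows = []
    for cpus, elapsed, cpu_seconds in sorted(measurements):
        rows.append({
            "cpus": cpus,
            "speedup": base_elapsed / elapsed,
            "normalized_cost": cpu_seconds / base_cpu_seconds,
        })
    return rows


# HYPOTHETICAL sample data (cpus, elapsed seconds, CPU seconds); the real
# measurements are shown only graphically in Figure 1.
SAMPLE = [
    (1, 11_000, 11_000),
    (4, 3_000, 12_000),
    (10, 2_280, 22_800),
]

for row in scaling_table(SAMPLE):
    print(f"{row['cpus']:>2} CPUs: speedup {row['speedup']:.2f}x, "
          f"normalized cost {row['normalized_cost']:.2f}")
```

With numbers of this shape, moving from 4 to 10 CPUs buys little extra speedup while nearly doubling the CPU seconds consumed, which is the pattern the text describes.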

 

This study showed us that the application was not effectively making use of more than four CPUs, and further study showed us that there was a point in the processing where the application scalability broke down. We can see from Figure 2 Speedup Per Iteration that the fifth and sixth iterations were where the application scalability broke down badly. We were able to identify the iterations using additional measurements that are provided by the WHAM DRM but not by the operating system. This application always uses at least four iterations and oftentimes as many as twenty. So it was clear that purchasing extra CPUs wouldn’t speed up the application if more than four were employed. Furthermore, since there were only about three users on the system at any given time and there were fourteen CPUs, there was plenty of capacity to spare if users restricted their use of the system to only four CPUs. Finally, notice from Figure 3 Normalized Cost Per Iteration vs. # CPUs that the CPU cost per iteration past the fourth was increasing non-linearly with each CPU added to the system. Hence, the addition of CPUs was a negative factor for the user experience.

 

Figure 2 Speedup Per Iteration

 

Figure 3 Normalized Cost  Per Iteration vs. # CPUs

 

In computing the ROI of the CPU purchase here we have no return on a $500,000 expenditure. Hence, the ROI is –100%. This is a complete loss and not what one would expect when a purchase of new hardware is made. There is always the argument that the budget was available or that the workload will eventually grow to meet the capacity, but these are both specious arguments used to cover up a mistake by those who made it. In the case of the first argument, the budget could have been better used somewhere else, or simply not spent, which would add to the bottom line profit of the organization. In the second case, current measurements indicate that there are still only three to five users on the current system and that the hardware wouldn’t have been needed before 2002 at the earliest. Purchasing a rapidly depreciating item such as a computer without fully utilizing it is simply writing down the unused portion as a complete loss.