WebSphere Commerce v7 performance with WebSphere eXtreme Scale



Introduction

WebSphere Commerce sites heavily leverage dynacache to reduce database roundtrips, and thus gain an important performance boost. WebSphere Commerce uses dynacache to cache...

Dynacache is an in-process cache. Each WebSphere Commerce instance in a cluster holds its own dynacache instance within its JVM, including its own copy of each cache entry.

Each instance of dynacache contains the same cache entries, although the copies are not necessarily identical (more on this later). Typically, a dynacache-based application server topology has to keep its many dynacache instances in sync. If cache entry "a" in the figure above is invalidated by one of the application servers, dynacache informs the other cluster members by dispatching an invalidation message via the WAS Domain Replication Service (DRS). Failure to do so would leave many shoppers viewing stale data.
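The invalidation flow described here can be illustrated with a minimal sketch. It is a toy model in plain Java, not the dynacache or DRS API: the class and method names (LocalCache, onRemoteInvalidation, buildCluster, anyHolds) are hypothetical, and direct method calls stand in for the replication messages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-JVM caches that must tell each other about invalidations. */
final class InvalidationBroadcastSketch {

    /** Hypothetical stand-in for one JVM's in-process cache. */
    static final class LocalCache {
        private final Map<String, String> entries = new HashMap<>();
        private final List<LocalCache> peers = new ArrayList<>();

        void addPeer(LocalCache peer) { peers.add(peer); }
        void put(String key, String value) { entries.put(key, value); }
        boolean contains(String key) { return entries.containsKey(key); }

        /** Drops the local copy, then notifies every peer (assumed message path). */
        void invalidate(String key) {
            entries.remove(key);
            for (LocalCache peer : peers) {
                peer.onRemoteInvalidation(key);
            }
        }

        /** Peers drop their copy but do not re-broadcast, which avoids loops. */
        void onRemoteInvalidation(String key) { entries.remove(key); }
    }

    /** Builds a fully connected cluster in which every cache holds entry "a". */
    static List<LocalCache> buildCluster(int size) {
        List<LocalCache> cluster = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            LocalCache cache = new LocalCache();
            cache.put("a", "cached page");
            cluster.add(cache);
        }
        for (LocalCache cache : cluster) {
            for (LocalCache peer : cluster) {
                if (cache != peer) {
                    cache.addPeer(peer);
                }
            }
        }
        return cluster;
    }

    /** Returns true if any cache in the cluster still holds the key. */
    static boolean anyHolds(List<LocalCache> cluster, String key) {
        for (LocalCache cache : cluster) {
            if (cache.contains(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<LocalCache> cluster = buildCluster(4);
        cluster.get(0).invalidate("a");
        System.out.println("stale copies remain: " + anyHolds(cluster, "a"));
    }
}
```

In this model every invalidation costs one message per peer, so the messaging work grows with the size of the cluster. A single shared cache removes the need for that fan-out.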

WebSphere eXtreme Scale is a distributed shared cache technology. The eXtreme Scale-based topology has a single logical instance of the cache that is shared among commerce servers. Because this cache is shared across servers, each server does not need its own copy of the same pages and fragments, as it does with dynacache. Instead, a single cache entry is created on the first request for a page or fragment, and is then available to all commerce servers sharing the cache.
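To make the difference concrete, the following toy model counts how many times the back end has to generate a page while a cold cache warms up. It is only a sketch under simplifying assumptions: every JVM eventually receives a request for every page, and eviction, concurrency, and network cost are ignored. All names are hypothetical and no eXtreme Scale API is used.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: counts back-end page generations while a cold cache warms up. */
final class CacheTopologySketch {

    /** Each JVM owns a private cache, as in the in-process topology. */
    static int loadsWithPerJvmCaches(int jvmCount, int pageCount) {
        int backendLoads = 0;
        for (int jvm = 0; jvm < jvmCount; jvm++) {
            Map<Integer, String> localCache = new HashMap<>();
            for (int page = 0; page < pageCount; page++) {
                if (!localCache.containsKey(page)) {
                    localCache.put(page, "rendered-" + page);
                    backendLoads++; // every JVM generates the page once for itself
                }
            }
        }
        return backendLoads;
    }

    /** All JVMs read and write one shared cache. */
    static int loadsWithSharedCache(int jvmCount, int pageCount) {
        Map<Integer, String> sharedCache = new HashMap<>();
        int backendLoads = 0;
        for (int jvm = 0; jvm < jvmCount; jvm++) {
            for (int page = 0; page < pageCount; page++) {
                if (!sharedCache.containsKey(page)) {
                    sharedCache.put(page, "rendered-" + page);
                    backendLoads++; // only the first requester generates the page
                }
            }
        }
        return backendLoads;
    }

    public static void main(String[] args) {
        System.out.println("per-JVM caches: " + loadsWithPerJvmCaches(4, 1000));
        System.out.println("shared cache:   " + loadsWithSharedCache(4, 1000));
    }
}
```

With four JVMs and 1000 pages, the per-JVM model generates each page four times (4000 loads) while the shared model generates each page once (1000 loads). Real sites will not match these numbers, but the ratio shows why the shared topology puts less work on the database during warm-up.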

The following sections discuss how eXtreme Scale can improve performance for key scenarios frequently experienced by high-volume commerce customers. We will discuss how eXtreme Scale can potentially reduce the impact of a full or partial site restart for end users. We will also discuss the cache invalidation scenarios needed by retailers to reflect catalog updates, and consider how eXtreme Scale potentially improves the end user experience during these events.


WebSphere Commerce and WebSphere eXtreme Scale integration

eXtreme Scale V7.0 supports many caching topologies. However, not all of them are supported by Feature Pack 1. WebSphere Commerce support is focused on the eXtreme Scale dynacache plugin component as a configuration choice for commerce customers. eXtreme Scale is not included as part of Feature Pack 1.

Type of topology supported by Feature Pack 1:


Optimizing site recovery time

WebSphere Commerce sites, as with most IT assets, may need to take components offline periodically for maintenance. Likewise, the site may want to keep components in reserve for failover or peak events when the site might need additional capacity.

Bringing cold components into a working site presents an interesting challenge when caching is involved. The component (typically an LPAR, JVM, or application) performs optimally when the cache is warm. However, if the component was offline, the cache may have been lost entirely or become significantly stale.

This requires the component to work harder until the cache reaches its optimal state. During this period, the component may not support its full capacity, because it uses extra resources, such as mid-tier CPU and database capacity, to generate responses that will eventually reside in the cache.

Cache warm-up after a cold start is particularly problematic if the component is immediately thrust into a heavy workload that requires it to respond at full capacity. For example, if a JVM is restarted during a Black Friday sales event, it becomes active in a farm that is handling the peak load of the year.


Improving startup performance under heavy load

Restarting parts of a commerce site under heavy load presents special challenges. Without a populated cache to boost the site performance, end users who are sent to the newly started portion of the site might experience response time degradation. Also, the mid-tier CPU and the database would see increased utilization.

It is important for the retail site to rebuild the cache and bring site response times and utilization to a steady state quickly.

The key difference between a dynacache-based topology and the eXtreme Scale-based topology is that the latter has only a single cache instance that is populated by multiple application server JVMs. Because the whole site shares that one cache, an entry generated by any JVM is available to all the others, which shortens the warm-up period after startup.

In this example, the team tested an extreme situation: Restarting the entire site prior to releasing a full load against it. This simulates some failover scenarios involving active-passive datacenter concepts.

The figure below shows the data comparing a traffic surge to a newly-started site using commerce with traditional dynacache versus commerce with the new eXtreme Scale central caching capability. The red arrow indicates the point where a steady state is considered to be reached. Spikes in I/O are produced by garbage collection of cached objects stored in the disk offload file.

The figure below shows CPU utilization on the commerce machine for a restart scenario with eXtreme Scale. The disk I/O activity is at a low level with minor spikes due to application server logging. The red arrow indicates the point where steady state is considered to be reached.

Overall, in our cold start tests with four commerce cluster members, the commerce site with eXtreme Scale tended to reach steady state in about 60% of the time required with dynacache, which corresponds to warming up roughly 1.7 times as fast (1 / 0.6 ≈ 1.67). A larger commerce cluster would likely see a bigger benefit.

Improvement in average response time is another notable benefit of employing eXtreme Scale to help with the cold start situation. This is shown in the figure below.

The eXtreme Scale topology shows less statistical scatter in response times, and overall consistently faster response times. In our testing, we observed up to 25% improvement in average response times. This produces a better end user experience for the shoppers and will tend to improve revenue (due to fewer shoppers leaving the site) during major high-volume events.


Improving startup performance under light load and load ramp-up

For a system with load, when one of the JVMs is shut down for maintenance, two important things happen:

Once the JVM is restarted after maintenance, the HTTP plug-in will route a proportion of requests to that JVM. The cache instance for that JVM is cold (not populated with cache entries). Given that the restart occurs under lighter load conditions, the system has sufficient resources to operate until the cache is warm. However, a proportion of shoppers whose requests are directed to this JVM by the HTTP plug-in may still experience slower response times until the cache is warm.

The figure below shows CPU and I/O utilization for the commerce server for both dynacache-based and eXtreme Scale-based topologies. For the commerce server running dynacache-based topology, you can clearly see a CPU and disk utilization disturbance. The commerce server running the eXtreme Scale topology remains virtually undisturbed by the JVM restart.

The response time curve for the eXtreme Scale-based topology is undisturbed. For the eXtreme Scale topology, all JVMs share a single cache instance that remains undisturbed during JVM restart. eXtreme Scale, therefore, brings significant benefit during a warm restart. Shoppers get a superior end user experience, and retailers are consequently able to realize higher revenue during operational procedures.


Optimizing invalidation performance

Another important set of scenarios reflects the fact that many web retailers need to periodically invalidate the contents of their cache. Caches need to be regularly invalidated for a number of reasons. The most common of these are:

These cache invalidation scenarios generally fall into two categories:


Invalidating the entire cache

The figure below shows a comparison of I/O rate on the database server for both the dynacache-based and eXtreme Scale-based topologies. Database I/O rate is an important indicator of overall site performance. As the cache is completely invalidated, all customer requests are sent to the database server for processing. The database server can bottleneck on disk I/O, resulting in reduced throughput and poor response times.

When the cache is invalidated, the database I/O rate for both topologies rapidly increases to a high level. The I/O rate for the eXtreme Scale-based topology recovers faster and settles at a lower level than for the dynacache-based topology. In the eXtreme Scale topology, the single cache instance is populated by several JVMs. Once a particular cache entry is refreshed by any cluster member, it is immediately available to all of the other cluster members. This causes the cache to become populated faster than in the dynacache case. In a dynacache-based topology, larger numbers of commerce cluster members will show higher degradation, because they all compete for the single database instance during this warm-up time.


Continuously invalidating part of the cache

To simulate this scenario, we developed an algorithm that invalidated 25% of the cache entries every twenty minutes, similar to the effect of a live inventory feed. As with the previous scenario, where we invalidated the entire cache, the database server I/O rate is a good indicator of overall site performance. The figure below shows a comparison of database server I/O for both dynacache-based and eXtreme Scale-based topologies.
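The algorithm used in the tests is not reproduced in this article. As an illustration only, a periodic partial invalidation of this kind could be sketched as follows. The sketch assumes entries are chosen at random, uses a plain Java map as a stand-in for the cache, and uses hypothetical names throughout (PartialInvalidationSketch, invalidateFraction, scheduleInvalidation).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Illustrative only: not the test harness used for the measurements. */
final class PartialInvalidationSketch {

    /** Removes a random fraction of the entries and returns how many were removed. */
    static <K> int invalidateFraction(Map<K, ?> cache, double fraction, Random random) {
        List<K> keys = new ArrayList<>(cache.keySet());
        Collections.shuffle(keys, random);
        int toRemove = (int) Math.floor(keys.size() * fraction);
        for (K key : keys.subList(0, toRemove)) {
            cache.remove(key);
        }
        return toRemove;
    }

    /** Schedules a 25% invalidation every twenty minutes; the caller shuts the executor down. */
    static ScheduledExecutorService scheduleInvalidation(Map<String, String> cache) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        Random random = new Random();
        scheduler.scheduleAtFixedRate(
                () -> invalidateFraction(cache, 0.25, random), 20, 20, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new ConcurrentHashMap<>();
        for (int i = 0; i < 1000; i++) {
            cache.put("page-" + i, "rendered page " + i);
        }
        // Run a single invalidation pass instead of waiting for the schedule.
        int removed = invalidateFraction(cache, 0.25, new Random(42));
        System.out.println("invalidated " + removed + ", remaining " + cache.size());
    }
}
```

A real inventory feed would invalidate the entries for specific products rather than a random sample. Random selection is used here only because it produces a comparable volume of invalidations with very little code.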

In our test, the overall I/O rate (yellow line) for the dynacache-based topology remains consistently high at about 800 I/O operations per second. The test site remains bottlenecked on the database server I/O and bound by performance characteristics of the database server disk.

The database I/O for the eXtreme Scale-based topology is able to recover and drop to a much lower level of about 300 I/O operations per second. The test site is bottlenecked on database I/O for short periods of time following each partial invalidation event, but recovers quickly. Again, this is because with eXtreme Scale, multiple JVMs work to populate a single cache instance. The cache is populated faster, allowing the database server I/O rate to relax.


Conclusion

WebSphere Application Server dynacache with disk offload can provide commerce sites with excellent performance characteristics. There are, however, a number of important advantages to deploying WebSphere eXtreme Scale together with WebSphere Commerce. These include:

WebSphere eXtreme Scale works best for caching commerce pages, page fragments, and commands. For best performance, avoid caching distributed maps in eXtreme Scale.

Some customers are especially likely to benefit from integrating their site with eXtreme Scale:

Customer results may vary due to differences in scenario and hardware environment. We encourage customers to evaluate adding WebSphere eXtreme Scale to their commerce site.

Resources