Workload is not getting distributed

 


 

  1. Web (HTTP) requests are not distributed to all servers
  2. Enterprise bean requests are not distributed to all servers
  3. Enterprise bean requests are not distributed evenly
  4. A failing server still receives enterprise bean requests (failover is not completed)
  5. Stopped or hung servers do not share the workload after being restored
  6. A cluster does not fail over to its backup cluster

 

Web (HTTP) requests are not distributed to all servers

If HTTP requests are not being distributed to all servers:

  • Check your PrimaryServers list. The plug-in load balances across all servers that are defined in the PrimaryServers list, if affinity has not been established. If you do not have a PrimaryServers list defined, the plug-in load balances across all servers defined in the cluster, if affinity has not been established. In the case where affinity has been established, the plug-in should go directly to that server, for all requests within the same HTTP session.

  • If some servers are servicing requests and one or more others are not, try accessing a problem server directly to verify that it works, apart from workload management issues. If that does not work, see the article HTTP plug-in component troubleshooting tips for more information.

  • Check the steps for diagnosing workload management issues in Troubleshooting the Workload Management component.
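As a quick aid for the PrimaryServers check above, the following Python sketch lists servers that a cluster defines but that are absent from its PrimaryServers list. It assumes the plug-in configuration uses ServerCluster, Server, and PrimaryServers elements as in the fragment shown; the fragment, cluster name, and server names are invented for illustration, so adjust them to match your own plugin-cfg.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical plug-in configuration fragment; all names are invented.
SAMPLE_PLUGIN_CFG = """\
<Config>
  <ServerCluster Name="MyCluster" LoadBalance="Round Robin">
    <Server Name="node1_server1" LoadBalanceWeight="2"/>
    <Server Name="node2_server2" LoadBalanceWeight="2"/>
    <Server Name="node3_server3" LoadBalanceWeight="2"/>
    <PrimaryServers>
      <Server Name="node1_server1"/>
      <Server Name="node2_server2"/>
    </PrimaryServers>
  </ServerCluster>
</Config>
"""


def servers_outside_primary_list(xml_text):
    """Map each cluster name to the servers it defines that are missing
    from its PrimaryServers list.

    Clusters without a PrimaryServers list are skipped, because the
    plug-in then balances across every server defined in the cluster.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for cluster in root.iter("ServerCluster"):
        # Direct children only: the servers defined in the cluster.
        defined = [s.get("Name") for s in cluster.findall("Server")]
        primary_el = cluster.find("PrimaryServers")
        if primary_el is None:
            continue
        primary = {s.get("Name") for s in primary_el.findall("Server")}
        missing = [name for name in defined if name not in primary]
        if missing:
            result[cluster.get("Name")] = missing
    return result
```

Calling servers_outside_primary_list(SAMPLE_PLUGIN_CFG) on the fragment reports node3_server3 for MyCluster. A server reported this way is not necessarily misconfigured (it might be listed as a backup on purpose), but it does not take part in normal load balancing.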

 

Enterprise bean requests are not distributed to all servers

If a client cannot reach a server in a cluster that is thought to be reachable, the server might be marked unusable or might be down. To verify this:

 

Enterprise bean requests are not distributed evenly

There are a number of possible reasons for this behavior, which generally fall into one or more of these categories:

  • Improper configuration
  • Environment issues, such as the availability of servers or applications
  • A large number of requests that involve transactional affinity
  • A small number of clients

Workload management in WAS is based on a round-robin scheme of request distribution, so balance is measured by the number of requests rather than by any other measure. To determine whether a true balance problem exists, compare the number of requests processed by each member of the cluster with the weights that have been set for each of those members. Follow the steps in the Troubleshooting the Workload Management component topic to make this comparison.

  • When the percentage of requests that arrive for each member of the cluster is consistent with the weights, the requests themselves are balanced. Further analysis of the application is then required to determine why the workload is imbalanced even though the number of requests is balanced.

  • When numIncomingNonWLMObjectRequests is not balanced among the members of the cluster and is large in relation to numIncomingRequests, the imbalance is caused by non-distributable components installed on the members of the cluster. Modifying the configuration yields a more balanced environment.

  • When numIncomingStrongAffinityRequests is not balanced among the members of the cluster and is large in relation to numIncomingRequests, the imbalance is caused by requests that are invoked within a transaction. You can reduce these requests by installing the objects involved in a transaction in the same cluster.
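The comparison of request counts with weights can be sketched in a few lines of Python. This is only an illustration: the function name and the member names are hypothetical, and the request counts are assumed to be the numIncomingRequests values that you collected for each member.

```python
def compare_to_weights(weights, request_counts):
    """For each cluster member, return a tuple of
    (expected share from weights, observed share of requests, difference).

    weights        -- member name -> configured weight
    request_counts -- member name -> number of requests processed
    """
    total_weight = sum(weights.values())
    total_requests = sum(request_counts.values())
    if total_weight <= 0 or total_requests <= 0:
        raise ValueError("weights and request counts must sum to a positive value")
    report = {}
    for member, weight in weights.items():
        expected = weight / total_weight
        observed = request_counts.get(member, 0) / total_requests
        report[member] = (expected, observed, observed - expected)
    return report


# Hypothetical example: three members with weights 2, 2, and 1.
example = compare_to_weights(
    {"member1": 2, "member2": 2, "member3": 1},
    {"member1": 700, "member2": 200, "member3": 100},
)
for member, (expected, observed, diff) in example.items():
    print(f"{member}: expected {expected:.0%}, observed {observed:.0%}, diff {diff:+.0%}")
```

In this invented example, member1 is expected to receive 40% of the requests but receives 70%, which points to a distribution problem rather than an application problem.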

 

A failing server still receives enterprise bean requests (failover is not completed)

Some possible causes of this problem are:

  • The client might have been in a transaction with an enterprise bean on the server that went down. Check the JVM logs of the application server hosting the problem enterprise bean instance. If a request is returned with CORBA SystemException COMM_FAILURE org.omg.CORBA.completion_status.COMPLETED_MAYBE, this might be working as designed. The design is to let this particular exception flow back to the client, since the transaction might have completed. Failing over this request to another server could result in this request being serviced twice.

  • If the requests sent to the servers consistently come back to the client with any other exception, it might be that no servers are available. In this case, follow the resolution steps outlined in Troubleshooting the Workload Management component.

 

Stopped or hung servers do not share the workload after being restored

This error occurs when the servers that were unavailable are not recognized by the Workload Management component after they are restored. There is an unusable interval determined by the property com.ibm.websphere.wlm.unusable.interval during which the workload manager waits to send to a server that has been marked unusable. By default this is 15 minutes.

To confirm that this is the problem, make sure that the servers that were down are now up and capable of servicing requests. Then wait for the unusable interval to elapse before checking whether failover occurs.
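To work out how long to wait, add the unusable interval to the time at which the server was marked unusable. The Python sketch below does that arithmetic. The function names are hypothetical, and the 15-minute default is taken from the text above; pass a different interval if com.ibm.websphere.wlm.unusable.interval is set to another value in your environment.

```python
from datetime import datetime, timedelta

# Default taken from the text above; override it if the property
# com.ibm.websphere.wlm.unusable.interval is set differently.
DEFAULT_UNUSABLE_INTERVAL = timedelta(minutes=15)


def earliest_retry(marked_unusable_at, interval=DEFAULT_UNUSABLE_INTERVAL):
    """Return the earliest time at which requests may be sent again to a
    server that was marked unusable at the given time."""
    return marked_unusable_at + interval


def interval_elapsed(marked_unusable_at, now, interval=DEFAULT_UNUSABLE_INTERVAL):
    """Return True if the unusable interval has fully elapsed at `now`."""
    return now >= earliest_retry(marked_unusable_at, interval)


# Example with an invented timestamp for when the server was marked unusable.
marked = datetime(2004, 10, 11, 13, 11, 10)
print("Check failover no earlier than", earliest_retry(marked))
```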

 

A cluster does not fail over to its backup cluster

You might experience an error that is similar to the following sample:

[10/11/04 13:11:10:233 CDT] 00000036 SelectionMana A    WWLM0061W: An error was 
encountered sending a request to cluster member  {MEMBERNAME=FlorenceEJBServer1, 
NODENAME=fwwsaix1Node01} and that member has been  marked unusable for future 
requests to the cluster "", because of exception:  org.omg.CORBA.COMM_FAILURE: 
CONNECT_FAILURE_ON_SSL_CLIENT_SOCKET - JSSL0130E:  java.io.IOException: Signals 
that an I/O exception of some sort has occurred.   Reason:  Connection refused  
vmcid: 0x49421000  minor code: 70  completed: No" 
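When a log contains many messages of this kind, it can help to pull out the fields that matter for diagnosis. The following Python sketch extracts the message ID, cluster member, node, and CORBA completion status from a message shaped like the sample above. The regular expressions are assumptions based on that one sample, not a documented log format, so treat the result as a starting point.

```python
import re

# Patterns inferred from the single sample message; they are assumptions.
MESSAGE_ID_RE = re.compile(r"\b(WWLM\d{4}[A-Z])\b")
MEMBER_RE = re.compile(r"MEMBERNAME=([^,}\s]+)")
NODE_RE = re.compile(r"NODENAME=([^,}\s]+)")
COMPLETED_RE = re.compile(r"completed:\s*(\w+)")

SAMPLE_MESSAGE = '''[10/11/04 13:11:10:233 CDT] 00000036 SelectionMana A    WWLM0061W: An error was
encountered sending a request to cluster member  {MEMBERNAME=FlorenceEJBServer1,
NODENAME=fwwsaix1Node01} and that member has been  marked unusable for future
requests to the cluster "", because of exception:  org.omg.CORBA.COMM_FAILURE:
CONNECT_FAILURE_ON_SSL_CLIENT_SOCKET - JSSL0130E:  java.io.IOException: Signals
that an I/O exception of some sort has occurred.   Reason:  Connection refused
vmcid: 0x49421000  minor code: 70  completed: No"
'''


def summarize_wlm_message(text):
    """Return the message ID, member, node, and completion status found in
    a (possibly line-wrapped) log message; missing fields are None."""
    flat = " ".join(text.split())  # undo hard wraps and repeated spaces

    def first(pattern):
        match = pattern.search(flat)
        return match.group(1) if match else None

    return {
        "message_id": first(MESSAGE_ID_RE),
        "member": first(MEMBER_RE),
        "node": first(NODE_RE),
        "completed": first(COMPLETED_RE),
    }


print(summarize_wlm_message(SAMPLE_MESSAGE))
```

For the sample, the summary identifies member FlorenceEJBServer1 on node fwwsaix1Node01 with a completion status of No.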

Perform the following steps to fix your configuration:

  1. Review your deployment manager hostname and bootstrap port for each backup cluster setting.

  2. Review your core group bridge peer ports to make sure the hostname and DCS port are accurate.

  3. Verify that the names of your primary and backup clusters match.

  4. If your application goes through security to reach the backup cluster, review your security configuration. You might need to use single sign-on (SSO) and import the LTPA keys to the backup cell.
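After reviewing the host names and ports in steps 1 and 2, you can check whether those endpoints accept TCP connections at all. The Python sketch below is a minimal reachability test, not a WebSphere tool: it confirms only that something is listening on the given host and port, and it says nothing about whether the listener is the expected service. The function names are hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the
    timeout, and False on refusal, timeout, or name resolution failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Check a list of (host, port) pairs and return a dict that maps
    each pair to True (reachable) or False (not reachable)."""
    return {(host, port): can_connect(host, port, timeout)
            for host, port in endpoints}
```

To use it, pass the deployment manager host name and bootstrap port of each backup cluster setting, and the host name and DCS port of each core group bridge peer, for example check_endpoints([("backup-dmgr.example.com", 9809)]). Both the host name and the port number in that call are placeholders; substitute the values from your own configuration.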

For current information available from IBM Support on known problems and their resolution, see the IBM Support page.

IBM Support has documents that can save you time gathering information needed to resolve this problem. Before opening a PMR, see the IBM Support page.

 

Addenda

If none of these solutions fixes your problem:

  1. Browse the JVM logs of the problem deployment manager and application servers:

    1. Look up any error messages by selecting the Reference view of the information center navigation and expanding Messages in the navigation tree.

    2. Use the Log Analyzer to browse and analyze the service log (activity.log) of the deployment manager and any nodes encountering problems. View the activity.log files in both NetworkDeployment_install_root/logs and ApplicationServer_install_root/logs.

    3. If Java exceptions appear in the log files, try to determine the subcomponent that is directly involved in the problem. Examine the stack trace and look for a WebSphere Application Server-related class near the top of the stack (names beginning with com.ibm.websphere or com.ibm.ws) that threw the exception. If appropriate, review the steps for troubleshooting that subcomponent under the Troubleshooting by component: what is not working? topic.

      For example, if the exception appears to have been thrown by a class in the com.ibm.websphere.naming package, review the Naming Services Component troubleshooting tips topic.

  2. Ensure that all the machines in your configuration have TCP/IP connectivity to each other by running the ping command:

    1. From each physical server to the deployment manager

    2. From the deployment manager to each physical server

  3. Although the problem is happening in a clustered environment, the actual cause might be only indirectly related, or unrelated, to clustering. Investigate all relevant possibilities:

    1. If an enterprise bean on one or more servers is not serving requests, review the Cannot access an enterprise bean from a servlet, JSP, stand-alone program, or other client and Cannot access an object hosted by WAS from a servlet, JSP file, or other client topics.

    2. If problems seem to appear after enabling security, review the Errors or access problems after enabling security topic.

    3. If an application server stops responding to requests, or spontaneously dies (its process closes), review the Web module or application server dies or hangs topic.

    4. If SOAP requests are not being served by some or all servers, review the Errors returned to client trying to send a SOAP request topic.

    5. If you have problems installing or deploying an application on servers on one or more nodes, review the Troubleshooting code deployment and installation problems topic.

  4. If your topology consists of a Windows-based deployment manager with UNIX-based servers, use vi on the UNIX-based platform to browse any recently updated .xml and .policy files and ensure that they contain no Control-M characters. To avoid inserting these characters in the future, edit these files with vi on the UNIX-based platform.

  5. Follow the steps in the Workload management component troubleshooting tips topic.

  6. Check to see if the problem is identified and documented by looking at available online support (hints and tips, technotes, and fixes).
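For step 4, checking many files by hand in vi can be slow. As an alternative, the following Python sketch scans a directory tree for .xml and .policy files that contain a carriage return byte, which is what vi displays as Control-M (^M). The function name is hypothetical and the directory to scan is up to you; the sketch only reports files and does not modify them.

```python
from pathlib import Path


def files_with_carriage_returns(root, suffixes=(".xml", ".policy")):
    """Return a sorted list of files under `root` whose suffix is in
    `suffixes` and whose contents include a CR byte (0x0D)."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            if b"\r" in path.read_bytes():
                hits.append(path)
    return sorted(hits)
```

Run it against the configuration directory on the UNIX-based platform and review each reported file before editing it.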


 

Related Tasks


Troubleshooting by task
Troubleshooting by component

 

See Also


Errors setting up multiserver environments
Workload management component troubleshooting tips