Workload is not getting distributed
Use this information to diagnose workload distribution problems.
New feature: Beginning in WAS v8.0, you can configure the server to use the High Performance Extensible Logging (HPEL) log and trace infrastructure instead of using the SystemOut.log, SystemErr.log, trace.log, and activity.log files or native z/OS logging facilities. If you are using HPEL, you can access all of your log and trace information using the LogViewer command-line tool from your server profile bin directory. See the information about using HPEL to troubleshoot applications for more information.
What kind of problem are you seeing?
- HTTP requests are not distributed to all servers
- Enterprise bean requests are not distributed to all servers
- Enterprise bean requests are not distributed evenly
- A failing server still receives enterprise bean requests (failover is not completed)
- A cluster does not fail over to its backup cluster
If none of these descriptions resolves your problem:
- Browse the JVM logs of the affected dmgr and application servers:
- Look up any error messages by selecting the Reference view of the information center navigation and expanding Messages in the navigation tree.
- Use the Log and Trace Analyzer tool to browse and analyze the service log (activity.log) of the dmgr and any nodes encountering problems. View the activity.log files in WAS_HOME/logs.
If Java exceptions appear in the log files, try to determine the subcomponent that is directly involved in the problem by examining the stack trace and looking for a product-related class near the top of the stack (names beginning with com.ibm.websphere or com.ibm.ws) that threw the exception. If appropriate, review the troubleshooting steps for that subcomponent under the Troubleshoot WebSphere applications section of the information center.
For example, if the exception appears to have been thrown by a class in the com.ibm.websphere.naming package, review the "Naming Services Component troubleshooting tips" topic.
- Ensure that all the machines in the configuration have TCP/IP connectivity to each other by running the ping command:
- From each physical server to the dmgr
- From the dmgr to each physical server
- Although the problem is happening in a clustered environment, the actual cause might be only indirectly related, or unrelated, to clustering. Investigate all relevant possibilities:
- If an enterprise bean on one or more servers is not serving requests, review the "Cannot access an enterprise bean from a servlet, JSP, stand-alone program, or other client" and "Cannot look up an object hosted by the product from a servlet, JSP file, or other client" topics.
- If problems seem to appear after enabling security, review the "Errors or access problems after enabling security" topic.
- If an application server stops responding to requests, or spontaneously dies (its process closes), review the "Web module or application server dies or hangs" topic.
- If SOAP requests are not being served by some or all servers, review the "Errors returned to client trying to send a SOAP request" topic.
- If you have problems installing or deploying an application on servers on one or more nodes, review the "Troubleshooting code deployment and installation problems" topic.
- If your topology consists of a Windows-based dmgr with servers on supported UNIX-based systems, use vi to browse any recently updated .xml and .policy files on the UNIX-based systems and ensure that no Control-M characters are present in the files.
To avoid this problem in the future, edit these files using vi on the UNIX-based systems, which does not insert these characters.
- Check for troubleshooting tips for the workload management component.
- Check to see if the problem is identified and documented by looking at available online support (hints and tips, technotes, and fixes).
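The Control-M check in the list above can also be scripted rather than done by opening each file in vi. The following is a minimal Python sketch, assuming the configuration files sit under a directory tree that you pass in. The function name and the suffix filter are illustrative and are not part of the product.

```python
from pathlib import Path


def files_with_carriage_returns(root, suffixes=(".xml", ".policy")):
    """Return files under root that contain a carriage return (Control-M) byte.

    root and suffixes are illustrative parameters; point root at the
    configuration directory you want to inspect.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Read as bytes so that no newline translation hides the \r byte.
            if b"\r" in path.read_bytes():
                hits.append(path)
    return hits
```

Any path that the function returns is a candidate for cleanup on the UNIX-based system before the server reads the file again.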
HTTP requests are not distributed to all servers
If HTTP requests are not being distributed to all servers:
- Check your Primary Servers list. The plug-in load balances across all servers that are defined in the Primary Servers list, if affinity has not been established. If you do not have a Primary Servers list defined, the plug-in load balances across all servers defined in the cluster, if affinity has not been established. In the case where affinity has been established, the plug-in should go directly to that server, for all requests within the same HTTP session.
- If some servers are servicing requests and one or more others are not, try accessing a problem server directly to verify that it works, apart from workload management issues. If that does not work:
- Use the admin console to ensure that the affected server is running.
- See the topic "Web resource does not display" for more information.
- See the "HTTP plug-in component troubleshooting tips" topic for more information.
- Check the steps for diagnosing workload management issues in the "Troubleshooting the Workload Management component" topic.
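To see which servers the plug-in can select when no affinity exists, it can help to compare the servers defined for a cluster with its Primary Servers list. The sketch below parses a small hand-written XML fragment. The element names (ServerCluster, Server, PrimaryServers) and the server names are assumptions modeled on a typical plug-in configuration file, so check them against your own generated file before relying on the result.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; element and server names are assumptions.
SAMPLE = """
<Config>
  <ServerCluster Name="MyCluster">
    <Server Name="node1_member1"/>
    <Server Name="node2_member2"/>
    <Server Name="node3_member3"/>
    <PrimaryServers>
      <Server Name="node1_member1"/>
      <Server Name="node2_member2"/>
    </PrimaryServers>
  </ServerCluster>
</Config>
"""


def balancing_targets(xml_text):
    """Map each cluster name to its load-balancing targets and excluded servers."""
    root = ET.fromstring(xml_text)
    result = {}
    for cluster in root.iter("ServerCluster"):
        # Direct children only, so servers nested in PrimaryServers are not counted twice.
        defined = [s.get("Name") for s in cluster.findall("Server")]
        primary = [s.get("Name") for s in cluster.findall("PrimaryServers/Server")]
        # With no Primary Servers list, every defined server is a target.
        targets = primary if primary else defined
        result[cluster.get("Name")] = {
            "targets": targets,
            "excluded": [name for name in defined if name not in targets],
        }
    return result


print(balancing_targets(SAMPLE))
```

A server that appears under "excluded" is defined in the cluster but is missing from the Primary Servers list, which is one explanation for a member that never receives HTTP requests.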
Enterprise bean requests are not distributed to all servers
If a client cannot reach a cluster member that is expected to be reachable, the server might be marked unusable or might be down.
To verify this:
- Use the admin console to verify that the server is started. Try starting it, or if started, stop and restart it.
- Browse the admin console and verify that the node that runs the server having the problem appears. If it does not:
- Review the steps for adding a node to a cluster.
- Review the steps in the "One or more nodes do not show up in the admin console" section.
- If possible, try accessing the enterprise bean directly on the problem server to see if there is a problem with TCP/IP connectivity, application server health, or another problem not related to workload management. If this fails, review the "Cannot access an enterprise bean from a servlet, JSP, stand-alone program, or other client" topic.
- Check the steps for diagnosing workload management issues in the "Troubleshooting the Workload Management component" topic.
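Before testing the enterprise bean itself, a quick TCP connection attempt from the client machine can separate a basic connectivity problem from a workload management problem. This is a minimal sketch using only the Python standard library. The function name is illustrative, and the host and port that you pass in must come from your own server configuration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If `can_connect` returns False for the port that the problem server listens on, investigate the network path or the server process before looking at workload management settings.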
Enterprise bean requests are not distributed evenly
There are a number of possible reasons for this behavior, which generally fall into one or more of these categories:
- Improper configuration
- Environment issues, such as the availability of servers or applications
- A large number of requests that involve transactional affinity
- A small number of clients
Workload management in the product is based on a weighted proportional scheme to spray requests among the servers. This results in balance being determined by numbers of requests rather than by any other measure. A true balance problem is determined by comparing the number of requests processed by each member of the cluster with the weights that have been set for each of those members. This is done by following the steps in the topic "Troubleshooting the Workload Management component".
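The comparison between request counts and weights can be sketched as follows. The member names, weights, and request counts are invented for illustration; in practice the counts come from whatever request statistics you collect for each cluster member.

```python
def compare_to_weights(weights, request_counts):
    """Return {member: (expected_share, observed_share)} for a cluster.

    expected_share is the member weight divided by the sum of all weights;
    observed_share is the member request count divided by all requests.
    """
    total_weight = sum(weights.values())
    total_requests = sum(request_counts.get(member, 0) for member in weights)
    report = {}
    for member, weight in weights.items():
        expected = weight / total_weight
        count = request_counts.get(member, 0)
        observed = count / total_requests if total_requests else 0.0
        report[member] = (expected, observed)
    return report


# Hypothetical weights and request counts.
weights = {"member1": 2, "member2": 2, "member3": 1}
counts = {"member1": 400, "member2": 390, "member3": 210}
for member, (expected, observed) in compare_to_weights(weights, counts).items():
    print(f"{member}: expected {expected:.0%}, observed {observed:.0%}")
```

Observed shares that stay close to the expected shares over a large number of requests suggest that distribution is working as configured, even when raw CPU or response-time figures differ between members.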
A failing server still receives enterprise bean requests (failover is not completed)
Some possible causes of this problem are:
- The client might have been in a transaction with an enterprise bean on the server that went down. Check the JVM logs of the application server hosting the problem enterprise bean instance. If a request is returned with CORBA SystemException COMM_FAILURE org.omg.CORBA.completion_status.COMPLETED_MAYBE, this might be working as designed. The design is to let this particular exception flow back to the client, since the transaction might have completed. Failing over this request to another server could result in this request being serviced twice.
- If the requests sent to the servers come back to the client with any other exceptions consistently, it might be that no servers are available. In this case, follow the resolution steps as outlined in the topic "Troubleshooting the Workload Management component".
A cluster does not fail over to its backup cluster
You might experience an error that is similar to the following sample:
[10/11/04 13:11:10:233 CDT] 00000036 SelectionMana A WWLM0061W: An error was encountered sending a request to cluster member {MEMBERNAME=FlorenceEJBServer1, NODENAME=fwwsaix1Node01} and that member has been marked unusable for future requests to the cluster "", because of exception: org.omg.CORBA.COMM_FAILURE: CONNECT_FAILURE_ON_SSL_CLIENT_SOCKET - JSSL0130E: java.io.IOException: Signals that an I/O exception of some sort has occurred. Reason: Connection refused vmcid: 0x49421000 minor code: 70 completed: No
Perform the following steps to fix the configuration:
- Review your dmgr hostname and bootstrap port for each backup cluster setting.
- Review your core group bridge peer ports to make sure the hostname and distribution and consistency services (DCS) port are accurate.
- Verify that the names of your primary and backup clusters match.
- If the application is going through security to go to the backup cluster, review the security configuration. You might need to use single sign on (SSO) and import the Lightweight Third Party Authentication (LTPA) keys to the backup cell.
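When many such messages appear in the logs, extracting the member name, node name, and exception class from each WWLM0061W entry shows which members were marked unusable. The following sketch uses a regular expression written against the sample message above. It assumes that other WWLM0061W entries follow the same layout, which might not hold for every release.

```python
import re

# Pattern derived from the sample message; the layout is an assumption.
WWLM0061W = re.compile(
    r"WWLM0061W:.*?\{MEMBERNAME=(?P<member>[^,}]+),\s*NODENAME=(?P<node>[^}]+)\}"
    r".*?because of exception:\s*(?P<exception>[\w.]+)"
)


def unusable_members(log_lines):
    """Return (member, node, exception) tuples for each WWLM0061W line."""
    found = []
    for line in log_lines:
        match = WWLM0061W.search(line)
        if match:
            found.append((match["member"], match["node"], match["exception"]))
    return found


# Abridged copy of the sample message shown above.
SAMPLE = (
    '[10/11/04 13:11:10:233 CDT] 00000036 SelectionMana A WWLM0061W: An error was '
    'encountered sending a request to cluster member {MEMBERNAME=FlorenceEJBServer1, '
    'NODENAME=fwwsaix1Node01} and that member has been marked unusable for future '
    'requests to the cluster "", because of exception: org.omg.CORBA.COMM_FAILURE: '
    'CONNECT_FAILURE_ON_SSL_CLIENT_SOCKET'
)

print(unusable_members([SAMPLE]))
```

A member that appears repeatedly with a connection failure is a good starting point for the hostname, port, and security checks in the list above.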
Troubleshoot administration
View JVM logs
Add logging and tracing to the application
Related
Multiserver environment errors
Workload management component troubleshooting tips
Naming service troubleshooting tips
Application access problems
Enterprise bean cannot be accessed from a servlet, a JSP file, a stand-alone program, or another client
Application client SOAP request troubleshooting tips
Web module or application server stops processing requests
Application deployment problems
Web server plug-in troubleshooting tips
Web resource is not displayed
Access problems after enabling security