Troubleshoot WebSphere eXtreme Scale
- Troubleshooting and support for WebSphere eXtreme Scale
- Enable logging
- Collecting trace
- Troubleshooting with High Performance Extensible Logging (HPEL)
- Analyzing log and trace data
- Troubleshooting the product installation
- Troubleshooting client connectivity
- Troubleshooting cache integration
- Troubleshooting the JPA cache plug-in
- Troubleshooting IBM eXtremeMemory
- Troubleshooting administration
- Troubleshooting data monitoring
- Troubleshooting multiple data center configurations
- Troubleshooting loaders
- Troubleshooting XML
- Troubleshooting deadlocks
- Troubleshooting lock timeout exceptions for a multi-partition transaction
- Troubleshooting security
- Troubleshooting Liberty profile configurations
- Collecting data with the IBM Support Assistant Data Collector
- IBM Support Assistant for WebSphere eXtreme Scale
Troubleshooting and support for WebSphere eXtreme Scale
To isolate and resolve problems with your IBM products, you can use the troubleshooting and support information. This information contains instructions for using the problem-determination resources that are provided with your IBM products, including WebSphere eXtreme Scale.
Techniques for troubleshooting problems
Troubleshooting is a systematic approach to solving a problem. The goal of troubleshooting is to determine why something does not work as expected and how to resolve the problem. Certain common techniques can help with the task of troubleshooting.
The first step in the troubleshooting process is to describe the problem completely. Problem descriptions help you and the IBM technical-support representative know where to start to find the cause of the problem. This step includes asking yourself basic questions:
- What are the symptoms of the problem?
- Where does the problem occur?
- When does the problem occur?
- Under which conditions does the problem occur?
- Can the problem be reproduced?
The answers to these questions typically lead to a good description of the problem, which can then lead you to a problem resolution.
What are the symptoms of the problem?
When starting to describe a problem, the most obvious question is "What is the problem?" This question might seem straightforward; however, you can break it down into several more-focused questions that create a more descriptive picture of the problem. These questions can include:
- Who, or what, is reporting the problem?
- What are the error codes and messages?
- How does the system fail? For example, is it a loop, hang, crash, performance degradation, or incorrect result?
Where does the problem occur?
Determining where the problem originates is not always easy, but it is one of the most important steps in resolving a problem. Many layers of technology can exist between the reporting and failing components. Networks, the data grid, and servers are only a few of the components to consider when you are investigating problems.
The following questions help you to focus on where the problem occurs to isolate the problem layer:
- Is the problem specific to one platform or operating system, or is it common across multiple platforms or operating systems?
- Are the current environment and configuration supported?
- Do all users have the problem?
- (For multi-site installations.) Do all sites have the problem?
If one layer reports the problem, the problem does not necessarily originate in that layer. Part of identifying where a problem originates is understanding the environment in which it exists. Take some time to completely describe the problem environment, including the operating system and version, all corresponding software and versions, and hardware information. Confirm that you are running within an environment that is a supported configuration; many problems can be traced back to incompatible levels of software that are not intended to run together or have not been fully tested together.
When does the problem occur?
Develop a detailed timeline of events leading up to a failure, especially for those cases that are one-time occurrences. You can most easily develop a timeline by working backward: Start at the time an error was reported (as precisely as possible, even down to the millisecond), and work backward through the available logs and information. Typically, you need to look only as far as the first suspicious event that you find in a diagnostic log.
To develop a detailed timeline of events, answer these questions:
- Does the problem happen only at a certain time of day or night?
- How often does the problem happen?
- What sequence of events leads up to the time that the problem is reported?
- Does the problem happen after an environment change, such as upgrading or installing software or hardware?
Responding to these types of questions can give you a frame of reference in which to investigate the problem.
Under which conditions does the problem occur?
Knowing which systems and applications are running at the time that a problem occurs is an important part of troubleshooting. These questions about your environment can help you to identify the root cause of the problem:
- Does the problem always occur when the same task is being performed?
- Does a certain sequence of events need to happen for the problem to occur?
- Do any other applications fail at the same time?
Answering these types of questions can help you explain the environment in which the problem occurs and correlate any dependencies. Remember that just because multiple problems might have occurred around the same time, the problems are not necessarily related.
Can the problem be reproduced?
From a troubleshooting standpoint, the ideal problem is one that can be reproduced. Typically, when a problem can be reproduced, you have a larger set of tools or procedures at your disposal to help you investigate. Consequently, problems that you can reproduce are often easier to debug and solve.
However, problems that you can reproduce can have a disadvantage: if the problem is of significant business impact, you do not want it to recur. If possible, re-create the problem in a test or development environment, which typically offers you more flexibility and control during your investigation.
- Can the problem be recreated on a test system?
- Are multiple users or applications encountering the same type of problem?
- Can the problem be recreated by running a single command, a set of commands, or a particular application?
Searching knowledge bases
You can often find solutions to problems by searching IBM knowledge bases. You can optimize your results by using available resources, support tools, and search methods.
You can find useful information by searching the information center for WebSphere eXtreme Scale. However, sometimes you need to look beyond the information center to answer your questions or resolve problems.
To search knowledge bases for information that you need, use one or more of the following approaches:
- Search for content using the IBM Support Assistant (ISA).
ISA is a no-charge software serviceability workbench that helps you answer questions and resolve problems with IBM software products. You can find instructions for downloading and installing ISA on the ISA website.
- Find the content that you need using the IBM Support Portal.
The IBM Support Portal is a unified, centralized view of all technical support tools and information for all IBM systems, software, and services. The IBM Support Portal lets you access the IBM electronic support portfolio from one place. You can tailor the pages to focus on the information and resources that you need for problem prevention and faster problem resolution. Familiarize yourself with the IBM Support Portal by viewing the demo videos (https://www.ibm.com/blogs/SPNA/entry/the_ibm_support_portal_videos) about this tool. These videos introduce you to the IBM Support Portal, explore troubleshooting and other resources, and demonstrate how you can tailor the page by moving, adding, and deleting portlets.
Getting fixes
A product fix might be available to resolve your problem.
To find and install fixes:
- Obtain the tools required to get the fix. Use the IBM Update Installer to install and apply various types of maintenance packages for WebSphere eXtreme Scale or WebSphere eXtreme Scale Client. Because the Update Installer undergoes regular maintenance, use the most current version of the tool.
- Determine which fix you need. Check the recommended fixes for WebSphere eXtreme Scale to select the latest fix. When you select a fix, the download document for that fix opens.
- Download the fix. In the download document, click the link for the latest fix in the "Download package" section.
- Apply the fix. Follow the instructions in the "Installation Instructions" section of the download document.
- Subscribe to receive weekly e-mail notifications about fixes and other IBM Support information.
Getting fixes from Fix Central
Use Fix Central to find the fixes that are recommended by IBM Support for a variety of products, including WebSphere eXtreme Scale. With Fix Central, you can search, select, order, and download fixes for your system with a choice of delivery options. A WebSphere eXtreme Scale product fix might be available to resolve your problem.
To find and install fixes:
- Obtain the tools that are required to get the fix. If it is not installed, obtain your product update installer. You can download the installer from Fix Central. This site provides download, installation, and configuration instructions for the update installer.
- Select WebSphere eXtreme Scale as the product, and select one or more check boxes that are relevant to the problem that you want to resolve.
- Identify and select the fix that is required.
- Download the fix.
- Open the download document and follow the link in the "Download Package" section.
- When downloading the file, ensure that the name of the maintenance file is not changed. This change might be intentional, or it might be an inadvertent change that is caused by certain web browsers or download utilities.
- Apply the fix.
- Optional: Subscribe to receive weekly e-mail notifications about fixes and other IBM Support updates.
Contacting IBM Support
IBM Support provides assistance with product defects, answers FAQs, and helps users resolve problems with the product.
After trying to find your answer or solution by using other self-help options, such as release notes, you can contact IBM Support. Before contacting IBM Support, your company or organization must have an active IBM maintenance contract, and you must be authorized to submit problems to IBM. For information about the types of available support, see the Support portfolio topic in the "Software Support Handbook".
To contact IBM Support about a problem:
- Define the problem, gather background information, and determine the severity of the problem. For more information, see the Getting IBM support topic in the Software Support Handbook.
- Gather diagnostic information.
- Submit the problem to IBM Support in one of the following ways:
- With IBM Support Assistant (ISA).
- Online through the IBM Support Portal: You can open, update, and view all of your service requests from the Service Request portlet on the Service Request page.
- By phone: For the phone number to call in your region, see the Directory of worldwide contacts web page.
Results
If the problem that you submit is for a software defect or for missing or inaccurate documentation, IBM Support creates an Authorized Program Analysis Report (APAR). The APAR describes the problem in detail. Whenever possible, IBM Support provides a workaround that we can implement until the APAR is resolved and a fix is delivered. IBM publishes resolved APARs on the IBM Support website daily, so that other users who experience the same problem can benefit from the same resolution.
Exchanging information with IBM
To diagnose or identify a problem, you might need to provide IBM Support with data and information from your system. In other cases, IBM Support might provide you with tools or utilities to use for problem determination.
Sending information to IBM Support
To reduce the time that is required to resolve your problem, you can send trace and diagnostic information to IBM Support.
Procedure
To submit diagnostic information to IBM Support:
- Open a problem management record (PMR).
- Collect the diagnostic data that you need. Diagnostic data helps reduce the time that it takes to resolve your PMR. You can collect the diagnostic data manually or automatically:
- Collect the data manually.
- Collect the data automatically.
- Compress the files using the .zip or .tar file format.
- Transfer the files to IBM. Use one of the following methods to transfer the files to IBM:
- IBM Support Assistant
- The Service Request tool
- Standard data upload methods: FTP, HTTP
- Secure data upload methods: FTPS, SFTP, HTTPS
If you are using a z/OS product and you use ServiceLink / IBMLink to submit PMRs, you can send diagnostic data to IBM Support in an e-mail or by using FTP.
All of these data exchange methods are explained on the IBM Support website.
Receiving information from IBM Support
Occasionally an IBM technical-support representative might ask you to download diagnostic tools or other files. Use FTP to download these files.
Before you begin
Ensure that your IBM technical-support representative provided you with the preferred server to use for downloading the files and the exact directory and file names to access.
Procedure
To download files from IBM Support:
- Use FTP to connect to the site that your IBM technical-support representative provided, and log in as anonymous. Use your e-mail address as the password.
- Change to the appropriate directory:
- Change to the /fromibm directory.
cd fromibm
- Change to the directory that your IBM technical-support representative provided.
cd nameofdirectory
- Enable binary mode for the session.
binary
- Use the get command to download the file that your IBM technical-support representative specified.
get filename.extension
- End your FTP session.
quit
Subscribing to Support updates
To stay informed of important information about the IBM products that you use, you can subscribe to updates.
By subscribing to receive updates about the product, you can receive important technical information and updates for specific IBM Support tools and resources. You can subscribe to updates by using one of two approaches:
- Social media subscriptions
- The following RSS feed is available for the product:
- RSS feed for WebSphere eXtreme Scale forum
For general information about RSS, including steps for getting started and a list of RSS-enabled IBM web pages, visit the IBM Software Support RSS feeds site.
- My Notifications
- With My Notifications, you can subscribe to Support updates for any IBM product. My Notifications replaces My Support, which is a similar tool that you might have used in the past. With My Notifications, you can specify whether you want to receive daily or weekly e-mail announcements. You can specify what type of information you want to receive, such as publications, hints and tips, product flashes (also known as alerts), downloads, and drivers. My Notifications enables you to customize and categorize the products about which you want to be informed and the delivery methods that best suit your needs.
To subscribe to Support updates:
- Subscribe to the RSS feed for the WebSphere eXtreme Scale forum .
- On the subscription page, click the RSS feed icon.
- Select the option to use to subscribe to the feed.
- Click Subscribe.
- Subscribe to My Notifications by going to the IBM Support Portal and clicking My Notifications in the Notifications portlet.
- Sign in using your IBM ID and password, and click Submit.
- Identify what updates you want to receive and how you want to receive them.
- Click the Subscribe tab.
- Select the appropriate software brand or type of hardware.
- Select one or more products by name and click Continue.
- Select your preferences for how to receive updates, whether by e-mail, online in a designated folder, or as an RSS or Atom feed.
- Select the types of documentation updates to receive, for example, new information about product downloads and discussion group comments.
- Click Submit.
Results
Until you modify your RSS feeds and My Notifications preferences, you receive notifications of the updates that you requested. You can modify your preferences when needed; for example, if you stop using one product and begin using another product.
IBM Software Support RSS feeds
Subscribe to My Notifications support content updates
My Notifications for IBM technical support
My Notifications for IBM technical support overview
Enable logging
Use logs to monitor and troubleshoot your environment.
Logs are saved in different locations and formats, depending on the configuration.
- Enable logs in a stand-alone environment.
With stand-alone catalog servers, the logs are in the location where you run the start server command. For container servers, you can use the default location or set a custom log location:
- Default log location: The logs are in the directory where the start server command was run. If you start the servers in the wxs_home/bin directory, the logs and trace files are in the logs/<server_name> directories in the bin directory.
- Custom log location: To specify an alternate location for container server logs, create a properties file, such as server.properties, with the following contents:
workingDirectory=<directory>
traceSpec=
systemStreamToFileEnabled=true
The workingDirectory property is the root directory for the logs and optional trace file. WebSphere eXtreme Scale creates a directory with the name of the container server that contains a SystemOut.log file, a SystemErr.log file, and a trace file. To use a properties file during container startup, use the -serverProps option and provide the server properties file location.
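For example, assuming the properties file is saved as /opt/wxs/config/server.properties, a container server might be started with the -serverProps option as follows; the server name, ObjectGrid XML file, and catalog service endpoint are illustrative:
startOgServer.sh container0 -objectgridFile objectgrid.xml -catalogServiceEndPoints cataloghost:2809 -serverProps /opt/wxs/config/server.properties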
- Enable logs in WebSphere Application Server.
See WebSphere Application Server: Enabling and disabling logging for more information.
- Retrieve FFDC files.
FFDC files are for IBM support to aid in debugging. These files might be requested by IBM support when a problem occurs. These files are in a directory labeled ffdc and contain files that resemble the following example:
server2_exception.log server2_20802080_07.03.05_10.52.18_0.txt
- .NET: Enable logs in a .NET client.
Logs in a .NET client are configured by default and are written to the logs directory on the client.
What to do next
View the log files in their specified locations. Common messages to look for in the SystemOut.log file are start confirmation messages, such as the following example:
CWOBJ1001I: ObjectGrid Server catalogServer01 is ready to process requests.
Configure remote logging
You can enable remote logging to save log entries on a remote server. Remote logging can be helpful when you set a detailed debugging log level to help isolate a problem, or when you monitor behavior over a long time period.
- You must have a syslog server available to listen for and capture events.
- The names of the catalog servers, container servers, and application servers (if you are using WebSphere Application Server) must contain alphanumeric characters only. Syslog RFC 3164 does not allow non-alphanumeric characters in the TAG field. The TAG field contains the server name in the syslog messages.
Use remote logging for analysis of historical data. The servers in your environment keep a limited number of log files on the system. Configure remote logging if you require more log files to be saved for further analysis. The remote logging server aggregates the data from multiple servers. You can configure your entire topology of catalog servers and container servers to send files to the same remote logging server.
- Configure remote logging on each catalog server or container server. Enable remote logging by editing the following properties in the server properties file:
- syslogEnabled
- Enables remote logging for analysis of historical data. You must have a syslog server available to listen for and capture events.
Default: false
- syslogHostName
- Host name or IP address of the remote server on which we want to log historical data.
- syslogHostPort
- Port number of the remote server on which we want to log historical data.
Valid values: 0-65535
Default: 512
- syslogFacility
- Indicates the type of remote logging facility that is being used.
Valid values: kern, user, mail, daemon, auth, syslog, lpr, news, uucp, cron, authpriv, ftp, sys0, sys1, sys2, sys3, local0, local1, local2, local3, local4, local5, local6, local7
Default: user
- syslogThreshold
- Specifies the threshold of the severity of messages to send to the remote logging server. To send both warning and severe messages, enter a value of WARNING. To send severe messages only, enter SEVERE.
Valid values: SEVERE, WARNING
Default: WARNING
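For example, a server properties file that enables remote logging with the properties described above might contain entries like the following; the host name and port are illustrative:
syslogEnabled=true
syslogHostName=logserver.example.com
syslogHostPort=514
syslogFacility=user
syslogThreshold=WARNING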
- Restart the catalog servers and container servers on which you changed the properties.
Results
Messages are sent to your configured remote logging server for archival and analysis.
.NET client logs
Logs in a .NET client are configured by default and are written to files in the logs directory and the Windows event log.
Default log file locations and settings
After you install the WebSphere eXtreme Scale Client for .NET, the log directories are created, based on the log directory location that you specify during installation. The following log files are generated:
- SystemOut.log: Contains information, error, warning, and failure messages. This file is in the logs/ directory of the client.
Location: log_directory\SystemOut.log
Level of log: Logs all messages at the information level
Size: 10 MB maximum per file
Maximum files: 50 archived files
- SystemErr.log: Contains error and failure messages. This file is in the logs/ directory of the client.
Location: log_directory\SystemError.log
Level of log: Logs all messages at the error level
Size: 10 MB maximum per file
Maximum files: 50 archived files
- SystemFirstFail.log: Contains first-failure information that might or might not be recoverable by the client. This file is in the logs/ directory of the client.
Location: log_directory\SystemFirstFailure.log
Level of log: Logs all first-failure messages
Size: 10 MB maximum per file
Maximum files: 50 archived files
- Windows event log: Fatal errors go in the Windows event log. Fatal errors occur when the client can no longer process transactions. WebSphere eXtreme Scale messages are logged in the Windows event log as WXSEventLog messages.
Trace and FFDC logs
Trace logs are not enabled by default on .NET clients. If you need to collect trace for a .NET client, contact the Support team for further assistance.
Collecting trace
Use trace to monitor and troubleshoot your environment.
You must provide trace for a server when you work with IBM support.
Collecting trace can help you monitor and fix problems in the deployment of WebSphere eXtreme Scale. How you collect trace depends on the configuration.
- Collect trace within a WebSphere Application Server environment.
If the catalog and container servers are in a WebSphere Application Server environment, see WebSphere Application Server: Working with trace for more information.
- Collect trace with the stand-alone catalog or container server start command.
You can set trace on a catalog service or container server by using the -traceSpec and -traceFile parameters with the start server command. For example:
startOgServer.sh catalogServer -traceSpec ObjectGridPlacement=all=enabled -traceFile /home/user1/logs/trace.log
startXsServer.sh catalogServer -traceSpec ObjectGridPlacement=all=enabled -traceFile /home/user1/logs/trace.log
The -traceFile parameter is optional. If you do not set a -traceFile location, the trace file goes to the same location as the system out log files.
- Collect trace on the stand-alone catalog or container server with a properties file.
To collect trace from a properties file, create a file, such as a server.properties file, with the following contents:
workingDirectory=<directory>
traceSpec=<trace_specification>
systemStreamToFileEnabled=true
The workingDirectory property is the root directory for the logs and optional trace file. If the workingDirectory value is not set, the default working directory is the location used to start the servers, such as wxs_home/bin. To use a properties file during server startup, use the -serverProps parameter with the startOgServer command and provide the server properties file location.
- JAVA: Collect trace on a stand-alone Java client.
You can start trace collection on a stand-alone client by adding system properties to the startup script for the client application.
In the following example, trace settings are specified for the com.ibm.samples.MyClientProgram application:
java -DtraceSettingsFile=MyTraceSettings.properties -Djava.util.logging.manager=com.ibm.ws.bootstrap.WsLogManager -Djava.util.logging.configureByServer=true com.ibm.samples.MyClientProgram
For more information, see WebSphere Application Server: Enabling trace on client and stand-alone applications.
- .NET: Collect trace on a .NET client.
Trace is not enabled by default for .NET clients. To collect trace for a .NET client, contact the Support team for further assistance.
- JAVA: Collect trace with the ObjectGridManager interface.
You can also set trace at run time on an ObjectGridManager interface. Setting trace on an ObjectGridManager interface is useful for collecting trace on a WebSphere eXtreme Scale client while it connects to a data grid and commits transactions. To set trace on an ObjectGridManager interface, supply a trace specification and a trace log file.
ObjectGridManager manager = ObjectGridManagerFactory.getObjectGridManager();
...
manager.setTraceEnabled(true);
manager.setTraceFileName("logs/myClient.log");
manager.setTraceSpecification("ObjectGridReplication=all=enabled");
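To show where these calls fit in a client, the following minimal sketch enables trace before connecting so that connection and commit activity is captured; the catalog service endpoint (cataloghost:2809), grid name (Grid), and map name (Map1) are illustrative:
import com.ibm.websphere.objectgrid.ClientClusterContext;
import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridManager;
import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
import com.ibm.websphere.objectgrid.Session;

public class TracedClient {
    public static void main(String[] args) throws Exception {
        // Enable client trace before connecting; names and endpoint are illustrative.
        ObjectGridManager manager = ObjectGridManagerFactory.getObjectGridManager();
        manager.setTraceEnabled(true);
        manager.setTraceFileName("logs/myClient.log");
        manager.setTraceSpecification("ObjectGridReplication=all=enabled");

        // Connect to the catalog service and obtain the client-side ObjectGrid.
        ClientClusterContext ccc = manager.connect("cataloghost:2809", null, null);
        ObjectGrid grid = manager.getObjectGrid(ccc, "Grid");

        // Commit a transaction; the activity appears in the client trace file.
        Session session = grid.getSession();
        session.begin();
        session.getMap("Map1").insert("key1", "value1");
        session.commit();
    }
}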
- Collect trace on container servers with the xscmd utility.
To collect trace with the xscmd utility, use the -c setTraceSpec command. Use the xscmd utility to collect trace on a stand-alone environment during run time instead of during startup. You can collect trace on all servers and catalog services, or you can filter the servers based on the ObjectGrid name and other properties. For example, to collect ObjectGridReplication trace with access to the catalog service server, run:
xscmd -c setTraceSpec -spec "ObjectGridReplication=all=enabled"
You can also disable trace by setting the trace specification to *=all=disabled.
Results
Trace files are written to the specified location.
Server trace options
You can enable trace to provide information about your environment to IBM support.
About trace
WebSphere eXtreme Scale trace is divided into several different components. You can specify the level of trace to use for a catalog server or container server. Common levels of trace include all, debug, entryExit, and event.
An example trace string follows:
ObjectGridComponent=level=enabled
You can concatenate trace strings. Use the * (asterisk) symbol to specify a wildcard value, such as ObjectGrid*=all=enabled. If you need to provide a trace to IBM support, a specific trace string is requested. For example, if a problem with replication occurs, the ObjectGridReplication=debug=enabled trace string might be requested.
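For example, assuming the colon separator that WebSphere trace specifications use, placement and replication trace might be combined into one string:
ObjectGridPlacement=all=enabled:ObjectGridReplication=debug=enabled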
Trace specification
- ObjectGrid
- General core cache engine.
- ObjectGridCatalogServer
- General catalog service.
- ObjectGridChannel
- Static deployment topology communications.
- ObjectGridClientInfo
- DB2 client information.
- ObjectGridClientInfoUser
- DB2 user information.
- ObjectgridCORBA
- Dynamic deployment topology communications.
- ObjectGridDataGrid
- The AgentManager API.
- ObjectGridDynaCache
- The WebSphere eXtreme Scale dynamic cache provider.
- ObjectGridEntityManager
- The EntityManager API. Use with the Projector option.
- ObjectGridEvictors
- ObjectGrid built-in evictors .
- ObjectGridJPA
- JPA loaders.
- ObjectGridJPACache
- JPA cache plug-ins.
- ObjectGridLocking
- ObjectGrid cache entry lock manager.
- ObjectGridLogHandler
- Remote logging information.
- ObjectGridMBean
- Management beans.
- ObjectGridMonitor
- Historical monitoring infrastructure.
- ObjectGridNative
- WebSphere eXtreme Scale native code trace, including eXtremeMemory native code.
- ObjectGridOSGi
- The WebSphere eXtreme Scale OSGi integration components.
- ObjectGridPlacement
- Catalog server shard placement service.
- ObjectGridQuery
- ObjectGrid query.
- ObjectGridReplication
- Replication service.
- ObjectGridRouting
- Client/server routing details.
- ObjectGridSecurity
- Security trace.
- ObjectGridSerializer
- The DataSerializer plug-in infrastructure.
- ObjectGridStats
- ObjectGrid statistics.
- ObjectGridTransactionManager
- The WebSphere eXtreme Scale transaction manager.
- ObjectGridWriteBehind
- ObjectGrid write behind.
- ObjectGridXA
- Multi-partition transaction trace.
- ObjectGridXM
- General IBM eXtremeMemory trace.
- ObjectGridXMEviction
- eXtremeMemory eviction trace.
- ObjectGridXMTransport
- eXtremeMemory general transport trace.
- ObjectGridXMTransportInbound
- eXtremeMemory inbound specific transport trace.
- ObjectGridXMTransportOutbound
- eXtremeMemory outbound specific transport trace.
- Projector
- The engine within the EntityManager API.
- QueryEngine
- The query engine for the Object Query API and EntityManager Query API.
- QueryEnginePlan
- Query plan trace.
- TCPChannel
- The IBM eXtremeIO TCP/IP channel.
- XsByteBuffer
- WebSphere eXtreme Scale byte buffer trace.
Start stand-alone servers that use the ORB transport
Administering with the xscmd utility
Troubleshooting with High Performance Extensible Logging (HPEL)
HPEL is a log and trace facility that you can use in stand-alone and WebSphere Application Server environments. Use HPEL to store and access log, trace, System.err, and System.out information produced by the application server or applications. HPEL is an alternative to the basic log and trace facility, which provides the Java virtual machine (JVM) logs, diagnostic trace, and service log files. These files are commonly named SystemOut.log/SystemErr.log, trace.log, and activity.log. HPEL provides a log data repository, a trace data repository, and a text log file.
Instead of the existing logging facility, you can use HPEL, which is disabled by default. In HPEL mode, the log and trace contents are written to a log data or trace data repository in a proprietary binary format. Therefore, enabling HPEL can improve server performance by providing faster log and trace handling capabilities. Enable HPEL with the server properties files for your container servers and catalog servers. After you enable HPEL, all WebSphere eXtreme Scale logging and the resulting log files are placed in the specified HPEL repository location.
- Set properties to enable HPEL logging. Edit the server properties file for each container and catalog server with the properties to use; an example properties file follows the property descriptions.
- hpelEnable
- Specifies if High Performance Extensible Logging (HPEL) is enabled. HPEL logging is enabled when the property is set to true.
Default: false
- hpelRepositoryLocation
- Specifies the HPEL logging repository location.
Default: "." (the runtime location)
- hpelEnablePurgeBySize
- Indicates whether HPEL purges log files by size. You can set the size of the files with the hpelMaxRepositorySize property.
Default: true (enabled)
- hpelEnablePurgeByTime
- Indicates whether HPEL purges log files by time. Set the amount of time with the hpelMaxRetentionTime property.
Default: true (enabled)
- hpelEnableFileSwitch
- Indicates whether HPEL creates a new file at a specified hour. Use the hpelFileSwitchHour property to specify the hour at which to create a new file.
Default: false (disabled)
- hpelEnableBuffering
- Indicates if the HPEL buffering is enabled.
Default: false (disabled)
- hpelIncludeTrace
- Indicates if the HPEL text files include tracing.
Default: false (disabled)
- hpelOutOfSpaceAction
- Indicates the action to be performed when the disk space has been exceeded.
Default: PurgeOld
Possible values: PurgeOld, StopServer, StopLogging
- hpelOutputFormat
- Indicates the format of the log files to be generated.
Default: Basic
Possible values: Basic, Advanced, CBE-1.0.1
- hpelMaxRepositorySize
- Indicates the maximum size of files, in megabytes. This value is used when you enable the hpelEnablePurgeBySize property.
Default: 50
- hpelMaxRetentionTime
- Indicates the maximum retention time to hold files, in hours.
Default: 48
- hpelFileSwitchHour
- Indicates the hour at which to create a new file. This value is used when the hpelEnableFileSwitch property is enabled.
Default: 0
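For example, a server properties file that enables HPEL with these properties might contain entries like the following; the repository location and size are illustrative:
hpelEnable=true
hpelRepositoryLocation=/opt/wxs/hpelRepository
hpelMaxRepositorySize=100
hpelMaxRetentionTime=48
hpelIncludeTrace=true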
- Restart the servers on which you modified the server properties file to set HPEL properties. After HPEL is enabled and the server is restarted, the previous WebSphere eXtreme Scale logging information is no longer available. The previous logging information is replaced by equivalent HPEL information.
- Use the HPEL command-line log viewer to view your log files. The following examples show some common tasks:
cd /opt/IBM/WebSphere/eXtremeScale/ObjectGrid/bin
./logViewer.sh -help
Create a legacy format log file, legacyFormat.log, that contains only log records at the INFO, WARNING, and SEVERE levels:
./logViewer.sh -outLog ../logs/legacyFormat.log -minLevel INFO -maxLevel SEVERE
Use a text editor to view the legacy format log file that you created.
View only the log records for thread 0:
./logViewer.sh -thread 0
View only WARNING messages:
./logViewer.sh -level WARNING
Retrieve all log records NOT from loggers that begin with com.ibm:
./logViewer.sh -excludeLoggers com.ibm.*
Extract a repository of just WARNING and SEVERE messages and save the resulting file in a new directory:
./logViewer.sh -minLevel WARNING -maxLevel SEVERE -extractToNewRepository ../logs/newHPELRepository
Export the contents of the resulting repository to a text format log file:
./logViewer.sh -repositoryDir ../logs/newHPELRepository -outLog ../logs/newFormat.log
Analyzing log and trace data
Use the log analysis tools to analyze how your runtime environment is performing and solve problems that occur in the environment.
You can generate reports from the existing log and trace files in the environment. These visual reports can be used for the following purposes:
- To analyze runtime environment status and performance:
- Deployment environment consistency
- Logging frequency
- Run topology versus configured topology
- Unplanned topology changes
- Quorum status
- Partition replication status
- Statistics of memory, throughput, processor usage, and so on
- To troubleshoot problems in the environment:
- Topology views at specific points in time
- Statistics of memory, throughput, processor usage during client failures
- Current fix pack levels, tuning settings
- Quorum status
Log analysis overview
Use the xsLogAnalyzer tool to help troubleshoot issues in the environment.
All failover messages
Displays the total number of failover messages as a chart over time. Also displays a list of the failover messages, including the servers that were affected.
All eXtreme Scale critical messages
Displays message IDs along with the associated explanations and user actions, which can save you time searching for messages.
All exceptions
Displays the top five exceptions, including the messages and how many times they occurred, and what servers were affected by the exception.
Topology summary
Displays a diagram of how your topology is configured according to the log files. Use this summary to compare to your actual configuration, possibly identifying configuration errors.
Topology consistency: Object Request Broker (ORB) comparison table
Displays ORB settings in the environment. Use this table to help determine if the settings are consistent across your environment.
Event timeline view
Displays a timeline diagram of different actions that have occurred on the data grid, including life cycle events, exceptions, critical messages, and first-failure data capture (FFDC) events.
Run log analysis
You can run the xsLogAnalyzer tool on a set of log and trace files from any computer.
- Enable logs and trace.
- Collect your log files.
The log files can be in various locations depending on how you configured them. If you are using the default log settings, you can get the log files from the following locations:
- In a stand-alone installation:
wxs_install_root/bin/logs/<server_name>
- In an installation that is integrated with WebSphere Application Server:
was_root/logs/<server_name>
- Collect your trace files.
The trace files can be in various locations depending on how you configured them. If you are using the default trace settings, you can get the trace files from the following locations:
- In a stand-alone installation: If no specific trace value is set, the trace files are written to the same location as the system out log files.
- In an installation that is integrated with WebSphere Application Server:
was_root/profiles/server_name/logs
Copy the log and trace files to the computer on which you plan to use the log analyzer tool.
- To create custom scanners in your generated report, create a scanner specifications properties file and a configuration file before you run the tool.
- Run the xsLogAnalyzer tool.
The script is in the following locations:
- In a stand-alone installation:
wxs_install_root/ObjectGrid/bin
- In an installation that is integrated with WebSphere Application Server:
was_root/bin
If your log files are large, consider using the -startTime, -endTime, and -maxRecords parameters when you run the report to restrict the number of log entries that are scanned. Using these parameters makes the reports easier to read and faster to generate. You can run multiple reports on the same set of log files.
xsLogAnalyzer.sh -logsRoot c:\myxslogs -outDir c:\myxslogs\out -startTime 11.09.27_15.10.56.089 -endTime 11.09.27_16.10.56.089 -maxRecords 100
-logsRoot
Absolute path to the log directory to evaluate (required).
-outDir
Existing directory to write the report output. If you do not specify a value, the report is written to the root location of the xsLogAnalyzer tool.
-startTime
Start time to evaluate in the logs. The date is in the following format: year.month.day_hour.minute.second.millisecond
-endTime
End time to evaluate in the logs. The date is in the following format: year.month.day_hour.minute.second.millisecond
-trace
Trace string, such as ObjectGrid*=all=enabled
-maxRecords
Maximum number of records to generate in the report. The default is 100. If you specify 50, the first 50 records are generated for the specified time period.
- Open the generated files.
If you did not define an output directory, the reports are generated in a folder called report_date_time. To open the main page of the reports, open the index.html file.
- Use the reports to analyze the log data. Use the following tips to maximize the performance of the report displays:
- To maximize the performance of queries on the log data, make your query terms as specific as possible. For example, a query for server takes much longer to run and returns more results than a query for server_host_name.
- Some views have a limited number of data points that are displayed at one time. You can adjust the segment of time that is being viewed by changing the current data, such as the start and end time, in the view.
Create custom scanners for log analysis
You can create custom scanners for log analysis. After you configure the scanner, the results are generated in the reports when you run the xsLogAnalyzer tool. The custom scanner scans the logs for event records based on the regular expressions that you specify.
- Create a scanner specifications properties file that specifies the general expression to run for the custom scanner.
- Create and save a properties file.
The file must be in the following directory:
loganalyzer_root/config/custom
You can name the file anything you like. The file is used by the new scanner, so including the scanner name in the file name is useful, for example:
my_new_server_scanner_spec.properties
- Include the following properties in the my_new_server_scanner_spec.properties file:
include.regular_expression = REGULAR_EXPRESSION_TO_SCAN
The REGULAR_EXPRESSION_TO_SCAN variable is a regular expression on which to filter the log files.
Example: To scan for instances of lines that contain both the "xception" and "rror" strings regardless of the order, set the include.regular_expression property to the following value:
include.regular_expression = (xception.+rror)|(rror.+xception)
This regular expression causes events to be recorded if the string "rror" comes before or after the "xception" string.
Example: To scan through each line in the logs for instances of lines that contain either the phrase "xception" or the phrase "rror" strings regardless of the order, set the include.regular_expression property to the following value:
include.regular_expression = (xception)|(rror)
This regular expression causes events to be recorded if either the "rror" string or the "xception" string exists.
- Create a configuration file that the xsLogAnalyzer tool uses to create the scanner.
- Create and save a configuration file.
The file must be in the following directory:
loganalyzer_root/config/custom
You can name the file scanner_nameScanner.config, where scanner_name is a unique name for the new scanner. For example, you might name the file serverScanner.config.
- Include the following properties in the scanner_nameScanner.config file:
scannerSpecificationFiles = loganalyzer_root/config/custom/my_new_server_scanner_spec.properties
You can also specify multiple scanner specification files by using a semicolon-separated list:
scannerSpecificationFiles = loganalyzer_root/config/custom/my_new_scanner_spec1.properties;loganalyzer_root/config/custom/my_new_scanner_spec2.properties
- Run the xsLogAnalyzer tool.
Results
After you run the xsLogAnalyzer tool, the report contains new tabs for the custom scanners that you configured. Each tab contains the following views:
- Charts
- A plotted graph that illustrates recorded events. The events are displayed in the order in which the events were found.
- Tables
- A tabular representation of the recorded events.
- Summary reports
Troubleshooting log analysis
Use the following troubleshooting information to diagnose and fix problems with the xsLogAnalyzer tool and its generated reports.
- Problem: Out of memory conditions occur when you are using the xsLogAnalyzer tool to generate reports. An example of an error that might occur follows: java.lang.OutOfMemoryError: GC overhead limit exceeded.
Solution: The xsLogAnalyzer tool runs within a JVM. You can configure the JVM to increase the heap size before you run the xsLogAnalyzer tool by specifying some settings when you run the tool. Increasing the heap size enables more event records to be stored in JVM memory. Start with a setting of 2048M, assuming the operating system has enough main memory. On the same command-line instance in which you are planning to run the xsLogAnalyzer tool, set the maximum JVM heap size:
java -XmxHEAP_SIZEm
The HEAP_SIZE value can be any integer and represents the number of megabytes that are allocated to the JVM heap. For example, you might run java -Xmx2048m. If the out of memory messages continue, or you do not have the resources to allocate 2048m or more of memory, limit the number of events that are being held in the heap. You can limit the number of events in the heap by passing the -maxRecords parameter to the xsLogAnalyzer command.
- Problem: When you open a generated report from the xsLogAnalyzer tool, the browser hangs or does not load the page.
Cause: The generated HTML files are too large and cannot be loaded by the browser. These files are large because the scope of the log files that we are analyzing is too broad.
Solution: Consider using the -startTime, -endTime, and -maxRecords parameters when you run the xsLogAnalyzer tool to restrict the number of log entries that are scanned. Using these parameters makes the reports easier to read and faster to generate. You can run multiple reports on the same set of log files.
Troubleshooting the product installation
Installation Manager is a common installer for many IBM software products; you use it to install this version of WebSphere eXtreme Scale.
Results
Notes on logging and tracing:
- An easy way to view the logs is to open Installation Manager and go to File > View Log. An individual log file can be opened by selecting it in the table and then clicking the Open log file icon.
- Logs are located in the logs directory of Installation Manager's application data location. For example:
- WINDOWS: Administrative installation:
C:\Documents and Settings\All Users\Application Data\IBM\Installation Manager
- WINDOWS: Non-administrative installation:
C:\Documents and Settings\user_name\Application Data\IBM\Installation Manager
- UNIX/Linux: Administrative installation:
/var/IBM/InstallationManager
- UNIX/Linux: Non-administrative installation:
$HOME/var/ibm/InstallationManager
- The main log files are time-stamped XML files in the logs directory, and they can be viewed using any standard web browser.
- The log.properties file in the logs directory specifies the level of logging or tracing that Installation Manager uses. To turn on tracing for the WebSphere eXtreme Scale plug-ins, for example, create a log.properties file with the following content:
com.ibm.ws=DEBUG
com.ibm.cic.agent.core.Engine=DEBUG
global=DEBUG
Restart Installation Manager as necessary, and Installation Manager outputs traces for the WebSphere eXtreme Scale plug-ins.
Notes on troubleshooting:
- UNIX/Linux: By default, some HP-UX systems are configured to not use DNS to resolve host names. This configuration can result in Installation Manager not being able to connect to an external repository.
You can ping the repository, but nslookup does not return anything.
Work with your system administrator to configure your machine to use DNS, or use the IP address of the repository.
- In some cases, you might need to bypass existing checking mechanisms in Installation Manager.
- On some network file systems, disk space might not be reported correctly at times, and you might need to bypass disk-space checking and proceed with the installation.
To disable disk-space checking, specify the following system property in the config.ini file in IM_HOME/eclipse/configuration and restart Installation Manager:
cic.override.disk.space=sizeunit
where size is a positive integer and unit is blank for bytes, k for kilo, m for megabytes, or g for gigabytes. For example:
cic.override.disk.space=120 (120 bytes)
cic.override.disk.space=130k (130 kilobytes)
cic.override.disk.space=140m (140 megabytes)
cic.override.disk.space=150g (150 gigabytes)
cic.override.disk.space=true
Installation Manager will report a disk-space size of Long.MAX_VALUE. Instead of displaying a very large amount of available disk space, N/A is displayed.
- To bypass operating-system prerequisite checking, add disableOSPrereqChecking=true to the config.ini file in IM_HOME/eclipse/configuration and restart Installation Manager.
If you need to use any of these bypass methods, contact IBM Support for assistance in developing a solution that does not involve bypassing the Installation Manager checking mechanisms.
- For more information on using Installation Manager, read the Installation Manager Version 1.5 Information Center .
Read the release notes to learn more about the latest version of Installation Manager. To access the release notes, complete the following task:
- WINDOWS: Click Start > Programs > Installation Manager > Release Notes.
- UNIX/Linux: Go to the documentation subdirectory in the directory where Installation Manager is installed, and open the readme.html file.
- If a fatal error occurs when you try to install the product:
- Make a backup copy of your current product installation directory in case IBM support needs to review it later.
- Use Installation Manager to uninstall everything installed under the product installation location (package group). You might run into errors, but they can be safely ignored.
- Delete everything that remains in the product installation directory.
- Use Installation Manager to reinstall the product to the same location or to a new one.
Note on version and history information: versionInfo.sh and historyInfo.sh return version and history information based on all of the installation, uninstallation, update, and rollback activities performed on the system.
JAVA: Troubleshooting client connectivity
There are several common problems specific to clients and client connectivity that you can solve as described in the following sections.
- Problem: If you are using the EntityManager API or byte array maps with the COPY_TO_BYTES copy mode, client data access methods result in various serialization-related exceptions or a NullPointerException exception.
The following error occurs when you are using the COPY_TO_BYTES copy mode:
java.lang.NullPointerException
at com.ibm.ws.objectgrid.map.BaseMap$BaseMapObjectTransformer2.inflateObject(BaseMap.java:5278)
at com.ibm.ws.objectgrid.map.BaseMap$BaseMapObjectTransformer.inflateValue(BaseMap.java:5155)
The following error occurs when you are using the EntityManager API:
java.lang.NullPointerException
at com.ibm.ws.objectgrid.em.GraphTraversalHelper.fluffFetchMD(GraphTraversalHelper.java:323)
at com.ibm.ws.objectgrid.em.GraphTraversalHelper.fluffFetchMD(GraphTraversalHelper.java:343)
at com.ibm.ws.objectgrid.em.GraphTraversalHelper.getObjectGraph(GraphTraversalHelper.java:102)
at com.ibm.ws.objectgrid.ServerCoreEventProcessor.getFromMap(ServerCoreEventProcessor.java:709)
at com.ibm.ws.objectgrid.ServerCoreEventProcessor.processGetRequest(ServerCoreEventProcessor.java:323)
Cause: The EntityManager API and COPY_TO_BYTES copy mode use a metadata repository that is embedded in the data grid. When clients connect, the data grid stores the repository identifiers in the client and caches the identifiers for the duration of the client connection. If you restart the data grid, you lose all metadata and the regenerated identifiers do not match the cached identifiers on the client.
Solution: If you are using the EntityManager API or the COPY_TO_BYTES copy mode, disconnect and reconnect all of the clients if the ObjectGrid is stopped and restarted. Disconnecting and reconnecting the clients refreshes the metadata identifier cache. You can disconnect clients by using the ObjectGridManager.disconnect method or the ObjectGrid.destroy method.
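As a rough sketch, assuming a client that kept the ClientClusterContext (ccc) from its original connect call, and illustrative endpoint and grid names, the reconnect might look like the following:
// Drop the stale connection after the data grid restarts, then reconnect
// so that the metadata identifier cache is rebuilt; names are illustrative.
manager.disconnect(ccc);
ClientClusterContext freshContext = manager.connect("cataloghost:2809", null, null);
ObjectGrid grid = manager.getObjectGrid(freshContext, "Grid");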
- Problem: The client hangs during a getObjectGrid method call.
A client might seem to hang when it calls the getObjectGrid method on the ObjectGridManager, or it might throw the following exception: com.ibm.websphere.projector.MetadataException: The EntityMetadata repository is not available and the timeout threshold is reached.
Cause: The client is waiting for the entity metadata on the ObjectGrid server to become available.
Solution: This error can occur when a container server has been started, but placement has not yet started. Take the following actions:
- Examine the deployment policy for the ObjectGrid and verify that the number of active containers is greater than or equal to both the numInitialContainers and minSyncReplicas attributes in the deployment policy descriptor file.
- Examine the setting for the placementDeferralInterval property in the container server server properties file to see how much time needs to pass before placement operations occur.
- If you used the xscmd -c suspendBalancing command to stop the balancing of shards for a specific data grid and map set, use the xscmd -c resumeBalancing command to start balancing again.
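For example, assuming a data grid named Grid and a map set named mapSet, the balancing commands might look like the following:
xscmd -c suspendBalancing -g Grid -ms mapSet
xscmd -c resumeBalancing -g Grid -ms mapSet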
Troubleshooting cache integration
Use this information to troubleshoot issues with the cache integration configuration, including HTTP session and dynamic cache configurations.
- Problem: HTTP session IDs are not being reused.
Cause: You can reuse session IDs, but the setting must be enabled. If you create a data grid for session persistence in Version 7.1.1 or later, session ID reuse is automatically enabled. However, if you created your configuration with an earlier version, this setting might be set to the wrong value.
Solution: Check the following settings to verify that you have HTTP session ID reuse enabled:
- The reuseSessionId property in the splicer.properties file must be set to true.
- The HttpSessionIdReuse custom property value must be set to true. This custom property might be set on one of the following paths in the WebSphere Application Server administrative console:
- Servers > server_name > Session management > Custom properties
- Dynamic clusters > dynamic_cluster_name > Server template > Session management > Custom properties
- Servers > Server Types > WebSphere application servers > server_name, and then, under Server Infrastructure, click Java and process management > Process definition > JVM > Custom properties
- Servers > Server Types > WebSphere application servers > server_name > Web container settings > Web container
If you update any custom property values, reconfigure eXtreme Scale session management so the splicer.properties file becomes aware of the change.
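For reference, the relevant entries in the splicer.properties file might look like the following example; the interval value is illustrative:
reuseSessionId = true
replicationInterval = 10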
- Problem: When using a data grid to store HTTP sessions and the transaction load is high, a CWOBJ0006W message displays in the SystemOut.log file.
CWOBJ0006W: An exception occurred: com.ibm.websphere.objectgrid.ObjectGridRuntimeException: java.util.ConcurrentModificationException
This message occurs only when the replicationInterval parameter in the splicer.properties file is set to a value greater than zero and the web application modifies a List object that was set as an attribute on the HTTPSession.
Solution: Clone the attribute that contains the modified List object and put the cloned attribute into the session object.
- Problem: When you run web applications that use the Servlet 3.0 specification, web application filters and listeners are not invoked by WebSphere eXtreme Scale session management. For example, listeners are not called back when sessions are invalidated by remote container eviction with WebSphere eXtreme Scale.
Cause: WebSphere eXtreme Scale does not identify filters and listeners defined using annotations or programmatically.
Solution: Filters and listeners must be explicitly declared in the web.xml file of the web application.
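For illustration, an explicit declaration in web.xml might look like the following; the class names are hypothetical placeholders for your own listener and filter:
<listener>
    <listener-class>com.mycompany.web.MySessionListener</listener-class>
</listener>
<filter>
    <filter-name>mySessionFilter</filter-name>
    <filter-class>com.mycompany.web.MySessionFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>mySessionFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>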
JAVA: Troubleshooting the JPA cache plug-in
Use this information to troubleshoot issues with your JPA cache plug-in configuration. These problems can occur in both Hibernate and OpenJPA configurations.
- Problem: The following exception displays: CacheException: Failed to get ObjectGrid server.
With either an EMBEDDED or EMBEDDED_PARTITION ObjectGridType attribute value, the eXtreme Scale cache tries to obtain a server instance from the run time. In a Java Platform, Standard Edition environment, a WebSphere eXtreme Scale server with an embedded catalog service is started. The embedded catalog service tries to listen on port 2809. If that port is being used by another process, the error occurs.
Solution: Correct the port conflict. If external catalog service endpoints are specified, for example, in the objectGridServer.properties file, this error also occurs if the host name or port is specified incorrectly.
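As a quick diagnostic sketch on UNIX-like systems, you can check whether another process is already listening on the default catalog service port:
netstat -an | grep 2809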
- Problem: The following exception displays: CacheException: Failed to get REMOTE ObjectGrid for configured REMOTE ObjectGrid. objectGridName = [ObjectGridName], PU name = [persistenceUnitName]
This error occurs because the cache cannot get the ObjectGrid instance from the provided catalog service end points.
Solution: This problem typically occurs because of an incorrect host name or port.
- Problem: The following exception displays: CacheException: Cannot have two PUs [persistenceUnitName_1, persistenceUnitName_2] configured with same ObjectGridName [ObjectGridName] of EMBEDDED ObjectGridType
This exception results if you have many persistence units configured and the eXtreme Scale caches of these units are configured with the same ObjectGrid name and EMBEDDED ObjectGridType attribute value. These persistence unit configurations could be in the same or different persistence.xml files.
Solution: You must verify that the ObjectGrid name is unique for each persistence unit when the ObjectGridType attribute value is EMBEDDED.
- Problem: The following exception displays: CacheException: REMOTE ObjectGrid [ObjectGridName] does not include required BackingMaps [mapName_1, mapName_2,...]
With a REMOTE ObjectGrid type, if the obtained client-side ObjectGrid does not have complete entity backing maps to support the persistence unit cache, this exception occurs. For example, five entity classes are listed in the persistence unit configuration, but the obtained ObjectGrid only has two BackingMaps. Even though the obtained ObjectGrid might have 10 BackingMaps, if any one of the five required entity BackingMaps is not found in the 10 backing maps, this exception still occurs.
Solution: Make sure that your backing map configuration supports the persistence unit cache.
10. Troubleshooting IBM eXtremeMemory
Use the following information to troubleshoot eXtremeMemory.
Problem: If the shared resource libstdc++.so.5 is not installed, the IBM eXtremeMemory native libraries do not load when you start the container server.
Linux: Symptom: On a Linux 64-bit operating system, if you try to start a container server with the enableXM server property set to true, and the libstdc++.so.5 shared resource is not installed, you get an error similar to the following example:
00000000 Initialization W CWOBJ0006W: An exception occurred: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:56) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:39) at java.lang.reflect.Constructor.newInstance(Constructor.java:527) at com.ibm.websphere.objectgrid.server.ServerFactory.initialize(ServerFactory.java:350) at com.ibm.websphere.objectgrid.server.ServerFactory$2.run(ServerFactory.java:303) at java.security.AccessController.doPrivileged(AccessController.java:202) at com.ibm.websphere.objectgrid.server.ServerFactory.getInstance(ServerFactory.java:301) at com.ibm.ws.objectgrid.InitializationService.main(InitializationService.java:302) Caused by: com.ibm.websphere.objectgrid.ObjectGridRuntimeException: java.lang.UnsatisfiedLinkError: OffheapMapdbg (Not found in java.library.path) at com.ibm.ws.objectgrid.ServerImpl.<init>(ServerImpl.java:1033) ... 9 more Caused by: java.lang.UnsatisfiedLinkError: OffheapMapdbg (Not found in java.library.path) at java.lang.ClassLoader.loadLibraryWithPath(ClassLoader.java:1011) at java.lang.ClassLoader.loadLibraryWithClassLoader(ClassLoader.java:975) at java.lang.System.loadLibrary(System.java:469) at com.ibm.ws.objectgrid.io.offheap.ObjectGridHashTableOH.initializeNative(ObjectGridHashTableOH.java:112) at com.ibm.ws.objectgrid.io.offheap.ObjectGridHashTableOH.<clinit>(ObjectGridHashTableOH.java:87) at java.lang.J9VMInternals.initializeImpl(Native Method) at java.lang.J9VMInternals.initialize(J9VMInternals.java:200) at com.ibm.ws.objectgrid.ServerImpl.<init>(ServerImpl.java:1028) ... 9 more
Cause: The shared resource libstdc++.so.5 has not been installed.
Diagnosing the problem: To verify whether the resource libstdc++.so.5 is installed, issue the following command from the ObjectGrid/native directory of the installation:
ldd libOffheapMap.so
If you do not have the shared library installed, you get the following error:
ldd libOffheapMap.so
libstdc++.so.5 => not found
Resolving the problem: Use the package installer of your 64-bit Linux distribution to install the required resource file. The package might be listed as compat-libstdc++-33.x86_64 or libstdc++5. After installing the required resource, verify that the libstdc++5 package is installed by issuing the following command from the ObjectGrid/native directory of the installation:
ldd libOffheapMap.so
11. Troubleshooting administration
Use the following information to troubleshoot administration, including starting and stopping servers, using the xscmd utility, and so on.
- Problem: Administration scripts are missing from the profile_root/bin directory of a WebSphere Application Server installation.
Cause: When you update the installation, new script files do not automatically get installed in the profiles.
Solution: To run a script from your profile_root/bin directory, unaugment and reaugment the profile with the latest release. For more information, see Unaugmenting a profile using the command prompt and Creating and augmenting profiles for WebSphere eXtreme Scale .
- Problem: When running an xscmd command, the following message is printed to the screen:
java.lang.IllegalStateException: Placement service MBean not available. [] at com.ibm.websphere.samples.objectgrid.admin.OGAdmin.main(OGAdmin.java:1449) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:611) at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:267) Ending at: 2011-11-10 18:13:00.000000484
Cause: A connection problem occurred with the catalog server.
Solution: Verify that the catalog servers are running and are available through the network. This message can also occur when you have a catalog service domain defined, but fewer than two catalog servers are running. The environment is not available until two catalog servers are started.
- Problem: When running an xscmd command, the following message is printed to the screen:
CWXSI0066E: Unmatched argument argument_name was detected.
Cause: You entered a command format that the xscmd utility did not recognize.
Solution: Check the format of the command. You might encounter this issue when running regular expressions with the -c findbyKey command.
- Problem: The start, stop, and xscmd commands fail with a java.lang.UnsupportedClassVersionError error.
For example, you might see one of the following errors when we use the start, stop, or xscmd utility commands:
The java class could not be loaded. java.lang.UnsupportedClassVersionError: (com/ibm/ws/xs/admin/wxscli/WXSAdminCLI) bad major version at offset=6
The java class could not be loaded. java.lang.UnsupportedClassVersionError: (com/ibm/ws/objectgrid/server/impl/ProcessLauncher) bad major version at offset=6
Cause: The commands are running with a Java version that is not supported for WebSphere eXtreme Scale.
Solution: Update the JAVA_HOME environment variable to point to a supported Java Development Kit (JDK) installation. For supported JDK versions and instructions on updating the JDK, see Java SE considerations .
12. Troubleshooting data monitoring
Use this information to troubleshoot monitoring activities that you complete with the WebSphere eXtreme Scale web console or other utilities to monitor the performance of the application environment.
Problem: We cannot switch between domains with different security settings in the WebSphere eXtreme Scale web console.
We can switch domains between two unsecure domains. We can also switch domains between two secure domains with the same security configured. However, we cannot switch between one unsecure and one secure domain or between two secure domains with different security settings.
Diagnosis: The startOgServer command is used to start two different catalog servers in separate domains. Each catalog server is unaware of the other. However, both catalog servers are started with the same domain name. When you do not specify the domain name, both catalog servers start in different domains with the default name, DefaultDomain. In addition, the monitoring console displays data for only one of the catalog server domains.
Cause: When you switch domains in the monitoring console, we are connected to the second domain. However, no grid data from that domain is displayed, and the first domain grid data is still in view. Therefore, during run time, both catalog servers run in separate domains with the same name, DefaultDomain.
Solution: Determine which domain names are used when catalog servers start in the two domains. To identify the domain names, analyze your startOgServer command syntax and investigate what domain is being specified.
Since this problem scenario is not supported, complete the following actions to display the correct catalog service domain statistics:
- Shut down the catalog servers, and verify that they are configured to start with unique domain names.
- Restart your monitor console.
- Optional: If an outage is not possible, consider running a second monitoring console to monitor the second domain.
13. Troubleshooting multiple data center configurations
Use this information to troubleshoot multiple data center configurations, including linking between catalog service domains.
You must use the xscmd utility to troubleshoot your multiple data center configurations.
- Problem: You need to determine if data replication is synchronized across container servers and catalog service domains.
Solution: Run...
xscmd -c showReplicationState
...or...
xscmd.sh -c showDomainReplicationState
These commands display information about the status of replication in the environment.
- Problem: You need to check which catalog service domains are linked to your local catalog service domain.
Solution: Run...
xscmd -c showLinkedDomains
This command lists the foreign catalog service domains that are linking to the local catalog service domain.
- Problem: You want to detect any configuration problems with your primary shard links to catalog service domains, without going through the entire output of the xscmd -c showLinkedPrimaries command.
Solution: Use the -hc or the --linkHealthCheck option with this command. For example...
xscmd -c showLinkedPrimaries -hc
...or...
xscmd -c showLinkedPrimaries --linkHealthCheck
The command verifies that the primary shards have the appropriate number of catalog service domain links. The command lists any primary shards that have the wrong number of links. If they are all linked correctly (for example, your domain is linked to 1 other domain, then all of the individual primary shards are expected to have 1 link), you will get a message saying they are linked:
CWXSI0092I: All primary shards for {0} data grid and {1} map set have the correct number of links to foreign primary shards.
If you discover problems, try some of the following possible solutions:
- Review your network and firewall settings to ensure that the servers that are hosting container servers in the domains can communicate with each other.
- Review the SystemOut and FFDC logs for the primary shards with the incorrect links for more specific error messages.
- Dismiss and re-establish the link between the domains.
- Problem: Data is missing in one or more catalog service domains. For example, you might run the xscmd -c establishLink command. When you look at the data for each linked catalog service domain, the data looks different, for example from the xscmd -c showMapSizes command.
Solution: We can troubleshoot this problem with the command...
xscmd -c showLinkedPrimaries
This command prints each primary shard and indicates which foreign primary shards it is linked to.
In the described scenario, you might discover from running the xscmd -c showLinkedPrimaries command that the first catalog service domain primary shards are linked to the second catalog service domain primary shards, but the second catalog service domain does not have links to the first catalog service domain. You might consider rerunning the command...
xscmd -c establishLink
...from the second catalog service domain to the first catalog service domain.
Related reference:
Improve response time and data availability with WebSphere eXtreme Scale multi-master capability
com.ibm.websphere.objectgrid.openJPA package
com.ibm.websphere.objectgrid.hibernate.cache package
14. JAVA: Troubleshooting loaders
Use this information to troubleshoot issues with your database loaders.
- Problem: The loader is unable to communicate with the database. A LoaderNotAvailableException exception occurs.
Explanation: The loader plug-in can fail when it is unable to communicate with the database back end. This failure can happen if the database server or the network connection is down. The write-behind loader queues the updates and tries to push the data changes to the loader periodically. The loader must notify the ObjectGrid run time that there is a database connectivity problem by throwing a LoaderNotAvailableException exception.
Solution: The Loader implementation must be able to distinguish between a data failure and a physical loader failure. A data failure should be thrown or rethrown as a LoaderException or an OptimisticCollisionException, but a physical loader failure must be thrown or rethrown as a LoaderNotAvailableException. ObjectGrid handles these two exceptions differently, as shown in the sketch after this list:
- If a LoaderException is caught by the write-behind loader, the write-behind loader considers the exception a data failure, such as a duplicate key failure. The write-behind loader unbatches the update and tries the update one record at a time to isolate the data failure. If a LoaderException is caught again during the single-record update, a failed update record is created and logged in the failed update map.
- If a LoaderNotAvailableException is caught by the write-behind loader, the write-behind loader considers it a failure to connect to the database back end, for example, because the database back end is down, a database connection is not available, or the network is down. The write-behind loader waits for 15 seconds and then tries the batch update to the database again.
A common mistake is to throw a LoaderException when a LoaderNotAvailableException should be thrown. All the records queued in the write-behind loader then become failed update records, which defeats the purpose of back-end failure isolation.
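The following is a minimal sketch of how a Loader implementation might map a JDBC failure to the correct exception; the helper class, the SQL-state test, and the exception constructors that accept a cause are assumptions for illustration only.

import java.sql.SQLException;
import com.ibm.websphere.objectgrid.plugins.LoaderException;
import com.ibm.websphere.objectgrid.plugins.LoaderNotAvailableException;

// Hypothetical helper that a Loader might call from batchUpdate() when a JDBC call fails.
public final class LoaderExceptionMapper {

    // SQL state class 08 indicates a connection exception in standard SQL.
    private static boolean isConnectionFailure(SQLException e) {
        String sqlState = e.getSQLState();
        return sqlState != null && sqlState.startsWith("08");
    }

    public static void rethrow(SQLException e) throws LoaderException, LoaderNotAvailableException {
        if (isConnectionFailure(e)) {
            // Physical failure: the write-behind loader keeps the batch queued and retries later.
            throw new LoaderNotAvailableException(e);
        }
        // Data failure (for example, a duplicate key): the write-behind loader
        // unbatches the update and isolates the failing record.
        throw new LoaderException(e);
    }
}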
- Problem: When using an OpenJPA loader with DB2 in WebSphere Application Server, a closed cursor exception occurs.
The following exception is from DB2 in the org.apache.openjpa.persistence.PersistenceException log file:
[jcc][t4][10120][10898][3.57.82] Invalid operation: result set is closed.
Solution: By default, the application server configures the resultSetHoldability custom property with a value of 2 (CLOSE_CURSORS_AT_COMMIT). This property causes DB2 to close its resultSet/cursor at transaction boundaries. To remove the exception, change the value of the custom property to 1 (HOLD_CURSORS_OVER_COMMIT). Set the resultSetHoldability custom property on the following path in the WebSphere Application Server cell: Resources > JDBC provider > DB2 Universal JDBC Driver Provider > DataSources > data_source_name > Custom properties > New.
- Problem: DB2 displays an exception: The current transaction has been rolled back because of a deadlock or timeout. Reason code "2". SQLCODE=-911, SQLSTATE=40001, DRIVER=3.50.152
This exception occurs because of a lock contention problem when we run OpenJPA with DB2 in WebSphere Application Server. The default isolation level for WebSphere Application Server is Repeatable Read (RR), which obtains long-lived locks with DB2.
Solution: Set the isolation level to Read Committed to reduce the lock contention. Set the webSphereDefaultIsolationLevel data source custom property to 2 (TRANSACTION_READ_COMMITTED) on the following path in the WebSphere Application Server cell: Resources > JDBC provider > JDBC_provider > Data sources > data_source_name > Custom properties > New. For more information about the webSphereDefaultIsolationLevel custom property and transaction isolation levels, see Requirements for setting data access isolation levels.
- Problem: When using the preload function of the JPALoader or JPAEntityLoader, the following CWOBJ1511 message does not display for the partition in a container server: CWOBJ1511I: GRID_NAME:MAPSET_NAME:PARTITION_ID (primary) is open for business.
Instead, a TargetNotAvailableException exception occurs in the container server, which activates the partition specified by the preloadPartition property.
Solution: Set the preloadMode attribute to true if you use a JPALoader or JPAEntityLoader to preload data into the map. If the preloadPartition property of the JPALoader or JPAEntityLoader is set to a value between 0 and total_number_of_partitions - 1, the JPALoader and JPAEntityLoader try to preload the data from the back-end database into the map. The following snippet of code illustrates how the preloadMode attribute is set to enable asynchronous preload:
BackingMap bm = og.defineMap("map1");
bm.setPreloadMode(true);
We can also set the preloadMode attribute by using an XML file, as illustrated in the following example:
<backingMap name="map1" preloadMode="true" pluginCollectionRef="map1" lockStrategy="OPTIMISTIC" />
15. Troubleshooting XML configuration
When you configure eXtreme Scale, we can encounter unexpected behavior with your XML files. The following sections describe problems that can occur and solutions.
- Problem: Your deployment policy and ObjectGrid XML files must match.
The deployment policy and ObjectGrid XML files must match. If they do not have matching ObjectGrid names and map names, errors occur.
If the backingMap list in an ObjectGrid XML file does not match the map references list in a deployment policy XML file, an error occurs on the catalog server.
For example, the following ObjectGrid XML file and deployment policy XML file are used to start a container process. The deployment policy file has more map references than are listed in the ObjectGrid XML file.
ObjectGrid.xml - incorrect example
<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
    xmlns="http://ibm.com/ws/objectgrid/config">
    <objectGrids>
        <objectGrid name="accounting">
            <backingMap name="payroll" readOnly="false" />
        </objectGrid>
    </objectGrids>
</objectGridConfig>

deploymentPolicy.xml - incorrect example
<?xml version="1.0" encoding="UTF-8"?>
<deploymentPolicy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/deploymentPolicy ../deploymentPolicy.xsd"
    xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy">
    <objectgridDeployment objectgridName="accounting">
        <mapSet name="mapSet1" numberOfPartitions="4" minSyncReplicas="1" maxSyncReplicas="2" maxAsyncReplicas="1">
            <map ref="payroll"/>
            <map ref="ledger"/>
        </mapSet>
    </objectgridDeployment>
</deploymentPolicy>
Messages: An error message occurs in the SystemOut.log file when the deployment policy is incompatible with the ObjectGrid XML file. For the preceding example, the following message occurs:
CWOBJ3179E: The map ledger reference in the mapSet mapSet1 of ObjectGrid accounting deployment descriptor file does not reference a valid backing map from the ObjectGrid XML.
If the deployment policy is missing map references to backingMaps that are listed in the ObjectGrid XML file, an error message occurs in the SystemOut.log file. For example:
CWOBJ3178E: The map ledger in ObjectGrid accounting referenced in the ObjectGrid XML was not found in the deployment descriptor file.
Solution: Determine which file has the correct list and alter the relevant file accordingly.
- Problem: Mismatched ObjectGrid names between the XML files also cause an error.
The name of the ObjectGrid is referenced in both the ObjectGrid XML file and the deployment policy XML file.
Message: An ObjectGridException occurs, caused by an IncompatibleDeploymentPolicyException. An example follows.
Caused by: com.ibm.websphere.objectgrid.IncompatibleDeploymentPolicyException: The objectgridDeployment with objectGridName "accountin" does not have a corresponding objectGrid in the ObjectGrid XML.
The ObjectGrid XML file is the master list of ObjectGrid names. If a deployment policy has an ObjectGrid name that is not contained in the ObjectGrid XML file, an error occurs.
Solution: Verify details such as the spelling of the ObjectGrid name. Remove any extra names, or add missing ObjectGrid names, to the ObjectGrid XML or deployment policy XML files. In the example message, the objectGridName is misspelled as "accountin" instead of "accounting".
- Problem: Some of the attributes in the XML file can only be assigned certain values. These attributes have acceptable values enumerated by the schema. The following list provides some of the attributes:
- authorizationMechanism attribute on the objectGrid element
- copyMode attribute on the backingMap element
- lockStrategy attribute on the backingMap element
- ttlEvictorType attribute on the backingMap element
- type attribute on the property element
- initialState attribute on the objectGrid element
- evictionTriggers attribute on the backingMap element
If one of these attributes is assigned an invalid value, XML validation fails.
In the following example XML file, an incorrect value of INVALID_COPY_MODE is used:
INVALID_COPY_MODE example
<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
    xmlns="http://ibm.com/ws/objectgrid/config">
    <objectGrids>
        <objectGrid name="accounting">
            <backingMap name="payroll" copyMode="INVALID_COPY_MODE"/>
        </objectGrid>
    </objectGrids>
</objectGridConfig>
The following message appears in the log.
CWOBJ2403E: The XML file is invalid. A problem has been detected with < null > at line 5. The error message is cvc-enumeration-valid: Value 'INVALID_COPY_MODE' is not facet-valid with respect to enumeration '[COPY_ON_READ_AND_COMMIT, COPY_ON_READ, COPY_ON_WRITE, NO_COPY, COPY_TO_BYTES]'. It must be a value from the enumeration.
- Problem: Missing or incorrect attributes or tags in an XML file cause errors, such as the following example in which the ObjectGrid XML file is missing the closing </objectGrid> tag:
missing attributes - example XML
<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
    xmlns="http://ibm.com/ws/objectgrid/config">
    <objectGrids>
        <objectGrid name="accounting">
            <backingMap name="payroll" />
    </objectGrids>
</objectGridConfig>
Message:
CWOBJ2403E: The XML file is invalid. A problem has been detected with < null > at line 7. The error message is The end-tag for element type "objectGrid" must end with a '>' delimiter.
An ObjectGridException about the invalid XML file occurs with the name of the XML file.
Solution: Ensure that the necessary tags and attributes appear in your XML files with correct format.
- Problem: If an XML file is formatted with incorrect or missing syntax, the CWOBJ2403E message appears in the log. For example, the following message is displayed when a quotation mark is missing from one of the XML attributes:
CWOBJ2403E: The XML file is invalid. A problem has been detected with < null > at line 7. The error message is Open quote is expected for attribute "maxSyncReplicas" associated with an element type "mapSet".
An ObjectGridException about the invalid XML file also occurs.
Solution: Various solutions can be used for a given XML syntax error. Consult relevant documentation about XML syntax.
- Problem: Referencing a nonexistent plug-in collection causes an XML file to be invalid. For example, when using XML to define BackingMap plug-ins, the pluginCollectionRef attribute of the backingMap element must reference a backingMapPluginCollection. The pluginCollectionRef attribute must match the backingMapPluginCollection elements.
Message:
If the pluginCollectionRef attribute does not match the ID attribute of any of the backingMapPluginCollection elements, the following message, or one that is similar, is displayed in the log:
[7/14/05 14:02:01:971 CDT] 686c060e XmlErrorHandl E CWOBJ9002E: This is an English only Error message: Invalid XML file. Line: 14; URI: null; Message: Key 'pluginCollectionRef' with value 'bookPlugins' not found for identity constraint of element 'objectGridConfig'.
The following XML file is used to produce the error. Notice that the backingMap named book has its pluginCollectionRef attribute set to bookPlugins, and the single backingMapPluginCollection has an ID of collection1.
referencing a non-existent attribute XML - example
<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
    xmlns="http://ibm.com/ws/objectgrid/config">
    <objectGrids>
        <objectGrid name="bookstore">
            <backingMap name="book" pluginCollectionRef="bookPlugins" />
        </objectGrid>
    </objectGrids>
    <backingMapPluginCollections>
        <backingMapPluginCollection id="collection1">
            <bean id="Evictor" className="com.ibm.websphere.objectgrid.plugins.builtins.LRUEvictor" />
        </backingMapPluginCollection>
    </backingMapPluginCollections>
</objectGridConfig>
Solution:
To fix the problem, ensure that the value of each pluginCollectionRef attribute matches the ID of one of the backingMapPluginCollection elements. In this example, change the value of pluginCollectionRef to collection1 to avoid the error. Alternatively, change the ID of the existing backingMapPluginCollection to match the pluginCollectionRef value, or add an additional backingMapPluginCollection with an ID that matches the pluginCollectionRef value.
- Problem: The IBM Software Development Kit (SDK) Version 5 contains an implementation of the Java API for XML Processing (JAXP) function that is used for XML validation against a schema. When you use an SDK that does not contain this implementation, attempts to validate XML might fail.
When you attempt to validate XML with an SDK that does not have the necessary implementation, the log contains the following error:
XmlConfigBuild XML validation is enabled
SystemErr R com.ibm.websphere.objectgrid
SystemErr R at com.ibm.ws.objectgrid.ObjectGridManagerImpl.getObjectGridConfigurations(ObjectGridManagerImpl.java:182)
SystemErr R at com.ibm.ws.objectgrid.ObjectGridManagerImpl.createObjectGrid(ObjectGridManagerImpl.java:309)
SystemErr R at com.ibm.ws.objectgrid.test.config.DocTest.main(DocTest.java:128)
SystemErr R Caused by: java.lang.IllegalArgumentException: No attributes are implemented
SystemErr R at org.apache.crimson.jaxp.DocumentBuilderFactoryImpl.setAttribute(DocumentBuilderFactoryImpl.java:93)
SystemErr R at com.ibm.ws.objectgrid.config.XmlConfigBuilder.<init>(XmlConfigBuilder.java:133)
SystemErr R at com.ibm.websphere.objectgrid.ProcessConfigXML$2.run(ProcessConfigXML.java:99)
...
The SDK that is used does not contain an implementation of the JAXP function that is necessary to validate XML files against a schema.
Solution: To validate XML by using an SDK that does not contain the JAXP implementation, download Apache Xerces and include its JAR files in the classpath. After you download Xerces and add the JAR files to the classpath, we can validate the XML file successfully.
Related reference:
ObjectGrid descriptor XML file
Deployment policy descriptor XML file
Entity metadata descriptor XML file
16. Troubleshooting deadlocks
The following sections describe some of the most common deadlock scenarios and suggestions on how to avoid them.
Implement exception handling in the application.
The following exception displays as a result:
com.ibm.websphere.objectgrid.plugins.LockDeadlockException: Message
This message represents the string that is passed as a parameter when the exception is created and thrown.
- Problem: LockTimeoutException exception.
Description: When a transaction or client asks for a lock to be granted for a specific map entry, the request often waits for the current holder to release the lock. If the lock request remains idle for an extended period of time and the lock is never granted, a LockTimeoutException exception is created to prevent a deadlock, which is described in more detail in the following section. You are more likely to see this exception when you use a pessimistic locking strategy, because locks are not released until the transaction commits.
Retrieve more details:
The LockTimeoutException exception contains the getLockRequestQueueDetails method, which returns a string. Use this method to see a detailed description of the situation that triggers the exception. The following is an example of code that catches the exception, and displays an error message.
try {
    ...
} catch (LockTimeoutException lte) {
    System.out.println(lte.getLockRequestQueueDetails());
}
The output result is:
lock request queue
.>[TX:163C269E.0105.4000.E0D7.5B3B090A571D, state = Granted 5348 milli.seconds ago, mode = U]
.>[TX:163C2734.0105.4000.E024.5B3B090A571D, state = Waiting for 5348 milli.seconds, mode = U]
.>[TX:163C328C.0105.4000.E114.5B3B090A571D, state = Waiting for 1402 milli.seconds, mode = U]
If you receive the exception in an ObjectGridException exception catch block, the following code determines the exception type and displays the queue details. It also uses the findRootCause utility method.
try {
    ...
} catch (ObjectGridException oe) {
    Throwable root = findRootCause(oe);
    if (root instanceof LockTimeoutException) {
        LockTimeoutException lte = (LockTimeoutException) root;
        System.out.println(lte.getLockRequestQueueDetails());
    }
}
Solution: A LockTimeoutException exception prevents possible deadlocks in the application. An exception of this type results when a lock request waits longer than a set amount of time. We can set the amount of time that a request waits by using the setLockTimeout(int) method, which is available on the BackingMap. If a deadlock does not actually exist in the application, adjust the lock timeout to avoid the LockTimeoutException exception.
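The findRootCause utility method in the previous snippet is not part of the product API; the following is a minimal sketch of such a helper, written here only so the example is self-contained.

// Hypothetical utility: walk the cause chain to find the root exception.
public final class ExceptionUtil {
    public static Throwable findRootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }
}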
The following code shows how to create an ObjectGrid object, define a map, and set its LockTimeout value to 30 seconds:
ObjectGrid objGrid = ObjectGridManagerFactory.getObjectGridManager().createObjectGrid("grid1");
BackingMap bMap = objGrid.defineMap("MapName");
bMap.setLockTimeout(30);
Use the previous hardcoded example to set ObjectGrid and map properties. If you create the ObjectGrid from an XML file, set the lockTimeout attribute within the backingMap element. The following is an example of a backingMap element that sets a map lockTimeout value to 30 seconds:
<backingMap name="MapName" lockStrategy="PESSIMISTIC" lockTimeout="30" />
- Problem: Single key deadlocks.
Description: The following scenarios describe how deadlocks can occur when a single key is accessed using an S lock and later updated. When this happens from two transactions simultaneously, it results in a deadlock.
Table 1. Single key deadlocks scenario Thread 1 Thread 2 1 session.begin() session.begin() Each thread establishes an independent transaction. 2 map.get(key1) map.get(key1) S lock granted to both transactions for key1. 3 map.update(key1,v) No U lock. Update performed in transactional cache. 4 map.update(key1,v) No U lock. Update performed in the transactional cache. 5 session.commit() Blocked: The S lock for key1 cannot be upgraded to an X lock because Thread 2 has an S lock. 6 session.commit() Deadlock: The S lock for key1 cannot be upgraded to an X lock because T1 has an S lock.
Table 2. Single key deadlocks, continued Thread 1 Thread 2 1 session.begin() session.begin() Each thread establishes an independent transaction. 2 map.get(key1) S lock granted for key1. 3 map.getForUpdate(key1,v) S lock is upgraded to a U lock for key1. 4 map.get(key1) S lock granted for key1. 5 map.getForUpdate(key1,v) Blocked: T1 already has U lock. 6 session.commit() Deadlock: The U lock for key1 cannot be upgraded. 7 session.commit() Deadlock: The S lock for key1 cannot be upgraded.
If the ObjectMap.getForUpdate method is used to avoid the S lock, then the deadlock is avoided:
Table 3. Single key deadlocks, continued Thread 1 Thread 2 1 session.begin() session.begin() Each thread establishes an independent transaction. 2 map.get(key1) S lock granted for key1. 3 map.getForUpdate(key1,v) S lock is upgraded to a U lock for key1. 4 map.get(key1) S lock is granted for key1. 5 map.getForUpdate(key1,v) Blocked: Thread 1 already has a U lock. 6 session.commit() Deadlock: The U lock for key1 cannot be upgraded to an X lock because Thread 2 has an S lock.
Solutions:
Table 4. Single key deadlocks, continued Thread 1 Thread 2 1 session.begin() session.begin() Each thread establishes an independent transaction. 2 map.getForUpdate(key1) U lock granted to thread 1 for key1. 3 map.getForUpdate(key1) U lock request is blocked. 4 map.update(key1,v) <blocked> 5 session.commit() <blocked> The U lock for key1 can be successfully upgraded to an X lock. 6 <released> The U lock is finally granted to key1 for thread 2. 7 map.update(key1,v) U lock granted to thread 2 for key1. 8 session.commit() The U lock for key1 can successfully be upgraded to an X lock.
- Use the getForUpdate method instead of get to acquire a U lock instead of an S lock; see the sketch after this list.
- Use a transaction isolation level of read committed to avoid holding S locks. Reducing the transaction isolation level increases the possibility of non-repeatable reads. However, non-repeatable reads from one client are only possible if the transaction cache is explicitly invalidated by the same client.
- Use the optimistic lock strategy. Using the optimistic lock strategy requires handling optimistic collision exceptions.
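The following is a minimal sketch of the getForUpdate approach, assuming a map named "payroll" that uses the pessimistic lock strategy and already contains an Integer value for the key; the grid, map, and key names are illustrative only.

import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridException;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;

public class GetForUpdateExample {
    public static void increment(ObjectGrid grid, String key) throws ObjectGridException {
        Session session = grid.getSession();
        ObjectMap map = session.getMap("payroll");
        session.begin();
        try {
            // The U lock is acquired here, so a second transaction blocks at
            // this call instead of deadlocking later during commit.
            Integer value = (Integer) map.getForUpdate(key);
            map.update(key, Integer.valueOf(value.intValue() + 1));
            session.commit();
        } catch (ObjectGridException e) {
            if (session.isTransactionActive()) {
                session.rollback();
            }
            throw e;
        }
    }
}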
- Problem: Ordered multiple key deadlock
Description: This scenario describes what happens if two transactions attempt to update the same entries directly while holding S locks on other entries.
If the ObjectMap.getForUpdate method is used to acquire U locks instead of S locks, the deadlock can be avoided, as the solutions after the scenario table show.
Table 5. Ordered multiple key deadlock scenario Thread 1 Thread 2 1 session.begin() session.begin() Each thread establishes an independent transaction. 2 map.get(key1) map.get(key1) S lock granted to both transactions for key1. 3 map.get(key2) map.get(key2) S lock granted to both transactions for key2. 4 map.update(key1,v) No U lock. Update performed in transactional cache. 5 map.update(key2,v) No U lock. Update performed in transactional cache. 6. session.commit() Blocked: The S lock for key 1 cannot be upgraded to an X lock because thread 2 has an S lock. 7 session.commit() Deadlock: The S lock for key 2 cannot be upgraded because thread 1 has an S lock.
Solutions:
Table 6. Ordered multiple key deadlock scenario, continued Thread 1 Thread 2 1 session.begin() session.begin() Each thread establishes an independent transaction. 2 map.getForUpdate(key1) U lock granted to transaction T1 for key1. 3 map.getForUpdate(key1) U lock request is blocked. 4 map.get(key2) <blocked> S lock granted for T1 for key2. 5 map.update(key1,v) <blocked> 6 session.commit() <blocked> The U lock for key1 can be successfully upgraded to an X lock. 7 <released> The U lock is finally granted to key1 for T2 8 map.get(key2) S lock granted to T2 for key2. 9 map.update(key2,v) U lock granted to T2 for key2. 10 session.commit() The U lock for key1 can be successfully upgraded to an X lock.
- Use the getForUpdate method instead of the get method to acquire a U lock directly for the first key. This strategy works only if the method order is deterministic.
- Use a transaction isolation level of read committed to avoid holding S locks. This solution is the easiest to implement if the method order is not deterministic. Reducing the transaction isolation level increases the possibility of non-repeatable reads. However, non-repeatable reads are only possible if the transaction cache is explicitly invalidated.
- Use the optimistic lock strategy. Using the optimistic lock strategy requires handling optimistic collision exceptions.
- Problem: Out of order with U lock
Description: If the order in which keys are requested cannot be guaranteed, then a deadlock can still occur.
Table 7. Out of order with U lock scenario Thread 1 Thread 2 1 session.begin() session.begin() Each thread establishes an independent transaction. 2 map.getForUpdate(key1) map.getForUpdate(key2) U locks successfully granted for key1 and key2. 3 map.get(key2) map.get(key1) S lock granted for key1 and key2. 4 map.update(key1,v) map.update(key2,v) 5 session.commit() The U lock cannot be upgraded to an X lock because T2 has an S lock. 6 session.commit() The U lock cannot be upgraded to an X lock because T1 has an S lock.
Solutions:
- Wrap all work with a single global U lock (mutex); see the sketch after this list. This method reduces concurrency, but handles all scenarios when access and order are non-deterministic.
- Use a transaction isolation level of read committed to avoid holding S locks. This solution is the easiest to implement if the method order is not deterministic and provides the greatest amount of concurrency. Reducing the transaction isolation level increases the possibility of non-repeatable reads. However, non-repeatable reads are only possible if the transaction cache is explicitly invalidated.
- Use the optimistic lock strategy. Using the optimistic lock strategy requires handling optimistic collision exceptions.
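The following is a minimal sketch of the global U lock (mutex) idea, assuming a pessimistic map named "accounts" that already contains an entry with the key "GLOBAL_MUTEX" as well as Integer balances for the account keys; all of these names are illustrative only.

import com.ibm.websphere.objectgrid.ObjectGridException;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;

public class GlobalMutexExample {
    public static void transfer(Session session, String fromKey, String toKey, int amount)
            throws ObjectGridException {
        ObjectMap map = session.getMap("accounts");
        session.begin();
        try {
            // Serialize all writers on one well-known entry, so the order in
            // which the real keys are locked afterward no longer matters.
            map.getForUpdate("GLOBAL_MUTEX");

            Integer from = (Integer) map.getForUpdate(fromKey);
            Integer to = (Integer) map.getForUpdate(toKey);
            map.update(fromKey, Integer.valueOf(from.intValue() - amount));
            map.update(toKey, Integer.valueOf(to.intValue() + amount));

            session.commit();
        } catch (ObjectGridException e) {
            if (session.isTransactionActive()) {
                session.rollback();
            }
            throw e;
        }
    }
}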
17. JAVA: Troubleshooting lock timeout exceptions for a multi-partition transaction
The following scenario is an example of a multi-partition transaction that causes a lock timeout exception. Depending on the state of the transaction, the solutions illustrate how we can manually resolve this problem.
Implement exception handling in the application.
The following exception displays as a result:
Caused by: com.ibm.websphere.objectgrid.LockTimeoutException: Local-40000139-DEF8-05EA-E000-64A856931719 timed out waiting for lock mode S to be granted for map name: TS2_MapP, key: key12 granted = X lock request queue ->[WXS-40000139-DEF6-FA84-E000-1CB456931719, state = Granted, requested 73423 milli-seconds ago, marked to keep current mode false, snapshot mode 0, mode = X, thread name = xIOReplicationWorkerThreadPool : 29] ->[Local-40000139-DEF8-05EA-E000-64A856931719, state = Waiting for 5000 milli-seconds, marked to keep current mode false, snapshot mode 0, mode = S, thread name = xIOWorkerThreadPool : 28] dump of all locks for WXS-40000139-DEF6-FA84-E000-1CB456931719 Key: key12, map: TS2_MapP strongest currently granted mode for key is X ->[WXS-40000139-DEF6-FA84-E000-1CB456931719, state = Granted, requested 73423 milli-seconds ago, marked to keep current mode false, snapshot mode 0, mode = X, thread name = xIOReplicationWorkerThreadPool : 29] dump of all locks for Local-40000139-DEF8-05EA-E000-64A856931719
This message represents the string that is passed as a parameter when the exception is created and thrown.
Problem: You see a lock timeout exception and the holder of the lock is a multi-partition transaction, or the log folder is filling with repeated log messages.
Diagnosis:
You see log messages such as the following example repeatedly filling up your log folder:
00000099 TransactionLog I CWOBJ8705I: Automatic resolution of transaction WXS-40000139-DF01-216D-E002-1CB456931719 at RM:TestGrid:TestSet2:20 is still waiting for a decision. Another attempt to resolve the transaction will occur in 30 seconds.
Determine what type of transaction is causing the lock. If the prefix on the transaction identifier is WXS-, it indicates a multi-partition transaction. If the prefix on the transaction identifier is Local-, it indicates a single-partition transaction.
Cause: The application is likely holding the lock because a commit or rollback did not occur.
Solution: Determine the state of the transaction and how long it was in that state. Use either the xscmd -c listIndoubts command with the -d option (for detailed output) or the transaction MBean.
JAVA: Resolving lock timeout exceptions
Using the xscmd -c listIndoubts command, we can view the state of a transaction and determine a course of action.
Resolving lock timeout exceptions with the xscmd -c listIndoubts command
Procedure
- Display the detailed list of transactions in your environment: xscmd -c listIndoubts -d
- Take the appropriate actions to resolve the transaction.
Problem: Transaction is marked as committed at the TM, but the RMs are indoubt.
[1] WXS-40000139-DEF8-EF60-E002-1CB456931719 Timestamp Partition Role State Container Resync Attempts -------------------------------------------------------------------------------------- 2012-09-19 10:40:19.824 TestSet1:11 TM COMMIT MPTBasic2_C-0 Primary 0 2012-09-19 10:40:19.824 TestSet1:7 RM PREPARED MPTBasic0_C-1 Primary 0 2012-09-19 10:40:19.839 TestSet2:20 RM PREPARED MPTBasic2_C-0 Primary 0 2012-09-19 10:40:19.824 TestSet2:6 RM PREPARED MPTBasic0_C-1 Primary 0
Solution: Commit the resource manager (RM) partitions and then forget the transaction.
- Issue the following command to commit the RM partition in transaction WXS-40000139-DEF8-EF60-E002-1CB456931719: xscmd -c listIndoubts -xid WXS-40000139-DEF8-EF60-E002-1CB456931719 -cm -rm
- Issue the following command to forget this transaction: xscmd -c listIndoubts -xid WXS-40000139-DEF8-EF60-E002-1CB456931719 -f
Problem: Transaction is indoubt at all partitions.
[1] WXS-40000139-DEF6-FA84-E000-1CB456931719 Timestamp Partition Role State Container Resync Attempts -------------------------------------------------------------------------------------- 2012-09-19 10:38:11.603 TestSet1:10 RM PREPARED MPTBasic2_C-0 Primary 0 2012-09-19 10:38:11.588 TestSet1:5 TM PREPARED MPTBasic2_C-0 Primary 0 2012-09-19 10:38:11.603 TestSet2:11 RM PREPARED MPTBasic2_C-0 Primary 0 2012-09-19 10:38:11.619 TestSet2:13 RM PREPARED MPTBasic2_C-0 Primary 0
Solution: Roll back the TM partition first, and then roll back subsequent RM partitions. Then, forget the transaction.
- Issue the following command to roll back the TM partition in transaction WXS-40000139-DEF6-FA84-E000-1CB456931719: xscmd -c listIndoubts -xid WXS-40000139-DEF6-FA84-E000-1CB456931719 -r -tm
- Issue the following command to roll back the RM partitions in this transaction: xscmd -c listIndoubts -xid WXS-40000139-DEF6-FA84-E000-1CB456931719 -r -rm
- Issue the following command to forget this transaction: xscmd -c listIndoubts -xid WXS-40000139-DEF6-FA84-E000-1CB456931719 -f
Problem: Transaction is indoubt at all RM partitions, but transaction decision is unknown at TM.
[1] WXS-40000139-DEF8-EF31-E000-1CB456931719 Timestamp Partition Role State Container Resync Attempts ----------------------------------------------------------------------------------- 2012-09-19 10:40:19.777 TestSet1:11 RM PREPARED MPTBasic2_C-0 Primary 0 2012-09-19 10:40:19.792 TestSet2:5 RM PREPARED MPTBasic2_C-0 Primary 0 2012-09-19 10:40:19.777 TestSet2:6 RM PREPARED MPTBasic2_C-1 Primary 0
Solution: Roll back the RM partitions.
- Issue the following command to roll back the RM partitions in transaction WXS-40000139-DEF8-EF31-E000-1CB456931719: xscmd -c listIndoubts -xid WXS-40000139-DEF8-EF31-E000-1CB456931719 -r
18. Troubleshooting security
Use this information to troubleshoot issues with your security configuration.
- Problem: The client end of the connection requires SSL, with the transportType setting set to SSL-Required. However, the server end of the connection does not support SSL, and has the transportType setting set to TCP/IP. As a result, the following exception gets chained to another exception in the log files:
java.net.ConnectException: connect: Address is invalid on local machine, or port is not valid on remote machine at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:389) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:250) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:237) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:385) at java.net.Socket.connect(Socket.java:540) at com.ibm.rmi.transport.TCPTransportConnection.createSocket(TCPTransportConnection.java:155) at com.ibm.rmi.transport.TCPTransportConnection.createSocket(TCPTransportConnection.java:167)
The address in this exception could be a catalog server, container server, or client.
Solution: See Configuring secure transport types for a table with the valid security configurations between clients and servers.
- When an agent is used, the client sends the agent call to the server, and the server sends a response back to the client to acknowledge the agent call. When the agent finishes processing, the server initiates a connection to send the agent results. This behavior makes the container server a client from a connection point of view. Therefore, if TLS or SSL is configured, make sure that the client public certificate is imported into the server truststore.
- Problem: When users are authorized to access a WXS data grid, those users might also be authorized to perform management operations using the xscmd command or the stopOgServer command. Most data grid deployers restrict administrative access to only a subset of the users who can access grid data.
If you use the following command to access the data grid, you might also be authorized to perform administrative actions, such as listAllJMXAddresses:
./xscmd.sh -user <user> -password <password> <other_parameters>
If this operation works for this user, then any xscmd operation might also be performed by the same user.
Solution: When eXtreme Scale components run with WebSphere Application Server, use the WebSphere Application Server administrative console to activate the security manager. Click Security > Global Security, and select the check boxes Enable administrative security and Use Java 2 Security to restrict application access to local resources.
Access to the management operations is controlled by the WebSphere Application Server security manager and is granted only to the users who belong to the WebSphere Administrator role. The xscmd command must be run from the WebSphere Application Server directory.
When eXtreme Scale components run in a stand-alone environment, additional steps are required to implement administrative security. Run the catalog servers and container servers using the Java security manager, which requires a policy file.
The policy file resembles the following example:
Remember: There are typically MapPermission entries as well, as documented in Java SE security tutorial - Step 5 .
grant codeBase "file:${objectgrid.home}/lib/*" { permission java.security.AllPermission;}; grant principal javax.security.auth.x500.X500Principal "CN=manager,O=acme,OU=OGSample" { permission javax.management.MBeanPermission "*", "getAttribute,setAttribute,invoke,queryNames";};In this case, only the manager principal is authorized to do administrative operations using the xscmd command. Other lines can be added as necessary to give additional principals MBean permissions. A different type of principal is needed if you use LDAP authentication.
Enter the following command:
UNIX/Linux:
startOgServer.sh <arguments> -jvmargs -Djava.security.auth.login.config=jaas.config -Djava.security.manager -Djava.security.policy="auth.policy" -Dobjectgrid.home=$OBJECTGRID_HOME
startXsServer.sh <arguments> -jvmargs -Djava.security.auth.login.config=jaas.config -Djava.security.manager -Djava.security.policy="auth.policy" -Dobjectgrid.home=$OBJECTGRID_HOME
WINDOWS:
startOgServer.bat <arguments> -jvmargs -Djava.security.auth.login.config=jaas.config -Djava.security.manager -Djava.security.policy="auth.policy" -Dobjectgrid.home=%OBJECTGRID_HOME%
startXsServer.bat <arguments> -jvmargs -Djava.security.auth.login.config=jaas.config -Djava.security.manager -Djava.security.policy="auth.policy" -Dobjectgrid.home=%OBJECTGRID_HOME%
You specify -Djava.security.policy in this case, instead of -Djava.security.auth.policy.
19. Troubleshooting Liberty profile configurations
Use this information to troubleshoot commonly experienced problems with the Liberty profile.
To help you identify and resolve problems, the product has a unified logging component.
Details of known restrictions that apply when using the Liberty profile are provided in the following two topics in the WebSphere Application Server Information Center:
- Liberty profile: Runtime environment known restrictions
- Liberty profile: Developer Tools known restrictions
- Problem: You experience problems that are not readily explained.
Solution: Check that your Java development kit is at Java Version 6 or later, and is at a current service level.
- Problem: The following security error is displayed when you attempt to access an application that redirects to an SSL port, and the SSL port is not available:
CWWKS9105E: Could not determine the SSL port for automatic redirection
Solution: The port might not be available because of a missing SSL configuration or some error in the SSL configuration definition. Check the SSL configuration in server.xml to make sure that it exists and is correct.
20. Collecting data with the IBM Support Assistant Data Collector
Run the IBM Support Assistant Data Collector to collect problem determination data from the WXS environment. By using this tool, we can reduce the amount of time it takes to reproduce a problem with the proper RAS tracing levels set, and reduce the effort required to send the appropriate log information to IBM Support.
Before we run the tool, have the following system configuration information ready to provide to the tool:
- File name for saving the collected data
- java_home directory
- wxs_home directory
- Working directory used by WebSphere eXtreme Scale
- Location of additional scripts files used to start servers
In previous releases of WXS, the IBM Support Assistant Lite tool was used for log gathering for problem determination. The IBM Support Assistant Lite tool continues to be shipped with the product in the wxs_home/isalite_wxs directory. IBM Support Assistant Data Collector is a more interactive tool that installs with Version 8.6 and later. IBM Support Assistant Data Collector improves ease of use of collecting data by remembering various inputs, reducing repetitive typing during console input. For more information, see IBM Support Assistant Data Collector .
- Start the tool. The tool runs in console mode by starting the launch script from the command line. The script for the tool is installed in the wxs_home/isalite_dc directory.
- WINDOWS: isadc.bat
- UNIX/Linux: isadc.sh
- Supply your system information to the tool. At each step, the choices are presented as numbered lists and you input the number of your selection and press the enter key. When input is required, prompts are displayed at which you enter your response and press the enter key. We can find collection details for each problem type in their corresponding MustGather documents. You also can provide the compressed file name and the directory location to which we want to save your bundled information.
- Stop the collector tool by typing the quit option in console mode.
Results
The following environment-related information is bundled in a compressed file that you named for saving the data:
- Log files
- eXtreme Scale version information
- Java version information
- Information about the wxs_home directory structure, including what files are currently stored in various directories. Actual files are not saved to the compressed file.
- The scripts currently in the bin directory.
What to do next
Contact IBM support and provide the compressed file that you generated with the IBM Support Assistant Data Collector.
21. IBM Support Assistant for WebSphere eXtreme Scale
Use the IBM Support Assistant to collect data, analyze symptoms, and access product information.
IBM Support Assistant Lite
IBM Support Assistant Lite for WebSphere eXtreme Scale provides automatic data collection and symptom analysis support for problem determination scenarios.
IBM Support Assistant Lite reduces the amount of time it takes to reproduce a problem with the proper Reliability, Availability, and Serviceability tracing levels set (trace levels are set automatically by the tool) to streamline problem determination. If you need further assistance, IBM Support Assistant Lite also reduces the effort required to send the appropriate log information to IBM Support.
IBM Support Assistant Lite is included in each installation of WXS Version 7.1.0.
IBM Support Assistant
IBM Support Assistant (ISA) provides quick access to product, education, and support resources that can help you answer questions and resolve problems with IBM software products on your own, without needing to contact IBM Support. Different product-specific plug-ins let you customize IBM Support Assistant for the particular products you have installed. IBM Support Assistant can also collect system data, log files, and other information to help IBM Support determine the cause of a particular problem.
IBM Support Assistant is a utility to be installed on your workstation, not directly onto the WebSphere eXtreme Scale server system itself. The memory and resource requirements for the Assistant could negatively affect the performance of the WebSphere eXtreme Scale server system. The included portable diagnostic components are designed for minimal impact to the normal operation of a server.
Use IBM Support Assistant to help you in the following ways:
- To search through IBM and non-IBM knowledge and information sources across multiple IBM products to answer a question or solve a problem
- To find additional information through product-specific Web resources, including product and support home pages, customer news groups and forums, skills and training resources, and information about troubleshooting and commonly asked questions
- To extend your ability to diagnose product-specific problems with targeted diagnostic tools available in the Support Assistant
- To simplify collection of diagnostic data to help you and IBM resolve your problems (collecting either general or product/symptom-specific data)
- To help in reporting of problem incidents to IBM Support through a customized online interface, including the ability to attach the diagnostic data referenced above or any other information to new or existing incidents
Finally, we can use the built-in Updater facility to obtain support for additional software products and capabilities as they become available. To set up IBM Support Assistant for use with WebSphere eXtreme Scale, first install IBM Support Assistant using the files provided in the downloaded image from the IBM Support Overview Web page at: http://www-947.ibm.com/support/entry/portal/Overview/Software/Other_Software/IBM_Support_Assistant . Next, use IBM Support Assistant to locate and install any product updates. We can also choose to install plug-ins available for other IBM software in your environment. More information and the latest version of the IBM Support Assistant are available from the IBM Support Assistant Web page at: http://www.ibm.com/software/support/isa/ .