Troubleshooting and support for BPM

  1. Overview
  2. Troubleshooting checklist
  3. Messages overview
  4. BPM log files
  5. Transaction log file
  6. Troubleshooting installation and configuration
  7. Troubleshooting the z/OS installation and configuration
  8. Troubleshooting migration
  9. Troubleshooting
  10. ClassCastException when stopping an application containing a microflow
  11. XPath query returns an unexpected value from an array
  12. An activity has stopped because of an unhandled fault (Message:
  13. A microflow is not compensated
  14. A long-running BPEL process appears to have stopped
  15. Invoke a synchronous subprocess in another EAR file fails
  16. Hung threads when a long-running process is invoked synchronously (Message:
  17. Late binding calls the wrong version of a subprocess
  18. Unexpected exception during execution (Message: CWWBA0010E)
  19. Event unknown (Message: CWWBE0037E)
  20. Cannot find nor create a process instance (Message: CWWBA0140E)
  21. The failed state of the process instance does not allow the requested sendMessage action to be performed (Message: CWWBE0126E)
  22. Uninitialized variable or NullPointerException in a Java snippet
  23. Standard fault exception "missingReply" (message: CWWBE0071E)
  24. A fault is not caught by the fault handler
  25. Parallel paths are sequentialized
  26. Copying a nested data object to another data object destroys the reference on the source object
  27. CScope is not available
  28. Event service does not start (message CEIDS0058E)
  29. Error when sending event (message CEIDS0060E)
  30. Error when sending event (ServiceUnavailableException)
  31. Error when sending event (NameNotFoundException)
  32. Error when sending event (message CEIEM0025E)
  33. Error when sending event (message CEIEM0034E)
  34. Event is not valid (message CEIEM0027E)
  35. Synchronization mode not supported (message CEIEM0015E)
  36. Transaction mode not supported (message CEIEM0016E)
  37. Error when querying events (message CEIDS0060E)
  38. Events not being stored in the persistent data store
  39. Events not being received by consumers (no error message)
  40. Events not being received by consumers (NameNotFoundException)
  41. Event group with extended data elements contains no events
  42. Error when querying an event group (message CEIES0048E)
  43. Event catalog pattern query fails on a Windows system
  44. About
  45. Snapshot support
  46. Preparing the operating system before a snapshot
  47. Taking an operating system snapshot
  48. NFS support
  49. Configure the NFS server
  50. Configure the NFS clients


Overview


What are the symptoms of the problem?


Where does the problem occur?


When does the problem occur?

Responding to these types of questions can provide you with a frame of reference in which to investigate the problem.


Under which conditions does the problem occur?


Can the problem be reproduced?


Troubleshooting checklist

  1. Is the configuration supported?

  2. Have you applied the latest fixes?

  3. What is the problem?

  4. Have any error messages been issued?

  5. For additional help in finding error and warning messages, interpreting messages, and configuring log files, see Diagnosing problems with message logs in the WebSphere Application Server information center.

  6. Difficult problems can require the use of tracing, which exposes the low-level flow of control and interactions between components. For help in understanding and using traces, see Work with trace in the WebSphere Application Server information center.

  7. If the checklist does not guide you to a resolution, you can collect additional diagnostic data. This data is necessary for IBM Support to effectively troubleshoot and assist you in resolving the problem. See Contacting IBM Software Support.



Messages overview

You can find the full text of runtime messages, their explanations, and the recommended recovery actions by searching for the message identifier in the Messages section of the BPM Reference documentation.

Runtime message identifiers consist of a four- or five-character message prefix, followed by a four- or five-character message number, followed by a single-letter message type code. For example, CWWBA0010E. The message type code describes the severity of the message:

C Severe message.
E Urgent message.
I Informational message.
N Error message.
W Warning message.
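If you need to pick identifiers of this shape out of a log, the format above maps directly to a regular expression. The sample log line and the pattern below are illustrative sketches, not part of the product:

```shell
# Extract runtime message identifiers (4-5 letter prefix, 4-5 digit number,
# one type letter) from a sample log line. The sample text is illustrative.
LOG_LINE="CWWBB0627E: Failed to create the database schema."
echo "$LOG_LINE" | grep -oE '[A-Z]{4,5}[0-9]{4,5}[CEINW]'
```

The extracted identifier (here CWWBB0627E) is what you search for in the Messages section of the BPM Reference documentation.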


IBM Business Process Manager forum on developerWorks

Messages: installation and profile creation


BPM log files


Runtime logs

By default, log, error, and trace information for processes and applications on a process server is written to the SystemOut.log, SystemErr.log, and trace.log files, which are stored in the directory...

Content from the following log files is now captured in the SystemOut.log file:

TW BPD Engine: Errors generated as a result of process instance execution on the current server
TW Console: Actions that occurred in the Process Admin console
TW Error: Java exceptions
TW EventManager: Historical information about Event Manager processing
TW Exp/Imp: Process export and import transactions in Process Designer
TW Limit: Process server limit overruns
TW JavaScript: Logging associated with JavaScript log functions, such as log.info() or log.debug()
WS Inbound: Calls to published web services
WS Outbound: Data about web services consumed by processes
WS UCA Execution: Errors generated by Undercover Agent (UCA) execution


Related concepts:

Profile-specific log files

Log and trace settings

Installation and profile creation log files


Transaction log file

The transaction (tranlog) log file stores critical transactional data that is written to databases. It is an internal file that WebSphere Application Server uses to manage in-flight transactions and attempt to recover them if the server locks up.

CAUTION:

Never delete the transaction log file from a production environment. Deleting this file removes information about in-flight transactions from BPM memory. Without the transaction log file, there is no way to recover transactional information. In addition, long-running processes remain in an inconsistent state, and you cannot complete the process flow except by deleting running instances. Deleting running instances might cause you to lose operational or business-critical data, which makes the database inconsistent with the message destination. Other inconsistencies that deleting the transaction log file can cause include the following:

Deleting the transaction log from a development environment causes the same problems. However, because you can re-create business processes, deleting the files from a test environment is not as damaging as deleting them from a production environment.



Troubleshooting installation and configuration

  1. Read any error messages from the installation process.

    See: Error messages: installation and profile creation and augmentation.

  2. If the installation of WebSphere Application Server Network Deployment was not successful, check Troubleshooting installation in the WAS Network Deployment information center.

  3. If the installation of the WebSphere Application Server Feature Pack for Service Component Architecture (SCA) with the Service Data Objects (SDO) feature was not successful, check Troubleshooting installation in the WAS Network Deployment information center.

  4. If the installation of the WebSphere Feature Pack for Web Services was not successful (and the installation of WAS Network Deployment was), check Troubleshooting web server plug-ins installation and removal in the WAS ND information center.

    If a problem occurs during an installation of WebSphere Feature Pack for Web Services as part of the BPM installation, the installation process will not continue and an error message will be displayed.

  5. If the installation of IBM Business Process Manager was not successful (and installation of WAS Network Deployment and WebSphere Feature Pack for Web Services were), check other BPM installation log files. See Installation and profile creation log files.

  6. If you have successfully created a server profile, use the Quick Start console or the command-line method to start the server.

  7. Verify that the server starts and loads properly by looking for a running Java process and the Open for e-business message in SystemOut.log and SystemErr.log.

    If no Java process exists or if the message is not displayed, examine the same logs for any miscellaneous errors. Correct any errors and try again.

    You can find SystemOut.log and SystemErr.log in:

      profile_root/logs/servername
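The startup check in the step above can be scripted. The following is a self-contained sketch: the path is a placeholder for profile_root/logs/servername, and the sample log line is fabricated for the demo (WSVR0001I is the WebSphere Application Server message that reports a successful start):

```shell
# Demo of checking SystemOut.log for the startup message. Substitute your real
# profile path for the placeholder directory below.
LOG_DIR="./profile_root/logs/server1"
mkdir -p "$LOG_DIR"
# Create a sample log line so that this demo is self-contained:
echo 'WSVR0001I: Server server1 open for e-business' > "$LOG_DIR/SystemOut.log"
if grep -qi "open for e-business" "$LOG_DIR/SystemOut.log"; then
  echo "server started"
else
  echo "startup message missing; check SystemOut.log and SystemErr.log for errors"
fi
```

On a real system, omit the mkdir and echo lines and point LOG_DIR at the actual profile log directory.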

  8. Use the Quick Start console or the command-line method to stop the server, if it is running.

  9. To verify the ability of the web server to retrieve an application from BPM, see the step "Start the Snoop servlet to verify the ability of the web server to retrieve an application from the Application Server" in Troubleshooting installation in the WAS Network Deployment documentation.

  10. Start the administrative console.

  11. To resolve any IP address caching problems, see the step about resolving any IP address caching problems in Troubleshooting installation in the WAS Network Deployment documentation.


On the product support web site, you can review current information about resolutions to known problems, and you can read documents that can save you time gathering the information that you need to resolve a problem. Before opening a PMR, see the IBM Business Process Manager support page.


Messages and known issues during installation and profile creation

Some of the most common error messages encountered during installation and configuration can be addressed with actions that resolve the underlying problems.

The following IBM Business Process Manager installation and configuration errors appear on Linux, UNIX, and Windows platforms.

For information about messages that might be generated by the installation of WebSphere Application Server Network Deployment, refer to the Business Process Management messages topic.

If you do not see an error message that resembles yours, or if the information provided does not solve your problem, contact IBM Business Process Manager support for further assistance.



Informational messages

The following messages are for information only and do not require any action:


Configuration errors were detected during the installation

If you are installing either the Advanced or the Advanced Process Server edition of IBM Business Process Manager V8.5 with Installation Manager, and you selected to install only the client, you might get the following warning message: The packages are installed with warnings. View Log file

If you see this warning message after installing only the client feature of the Advanced or the Advanced Process Server edition on a Windows system, examine the logs. If the logs contain the following message, you cannot use the Windows Add or Remove Programs function to remove the client: 2464 WARNING 27:29.66 com.ibm.ws.exec.command.ExecCommand "C:\IBM\WebSphere\AppServer/util/bpm_configManagerLauncher.bat" cannot be found

To remove the client, use Installation Manager.


An earlier instance exists

When you perform a typical installation of IBM Business Process Manager after installing it previously on the same drive, you might see the following message:

Delete the indicated folder and click Re-Validate before you continue the installation.

When you perform a custom installation, you might see the following message:

Delete the indicated folder and run the installation again.


Supported IBM JDK was not found. The IBM JDK shipped with this product must be located at install_root/JDK. Correct this problem and try again.

If you use symbolic links to point to the IBM Java™ Development Kit (JDK) shipped with BPM, or to a JDK found in the PATH environment variable on your system, IBM SDK for Java validation might fail, resulting in a failed installation. This problem is caused by the way IBM SDK for Java validation code detects whether the JDK shipped with BPM is the current JDK used for installation.

To resolve this problem, do not use symbolic links to point to the JVMs supplied with the installation image of IBM Business Process Manager, and remove symbolic links from all JVMs that appear in your system's PATH environment variable.
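One plausible way to spot the problem is to test whether the java command on your PATH resolves through a symbolic link. This is an illustrative sketch, not a product-supplied check:

```shell
# Check whether the java command found on PATH is a symbolic link.
JAVA_BIN="$(command -v java || true)"
if [ -n "$JAVA_BIN" ] && [ -L "$JAVA_BIN" ]; then
  echo "java on PATH is a symbolic link: resolves to $(readlink "$JAVA_BIN")"
else
  echo "no symbolic link detected for java on PATH"
fi
```

Repeat the same test for the JDK path used by the installer, and remove or replace any symbolic links it reveals.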


Warning: Cannot convert string "<type_name>" to type FontStruct

If you install the web server plug-ins for WAS, you also install the ikeyman utility. The ikeyman utility is part of the Global Security Kit 7 (GSKit7).

If you issue the ikeyman.sh script on a Linux system, you might see the following message:

Warning: Cannot convert string    "-monotype-arial-regular-r-normal--*-140-*-*-p-*-iso8859-1"
   to type FontStruct

You can safely ignore the warning and use the ikeyman utility.

CWWBB0627E error during installation with DB2 Express on Linux systems

When you install BPM, IBM Business Monitor, or WebSphere Enterprise Service Bus with DB2 Express on a 32- or 64-bit Linux system, the server does not start successfully. Also, the SystemOut.log file includes CWWBB0627E: Failed to create the database schema.

The problem occurs if you used the DB2 Administration Server (DAS) user name, which defaults to bpmadmin, instead of the instance user name, which defaults to bpminst.

This problem can occur in the following situations:

If the problem occurs, the SystemOut.log file contains the following information:

[5/24/11 10:40:27:131 CDT] 00000000 CreateSchemaM E   CWWBB0627E: Failed to create the database schema.
[5/24/11 10:40:27:227 CDT] 00000000 ProcessEngine E
[5/24/11 10:40:27:274 CDT] 00000000 ManagerAdmin  I   TRAS0018I: The trace state has changed. The new trace state is *=info.
[5/24/11 10:40:27:499 CDT] 00000000 CreateSchemaM I   CWWBB0625I: Started creating the database schema.
[5/24/11 10:40:27:502 CDT] 00000000 CreateSchemaM I   CWWBB0658I: Schema qualifier is: 'BPEDB'.
[5/24/11 10:40:27:909 CDT] 00000000 CreateSchemaM I   CWWBB0614E: Database schema creation or migration step failure   CREATE TABLE BPEDB.PROCESS_TEMPLATE_B_T ( PTID CHAR(16) FOR BIT DATA NOT NULL , NAME VARCHAR(220) NOT NULL ,
  DEFINITION_NAME VARCHAR(220) , DISPLAY_NAME VARCHAR(64) , APPLICATION_NAME VARCHAR(220) , DISPLAY_ID INTEGER
  NOT NULL , DISPLAY_ID_EXT VARCHAR(32) , DESCRIPTION VARCHAR(254) , DOCUMENTATION CLOB(4096) , EXECUTION_MODE
  INTEGER NOT NULL , IS_SHARED SMALLINT NOT NULL , IS_AD_HOC SMALLINT NOT NULL , STATE INTEGER NOT NULL ,
  VALID_FROM TIMESTAMP NOT NULL , TARGET_NAMESPACE VARCHAR(250) , CREATED TIMESTAMP NOT NULL , AUTO_DELETE   SMALLINT NOT NULL , EXTENDED_AUTO_DELETE INTEGER NOT NULL , VERSION VARCHAR(32) , SCHEMA_VERSION INTEGER NOT   NULL , ABSTRACT_BASE_NAME VARCHAR(254) , S_BEAN_LOOKUP_NAME VARCHAR(254) , S_BEAN60_LOOKUP_NAME VARCHAR(254) ,
  E_BEAN_LOOKUP_NAME VARCHAR(254) , PROCESS_BASE_NAME VARCHAR(254) , S_BEAN_HOME_NAME VARCHAR(254) ,
  E_BEAN_HOME_NAME VARCHAR(254) , BPEWS_UTID CHAR(16) FOR BIT DATA , WPC_UTID CHAR(16) FOR BIT DATA , BPMN_UTID   CHAR(16) FOR BIT DATA , BUSINESS_RELEVANCE SMALLINT NOT NULL , ADMINISTRATOR_QTID CHAR(16) FOR BIT DATA ,
  READER_QTID CHAR(16) FOR BIT DATA , A_TKTID CHAR(16) FOR BIT DATA , A_TKTIDFOR_ACTS CHAR(16) FOR BIT DATA ,
  COMPENSATION_SPHERE INTEGER NOT NULL , AUTONOMY INTEGER NOT NULL , CAN_CALL SMALLINT NOT NULL , CAN_INITIATE   SMALLINT NOT NULL , CONTINUE_ON_ERROR SMALLINT NOT NULL , IGNORE_MISSING_DATA INTEGER NOT NULL , EAR_VERSION   INTEGER NOT NULL , LANGUAGE_TYPE INTEGER NOT NULL , DEPLOY_TYPE INTEGER NOT NULL , MESSAGE_DIGEST VARCHAR(20)
  FOR BIT DATA , CUSTOM_TEXT1 VARCHAR(64) , CUSTOM_TEXT2 VARCHAR(64) , CUSTOM_TEXT3 VARCHAR(64) , CUSTOM_TEXT4
  VARCHAR(64) , CUSTOM_TEXT5 VARCHAR(64) , CUSTOM_TEXT6 VARCHAR(64) , CUSTOM_TEXT7 VARCHAR(64) , CUSTOM_TEXT8
  VARCHAR(64) , PRIMARY KEY ( PTID ) ) IN BPETS8K: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error:
  SQLCODE=-204, SQLSTATE=42704, SQLERRMC=BPETS8K, DRIVER=3.61.65.
[5/24/11 10:40:27:912 CDT] 00000000 CreateSchemaM E   CWWBB0627E: Failed to create the database schema.
[5/24/11 10:40:27:912 CDT] 00000000 CreateSchemaM E   CWWBB0627E: Failed to create the database schema.
[5/24/11 10:40:27:948 CDT] 00000000 TraceBPE      E   

To solve the problem, you can either create a new profile, or you can use the usermod command to add the DB2 Administration Server (DAS) user name:
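One plausible shape of such a usermod invocation is sketched below. Both the group name bpminst1 and the idea of adding the DAS user to the instance owner's group are assumptions, not taken from the product documentation; confirm the real group on your system first (for example with id bpminst):

```shell
# Hypothetical repair sketch: add the DAS user (default bpmadmin) to the DB2
# instance owner's primary group. 'bpminst1' is an assumed group name.
# The command is printed rather than executed because it requires root:
echo "usermod -a -G bpminst1 bpmadmin"
```

Run the printed command as root only after verifying the group name, then retry the server start.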


Installation and profile creation log files

Various log files are created during installation and uninstallation of IBM Business Process Manager and during profile creation, augmentation, and deletion. Consult the applicable logs if problems occur during these procedures.

Table 1 shows the log file names, locations, and descriptions for success and failure for BPM.

Some directory paths, file names, and indicator values in Table 1 contain spaces to allow the entries to fit in the table cells. The actual directory paths, file names, and indicator values do not contain spaces.

The variable install_root represents the installation directory of IBM Business Process Manager. The variable profile_root represents the root location of a profile.

For more information see Installation directories for the product and profiles.

    Agent data location/logs

Typically C:\Documents and Settings\All Users\Application Data\IBM\Installation Manager\logs on Windows systems, or /var/ibm/InstallationManager/logs on Linux or UNIX systems.

Installation Manager log file directory under the Agent data location. For more information on the Agent data location, refer to the Installation Manager documentation.

Contains log information for Business Process Manager and WAS installations and uninstallations.

    install_root/logs/wbi/install/installconfig_server.log

Logs configuration actions that run at the end of the installation process to configure components, install system applications, and create Windows shortcuts and registry entries.

    install_root/logs/manageprofiles/pmt.log

Logs all events from the Profile Management Tool (PMT).

    install_root/logs/manageprofiles/profile_name_create.log

  • Traces all events that occur during the creation of the named profile.

  • Created when a profile is created during a typical installation, during a custom installation (Advanced Process Server only), when using PMT, or when using manageprofiles.sh.

    install_root/logs/manageprofiles/profile_name_augment.log

  • Traces all events that occur during the augmentation of the named profile.

  • Created when a profile is augmented, when using PMT, or when using manageprofiles.sh.

    install_root/logs/manageprofiles/profile_name_delete.log

  • Traces all events that occur during the deletion of the named profile.

  • Created when profile deletion is performed with manageprofiles.sh.

    install_root/logs/wbi/uninstall/uninstallconfig_server.log

Logs all configuration actions that run during uninstallation events relating to IBM Business Process Manager.



Launching Installation Manager directly on 64-bit systems

You can launch Installation Manager directly, for BPM, IBM Business Monitor, or IBM WebSphere Enterprise Service Bus, on 64-bit systems.

To launch Installation Manager directly, go to image_location/IM and do one of the following:

This starts Installation Manager with the correct products preselected.


Warnings about GTK or ulimit on Linux or UNIX when installing or migrating

On the Linux or UNIX operating system, when you are installing or migrating, you might see a warning about 32-bit GTK libraries or increasing your ulimit.


GTK warning

If you are on a 64-bit system, you might receive the following message:

If you see this message, your server does not have the 32-bit version of the GTK library installed, or the library is an incorrect version. You must update your server with the correct version of the 32-bit GTK library before you continue the installation. You can get the library from your operating system's installation media or official website.


ulimit warning

If you receive the following warning message during the prerequisite checking, use the platform-specific steps below to increase the ulimit number.

The required value is calculated based on the version of WebSphere Application Server and the configuration that you are installing.

  1. Set the maximum number of open files:

    On AIX systems:

    1. Open /etc/security/limits.

    2. Edit or add the default section and include this line:

      nofiles = recommended_value

      Save and close the file.

    3. Log off and log in again.

    On Linux systems:

    1. Open /etc/security/limits.conf.

    2. Locate the nofile parameter and increase the value. If a line containing the nofile parameter does not exist, add the following lines to the file:

      * hard nofile recommended_value

      * soft nofile recommended_value

      Save and close the file.

    3. Log off and log in again.

    On Solaris systems:

    1. Open /etc/system and add the following line to the end of the file:

      set rlim_fd_max=8800

      Save and close the file.

    2. Log off and log in again.

  2. Restart the computer.

  3. Restart the installer.
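For a quick in-session check before editing the files above, you can inspect the current open-files limit from the shell. This is an illustrative sketch; 8800 mirrors the rlim_fd_max value shown above, while the installer reports the value your configuration actually needs:

```shell
# Inspect the open-files limit for the current shell session.
echo "current soft limit: $(ulimit -n)"
RECOMMENDED=8800
# Raising beyond the hard limit fails; the permanent fix is editing the
# limits file for your platform and logging in again.
ulimit -n "$RECOMMENDED" 2>/dev/null \
  || echo "could not raise limit in-session; apply the file changes and log in again"
```

A per-session ulimit change does not survive a new login, which is why the file edits and re-login are the documented procedure.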


Troubleshooting problems creating database tables

While trying to create database tables for the Process Server database and the Performance Data Warehouse database, you might get errors or exceptions that prevent you from creating the tables. The bootstrap operation also fails.

The errors might occur because the "NOT LOGGED" tag was removed from some of the tables. The following message indicates that you have the wrong version of DB2:

DB21034E  The command was processed as an SQL statement because it was not a valid Command Line Processor command.  During SQL processing it returned:
SQL0355N  The column "RECORD", as defined, is too large to be logged.
SQLSTATE=42993

  1. Upgrade the database to the supported version (DB2 V9.7 Fix Pack 4 or later).
  2. Drop the existing Process Server and Performance Data Warehouse databases. This step is required because the databases are not complete. The error indicates that some of the tables are missing.

  3. Create the Process Server and Performance Data Warehouse databases again.

The tables for the databases are created successfully without any errors.


Troubleshooting Oracle transaction recovery messages

You must apply special grants for Oracle transaction recovery to work correctly. Servers that are configured to use an Oracle database might log the following errors in the SystemOut log file:

[4/19/12 13:44:50:062 EDT] 00000007 WSRdbXaResour E   DSRA0304E:  XAException occurred. XAException contents and details are: The cause is               : null.
[4/19/12 13:44:50:062 EDT] 00000007 WSRdbXaResour E   DSRA0302E:  XAException occurred.  Error code is: XAER_RMERR (-3).  Exception is: <null>

If there is a system failure, or the server was not stopped properly during a distributed transaction, the WAS transaction manager attempts to clean up any failed transactions that are found in the transaction logs. The Oracle database requires that you have special permissions for transaction recovery. The previous error occurs when a user that attempts to run the recover method does not have sufficient privileges.

To resolve these issues, run the required grant statements as user SYS:

where user_name is the user name for the Oracle user that is configured during deployment environment creation.

Repeat the previous steps for each Oracle user defined during deployment environment creation.



Error running bootstrap command or creating profiles with SQL Server databases

If you are using Microsoft SQL Server databases, and you create the BPMDB or PDWDB databases with a case-sensitive collation attribute, then when you use bootstrapProcessServerData to load the databases with configuration data, you get an error. If you use PMT or manageProfiles to create the profile, the profile is created with partial success, and the profile log file indicates a failure running bootstrapProcessServer.ant.

When you create SQL Server databases for Process Server (BPMDB) and Performance Data Warehouse (PDWDB), they must not be case-sensitive. The COLLATE attribute value must indicate case-insensitivity by using CI rather than CS:

osql -b -S hostname -U db_user_account -P db_user_password -Q "CREATE DATABASE database_name COLLATE SQL_Latin1_General_CP1_CI_AS"

If you get log errors and partial profile creation success, or a failure when using bootstrapProcessServerData, check the creation commands for the databases. If they have COLLATE attribute values such as SQL_Latin1_General_CP1_CS_AS, change the CS (case-sensitive) to CI (case-insensitive), as in SQL_Latin1_General_CP1_CI_AS.
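The CS-versus-CI check can be automated once you have the collation string. The following sketch classifies a sample value; the osql query in the comment is one way to obtain the real value (database name BPMDB is taken from this topic, and DATABASEPROPERTYEX is a standard SQL Server function):

```shell
# Classify a COLLATE value as case-sensitive (CS) or case-insensitive (CI).
# On a real system you would first retrieve the value, for example with:
#   osql -b -S hostname -U db_user_account -P db_user_password \
#        -Q "SELECT DATABASEPROPERTYEX('BPMDB', 'Collation')"
COLLATION="SQL_Latin1_General_CP1_CS_AS"   # sample value for this demo
case "$COLLATION" in
  *_CI_*) echo "case-insensitive collation: OK for BPMDB/PDWDB" ;;
  *_CS_*) echo "case-sensitive collation: recreate the database with a _CI_ collation" ;;
  *)      echo "unrecognized collation string" ;;
esac
```

With the sample value above, the script reports that the database must be recreated with a _CI_ collation.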


Problems testing a connection to a data source in a network deployment

In a network deployment environment, testing a connection to the cell-level jdbc/WPSDB data source can fail, with the error message UndefinedVariableException: Undefined Variable variable_name, where variable_name is a variable name such as WAS_INSTALL_ROOT, DB2_JCC_DRIVER_PATH, UNIVERSAL_JDBC_DRIVER_PATH or PUREQUERY_PATH. However, this does not necessarily indicate that you will have run time errors.

The test connection service provided by WebSphere Application Server often works only if the variables that contain path information (such as WAS_INSTALL_ROOT and ORACLE_DRIVER_INSTALL_PATH) are set at cell scope, which is not the case in BPM network deployments. See Test connection service.

If you attempt to test the data source connection, for example in the administrative console using Resources > JDBC > Data sources, and you get a message saying the test connection operation failed with the exception com.ibm.wsspi.runtime.variable.UndefinedVariableException: Undefined Variable variable_name, it does not necessarily indicate there will be a problem accessing the data source at run time. Ensure the location of your JDBC driver files is accessible to every client that must use the data source, and configure the variable with the full path of that location. Disregard the test connection error unless you are also experiencing trouble connecting to the data store at run time.


Troubleshooting memory issues for Business Process Manager servers

If you are using IBM Business Process Manager with a 32-bit operating system or JVM, you might notice issues with server performance and availability, including OutOfMemory exceptions. These problems occur when the operating system or JVM does not have enough accessible memory to support all the running processes.

If the server performance is severely degraded or if you see an OutOfMemory exception, consider switching to a 64-bit operating system or JVM. Otherwise, use the information in the following table to identify possible solutions for specific memory issues.

Observable memory issues and suggested actions

Severely degraded server performance (swapping occurs)

  • Ensure there is enough physical memory to support IBM Business Process Manager and all other running processes. Use the system memory analysis tool for your platform (vmstat on Linux or UNIX platforms, or Task Manager on Windows platforms).

    Note that some 32-bit operating systems can only use a specific amount of physical memory.

  • Stop any running processes that are not critical to Business Process Manager (for example, extra browser windows or other installed applications).

  • Establish a remote DB2 connection and use it instead of the local DB2 instance. Stop all DB2 processes on the local machine.

  • If you are authoring processes, use a remote unit test environment and Process Center instead of running them on your local machine.

Severely degraded server performance (no swapping) combined with excessive garbage collection activity

  • Increase the maximum Java heap size. The heap size must be large enough to contain all of the required Java objects, but small enough to leave sufficient room for native memory in the JVM address space.

  • Tune the JVM heap.

For more information on tuning the heap size and garbage collection settings, see Tuning the IBM virtual machine for Java and Java virtual machine settings at the end of this topic.

An OutOfMemory exception

  • If the exception is immediately preceded by frequent garbage collection activity that does not free space in the Java heap, use the information in the previous row to increase the maximum Java heap size, tune the JVM heap, or both.

  • If the exception is not immediately preceded by garbage collection activity, it can indicate the native memory for a resource has been exhausted. Decrease the maximum Java heap size to allow more room for native memory in the JVM address space.

  • If the exception occurs when you try to download Process Designer from the Process Center console, complete the following steps:

    1. Exit the Process Center console.

    2. Use the WebSphere administrative console to set a custom property for the web container:

      1. Click Servers > Server Types > WebSphere application servers > serverName > Web container > Custom properties.

      2. Add the com.ibm.ws.webcontainer.channelwritetype property and set its value to synch.

        Save changes and exit the administrative console.

    3. Restart the Process Center console and resume downloading Process Designer.

Memory issues; for example, the instance occupies much of the available memory

Reduce the branch context cache size. Reducing the cache size from 64 to 16 significantly reduces the amount of memory that the instance uses, although you trade off some performance for the improvement in memory use.

To change the cache size, locate the 00static.xml file in PROFILE_HOME\config\cells\cell_name\nodes\node_name\servers\server_name\process-server\config\system\. In this file, you can edit the value for branch-context-max-cache-size.


Technote: How to resolve memory issues for BPM V7.5 servers

Troubleshooting native memory issues

MustGather: Out of Memory errors with WAS on AIX, Linux, or Windows

Performance degradation MustGather documents for IBM Business Process Manager (BPM) Express V8.5 and V8.5.0 Fix Pack 1 (8.5.0.1)


Connection factories are not properly configured

When you use the default configuration for IBM Business Process Manager (BPM) Express V8.5 or V8.5.0 Fix Pack 1, you see errors and experience problems when you use Process Designer. You might also experience issues when you try to connect to the BPM service integration bus (SIBus).

In Process Designer, this issue is displayed in the following error message in the ae.log file:

javax.jms.JMSException: CWSIA0241E: An exception was received during the call to the method JmsManagedConnectionFactoryImpl.createConnection: com.ibm.websphere.sib.exception.SIResourceException: CWSIT0006E: It was not possible to contact any of the specified bootstrap servers. Please see the linked exception for further details. Bootstrap connections were attempted to: [localhost:7286:BootstrapSecureMessaging].at com.ibm.ws.sib.api.jms.impl.  JmsManagedConnectionFactoryImpl.createConnection
...
at com.lombardisoftware.client.ae.server.
    ServerNodeImpl.reconnectJMS

To resolve this issue, apply Fix Pack 1 to the BPM Express V8.5 or V8.5.0 environment and complete the following steps:

  1. Locate the host name for the Process Center server that Process Designer can access.

  2. Locate the port that is needed to connect securely to the SIBus.

    1. In the administrative console, go to the server settings page and click Servers > Server Types > WebSphere application servers > server_name. The server_name variable refers to the name of your server.

    2. On the right side of the window, expand Communications > Ports and locate the SIB_ENDPOINT_SECURE_ADDRESS port.

  3. Go to the connection factory settings by clicking Resources > JMS > Topic connection factories > TWClientConnectionFactory.

  4. In the Connection section of the connection factory settings, find Provider Endpoints and enter the appropriate value. The value is in the host_name:port_number:BootstrapSecureMessaging format. The host_name variable is the externally accessible host name for the BPM server and the port_number variable is the SIB_ENDPOINT_SECURE_ADDRESS.

    Because BPM Express allows only a single-server environment, there is only one entry. For BPM environments that use clustering, the list is a comma-separated list of all the endpoints for the messaging cluster.

    The TWClientConnectionFactory endpoint is the main endpoint that receives outside connections into a BPM Express environment. However, you might also need to complete these steps for the following product connection factories if they do not have an accessible endpoint. Depending on the type, these endpoints are in the same administrative console section under Topic connection factories, Queue connection factories, or Connection factories.

    • QueueConnectionFactory
    • TopicConnectionFactory
    • cacheMessageConnectionFactory
    • eventMgrMessageConnectionFactory
    • TWClientConnectionFactory
    • TWClientConnectionFactoryNoTX
    • TaskChangeConnectionFactory
    • bpm.pal.service.deployCF
    • PortalWebMessagingTopicConnectionFactory
    • DataDefLoaderConnectionFactory
    • ViewManagerConnectionFactory
    • PostLoadCalculationConnectionFactory
    • RepresentationManagerConnectionFactory
    • bpm.BPDDocMig.service.deployCF

    Save the configuration.

  5. Restart the server.

If you have further SIBus bootstrap issues, verify the host and port are accessible to the client. Also, verify that firewall and network issues do not exist.
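As a quick sanity check before you paste a value into Provider Endpoints (step 4 above), you can split a candidate endpoint string into its three parts with plain shell parameter expansion. This is only an illustrative sketch; the host name and port below are placeholders, not a real server:

```shell
# Split a host_name:port_number:chain endpoint value into its parts.
ENDPOINT="myhost.example.com:7286:BootstrapSecureMessaging"

HOST=${ENDPOINT%%:*}        # text before the first colon
REST=${ENDPOINT#*:}         # text after the first colon
PORT=${REST%%:*}            # text before the next colon
CHAIN=${REST#*:}            # remaining text (the transport chain)

echo "host=$HOST port=$PORT chain=$CHAIN"
```

If any of the three parts is empty, the value does not match the host_name:port_number:BootstrapSecureMessaging format that the connection factory expects.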


Troubleshooting the launchpad application or Quick Start

If the launchpad application or Quick Start does not start, try the following troubleshooting tips.


Troubleshooting the launchpad application

Restart the launchpad after you make any changes.


Troubleshooting Quick Start

If links from the Quick Start console fail to open browser windows, or the Quick Start console fails to launch or immediately quits on the Microsoft Windows operating system when Mozilla Firefox is set as the default browser, try the following workarounds.



Quick Start console fails to start on the Windows operating system

To run Quick Start on Windows 7, Windows Vista, or Windows Server 2008, run it with administrator privileges. This requirement applies to both administrative and non-administrative users.

For example:

  1. Right-click the Quick Start program shortcut at Start > Programs > IBM > Business Process Manager Advanced 8.5 (or Business Process Manager Advanced Process Server 8.5) > Deployment Environments > <profile_name> > <env_name> > Quick Start.

  2. Click Run as administrator.



Diagnosing a failing Ant configuration script

Determine whether a product installation problem on an operating system is caused by a failing Apache Ant configuration script.

Start diagnosing installation problems by reviewing the troubleshooting procedure; see Troubleshooting installation and configuration. After the installation completes successfully, several Ant scripts configure the product. The following procedure describes what to do when an Ant script fails, and how to determine and correct problems with failing Ant configuration scripts when the installation log does not indicate a failure.

The installation_root/logs/wbi/install/installconfig_server.log file, when present, describes any failure of an Ant script. Use the investigative actions to verify that the configuration scripts ran successfully during the configuration of the BPM product. If any script failed, use the recovery actions to complete the function of that script.

To diagnose failed Ant configuration scripts:

After you correct any installation errors and any Ant script configuration errors by performing the corrective actions in this procedure, the installation is complete.


Start the Quick Start console.



DB2 log file error: SQL1092N "USERID does not have the authority to perform the requested command or operation"

After you install IBM DB2 Express, if you use the domain user ID to create a new database and tables, you might see an error in the DB2 log files. Follow these steps to enable the domain user ID to access the database.

You cannot use the domain user ID to run the db2cmd command to create a new database and tables. If you do, you might see this error in the DB2 log files:

DB2 cannot look up the domain user ID "USERID" as an authorization ID, and it ignores the local group for the domain user ID. Even if you add the domain user ID to the local DB2ADMNS group, the user ID does not have the authority to perform database operations.


Resolve the problem

To enable the domain user ID to access the database:

  1. Add the domain user ID to the local group DB2ADMNS.

  2. Open the DB2 command window and run the following commands from the prompt:
    db2set DB2_GRP_LOOKUP=LOCAL,TOKENLOCAL
    db2 update dbm cfg using sysadm_group DB2ADMNS
    db2stop
    db2start

  3. Restart the DB2 Windows services with the login ID set to the domain user ID.

For additional information, refer to the following DB2 information center document: Considerations for Windows 2008 and Windows Vista or higher: User Access Control feature.


Profile creation fails on Windows operating system

The Profile Management Tool or manageprofiles command on IBM Business Process Manager might fail on the Microsoft Windows operating system if it is launched without administrative privilege.

To install or run PMT on Windows 7, Windows Vista, or Windows Server 2008, elevate your Microsoft Windows user account privileges. If you do not use the correct privileges, the profile creation fails and you receive an SQL1092N message in the log, which indicates that the user does not have authority to perform the requested actions. Whether you are an administrative user or a non-administrative user, right-click the pmt.bat file and select Run as administrator.

Alternatively, use the runas command at the command line. For example, the following command can be run from the install_root\bin\ProfileManagement directory:
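The original example command is not reproduced here; a plausible form, based on the standard Windows runas syntax (the user name is a placeholder), is:

```shell
runas /env /user:MyAdminName "pmt.bat"
```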

Non-administrative users are prompted for the administrator password.

To install or run the manageprofiles command on Windows 7, Windows Vista, or Windows Server 2008, elevate your Microsoft Windows user account privileges by using the runas command. Remember to put double quotation marks around the manageprofiles command and all of its parameters. For example, the following command can be run from the install_root\bin directory:
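Again, the original example is not shown; a plausible form with the required double quotation marks around the command and its parameters (the user name and the profile options are placeholders) is:

```shell
runas /env /user:MyAdminName "manageprofiles.bat -create -profileName MyProfile"
```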

Non-administrative users are prompted for the administrator password.


Recovering from profile creation failure after using BPMConfig

If the profile creation step fails after running the BPMConfig command, you need to delete the profiles and drop the databases that you created.

To resolve the problem:

  1. For each profile that you attempted to create, run the manageprofiles command to delete the profiles. For example, manageprofiles -delete -profileName profile_name.
  2. Delete the profile folders.

  3. From the BPM_HOME/bin folder, run manageprofiles -validateAndUpdateRegistry.
  4. Drop the databases if you have already created them.
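Under the assumption of a Linux system with DB2, the steps above might be scripted as follows; the installation path, profile name, and database names are placeholders for your environment:

```shell
# 1. Delete each profile that the failed BPMConfig run created.
BPM_HOME=/opt/IBM/BPM            # placeholder installation root
$BPM_HOME/bin/manageprofiles.sh -delete -profileName MyDmgrProfile

# 2. Remove the leftover profile folder.
rm -rf $BPM_HOME/profiles/MyDmgrProfile

# 3. Clean up the profile registry.
$BPM_HOME/bin/manageprofiles.sh -validateAndUpdateRegistry

# 4. Drop the databases if they were already created (DB2 example;
#    substitute your database names).
db2 drop database BPMDB
db2 drop database PDWDB
```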


Successful installation reported after profile creation failure

If the profile creation step fails during a custom installation, the failure is not recognized by Installation Manager, which reports a successful installation. This problem occurs only on Windows platforms.

This problem applies to IBM Business Process Manager - Process Server, IBM Business Monitor, or WebSphere Enterprise Service Bus.

The failure of the profile creation step is indicated in the profile creation log. For example, you might see the following error: INSTCONFFAILED: Profile augmentation failed.

To resolve the problem:

  1. Check the profile creation log, which is located in the <install_root>/logs/manageprofiles directory, to determine the cause of the error.

  2. Complete one of the following steps:

    • Delete the profile that contains the error. Then, use the manageprofiles command or PMT to create a new profile.
    • Install the product with a custom installation, but do not create a profile. Then, after installation, use the manageprofiles command or PMT to create the profile.


Recovering from profile creation or augmentation failure

The Profile Management Tool can fail when it creates new profiles or augments existing ones; the same can occur when you use manageprofiles.sh. If such a failure occurs, first check the log files as described in this topic, and then follow the recovery instructions that apply to your situation.


Log files

All manageprofiles log files are in install_root/logs/manageprofiles. Look at the following log files in the order given. Each log file must contain the entry INSTCONFSUCCESS; if a file does not include this entry, a failure was detected. Examine the log files to determine why the failure occurred and how to remedy it.

  1. The log file profile_name_create_error.log (where profile_name is the name of the profile).

    Look at this file only if you were creating a new profile, not augmenting an existing one.

      install_root/logs/manageprofiles/profile_name_create_error.log

    Search for the text Configuration action succeeded or Configuration action failed.

    There can be multiple occurrences of Configuration action failed. Investigate and remedy each one. Also review the log files described in the following options, if the profile was created.

    Additional information is available in the pmt.log file in the manageprofiles directory, which logs all events that occur when a default profile is created during a complete installation using PMT.

  2. Log file profile_name_augment_error.log (where profile_name is the name of the profile).

    This log file is located in the following directory:

      install_root/logs/manageprofiles/profile_name_augment_error.log

    Search for the text Configuration action succeeded or Configuration action failed.

    There can be multiple occurrences of Configuration action failed. Investigate and remedy each one. Also review the log files described in the following options, if the profile was created.

    To check the status of a profile that you created during installation, run:

      install_root/bin/logProfileErrors.sh

  3. Individual profile template action log files.

    If you found result values of false in the log files described in the preceding options, review the log files in the following directory:

      install_root/logs/manageprofiles/profile_name

    These log files do not follow a consistent naming convention, but typically, each is the name of the Apache Ant script that failed followed by .log. For example, suppose the following entry is in the profile_name_augment.log file:

    <messages>Result of executing E:\o0536.15\profileTemplates\default.wbicore\actions\saveParamsWbiCore.ant
    was:false</messages>

    First look at the surrounding entries in the profile_name_augment.log file in the install_root/logs/manageprofiles directory. If you cannot determine the cause of the failure from the surrounding entries, look for the corresponding log file for any failing Ant script entries. In this case, the log file created by the saveParamsWbiCore.ant script is saveParamsWbiCore.ant.log. Look at that file to investigate why the failure occurred.
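The INSTCONFSUCCESS check described above can be scripted. The sketch below writes a sample log purely for illustration, then searches it the same way you would search a real profile_name_create_error.log:

```shell
# The sample log stands in for install_root/logs/manageprofiles/<profile>_create_error.log
LOG=/tmp/sample_profile_create.log
printf 'Configuration action succeeded\nINSTCONFSUCCESS\n' > "$LOG"

# A log without INSTCONFSUCCESS indicates a detected failure.
if grep -q 'INSTCONFSUCCESS' "$LOG"; then
  RESULT="profile creation succeeded"
else
  RESULT="failure detected: search the log for 'Configuration action failed'"
fi
echo "$RESULT"
```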


Recovery for creation failure

After you determine why profile creation failed and have addressed the cause of the failure, you can try to create the profile again.

When you create a profile, the tool first creates a WebSphere Application Server profile and then augments it with BPM profile templates to create the BPM profile. Even if you encountered a profile creation failure, a profile might exist that does not have all the needed augmentations.

To determine if the profile exists, run the install_root/bin/manageprofiles -listProfiles command. If the profile name you used for creation does not exist, you can recreate the profile. If the profile name you used for creation exists, then the profile was created and you have encountered an augmentation failure. For tips on recovering from an augmentation failure, see Recovery for augmentation failure.


Recovery for augmentation failure

To recover from a profile augmentation failure, follow these steps:



Cluster member startup timeout errors reported in deployment manager log

If the deployment environment startup takes longer than the default timeout setting, you will see an exception in the deployment manager log. Provided the cluster members eventually start, you can ignore the exception.

Depending on their performance levels, some cluster members might not start within the timeout limits for the connector. If this happens, the following message appears in the deployment manager log:

[timestamp] 0000005a Cluster       E   WWLM0058E: Cluster member PSDELucia.WebApp.linux-tcisNode01.0 did not start properly. 
   javax.management.JMRuntimeException: ADMN0034E: The service is unable to obtain a valid administrative client to connect process    "linux-tcisNode01" from process "dmgr", because of exception: com.ibm.websphere.management.exception.ConnectorException:
   java.net.SocketTimeoutException: Async operation timed out  at com.ibm.ws.management.AdminServiceImpl$1.run(AdminServiceImpl.java:1370)
 at com.ibm.ws.security.util.AccessController.doPrivileged(AccessController.java:118)
 at com.ibm.ws.management.AdminServiceImpl.invoke(AdminServiceImpl.java:1228)
 at com.ibm.ws.management.wlm.Cluster.launchMember(Cluster.java:2160)
 at com.ibm.ws.management.wlm.Cluster$MemberStateChange.run(Cluster.java:2964)
 at java.lang.Thread.run(Thread.java:769)
If you see this message, check the cluster member log to confirm that it is eventually starting. If it is, you can ignore the exception.


Reinstallation cannot create new profile when using the Typical installation and configuration option

If you try to reinstall BPM to the same location using the Typical installation and configuration option, or if you try to reinstall after a failed uninstall, the installation might fail because a new profile cannot be created.

If databases were created for the test environment, the databases must be dropped before you can create a new profile.

If the databases are not automatically dropped during uninstall, you must drop them manually.



Troubleshooting the Business Process Choreographer configuration

Use this topic to solve problems relating to the configuration of Business Process Choreographer and its Business Flow Manager and Human Task Manager components.

The purpose of this section is to aid you in understanding why the configuration of Business Flow Manager or Human Task Manager is not working as expected and to help you resolve the problem. The following tasks focus on problem determination and finding solutions to problems that might occur during configuration.


Business Process Choreographer log files

Where to find the log files for your Business Process Choreographer configuration.


Profile creation

The profile actions for Business Process Choreographer write to the bpcaugment.log file in the logs directory of the profile. You can find more detailed traces in the bpcaugment.wsadmin.log file in the same directory.


Administrative scripts

The administrative scripts in the admin subdirectory of the ProcessChoreographer directory do not write their own log files. All of the Business Process Choreographer scripts that are run using wsadmin are logged in the application server log files and in the wsadmin.traceout file in the logs directory of the profile. However, because this file is overwritten each time that wsadmin is invoked, make sure that you either use the -tracefile or -appendtrace option, or save the log file before you invoke wsadmin again.
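For example, a wsadmin invocation that keeps its own trace file might look like the following sketch; the script name and trace path are placeholders, while -tracefile is the standard wsadmin option that redirects the trace that would otherwise overwrite wsadmin.traceout:

```shell
./wsadmin.sh -lang jython -f some_bpc_admin_script.py -tracefile /tmp/bpc_admin_trace.log
```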


Use the wsadmin scripting tool


Troubleshooting the Business Process Choreographer database and data source

Use this task to solve problems with the Business Process Choreographer database and data source.

Both Business Flow Manager and Human Task Manager need a database. Without the database, enterprise applications that contain BPEL processes and human tasks will not work.



REST API: The URL is not configured correctly

The REST API must be configured correctly, otherwise you get an error when you try to use the graphical process widget in the Business Process Choreographer Explorer, Business Process Archive Explorer, or Business Space.


Reason

This can have the following causes:


Resolution

To correct this problem:



Connection timeout when running a wsadmin script

Running a wsadmin administrative script that calls an MBean in a network deployment environment results in several connections between server processes, such as the deployment manager, the node agent, and the cluster member, using Java Management Extensions (JMX) connectors. Some Business Process Choreographer administrative scripts can run for longer than the default connection timeout specified for the connector that is used. This behavior results in messages similar to the following example: WASX7017E: Exception received while running file "/install_root/ProcessChoreographer/admin/script_name"; exception information: javax.management.JMRuntimeException: ADMN0034E: The service is unable to obtain a valid administrative client to connect process "nodeagent" from process "dmgr", because of exception: com.ibm.websphere.management.exception.ConnectorException: ADMC0009E: The system failed to make the SOAP RPC call: invoke.


Reason

Some Business Process Choreographer administrative scripts can perform database operations on large numbers of database objects, which means that they can run for longer than the default connection timeout. When a wsadmin script is run with a connection to the server, the following connection timeouts can happen:


Resolution

You have the following options:
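The original list of options is not reproduced here; one common mitigation, offered as an assumption rather than the complete list, is to raise the SOAP connector request timeout in the profile's soap.client.props file. The sketch below edits a sample file for illustration; on a real system, edit profile_root/properties/soap.client.props instead:

```shell
# Create a sample properties file (illustration only).
PROPS=/tmp/soap.client.props
echo 'com.ibm.SOAP.requestTimeout=180' > "$PROPS"

# Replace the default 180-second timeout with a larger value (in seconds).
sed -i 's/^com\.ibm\.SOAP\.requestTimeout=.*/com.ibm.SOAP.requestTimeout=600/' "$PROPS"
cat "$PROPS"
```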


Java Management Extensions connector properties


6.0.x Business Process Choreographer API client fails in a V8.5 environment

You did not migrate your 6.0.x Business Process Choreographer API client when you upgraded to IBM Business Process Manager Advanced V8.5. When you try to run your client in the V8.5 environment, the client fails.


Symptom

Exceptions similar to the following example are written to the SystemOut.log file:

[9/6/07 21:05:27:093 PDT] 00000045 ExceptionUtil E CNTR0020E: EJB threw an unexpected (non-declared) exception during invocation of method "processMessage" on bean "BeanId(validateDataApp#validateDataEJB.jar#component.validateItem, null)".
Exception data: javax.ejb.AccessLocalException: ;
nested exception is: com.ibm.websphere.csi.CSIAccessException:
SECJ0053E: Authorization failed for /UNAUTHENTICATED while invoking (Home)com/ibm/bpe/api/BusinessFlowManagerHome create:4
securityName: /UNAUTHENTICATED;accessID: UNAUTHENTICATED is not granted any of the required roles: BPEAPIUser com.ibm.websphere.csi.CSIAccessException: SECJ0053E: Authorization failed for /UNAUTHENTICATED while invoking (Home)com/ibm/bpe/api/BusinessFlowManagerHome
create:4 securityName: /UNAUTHENTICATED;accessID: UNAUTHENTICATED is not granted any of the required roles: BPEAPIUser at com.ibm.ws.security.core.SecurityCollaborator.performAuthorization(SecurityCollaborator.java:484)
at com.ibm.ws.security.core.EJSSecurityCollaborator.preInvoke(EJSSecurityCollaborator.java:218)
at com.ibm.ejs.container.EJSContainer.preInvokeForStatelessSessionCreate(EJSContainer.java:3646)
at com.ibm.ejs.container.EJSContainer.preInvoke(EJSContainer.java:2868)
at com.ibm.bpe.api.EJSLocalStatelessGenericBusinessFlowManagerEJBHome_a412961d.create(Unknown Source)


Reason

If you have written a client that uses Business Process Choreographer APIs without first authenticating the user, you should modify the client to perform a login before using the APIs. After migration, the Java™ EE roles BPEAPIUser and TaskAPIUser are set to the value Everyone, which maintains compatibility with earlier versions by maintaining the 6.0.x behavior of not requiring a login when application security is enabled. For new installations these roles default to the value AllAuthenticated. The use of Everyone to map Java EE roles BPEAPIUser and TaskAPIUser is deprecated.


Resolution

Modify your API client to force the user to log on to the client before they use the APIs.

As a temporary workaround, you can change the mappings for the BPEAPIUser and the TaskAPIUser roles. To change the mapping:

  1. In the administrative console, click Applications > Enterprise Applications > BPEContainer_suffix, and under Detail Properties, click Security role to user/group mapping.
  2. Change the BPEAPIUser role from AllAuthenticated to Everyone, and click OK.
  3. Repeat step 2 for the TaskContainer_suffix and the TaskAPIUser role.

  4. After you have modified your client, change these roles back to AllAuthenticated to prevent unauthenticated users accessing the APIs.



Resolve a DB2 process load issue

You can encounter unexpected process load issues when running DB2.

The following error message is displayed when DB2 encounters a process load issue:

CWLLG2068E: An unexpected exception occurred during an attempt to generate the next primary key.  
Error: org.springframework.jdbc.UncategorizedSQLException: PreparedStatementCallback; uncategorized SQLException for SQL [update lsw_pri_key set high_key = ? where table_id = ?]; SQL state [57011]; error code [-964]; DB2 SQL Error: SQLCODE=-964, SQLSTATE=57011, SQLERRMC=null, DRIVER=3.61.65; nested exception is com.ibm.db2.jcc.am.SqlException: DB2 SQL Error: SQLCODE=-964, SQLSTATE=57011, SQLERRMC=null, DRIVER=3.61.65
If your Business Process Manager databases are maintained by a database administrator, gather the server log files and FFDC entries, and then consult the database administrator. For IBM DB2, the database administrator must adjust the settings for LOGFILSIZ, LOGPRIMARY, and LOGSECOND based on the environment.

BPMDB and PDWDB are the default database names. If you used different database names during the installation, substitute the database name.

To increase the LOGFILSIZ value, use the following command, where xxxx is the new value:

  db2 update database config for BPMDB using LOGFILSIZ xxxx

Increasing the number of primary and secondary logs is done with a similar command, where yy and zz are the new values:
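Following the pattern of the LOGFILSIZ command above, the combined update might look like this (a sketch; yy and zz are placeholder values, and BPMDB should be replaced with your database name):

```shell
db2 update database config for BPMDB using LOGPRIMARY yy LOGSECOND zz
```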

You must update these settings for both the BPMDB and the PDWDB.

Increasing the LOGPRIMARY value also increases the disk requirements for the log files because the primary log files are preallocated during the very first connection to the database. Each log file has a size that is equal to LOGFILSIZ. You can use the database system monitor to help size the primary log files. See the IBM DB2 documentation for more information about these values and how to monitor them. The correct value for these parameters is specific to the environment. When you increase these values, increase them in small increments from their current settings until the problem is resolved.


Uninstalling multiple server and tooling products causes errors

Uninstalling multiple products (IBM Business Process Manager and IBM Integration Designer) at the same time might cause Installation Manager errors or warnings.


Types of errors and warnings

The types of errors or warnings you receive vary, depending on the combination of products that you attempt to uninstall:


Resolve the problem

To resolve the problem, uninstall each server product individually.

You can also manually delete the remaining installation folders after the failed uninstallation.


Installing a snapshot fails when single sign-on has been configured

Installing a snapshot from IBM Process Center to Process Server can fail if single sign-on has been configured.

If you receive a message similar to the following one in the Process Center SystemOut.log file after installing a snapshot, it is an indication that your single sign-on configuration needs to be updated.

[8/29/12 2:35:14:235 EDT] 00001a91 HttpMethodBas W org.apache.commons.httpclient.HttpMethodBase processCookieHeaders Cookie rejected: "$Version=0; LtpaToken=Mj0K0SVfNfOK6r6+Oy6sDNAeIw0IKn5ghkYNA8KFUZuTy2SuI3bgE/EmquYoznVI3SakYJ9J3QfyqLR80/b9k46hioz/qBRGZgh7ZpMv7GE5DCKjuSkHganqoZKvIBseI222h6zDC8Ea0jelJWAc7IQEqH0Pgpg5hJOmdt258llWxuSL9scuz+leejgDnSJE3kzThjSMvBlxYbk6J7DF8OiTHguxSwlNS25Deud11mN3MI1L1O7vFx3FzEQ6PLdKi+4d8HYQ48755KjzNO1f4Q1/MywQWRCvXQszugmid/5batRcpgS998Hhe44OuibLeElViY+jsYQ31u/qpIB9s8yni7fx8c/k; $Path=/; $Domain=.ibm.com". Domain attribute ".ibm.com" violates RFC 2109: host minus domain may not contain any dots
[8/29/12 2:35:14:237 EDT] 00001a91 AbstractSuppo I CWLLG0154I: v1 of the clone Remote login succeeded.
[8/29/12 2:35:14:248 EDT] 00001a91 AbstractSuppo I CWLLG0714I: v1 of the clone Sending export.
[8/29/12 2:35:14:950 EDT] 00001a91 AbstractSuppo E CWLLG0155E: v1 of the clone Installation failed.
[8/29/12 2:35:15:005 EDT] 00001a91 GovernanceSer E com.lombardisoftware.server.ejb.governance.GovernanceEventServiceCore deploySnapshotFromGovernance CWLLG3512E: Governance service 'Install Snapshot' failed due the following error: 'CWLLG0155E: The install failed. Check server logs on APSEN01 for more information.'

To correct the problem, fully qualify the domain name in your single sign-on configuration in WebSphere Application Server. For example, instead of using mycorporation.com as the domain, use rtp.raleigh.mycorporation.com.

If Lightweight Third-Party Authentication (LTPA) is being used, you might also want to re-import the LTPA keys into WebSphere Application Server.


Installing a snapshot fails after a message confirms the installation

Installing a snapshot on IBM Process Center displays a message that confirms the snapshot is installed; moments later, another message claims that the installation failed.

You might see the following sequence of messages in Process Center when you are installing a snapshot: first Installation is progressing, followed by Currently installed, and then another message, Installation terminated with an exception.

This sequence can happen when you are using IBM Business Process Manager Advanced. During the installation of a snapshot on a process server, the content of the snapshot is first imported into the process server, and then the advanced artifacts, such as SCA modules and libraries, are deployed to the process server if they are present. If this advanced deployment of a snapshot fails, you receive message CWLLG2163E, which indicates that the process application snapshot content is imported on the process server but the advanced content failed to install. You can attempt to install the advanced content of the same snapshot again from the Process Admin Console by activating the snapshot.


Quick Start shortcut missing from Start Programs menu after successful installation of BPM on Windows Server 2012 or Windows 8

After the successful installation of IBM Business Process Manager on Microsoft Windows Server 2012 or Windows 8, when you click Start -> Programs -> IBM -> Business Process Manager 8.5, the Quick Start shortcut might be missing from the displayed menu. Other user-specific shortcuts, such as the profile shortcuts, might be missing also.

This is a known issue with Windows Server 2012 and Windows 8. A fix is currently not available.

To resolve the problem, switch to the All Applications view in the Metro UI, where the shortcuts are available. On the Metro UI screen, right-click, then click All Applications.


Troubleshooting the z/OS installation and configuration

The installation and augmentation jobs or scripts log records in the following ways:

Make sure that you have installed and configured WebSphere Application Server for z/OS successfully. See the installation troubleshooting information in the WebSphere Application Server Information Center if you are having trouble installing and configuring WebSphere Application Server for z/OS.

For current information available from the IBM Support Portal on known problems and their resolution, see the IBM Business Process Manager Advanced Support page.



Correcting file ownership or permission problems with IBM Installation Manager

If you accidentally start an Installation Manager instance from the wrong user ID, some files might end up with ownerships that prevent normal use of the Installation Manager.

To correct this problem, log on with a superuser or other privileged user ID and reset the file ownership and permissions for the Installation Manager binary files and agent data. For example:

chown IMADMIN:IMGROUP /InstallationManager/bin

chmod 775 /InstallationManager/bin

and

chown IMADMIN:IMGROUP /InstallationManager/appdata

chmod 775 /InstallationManager/appdata
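If files inside those directories are also affected, a recursive variant can be used; this is an assumption added here, not part of the original procedure, so verify what the -R flag will touch before running it:

```shell
chown -R IMADMIN:IMGROUP /InstallationManager/bin /InstallationManager/appdata
chmod -R 775 /InstallationManager/bin /InstallationManager/appdata
```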

If the users of a group-mode Installation Manager do not have umask set to allow group-write permission on created files, you might also have to set the permissions and owners when switching from one user ID to another. You might also need to set permissions and owners for the product files that you install with the Installation Manager to ensure that maintenance can be performed from other user IDs in the group.



IBM Business Process Manager configuration errors

If you experience a problem with one of the configuration tasks, there are three main sources of information about the problem:

  1. The error messages issued by the task
  2. Error messages in the WebSphere deployment manager or application server job logs; if you are federating a node, you might also find messages in the node agent job logs
  3. Log files in the UNIX file system

Wherever possible, the cause of and the solution to each problem are documented with the symptoms. The problems described here were experienced when starting the server after completing the installation procedure for BPM. In the examples of error messages, the messages have been made easier to read by changing the places where line breaks occur. Therefore, if you see these errors on your system, the messages will have a slightly different layout.



Failure in loading T2 native library db2jcct2zos

This error can occur when you are using the DB2 Universal Driver connector and WebSphere Application Server cannot load some external DB2 modules from SDSNLOAD or SDSNLOD2.


Error message: BBOO0220E:


CWSIS0002E: The messaging engine encountered an exception while starting.
Exception: com.ibm.ws.sib.msgstore.PersistenceException:
CWSIS1501E: The data source has produced an unexpected exception:
java.sql.SQLException: Failure in loading T2 native library db2jcct2zos, reason: java.lang.UnsatisfiedLinkError:
/pp/db2v8/UK14852/jcc/lib/libdb2jcct2zos.so:
EDC5157I An internal error has occurred. (errno2=0x0BDF03B2)DSRA0010E:
SQL State = null, Error Code = -99,999DSRA0010E: SQL State = null, Error Code = -99,999
com.ibm.ws.sib.utils.ras.SibMessage
com.ibm.ws.sib.utils.ras.SibMessage

There are a number of possible causes of a failure to load libdb2jcct2zos.so. A common error is the absence of the DB2 libraries from the STEPLIB of the WebSphere Application Server processes. A failure like this can also be a symptom of a larger problem, such as the DB2 Universal Driver not having been fully configured in the DB2 system that you are accessing.

Check that all the steps for installing the DB2 Universal Driver have been performed for your DB2 system.

The installation instructions for the DB2 Universal Driver can be found in the DB2 Information Center.



DataSource has a null RelationalResourceAdapter property

The error shown in the example in this topic is caused by a redundant data source that is left behind after running the augmentation job. You can safely delete this data source by using the WebSphere administrative console. Be careful not to delete the JDBC provider that has a very similar name.

error message: BBOO0222I: DSRA8208I: JDBC driver type : 2
com.ibm.ws.exception.RuntimeWarning:
com.ibm.ws.runtime.component.binder.ResourceBindingException:
invalid configuration passed to resource binding logic. REASON: Invalid Configuration!
The DataSource: DB2 Universal JDBC Driver DataSource has a null RelationalResourceAdapter property.

To remove the redundant data source:

  1. Log in to the WebSphere administrative console and navigate to Resources > JDBC Providers.

  2. Set the scope to Server and click Apply.

  3. Click the JDBC provider called DB2 Universal JDBC Driver Provider.

  4. Click the link to Datasources on the right.
  5. You should see a list of three data sources. Select the check box next to DB2 Universal JDBC Driver Datasource and click the Delete button.

  6. Save your configuration changes and restart the WebSphere server.



SQLCODE = -471

This error can occur when the Universal Driver has not been properly configured in a DB2 system.

In the SYSIBM.SYSROUTINES table, the WLM_ENVIRONMENT for SYSIBM.SYSTABLES has a WLM name that does not match that being used in the stored procedure address space JCL.

The installation instructions for the DB2 Universal Driver can be found in the DB2 Information Center.

ExtendedMessage: BBOO0220E:

CWSIS0002E: The messaging engine encountered an exception while starting.
Exception: com.ibm.ws.sib.msgstore.PersistenceException:
CWSIS1501E: The data source has produced an unexpected exception:
com.ibm.db2.jcc.t2zos.y:[IBM/DB2][T2zos/2.9.32]
v.readExecuteCallInternal: nativeExecuteCall:5587:
DB2 engine SQL error, SQLCODE = -471, SQLSTATE = 55023,
error tokens = SYSIBM.SQLTABLES;00E7900C



SQL code -204 and -516

This error can be caused if the currentSchema property does not match the schema name of the tables and indexes that you created. The error messages show the JCA authentication alias that is being used.

error message: BBOO0220E: SCHD0125E: Unexpected exception while processing the acquireLease operation:
com.ibm.ws.leasemanager.LeaseException: SCHD0300E:
Error during Database operation, localized message is _
:nativePrepareInto:1377:
DB2 engine SQL error, SQLCODE = -204, SQLSTATE = 42704,
error tokens = MDDBU.WSCH_LMGR,
Vendor Error Code is -204, ANSI-92 SQLState is 42704, cause:
[IBM/DB2][T2zos/2.9.32]T2zosPreparedStatement.readDescribeInput_
:nativeDescribeInput:2006:
DB2 engine SQL error, SQLCODE = -516, SQLSTATE = 26501,
error tokens =


..
..
com.ibm.db2.jcc.t2zos.y:
[IBM/DB2][T2zos/2.9.32]T2zosPreparedStatement.readDescribeInput:2006:
DB2 engine SQL error, SQLCODE = -516, SQLSTATE = 26501, ...



Repeated SIB messages about acquiring and losing locks

This error can occur after correcting the DB2 Universal Driver configuration and restarting the server. The error messages are repeated continuously in the adjunct region.

ExtendedMessage: BBOO0222I:

CWSIS1538I: The messaging engine, ME_UUID=68E9550CE7780888,
INC_UUID=5f244052b02f04b4,
is attempting to obtain an exclusive lock on the data store.
..
..
ExtendedMessage: BBOO0222I:

CWSIS1546I: The messaging engine, ME_UUID=68E9550CE7780888,
INC_UUID=5f244052b02f04b4,
has lost an existing lock or failed to gain an initial lock on the database 

These error messages indicate that there is a problem accessing the data store. Check that the fixWPSvars.jacl script created the data source (jdbc/MEdatasource). Check that the data source has a JCA authentication alias associated with it. If there is no JCA authentication alias associated with the data source, the database access defaults to the user ID of the servant region and tries to find tables called MKASRU, which do not exist.

This error can occur because the -sibauth option has not been coded when running fixWPSvars.jacl. You can fix this in a number of ways:

The following steps explain how to create a new JCA authentication alias to be used by the SIB to access DB2:

  1. Open the WebSphere administrative console and navigate to Security > Global security.

  2. Click the link to J2C Authentication data under Additional Properties.

  3. Click the New button.

  4. Enter a name for the alias, and enter the user ID and the password for the alias.

  5. Click OK.

  6. Click Save.

The following steps explain how to associate the Service Integration Bus with the authentication alias that you have created:

  1. Open the WebSphere administrative console and navigate to Service integration > Buses.

  2. Click the bus.

  3. On the next panel, click Messaging engines.

  4. Click the hyperlink to the messaging engine.

  5. Click the link to Data store under Additional Properties.

  6. Expand the drop-down list box in the Authentication alias field and select the alias you created earlier.

  7. Click OK and then save the change to the configuration.

  8. Stop and restart the server.



Verification errors

When you verify the installation, you might encounter some problems, which are described in this section.



Resources not seen in the administrative console

When you are checking that applications you have installed exist in the system, you may not see them listed under the installed applications section. If you do not see the applications listed, log out of the administrative console and log back in.

If you do not see Service Integration Buses that you have configured, log out of the administrative console and log back in.



Resolve "Topic not found" errors in Process Portal

After you log on to Process Portal, you might see a Topic not found error when you click the Help icon, or when you switch from Process Portal to the Business Space Welcome page.

To resolve this issue:

  1. Stop the server or cluster.
  2. Go to the WAS_HOME/profiles/profile_name/config/BusinessSpace/help directory on the deployment manager node:

    cd /WebSphere/V8T8DM/DeploymentManager/profiles/default/config/BusinessSpace/help

  3. Edit the file named bspace_help.link and change the Help1 value to be an absolute path:

      Help1=/WebSphere/V8T8DM/DeploymentManager/profiles/default/config/BusinessSpace/help

  4. Synchronize the cluster nodes.

  5. Restart the server or cluster.



Troubleshooting migration

Consider the following possibilities:

If you ran all the migration commands on the source environment and created a snapshot before the migration failed, you do not need to rerun all the migration commands again from the beginning. If the migration failed on the target, for example at the database upgrade step or when you ran the BPMMigrate command, you can restore the database, and rerun the commands on the target only.


SOAP invocation timeout

If you are using a SOAP connection, the migration command can take longer to complete than the specified SOAP timeout value. You might see an exception like...

To prevent this problem, modify the value of the property...

...in soap.client.props.
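
The property referenced above is elided here. In WebSphere Application Server, the SOAP client timeout is governed by the com.ibm.SOAP.requestTimeout property (in seconds; 0 disables the timeout). Assuming that is the property meant, a minimal sketch of the change:

```shell
# Raise the SOAP request timeout. PROPS defaults to a stand-in file so the
# sketch runs anywhere; point it at profile_root/properties/soap.client.props.
# The 1800-second value is only an example.
PROPS="${PROPS:-./soap.client.props}"
[ -f "$PROPS" ] || echo 'com.ibm.SOAP.requestTimeout=180' > "$PROPS"
sed -i.bak 's/^com\.ibm\.SOAP\.requestTimeout=.*/com.ibm.SOAP.requestTimeout=1800/' "$PROPS"
grep 'requestTimeout' "$PROPS"
```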


Enable tracing

To diagnose the reason for a migration failure, you can enable tracing and rerun a command. To enable tracing for migration commands:

  1. Locate the logging.properties file.

    For the DBUpgrade command-line utility, the file is in BPM_home_8.5/util/dbUpgrade. For example:

      /opt/ibm/WebSphere/AppServer/util/dbUpgrade/logging.properties

    For the following command-line utilities:

    • BPMExtractDBConfiguration
    • BPMExtractSourceInformation
    • BPMManageApplications
    • BPMMergeFileRegistry
    • BPMMigrate

    The file is in BPM_home_8.5/util/migration/resources. For example:

      /opt/ibm/WebSphere/AppServer/util/migration/resources/logging.properties

  2. Set the log level in the logging.properties file. The default log level is FINE for both global logging level and file output log level. Change both properties to FINEST to capture more detail in the log. For example:
      # default global logging level. Logging level possible values: FINEST, FINER, FINE, INFO, WARNING, SEVERE.
      .level = FINEST
      # file output properties
      com.ibm.bpm.migration.logging.NonBlockingFileHandler.level = FINEST
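
The edit in step 2 can also be scripted. The following is only a sketch, using a stand-in file; point LP at the logging.properties file you located in step 1:

```shell
# Switch both logging levels from the FINE default to FINEST.
LP="${LP:-./logging.properties}"
[ -f "$LP" ] || printf '%s\n' \
  '.level = FINE' \
  'com.ibm.bpm.migration.logging.NonBlockingFileHandler.level = FINE' > "$LP"
sed -i.bak 's/= *FINE$/= FINEST/' "$LP"
cat "$LP"
```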

If you see exceptions for any of the following commands, change the log level to FINEST, run the command again, and then check the results, as described for each command.


BPMExtractDBConfiguration troubleshooting

This command extracts data source information. If you see an exception when you run this command, you can diagnose the problem by changing the log level to FINEST and running the command again.

Change the log level to FINEST as described in "Troubleshooting migration." After you run the command again, check the log file named BPMExtractDBConfiguration_timestamp.log. If you cannot find the cause of the problem, you can provide the log to IBM support.

The BPMExtractDBConfiguration command reads all the data source information from the WebSphere Application Server configuration under the cell scope or cluster scope.



Related reference:

BPMExtractDBConfiguration command-line utility


BPMExtractSourceInformation troubleshooting

This command takes a snapshot of the source environment. If you see an exception when you run this command, you can diagnose the problem by changing the log level to FINEST and running the command again.

Change the log level to FINEST as described in "Troubleshooting migration." After you run the command again, check the log file named BPMExtractSourceInformation_timestamp.log.

If you cannot find the cause of the problem, you can provide the log to IBM support.



Related reference:

BPMExtractSourceInformation command-line utility


BPMManageApplications troubleshooting

This command disables or enables the automatic starting of applications and schedulers. If you see an exception when you run this command, you can diagnose the problem by changing the log level to FINEST and running the command again.

Change the log level to FINEST as described in "Troubleshooting migration." After you run the command again, check the log file named BPMManageApplications_timestamp.log. If you cannot find the cause of the problem, you can provide the log to IBM support.

To check that the BPMManageApplications command was successful, perform the following actions:



Related reference:

BPMManageApplications command-line utility


BPMGenerateUpgradeSchemaScripts troubleshooting

This command generates SQL scripts and upgradeSchema scripts. If you see an exception when you run this command, you can diagnose the problem by changing the log level to FINEST and running the command again.

Change the log level to FINEST as described in "Troubleshooting migration." After you run the command again, check the log file named BPMGenerateUpgradeSchemaScripts_timestamp.log. The file is found in deployment_manager_profile/logs/. If you cannot find the cause of the problem, you can provide the log to IBM support.

The command reads database information from the properties file specified by the target.config.property.file property in the migration.properties file. If the generated SQL scripts are not what you expect, check the value of that property or the specified file to make sure the settings are correct.



Related reference:

BPMGenerateUpgradeSchemaScripts command-line utility


DBUpgrade troubleshooting

This command upgrades the databases. If you see an exception when you run this command, you can diagnose the problem by changing the log level to FINEST and running the command again.

Change the log level to FINEST as described in "Troubleshooting migration." After you run the command again, check the log file named DBUpgrade_timestamp.log. The file is found in deployment_manager_profile/logs/. If you cannot find the cause of the problem, you can provide the log to IBM support.

The command reads topology information from the properties file specified by the target.config.property.file property in the migration.properties file. The command reads database information from the WebSphere Application Server data source, so if you find that the wrong database connection is used when you run DBUpgrade, check that the data source configuration is correct.

The DBUpgrade command automatically upgrades the schema and data for Process Server and Performance Data Warehouse, except for DB2 for z/OS databases. If your source version is WebSphere Process Server or BPM Advanced, it also updates the topology information in the Business Process Choreographer database.

For DB2 for z/OS databases, the schema upgrade for Process Server and Performance Data Warehouse must be done manually.


DB2 SQL error

If you are running DB2, the DBUpgrade command might fail with the following database exception:

This error occurs when the database transaction log is full. Increase the size of the database transaction log and run the DBUpgrade command again.


Database customizations

If you customized the Process Server database for performance tuning, you might get exceptions when you run the DBUpgrade command. For example, if you added an index to a column of a table, the command also tries to add an index, and you see a database exception similar to the following exception (for Oracle):

You must drop the customized index first and run DBUpgrade again.

It is better to check before you run the migration to see whether there are conflicts between what is already customized and what DBUpgrade does. Where there are conflicts, remove the customizations before you run the command. For the schema changes that DBUpgrade makes, see the upgradeSchema_ProcessServer.sql file. This file is in target_deployment_manager_profile/dbscripts/Upgrade/de_name/database_type/ProcessServer_database_name.

Find the phases in the file that correspond to your source. Each phase starts with /* START of phase ProcUpgradeToversion */ and ends with /* END of phase ProcUpgradeToversion */, where version identifies the target version of that phase. If your source version is V7.5.1, the database is upgraded to V8.0.0 first, then to V8.0.1, and finally to V8.5.0. In that case, you would read through three phases in the file.
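
Assuming the marker format quoted above, the phase blocks can be listed from the command line. This is only a sketch; the stand-in file it creates (and its version token) is illustrative, so point SQLFILE at the real generated script:

```shell
# List the upgrade phases in the generated script. SQLFILE defaults to a small
# stand-in file (created below, contents and version token illustrative) so the
# sketch runs anywhere; point it at the real upgradeSchema_ProcessServer.sql.
SQLFILE="${SQLFILE:-./upgradeSchema_ProcessServer.sql}"
[ -f "$SQLFILE" ] || printf '%s\n' \
  '/* START of phase ProcUpgradeTo800 */' \
  '-- sample upgrade statement' \
  '/* END of phase ProcUpgradeTo800 */' > "$SQLFILE"
# Print every phase block; add the specific version token to select one phase.
sed -n '/START of phase/,/END of phase/p' "$SQLFILE"
```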


Not enough disk space to run the command

If you have a large amount of data, you might run out of disk space when you run the DBUpgrade command. This command migrates the BLOB data in the LSW_BPD_INSTANCE_DATA and LSW_TASK_EXECUTION_CONTEXT tables, and also reorganizes the database. It requires about twice as much disk space as the original database.

DB2

With a DB2 database, you might see errors similar to the following error:

    Executing upgrade step: Enable LOGGED for LOB columns
    Error executing SQL statement: DB2 SQL Error: SQLCODE=-2216, SQLSTATE=01H52, SQLERRMC=-289, DRIVER=4.11.69
    SQL statement that failed: call sysproc.admin_cmd('reorg table LSW_BPD_INSTANCE_DATA')

Verify that there is enough free space on the database table space disk.

Oracle

With an Oracle database, you might see errors similar to the following error:

    java.sql.BatchUpdateException: ORA-01653: unable to extend table schema_name.LSW_TASK_EXECUTION_CONTEXT by 1024 in tablespace tablespace_name

If the table space is set to AUTOEXTEND, make sure there is enough free disk space on the table space disk. If the table space is not set to AUTOEXTEND, you might need to resize the data file or add another data file.


Out-of-memory error

If you get an out-of-memory error that indicates too many or too large data records, you can increase the heap size of the JVM for the DBUpgrade command, or decrease the number of records to be updated as a batch.


Low performance

If you are migrating from a version earlier than V8.0 and have an issue with low performance, and if you find the database server input and output is not the issue, you can try increasing the number of threads to handle serializing and deserializing Java objects and updating the database. Complete the following steps:

  1. Open the BPM_home/util/dbUpgrade/upgrade.properties file.

  2. Increase the value of the worker.thread.size property. By default, the value is 1. The maximum number of threads depends on the processor number of the operating system that has IBM Business Process Manager installed.
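
The steps above can be sketched as a one-line edit; the value 4 is only an example (stay at or below the processor count), and UP defaults to a stand-in file, so point it at BPM_home/util/dbUpgrade/upgrade.properties:

```shell
# Increase worker.thread.size from its default of 1.
UP="${UP:-./upgrade.properties}"
[ -f "$UP" ] || echo 'worker.thread.size=1' > "$UP"
sed -i.bak 's/^worker\.thread\.size=.*/worker.thread.size=4/' "$UP"
grep 'worker.thread.size' "$UP"
```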



Related reference:

DBUpgrade command-line utility


BPMMigrate troubleshooting

This command imports the migration snapshot. If you see an exception when you run this command, you can diagnose the problem by changing the log level to FINEST and running the command again.

Change the log level to FINEST as described in "Troubleshooting migration." After you run the command again, check the log file named BPMMigrate_timestamp.log. The file is found in snapshot_folder/logs/. If you cannot find the cause of the problem, you can provide the log to IBM support.

For the scheduler migration, the BPMMigrate command runs the AdminTask that is registered during server startup and uses the scheduler service to re-create each scheduler task. The trace for the AdminTask is saved in snapshot_folder/logs/. The log for the scheduler migration is saved on one of the active nodes, in BPM_home_8.5/profiles/custom_profile/logs. Collect both the trace file and the log file for analysis if an exception occurs.

Verify that the messaging engine is started before you run BPMMigrate. Otherwise, the command fails when it tries to migrate the service integration bus messages.

The command migrates the 100SourceCustomMerged.xml file (if it exists) from the snapshot folder to the target environment. If you customized the process-center-install-group or offline-install-group, the customized settings are commented out during the migration. The file is renamed to 101CustomMigrated.xml, then copied to each node in the target environment.



Related reference:

BPMMigrate command-line utility


Troubleshooting the deployment environment

When processing appears sluggish or requests fail, use a focused approach to determine the source of the problem in the environment. The approach described is for non-standalone server environments.

You must be logged in to the administrative console of the deployment manager to perform this task.


If you notice symptoms such as sluggish processing or failed requests, investigate the state of the deployment environment:

  1. Display the topology layout that describes this deployment environment to determine the status of the topology.
  2. Display the topology to determine the state of the various roles in the topology. Note any roles with unexpected states or warnings for further investigation.

  3. Locate the nodes that are causing the error state for each role.

  4. Make sure all nodes are synchronized.

    On the Nodes page in the administrative console select any unsynchronized nodes and click Synchronize.

  5. Verify the messaging engines associated with all the buses are running.

    If they are not running, stop and start the messaging engines.

  6. Locate the logs associated with the nodes in error and view the logs for error messages.
  7. Take any actions prescribed by the error messages to effect the correction.
  8. Correct any errors and restart the affected nodes.

The nodes previously in error start and the status of the topology becomes "running."


Restart any affected applications


Log files

Deployment environment status information

Aggregated status information

Topology status information


AIS does not refresh automatically in the Inspector view

When you run a Business Process Diagram (BPD) in BPM Advanced, the process status does not automatically update in the Inspector view. An Advanced Integration Service (AIS) can take some time to run, depending on the service implementation.

Even fast request and response services can seem to take a long time the first time they are called and the service is initializing.


Resolve the problem

If the process state in the Inspector view does not seem to progress, click the Refresh icon and wait for the AIS call to complete and the BPD to resume. Click the Refresh icon again if necessary.

If the AIS still does not complete, check with its implementation owner. The AIS might be waiting for something, such as a human task or an error to be fixed.



AIS does not participate in the same transaction as business process

In IBM Business Process Manager Advanced, process navigation of business process definitions (BPDs) does not participate in the same transaction context as an advanced integration service (AIS). Therefore, a runtime failure in the BPD navigation that causes the BPD transaction to roll back does not roll back the transaction under which the currently executing AIS might be running. As a result, the AIS might be executed a second time.


Overview

BPD process navigation is not part of the same transaction as an invoked AIS. If a BPD process navigation invokes an AIS and then experiences a runtime failure (not an error in the business process logic), the process navigation becomes unavailable while the AIS is running. When the AIS commits its transaction, it cannot indicate to the process navigation that the AIS has concluded. When the process navigation later resumes, the AIS is invoked again, because the process navigation is unaware of the previous, successful invocation.


Resolve the problem

To resolve this problem, place checks within the AIS application logic to ensure that a second invocation does not corrupt the state of the application.



SSL fails when host name verification fails

IBM Business Process Manager uses host name verification for outbound connections that use SSL. Connections are refused if the host name the server connects to does not match the common name (CN) in the SSL certificate. This problem is most likely to occur when the initial configuration used localhost as a host name.

Host name verification was introduced as a security update in BPM 8.0.1.1 and is applied in later releases. The update was also provided as an interim fix for earlier releases. When SSL fails verification, you receive this exception message:

For more information about host names, see the related links at the end of this topic.


Certificate configuration

When a connection is established to a secure port, the initial handshake involves verifying the certificates. When you connect to a remote server over HTTPS, BPM expects the common name in the SSL certificate of the remote server to match the host name of the computer that it connected to. However, there are several scenarios in which BPM connects to itself using HTTPS. Therefore, BPM must be set up with a certificate that has a common name that matches the host name that BPM uses when it connects to itself.

When a profile is created, IBM WebSphere Application Server by default generates a self-signed root certificate that is valid for 15 years. In a distributed environment, a certificate is generated for each node and signed with the root certificate. The common name (CN) in the certificate is the same as the host name specified during profile creation.

SSL is a point-to-point connection. The common name in the certificate must match the host name of the computer that is being connected to. When BPM is configured to connect to itself through a web server, the web server must be set up with a certificate that has a common name matching the host name that BPM uses to connect to that web server.


Problem scenarios and solutions

One problem occurs if you install and test BPM using localhost as your host name. Later, if you try to connect with an external name, for example https://myname.mycompany.com:9443/bpm/rest, or if you try to connect from another computer, the verification fails. The failure of the connection generates an error in Process Inspector and the Process Admin Console. To avoid this problem, the configuration documentation warns against using localhost as the host name, particularly for environments that are spread across multiple computers.

If you have a locally installed Process Center that you use for your own development purposes, set up the environment with a host name such as bpm.company.com. On the Windows operating system, define this host name in your Windows hosts file. Always use that host name to access the Process Center server.
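
For example, on UNIX-like systems the equivalent entry goes in /etc/hosts. The following sketch uses the example host name from the text; HOSTS defaults to a scratch copy so the sketch is safe to run as-is:

```shell
# Map the development host name to the loopback address. The real file is
# /etc/hosts on UNIX, or C:\Windows\System32\drivers\etc\hosts on Windows.
HOSTS="${HOSTS:-./hosts.example}"
grep -q 'bpm.company.com' "$HOSTS" 2>/dev/null || \
  printf '127.0.0.1  bpm.company.com\n' >> "$HOSTS"
cat "$HOSTS"
```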

In a production environment, always access clusters through the HTTP server. The HTTP server must be accessible from BPM and must have a fully qualified host name with a matching certificate.


Naming considerations for profiles, nodes, servers, hosts, and cells

Create a self-signed certificate

Create a certificate authority request


Authors cannot drill down in a report

When a report is present in Coach technology and you run it in playback mode using IBM Process Center for BPM Advanced, you might not be able to drill down in the report.


Symptom and cause

When you try to drill down in a report, you might see the following message:

As an author, you are granted temporary access to reports when you run them in playback mode in Process Center, regardless of the permissions on the report. However, you are not automatically granted permission to drill down in the report.


Resolve the problem

In Process Center, grant yourself permission to drill down in reports.


Manage access to the Process Center repository


Error occurs when importing process applications

When you try to import process applications and toolkits into IBM Process Center for BPM Advanced, the import process might fail if you do not have proper rights in your operating system.

When the import process fails, the log file contains information that is similar to the following message:

This error occurs when you do not have permission to write to the temporary directory the operating system uses during the import process.

You might also experience DB2 for z/OS deadlock timeouts (SQLCODE = -913 00C9008E) under significant network latency or heavy concurrent loads; for example:

To resolve this issue, increase the value of the DB2 for z/OS system parameter for resource timeouts (IRLMRWT). This value is measured in seconds. See RESOURCE TIMEOUT field (IRLMRWT subsystem parameter) in the DB2 for z/OS Information Center.



Grant permission to write to the temp directory

If an import process fails because the system cannot find the path specified, modify permission settings for the temporary directory the operating system uses during the import process.

Complete the following steps appropriate for your operating system:



Process Portal does not support automatic session rollover

Process Portal does not support automatic session rollover if one of the nodes in a cluster becomes unavailable.


Overview

When you are working in a network deployment environment with a proxy, you are not directly connecting to a node in the cluster. If the node being accessed becomes unavailable while running a task, the submission of the Coach fails.

You might receive an error message similar to the following text:


Resolve the problem

To resolve this problem, log back in to Process Portal. You are then associated with a new node and can complete the task on that node.



Failure when sending tracking definitions

Installing a snapshot from IBM Process Center to Process Server succeeds; however, SystemOut.log contains errors (CWLLG2229E and the SQL error com.microsoft.sqlserver.jdbc.SQLServerException: The specified schema name "sa" either does not exist or you do not have permission to use it), and tracking definitions are not sent successfully.

In Microsoft SQL Server, the default schema name associated with a user must be the same as the user name. For example, if the Performance Data Warehouse database user name is perfDB, then the default schema name associated with the user perfDB must also be named perfDB. When the Performance Data Warehouse database user name is sa, the default schema name is required to be sa. However, sa is a super user in Microsoft SQL Server, the default schema for the super user is dbo, and this schema name cannot be changed.

Create an ordinary database user and assign the required rights to the user instead of using a super user, such as sa.

To recover from this situation, fix the Performance Data Warehouse data source to work properly on SQL Server by following these steps:

  1. On the Process Server network deployment environment, change the Performance Data Warehouse data source to use the Performance Data Warehouse user name perfDB instead of the user name sa.
  2. Stop the Process Server deployment environment.
  3. Drop the Process Server Performance Data Warehouse database on SQL Server.

  4. Using the database scripts, re-create the Performance Data Warehouse database with the user perfDB instead of the user sa.
  5. Verify that the Performance Data Warehouse tables in the database are created under the user schema perfDB.

  6. Restart the Process Server deployment environment.

  7. In the Process Admin console, run Update tracking definitions for each process application that is deployed. This may take some time to complete.



Troubleshooting NIST SP800-131a environment configurations

If you are configuring IBM Business Process Manager to support the National Institute of Standards and Technology (NIST) SP800-131a security standard, you might observe one or more of the following configuration issues.

Potential NIST SP800-131a environment configuration issues and suggested actions

Symptom: Synchronization in the administrative console fails after conversion of certificates.
Potential cause: The existing connection is using old certificates.
What to do: Run the syncNode command.

Symptom: The deployment manager, node agent, or node cannot be stopped after you have made the changes to support NIST.
Potential cause: The PROFILE_DIR/properties/ssl.client.props file properties have not been updated.
What to do: Update the PROFILE_DIR/properties/ssl.client.props file properties, as they are not transferred during node synchronization.

Symptom: The browser cannot access the Process Admin console.
Potential cause: The TLS 1.2 protocol might not be enabled in the browser.
What to do:

  • Use the ping command to test access to the specified host name.
  • Verify that TLS version 1.2 is enabled.
  • Use the grep command to verify that the SSL port is correct for the listening parameter in the SystemOut file.
  • Test the connection using a different browser, for example, Opera 12.

Symptom: The Process Portal is empty when only Microsoft Internet Explorer is used.
Potential cause:

  • Internet Explorer 9 is running in Internet Explorer 7 Browser Mode.
  • Internet Explorer 9 is running in Quirks Document Mode.

What to do:

  • In Internet Explorer 9, press F12 and check the Browser Mode.
  • In Internet Explorer 9, press F12 and check the Document Mode.

Symptom: Process Designer login fails, resulting in a peer not authenticated error message.
Potential cause:

  • The resources\ssl.client.props file is using SSL_TLS instead of TLSv1.2.
  • The Process Center signer certificate is not specified in the C:\IBM\ProcessDesigner\v8.5\etc\trust.p12 file.

What to do:

  • Edit C:\IBM\ProcessDesigner\v8.5\resources\ssl.client.props and do the three-line change (see above), then restart Process Designer.
  • The time stamp for the etc\trust.p12 file must be later than the time at which you enabled NIST strict mode in the administrative console. To correct this situation, you can download Process Designer from Process Center again. The compressed file contains the appropriate etc\trust.p12 version.

Symptom: Process Designer login displays an empty page with the This program cannot display the webpage message.
Potential cause: Process Designer uses Microsoft Internet Explorer to open the Process Center page, which is the default perspective. As a result, this error appears when TLS version 1.2 is not enabled in Microsoft Internet Explorer.
What to do: Verify that TLS version 1.2 is supported and enabled in Microsoft Internet Explorer, and reopen Process Designer.

Symptom: A configured Process Server does not show up in the Servers tab of Process Designer.
Potential cause: After the certificates were converted to the NIST SP800-131a standard, the Process Center and Process Server signers were not added to the Process Center and Process Server truststores.
What to do: Confirm that the Process Server signer certificate is added to the Process Center truststore, and that the Process Center signer certificate is added to the Process Server truststore.

Symptom: When Firefox is used, Process Designer playback fails and returns the The connection was interrupted error message.
Potential cause: Firefox does not support TLS version 1.2.
What to do: Change the default browser.

Symptom: When Microsoft Internet Explorer is used, Process Designer playback fails and returns the Internet Explorer cannot display the webpage error message.
Potential cause: TLS version 1.2 is not enabled in Microsoft Internet Explorer.
What to do: Change the default browser.


Enable a NIST SP800-131a compliant environment


12. Troubleshooting service module deployment failures

This topic describes the steps to take to determine the cause of a problem when deploying a service module. It also presents some possible solutions.

This topic assumes the following things:

The task of troubleshooting a deployment begins after you receive notification of an error. There are various symptoms of a failed deployment that you have to inspect before taking action.

  1. Determine if the application installation failed.

    Examine the SystemOut.log file for messages that specify the cause of failure. Some of the reasons an application might not install include the following:

    • You are attempting to install an application on multiple servers in the same Network Deployment cell.
    • An application has the same name as an existing module on the Network Deployment cell to which you are installing the application.
    • You are attempting to deploy Java EE modules within an EAR file to different target servers.

    If the installation has failed and the application contains services, you must remove any SIBus destinations or JCA activation specifications created before the failure before attempting to reinstall the application. The simplest way to remove these artifacts is to click Save > Discard all after the failure. If you inadvertently save the changes, you must manually remove the SIBus destinations and JCA activation specifications (see Deleting SIBus destinations and Deleting JCA activation specifications).

  2. If the application is installed correctly, examine it to determine if it started successfully.

    If the application did not start successfully, the failure occurred when the server attempted to initiate the resources for the application.

    1. Examine the SystemOut.log file for messages that will direct you on how to proceed.
    2. Determine if resources required by the application are available or have started successfully.

      Resources that are not started prevent an application from running. This protects against lost information. The reasons for a resource not starting include:

      • Bindings are specified incorrectly
      • Resources are not configured correctly
      • Resources are not included in the resource archive (RAR) file
      • Web resources not included in the web services archive (WAR) file

    3. Determine if any components are missing.

      A missing component usually indicates an incorrectly built enterprise archive (EAR) file. Verify that all of the components required by the module are in the correct folders on the test system on which you built the Java™ archive (JAR) file. "Preparing to deploy to a server" contains additional information.

  3. Examine the application to see if there is information flowing through it.

    Even a running application can fail to process information. Reasons for this are similar to those mentioned in step 2.b.

    1. Determine if the application uses any services contained in another application. Verify the other application is installed and has started successfully.
    2. Determine if the import and export bindings for services contained in other applications used by the failing application are configured correctly. Use the administrative console to examine and correct the bindings.

  4. Correct the problem and restart the application.



Delete JCA activation specifications

The system builds JCA activation specifications when installing an application that contains services. There are occasions when you must delete these specifications before reinstalling the application.

If you are deleting the specification because of a failed application installation, make sure that the module name in the Java™ Naming and Directory Interface (JNDI) name matches the name of the module that failed to install. The second part of the JNDI name is the name of the module that implemented the activation specification. For example, in sca/SimpleBOCrsmA/ActivationSpec, SimpleBOCrsmA is the module name.
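The module name can be extracted mechanically from a JNDI name of this form; the helper below is an illustrative sketch, not part of the product:

```python
def module_name_from_jndi(jndi_name):
    """Return the module segment of an SCA JNDI name such as
    sca/SimpleBOCrsmA/ActivationSpec (the second '/'-separated part)."""
    parts = jndi_name.split("/")
    if len(parts) < 2 or parts[0] != "sca":
        raise ValueError("not an SCA JNDI name: %s" % jndi_name)
    return parts[1]

print(module_name_from_jndi("sca/SimpleBOCrsmA/ActivationSpec"))  # SimpleBOCrsmA
```

Comparing this value against the name of the module that failed to install tells you which specifications are safe to delete.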

When security and role-based authorization are enabled, you must be logged in as administrator or configurator to perform this task.

Delete JCA activation specifications when you inadvertently saved a configuration after installing an application that contains services and do not require the specifications.

  1. Locate the activation specification to delete.

    The specifications are contained in the resource adapter panel. Navigate to this panel by clicking Resources > Resource adapters.

    1. Locate the Platform Messaging Component SPI Resource Adapter.

      To locate this adapter, you must be at the node scope for a standalone server or at the server scope in a deployment environment.

  2. Display the JCA activation specifications associated with the Platform Messaging Component SPI Resource Adapter.

    Click the resource adapter name and the next panel displays the associated specifications.

  3. Delete all of the specifications with a JNDI Name that matches the module name that you are deleting.

    1. Click the check box next to the appropriate specifications.

    2. Click Delete.

The system removes selected specifications from the display.


Save the changes.




Related tasks:

Delete SIBus destinations


Delete SIBus destinations

Service integration bus (SIBus) destinations are used to hold messages being processed by SCA modules. If a problem occurs, you might have to remove bus destinations to resolve the problem.

If you are deleting the destination because of a failed application installation, make sure that the module name in the destination name matches the name of the module that failed to install. The second part of the destination name is the name of the module that implemented the destination. For example, in sca/SimpleBOCrsmA/component/test/sca/cros/simple/cust/Customer, SimpleBOCrsmA is the module name.

When security and role-based authorization are enabled, you must be logged in as administrator or configurator to perform this task.

Delete SIBus destinations when you inadvertently saved a configuration after installing an application that contains services or you no longer need the destinations.

This task deletes the destination from the SCA system bus only. You must also remove the entries from the application bus before reinstalling an application that contains services (see Deleting JCA activation specifications).

  1. Log in to the administrative console.
  2. Display the destinations on the SCA system bus.

    1. In the navigation pane, click Service integration > buses

    2. In the content pane, click SCA.SYSTEM.cell_name.Bus

    3. Under Destination resources, click Destinations

  3. Select the check box next to each destination with a module name that matches the module that you are removing.

  4. Click Delete.

The panel displays only the remaining destinations.


Delete the JCA activation specifications related to the module that created these destinations.




Related tasks:

Delete JCA activation specifications


13. Process Designer window is blank

After you log in to Process Designer, you might see a blank white Process Designer window, a partially displayed view, or an http error. Refresh your browser, or configure additional security in Internet Explorer V8 or V9.

Press F5 to refresh the browser. If the issue persists, additional security configuration might be required in Internet Explorer V8 or V9. To make the page load correctly:

  1. From the browser window, click Tools > Internet Options.

  2. Click the Advanced tab and scroll to the Security section in the Settings list.

  3. Select Enable native XMLHTTP support.

  4. Click Apply and then click OK.


Resolve browser display problems with administration tools


14. Troubleshooting a failure to access help topics

By default, IBM Business Process Manager is configured to access help topics from the IBM documentation website. If you are working behind a firewall, you might find that links from the product to help files do not resolve. In that case, you need to either revise the proxy settings in the product or download and install the help contents to your local system.


Reconfiguring proxy settings

The product documentation is found at http://pic.dhe.ibm.com/infocenter/dmndhelp/v8r5m0/index.jsp. If you can access the product documentation at that URL in a browser, but links to help topics from the product fail, you probably have a proxy server between BPM and the documentation site. Check to see whether the browser is configured to use a proxy server. If it is, configure the WebSphere server to communicate with that same proxy server.

In the WAS admin console, set the http.proxyHost and http.proxyPort properties to point to the proxy host and port that are in use. You can find instructions here: http://setgetweb.com/p/WAS85x/ae/twbs_configaddhttppropertiesadmin.html and http://setgetweb.com/p/WAS85x/ae/txml_configaddhttppropertieswsadmin.html
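If you set these properties as JVM system properties, the generic JVM arguments take the following form; the host and port shown are placeholders for your own proxy values:

```
-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080
```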


Downloading and installing IBM Business Process Manager documentation

If you need to work offline, you need to download the documentation files and install them into the same location as the BPM product. Downloading the documentation files requires a working Internet connection. When there is a new release of the product, you need to update your installed documentation to keep it current.

You can find the information center for BPM on the web at http://pic.dhe.ibm.com/infocenter/dmndhelp/v8r5m0/index.jsp. The embedded help system that is installed with the BPM product uses this site to access the latest documentation. For example, if you click the Help link in Process Designer, the embedded help system accesses the help documentation at that address.

If you are using IBM Business Process Manager V8.5.0.0, you need to install an interim fix before you download the documentation. This step is not necessary if you are using V8.5.0.1 or later. The fix is available on Fix Central: http://www.ibm.com/support/fixcentral. After you select the product group, product, installed version, and platform, click Continue and then select APAR or SPR, enter JR48590, and click Continue. When you download fix packages, ensure that you also download the readme file for each fix. Review each readme file for installation instructions and information about the fix.

To install the documentation for BPM Standard to a local directory, follow these steps. If you are using IBM Business Process Manager Advanced, you must follow these steps and then install the documentation for IBM Integration Designer separately.

  1. In a browser, navigate to the following web address: http://publib.boulder.ibm.com/dmndhelp/downloads/v8r5m0/localhelp_bpm85.zip.

    Save the file to a local temp directory.

  2. Extract the contents of the compressed file to the BPM installation directory by entering the following commands:

    1. cd BPM_install_directory (where BPM_install_directory represents the directory where you previously installed the BPM Standard product).
    2. unzip localhelp_bpm85.zip. This command installs the documentation into BPM_install_directory/BPM/Lombardi/doc.local.

  3. If you need to update the documentation later, first remove the documentation that you previously installed locally.

    1. In a command window, enter cd BPM_install_directory.
    2. To remove the files, for Windows, enter del BPM/Lombardi/doc.local; for UNIX, enter rm -fr BPM/Lombardi/doc.local.

    3. After you remove the old files in that way, follow steps 1 and 2 to install the updated files.

  4. Restart your IBM Business Process Manager server. The documentation is available from the Help links of the various BPM components (such as Process Server, Process Designer, and Process Center).



15. Troubleshooting administration tasks and tools

Use the information in this group of topics to identify and resolve problems that can occur while you are administering the runtime environment.



Profile-specific log files

There are log files detailing the characteristics and runtime activities of individual profiles. These log files are located in the logs directory within each profile's directory.

There are a number of log files created for each profile. Some of these logs describe the parameters used for the creation of the profile. These types of log files generally remain unchanged once the profile is fully configured. Other profile-specific logs are continually updated to capture error, warning, and information messages emitted during run time. Some of these log files are also used to capture a Common Base Event (that might include business object data) that is selected for monitoring.

The table below specifies the different types of profile-specific log files and the locations where you can find them within the product. Within the table, the variable installation_root represents the installation directory of IBM Business Process Manager. The variable profile_root represents the root location of a profile.

See Installation directories for the product and profiles.

Profile-specific log files updated during runtime

Log Contents
First failure data capture (ffdc) log and exception files (common to all profile types) are found in these directories:

    profile_root/logs/ffdc
Contains the ffdc log and exception files for individual profiles. There are two types of ffdc logs: a single log file with a compilation of all the errors encountered during the profile runtime, and numerous text files with details such as stack traces and other information. The naming conventions for the different types of profiles are given for both files, as follows:

  • Deployment manager profile:

    • Log file - deployment_manager_name_exception.log.
    • Text files - deployment_manager_name_hex_id_date_time.txt.

  • Managed-node profile:

    • Log file(s) - node_agent_name_exception.log and server_name_exception.log.
    • Text files - node_agent_name(or)server_name_hex_id_date_time.txt.

  • Stand-alone profile:

    • Log file - server_name_exception.log.
    • Text files - server_name_hex_id_date_time.txt.
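When collecting diagnostics, the two ffdc file types can be separated by the naming conventions above; this classifier is an illustrative sketch (the exact timestamp format in detail file names varies, so it keys only on the suffixes):

```python
def ffdc_kind(filename):
    """Classify an ffdc file by its suffix, per the naming conventions above."""
    if filename.endswith("_exception.log"):
        return "summary"   # one per server: compilation of all errors encountered
    if filename.endswith(".txt"):
        return "detail"    # per-incident stack traces and other information
    return "other"

print(ffdc_kind("server1_exception.log"))  # summary
```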

Deployment manager logs (deployment manager profiles only) are found in these directories:

    profile_root/logs/deployment_manager_name
You will work primarily with four log files in this directory:

  • startServer.log -

    Contains the system parameters detected on the system and the messages emitted by the deployment manager during the start process

  • stopServer.log -

    Contains the system parameters detected on the system and the messages emitted when the deployment manager is shut down.

  • SystemErr.log -

    Contains error and exception messages generated by the deployment manager during runtime. Continually updated while server is running.

  • SystemOut.log -

    Contains all messages, including error, warning, and information messages generated by the deployment manager during runtime. Continually updated while server is running.

Node agent logs (managed-node profiles only) are found in these directories:

    profile_root/logs/node_agent_name
You will work primarily with four log files in this directory:

  • startServer.log -

    Contains the system parameters detected on the system and the messages emitted by the node agent during the start process

  • stopServer.log -

    Contains the system parameters detected on the system and the messages emitted when the node agent is shut down.

  • SystemErr.log -

    Contains error and exception messages generated by the node agent during runtime. Continually updated while node agent is running.

  • SystemOut.log -

    Contains all messages, including error, warning, and information messages generated by the node agent during runtime. Continually updated while the node agent is running.

Server logs (managed-node and stand-alone profiles only) are found in these directories:

    profile_root/logs/server_name
You will work primarily with four log files in this directory:

  • startServer.log -

    Contains the system parameters detected on the system and the messages emitted by the server during the start process

  • stopServer.log -

    Contains the system parameters detected on the system and the messages emitted when the server is shut down.

  • SystemErr.log -

    Contains error and exception messages generated by the server during runtime. Continually updated while server is running.

  • SystemOut.log -

    Contains all messages, including error, warning, and information messages generated by the server during runtime. Also contains any monitored events that are emitted from the Common Event Infrastructure (CEI), in Common Base Event format. These events may also include the level of business object data (FINE, FINER, or FINEST) specified for the monitor. Continually updated while the server is running.

Node federation log files are found in these directories (only applies to non-deployment manager profiles):

    profile_root/logs
Two log files are generated when you attempt to federate a custom, augmented, or stand-alone profile to a deployment manager:

  • addNode.log -

    Contains the pertinent server environment information and messages generated when you attempt to federate the profile.

  • isFederated.log -

    Lists the commands used by the deployment manager to federate the profile.

The location of the Integrated Solutions Console application deployment log file is listed here (only for deployment manager and stand-alone profiles):

    profile_root/logs/iscinstall.log
The iscinstall.log file contains information regarding the deployment of the administrative console application in a deployment manager or stand-alone profile.
The location of the Installation Verification Tool log file is listed here (only for deployment manager and stand-alone profiles):

    profile_root/logs/ivtClient.log

This log file contains the output generated by the Installation Verification Tool. You can start this program from the Quick start console after you create a deployment manager or stand-alone profile. The log contains basic configuration information and the messages that are displayed when you run the tool.
The location of the log file detailing the commands generated for a profile creation is listed here:

    profile_root/logs/updateserverpolicy.log
This file contains the sequence of commands used by the product to set server environment variables and create a profile. All profile types contain this file.




Related concepts:

BPM log files

Installation and profile creation log files


Resolve browser display problems with administration tools

If you use Microsoft Internet Explorer V8 or higher to access the Process Center console, Process Admin Console, Performance Admin Console, or IBM Process Portal, you might have trouble viewing the page contents in the browser.

The following information guides you through the process of dealing with various browser display problems that may occur when using IBM Business Process Manager.




Troubleshooting the failed event manager

This topic discusses problems that you can encounter while using the failed event manager.

This topic does not discuss how to use the failed event manager to find, modify, resubmit, or delete failed events on the system. For information about managing failed events, see Work with failed events in BPM.

Select the problem you are experiencing from the table below:

Problem Refer to the following
I am having trouble entering values in the Search page's By Date tab Values in the From Date and To Date fields automatically change to defaults if entered incorrectly
I am having trouble deleting expired events Using the Delete Expired Events function appears to suspend the failed event manager
I am having trouble with failed events not being created Failed events are not being created
I am having trouble retrieving or deleting large numbers of failed events The server fails when retrieving or deleting large numbers of failed events
I am having trouble clearing a large number of failed events The server fails when clearing a large number of failed events


Values in the From Date and To Date fields automatically change to defaults if entered incorrectly

The Search page's From Date and To Date fields require correctly formatted locale-dependent values. Any inconsistency in the value's format (including four digits in the year instead of two, or omitting the time) causes the failed event manager to issue the following warning and substitute a default value in the field:

CWMAN0017E: The date entered could not be parsed correctly: your_incorrectly_formatted_date. Date: default_date is being used.

The default value of the From Date field is defined as January 1, 1970, 00:00:00 GMT.

The actual default value shown in your failed event manager implementation will vary depending on your locale and time zone. For example, the From Date field defaults to 12/31/69 7:00 PM for a workstation with an en_US locale in the Eastern Standard Time (EST) time zone. The default value for the To Date field is always the current date and time, formatted for your locale and time zone.

To avoid this problem, always enter your dates and times carefully, following the example provided above each field.
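The locale-shifted default described above comes straight from the Unix epoch; a small illustration (the Eastern Standard Time offset is chosen only to reproduce the example in the text):

```python
from datetime import datetime, timezone, timedelta

# The From Date default is the epoch: January 1, 1970, 00:00:00 GMT.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Rendered in Eastern Standard Time (UTC-5), the same instant appears as the
# previous evening, which is why the field shows 12/31/69 7:00 PM.
est = timezone(timedelta(hours=-5))
print(epoch.astimezone(est).strftime("%m/%d/%y %I:%M %p"))  # 12/31/69 07:00 PM
```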


Using the Delete Expired Events function appears to suspend the failed event manager

If you use the Delete Expired Events button when the current search results contain many failed events, or when those events contain a large amount of business data, the failed event manager can appear to be suspended indefinitely.

In this situation, the failed event manager is not suspended: it is working through the large data set, and will refresh the results set as soon as the command completes.


Failed events are not being created

If the Recovery subsystem is not creating failed events, go through the following checklist of potential causes:


The server fails when retrieving or deleting large numbers of failed events

The server can fail if you use the failed event manager to retrieve or delete a large number of failed events at once. To prevent this problem, check the total failed event count on the main page of the failed event manager before performing search or delete actions. If you have a large number of events, do not try to retrieve or delete all of them at once. Instead, use the Search failed events option to return a subset of failed events that match specific criteria. You can then delete all of the events in the filtered result set without causing a server failure.

There are several factors in the environment that can affect the number of failed events the server can return or delete in a single request, including other processes running on the same machine as the server and the amount of available memory.
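The same principle, working through a large set in bounded batches rather than one unbounded request, can be sketched generically; the function and batch size below are illustrative (the real work is done through the failed event manager, not this code):

```python
def delete_in_batches(event_ids, delete_batch, batch_size=500):
    """Delete failed events in bounded chunks so that no single request
    has to materialize the entire result set in memory."""
    deleted = 0
    for start in range(0, len(event_ids), batch_size):
        batch = event_ids[start:start + batch_size]
        delete_batch(batch)          # one bounded request per chunk
        deleted += len(batch)
    return deleted
```

Tuning the batch size down is the usual response when the server is memory-constrained or shares its host with other processes.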


The server fails when clearing a large number of failed events

The server can fail if you try to clear a large number of failed events using the Clear All option. To prevent this from happening, set a limit for the number of failed events that can be cleared using the JVM property failedEventLimit. If the failed event count is higher than that limit, the Clear All option returns an error, and you must delete the failed events (up to the limit specified) one at a time.





Troubleshooting store-and-forward processing

This topic discusses problems that you can encounter with store-and-forward processing.

Select the problem you are experiencing from the table below:

Problem Refer to the following
I am having problems setting the store-and-forward qualifier Store-and-forward qualifier processing only works on asynchronous interfaces
Qualifying runtime exceptions are occurring, but events are not getting stored Store is not activated by qualifying runtime exceptions
Messages are still being processed even though the Store and Forward widget shows the state is set to Store (Network deployment environment) In a network deployment environment, messages are being processed even though the store-and-forward state is set to Store
The Store and Forward widget shows the state is set to Forward, but messages are not being processed by all members of the cluster. (Network deployment environment) In a network deployment environment, messages are not getting processed by all members of the cluster even though the store-and-forward state is set to Forward


Store-and-forward qualifier processing only works on asynchronous interfaces

The store-and-forward qualifier must be specified on an asynchronous interface. The store cannot be activated if the interface is called synchronously.

Here are some guidelines (with respect to components) to help you determine if the interface is being called synchronously or asynchronously.

Also, consider these restrictions:


Store is not activated by qualifying runtime exceptions

If the store is not being activated by qualifying runtime exceptions, check the following.


In a network deployment environment, messages are being processed even though the store-and-forward state is set to Store

Messages might continue to be processed by some members of a cluster, despite the state being set to Store, if the state is not set to Store for each member of the cluster. To fix this problem, confirm the state is set to Store for each member of the cluster in the Store and Forward widget. If any members of the cluster are set to Forward, change them to Store.

This might also happen if one of the members of the cluster is forced to restart. Since the Store state is not persistent, it reverts to the Forward state at restart. To fix this problem, change the state to Store for the module in the Store and Forward widget.

When the service becomes available again, you should not set the state to Store immediately if you want new events to be processed. If you set the state to Store before new events have the chance to be processed, they will be stored in the queue.


In a network deployment environment, messages are not getting processed by all members of the cluster even though the store-and-forward state is set to Forward

Messages might continue to be stored by some members of a cluster, despite the state being set to Forward, if the store-and-forward state is not set to Forward for each member of the cluster. To fix this problem, confirm the state is set to Forward for the module in the Store and Forward widget. If any members of the cluster are set to Store, change them to Forward.




Troubleshooting BPEL processes and human tasks

Use this topic to solve problems relating to BPEL processes and human tasks.

The following tasks focus on troubleshooting problems that can happen during the execution of a BPEL process or task.



Related concepts: Understanding the startup behavior of Business Process Choreographer:

Recovery from infrastructure failures


Related tasks:

Query and replay failed messages, using the administrative console

Tuning BPEL processes

Connection timeout when running a wsadmin script


Enable tracing for Business Process Choreographer

You can configure trace settings for Business Process Choreographer to help diagnose problems with the processing of BPEL processes and human tasks.


Trace settings

Business Process Choreographer tracing uses the standard application server tracing mechanism, which must be enabled in the usual way.

You can specify the following trace settings for Business Process Choreographer.

Trace settings for Business Process Choreographer

Trace string What is logged
com.ibm.bpe.* All available trace information for BPEL processes
com.ibm.bpe.basic.navigation.* Major navigation steps and state changes for BPEL processes
com.ibm.bpe.basic.api* Time and sequence of methods called by the Business Flow Manager API
com.ibm.task.* All available trace information for human tasks
com.ibm.task.basic.api* Time and sequence of methods called by the Human Task Manager API
com.ibm.task.basic.core* Major steps in the processing of human tasks, state changes, and calls to SPIs
com.ibm.ws.staffsupport.* All available trace information for the people directory providers

To get a complete set of trace information for Business Process Choreographer, specify the following trace settings:
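A combined specification for the three components can be written as a single WebSphere trace string; the following sketch assumes the =all detail level:

```
com.ibm.bpe.*=all:com.ibm.task.*=all:com.ibm.ws.staffsupport.*=all
```

When full tracing is too verbose, substitute the narrower groups from the table above for the com.ibm.bpe.* and com.ibm.task.* entries.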


What to send to support

After enabling tracing, re-create your problem, then provide the following files:


Work with trace in WAS

Diagnostic trace service settings


Troubleshooting the execution of BPEL processes

This topic describes solutions to common problems with BPEL process execution.

In Business Process Choreographer Explorer, you can search for error message codes on the IBM technical support pages.

  1. On the error page, click the Search for more information link. This starts a search for the error code on the IBM technical support site. This site provides information in English only.
  2. Alternatively, copy the error message code that is shown on the error page to the clipboard. The error code has the format CWWBcnnnnc, where each c is a character and nnnn is a 4-digit number. Go to the technical support page.
  3. Paste the error code into the Additional search terms field and click Go.
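The CWWBcnnnnc format described above can be checked with a simple pattern before you paste a code into the search field; this validator is illustrative, and the assumed severity letters (E, I, W) match the message examples in this document:

```python
import re

# CWWB + one component character + four digits + one severity character
# (E = error, I = informational, W = warning are assumptions from examples).
ERROR_CODE = re.compile(r"^CWWB[A-Z]\d{4}[EIW]$")

print(bool(ERROR_CODE.match("CWWBE0057I")))  # True
```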


Solutions to specific problems are in the following topics.



Related tasks:

Work with process-related or task-related messages

Troubleshooting Business Process Choreographer Explorer or Business Process Archive Explorer


ClassCastException when stopping an application containing a microflow

The SystemOut.log file contains ClassCastException exceptions from around the time when an application containing a microflow was stopped.


Reason

When an application is stopped, the classes contained in the EAR file are removed from the class path. However, microflow instances that need these classes may still be executing.


Resolution

Perform the following actions:

  1. Stop the microflow process template first. From now on, it is not possible to start new microflow instances from that template.
  2. Wait for at least the maximum duration of the microflow execution so that any running instances can complete.
  3. Stop the application.



XPath query returns an unexpected value from an array

Using an XPath query to access a member of an array returns an unexpected value.


Reason

A common cause of this problem is assuming that the first element in the array has an index value of zero. In XPath queries, the first element of an array has the index value one.


Resolution

Check that the index values you use for arrays start at one.
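The off-by-one effect is easy to reproduce; a minimal illustration using Python's ElementTree, whose XPath subset follows the same 1-based convention (the document and element names are made up for the example):

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<order><item>first</item><item>second</item><item>third</item></order>"
)

# XPath positions are 1-based: item[1] is the FIRST item, not the second.
print(doc.find("item[1]").text)  # first
print(doc.find("item[2]").text)  # second
```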



An activity has stopped because of an unhandled fault (Message: CWWBE0057I)

The system log contains a CWWBE0057I message, and the process is in the running state, but navigation does not proceed on the current path.


Reason

An activity is put in a stopped state, if all of the following happen:


Resolution

The solution to this problem requires actions at two levels:

  1. An administrator must repair the stopped activity instance manually, for example, by force completing or force retrying it.
  2. The reason for the failure must be investigated. In some cases the failure is caused by a modeling error that must be corrected in the model.



Related concepts:

Continue-on-error behavior


Related tasks:

Repairing stopped activities

Repairing stopped activities using the process state view


A microflow is not compensated

A microflow has called a service, and the process fails, but the undo service is not called.


Resolution

To trigger the compensation of a microflow, several conditions must be met. Check the following:

  1. Log on to the Business Process Choreographer Explorer and click Failed Compensations to check whether the compensation service has failed and needs to be repaired.
  2. The compensation of a microflow is triggered only when the transaction for the microflow is rolled back. Check whether this is the case.
  3. The compensationSphere attribute of the microflow must be set to required.
  4. A compensation service is run only if the corresponding forward service has not participated in the microflow's transaction. Ensure the forward service does not participate in the navigation transaction, for example, on the reference of the process component, set the Service Component Architecture (SCA) qualifier suspendTransaction to True.



Related concepts:

Compensation handling in BPEL processes


Related tasks:

Administer compensation for microflows


A long-running BPEL process appears to have stopped

A long-running process is in the state running, but it appears that it is doing nothing.


Reason

There are various possible reasons for such behavior:

  1. A navigation message has been retried too many times and has been moved to the retention or hold queue.
  2. A reply message from the Service Component Architecture (SCA) infrastructure failed repeatedly.
  3. The process is waiting for an event, timeout, or for a long-running invocation or task to return.
  4. An activity in the process is in the stopped state.


Resolution

Each of the possible reasons requires different corrective actions:

  1. Use the failed event manager console to display details about a failed message and to replay it.
  2. Check whether there are any failed messages in the failed event management view of the administrative console.

    • If there are any failed events from Service Component Architecture (SCA) reply messages, reactivate the messages.
    • Otherwise, either force complete or force retry the long-running activity.

  3. Check whether there are activities in the stopped state, and repair these activities. If your system log contains a CWWBE0057I message, you might also need to correct your model as described in Message: CWWBE0057I.



Related concepts:

Finding failed events

Manage events


Related tasks:

Forcing the completion of activities


Invoking a synchronous subprocess in another EAR file fails

When a long-running process calls another process synchronously, and the subprocess is located in another enterprise archive (EAR) file, the subprocess invocation fails.

Example of the resulting exception:

com.ibm.ws.sca.internal.ejb.util.EJBStubAdapter com.ibm.ws.sca.internal.ejb.util.EJBStubAdapter#003
Exception:
java.rmi.AccessException: CORBA NO_PERMISSION 0x49424307 No; nested exception is:
org.omg.CORBA.NO_PERMISSION: The WSCredential does not contain a forwardable token.
Enable Identity Assertion for this scenario.
vmcid: 0x49424000 minor code: 307 completed: No at com.ibm.CORBA.iiop.UtilDelegateImpl.mapSystemException(UtilDelegateImpl.java:202)
at javax.rmi.CORBA.Util.mapSystemException(Util.java:84)


Reason

Because the subprocess invocation leads to a remote EJB method call, Common Secure Interoperability Version 2 (CSIv2) identity assertion must be enabled when calling a synchronous subprocess in another EAR file.


Resolution

Configure CSIv2 inbound authentication and CSIv2 outbound authentication.



Hung threads when a long-running process is invoked synchronously (Message: WSVR0605W)

A long-running process invokes another long-running process synchronously. Under heavy workload conditions, the thread monitor reports hung threads in the SystemOut.log file (message WSVR0605W).


Reason

A long-running process that is called synchronously can often cause hung threads. A long-running process usually spans several transactions and needs a free thread to continue with its navigation. If all of the available threads are involved in the navigation step of the parent process that invokes the subprocess, the system becomes unresponsive. Because of the lack of free threads, the subprocess cannot complete.


Resolution

A long-running process should always invoke another long-running process asynchronously, even if the processes are separated by another component. For example, if a long-running process invokes a mediation and this mediation invokes another long-running process, then ensure the preferred interaction style of the mediation is asynchronous.
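
The thread-starvation effect can be illustrated with a plain Java thread pool. This is a simplified analogy, not BPM API: the single-thread pool stands in for the limited navigation threads.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PoolStarvationDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        // The "parent" task submits a "subprocess" to the same pool and waits
        // for it synchronously. No free thread is left to run the subprocess.
        Future<String> parent = pool.submit(() -> {
            Future<String> sub = pool.submit(() -> "done");
            return sub.get(500, TimeUnit.MILLISECONDS); // cannot complete in time
        });
        try {
            System.out.println(parent.get());
        } catch (ExecutionException e) {
            System.out.println("starved: " + e.getCause().getClass().getSimpleName());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

With an asynchronous invocation, the parent would release its thread instead of blocking, and the subprocess could run.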



Related concepts:

Transactional behavior of long-running BPEL processes


Related tasks:

Defining transactional behavior in Integration Designer


Late binding calls the wrong version of a subprocess

A parent process invokes a subprocess using late binding. Both processes are in the same module. A new version of the subprocess is created by copying the module and changing the valid-from timestamp. After the module is deployed, the running instances of the parent process continue to invoke the old version of the subprocess instead of the new version.


Reason

In late binding, the process template name of the subprocess is specified as part of the reference partner properties of the invoke activity in the parent process. Business Process Choreographer determines the version of the process that is currently valid at run time.

A common reason for late binding using the wrong version of a subprocess is that the module that contains the new version of the subprocess does not have a Service Component Architecture (SCA) export. Without an export, processes in other modules are not visible to the parent process, and it always invokes the version of the subprocess that is in the same module.


Resolution

In the assembly editor in Integration Designer, generate an SCA export with SCA native binding for the new version of the subprocess.



Related concepts:

Lifecycle management of BPEL subprocesses

Invoking different versions of a BPEL process


Unexpected exception during execution (Message: CWWBA0010E)

Either the queue manager is not running or the Business Process Choreographer configuration contains the wrong database password.


Resolution

Check the following items:

  1. If the SystemOut.log file contains "javax.jms.JMSException: MQJMS2005: failed to create MQQueueManager", start the queue manager.

  2. Verify that the database administrator password stored in the Business Process Choreographer configuration matches the one set in the database.



Event unknown (Message: CWWBE0037E)

An attempt to send an event to a process instance or to start a new process instance results in a "CWWBE0037E: Event unknown." exception.


Reason

A common reason for this error is that a message is sent to a process but the receive or pick activity has already been navigated, so the message cannot be consumed by this process instance again.


Resolution

To correct this problem:

For more information about using correlation sets in BPEL processes, see technote 1171649.



Cannot find nor create a process instance (Message: CWWBA0140E)

An attempt to send an event to a process instance results in a 'CreateRejectedException' message.


Reason

A common reason for this error is that a message is sent to a receive or pick activity that cannot instantiate a new process instance, because its createInstance attribute is set to no, and the values that are passed with the message for the correlation set that is used by this activity do not match any existing process instance.


Resolution

To correct this problem you must pass a correlation set value that matches an existing process instance.

For more information about using correlation sets in BPEL processes, see Correlation sets in BPEL processes.
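
Conceptually, the engine looks up waiting process instances by their correlation set values. The following plain-Java sketch (invented names, not the engine's API) shows why a non-matching value is rejected:

```java
import java.util.HashMap;
import java.util.Map;

public class CorrelationLookupDemo {
    public static void main(String[] args) {
        // Waiting process instances, keyed by correlation set value.
        Map<String, String> instancesByCorrelation = new HashMap<>();
        instancesByCorrelation.put("orderId=42", "processInstanceA");

        // With createInstance set to "no", an incoming message must match
        // an existing instance; otherwise the send is rejected.
        String incoming = "orderId=99";
        String instance = instancesByCorrelation.get(incoming);
        System.out.println(instance == null
                ? "rejected: no instance matches " + incoming
                : "delivered to " + instance);
    }
}
```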



The failed state of the process instance does not allow the requested sendMessage action to be performed (Message: CWWBE0126E)

An attempt to send an event to a process instance results in an 'EngineProcessWrongStateException' message.


Reason

A common reason for this error is that a message is sent to a receive or pick activity to create a new process instance, but a new process instance cannot be instantiated. This situation occurs if the values that are passed with the message for the correlation set used by this activity match an existing process instance, which is already in the failed state.


Resolution

To correct this problem either delete the existing process instance, or pass a correlation set value that does not match an existing process instance. For more information about using correlation sets in BPEL processes, see Correlation sets in BPEL processes.



Uninitialized variable or NullPointerException in a Java snippet

Using an uninitialized variable in a BPEL process can result in various exceptions.


Symptoms

Exceptions such as:


Reason

All variables in a BPEL process have the value null when a process is started; the variables are not pre-initialized. Using an uninitialized variable inside a Java snippet or Java expression leads to a NullPointerException.


Resolution

The variable must be initialized before it is used. You can do this in one of the following ways: specify an initial value when you define the variable, use an assign activity (the variable must occur in the to-spec of the assign), or initialize the variable inside a Java snippet.
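
The effect of an uninitialized variable, and the fix, can be sketched in plain Java (the static field stands in for a BPEL variable; this is an analogy, not the BPM variable API):

```java
public class VariableInitDemo {
    // Like a BPEL variable, this is null until it is initialized.
    static StringBuilder buffer;

    public static void main(String[] args) {
        try {
            buffer.append("x"); // using the variable before initialization
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: variable not initialized");
        }
        buffer = new StringBuilder(); // initialize before use
        buffer.append("x");
        System.out.println("after initialization, length=" + buffer.length());
    }
}
```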



Standard fault exception "missingReply" (message: CWWBE0071E)

The execution of a microflow or long-running process results in a BPEL standard fault "missingReply" (message: CWWBE0071E), or this error is found in the system log or SystemOut.log file.


Reason

A two-way operation must send a reply. This error is generated if the process ends without navigating the reply activity. This can happen in any of the following circumstances:


Resolution

Correct the model to ensure that a reply activity is always performed before the process ends.



A fault is not caught by the fault handler

A fault handler is attached to an invoke activity to catch specific faults that are thrown by the invoked service. However, even if the invoked service returns the expected fault, the fault handler is not run.


Reason

A common reason for this problem is that the fault handler does not have a fault variable to catch the data that is associated with the fault. If a fault has associated fault data, it is caught by a fault handler only when one of the following situations applies:


Resolution

Add a fault variable to the fault handler. Ensure the data type of the fault variable matches the type of the data that is associated with the fault.
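
Type-matched fault handling behaves much like Java catch clauses: a handler runs only if it declares a variable of the matching fault type. A hedged analogy (the fault class and data are invented):

```java
public class FaultTypeDemo {
    // Invented fault type carrying fault data, analogous to a fault variable.
    static class OrderFault extends Exception {
        OrderFault(String data) { super(data); }
    }

    public static void main(String[] args) {
        try {
            throw new OrderFault("insufficient stock");
        } catch (OrderFault f) {
            // Caught only because the handler declares the matching type;
            // a handler without a matching fault variable would not run.
            System.out.println("caught fault with data: " + f.getMessage());
        }
    }
}
```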



Related concepts:

Fault handling

Retrieval of fault data for BPEL processes


Related tasks:

Dealing with faults in your process in Integration Designer


Parallel paths are sequentialized

There are two or more parallel invoke activities inside a flow activity, but the invoke activities are run sequentially.


Resolution



Related concepts:

Transactional behavior of long-running BPEL processes


Copying a nested data object to another data object destroys the reference on the source object

A data object, Father, contains another data object, Child. Inside a Java snippet or client application, the Child object is fetched from Father and set on a substructure of another data object, Mother. The reference to Child in the data object Father disappears.


Reason

The reference to Child is moved from Father to Mother.


Resolution

When such a data transformation is performed in a Java snippet or client application, and you want to retain the reference in Father, copy the data object before it is assigned to another object. The following code snippet illustrates how to do this:

// Locate the business object copy service.
BOCopy copyService = (BOCopy)ServiceManager.INSTANCE.locateService
                      ("com/ibm/websphere/bo/BOCopy");
// Copy the nested object before assigning it to Mother,
// so that the reference to Child in Father is retained.
DataObject child = Father.getDataObject("Child");
DataObject childCopy = copyService.copy(child);
Mother.set("Child", childCopy);



CScope is not available

Starting a microflow or running a navigation step in a long-running process fails with an assertion saying: 'postcondition violation !(cscope != null)'.


Reason

In certain situations, the process engine uses the compensation service, but it was not enabled.


Resolution

Enable the compensation service.



Related tasks:

Administer the compensation service for a server


Work with process-related or task-related messages

Describes how to get more information about Business Process Choreographer messages that are written to the display or a log file.

Messages that belong to Business Process Choreographer are prefixed with either CWWB for process-related messages, or CWTK for task-related messages. The format of these messages is PrefixComponentNumberTypeCode. The type code can be:

I

Information message

W

Warning message

E

Error message
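
The message format described above can be decomposed mechanically, for example with a regular expression (a sketch based on the PrefixComponentNumberTypeCode format):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MessageCodeDemo {
    public static void main(String[] args) {
        // Prefix (CWWB or CWTK), one component character,
        // a 4-digit number, and a type code (I, W, or E).
        Pattern p = Pattern.compile("(CWWB|CWTK)([A-Z])(\\d{4})([IWE])");
        Matcher m = p.matcher("CWWBE0057I");
        if (m.matches()) {
            System.out.println("prefix=" + m.group(1) + " component=" + m.group(2)
                    + " number=" + m.group(3) + " type=" + m.group(4));
        }
    }
}
```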

When processes and tasks run, messages are either displayed in Business Process Choreographer Explorer, or they are added to the SystemOut.log file and traces. If the message text provided in these files is not enough to help you solve your problem, you can use the WebSphere Application Server symptom database to find more information. To view Business Process Choreographer messages, check the activity.log file by using the WebSphere log analyzer.

  1. Start the WebSphere log analyzer.

      install_root/bin/waslogbr.sh

  2. Check for the newest version of the symptom database.

      File > Update database > WebSphere Application Server Symptom Database

  3. Optional: Load the activity log.

    1. Select the activity log file

      • profile_root/profiles/profile_name/logs/activity.log

    2. Click Open.



Troubleshooting the administration of BPEL processes and human tasks

This article describes how to solve some common problems with BPEL processes and human tasks.

The following information can help you to debug problems with your BPEL processes and human tasks.



Troubleshooting escalation emails

Use this information to solve problems relating to escalation emails.

Escalations are triggered when human tasks do not progress as expected. The escalation creates work items. It can also send emails to the users that are affected by the escalation. If you are having problems with escalation emails, use the information here to help you to solve the problems.



Troubleshooting people assignment

Use the following information to help solve problems relating to the assignment of people to authorization roles. This information covers the following problems:

You can also search for additional information in the Technical support search page.

User cannot administer or monitor process, scope, or activity instances, and no administrative tasks are created

If process administration is restricted to system administrators, instance-based administration is disabled, and all administrative actions on processes, scopes, and activities are limited to users in the BPESystemAdministrator role. For more information about this administration mode, see Alternative administration modes for BPEL processes.

If the Business Flow Manager has been switched to run in the alternate mode, you might need to perform one of the following actions:

  • Make sure that all users and programs that perform administrative actions are using user IDs that are in the appropriate role. For example, BPESystemAdministrator or BPESystemMonitor.
  • Restore instance-based administration by turning off the alternate process administration authorization mode, as described in Optimizing BPEL process administration.

Errors during the deployment of the people directory provider

If you are using the LDAP people directory provider, deployment might fail due to incorrect values of the provider configuration parameters.

  • Make sure that all mandatory parameters are set.
  • To set the baseDN parameter to the root of the LDAP directory tree, specify an empty string by setting the parameter to two apostrophe (') characters (''). Do not use double quotation marks ("). Failure to set the baseDN parameter results in a NullPointerException exception at deployment time.

Entries in the people directory are not reflected in work item assignments

The maximum number of user IDs retrieved by a people query is specified by the Threshold variable, which is defined in the XSL transformation file in use. The sample XSL transformation file used for the LDAP people directory provider is LDAPTransformation.xsl, which is located in the following directory:

    install-root/ProcessChoreographer/Staff

The default Threshold value is 1000000, so by default the threshold has no practical effect. Do not lower this value without careful consideration. To change the threshold:

  1. Create a new people directory provider configuration, providing your own version of the XSL file.
  2. Adapt the following entry in the XSL file according to your needs:

      <xsl:variable name="Threshold">1000000</xsl:variable>

Changes to the people directory are not immediately reflected in work-item assignments

Business Process Choreographer caches the results of people assignments evaluated against a people directory, such as an LDAP server, in the runtime database. When changes occur in the people directory, these are not immediately reflected in the database cache.

The Administration guide describes three ways to refresh this cache:

  • Refresh people query results, using the administrative console. Use this method if you have major changes and need to refresh the results for almost all people queries.
  • Refresh people query results, using administrative commands. Use this method if you write administration scripts using the wsadmin tool, or to immediately refresh all or a subset of the people query results.
  • Refresh people query results, using the refresh daemon. Use this method to set up a regular and automatic refresh of all expired people query results.

None of these methods can refresh the group membership association of a user for the Group verb. This group membership is cached in the user's login session (WebSphere security LTPA token), which by default expires after two hours. The group membership list of the process starter ID used for process navigation, is never refreshed.

Unexpected people assignments for tasks or process instances

Default people assignments are performed if you do not define people assignment criteria for certain roles for your tasks, or if people assignment fails or returns no result. These defaults might result in unexpected user authorization; for example, a process starter might receive process administrator rights. In addition, many authorizations are inherited by dependent artifacts. For example, the process administrator may also become the administrator of all inline tasks.

The following tables illustrate which defaults apply for which situation:

Roles for BPEL processes

Role | If the role is not defined in the process model ... | If the role is defined in the process model, but people assignment fails or does not return proper results ...
Process administrator | Process starter becomes process administrator | An exception (EngineAdministratorCannotBeResolvedException) occurs and the process is not started
Process reader | No reader | No reader

Roles for inline human tasks and their escalations

Role | If the role is not defined in the task model ... | If the role is defined in the task model, but people assignment fails or does not return proper results ...
Task administrator | Only inheritance applies | Only inheritance applies
Task potential starter (invocation tasks only) | Everybody becomes potential starter | An exception occurs and the process is not started
Task potential owner | Everybody becomes potential owner | Administrators become potential owners
Task editor | No editor | No editor
Task reader | Only inheritance applies | Only inheritance applies
Escalation receiver | Administrators become escalation receivers | Administrators become escalation receivers

The following inheritance rules apply for inline tasks:

Roles for stand-alone human tasks and their escalations

Role | If the role is not defined in the task model ... | If the role is defined in the task model, but people assignment fails or does not return correct results ...
Task administrator | Originator becomes administrator | The task is not started
Task potential instance creator | Everybody becomes potential instance creator | An exception is thrown and the task is not created
Task potential starter | Originator becomes potential starter | An exception is thrown and the task is not started
Potential owner | Everybody becomes potential owner | Administrators become potential owners
Editor | No editor | No editor
Reader | Only inheritance applies | Only inheritance applies
Escalation receiver | Administrators become escalation receivers | Administrators become escalation receivers

The following inheritance rules apply for stand-alone tasks:

When a method is invoked using the Business Flow Manager API, members of the BPESystemAdministrator role have administrator authorization, and members of the BPESystemMonitor role have reader authorization.

When a method is invoked using the Human Task Manager API, members of the TaskSystemAdministrator role have administrator authorization, and members of the TaskSystemMonitor role have reader authorization.

Stopped human tasks

If you encounter one or more of the following problems:

  • Human tasks cannot be claimed, even though the BPEL process started navigating successfully.
  • The SystemOut.log file contains the following message: CWWBE0057I: Activity 'MyStaffActivity' of process 'MyProcess' has been stopped because of an unhandled failure...

These problems indicate that administrative security might not be enabled. Human tasks and processes that use people authorization require that security is enabled and the user registry is configured. Take the following steps:

  1. Check that administrative security is enabled. In the administrative console, go to Security > Global security and make sure the Enable administrative security check box is selected.
  2. Check that the user registry is configured. In the administrative console, go to Security > User Registries and check the Active user registry attribute.

  3. Restart the activity, if stopped.

Error and warning messages relating to people assignment

Some common errors can occur when accessing a people directory during people assignment. To see details for these errors, you can enable tracing with the following trace settings: com.ibm.bpe.*=all:com.ibm.task.*=all:com.ibm.ws.staffsupport.ws.*=all

The following common error situations are indicated by warning or error messages:

  • Could not connect to LDAP server in the trace.log file indicates failure to connect to the LDAP server. Check the network settings, the configuration (especially the provider URL) for the people directory provider you use, and verify whether your LDAP server requires an SSL connection.
  • javax.xml.transform.TransformerException: org.xml.sax.SAXParseException: Element type "xsl:template" must be followed by either attribute specifications, ">" or "/>" in the System.out or System.err files indicates the LDAPTransformation.xsl file cannot be read. Check your people assignment configuration and the configured XSLT file for errors.
  • LDAP object not found. dn: uid=unknown,cn=users,dc=ibm,dc=com [LDAP: error code 32 - No Such Object] in the trace.log file indicates that an LDAP entry cannot be found. Check the task model's people assignment criteria (verb) parameters and the LDAP directory content for mismatches in the task model.
  • Requested attribute "uid" not found in: uid=test222,cn=users,dc=ibm,dc=com in the trace.log file indicates that an attribute cannot be found in the queried LDAP object. Check the task model's people assignment criteria (verb) parameters and the LDAP directory content for mismatches in the task model. Also check the XSLT file of your people assignment configuration for errors.

Enable additional messages about people assignment decisions

You can set a custom property to log additional messages in the SystemOut.log. The messages record the following events:

  • When people resolution does not find any users for a task role, and default users are selected.

  • If you are using VMM, warnings when specified entities or specific attributes cannot be found in the VMM people directory.

  • If you are using substitution, decisions about whether or not users are substituted.

Because these messages can significantly increase the amount of data in SystemOut.log, only enable these additional messages for testing or debugging purposes.

To enable the staff diagnosis feature perform the following steps:

  1. In the administrative console, click Servers > Clusters > WebSphere application server clusters > cluster_name. Then, on the Configuration tab, in the Business Process Manager section, expand Business Process Choreographer, and click Human Task Manager.

  2. On the Configuration tab, set the value for the custom property Staff.Diagnosis to one of the following values:

    off

    Never writes additional people assignment information.

    on

    Always writes additional people assignment information.

    development_mode

    Only writes additional people assignment information when the server is running in development mode. This is the default value.

  3. Restart the server.

The following messages are generated:

  • Core.StaffDiagMsgIsEnabled=CWTKE0057I: The output of people (staff) resolution diagnosis messages is enabled. Indicates the diagnosis feature is enabled. This message is generated when the Human Task Manager is started.
  • Core.EverybodyIsPotInstanceCreator=CWTKE0047I: Everybody is potential instance creator for task {0}. Indicates that Everybody became the potential instance creator because no potential instance creator is defined.
  • Core.OriginatorBecomesPotStarter=CWTKE0046I: Originator becomes potential starter of task {0}. For stand-alone tasks only: Indicates the originator became the potential starter because no potential starter is defined.
  • Core.EverybodyIsPotentialStarter=CWTKE0045I: Everybody is potential starter of task {0}. For inline tasks only: Indicates that Everybody became the potential starter because no potential starter is defined.
  • Core.OriginatorBecomesAdministrator=CWTKE0044I: Originator becomes administrator of task {0}. Indicates the originator became the administrator because no administrator is defined.
  • Core.EscalationReceiverDoesNotExist=CWTKE0043W: Administrator(s) will be the escalation receiver(s) of the escalation {0}. Indicates the administrators became the escalation receivers because staff resolution for the escalation receivers either failed or returned an empty list. If no escalation receiver is defined, the default is Everybody, and a trace message is written.
  • Core.EverybodyIsPotentialOwner=CWTKE0014I: Everybody is potential owner of task {0}. Indicates that Everybody became the potential owner because no potential owner is defined.
  • Core.PotentialOwnerDoesNotExist=CWTKE0015W: Administrator(s) will be the potential owner(s) of the task {0}. Indicates the administrators became the potential owners because staff resolution for the potential owners either failed or returned an empty list. If no potential owner is defined, the default is Everybody, and a trace message is written.
  • StaffPlugin.VMMEntityNotFound=CWWBS0457W: The VMM entity could not be found, received VMM message is ''{0}''. Indicates that a specified VMM entity (a group or person) was not found in the people directory and the reason. People or groups that cannot be found in the people directory are not included in the people resolution result.
  • StaffPlugin.VMMEntityAttributeNotFound=CWWBS0454W: VMM entity ''{0}'' has no attribute with name ''{1}'' of type ''{2}''. Indicates that a specified attribute was not found when searching for a VMM entity (person) in the people directory. If no user email address is found, the user cannot receive email notifications for escalations. If no user preferredLanguage is found, the default language setting is used. If no substitution attributes (isAbsent or substitutes) are found when reading, an attempt is made to initialize the attributes. If no substitution attributes are found when writing or updating, an exception is generated.
  • StaffPlugin.VMMResultIsEmpty=CWWBS0456W: The VMM invocation returned no requested result entities. Indicates that a (get or search) invocation of VMM did not return any entities. No users are included in the people resolution result.

Issues with group work items and the "Group" people assignment criteria

If you are using the Group people assignment criteria, the following situations can occur:

  • Group members are not authorized, although the group name is specified:

    • Set the group short name when using the Local OS registry for WebSphere security, and the group dn when using the LDAP registry.

    • Make sure that you respect the case sensitivity of the group name.

    One possible reason for this situation is that you have configured the LDAP user registry for WebSphere security and selected the Ignore case for authorization option. If so, either clear the option, or specify LDAP group dn in all uppercase.

  • Changes in group membership are not immediately reflected in authorization. This might happen when the affected user is still logged on. The group membership of a user is cached in the user's login session, which by default expires after two hours. Either wait for the login session to expire, or restart the application server. The refresh methods offered by the Human Task Manager do not apply to this people assignment criterion. Note that the group membership list of the process starter is never refreshed.

Cleanup of stored people assignment results

People assignment results are stored in the database. All stored people assignment results are subject to people assignment refreshes. If the task template that contains the task instance that leads to the computation of a people assignment result is deleted, the stored people assignment result is deleted as well. However, the stored people assignment results are not deleted if only the task instances that are using the stored people assignment results are deleted.

To avoid large numbers of stored and unnecessary people assignment results in the database, take the following steps in the context of a task template:

  1. Assess whether your people assignment criteria definitions lead to shared or unshared people assignment results.

  2. If unshared assignment results occur, consider putting a cleanup procedure in place for people assignment results. Base the cleanup interval on the expected number of task instances, and the unshared people assignment results per cleanup interval. For more information on how to apply a script-based cleanup procedure, refer to Remove unused people query results, using administrative commands.

Adapted XSL transformation file has no effect

When you adapt an XSL transformation file, you must restart the server before the changes take effect. In addition, the adapted XSL file applies only to newly deployed processes and tasks. The changes have no effect on processes and tasks that were deployed before the XSL file was changed.



Troubleshooting Business Process Choreographer Explorer or Business Process Archive Explorer

Use this information to solve problems relating to accessing or using Business Process Choreographer Explorer or Business Process Archive Explorer.

Errors while trying to access Business Process Choreographer Explorer or Business Process Archive Explorer from a browser

If you try to access Business Process Choreographer Explorer or Business Process Archive Explorer with a browser, but get an error message instead of the login page, try the following actions:

  • Use the administrative console to make sure that the web client application BPCExplorer_scope or BPCArchiveExplorer_scope is deployed and running on the server.

  • In the administrative console, on the page for the application, under "View Deployment Descriptor", verify the context root is the one you used when setting up the Business Process Choreographer Explorer or Business Process Archive Explorer.

  • Make sure that your virtual host configuration is correct. By default, the web modules of the Business Process Choreographer applications are configured for the default_host virtual host. Verify the host names and ports that you use to access the Business Process Choreographer Explorer or Business Process Archive Explorer are associated with the host alias.

Error message when using Business Process Choreographer Explorer or Business Process Archive Explorer

If you get an error message when using Business Process Choreographer Explorer or Business Process Archive Explorer, click the Search for more information link on the error page.

This starts a search for the error code on the IBM technical support site. This site only provides information in English. Copy the error message code that is shown on the Business Process Choreographer Explorer or Business Process Archive Explorer Error page to the clipboard. The error code has the format CWWBcnnnnc, where each c is a character and nnnn is a 4-digit number. Go to the technical support page. Paste the error code into the Additional search terms field, and click Go.
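The error-code format described above (CWWBcnnnnc, where each c is a character and nnnn is a 4-digit number) can be checked with a small script before pasting it into the search field; the pattern below is a sketch derived from that description:

```python
import re

# Matches the documented format CWWBcnnnnc: the fixed prefix "CWWB",
# one letter, a 4-digit number, and one trailing letter.
ERROR_CODE = re.compile(r"CWWB[A-Z]\d{4}[A-Z]")

def is_bpc_error_code(code):
    """Return True if the string looks like a CWWBcnnnnc error code."""
    return bool(ERROR_CODE.fullmatch(code))
```

For example, CWWBE0071E (used later in this topic) matches this pattern.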

Error message StandardFaultException with the standard fault missingReply (message CWWBE0071E)

If you get a StandardFaultException error with the standard fault missingReply (message CWWBE0071E), this is a symptom of a problem with your process model. For more information about solving this, see Troubleshooting the administration of BPEL processes and human tasks.

Some items not displayed when you log on to Business Process Choreographer Explorer or Business Process Archive Explorer

If you can log on to Business Process Choreographer Explorer or Business Process Archive Explorer but some items are not displayed, or if certain actions are not enabled, this indicates a problem with your authorization. Possible solutions to this problem include:

  • Use the administrative console to ensure that WebSphere administrative security is enabled.
  • Check that you are logged on to Business Process Choreographer Explorer or Business Process Archive Explorer using the correct identity. Depending on the authorization granted to the user ID, the administrative views and options might not be visible or enabled.

  • Use IBM Integration Designer to check or modify the authorization settings defined in the BPEL process.

Error message CWWBU0001E or a communication error with the HTMConnection function

If you get the error message CWWBU0001E: "A communication error occurred when the BFMConnection function was called" or "A communication error occurred when the HTMConnection function was called", use the following information to help resolve the problem.

This error can indicate the process container or human task container has been stopped, and the client could not connect to the server. Verify the process container and the human task container are running and accessible. The nested exception might contain further details about the problem.

Error message CWWBU0024E

If you get the error message CWWBU0024E: "Could not establish a connection to local business process EJB" with a reason "Naming Exception", use the following information to help resolve the problem.

This error is thrown if users attempt to log on while the process container or Business Process Archive Manager is not running. Verify the application BPEContainer_InstallScope or BPArchiveMgr_InstallScope is running, where InstallScope is either the cluster_name or nodename_servername.



Troubleshooting Common Event Infrastructure

These topics provide troubleshooting information for the event service based on the task or activity you were doing when you encountered the problem.



Problems during startup

Troubleshoot problems with starting up your Common Event Infrastructure server.



Event service does not start (message CEIDS0058E)

The event service does not start and outputs message CEIDS0058E to the WebSphere log file.


Cause

The event service uses SQL statements qualified with the user name. This error indicates the user name used by the event service to connect to the event database is not the same as the user ID that was used to create the database.


Remedy

The user ID used to connect to the event database must be the same one used to create the event database. To correct this problem:

  1. In the administrative console, navigate to the server configuration:

    • For a single server, select Servers > Application servers > server_name.

    • For a cluster, select Servers > Clusters > cluster_name.

  2. From the Configuration tab, select Business Integration > Common Event Infrastructure > Common Event Infrastructure Server.

  3. Change the specified user ID and password to match those used to create the database.

  4. Save the configuration changes.

  5. Restart the server.



Problems when sending events

Troubleshoot problems with sending events on your Common Event Infrastructure server.



Error when sending event (message CEIDS0060E)

My event source encounters an error when trying to send an event, and message CEIDS0060E appears in the WebSphere log file.


Cause

The event service uses metadata stored in the event database to map Common Base Event elements and attributes to database tables and columns. This information is read from the database the first time an application attempts to use the event service after startup.

The metadata tables are populated when the event database is created. This error occurs if the tables do not contain the required metadata at run time.


Remedy

To correct this problem, you need to re-create the required metadata. When the event database is created, the database configuration administrative command also generates a database script that can be used to repopulate the metadata at a later time. The name of this script depends on the database type:

  Database type          Script name
  DB2                    ins_metadata.db2
  Informix               ins_metadata.sql
  Oracle                 ins_metadata.ora
  SQL Server             ins_metadata.mssql
  DB2 UDB for iSeries    ins_metadata.db2
  DB2 for z/OS           ins_metadata.ddl

By default, the script is created in the profile_root/dbscripts/CEI_database_name directory. For DB2 for z/OS, the script ins_metadata.ddl is created in the profile_root/dbscripts/CEI_database_name/ddl directory and can be run using the DB2 SQL processor.

To re-create the metadata, use the appropriate SQL processor to run the script. You can run this script at any time.

After repopulating the metadata, restart the server.



Error when sending event (ServiceUnavailableException)

My event source application encounters an error when trying to send an event to the event server. The log file indicates a ServiceUnavailableException with the message "A communication failure occurred while attempting to obtain an initial context with the provider URL."


Cause

This problem indicates the event source application cannot connect to the event server. This might be caused by either of the following conditions:


Remedy

To correct this problem, follow these steps:

  1. To check the status of the event server, go to the profile_root/bin directory (on z/OS, for example, /WebSphere/V8R5/DeploymentManager/default/bin) and run the serverStatus command:

      serverStatus servername

  2. If the event server is not running, use the startServer command to start it:

      startServer servername

  3. Check the host name and Remote Method Invocation (RMI) port for the server containing the application that cannot connect to the event server. Confirm the same values are specified in the JNDI URL configured for the event source application. If the CEI server is located on another server, then the JNDI needs to be resolved with that remote deployment target.
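When checking step 3, it can help to compose the expected provider URL from the host name and RMI (bootstrap) port and compare it with what the event source application is configured to use. WebSphere JNDI provider URLs for RMI/IIOP access typically take the corbaloc form; the following is a sketch with placeholder values, not product configuration code:

```python
def provider_url(host, rmi_port):
    # corbaloc form typically used for a WebSphere JNDI initial context;
    # host and port must match the target server's bootstrap (RMI) settings
    return "corbaloc:iiop:%s:%d" % (host, rmi_port)
```

For example, for a local server using the default bootstrap port 2809, the expected URL would be corbaloc:iiop:localhost:2809.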



Error when sending event (NameNotFoundException)

My event source application encounters an error when trying to send an event to the event service. The log file indicates a NameNotFoundException with a message like "First component in name events/configuration/emitter/Default not found."


Cause

This problem indicates the event service is not available and might be caused by either of the following conditions:


Remedy

To deploy the event service:

  1. Start the wsadmin tool.

  2. Use the AdminTask object to run the deployEventService administrative command.

  3. Restart the server.

To enable the event service using the wsadmin tool:

  1. Start the wsadmin tool.

  2. Use the AdminTask object to run the enableEventService administrative command.

  3. Restart the server.

To enable the event service using the administrative console:

  1. Click Applications > Application Types > WebSphere enterprise applications > server > Container Services > Common Event Infrastructure Service.

  2. Select the Enable service at server startup property.

  3. Click OK to save your changes.

  4. Restart the server.



Error when sending event (message CEIEM0025E)

My event source application encounters an error when trying to send an event to the event server. The log file indicates a DuplicateGlobalInstanceIdException.


Cause

This problem indicates the emitter submitted the event, but the event service rejected it because another event already exists with the same global instance identifier. Each event must have a unique global instance identifier, specified by the globalInstanceId property.


Remedy

To correct this problem, do one of the following:
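Because every event must carry a unique globalInstanceId, one common approach is to derive the identifier from a UUID so that no two events collide. This is an illustrative sketch, not the Common Event Infrastructure implementation:

```python
import uuid

def new_global_instance_id():
    # A 2-letter prefix plus 32 hexadecimal UUID digits gives a
    # 34-character identifier, which is inside the 32-64 character
    # range that the event service validates (see message CEIEM0027E).
    return "CE" + uuid.uuid4().hex.upper()
```

Generating a fresh identifier per event, instead of reusing or hard-coding one, avoids the DuplicateGlobalInstanceIdException described above.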



Error when sending event (message CEIEM0034E)

My event source encounters an error when trying to send an event to the event service. The log file indicates an EmitterException with the message "The JNDI lookup of a JMS queue failed because the JNDI name defined in the emitter profile is not bound in the JNDI."


Cause

This problem indicates the JMS transmission configuration being used by the emitter specifies one or more JMS resources that are not defined in the JMS configuration.


Remedy

To correct this problem:

  1. In the administrative console, click Service integration > Common Event Infrastructure > Event emitter factories > emitter_factory > JMS transmission settings. Make sure you are viewing the JMS transmission for the emitter factory used by your event source application.
  2. Check the values specified for the Queue JNDI name and Queue connection factory JNDI name properties. Verify the specified JNDI names exist in the JNDI namespace and are valid JMS objects. If necessary, modify these properties or create the required JMS resources.



Event is not valid (message CEIEM0027E)

My event source is trying to send an event, but the emitter does not submit it to the event service and outputs message CEIEM0027E to the log file ("The emitter did not send the event to the event server because the Common Base Event is not valid").


Cause

This message indicates that one or more of the event properties contain data that does not conform to the Common Base Event specification. There are many ways in which event data might not be valid, including the following:


Remedy

To correct this problem:

  1. Check the detailed exception message in the log file to determine which event property is not valid. For example, this message indicates that the length of the global instance identifier (ABC) is not valid:

      Exception: org.eclipse.hyades.logging.events.cbe.ValidationException:
        IWAT0206E The length of the identifier in the specified Common Base Event
        property is outside the valid range of 32 to 64 characters.
        Property: CommonBaseEvent.globalInstanceId
        Value: ABC
  2. Correct the event content at the source so it conforms to the Common Base Event specification.
  3. Resubmit the event.
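The length rule from the example message can be expressed as a quick check. This sketch covers only the length constraint; the real emitter validates the full Common Base Event specification:

```python
def has_valid_identifier_length(identifier):
    # IWAT0206E: the identifier length must be within 32 to 64 characters
    return 32 <= len(identifier) <= 64
```

Applying it to the value from the example shows why the event was rejected: "ABC" is only 3 characters long.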



Synchronization mode not supported (message CEIEM0015E)

My event source is trying to send an event, but the emitter does not submit it to the event service and outputs message CEIEM0015E to the log file ("The emitter does not support the specified synchronization mode").


Cause

This problem indicates the parameters passed by the event source when sending the event specify a synchronization mode that is not supported by the emitter. This can be caused by either of the following conditions:


Remedy

If the exception message indicates that your event source is specifying a synchronization mode that is not valid (IllegalArgumentException), check the method call that is trying to send the event. Verify the method parameters specify one of the valid synchronization modes:

These constants are defined by the com.ibm.events.emitter.SynchronizationMode interface.

If the exception message indicates the specified synchronization mode is not supported by the emitter (SynchronizationModeNotSupportedException), check the emitter factory configuration:

  1. In the administrative console, click Service Integration > Common Event Infrastructure > Event Emitter Factories > emitter_factory. Make sure you are viewing the emitter factory used by the event source application.
  2. Check the emitter factory settings to see which synchronization modes are supported:

    • If the Support Event Service transmission property is selected, synchronous mode is supported.

    • If the Support JMS transmission property is selected, asynchronous mode is supported.

    Querying synchronization modes: An event source can programmatically query the supported synchronization modes for a particular emitter by using the isSynchronizationModeSupported() method. Refer to the Javadoc API documentation for more information.

  3. If the emitter does not support the synchronization mode you are trying to use, either change the emitter factory configuration or modify your event source to use a supported synchronization mode.



Transaction mode not supported (message CEIEM0016E)

My event source is trying to send an event, but the emitter does not submit it to the event service and outputs message CEIEM0016E to the log file ("The emitter does not support the specified transaction mode").


Cause

This problem indicates the parameters passed by the event source when sending the event specify a transaction mode that is not supported by the emitter. This can be caused by either of the following conditions:


Remedy

To correct this problem, check the method call that is trying to send the event and make sure the method parameters specify the correct transaction mode:



Problems when receiving or querying events

Troubleshoot problems involving receiving and querying events with your Common Event Infrastructure server.



Error when querying events (message CEIDS0060E)

My event consumer encounters an error when trying to query events from the event service, and message CEIDS0060E appears in the WebSphere log file.


Cause

The event service uses metadata stored in the event database to map Common Base Event elements and attributes to database tables and columns. This information is read from the database the first time an application attempts to use the event service after startup.

The metadata tables are populated when the event database is created. This error occurs if the tables do not contain the required metadata at run time.


Remedy

To correct this problem, you need to re-create the required metadata. When the event database is created, the database configuration administrative command also generates a database script that can be used to repopulate the metadata at a later time. The name of this script depends on the database type:

  Database type          Script name
  DB2                    ins_metadata.db2
  Informix               ins_metadata.sql
  Oracle                 ins_metadata.ora
  SQL Server             ins_metadata.mssql
  DB2 UDB for iSeries    ins_metadata.db2
  DB2 for z/OS           ins_metadata.ddl

By default, the script is created in the profile_root/dbscripts/CEI_database_name directory. For DB2 for z/OS, the script ins_metadata.ddl is created in the profile_root/dbscripts/CEI_database_name/ddl directory and can be run using the DB2 SQL processor.

To re-create the metadata, use the appropriate SQL processor to run the script. You can run this script at any time.

After repopulating the metadata, restart the server.



Events not being stored in the persistent data store

My event source application is successfully submitting events to the emitter, but when an event source queries the events, they are not in the persistent data store.


Cause

This problem indicates the emitter is not sending events to the event service, or the event service is not storing the events to the persistent data store. This can be caused by any of the following conditions:


Remedy

To verify the persistent data store is enabled for the event service:

  1. In the administrative console, click Service integration > Common Event Infrastructure > Event service > Event services > event_service.

  2. Verify the Enable event data store check box is selected.

  3. Click OK to save any changes.

To verify the event group is configured for event persistence:

  1. In the administrative console, click Service integration > Common Event Infrastructure > Event service > Event services > event_service > Event groups > event_group.

  2. Verify the Persist events to event data store check box is selected.

  3. Click OK to save any changes.

Multiple event groups: An event might belong to multiple event groups. If any applicable event group is configured for persistence, and the data store is enabled, the event is stored in the data store.
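The persistence decision described in the note above can be summarized as a predicate. This is an illustrative model of the documented behavior, not product code:

```python
def event_is_persisted(data_store_enabled, group_persistence_flags):
    # An event is written to the data store only if the store is enabled
    # AND at least one applicable event group persists events.
    return data_store_enabled and any(group_persistence_flags)
```

So an event belonging to several groups is stored as long as any one of them has persistence enabled, but never when the data store itself is disabled.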

To check the filter settings:

  1. In the administrative console, click Service integration > Common Event Infrastructure > Event emitter factories > emitter_factory > Event filter. (Make sure you are viewing the settings for the emitter factory your event source application is using.)
  2. Check to see whether the filter configuration string excludes the events you are trying to send to consumers. If so, you can either modify the filter configuration string or modify the event data so the events are not filtered out.

  3. Click OK to save any changes.



Events not being received by consumers (no error message)

My event source application is successfully submitting events to the emitter, but the events are not received by consumers using the JMS interface.


Cause

This problem can be caused by any of the following conditions:


Remedy

The remedy for this problem depends upon the underlying cause.



Events not being received by consumers (NameNotFoundException)

My event source application is successfully submitting events to the emitter, but the events are not published to consumers using the JMS interface, and the log file indicates a NameNotFoundException.


Cause

This problem indicates the event group configuration specifies one or more JMS resources that do not exist.


Remedy

To correct this problem:

  1. In the administrative console, click Service integration > Common Event Infrastructure > Event service > Event services > event_service > Event groups > event_group.

    Multiple event groups: An event might belong to more than one event group.

  2. Check the values of the Topic JNDI name and Topic connection factory JNDI name properties. Verify the specified JMS resources exist. If necessary, use the configuration interface of your JMS provider to create the necessary resources.



Event group with extended data elements contains no events

I have defined an event group that specifies extended data element predicates, but queries on the event group do not return the expected events.


Cause

The event data might be valid XML but not conform with the Common Base Event specification. This can cause unexpected results without any error messages.

Consider an event with the following content:

<?xml version="1.0" encoding="ASCII"?>
<!-- Event that will match the XPath expression CommonBaseEvent[@globalInstanceId] -->
<CommonBaseEvent
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.ibm.com/AC/commonbaseevent1_0_1"
        version="1.0.1"
        creationTime="2005-10-17T12:00:01Z"
        severity="10"
        priority="60"
    >
    <situation categoryName="RequestSituation">
        <situationType xsi:type="RequestSituation"
            reasoningScope="INTERNAL"
            successDisposition="Successful"
            situationQualifier="TEST"
        />
    </situation>
    <sourceComponentId
        component="component"
        subComponent="subcomponent"
        componentIdType="componentIdType"
        location="localhost"
        locationType="Hostname"
        componentType="sourceComponentType"
    />
    <extendedDataElement name="color" type="string">
        <values>red</values>
    </extendedDataElement>
</CommonBaseEvent>
This event contains a single extended data element with a single child element.

Now consider an event group definition configured with the following XPath event selector string:

CommonBaseEvent[extendedDataElements[@name='color' and @type='string'
        and @values='red']]
This event selector fails to match the event because the XML definition of the event contains a misspelling. In the event data, the extendedDataElements element is misspelled as extendedDataElement. Because this is well-formed XML, it does not cause an error; instead, it is treated as an any element, which is not searchable.
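The mismatch can be reproduced with any XPath-capable XML library. The following is a minimal sketch, with namespaces and most attributes omitted for brevity:

```python
import xml.etree.ElementTree as ET

# Correctly spelled element: plural "extendedDataElements"
good = ET.fromstring(
    "<CommonBaseEvent><extendedDataElements name='color' type='string'/></CommonBaseEvent>")

# Misspelled singular form, as in the event above: still well-formed XML
bad = ET.fromstring(
    "<CommonBaseEvent><extendedDataElement name='color' type='string'/></CommonBaseEvent>")

selector = ".//extendedDataElements[@name='color']"
print(len(good.findall(selector)))  # 1 -- matched
print(len(bad.findall(selector)))   # 0 -- silently ignored, no error raised
```

The misspelled element produces no parse error, so the only symptom is that the selector quietly matches nothing.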


Remedy

Verify the XML data for submitted events conforms to the Common Base Event specification.



Error when querying an event group (message CEIES0048E)

My event consumer application encounters an error when trying to query events from an event group. The log file indicates an EventGroupNotDefinedException and shows message CEIES0048E ("The event group is not defined in the event group list the event server instance is using.")


Cause

This problem indicates the event consumer application performed a query using the EventAccess bean, but the consumer specified an event group name that does not correspond to any existing event group.


Remedy

To correct this problem:

  1. In the administrative console, click Service integration > Common Event Infrastructure > Event service > Event services > event_service > Event groups. The table shows a list of all event groups defined for the event service.

  2. Verify the event source specifies a defined event group name in the parameters of the query method call.



Miscellaneous problems

Troubleshoot miscellaneous problems with your Common Event Infrastructure server.



Event catalog pattern query fails on a Windows system

I am trying to do a pattern query for event definitions on a Windows system using the eventcatalog command. For example: eventcatalog -listdefinitions -name EVENT% -pattern. I don't get the expected results.


Cause

The percent character (%) is a reserved character in the Windows command-line interface and is not passed properly to the eventcatalog command.


Remedy

On Windows systems, escape the percent character by typing %%. For example: eventcatalog -listdefinitions -name EVENT%% -pattern



Troubleshooting the business process rules manager

Some of the problems you might encounter using the business process rules manager are login errors, login conflicts, and access conflicts.

You can take various steps to troubleshoot these problems.



Resolve login errors

A login error occurs when you log in. The login error message is as follows: Unable to process login. Check user ID and password and try again.

Login errors occur only when administrative security is enabled and either the user ID, password, or both, are incorrect.

To resolve login errors:

  1. Click OK on the error message to return to the Login page.

  2. Enter a valid user ID and password.

    • If passwords are case sensitive, make sure that the Caps Lock key is not on.

    • Verify the user ID and password are spelled correctly.
    • Check with the system administrator to be sure the user ID and password are correct.

  3. Click Login.


Once the login error is resolved, you can log in to the business process rules manager. If the error persists, contact your system administrator.



Resolve login conflict errors

A login conflict error occurs when another user with the same user ID is already logged in to the application.

The login conflict message is as follows:

Another user is currently logged in with the same User ID. Select from the following options:

Usually this error occurs when a user closes the browser without logging out. When this happens, the next login attempt before the session timeout expires results in a login conflict.

Login conflict errors occur only when administrative security is enabled.

To resolve login conflict errors, select from the following three options:



Resolve access conflict errors

An access conflict error occurs when a business rule is updated in the data source by one user at the same time another user is updating the same rule.

This error is reported when you publish your local changes to the repository.

To correct access conflict errors, perform the following actions:



16. Troubleshooting WAS

Because IBM Business Process Manager is built on IBM WebSphere Application Server, the function that you are having problems with might be provided by the underlying WebSphere Application Server. You might want to consult the troubleshooting information in the WebSphere Application Server documentation.

More specifically, IBM Business Process Manager is built on WebSphere Application Server, Network Deployment.

For more information about troubleshooting WebSphere Application Server, Network Deployment, see "Troubleshooting and support" in the WebSphere Application Server, Network Deployment, Information Center.




17. Tools for troubleshooting applications

IBM Business Process Manager and Integration Designer include several tools you can use to troubleshoot applications that you develop and deploy on the server.

During development of applications, you can use debugging tools in Integration Designer. You can implement runtime troubleshooting capabilities into applications using logging, tracing, and service component event monitoring. Administrators of running applications can use the failed event manager to view, modify, resubmit, and delete failed operations between Service Component Architecture (SCA) components.



Debugging applications in IBM Integration Designer

To debug applications running on IBM Business Process Manager, use your application development tool, such as IBM Integration Designer. For more information about debugging applications, see "Component debugging".


Component debugging


Use logging, tracing, and monitoring in applications

Designers and developers of applications that run on IBM Business Process Manager can use capabilities such as monitoring and logging that add troubleshooting features to applications.

IBM Business Process Manager is built on IBM WebSphere Application Server, Network Deployment. See the topic "Adding logging and tracing to your application" in the WebSphere Application Server Information Center. To use logging, tracing, and monitoring with applications, perform the steps in the Procedure section.


Add logging and tracing to your application

Monitor events


Tracing problems related to persistence

When you encounter problems related to persisting data at run time, traces can be useful in analyzing the source of the problem. Business processes model the flow of business in an organization. The model at run time stores and retrieves data such as employee names and bank accounts. The storage of and access to data is transparent to the business process user, but it is important for the business process application developer to understand how databases and EJBs store that data in the background. Moreover, when performance problems related to persistence arise, the application developer needs appropriate traces to diagnose where the problems likely reside.

A detailed log setting can be used to capture a trace of persistence problems. To set this detailed log on, follow these steps:

  1. In the Integrated Solutions Console, expand Troubleshooting and select Logs and trace.

  2. In the Logging and tracing page, click your server name. On the following page, click Change log detail levels. These log levels let you control the events that are captured by the logs.

  3. Select either Configuration or Runtime. Adding a trace using the Configuration tab will require restarting the server. Adding a trace using the Runtime tab will take effect immediately.

  4. Expand All Components. Scroll down the list to WLE.wle_repocore.*. Selecting this item or some of its subsets results in logging traces of persistence-related elements in the repository. You can choose to log all messages and traces or a specific level.
  5. Save your changes to logging and tracing in the Integrated Solutions Console. These logs can help you, or IBM support, analyze performance problems linked to persistence.


IBM Support Assistant Data Collector

Using IBM Support Assistant Data Collector, which is installed with BPM, you can search for information, investigate problems, and submit a problem report to IBM. You might have more than one version of IBM Support Assistant on your system. Regardless of which version of IBM Support Assistant you use, if you run the commands provided in the following topics, you will collect the appropriate information for BPM.

The IBM Support Assistant Data Collector is not supported for z/OS.



Related tasks:

Contacting IBM Software Support

Collecting data in graphical mode with IBM Support Assistant Data Collector

You can use a web-based graphical user interface to collect data with IBM Support Assistant Data Collector. The IBM Support Assistant Data Collector is not supported for z/OS.

To run the data collector from a web browser:

  1. Ensure that your Java environment is configured correctly:

    1. Verify that your Java runtime environment is at level 1.5 or higher.

    2. Verify the location of the Java runtime environment is included in your PATH environment setting. If the location is not included in your path, set the variable JAVA_HOME to point to the Java runtime environment.

      • For example, if you have a Java Development Kit installed at C:\jre1.5, use the command:

          SET JAVA_HOME=C:\jre1.5

      • For example, if you are using the bash shell and you have a Java Development Kit installed at /opt/jre15, use the command:

          export JAVA_HOME=/opt/jre15

  2. In a web browser, open the following file:

      WAS_install_root/BPM/isadc/bpm/index.html

  3. On the main page, select a problem type.

  4. Click Start. The collection script runs and prompts you for additional information. The information can include configuration information or the sequence of events leading to the problem. The script might also prompt you for preferences for data collection. After it has all the necessary information, the script proceeds with the remainder of the collection. Typically, the collector takes about 15 to 20 minutes to run.
  5. At the end of the collection, you can send the results to IBM Support. You can choose HTTPS or FTP for file transfer. HTTPS is encrypted and FTP is unencrypted. The name of the compressed file the tool sends to IBM Support is shown in the collection status area.



Related concepts:

Selecting a problem type for IBM Data Collector


Related tasks:

Collecting data in console mode with IBM Support Assistant Data Collector

Collecting data in console mode with IBM Support Assistant Data Collector

With console mode, you record your responses in a response file. You can then use the response file to run the same collection script and generate reports silently as many times as required. The IBM Support Assistant Data Collector is not supported for z/OS.

To run the data collector and generate problem reports silently:

  1. Ensure that your Java environment is configured correctly:

    1. Verify that your Java runtime environment is at level 1.5 or higher.

    2. Verify that the location of the Java runtime environment is included in your PATH environment variable. If it is not, set the JAVA_HOME variable to point to the Java runtime environment.

      • Windows: For example, if you have a Java Development Kit installed at C:\jre1.5, use the command:

          SET JAVA_HOME=C:\jre1.5

      • UNIX: For example, if you are using the bash shell and you have a Java Development Kit installed at /opt/jre15, use the command:

          export JAVA_HOME=/opt/jre15

  2. Start the script from a command window.

    1. Go to the following directory:

        WAS_install_root/BPM/isadc/bpm/

    2. Run the following command:

        isadc.sh

      Ensure that the script file is executable. If necessary, use the following command to change the file permissions:

        chmod 755 isadc.sh

    The IBM Support Assistant Data Collector starts in console mode.

  3. Create a response file.

      isadc.sh -record response.txt

    You can specify your own file name for response.txt.

    When the data collector runs in console mode, there are no selection lists or entry fields for user input. Instead, available choices are presented as numbered lists and you enter the number of your selection followed by the Enter key. Input fields are transformed into prompts, at which you enter your response and press Enter.

    To stop the collector tool, type the quit option.

  4. Run the tool using a response file.

      isadc.sh response.txt

    The response file is a plain text file. You can edit it to modify values as needed. For example, you can use the file on another computer after adjusting the response file values to reflect settings for the local computer.

    Remember that sensitive information, such as user names and passwords, might be stored in the response file. Manage the file carefully to prevent unauthorized access to this information.

    Some data collection sessions require interaction with the user, and thus are not suitable for the silent collection option. For example, IBM Support might ask you to reproduce a problem during data collection, in order to collect log and trace files. In this case, silent collection cannot record and reproduce all steps.
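In console mode, choices are presented as numbered lists and the user types the number of a selection (or the quit option). The following minimal Java sketch models that interaction pattern; the class and method names are illustrative and not part of the IBM Support Assistant tooling.

```java
import java.util.List;

// Hypothetical model of console-mode selection: options are printed as a
// numbered (1-based) list, and the user types either a number or "quit".
public class ConsolePrompt {
    public static String select(List<String> options, String typed) {
        // Typing the quit option stops the collector tool
        if (typed.trim().equalsIgnoreCase("quit")) {
            return null;
        }
        int choice = Integer.parseInt(typed.trim());
        return options.get(choice - 1); // numbered lists are 1-based
    }
}
```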



Related concepts:

Selecting a problem type for IBM Data Collector


Related tasks:

Collecting data in graphical mode with IBM Support Assistant Data Collector

Selecting a problem type for IBM Data Collector

You can use IBM Support Assistant Data Collector to collect information about a number of specific problems. Choose the collector that is appropriate for the troubleshooting issue that you are trying to solve.

Generic problem collector

Use this collector only if the product installed successfully, no server processes are running, and a server profile exists. The generic problem collector gathers configuration information and logs from a profile.

Log and trace collector

Use this collector only if the product installed successfully. This utility collects log and trace files for a profile. It is faster than the generic problem collector.

Migration problem

Use this collector if a problem occurs during migration. Before you can use this collector, the product must have installed successfully, a server profile must exist, and no server processes can be running.

Installation problem

Use this collector if a problem occurs during installation. Before you can use this collector, IBM Installation Manager must have installed successfully.

Profile wizard problem

Use this collector if you encounter a problem when creating or augmenting a profile. Before you can use this collector, IBM Business Process Manager must have installed successfully.

Reproduce problem with tracing level set

Use this collector to set trace levels to reproduce a problem. The utility starts and stops the server.

Be aware that this collector changes the server tracing settings and then attempts to restore the original settings. If you do not want these changes made to your server, do not run this collector.

Before you can use this collector, IBM Business Process Manager must have installed successfully. A profile must exist and you must be able to start it successfully.



Related tasks:

Collecting data in graphical mode with IBM Support Assistant Data Collector

Collecting data in console mode with IBM Support Assistant Data Collector


Troubleshooting Service Component Architecture processing and call chains

Cross-Component Trace identifies whether a Service Component Architecture (SCA) operation completed successfully. It allows you to identify SystemOut.log or trace.log data that is associated with BPM and WebSphere Enterprise Service Bus modules and components. The log records associated with the WebSphere Enterprise Service Bus applications hold information about errors or events that occurred during processing, and can be used for problem determination in IBM Integration Designer.

Events that can be captured include:

You can access the Cross-Component Trace page from the administrative console by clicking Troubleshooting > Cross-Component Trace.


Handling and deleting collected data

Consider the following with regard to handling and deleting data collected by Cross-Component Trace:


Cross-Component Trace settings and call chain processing

The information in this section describes the effect that Cross-Component Trace configuration settings have on call-chain processing. It also includes a description of various Cross-Component Trace configurations and explains the call chain events that result from the configurations.

The following list includes general rules on call chain processing and Cross-Component Trace configuration decisions:

The following illustration is of two servers (Server A and Server B), both with Cross-Component Trace enabled. Server A has the Trace all value set to "Yes", while Server B has Trace all set to "No".

Figure 1. A remote messaging and remote support topology

For the Cross-Component Trace configuration scenario illustrated in Figure 1, call chain events would result on Server A, but not on Server B.

The following illustration is of two servers (Server A and Server B), both with Cross-Component Trace enabled. Server A has the Trace all value set to No and it includes Module A as a module on which to enable Cross-Component Trace. Server B has Trace all set to No and has no SCA modules selected for Cross-Component Trace.

Figure 2. A remote messaging and remote support topology

For the Cross-Component Trace configuration scenario illustrated in Figure 2, call chain events would result on Server A. Trace activity for all Module A operations is written to the log on Server A. Any calls made from Module A to applications or services on Server B result in call chains. The call chains on Server B pertain only to calls that came from Module A (because that module is configured for Cross-Component Trace).



Related concepts:

Cross-Component Trace


Work with failed events in BPM

If an event fails, it is stored in a database and displayed in the failed event manager. Use the failed event manager to search for and handle failed events.

Actions for handling failed events include examining the types of data associated with the event (business, trace, or expiration data) to determine the cause of the failure. Actions also include editing the data, resubmitting the event, or both.

The following types of failed events can occur:

To view, modify, resubmit, or delete any failed event, the first step is to display the failed event manager. Click Servers > Deployment Environments > env_name > Failed Event Manager.



Security considerations for recovery

If you have enabled security for your IBM Business Process Manager applications and environment, it is important to understand how role-based access and user identity affect the Recovery subsystem.

Role-based access for the failed event manager

The failed event manager uses role-based access control for the failed event data and tasks. Only the administrator and operator roles are authorized to perform tasks within the failed event manager. Users logged in as either administrator or operator can view all data associated with failed events and can perform all tasks.

Event identity and user permissions

A failed event encapsulates information about the user who originated the request. If a failed event is resubmitted, its identity information is updated to reflect the user who resubmitted the event. Because different users logged in as administrator or operator can resubmit events, these users must be given permissions to the downstream components required to process the event.
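The identity behavior described above can be sketched as follows. This is a minimal model of the documented behavior, not product code; the class and field names are illustrative.

```java
// Sketch: a failed event records the originating user, and resubmitting
// the event replaces that identity with the resubmitting user. That user
// therefore needs permissions on the downstream components that process
// the event.
public class FailedEventIdentity {
    private String user;

    public FailedEventIdentity(String originator) {
        this.user = originator;
    }

    public void resubmit(String resubmittingUser) {
        // Identity information is updated to reflect the resubmitter
        this.user = resubmittingUser;
    }

    public String currentIdentity() {
        return user;
    }
}
```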

For more information about implementing security, see Securing applications and their environment.



Finding failed events

Use the failed event manager to help you search for failed events on all of the servers within the deployment environment. You can search for all failed events, or for a specific subset of events.

One method of finding failed events is to click Get all failed events on the failed event manager main page; this action returns a list of all Service Component Architecture (SCA) and Java™ Message Service (JMS) failed events on the server. If Business Process Choreographer is installed, the query also returns failed, terminated, and stopped Business Process Choreographer events. You can use this search option when you have only a few failed events on the system, or when it is not necessary or possible to narrow the search results by criteria.

When you have a large number of failed events on the system, or when you want to examine only those events associated with a particular set of criteria, use a criteria-based search. The failed event manager supports searching based on the module, component, method, time period, session, qualifier, or exception associated with the failed event. Refer to the following table for more information about how and when to use the different criteria.


Search criteria

Search criteria Field or fields to use Supported event types Usage notes
The module, component, or method the event was en route to when it failed. Module

Component

Operation

SCA

JMS

WebSphere MQ

Business Process Choreographer

Business Flow Manager hold queue

Use one or more of these fields to search for failed events associated with a specific module, component, or method.
The time period during which the event failed From date

To date

SCA

JMS

WebSphere MQ

Business Process Choreographer

Business Flow Manager hold queue

Formats for date and time are locale-specific. An example is provided with each field. If the value you provide is not formatted correctly, the failed event manager displays a warning and substitutes the default value for that field.

The time is always local to the server. It is not updated to reflect the local time of the individual workstations running the administrative console.

The session in which the event failed Session ID SCA None
The module or component from which the event originated Source module

Source component

SCA Use one or both of these fields to find only those failed events that originated from a specific source module or component. The failed event manager determines the source based on the point of failure, regardless of interaction type.
The type of business object in the failed event Business object type SCA None
Whether the event had the event sequencing qualifier specified Event sequencing qualifier SCA None
Whether the event caused the store to be started Store and forward qualifier SCA

Business Process Choreographer

None
Whether the event was caused because a failure response could not be sent to Business Process Choreographer Process response qualifier SCA None
The exception thrown when the event failed Exception text SCA Specify all or part of the exception text in the field to find all events associated with that exception.
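The note above about the From date and To date fields being locale-specific can be seen with the standard Java date classes: the same string parses successfully under one locale and fails under another. This is an illustration of locale-sensitive date parsing in general, not of the failed event manager's own validation code.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.util.Locale;

// Demonstrates that a SHORT-style date string is valid only for locales
// whose default pattern matches it (e.g. M/d/yy for en_US, dd.MM.yy for de_DE).
public class LocaleDates {
    public static boolean parses(String text, Locale locale) {
        DateFormat fmt = DateFormat.getDateInstance(DateFormat.SHORT, locale);
        fmt.setLenient(false); // reject out-of-range field values
        try {
            fmt.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }
}
```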

For more information about searching for failed events, see the online help available in the failed event manager.


Finding business process instances or Common Base Events related to a failed event

If a failed event is generated from a business process, the failed event manager provides a link to view that business process instance in Business Process Choreographer Explorer. In addition, a failed event can be related to one or more Common Base Events; in this case, the failed event manager has a link to those related events. A common session ID links the failed event with related business process instances or Common Base Events. Examining these related process instances or events can give you additional information about how or why the event failed.



Related tasks:

Troubleshooting Service Component Architecture and WebSphere MQ communications


Manage failed SCA events

When problems processing a Service Component Architecture (SCA) request or response message create a failed SCA event in the Recovery subsystem, you must decide how to manage that event. Use the information in this topic to help you identify and fix the error and clear the event from the Recovery subsystem. A Service Component Architecture (SCA) event is a request or response that is received by a service application. It might come from an external source, such as an inbound application adapter, or an external invocation to a web service. The event consists of a reference to the business logic that it invokes and its data, which is stored in a Service Data Object. When an event is received, it is processed by the appropriate application business logic.

A single thread of execution might branch off into multiple branches. The individual branches are linked to the main invoking event by the same session. If the business logic in a branch cannot run due to system failure, component failure, or component unavailability, a failed event is created. If multiple branches fail, a failed event is created for each branch. The Recovery service handles the following types of failed SCA events:

The Recovery service does not handle failures from synchronous invocations.

The Recovery service sends failed SCA asynchronous interactions to failed event destinations that have been created on the deployment environment bus. The data for failed events is stored in the failed event database (by default, WPCRSDB). You can find the data in the failed event manager.

Failed SCA events typically have source and destination information associated with them. The source and destination are based on the failure point (the location where the invocation fails), regardless of the type of interaction. Consider the following example, where Component A is asynchronously invoking Component B. The request message is sent from A to B, and the response callback message is sent from B to A.

This pattern is true for all asynchronous invocations.

Because runtime exceptions are not declared as part of the interface, component developers should attempt to resolve the exception and thus prevent a runtime exception from inadvertently being propagated to the client if the client is a user interface.
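The advice above, that components should resolve runtime exceptions before they reach a user-interface client, can be sketched as a catch at the component boundary. The ServiceRuntimeException class below is a local stand-in for the SCA runtime exception, not the real product API, and the method names are illustrative.

```java
// Sketch: resolve a runtime exception at the component boundary so it is
// not propagated raw to a user interface.
public class ClientBoundary {
    // Local stand-in for the SCA runtime exception type
    static class ServiceRuntimeException extends RuntimeException {
        ServiceRuntimeException(String msg) { super(msg); }
    }

    // Hypothetical downstream call that may fail at run time
    static String invokeService(boolean fail) {
        if (fail) {
            throw new ServiceRuntimeException("backend unavailable");
        }
        return "OK";
    }

    public static String callForUi(boolean fail) {
        try {
            return invokeService(fail);
        } catch (ServiceRuntimeException e) {
            // Resolve here: log, compensate, or map to a user-friendly result
            return "Service temporarily unavailable, please retry";
        }
    }
}
```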

To manage a failed SCA event:

  1. Use the failed event manager to locate information about the failed SCA event, taking note of the exception type.

  2. Locate the exception type in Table 1 to determine the location and possible causes of the error, as well as suggested actions for managing the failed event.

    Failed SCA events

    Exception type Possible cause of error Suggested action
    ServiceBusinessException A business exception occurred during the execution of a business operation. Look at the exception text to determine the exact cause, and then take appropriate action.
    ServiceExpirationRuntimeException An SCA asynchronous message has expired. Set the expiration time using the RequestExpiration qualifier on the service reference.

    Investigate why the service is not responding fast enough.

    ServiceRuntimeException A runtime exception occurred during the invocation or execution of a service. Look at the exception text to determine the exact cause, and then take appropriate action.
    ServiceTimeoutRuntimeException Response to an asynchronous request was not received within the configured period of time. Set the expiration time using the RequestExpiration qualifier on the service reference.

    Investigate why the service is not responding fast enough.

    ServiceUnavailableException An exception was thrown while invoking an external service through an import. Look at the exception text to determine the exact cause, and then take appropriate action.
    ServiceUnwiredReferenceRuntimeException An SCA reference used to invoke a service is not wired correctly. Look at the exception text to determine the exact cause, and then take appropriate action to correctly wire the SCA reference.



Manage failed JMS events

The Java™ Message Service (JMS) binding type and configuration determine whether a failed event is generated and sent to the failed event manager. When problems processing a JMS request or response message create a failed JMS event in the Recovery subsystem, you must decide how to manage that event. Use the information in this topic to help you identify and fix the error and clear the event from the Recovery subsystem. You can use the recovery binding property to enable or disable recovery for each JMS binding at authoring time. You can set the recoveryMode property to one of the following settings:

Recovery for JMS bindings is enabled by default. When it is enabled, JMS failed events are created in the following situations:

In addition, a failed SCA event is created when the ServiceRuntimeException exception is thrown in a JMS binding target component after a retry attempt in JMS.

These failures might occur during inbound or outbound communication. During outbound communication, JMSImport sends a request message and receives the response message. A failed event is generated if the JMS import binding detects a problem while it is processing the service response. During inbound communication, the following sequence of events occurs:

  1. JMSExport receives the request message.
  2. JMSExport starts the SCA component.
  3. The SCA component returns a response to JMSExport.
  4. JMSExport sends a response message.

A failed event is generated if the JMS export binding detects a problem while it is processing the service request.

The Recovery service captures the JMS message and stores it in a Recovery table in the Common database. It also captures and stores the module name, component name, operation name, failure time, exception detail, and JMS properties of the failed event. You can use the failed event manager to manage failed JMS events, or you can use a custom program.

To disable recovery, set the recoveryMode property to unmanaged.

If the recoveryMode property is missing, as is the case for applications created with earlier versions, recovery is regarded as enabled. When recovery is disabled, a failed message is rolled back to its original destination and tried again. The system does not create a failed event.
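The recovery decision described above can be sketched as a simple function: a missing recoveryMode property counts as enabled, and only the unmanaged setting disables recovery. The return values here are illustrative labels, not the product API.

```java
// Sketch of the JMS binding recovery decision on message failure.
public class RecoveryMode {
    public static String onFailure(String recoveryMode) {
        // A missing property (earlier application versions) counts as enabled
        if (recoveryMode == null || !recoveryMode.equals("unmanaged")) {
            return "CREATE_FAILED_EVENT";   // recovery enabled
        }
        // Recovery disabled: message is rolled back and tried again
        return "ROLLBACK_AND_RETRY";
    }
}
```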

To manage a failed JMS event:

  1. Use the failed event manager to locate information about the failed JMS event, taking note of the exception type.

  2. Locate the exception type in Table 1 to determine the location and possible causes of the error, as well as suggested actions for managing the failed event.

    Failed JMS events

    Exception type Location of error Possible cause of error Suggested action
    FaultServiceException Fault handler or fault selector There is malformed data in the JMS message.

    1. Inspect the JMS message and locate the malformed data.
    2. Repair the client that originated the message so it creates correctly formed data.
    3. Resend the message.
    4. Delete the failed event.

    There was an unexpected error in the fault handler or fault selector.

    1. Debug the custom fault selector or fault handler, fixing any errors identified.
    2. Resubmit the failed event.

    ServiceRuntimeException Fault handler The fault selector and runtime exception handler are configured to interpret the JMS message as a runtime exception. This is an expected exception. Look at the exception text to determine the exact cause, and then take appropriate action.
    DataBindingException or DataHandlerException Data binding or data handler There is malformed data in the JMS message.

    1. Inspect the JMS message and locate the malformed data.
    2. Repair the client that originated the message so it creates correctly formed data.
    3. Resend the message.
    4. Delete the failed event.

    There was an unexpected error in the data binding or data handler.

    1. Debug the custom data binding or data handler, fixing any errors identified.
    2. Resend the message.
    3. Delete the failed event.

    SelectorException Function selector There is malformed data in the JMS message.

    1. Inspect the JMS message and locate the malformed data.
    2. Repair the client that originated the message so it creates correctly formed data.
    3. Resend the message.
    4. Delete the failed event.

    There was an unexpected error in the function selector.

    1. Debug the custom function selector, fixing any errors identified.
    2. Resend the message.
    3. Delete the failed event.



Manage failed WebSphere MQ events

A WebSphere MQ event might fail if there is a problem such as a data-handling exception in the WebSphere MQ binding export or import used by an SCA module. When problems processing a WebSphere MQ request or response message create a failed WebSphere MQ event in the Recovery subsystem, you must decide how to manage that event. Use the information in this topic to help you identify and fix the error and clear the event from the Recovery subsystem. You can use the recovery binding property to enable or disable recovery for each WebSphere MQ binding at authoring time. You can set the recoveryMode property to one of the following settings:

Recovery for WebSphere MQ bindings is enabled by default. When it is enabled, WebSphere MQ failed events are created in the following situations:

In addition, a failed SCA event is created when the ServiceRuntimeException exception is thrown in a WebSphere MQ binding target component after a retry attempt in WebSphere MQ.

These failures might occur during inbound or outbound communication. During outbound communication, MQImport sends a request message and receives the response message. A failed event is generated if the WebSphere MQ import binding detects a problem while it is processing the service response. During inbound communication, the following sequence of events occurs:

  1. MQExport receives the request message.
  2. MQExport starts the SCA component.
  3. The SCA component returns a response to MQExport.
  4. MQExport sends a response message.

A failed event is generated if the WebSphere MQ export binding detects a problem while it is processing the service request.

The Recovery service captures the WebSphere MQ message and stores it in the failed event database. It also captures and stores the module name, component name, operation name, failure time, exception detail, and WebSphere MQ properties of the failed event. You can use the failed event manager to manage failed WebSphere MQ events, or you can use a custom program.

To disable recovery, set the recoveryMode property to unmanaged.

If the recoveryMode property is missing, as is the case for applications created with earlier versions, recovery is regarded as enabled. When recovery is disabled, a failed message is rolled back to its original destination and tried again. The system does not create a failed event.

To manage a failed WebSphere MQ event:

  1. Use the failed event manager to locate information about the failed event, taking note of the exception type.

  2. Locate the exception type in Table 1 to determine the location and possible causes of the error, as well as suggested actions for managing the failed event.

    Failed WebSphere MQ events

    Exception type Location of error Possible cause of error Suggested action
    FaultServiceException Fault handler or fault selector There is malformed data in the WebSphere MQ message.

    1. Inspect the message and locate the malformed data.
    2. Repair the client that originated the message so it creates correctly formed data.
    3. Resend the message.
    4. Delete the failed event.

    There was an unexpected error in the fault handler or fault selector.

    1. Debug the custom fault selector or fault handler, fixing any errors identified.
    2. Resubmit the failed event.

    ServiceRuntimeException Fault handler The fault selector and runtime exception handler are configured to interpret the WebSphere MQ message as a runtime exception. This is an expected exception. Look at the exception text to determine the exact cause, and then take appropriate action.
    DataBindingException or DataHandlerException Data binding or data handler There is malformed data in the WebSphere MQ message.

    1. Inspect the message and locate the malformed data.
    2. Repair the client that originated the message so it creates correctly formed data.
    3. Resend the message.
    4. Delete the failed event.

    There was an unexpected error in the data binding or data handler.

    1. Debug the custom data binding or data handler, fixing any errors identified.
    2. Resend the message.
    3. Delete the failed event.

    SelectorException Function selector There is malformed data in the WebSphere MQ message.

    1. Inspect the message and locate the malformed data.
    2. Repair the client that originated the message so it creates correctly formed data.
    3. Resend the message.
    4. Delete the failed event.

    There was an unexpected error in the function selector.

    1. Debug the custom function selector, fixing any errors identified.
    2. Resend the message.
    3. Delete the failed event.



Manage stopped Business Process Choreographer events

Stopped events occur if a Business Process Execution Language (BPEL) instance encounters an exception and one or more activities enter the Stopped state. Use the failed event manager and Business Process Choreographer Explorer to manage stopped Business Process Choreographer events in any process state. You can view, compensate, or terminate the process instance associated with a stopped Business Process Choreographer event. In addition, you can work with the activities associated with the event, viewing, modifying, retrying, or completing them as appropriate. Business Process Choreographer events might cause an activity to stop or the process instance to fail if they are not handled by the process logic. A failed event is generated when a long-running Business Process Execution Language (BPEL) process fails and one of the following events occurs:

The Recovery service captures the module name and component name for failed Business Process Choreographer events. Failed event data is stored in the Business Process Choreographer database (BPEDB) database.

The Recovery service does not handle failures from business process and human task asynchronous request and reply invocations.

To manage stopped events originating from a long-running BPEL process:

  1. Ensure the administrative console is running.

  2. Open the failed event manager by clicking Servers > Deployment Environments > env_name > Failed Event Manager.

  3. Perform a search to find the stopped Business Process Choreographer event or events you want to manage.

  4. For each stopped event you want to manage, do the following:

    1. Click the stopped event ID in the Event ID column of the Search Results page.

    2. From the event detail page, click Open calling process in Business Process Choreographer Explorer.

    3. Use Business Process Choreographer Explorer to manage the event and its associated activities.



Manage Business Flow Manager hold queue messages

You can use the failed event manager to manage navigation messages that are stored in the Business Flow Manager hold queue. A navigation message might be stored in the hold queue if part of the infrastructure, such as a database, is unavailable, or if the message is damaged.

In a long-running process, the Business Flow Manager can send itself request messages that trigger follow-on navigation. These messages trigger either a process-related action (starting a fault handler) or an activity-related action (continuing process navigation at the activity). A navigation message always contains its associated process instance ID (piid). If the message triggers an activity-related action, it also contains the activity template ID (atid) and the activity instance ID (aiid).
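The structure of a navigation message described above can be sketched as a small value class. The class and field names are illustrative stand-ins; only the IDs themselves (process instance, activity template, activity instance) come from the text.

```java
// Sketch: every navigation message carries the process instance ID;
// activity-related messages additionally carry the activity template ID
// and the activity instance ID.
public class NavigationMessage {
    public final String piid; // process instance ID, always present
    public final String atid; // activity template ID, activity-related only
    public final String aiid; // activity instance ID, activity-related only

    public NavigationMessage(String piid, String atid, String aiid) {
        this.piid = piid;
        this.atid = atid;
        this.aiid = aiid;
    }

    public boolean isActivityRelated() {
        return atid != null && aiid != null;
    }
}
```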

You can use the failed event manager to manage Business Flow Manager hold queue messages, or you can use a custom program.

You cannot delete Business Flow Manager hold queue messages in the failed event manager. If the related process instance does not exist, replay the hold queue message to delete the message.



Work with data in failed events

Each failed event has data about the event to help you identify when and where the failure occurred, including the event ID and status, the time it failed, and its deployment target. In addition, some types of failed events contain business data. You can browse the data for all failed events. In some cases, you can edit the data before resubmitting the event.

To browse failed event data, click any failed event ID. The failed event manager displays the details for that failed event. For SCA events, you can edit the expiration and trace detail information from this detail page. If an event contains business data, the detail page has an Edit business data button. Click that button to open the business data editor, where you can browse and edit the business data. Note that you can edit only simple data types (String, Long, Integer, Date, Boolean). If a data type is complex (an array or a business object), you must navigate through the business data hierarchy until you reach the simple data types that make up the array or business object. Refer to the online help in the failed event manager for more information about viewing and editing this data.
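Navigating a business-data hierarchy down to its editable simple values can be sketched as a recursive walk. Here a Map stands in for a business object and a List for an array; this is an illustration of the traversal described above, not the business data editor's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: descend through nested business objects (Map) and arrays (List)
// until only simple, directly editable leaf values remain.
public class BusinessData {
    public static List<Object> simpleValues(Object node) {
        List<Object> leaves = new ArrayList<>();
        if (node instanceof Map<?, ?> bo) {          // business object
            for (Object child : bo.values()) {
                leaves.addAll(simpleValues(child));
            }
        } else if (node instanceof List<?> array) {  // array
            for (Object child : array) {
                leaves.addAll(simpleValues(child));
            }
        } else {
            leaves.add(node);                        // simple type: editable
        }
        return leaves;
    }
}
```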


Data associated with the failed event

All failed events have the following data:

In addition, the following types of events have data specific to the event type:

Failed event data based on event type

Event type Available data
SCA events

  • The session ID
  • The type of service invocation used between SCA components
  • The names of the module and component from which the event originated (the source)
  • The names of the destination module, component and method for the event
  • Whether an event sequencing qualifier has been declared for this event
  • The destination module where the event has been or will be resubmitted
  • The correlation ID, if one exists
  • The exception thrown when the event failed
  • The expiration date for resubmitted events (this data can be edited)
  • The trace control set for the event (this data can be edited)

JMS events

  • The type of service invocation used
  • The names of the destination module, component and method for the event
  • The exception thrown when the event failed
  • The destination module where the event has been or will be resubmitted
  • The correlation ID, if one exists
  • The expiration date for resubmitted events (this data can be edited)
  • The JMS-specific properties associated with the failed event:

    • The message type and priority
    • The JMS destination
    • The delivery mode
    • Redelivery data, including the redelivered count and redelivered indicator (true or false)
    • The destination replies are sent to for request-response or two-way interactions

WebSphere MQ events

  • The type of service invocation used
  • The names of the destination module, component and method for the event
  • The exception thrown when the event failed
  • The destination module where the event has been or will be resubmitted
  • The correlation ID, if one exists
  • The expiration date for resubmitted events (this data can be edited)

  • The WebSphere MQ-specific properties associated with the failed event:

    • The message type, format, and priority

    • The WebSphere MQ destination
    • The delivery mode
    • Redelivery data, including the redelivered count and redelivered indicator (true or false)
    • The reply-to queue and queue manager

Business Process Choreographer events

  • The names of the destination module and component for the event
  • The process instance name associated with the event
  • The top-level process ID associated with the event

Business Flow Manager hold queue events

  • The process instance ID (if the process instance does not exist, 0 is returned)
  • The name and state of the process instance
  • The name of the associated process template
  • The activity instance name and ID
  • The activity template ID


Business data

SCA and Business Process Choreographer failed events typically include business data. Business data can be encapsulated in a business object, or it can be simple data that is not part of a business object. Business data for SCA failed events can be edited with the business data editor available in the failed event manager.



Resubmitting failed events in BPM

You can resubmit a failed event in BPM from the failed event manager. You can resubmit an event without changes, or, in some cases, you can edit the trace and expiration data or the business data before you resubmit the event. In addition, you can use the failed event manager to resubmit failed events with a process response qualifier to either the request queue or the response queue.

When a failed event is resubmitted, the processing resumes only for the failed branch and not for the entire event. You can use the unique ID of the event to track its success or failure. If a resubmitted event fails again, it is returned to the failed event manager with its original event ID and an updated failure time.

If you have modified the trace control value, you can also trace resubmitted SCA events to monitor the event processing.

If one of the following conditions occurs while an application is running in a unit-test environment or a production environment, you cannot resubmit the failed event:



18. Development toolkit troubleshooting

These sections describe problems you might encounter while you are using the Business Monitor development toolkit.

You can find the latest troubleshooting tips as well as information about downloads and fixes in the Support and downloads tab of the following page: IBM Business Monitor and WebSphere Business Monitor detailed system requirements.


Business Monitor server fails to start in the test environment

If you used a non-administrative ID to install an IBM Business Monitor server in a Windows installation of IBM Integration Designer or IBM Rational Application Developer, the server might fail to start in the Servers view.

If the server fails to start, you must first restart Integration Designer or Rational Application Developer using a user account that has administrative authority.

  1. In Integration Designer or Rational Application Developer, select File > Exit to exit the application.

  2. Open Windows Explorer and navigate to the location where Integration Designer or Rational Application Developer is installed.

  3. Launch Integration Designer or Rational Application Developer again by completing one of the following steps:

    • On Windows XP, right-click the Integration Designer or Rational Application Developer executable and select Run As to open the Run As window, then specify the name of a user account that has administrative authority and click OK.
    • On Windows Vista or Windows 7, right-click the Integration Designer or Rational Application Developer executable and select Run As Administrator.

  4. In the Servers view, select the Business Monitor server and click the Start icon.

The server should start successfully. The next time you launch Integration Designer or Rational Application Developer, you can start it in the usual way; you do not need to use the Run As or Run as Administrator menu item again.



Java EE project generation results in an out-of-memory condition

When you generate Java EE projects for very large monitor models, you might experience an out-of-memory condition in the Eclipse workspace of IBM Integration Designer or in IBM Rational Application Developer.

To resolve these out-of-memory conditions, increase the Java maximum heap size for Eclipse.

  1. Edit the eclipse.ini file located in the installation path of Integration Designer or Rational Application Developer.
  2. Change the Xmx parameter to a larger value, such as -Xmx1280m, -Xmx1408m, or -Xmx1536m.

  3. Restart Integration Designer or Rational Application Developer and then create a new workspace.
  4. Generate the Java EE project for your monitor model.


Additional information about increasing the maximum heap size is found in the technote How to avoid "Out of Memory" error in Headless WebSphere Integration Developer environment.
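After step 2, the JVM-argument portion of the eclipse.ini file typically looks like the following. This is a sketch; the other entries in the file vary by installation and version, and only the -Xmx value needs to change:

```
-vmargs
-Xms512m
-Xmx1536m
```

Note that in eclipse.ini each option must be on its own line, and JVM arguments such as -Xmx must appear after the -vmargs marker.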



Server does not appear in the Servers view

After you have installed the IBM Business Monitor development toolkit, the IBM Business Monitor server should appear in the Servers view in Rational Application Developer or Integration Designer. If it does not, verify that the profile was created, and restart Rational Application Developer or Integration Designer using the -clean parameter.

  1. Verify the profile was created in the profile_root directory.

  2. Start Rational Application Developer or Integration Designer with the -clean parameter.

    1. Open a command prompt and navigate to the directory where Rational Application Developer or Integration Designer is installed.
    2. Type one of the following commands depending on the development environment:

      • Rational Application Developer: eclipse.exe -clean
      • Integration Designer: wid.exe -clean

  3. If the server still does not appear, create a new server following the instructions in Add a new server to the test environment.




Manually removing workspace server configuration

To avoid problems with unresolved project resources, you should not continue to use the Business Monitor development toolkit workspace after the toolkit has been uninstalled. When you remove the Business Monitor development toolkit using IBM Installation Manager, IBM Business Monitor server configurations remain in the workspace. If you want to use the workspace for non-toolkit projects, you should first remove any remaining toolkit server configurations. After the uninstallation completes successfully, use the following procedure to delete the configurations from the workspace.

  1. In the development application, select Window > Show View > Servers.

  2. In the Servers view, delete the server configurations that are applicable to the environment.
  3. When the Delete Server confirmation message is displayed, make selections pertaining to the deletion of actively running servers and click OK.



WebSphere Business Monitor TechNotes


Visual model testing returns unexpected results

When you are testing the visual model in the Monitor Model editor, you might get unexpected values if you use expressions containing user-defined XML Path Language (XPath) functions with certain dependencies. These functions might not evaluate as expected if they require database access, need to run in a Java™ 5 Platform, Enterprise Edition container, have dependencies on other JAR files, or are not capable of executing within a development environment for any other reason. The problem occurs because the Monitor Model editor does not have the underlying infrastructure to support these dependencies. The user-defined XPath functions will work as expected when you run the functions on the IBM Business Monitor server or within the test environment.



Code generation fails when generating Java EE projects

Event generation sometimes produces very long file names for Common Base Events. If you use a Common Base Event with a very long file name, you cannot generate the Java EE projects: when you right-click and select Generate Monitor Java EE Projects, code generation fails with a code generation exception. If you click Details, the first line contains Class 'eventDefinitionList' not found.

To solve this problem:

  1. In the Project Explorer, under Event Definitions, select the event definition name and click Rename. Type a new name for the Common Base Event file.

  2. In the Monitor Model editor, click the Event Model tab and locate the Common Base Event file that you renamed. Click Remove. Click Add, browse to the new name, and click OK.



Deployed monitor models are not startable

A deployed monitor model does not show up as startable or started in the Integration Designer or Rational Application Developer servers view or in the WebSphere Application Server administrative console if the monitor model name or path name is too long.

If you try to deploy a monitor model with a long name, the monitor model does not show up as startable or started. An I/O exception occurs when loading the deployment descriptor from the deployed EAR file, which resides in a deep directory because of the long monitor model name. This exception occurs because Windows only supports 259 characters in any given path. Any path longer than 128 characters at creation time will likely cause problems by the time the model is deployed.

The exception is found in the SystemOut.log file and resembles the following:

00000075 MMIntegration E com.ibm.wbimonitor.lifecycle.
MMIntegration_General bindJMS_BATCH CWMLC0652E:
Runtime exception has occurred. Integration work flow will stop.

There are two possible ways to solve this problem. Either shorten the path to the temporary directory used by the Monitor test environment server, or shorten the name of the deployed monitor model.
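Before choosing either option, you can estimate whether a candidate path is at risk. The following Python sketch applies the two limits mentioned above; the function name and thresholds' interpretation are illustrative, not part of any IBM tooling:

```python
MAX_WINDOWS_PATH = 259     # Windows path-length limit described above
SAFE_CREATION_LIMIT = 128  # paths longer than this at creation time tend to fail later

def check_deploy_path(path: str) -> str:
    """Classify a deployment path as 'ok', 'risky', or 'too long'."""
    if len(path) > MAX_WINDOWS_PATH:
        return "too long"
    if len(path) > SAFE_CREATION_LIMIT:
        return "risky"
    return "ok"

print(check_deploy_path(r"C:\tmp"))  # a short temporary directory passes
```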


Shortening the path to the temporary directory

To shorten the path to the temporary directory used by the server:

  1. Completely remove the monitor model, following all the steps in Remove the monitor model.

  2. Create a directory with a short name on the C drive, for example, C:\tmp.

  3. Start Rational Application Developer or Integration Designer and open the Servers view. Right-click the Monitor test environment server and click Administration > Run administrative console.

  4. Navigate to Servers > Server types > WebSphere application servers. Click server1.

  5. Under Server infrastructure, open Java and Process Management. Click Process Definition. Click Java Virtual Machine.
  6. In Generic JVM arguments, type:

      -Dworkspace.user.root=C:\tmp

  7. Click OK. Click Save.

  8. In the Business Monitoring perspective, in the Project Explorer, right-click the monitor model and click Generate Monitor Java EE projects. Add the monitor model EAR file to the server, following the steps in Add projects to the Monitor test environment server.

When the server restarts, it uses the C:\tmp directory as the Monitor test environment server temporary directory.


Shortening the name of the deployed monitor model

To shorten the name of the deployed monitor model:

  1. Completely remove the monitor model, following all the steps in Remove the monitor model.

  2. In the Business Monitoring perspective, in the Project Explorer, right-click the monitor model and click Generate Monitor Java EE projects.

  3. In the Generate Monitor Java EE Projects window, shorten the names for the Model Logic, Moderator, and Application files.

  4. Add the monitor model EAR file to the server, following the steps in Add projects to the Monitor test environment server.



Top-level monitoring context is not created

If you generated a monitor model from an Integration Designer application and your process is running and emitting events, but a top-level monitoring context is not created when you think it should have been, you might not have created all the required events. This problem can also result in many No parent found exceptions. It occurs when the creation event for the top-level monitoring context is sent after the other events that are being monitored. IBM Business Monitor tries to deliver the earlier events but cannot find a monitoring context instance to deliver them to, so it performs whatever action you specified for No matching monitoring context instances are found, which might include retrying delivery a number of times, ignoring the event, or generating an exception. Choosing the option to retry delivery can help if the problem is just a timing issue, with some events arriving before the creation event because the events were sent out of order. In general, however, the creation event for the top-level monitoring context must arrive before any other events, including the creation events for any nested monitoring contexts.

To solve this problem:

  1. In the Monitor Model editor, right-click the top-level monitoring context and click Update from Application > application name, where application name is the application from which you generated the monitor model. Locate the component-level entry event for the component implementation that you are monitoring, and add it to your monitor model. Component-level events are represented by the blue cog icon that can be found in the event source tree within the wizard. Component-level events typically consist of ENTRY, EXIT and FAILURE events.

  2. In the Project Explorer, right-click the monitor model and click Generate Monitor Java EE projects.

  3. Add the updated project to the server.



Metrics are not being evaluated correctly

If you have a metric that you think is not being evaluated correctly at run time, and the metric receives its value from an expression that references itself or another metric, check that all reference metrics are being initialized correctly before they are used.

For example, you could have a metric that references itself so that its value can be incremented or used in the calculation of its future value, or you could have a metric that is calculated using another metric. If one of the referenced metrics is not set initially, its value will always be null because it does not have a value to use in the initial calculation.

To solve this problem when a metric references itself, make sure the metric has a default value. If the metric uses a calculation that involves another metric, you can either set a default value for the referenced metric or add a guard condition such as if (fn:exists(someMetric)) then (someMetric + someOtherMetric) else someOtherMetric to make sure the referenced metric has a value.
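The null-propagation behavior can be illustrated outside the monitor runtime with a small Python analogy (the function names are illustrative; in the model itself the guard is the XPath fn:exists expression shown above):

```python
def update_metric(current, delta):
    # Mimics a self-referencing metric with no default value:
    # null (None) propagates through the calculation forever.
    if current is None:
        return None
    return current + delta

def guarded_update(current, delta):
    # Mimics the fn:exists guard: fall back to a starting value
    # when the metric has not been initialized yet.
    return (current if current is not None else 0) + delta

m = g = None
for delta in (5, 3):
    m = update_metric(m, delta)    # stays None without a default
    g = guarded_update(g, delta)   # accumulates normally

print(m, g)  # None 8
```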



Monitor models are not created for all imported processes

When you import from WebSphere Business Modeler and two processes in the same project have the same name, only one monitor model is generated for them, even if the processes are in different catalogs. This occurs when you export a project from WebSphere Business Modeler using the IBM Integration Designer export and click Export business measures as a monitor model or models: only one monitor model is generated for two or more processes with the same name in different catalogs.

To solve this problem, you must return to WebSphere Business Modeler. You can either change the name of one of the processes, or complete the following steps:

  1. In WebSphere Business Modeler, switch to Process Server mode.

  2. Click the Technical Specification tab for one of the processes.

  3. Select the Implementation page, and set the Component name to a different name.


Export the project and monitor models again.



Versioning model in test environment fails

After you remove a model with no errors and try to install a new version of that model (with a new time stamp and name) in the test environment, deployment of the model might intermittently fail. When the failure occurs, the exception might read, for example:

AppDeploymentException: [] org.eclipse.jst.j2ee.commonarchivecore.internal.exception.SaveFailureException:
IWAE0017E Unable to replace original archive:
C:\IBM\WID61\wbmonitor\profiles\WBMonSrv_wps\wstemp\0\upload\SCAModelApplicationCSV2.ear IWAE0017E Unable to replace original archive:
C:\IBM\WID61\wbmonitor\profiles\WBMonSrv_wps\wstemp\0\upload\SCAModelApplicationCSV2.ear
Take the following actions to correct the problem:

  1. Delete the contents of the wstemp directory.

  2. Restart the test server.
  3. Reinstall the new version of the model.



IBM Business Monitor server stops unexpectedly in the test environment

If you are using the IBM Business Monitor test environment or a standalone environment where FileStore was selected as the Messaging Engine Repository, the system could stop and you might need to increase the file store size. Check the SystemOut.log file for an error message similar to the following message:

 ObjectStoreFullException: CWSOM1042E:  ObjectStore=AbstractObjectStore(C:\IBM\WID61\wbmonitor\profiles
\WBMonSrv_wps\filestores\com.ibm.ws.sib\WBMonSrv_wps_Node.server1-MONITOR.WBMonSrv_wps_Cell.Bus-263BD65C369C145F
\store\PermanentStore)/1a041a04(ObjectStore) was asked to allocate space for ManagedObject=ManagedObject(null/null)/Constructed/1a001a0(PersistableRawData[ BINARYDATA ])
(ManagedObject) when it was full.

To increase the file store size on the messaging engine:

  1. Stop and restart the IBM Business Monitor server.
  2. Log in to the administrative console and select Service Integration > Buses > Monitor.[cellName].Bus > Messaging Engines.

  3. Open the Messaging Engine.

  4. Under Additional Properties, select Message Store.

  5. Increase the Maximum Permanent Store Size.
  6. Stop and restart the IBM Business Monitor server.



Integrated Test Client cannot process imported events

If an XML file of events begins with a blank line, the Integrated Test Client cannot read the file and therefore cannot emit the events. The only message that shows in the Integrated Test Client console is Emitting events.

To verify that a blank line is the problem, open the Windows Explorer browser and enter %TEMP% in the address line to find the temp directory on the machine. In the temp directory, open the ITCEmitter.log file. You might see a message similar to the following message:

INFO: WBM_ITC_EXCEPTION_OCCURRENCE :: Event Generator error using file: The processing instruction target matching "[xX][mM][lL]" is not allowed. : The processing instruction target matching "[xX][mM][lL]" is not allowed.
org.eclipse.hyades.logging.events.cbe.FormattingException: The processing instruction target matching "[xX][mM][lL]" is not allowed. : The processing instruction target matching "[xX][mM][lL]" is not allowed.

This error occurs because the file is not a valid XML file, so the XML schema validation fails. The file could fail validation for many other reasons as well, but a blank line at the beginning is one thing to check for.

To fix this problem, delete the blank line at the beginning of the file and save your changes. Try to emit the events again using this file in the Integrated Test Client.
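A minimal sketch of the fix in Python (the helper name is illustrative; it simply removes whitespace before the XML declaration, which must be the very first characters of the file):

```python
def strip_leading_whitespace(xml_text: str) -> str:
    # The XML declaration ("<?xml ...") must start at the first character
    # of the document; any preceding blank line causes the
    # '[xX][mM][lL] is not allowed' parsing error described above.
    return xml_text.lstrip("\r\n\t ")

# Example: a file whose content starts with a blank line before the declaration
broken = "\n<?xml version=\"1.0\"?><events/>"
fixed = strip_leading_whitespace(broken)
print(fixed.startswith("<?xml"))  # True
```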



Monitor model debugger terminates after model is changed

If you modify the monitor model and maintain the data from the previous debugging session, the debugger might terminate with an unrecoverable error in the SystemOut log. If you debug a monitor model, make changes, and then debug the same model again and do not select the check box to Clear data from previous debug sessions, the debugger might terminate. In the SystemOut.log, you see an error that begins with R FATAL ERROR: '[ERR XP1031][ERR XPST0008] Variable or parameter '{http://www.ibm.com/xmlns/prod/websphere/ice}target1' is undefined.

This problem occurs because the previous events are no longer valid for the modified monitor model.

To solve the problem:

  1. Terminate the debugging session.
  2. Right-click the monitor model in the Project Explorer view, and click Debug As > Debug.

  3. In the Debug configuration window, right-click Monitor Model in the list and click New to create a new debug configuration. Alternatively, select an existing configuration that you have used before.

  4. Select Clear data from previous debug sessions to empty the input queue of incoming events and remove any remaining monitoring context instances from the previous debugging sessions.

  5. Use the Integrated Test Client to resubmit your events.


To prevent this problem, only retain data between debugging sessions when you are taking an incremental approach to debugging or your test cases are logically connected to one another.



Monitor model debugger fails to start with older monitor model

If you try to debug a monitor model that was created using an earlier version of the Monitor Model editor or using another tool, the monitor model debugger might fail to start. To be debugged, a monitor model must declare XML namespace prefixes for the XML Schema namespace and the XPath functions namespace. These definitions are added automatically if the monitor model is created using the Monitor Model editor.

To solve the problem, add the following to the namespace declaration section at the top of the monitor model:

xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:xs="http://www.w3.org/2001/XMLSchema"



19. Recovering from a failure

Recovering from a failure requires an understanding of standard system processing in the event of a failure, as well as an understanding of how to analyze problems that may be the cause of a failure.



Overview of the recovery process

The recovery process encompasses a set of tasks that include both analysis and procedures.

When you must recover from a failure, these are the high-level steps to follow:



Triggers for recovery

The need for solution recovery can result from a variety of triggers.


Situations from which solution recovery is necessary

Solution recovery is the process of returning the system to a state from which operation can be resumed. It encompasses a set of activities that address system failure or system instability that can be triggered by unforeseen circumstances.

You may need to perform solution recovery activities for the following circumstances:



Assessing the state of the system

The first thing to do when an abnormal condition occurs is to assess the overall system: determine how much of it is still operational and how much has been rendered out of service by whatever external stimulus caused the condition.

Address a predefined set of questions to assess the extent of the outage. The following list provides examples of predefined questions designed to help you gather the appropriate information:

  1. Is this system still performing work?

    Determine whether the system is still operational. Often a system is operational, but because of overload, inappropriate tuning, or both, it is not completing tasks quickly and is attempting work that is failing.

    The litmus test for each of these questions will be specific to the nature of the deployed solution.

  2. What special error handling support is built-in to the application?

    If there is a lot of automated retry and other support logic, the application itself might prevent some errors from becoming visible to the IT operator.

    These conditions must be known and documented for reference by the recovery team.

Things you can do to help assess the state of the system include the following:

  1. Check to see if the server is at least running.

    Do you see the PID, or do you get positive feedback from the deployment manager through the administrative console?

  2. Check to see if there are locks in the database(s) or any unusual database traffic.

    Most databases have facilities for examining locks. Depending on the deployment topology, there may be multiple databases:

    • Messaging Engine Database
    • Business Process Container Database
    • WebSphere Process Server Common Database (Failed Events and Relationship data)

  3. Check to see what the status of the messaging system is.

    Check for events or messages in the following locations:

    • Business Process Choreographer Hold and Retention Destinations
    • Number of failed events
    • Number of messages on the solutions module destinations

  4. Check to see if the database is functioning.

    Can you perform a simple SELECT operation on unlocked data in a reasonable amount of time?

  5. Check to see if there are errors in the database log.

If the database is not working properly, then recovering the database (so that it can at least release locks and perform simple selects) is vital to system recovery.

If the messaging system is not working properly, then recovering the messaging subsystem, so that it can at least be viewed and managed, is also vital to system recovery.

A 'bottom-up' approach is not always conclusive. However, the chances of successful recovery improve when these basic activities are performed.

From these basic procedures and health-check activities, start to look at some specific situations. The following topics describe patterns, give specifics, and provide insight into what is happening under the covers.

Realize that this situational analysis is a read-only activity. While it provides vital information from which to determine the appropriate recovery actions, it should not change the state of the system under review. It is impossible to predict and provide prescriptive actions for all possible causes of a system outage. For example, consider the following decision tree:

There are many broad categories to investigate in the event of an unplanned outage. These broad categories have subcategories, and so on. The prescriptive action for each node and its child nodes depends on the results of each investigation. Because this type of relationship is difficult to convey in document form, it is recommended that you use a support tool such as IBM Guided Activity Assist to walk you interactively through the investigative and decision-making process. As you progress from the top to each child node, it is important to conduct the appropriate level of situational analysis.



Recovery: Analyzing the problem

For all unplanned system events, a set of basic recovery procedures can be leveraged at the point of identification.

There are six well defined steps to situational analysis:

  1. Define the question
  2. Gather information and resources (observe)

  3. Form hypothesis
  4. Perform experiment and collect data
  5. Analyze data
  6. Interpret data and draw conclusions that serve as a starting point for new hypothesis

For each production scenario the symptoms that initiate a recovery action may vary.

It is important to follow the guidelines for situational analysis and take the corrective action relative to the symptoms that are presented.



Situational analysis

Situational analysis is the cyclical execution of the scientific method and can take into account various situations that will initiate a recovery procedure.

The following list describes the different types of situations that initiate a recovery procedure:



Recovery: First steps

Administrators can facilitate solution recovery processes by following a first steps checklist of general practices.

The following list describes actions that you SHOULD NOT TAKE under normal circumstances when trying to recover a solution.

There could be special situations for which you might need to perform some of the actions listed in this topic. However, you should never initiate any of these actions without first consulting with the BPM support organization.

The following list describes the recommended recovery actions.



Failed-event locations: Where does the data go?

For all (production and test) recovery activities there are a finite number of locations in the solution where events accumulate.

By adhering to guidelines and preventive measures described in Planning error prevention and recovery, all business events and associated data will reliably accumulate in one of these locations.

If you do not adhere to sound architectural and application development practices, a percentage of in-flight events may end up in an inconsistent state from which recovery cannot be attained. Under such circumstances (presumably identified during testing cycles), post-recovery investigation and cleanup are necessary to correct the issue so that future recovery activities are completely successful.

To describe the following scenarios accurately, it is important to put the information in the context of a use case.



Use case: recovering data from failed events

A use case provides a context for a recovery scenario. In the use case, a business has an application that receives a request to create a new Account.

The solution consists of multiple modules as recommended through module best practices.

The first module mediates the request and delegates work to an Account Creation process. In Figure 1, the solution is implemented as separate modules, where the request is passed between the mediation module (AccountRouting) and the processing module (AccountCreation) via an SCA import/export. See the following screen capture for an illustration of the two modules.

Figure 1. Assembly diagram of account routing process

From the assembly diagram shown in Figure 1, you can begin to see at which locations in the flow failures might occur. Any of the invocation points in the assembly diagram can propagate or involve a transaction. There are a few areas in the flow where data will collect as a result of application or system failures.

In general, transaction boundaries are created and managed by the interaction (synchronous and asynchronous) between components and import/export bindings and their associated qualifiers. Business data accumulates in specific recovery locations most often due to transaction failure, deadlock or rollback.

Transaction capabilities within WebSphere Application Server help IBM Business Process Manager enlist transactions with service providers. These enlisted interactions are particularly important to understand with respect to import and export bindings. Understanding how imports and exports are used within your specific business cases is important in determining where events in need of recovery accumulate.

An error handling strategy should define interaction patterns, transactions used, and import and export usage before developing the application. The solution architect should identify the preferences to use, and the guidelines that are then used as the application is created. For example, the architect needs to understand when to use synchronous compared to asynchronous calls, when to use BPEL fault handling and so forth. The architect must know whether or not all services can participate in transactions, and for those services that cannot participate, how to handle compensation if problems are encountered.

Additionally, the application shown in the assembly diagram in Figure 1 uses connectivity groups and module development best practices. By leveraging this pattern we now have the ability to stop the inbound flow of new events by stopping the AccountRouting module.

The following sections address the location of business data in the case of failure and recovery.


Business Flow Manager or Human Task Manager

In our business case, we use a BPEL process for AccountCreation process.

With regard to recovery, there are some questions to ask yourself about BPEL and human task management:

  1. What type of process is being run (short running or long running, business state machine, human task)?

    Short running processes are known as microflows.

  2. Is the process developed properly and using fault handling to promote data integrity?
  3. How are the invocation patterns and unit of work properties configured to predict and control transaction boundaries?

Knowing the answers to these questions will affect your recovery strategy for invocations 7 and 8 shown in the assembly diagram, as highlighted in Figure 2.

Figure 2. Assembly diagram of account routing - invocations 7 and 8

Stateful components, such as long-running BPEL processes and business state machines, involve many database transactions where process activity changes and state changes are committed to the database. The work progresses by updating the database, and placing a message on an internal queue that describes what is to be done next.

If there are problems processing messages that are internal to the Business Flow Manager, these messages are moved to a Retention Queue. The system attempts to continue to process messages. If a subsequent message is successfully processed, the messages on the retention queue are resubmitted for processing. If the same message is placed on the retention queue five times, it is then placed on the hold queue.
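The retention-queue behavior described above can be sketched as follows. This is an illustrative simulation, not product code; the RETENTION_LIMIT constant and the resubmission order are assumptions of the sketch.

```python
from collections import deque

RETENTION_LIMIT = 5  # a message placed on the retention queue 5 times is held

def process_internal_messages(messages, handler):
    """Simulate retention/hold queue handling of internal messages.

    messages: deque of message ids; handler(msg) returns True on success.
    Returns (processed, held): successfully processed ids, and ids moved
    to the hold queue after RETENTION_LIMIT retention-queue placements.
    """
    retention = deque()
    retention_counts = {}
    processed, held = [], []

    while messages:
        msg = messages.popleft()
        if handler(msg):
            processed.append(msg)
            # a successful message triggers resubmission of retained messages
            # (resubmitting at the front is an assumption of this sketch)
            messages.extendleft(reversed(retention))
            retention.clear()
        else:
            retention_counts[msg] = retention_counts.get(msg, 0) + 1
            if retention_counts[msg] >= RETENTION_LIMIT:
                held.append(msg)        # moved to the hold queue
            else:
                retention.append(msg)   # kept on the retention queue
    return processed, held
```

A message that keeps failing while surrounding traffic succeeds is retried after each success and ends up on the hold queue after its fifth placement on the retention queue.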

Additional information about viewing the number of messages and replaying messages can be found in Replaying Messages from the Retention Queue / Hold Queue.


Failed event manager

The failed event manager is used to replay events or service invocation requests that are made asynchronously between most component types.

Failed events are created if the AccountRouting component makes an asynchronous call to the SCA Import binding AccountCreationSCAImport and a ServiceRuntimeException is returned.

Failed events are not generated in most cases where a long running BPEL process is the client in the service interaction. This means the invocation for 7 and 8 (as shown in Figure 2) will not typically result in a failed event. BPEL provides fault handlers and other ways to model for failure. For this reason, if there is a ServiceRuntimeException (SRE) failure calling "JDBCOutboundInterface", the SRE is returned to the BPEL for processing. The error handling strategy for the project should define how runtime exceptions are consistently handled in BPEL.

However, failed events are created for asynchronous response messages for the BPEL client if these messages cannot be delivered to the process instance due to an infrastructure failure.

When mediation Service Invoke or Callout primitives are making an asynchronous invocation, retry behavior is defined by the primitive and overrides any asynchronous behavior from the destination. Failed events go to the failed event manager if the fail terminal on the primitive is not wired.

The following diagram (Figure 3) illustrates how the failed event manager component works. Descriptions of the processing associated with each numbered step in Figure 3 follow the figure.

Figure 3. Failed event manager processing


Failed event manager processing

  1. The source component makes a call using an asynchronous invocation pattern
  2. The SCA MDB picks the message up off the SCA destination
  3. The SCA MDB makes the call to the correct target component
  4. The target component throws a ServiceRuntimeException
  5. The SCA MDB transaction rolls back to the SCA destination
  6. The exception information is stored to the failed event manager database with a status of not confirmed
  7. The invocation is retried by the SIBus n number of times

    The initial retry count value for new modules is 0 - one original and 0 retries. Existing modules from previous releases keep the existing retry count value of 4. You can change the retry count value by setting the asynchronous retry count for the modules at design time. Also, administrators can change it at run time using the configSCAAsyncRetryCount command. See Controlling system retries overview.

  8. After the number of retries reaches the specified limit, the message is moved to the failed event manager destination.
  9. The failed event manager database picks up the message
  10. The failed event manager database updates the failed event in the database, and the status is set to failed.
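Steps 4 through 10 above amount to a bounded retry loop. The following is a minimal Python sketch, purely illustrative: RuntimeError stands in for ServiceRuntimeException, and the returned strings stand in for the stored event status.

```python
def deliver_with_retries(invoke, message, retry_count):
    """Sketch of steps 4-10: attempt the asynchronous invocation once,
    retry it retry_count times on failure, then give up."""
    attempts = 1 + retry_count          # one original delivery plus retries
    for _ in range(attempts):
        try:
            invoke(message)             # steps 2-3: deliver to the target component
            return "delivered"
        except RuntimeError:            # step 4: ServiceRuntimeException thrown
            pass                        # steps 5-6: roll back; store event "not confirmed"
    return "failed"                     # steps 8-10: event status set to "failed"
```

With the default retry count of 0 there is a single delivery attempt; with the legacy value of 4 there are five attempts in total before the event is marked failed.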


When are failed events created?

As stated, failed events are neither created for synchronous invocations nor typically for two-way business process interactions.

Failed events are generally created when clients use an asynchronous invocation pattern and a ServiceRuntimeException is thrown by the service provider.

If everything is done synchronously and in the same transaction, data is not collected anywhere; instead it is all rolled back to the client that made the call. Wherever a commit occurs, data collects. If the calls are all synchronous but there are multiple commits, then those commits become an issue.

In general, use asynchronous calls or long-running BPEL processes if multiple transactions are needed. Each asynchronous call is then a point where data can collect, and long-running BPEL processes are themselves a collection point.

Invocation patterns and relationship to the creation of failed events: Service Business Exceptions

Invocation Pattern Failed Event Created Y/N? Notes
Synchronous No Failed events are not created for service business exceptions or when using a synchronous pattern
Asynchronous - One Way No By definition, one-way invocations cannot declare faults, meaning, it is impossible to throw a ServiceBusinessException.
Asynchronous - Deferred Response No Failed events are not created for service business exceptions
Asynchronous - Callback No Failed events are not created for service business exceptions

Invocation patterns and relationship to the creation of failed events: Service Runtime Exceptions

Invocation Pattern Failed Event Created Y/N? Notes
Synchronous No Failed events are not created for service runtime exceptions or when using a synchronous pattern.
Asynchronous - One Way Yes  
Asynchronous - Deferred Response Yes  
Asynchronous - Callback Yes  
BPEL - Two Way No Failed events are not created when the source component is a business process. However, for an asynchronous call, if the response cannot be returned to BPEL, a failed event is created.
BPEL - One Way Yes  

For additional information, review the information center topic titled Manage failed events.

Additional information about viewing and resubmitting failed events can be found in section Resubmitting failed events.


Service integration bus destinations

Messages that are waiting to be processed may accumulate in a few service integration bus (SIBus) destinations. For the most part these destinations are "system" destinations. Messages within these destinations typically are a mixture of three types:


SCA module destination

Again, refer back to our business case.

There would be two SCA module destinations in the solution:

These destinations are created when the module is deployed to an application server or a cluster.

There are rare opportunities for messages to accumulate in these destinations. The accumulation of messages in these locations is a strong indication that there may be a performance problem or an application defect; investigate immediately. It is important to monitor the depth of the module destinations (with your chosen IT monitoring solution), because a backup of messages could lead to a system outage or a prolonged recycle time.

We call these SCA module destinations because the generated name is the module name prefixed with sca/. These destinations are pivotal in the functioning of SCA asynchronous invocations (brokering requests and responses). A varying number of additional destinations are generated during application installation on the SCA.SYSTEM bus, but for the purposes of this discussion we address the importance of the SCA module destination.


Service integration bus retry

As we learned above, the failed event manager has a built-in retry mechanism with the SCA message driven bean (MDB).

Referring to our business case, there are a number of service integration bus destinations created by SCA to support asynchronous communication.

As we have learned, one of these destinations is called sca/AccountRouting. You can adjust the number of retries that happen when a ServiceRuntimeException occurs on an asynchronous service invocation. The number of retries can be controlled by setting the asynchronous retry count for the module at design time or by using the configSCAAsyncRetryCount command at run time. However, you cannot set the value less than 2 in modules with a BPEL process. The second delivery is required to return ServiceRuntimeExceptions back to the BPEL for processing.
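The constraint above can be expressed as a simple validation rule. This hypothetical helper is only a sketch of the rule, not the configSCAAsyncRetryCount command itself.

```python
def validate_async_retry_count(requested, module_has_bpel):
    """Reject retry counts below 2 for modules that contain a BPEL process:
    the second delivery is what returns the ServiceRuntimeException to the
    BPEL process for handling."""
    if module_has_bpel and requested < 2:
        raise ValueError(
            "retry count must be >= 2 for modules with a BPEL process")
    return requested
```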

For more information about retry behavior, see Controlling system retries overview.


System exception destinations

The failed event manager is one place where we can look to administer failures. When dealing with imports and exports that are JMS or EIS based, we must consider another important location.

Destinations on the SCA.Application bus are configured to route failed messages to the service integration bus system exception destination for that bus. Thus, if a JMS export picks up a message from the SCA.Application bus and runs into a rollback situation, the failed message is routed to the service integration bus system exception destination instead of to the process server recovery exception destination. This scenario differs from the failed event discussion above in that a failure to deserialize a message on the SCA.Application bus will not result in a failed event. There is a system exception destination on every bus within the solution. These destinations must be monitored and administered much like the "dead letter queue" common to MQ infrastructures.

Consider the following scenario.

An external JMS client places a message on an inbound queue exposed via a JMS export. The JMS export binding MDB picks up the message for processing. From here, one of two things happens:

  1. The JMS export successfully parses the message and determines which operation on the interface to invoke at which point the message is sent to the SCA runtime for processing.
  2. The JMS export fails to recognize the message body as a valid business object or the JMS export binding deserializes the message body but is unable to determine the appropriate operation on the interface to invoke. At this point the message is placed on the system exception destination for the bus.

We can have this type of failure when trying to receive requests from the AccountRoutingJMSExport (1). This export is a JMS export, so there is a possibility that events can accumulate on the system exception destination on the SCA.Application bus. Use the chosen IT monitoring solution to observe the depth of this destination.


Failed event manager and service integration bus destinations

For IBM Business Process Manager, the exception destination is set to the BPM exception destination queue. This queue follows a naming convention as follows:

Node name: MyNode Server name: server1
Recovery exception destination: WBI.FailedEvent.MyNode.server1
In general, all the destinations created on the SCA.System bus are configured to route failed messages to the recovery exception destination.
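The naming convention shown above can be captured in a small helper function (hypothetical, for illustration only):

```python
def recovery_exception_destination(node_name, server_name):
    """Build a recovery exception destination name following the
    WBI.FailedEvent.<node>.<server> convention shown above."""
    return "WBI.FailedEvent.{0}.{1}".format(node_name, server_name)
```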

When a system failure occurs, in addition to capturing the failed message in this exception destination, the BPM recovery feature also generates a failed event that represents the system error and stores it into the Recovery database as described in the failed event manager section of this document.


Summary

In summary, IBM Business Process Manager provides administrative capabilities above and beyond the underlying WAS platform. Proper measures should be made to understand and use these capabilities along with following the guidance provided in the Planning error prevention section of Plan error prevention and recovery.

Administrative capabilities to help manage failures

Administrative Capability Bundled With IBM Business Process Manager? Summary
Business Process Choreographer Explorer Yes Read/Write/Edit/Delete Access. This is the central place to administer business processes and human tasks.
failed event manager Yes Read/Edit/Delete Access. This is the central place to administer Service Runtime Exceptions and other forms of infrastructure failures.

Service Integration Bus Browser

Yes

Read/Delete. Use the Service Integration Bus Browser on the administrative console for browsing and performing day-to-day operational tasks on service integration buses.

The number of events or records that can be simultaneously administered by these tools is subject to external factors such as memory allocation, result set and database tuning, and connection timeouts. Run tests and set appropriate thresholds to avoid exceptions (OOM, TransactionTimeOut).



Related concepts:

Retention queues and hold queues


Recovery troubleshooting tips

This section provides a list of tips for troubleshooting the recovery process.



Restarting deployment environments

As one step in a recovery process, you may need to restart your deployment environment.


About restarting deployment environments

The procedure to restart a deployment environment varies depending on the topology. Topologies are based on system configuration patterns, each pattern designed to meet particular business requirements.

IBM Business Process Manager supports a set of predetermined deployment environment configuration patterns. If none of the patterns meet your requirements, you can plan and create your own customized deployment environment.

In any given deployment environment configuration pattern there are a number of servers running as JVM processes. In general there are three types of servers as follows:

The deployment patterns differ in how you group and organize all the functional components, so the pattern can address your business requirements in the most cost effective fashion. For more advanced and highly available environments, the servers would reside in clusters that are distributed across physical resources.


General practice for restarting servers as part of a recovery operation

A general model for starting servers is to start the messaging servers first, then the support servers and lastly the BPM servers. Each application architecture may have specific dependencies between application components that need to be taken into consideration.

Shutdown happens in the inverse order of the startup procedure: begin with the application server clusters, and end by shutting down the messaging infrastructure after it has had time to quiesce and process any in-flight transactions.
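The ordering rule above (messaging servers first, then support servers, then BPM servers, with shutdown in reverse) can be sketched as follows; the tier names and server names are illustrative.

```python
# Start messaging servers first, then support servers, then BPM servers;
# shut down in the reverse order. Tier names here are illustrative.
START_ORDER = ["messaging", "support", "bpm"]

def startup_sequence(servers_by_tier):
    """servers_by_tier: dict mapping tier name -> list of server names."""
    order = []
    for tier in START_ORDER:
        order.extend(servers_by_tier.get(tier, []))
    return order

def shutdown_sequence(servers_by_tier):
    """Reverse of the startup sequence."""
    return list(reversed(startup_sequence(servers_by_tier)))
```

Application-specific dependencies between components may still require deviating from this general order.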



Related tasks:

Choose the deployment environment pattern

IBM Business Process Manager and WebSphere Enterprise Service Bus deployment patterns: Selecting the deployment pattern


View the service integration bus

Use the Service Integration Bus browser on the administrative console to view the service integration bus.

Make sure you understand how the Service Integration Bus is used. Each deployment environment has its own bus. The single bus is called BPM.env_name.Bus. The Service Integration Bus Browser provides a single location for browsing and performing day-to-day operational tasks on service integration buses.

Viewing the service integration bus is a useful way to determine whether messages are accumulating on the destinations.

The accumulation of messages on the SCA module destinations is a strong indication that there may be a performance problem or an application defect.

It is a good idea to periodically view the messages and determine whether any messages have become locked for an extended period of time, as this may indicate "indoubt transactions".

  1. From the administrative console, expand Service integration.

  2. Select Buses.

  3. Select the appropriate messaging bus for the service. For example, for a messaging engine that is named DE1Cluster1.000-BPM.DE1, the name of the bus would be BPM.DE1.Bus.

  4. Select Destinations.
  5. Review the relevant information. You should look at the destinations named sca/XYZ, where XYZ is the name of the module.

  6. Select the link text for the destination that you are interested in viewing.

    This will link you to a general properties page for the destination to view.

  7. From the general properties page of the destination, select the Queue points.

  8. From the Queue points page, select the link for the message point.

  9. Select the Runtime tab.

    From this screen you can see the current message "depth" and the threshold.

    Selecting the Messages link lets you view the message contents.

    Ideally, use an appropriate IT monitoring tool and set alert thresholds for these destinations. The threshold value would be established during the performance test phase for the application.

    Messages on a production system should never be deleted unless explicitly directed to do so by the L3 team.
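The monitoring recommendation above can be sketched as a simple threshold check; the destination names and threshold values here are illustrative.

```python
def destinations_over_threshold(depths, thresholds):
    """Compare observed queue-point depths against alert thresholds.

    depths / thresholds: dicts keyed by destination name (e.g. 'sca/XYZ').
    Returns the destinations whose depth meets or exceeds the threshold;
    destinations with no configured threshold are skipped.
    """
    return sorted(dest for dest, depth in depths.items()
                  if depth >= thresholds.get(dest, float("inf")))
```

In practice, the thresholds would come from the performance test phase for the application, and the depths from your IT monitoring tool.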



Related concepts:

Service Integration Bus Browser


Related tasks:

Resolve indoubt transactions

Service integration bus for BPM


Capturing javacore

There are a number of methods that you can use to capture a javacore from an IBM JDK and thread dumps for non-IBM JDKs.


Capturing javacore

A javacore dump, or a thread dump as it is also called, is one of the primary problem determination documents that an application server creates.

  1. Use wsadmin to produce a javacore in the Profile directory:

    1. For Windows:
      <PROFILE_DIR>\bin\wsadmin.bat [-host host_name] [-port port_number]
      [-user userid -password password] -c "$AdminControl invoke [$AdminControl queryNames WebSphere:name=JVM,process=server1,*]
      dumpThreads"

    2. For UNIX (IBM JDKs):
      <PROFILE_DIR>/bin/wsadmin.sh [-host host_name]
      [-port port_number] [-user userid -password password] -c "\$AdminControl invoke [\$AdminControl queryNames WebSphere:name=JVM,process=server1,*]
      dumpThreads"

    The brackets [] around the AdminControl queryNames command are part of the command, and are not used to signify optional parameters as is the case for the brackets around host, port, and user. The process name server1 may need to be changed to fit your configuration.

  2. A signal can be sent to the server process:

    1. Windows:

      A launch script must be used to start the server process to allow the signal to be passed to the process. This does require special setup before starting the server.

      1. <PROFILE_DIR>\bin\startServer.bat server1 -script SERVER1.bat
      2. SERVER1.bat

        The server process will start in a command window. You will need to check the logs to verify the server has successfully started since the intermediate JVM process which usually starts the server process is not used.

      3. <CTRL><BREAK>

        Issue a <CTRL><BREAK> into the command window where the server process is running. A javacore will be produced.

    2. UNIX (all JDKs): kill -3 <pid>

      Where <pid> is the process id of the WebSphere Process Server. For IBM JDKs a javacore will be produced in the <PROFILE_DIR> directory.

      For non-IBM JDKs, a thread dump will be written to the native_stdout.log.

  3. An alternative method to dumping a windows core file is to use jvmdump.

    This does not require special setup before starting the server. However, it does require a special executable file from the JVM team. The jvmdump.exe program can be requested by sending a note to jvmcookbook@uk.ibm.com. The advantage of this method is that additional information can be obtained about native code being executed within the JVM. The format of the dump differs from the IBM javacores.

    • jvmdump.exe <PID>
    • <WAS_HOME>\java\jre\bin\jextract.exe <core.name.dmp>
    • <WAS_HOME>\java\jre\bin\jdumpview.exe

      • set dump <core.name.dmp>.zip
      • display thread

        Displays the current executing thread at the time of the dump

      • display thread *

        Display all of the threads from the dump.

    For more details about the jdumpview utility consult the Diagnostics Guide for the IBM Developer Kit and Runtime Environment, Java™ Technology Edition, Version 5.0.


Configure the hang detection policy


Servers and recovery mode processing

When you restart an application server instance with active transactions after a failure, the transaction service uses recovery logs to complete the recovery process.

These recovery logs, which each transactional resource maintains, are used to rerun any indoubt transactions and return the overall system to a self-consistent state. An indoubt transaction is one that has encountered environmental or other errors during commit processing. Logging occurs for normal inflight transactions, but those log entries are removed upon successful commit processing.

This recovery process begins as soon as all of the necessary subsystems within the application server are available during a server startup. If the application server is not restarted in recovery mode, the application server can start accepting new work as soon as the server is ready, which might occur before the recovery work has completed. This might be acceptable in many cases, but the more conservative option is provided here. To be clear, recovery will run on a server restart even if the server is started in 'normal' start mode.



Related tasks:

Start managed servers

Profile-specific log files


Retention queues and hold queues

When a problem occurs while processing a message, it is moved to the retention queue or hold queue.

You can perform administrative actions on the messages in the retention queue and hold queue using either the administrative console or through scripting.

In some cases, viewing and replaying messages on the retention queue or the hold queue can be part of a recovery procedure.



Related concepts:

Use case: recovering data from failed events

BPEL processes: Recovery from infrastructure failures

Failed event manager console help field descriptions

Manage failed events

Query and replay failed messages, using the administrative console

Query and replay failed messages, using administrative scripts


Business Process Choreographer maintenance and recovery scripts

There are several maintenance-related scripts for Business Process Choreographer. Run these maintenance scripts as part of a general maintenance policy to help maintain database performance, or as part of a recovery process as deemed necessary.

Run these scripts to remove from the database templates (and their associated objects) and completed process instances that are not contained in any corresponding valid application in the configuration repository.

There is also the possibility of having invalid process templates. This situation can occur if an application installation was canceled or not stored in the configuration repository by the user.

IBM Business Process Manager also provides a service that automates Business Process Choreographer cleanup. You can run that service from the administrative console.

Use the following scripts for Business Process Choreographer recovery maintenance:

deleteInvalidProcessTemplate.py

Run this script to delete, from the Business Process Choreographer database, business process templates that are no longer valid. You cannot use this script to remove templates of valid applications from the database. This condition is checked and a ConfigurationError exception is thrown if the corresponding application is valid.

These templates usually have no impact. They are not shown in Business Process Choreographer Explorer.

deleteInvalidTaskTemplate.py

Run this script to delete, from the Business Process Choreographer database, human task templates that are no longer valid. You cannot use this script to remove templates of valid applications from the database. This condition is checked and a ConfigurationError exception is thrown if the corresponding application is valid.

deleteCompletedProcessInstances.py

Run this script when all completed process instances have to be deleted. A top-level process instance is considered completed when it is in one of the following end states:

  • Finished
  • Terminated
  • End
  • Failed

You can specify the criteria to selectively delete top-level process instances and all their associated data (such as activity instances, child process instances, and inline task instances) from the database.

When running these scripts from the command line, make sure the SOAP client timeout is longer than the duration of the requested operation. For more information about this timeout, see "Connection timeout when running a wsadmin script".


Delete an allotment of completed process instances

You can delete an allotment of process instances from the development environment.

Use a script that wraps the provided deleteCompletedProcessInstances.py script.


Example

For example, after altering the script and running the command as:

wsadmin.<bat|sh> -user <USERNAME> -password <PASSWORD> -f loopDeleteProcessInstances.py 2008-04-02T21:00:00 3600
This command will run deleteCompletedProcessInstances.py while increasing the 'completed before' time stamp by one hour (3600 seconds) after every execution.

The deleteCompletedProcessInstances.py script has a time stamp parameter that can be used to control the number of instances being deleted. The smaller the interval, the fewer instances are deleted per invocation of deleteCompletedProcessInstances.py. This can be useful in situations where the deletion of multiple process instances encounters transaction timeouts. The most common causes for transaction timeouts during process deletion involve the following:


BPEL process instances

Use scripts to administer Business Process Choreographer

Delete process templates that are unused

Delete completed process instances

Delete human task templates that are unused

Configure the cleanup service and cleanup jobs

Connection timeout when running a wsadmin script


Resolve indoubt transactions

Transactions can become stuck in the indoubt state indefinitely due to exceptional circumstances, such as the removal of a node causing messaging engines to be destroyed.

Use the procedure to resolve indoubt transactions only if you have tried other procedures (such as restarting the server in recovery mode) unsuccessfully. When a transaction is stuck in the indoubt state, it must either be committed or rolled back so that normal processing by the affected messaging engine can continue.

You can use the administrative console to display the messages causing the problem by Listing messages on a message point.

If there are messages related to an indoubt transaction, the identity of the transaction displays in a panel associated with the message. You can then resolve the transaction in one of the following ways:

You should first attempt to resolve the indoubt transaction using the application server transaction management panels. If this does not work, then use methods on the messaging engine's MBean. These are described in the Procedure section below.

  1. Use the application server transaction management panels to resolve indoubt transactions

    1. Navigate to the transaction management panels in the administrative console

      Click Servers > Application servers > [Content Pane] > server-name > [Container Settings] Container Services > Transaction Service > Runtime > Imported prepared transactions - Review

    2. If the transaction identity appears in the resulting panel, you can commit or roll back the transaction

      Choose the option to roll back the transaction

      If the transaction identity does not appear in the panel, the transaction identity was not enlisted with the Transaction Service on the server. In this case only, you should use methods on the MBean (as described in the next step) to display a list of the identities of the indoubt transactions managed directly by the messaging engine.

  2. Use methods on the messaging engine's MBean to resolve indoubt transactions

    CAUTION:

    Only perform this step if you were unable to display the transaction identity by using the server's transaction management panels

    1. The following methods on the messaging engine's MBean can be used to get a list of transaction identities (xid) and to commit and roll back transactions:

      • getPreparedTransactions()
      • commitPreparedTransaction(String xid)
      • rollbackPreparedTransaction(String xid)

    2. To invoke the methods, you can use a wsadmin command, for example, you can use a command of the following form to obtain a list of the indoubt transaction identities from a messaging engine's MBean:

        wsadmin> $AdminControl invoke [$AdminControl queryNames type=SIBMessagingEngine,*] getPreparedTransactions

      Alternatively, you can use a script such as the following to invoke the methods on the MBean:

      foreach mbean [$AdminControl queryNames type=SIBMessagingEngine,*] {
        set input 0

        while {$input >= 0} {
          set xidList [$AdminControl invoke $mbean getPreparedTransactions]

          set meCfgId [$AdminControl getConfigId $mbean]
          set endIdx [expr {[string first "(" $meCfgId] - 1}]
          set me [string range ${meCfgId} 0 $endIdx]

          puts "---- Prepared Transactions for ME $me ----"
          set index 0
          foreach xid $xidList {
            puts "  Index=$index XID=$xid"
            incr index
          }
          puts "------- End of list ---------"
          puts "Select index of XID to commit/rollback (-1 to continue) :"
          set input [gets stdin]

          if {$input < 0} {
            puts "No index selected, going to continue."
          } else {
            set xid [lindex $xidList $input]
            puts "Enter c to commit or r to rollback XID $xid"
            set input [gets stdin]
            if {$input == "c"} {
              puts "Committing xid=$xid"
              $AdminControl invoke $mbean commitPreparedTransaction $xid
            }
            if {$input == "r"} {
              puts "Rolling back xid=$xid"
              $AdminControl invoke $mbean rollbackPreparedTransaction $xid
            }
          }
          puts ""
        }
      }

      This script lists the transaction identities of the transactions together with an index. You can then select an index and commit or roll back the transaction corresponding to that index.

In summary, to identify and resolve indoubt transactions:

  1. Use the administrative console to find the transaction identity of indoubt transactions.

  2. If a transaction identity appears in the transaction management panel, commit or roll back the transactions as required.

  3. If a transaction identity does not appear in the transaction management panel, use the methods on the messaging engine's MBean. For example, use a script to display a list of transaction identities for indoubt transactions. For each transaction:

    1. Enter the index of the transaction identity of the transaction.

    2. Enter c to commit the transaction

    3. Enter r to roll back the transaction.

  4. To check that transactions are no longer indoubt, restart the server and use the transaction management panel, or methods on the messaging engine's MBean.



Related tasks:

View the service integration bus


Review DB2 diagnostic information

Use a text editor to view the DB2 diagnostic log file on the machine where you suspect a problem occurred. The most recent events are recorded at the end of the file. Review the DB2 diagnostic information when your systems are not working well; for example, it can show whether the transaction log files are full.

On UNIX, type the following command: tail -f /home/db2inst1/sqllib/db2dump/db2diag.log

If the database is unresponsive, you will see something similar to the following:

2008-04-03-11.57.18.988249-300 I1247882009G504    LEVEL: Error
PID     : 16020                TID  : 3086133792  PROC : db2agent (WPRCSDB) 0
INSTANCE: db2inst1             NODE : 000         DB   : WPRCSDB
APPHDL  : 0-658                APPID: 9.5.99.208.24960.080403084643
AUTHID  : DB2INST1
FUNCTION: DB2 UDB, data protection services, sqlpWriteLR, probe:6680
RETCODE : ZRC=0x85100009=-2062548983=SQLP_NOSPACE
          "Log File has reached its saturation point"
          DIA8309C Log file was full.

2008-04-03-11.57.18.994572-300 E1247882514G540    LEVEL: Error
PID     : 16020                TID  : 3086133792  PROC : db2agent (WPRCSDB) 0
INSTANCE: db2inst1             NODE : 000         DB   : WPRCSDB
APPHDL  : 0-658                APPID: 9.5.99.208.24960.080403084643
AUTHID  : DB2INST1
FUNCTION: DB2 UDB, data protection services, sqlpgResSpace, probe:2860
MESSAGE : ADM1823E  The active log is full and is held by application handle
          "274".  Terminate this application by COMMIT, ROLLBACK or FORCE
          APPLICATION.

In the preceding example, the DB line shows that the WPRCSDB database is experiencing full transaction logs.

Another way of viewing the db2diag logs is to log in as the DB2 user and run db2diag:

su -l db2inst1
 db2diag | less 
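To scan the diagnostic log for error-level records without paging through the whole file, you can filter on the LEVEL field. The following sketch uses a hypothetical stand-in file for db2diag.log; in practice, point the commands at the real log path shown earlier.

```shell
# Create a stand-in excerpt of db2diag.log so the example is self-contained.
# In practice, use the real file, for example
# /home/db2inst1/sqllib/db2dump/db2diag.log.
cat > /tmp/db2diag.sample <<'EOF'
2008-04-03-11.57.18.988249-300 I1247882009G504    LEVEL: Error
FUNCTION: DB2 UDB, data protection services, sqlpWriteLR, probe:6680
          DIA8309C Log file was full.
2008-04-03-12.01.00.000000-300 I1247882600G120    LEVEL: Info
MESSAGE : Normal operation resumed.
EOF

# Count the error-level records in the log.
errors=$(grep -c "LEVEL: Error" /tmp/db2diag.sample)
echo "Error-level entries: $errors"

# Show the record header, including the timestamp, for each error.
grep "LEVEL: Error" /tmp/db2diag.sample
```

Each matching header line begins with the timestamp of the event, so you can quickly see when the problem started.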


Interpreting diagnostic log file entries


Process recovery troubleshooting tips

Using Business Process Choreographer Explorer can facilitate process recovery efforts.

The Business Process Choreographer Explorer provides a user interface for administrators to manage BPEL processes and human tasks.

You can use the Business Process Choreographer Explorer to check the status of the Business Process Choreographer database (BPEDB). If you are unable to retrieve database information through the Business Process Choreographer Explorer, or if the Business Process Choreographer is slow to return database information, it might be an indication of a problem with the database.

If you suspect performance or database problems, do not attempt to retrieve thousands of process instances or tasks. Instead, select a view that does not retrieve much data, such as "My Process Templates", or limit the amount of data retrieved for another view.


Repairing BPEL processes and activities

Start Business Process Choreographer Explorer

Business Process Choreographer Explorer overview

Tuning Business Process Choreographer Explorer


About recovering the messaging subsystem

If the messaging system experiences problems, you might need to recover the underlying messaging subsystem.

Typically, recovery involves checking the state of various queues, but it can also include analyzing the service integration bus infrastructure.

Detailed information on recovering the messaging subsystem can be found in the WebSphere Application Server information center.



Related concepts:

Enterprise service bus messaging infrastructure

Troubleshooting service integration message problems


20. Disaster recovery

Disaster recovery consists of the policies and procedures that describe how to recover or continue the technology infrastructure critical to an organization after a natural or human-induced disaster.

Business continuity is an overall plan to keep all aspects of a business functioning in the midst of disruptive events. Disaster recovery is a subset of business continuity, focusing on the technology systems that support business continuity.

Disaster recovery consists of well-defined strategies to back up the primary data center and restore its data to a secondary data center.

The topics in this section provide information about the supported scenarios and configuration for disaster recovery in a production environment that includes IBM Business Process Manager and IBM Business Monitor.



Disaster recovery concepts

When you are planning for disaster recovery, consider the topology of the production environment, the types of data, the scope of the recovery, and the plans for data consistency.


Production environment

The data center of an IT environment typically consists of various systems and environments, such as Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), and Human Resource Management (HRM). The disaster recovery strategy must define the general rules from a high-level point of view, with detailed plans for each system.

Each system might be a complicated combination of software and hardware deployments. The disaster recovery for the system must take all components into consideration to provide a complete solution.

The underlying database to support IBM Business Process Manager, the messaging engine, Business Space that is powered by WebSphere, and IBM Business Monitor are also regarded as part of the production environment. They are included in the same recovery scope because the whole production environment must be in a consistent state during the restoration phase.


Data classification

The production environment contains four types of data.

RAM data

RAM data is the intermediate data kept inside memory.

Installation data

Installation data is the data that is associated with the installation of IBM Business Process Manager and IBM Business Monitor, the underlying database installation, and the operating system data that is related to IBM Business Process Manager and IBM Business Monitor. The installation data does not change after initial installation.

Configuration data

Configuration data is the data that is associated with profile configuration, applications, resource configuration of IBM Business Process Manager and IBM Business Monitor, and related database, and table definitions. The configuration data changes when you install an application, create a profile, generate a new cluster member, or make other configuration changes.

Runtime data

Runtime data is the data that is associated with transaction logs, messages that are saved in the database table, process instance information persisted in the database table, and other persistent business states. Runtime data changes continuously while the production environment is running.

Some kinds of data, such as operating system installation and configuration data, IBM Business Process Manager and IBM Business Monitor installation data, and database installation data, can be rebuilt or reinstalled. Other kinds of data, such as transaction logs, application data, and configuration data for BPM and IBM Business Monitor, must be recovered.

Define your recovery scope, recovery point objective, and recovery time objective goals according to your business needs.


Recovery scope

Recovery scope defines which resources are part of a backup. In this case, the resources include IBM Business Process Manager and IBM Business Monitor configuration, runtime data, and all the customer data, including customer applications and process templates and instances.

Put IBM Business Process Manager and IBM Business Monitor underlying database files and all the profiles into the same volume group or consistency group. Putting the files in the same group affects the sizing result in the disaster recovery plan.


Recovery point objective

The recovery point objective defines how much data you can afford to lose between the original environment and the restored environment. From a business perspective, a smaller recovery point objective means that fewer business transactions are lost, which is critical for normal business operations.

To achieve a smaller recovery point objective, you must increase the frequency with which you back up the production environment. However, also consider the cost and effect of frequent backups on the production environment. The more times you back up, the more copies you must maintain.


Recovery time objective

The recovery time objective defines how long you can wait until the restored environment can continue with normal processing. From a business perspective, you might want to achieve different recovery time objectives that are based on your own business needs.

To define the appropriate recovery time objective, consider the work that must be done during disaster recovery. Increasing the frequency of your backups does not always lead to a smaller recovery time objective. For example, if the server startup takes 20 minutes, you cannot reduce the recovery time below 20 minutes no matter how often you back up; you must instead make the server start faster or use faster hardware.
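As a rough sketch, the achievable recovery time objective is the sum of the fixed recovery steps, none of which shrink with more frequent backups. The figures below are hypothetical; substitute measurements from your own disaster recovery rehearsals.

```shell
# Hypothetical timings, in minutes.
detection=10      # detect the disaster and decide to fail over
restore=25        # restore configuration and runtime data
startup=20        # start the database, deployment manager, and clusters
verification=15   # verify the restored environment

rto=$((detection + restore + startup + verification))
echo "Achievable recovery time objective: $rto minutes"
```

If the computed total exceeds your business target, the fixed steps themselves (for example, server startup) are what must be shortened.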

Part of your comprehensive disaster recovery plan includes determining the recovery point objective and recovery time objective that are based on your real business needs.

Figure 1. Recovery point objective and recovery time objective


Consistency

After a disaster and a successful recovery of the production system from backup, ensure that you have consistent data. For IBM Business Process Manager and IBM Business Monitor, this consistency must apply to all cell members. If one node in a cell is inconsistent, the backup image and restore attempt are invalid.

You must have crash consistency and application consistency:

Crash consistency

The bytes in the restoration match the ones in the primary system at the time of the backup. In a shared, multinode environment, the data for the cluster is assured to be in the same time sequence as the write operations.

Application consistency

When the operating system starts, there are no file system recovery errors. Applications are able to access data from the time of the backup without failure. The applications recover inflight transactions when they are restarted.



Backing up data

A backup system for disaster recovery is a copy of the production environment. The goal of any disaster recovery system is to create a mirror image of the data from the primary data center in a secondary data center. There are several ways to manage a backup system. Each method imposes some constraints on the production environment, and each presents some advantages and disadvantages.

When you are thinking about disaster recovery, focus on this question: "What is the best way to build a disaster recovery solution for the organization’s business process applications?" Base your infrastructure design decisions on the real needs of your business. Before you start to consider how you want to configure a disaster recovery system, make sure that you clearly define your priorities.

Managed by the operating system

You can use the operating system to manage the data replication. Operating system techniques rely on capabilities provided by the operating system to copy data from one location to another. The operating system approach captures the state of a running production system at a specific point in time.

This system is simple to set up if you are just backing up a single server. If you are backing up a distributed production environment, you must set up a shared file system, such as the network file system (NFS) on UNIX.

Managed by a storage area network (SAN)

A storage system such as storage area network (SAN) can be used to provide a central repository of production environment data. The picture shows a typical system in which the deployment manager, messaging engine, application server, support, and database data are all mirrored on a backup system through a storage area network.

Managed by the database

Databases that are designed for enterprise use have features that support high availability scenarios. You can use those features to manage runtime resources such as transaction logs and compensation logs. You can configure your system to store these logs in the database rather than in operating system files. You can use database features such as DB2 HADR or Oracle Data Guard to provide high availability for runtime logs and to automatically replicate transaction logs and compensation logs to a disaster recovery system.

Each backup method has its own constraints, advantages, and disadvantages, as described in the accompanying table.

Comparison of three ways to back up data

Managed by the operating system

Constraints: This type of replication ensures a coherent state across all data that is being replicated. For runtime data, this requirement means that no work can occur throughout the entire production environment during the replication. Business Process Manager servers must be quiesced and stopped.

Advantages: This system is inexpensive and simple to maintain.

Disadvantages: This approach requires a maintenance window, which imposes limits on the server availability and on the recovery point objective. Therefore, it might not be appropriate for business-critical production servers. To minimize your recovery point objective, determine the ideal backup frequency by considering the business requirements and the resources available to take and maintain your backups.

Managed by a storage area network (SAN)

Constraints: The production system can write to only one file system. The SAN must provide the capability to define a single consistency group that contains all replicated volumes, including those volumes that host the database, and the shared file system for the transaction and compensation logs.

Advantages: This replication type is the classic technique for disaster recovery. SAN replication capability is robust and well documented. Depending on the sophistication of the SAN being used, replication points can be very short. This method can be used to manage consistent replication across various managed resources. Most SAN systems support both periodic snapshots and synchronous replication of data to a remote site.

Disadvantages: Requires extra hardware (the SAN), which is not available for all environments.

Managed by the database

Constraints: The feature that supports this approach is introduced in V8.0.1.2 and V8.5.0.1. The database manager must have access to all the data that must be replicated. To use this method, WebSphere Application Server transaction and compensation logs must be stored within the database.

Advantages: Database replication solutions like DB2 HADR and Oracle Data Guard are familiar to many infrastructure teams. Various synchronous and asynchronous qualities of service are possible.

Disadvantages: Configuring all managed resources into the same database can be problematic, depending on the amount of integration present in the application.


Runtime logs in a database: Overview

Store transaction and compensation logs in a relational database to improve high availability support and disaster recovery processes.

The WAS transaction service writes information to a transaction log for every global transaction that involves two or more resources, or that is distributed across multiple servers. These transactions are started or stopped either by applications or by the container in which they are deployed. The transaction service maintains transaction logs to ensure the integrity of transactions. Information is written to the transaction logs in the preparation phase of a distributed transaction. If a WAS with active transactions restarts after a failure, the transaction service is able to use the logs to replay any in-doubt transactions. This implementation allows the overall system to be brought back to a consistent state. See Transaction log file.

The WAS compensation service allows applications on disparate systems to coordinate activities that are more loosely coupled than atomic transactions. It stores information in its own dedicated recovery logs. That information is necessary for compensation after a system failure.

IBM Business Process Manager now provides two ways that you can store these runtime logs in a data recovery system. As in previous releases, the transaction logs can be stored as operating system files. Using that approach, high-availability transaction support requires the use of a shared file system to host the transaction logs, such as Network File System (NFS) or IBM General Parallel File System (GPFS). The shared file system is typically mounted on a storage area network (SAN). Storing runtime data in the operating system files remains a recommended configuration, but now you have another configuration that you can use for high availability.

With the new features introduced in IBM BPM Version 8.0.1.2 and 8.5.0.1, you can choose to store transaction logs and compensation logs in a relational database. This configuration lets you use database features such as DB2 HADR or Oracle Data Guard to provide high availability for runtime logs, and it supports automatically replicating transaction logs and compensation logs to a disaster recovery system. Installation and configuration data can be copied directly from the primary site. All runtime data is persisted in the database. You can use database replication to synchronize runtime data from the primary site to the disaster recovery site, provided that all related runtime data can be configured into the same database.

In the topology that is shown in the accompanying diagram, in each data center, each cluster has two members. To use high availability features, configure all members in the application, messaging, and support clusters to use the database to store transaction logs. During normal processing, all cluster members access their own transaction tables to store transaction information. If one cluster member fails, the high availability manager notifies another member of that cluster to take over the work. Then, the high availability manager starts an automatic peer recovery of the transaction log tables of the failed cluster member.

For database replication, you can be confident of data consistency only if you use a single database. If a single transaction uses two or more databases, there is no way to guarantee data consistency, because the two replication processes cannot be coordinated. Therefore, for BPM Standard edition, configure transaction logs in BPMDB. For process applications in BPM Advanced edition, you can use CMNDB for transaction logs with Oracle databases that use Data Guard. You can use CMNDB with DB2 if the process application uses only BPEL.

Because a single transaction might involve two or three cluster members, make sure that the transaction logs from all cluster members are configured in the database. Place the compensation service logs in the same database.

The figure shows a typical configuration for primary and standby data centers.



Disaster recovery procedures

From the perspective of IBM Business Process Manager and IBM Business Monitor, disaster recovery means the production environment can be restored to the secondary data center through a well-defined replication method.

Disaster recovery for BPM and IBM Business Monitor is supported through disk replication technology. A snapshot of the original production environment is taken, and data is restored and validated in the secondary data center. Optionally, beginning in V8.0.1.2 and 8.5.0.1, you can store transaction and compensation logs in a relational database so you can use the high availability disaster recovery features the database offers.

The following topics provide some guidance in setting up and managing a disaster recovery system.



Configure a disaster recovery backup system

The configuration data of your system describes the BPM environment. Set up the disaster recovery data center to have the same configuration as the primary data center so that any recovery can be complete. These instructions help you set up all of the installation images on a single replication volume. With that implementation, you are able to exactly duplicate the original data center configuration in your disaster recovery data center. If a disaster occurs while you are rolling out a configuration change to the environment, you can continue rolling out the configuration change when you restart the environment in the disaster recovery data center.

  1. Set up a disk replication system.

  2. Add the profile directory, profile_root.

  3. Add the following files from subdirectories of the installation directory, install_root: properties/profileRegistry.xml, properties/fsdb/*, and properties/Profiles.menu.

  4. Add files from the logs directory that might contain errors that are related to profile actions. Those files might be useful in the disaster recovery data center.

  5. If the original data center uses a storage area network (SAN), create an identical directory for mounting the SAN in the disaster recovery data center. Create the profiles in a subdirectory of that mounted directory, /opt/ibm/WebSphere/profiles.
  6. Develop scripts for the recovery system. Actions that create a profile, delete a profile, add a node, or remove a node must also trigger a snapshot to the installation data. You need a snapshot of the installation data for these configuration changes because some of the files that are altered for these changes are contained in the installation data (see step 3).

    1. Develop scripts or procedures for mounting the disk.
    2. Develop scripts to start the administrative processes of your disaster recovery center.
    3. Develop scripts to start your disaster recovery center resources.

  7. Load the disaster recovery site scripts or procedures onto the configuration replication volume.


Do not schedule snapshots of this replication volume. Instead, cause a snapshot to be taken each time one of the configuration images changes. A configuration image changes when configuration changes are saved and when the configuration changes are replicated to a node. Any snapshot that is taken while the configuration changes are "in flight" captures an unusable view of the installation image. When you try to use such a snapshot in your disaster recovery center, you get unpredictable results. To prevent these troublesome snapshots, take a snapshot of the configuration volume every time the configuration is altered.
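The snapshot trigger itself can be as simple as a script that archives the configuration files listed in the steps above to the replication volume whenever a configuration change is saved. This is a minimal sketch; the paths are hypothetical placeholders for your real profile_root, install_root, and replication volume.

```shell
# Hypothetical paths; substitute your real profile_root, install_root,
# and replication volume mount point.
install_root=/tmp/demo_install_root
profile_root=/tmp/demo_profile_root
backup_dir=/tmp/demo_config_backup

# Stand-in directory tree so the example is self-contained.
mkdir -p "$install_root/properties/fsdb" "$profile_root" "$backup_dir"
touch "$install_root/properties/profileRegistry.xml" \
      "$install_root/properties/Profiles.menu" \
      "$install_root/properties/fsdb/default.prf"

# Capture the profile directory plus the installation files that record
# profile configuration (steps 2 and 3 above) in one timestamped archive.
stamp=$(date +%Y%m%d%H%M%S)
tar czf "$backup_dir/config-$stamp.tar.gz" \
    -C / \
    "${profile_root#/}" \
    "${install_root#/}/properties/profileRegistry.xml" \
    "${install_root#/}/properties/fsdb" \
    "${install_root#/}/properties/Profiles.menu"

ls "$backup_dir"
```

Because the archive is created as a single unit after the configuration save completes, it never captures an "in flight" view of the installation image.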



Related concepts:

Environment considerations

Snapshot support

NFS support


Backing up runtime data by using a SAN drive

The runtime data of your system is the information that is stored in transaction logs and compensation logs. You can use a storage area network (SAN) drive to copy files from your primary data center to a standby server. Your runtime data is changing continually. Therefore, it is not reasonable to expect that the backup data center always has the same state as the primary data center, unless you are using synchronous replication. In many environments, synchronous replication is not a valid option because of the performance impact of a synchronous implementation.

The runtime data consists of the WebSphere transaction logs and the compensation logs. Some of the files are associated with the BPM database, and some of the files are associated with any other resource managers. The files of interest are files that reflect the current state of the database tables, the current state of the transactions, and any other data that is managed by the resource that reflects the current state of the resource. These files vary from one implementation to another, depending on the database product or resource manager and vendor that is being used. The set of database tables in this runtime data includes at least all of the tables that are associated with the BPM configuration, such as persistent stores for messaging engines, business process applications, human tasks, and failed events.

Include the data that you require on a disk replication system with the following configuration:

  1. Configure the original data center.

    1. Create directories needed for mounting the SAN, like /opt/ibm/WebSphere/tranlogs on the WebSphere servers and /opt/ibm/WebSphere/database on the database server.
    2. Mount the storage area network (SAN) drive.

    3. Configure the transaction service to use a distributed file system. The SAN creates two volumes, one for the database, and one for the distributed file system, which is mounted on the SAN for its transaction logs. The distributed file system serves high availability by managing file locks for the distributed servers. The SAN serves disaster recovery by providing replication.

    4. Configure the database server to use this mount for its data and log files.

  2. Configure the disaster recovery data center.

    1. Create a directory in the disaster recovery data center similar to the one in the original data center.
    2. Load the disaster recovery site scripts or procedures onto the runtime data replication volume.
    3. Install and configure the database catalog to find the appropriate files.

    The entire set of files for the runtime data must be included in the same snapshot, and that snapshot must be taken at a single instant in time. Your performance needs might require you to place the database log files on different disk arms than the database data, or indicate some other placement needs. Work with your database vendor, your SAN vendor, and your operating system documentation to determine the optimum configuration for your requirements. As you work with your SAN vendor, make sure that the write order is preserved in the snapshot and its replica.


Set a schedule for the snapshot that is taken of the volume. The schedule determines whether you can meet your recovery point objective. For example, if you have a recovery point objective of 30 minutes, capture a snapshot at an interval of just less than 30 minutes. You must consider the time that it takes to actually take a snapshot and transfer it to the disaster data center. Your SAN provider can help you sort out those details.
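For example, with a 30-minute recovery point objective, a hypothetical schedule entry on the storage management host might look like the following crontab fragment; the script name and path are placeholders for whatever snapshot command your SAN vendor provides.

```
# Take a runtime-data snapshot every 25 minutes, leaving time for the
# snapshot to complete and transfer before the 30-minute objective.
*/25 * * * * /opt/ibm/scripts/take_runtime_snapshot.sh >> /var/log/snapshots.log 2>&1
```

Measure how long a snapshot actually takes to complete and transfer before you commit to an interval.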



Related concepts:

Database considerations

Snapshot support

NFS support

Storing transaction logs in a database for high availability

You can configure transaction logs to be stored in a database where you can implement automatic replication and simpler disaster recovery.

For more information about this feature and why it can be useful, read Runtime logs in a database: Overview. Configure the transaction log location and the compensation log location for each server in the cluster before you enable high availability. This solution uses two data centers. One is the primary data center and the other is a standby data center. The installation and configuration data from the primary data center is copied to the standby data center. Database replication is used to synchronize replication of runtime data from the primary database to the standby database.

  1. At the primary site, install BPM on all nodes and create a deployment environment as you would normally.

  2. Set up and configure the database.

    1. Install the database for the primary data center.

    2. Use generated database scripts to create database objects.
    3. Install the database in corresponding standby data center.

    4. Configure the databases to implement data replication between the primary database, and the standby database.

  3. Configure the transaction service.

    1. Start the deployment environment at the primary data center.

    2. For each cluster member in the application cluster and in the support cluster, create a data source at the cluster level. Configure the logs for the transaction service and the compensation service for each cluster member into the database. Use a unique prefix for each member. See the detailed instructions in Storing transaction and compensation logs in a relational database for high availability in the WebSphere Application Server documentation.

  4. When all cluster members in application and support clusters are configured, enable transaction high availability for each application and support cluster. On the Configuration page under General Properties, select Enable failover of transaction log recovery.

  5. Restart the whole environment and make sure there are no exceptions in the system log files. When you start a process server that is configured to store transaction and compensation logs in a database, the transaction service can time out while the service waits for the data source to become available. If that happens, you see this error message:
    WSVR0009E: Error occurred during startup
    com.ibm.ws.exception.RuntimeError: com.ibm.ws.recoverylog.spi.InternalLogException: Failed to locate data source,
    com.ibm.ws.recoverylog.spi.InternalLogException: Failed to locate data source
            at com.ibm.ws.tx.util.WASTMHelper.asynchRecoveryProcessingComplete(WASTMHelper.java:176)
            at com.ibm.tx.util.TMHelper.asynchRecoveryProcessingComplete(TMHelper.java:57)
    If you encounter an error of this sort, increase the timeout value.

    1. Open the administrative console.

    2. Select Servers > Application servers > server name.

    3. Under Server infrastructure, select Java and Process Management > Process Definition.

    4. Under Additional properties, select Java virtual machine > Custom properties > New.

    5. In the Name entry field, type com.ibm.ws.recoverylog.custom.jdbc.impl.ConfigOfDataSourceTimeout.

    6. In the Value entry field, set an integer timeout variable such as 30000 to represent a 30-second timeout. The timeout period is measured in milliseconds.

    7. Select OK.


Devise a high availability and disaster recovery test plan that is appropriate for the business needs of your organization. The plan might include simulating a WAS ND failover and a cross-database failover to ensure that your system provides adequate business continuity.



Restoring data

If a disaster occurs in the primary data center, you can continue to provide business support if you have a valid backup. You restore the backup to the secondary data center and then verify the restored data. Restoration is the process of rebuilding all or part of a backup to the corresponding secondary environment, as shown in Figure 1:

Figure 1. Restoration process

To ensure the consistency of all data, the restoration must occur for the whole cell and underlying database.

To restore the production environment to the secondary environment:

  1. Reinstall the installation data, including the BPM installation data and the IBM Business Monitor installation data.
  2. Reinstall DB2 and create the DB2 instance.
  3. Restore the configuration data to all servers from the backup configuration data.
  4. Restore the runtime data to all servers by replicating the backup runtime data.
  5. Perform changes that are specific to the environment. For example, update the host name to reflect the secondary environment, or change the data source configuration to point to the secondary database.
  6. Validate the connectivity to the resources outside the recovery scope.
  7. To restart the environment, follow these steps:

    1. Start the database server.

    2. Start the deployment manager and node agents.

    3. Start the message servers of IBM Business Process Manager.

    4. Start the support servers of IBM Business Process Manager.

    5. Start the application servers of IBM Business Process Manager.

    6. Start the message servers of IBM Business Monitor.

    7. Start the support servers of IBM Business Monitor.

    8. Start the application servers of IBM Business Monitor.

  8. Verify the restored environment and determine whether it is valid.
  9. Recover inflight transactions.
  10. Redirect load to the new environment.

    You can typically set the same host name and IP address for the secondary environment as for the primary environment. This step depends on your backup policy.
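The restart sequence in step 7 corresponds to the standard WebSphere Application Server commands. The following sketch shows the order; the profile paths and server names are placeholders for your own topology, not values mandated by the product.

```shell
# Step 7 restart order expressed as WAS commands. Profile paths and
# server names below are placeholders; substitute your topology's values.

# 7.2 Deployment manager first, then the node agent on each node
/opt/ibm/BPM/profiles/DmgrProfile/bin/startManager.sh
/opt/ibm/BPM/profiles/Custom01/bin/startNode.sh

# 7.3 - 7.5 BPM messaging, support, then application cluster members
/opt/ibm/BPM/profiles/Custom01/bin/startServer.sh Messaging.Member1
/opt/ibm/BPM/profiles/Custom01/bin/startServer.sh Support.Member1
/opt/ibm/BPM/profiles/Custom01/bin/startServer.sh AppTarget.Member1

# 7.6 - 7.8 Repeat in the same order for the IBM Business Monitor clusters
```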



Verifying restored data

After you restore the backup of the BPM production environment to the secondary data center, verify the data to determine whether the backup is a valid copy. You verify restored data at the system level, the application level, and the process instance level.

A failure, loss of data, or inconsistency at the process instance level can be tolerated. An abnormal state at the system level or application level, however, must be fixed because the backup is invalid.

Generally, verification is relatively simple for the system, module, and application levels. Verification at the instance level can be more difficult because the number of instances might be large. Use a realistic runtime scenario for the disaster recovery test: take the backup while instances are running, and then verify that those instances work properly after restoration.

To verify the restored data in the secondary environment is valid:

  1. Verify the system-level services such as the Business Process Choreographer container and the Human Task Manager container are working properly. Verify the messaging engines for various buses can be started successfully. To perform these verifications, you can use the System Health widget in Business Space.

  2. Verify the modules and applications can be started successfully. Verify the process templates can be started normally.

  3. Verify the process instances are in a consistent state. Some backups might not work properly after restoration. You must identify and discard those backups and use only the valid ones.

    Figure 1. Backup to a remote storage system

  4. Verify the process instance state between IBM Business Process Manager and IBM Business Monitor is consistent.

  5. Verify that synchronous and asynchronous invocation for Service Component Architecture (SCA) can continue for processing.

  6. Verify that you see new instances in your monitor dashboards when you run new process instances.
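Some of these checks can be spot-checked from the command line with wsadmin, which can list the servers and applications that are running in the restored cell. The profile path below is a placeholder for your deployment manager profile.

```shell
# Query running servers and started applications with wsadmin (Jython).
# DMGR_BIN is a placeholder for your deployment manager profile path.
DMGR_BIN=/opt/ibm/BPM/profiles/DmgrProfile/bin

"$DMGR_BIN/wsadmin.sh" -lang jython \
  -c "print AdminControl.queryNames('type=Server,*')" \
  -c "print AdminControl.queryNames('type=Application,*')"
```

Servers or applications that are missing from the output did not start and need attention before the restored environment can be considered valid.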



Installation and configuration considerations

Consider disaster recovery as you install and configure your operating system, databases, and production environment.



Operating system considerations

Verify the basic operating system configurations for the primary and secondary environments are the same or consistent.

The host names of the primary and secondary environments are used in the BPM and IBM Business Monitor configuration data, for example, in the serverindex.xml file.

For a BPM server configuration where distributed transactions must be recovered, mirror the transaction logs on a different server that has the same server name, the same host name, and access to the same resource managers as the original server. Information about each server that is involved in a distributed transaction is stored in the transaction logs. This information includes the server name and the host name of the computer on which the server is running. When a distributed transaction is being recovered, the servers that are involved in the recovery use this stored information to contact each other. Therefore, if a server fails and the logs must be recovered on a new server, that new server must have the same server name and host name as the original server. The new server must also have the same access to the same resource managers, databases, and message queues as the original server.

In the examples that are in the topics that follow, all operating systems are deployed with Red Hat Enterprise Linux.



Snapshot support

To back up the primary environment without affecting normal functioning, you need the additional support of an operating system snapshot.

On the Linux platform, you can use Logical Volume Management (LVM). LVM provides a higher-level view of the disk storage on a computer system than the traditional view of disks and partitions. With LVM, the system administrator has more flexibility in allocating storage to applications and users by demand. The physical volumes of the disk are organized as logical volumes, and the file system is mounted on logical volumes. This organization allows the flexible and dynamic management of the disk size of the file system.

When you enable the snapshot function on the LVM, the file system supports concurrent backup while the file system is undergoing a write operation.

Without snapshot support, the native backup of a large number of files consumes a great deal of time. During this period of time, some files might be updated because transactions are continuing in the production environment, which means the backup contains files saved at different points in time. If any files are in an inconsistent state, the backup is not acceptable.

To support the snapshot functionality through LVM, the Copy on Write mechanism is used. When Copy on Write is used, the following sequence of events occurs:

  1. The snapshot creates a logical copy of the data after the application is frozen for a very short period.
  2. A write request to the original copy of the data results in the system copying the original data to the snapshot disk area before the original copy is overwritten.
  3. A read into the logical copy is redirected to the original copy if the data is not modified. If the data is modified, the read request is satisfied from the snapshot disk area.

The following topics provide information about taking the snapshot:



Prepare the operating system before a snapshot

Before you take a snapshot of the operating system, you create a physical volume and logical volume.

When you prepare for an operating system snapshot, consider the following factors.

To prepare the operating system before you take a snapshot:

  1. List the general information (physical volume, volume group, and logical volume) of the Linux operating system:
    # pvdisplay
    # vgdisplay
    # lvdisplay
  2. List the disk information:

      # fdisk -l

  3. Create a physical volume on the disk partition, in this example /dev/sda2:

      # pvcreate /dev/sda2

  4. Extend the new physical volume to the volume group:

      # vgextend VolGroup00 /dev/sda2

  5. Create a logical volume on the volume group:

      # lvcreate --name homebackup --size 10G VolGroup00

  6. Create a file system on the new logical volume:

      # mkfs.ext3 /dev/VolGroup00/homebackup

  7. Mount the logical volume to the /home directory:

      # mount /dev/VolGroup00/homebackup /home


Take a snapshot of the operating system.



Next topic: Taking an operating system snapshot


Taking an operating system snapshot

As part of your disaster recovery plan, you create a snapshot of the operating system from your primary environment. You then transfer the snapshot to your secondary environment.

Make sure you have completed the steps described in "Preparing the operating system before a snapshot."

The longer you keep a snapshot, the more disk space it takes up. Create snapshots periodically and retain or discard them based on your recovery point objective.
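One way to keep periodic snapshots within your recovery point objective is to prune archived snapshots that fall outside the retention window. The directory, file-name pattern, and retention value in this sketch are illustrative assumptions, not product defaults.

```shell
# Prune snapshot archives older than the retention window implied by
# your recovery point objective. BACKUP_DIR, the name pattern, and
# RETENTION_DAYS are illustrative values; substitute your own.
BACKUP_DIR=${BACKUP_DIR:-/tmp/snapshot-archives}
RETENTION_DAYS=${RETENTION_DAYS:-7}
mkdir -p "$BACKUP_DIR"

# List, then delete, archives whose modification time exceeds the window.
find "$BACKUP_DIR" -name 'homesnapshot-*.tar.gz' \
  -mtime +"$RETENTION_DAYS" -print -delete
```

Running this from cron keeps the archive directory bounded without manual cleanup.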

You can also use other methods to create a snapshot.

To take an operating system snapshot:

  1. Take a snapshot of the /home directory. The snapshot is also a new logical volume:

      # lvcreate -L1G -s -n homesnapshot /dev/VolGroup00/homebackup

    You can also use the GUI tool in the operating system, which, for Red Hat Linux, is Logical Volume Management.

  2. To use the logical volume, create a directory under /mnt to store the snapshot files:

      # mkdir /mnt/homesnapshot

  3. Mount the snapshot logical volume to the new directory:

      # mount /dev/VolGroup00/homesnapshot /mnt/homesnapshot

    If you no longer need a snapshot, unmount it and remove it to save disk space:

      # umount /mnt/homesnapshot
      # lvremove /dev/VolGroup00/homesnapshot


After you take a snapshot, compress it and then FTP it to the secondary environment. On the secondary environment, extract the snapshot files and test them.
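For example, the mounted snapshot can be packed into a single compressed archive before transfer. The sketch below follows the mount point from the steps above; a scratch directory is substituted by default so the sketch is self-contained, and the archive path and naming convention are illustrative assumptions.

```shell
# Pack the mounted snapshot into one compressed archive for transfer.
# In production, SNAP_DIR would be /mnt/homesnapshot from the steps
# above; a scratch path is created here so the sketch runs anywhere.
SNAP_DIR=${SNAP_DIR:-/tmp/homesnapshot}
mkdir -p "$SNAP_DIR"

ARCHIVE="/tmp/homesnapshot-$(date +%Y%m%d).tar.gz"
tar -czf "$ARCHIVE" -C "$(dirname "$SNAP_DIR")" "$(basename "$SNAP_DIR")"
ls -lh "$ARCHIVE"
```

A single dated archive is easier to transfer and to match against your recovery point objective than a tree of loose files.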



Previous topic: Prepare the operating system before a snapshot


NFS support

In a distributed environment, the data of the production environment is spread over several operating systems. Without special configuration, you are likely to get an inconsistent copy of the entire environment at run time, even when you use a snapshot. A consistent copy of the entire environment is required to ensure the proper behavior of the system. To ensure consistency, you can use a Network File System (NFS).

If a snapshot is performed at the operating system level, the snapshot for different operating systems might correspond to the state at different points in time.

When you use a Network File System (NFS), users on a client computer can access files over the network as if the files were on their local server. In this architecture, a file server is configured on one operating system, which functions as the central repository for all files. The NFS client operating system can connect with the file server and mount the specific directory to the file server. The NFS client operates transparently on the directory mapped on the file server.

When NFS is enabled, therefore, the configuration and installation data of the production environment can be configured on a centralized NFS file server. In combination with the snapshot support of the file server operating system, you can create a consistent backup of the entire production system.

Before you create a snapshot, you must set up your NFS server and clients.



Configure the NFS server

The first step in configuring your NFS environment is to configure the NFS server, which functions as the central repository for all files.

The following example shows how to configure your NFS server.

  1. Create the directories that the NFS clients will mount (/home/machine1, /home/machine2, /home/machine3, and /home/machine4).

    Verify these directories have write authority.

  2. Configure the /etc/exports file:
    /home/machine1 *(rw,sync)
    /home/machine2 *(rw,sync,no_wdelay,nohide)
    /home/machine3 *(rw,sync,no_root_squash)
    /home/machine4 *(rw,sync,no_root_squash)

    In this example, the /home/machine3 and /home/machine4 directories will be mounted to the remote managed-node profile directory for IBM Business Monitor.

    You must have the no_root_squash parameter, or you will see an error (cp: failed to preserve ownership) when you create the managed-node profile for IBM Business Monitor.

  3. Before the NFS service starts, the portmap service must be running. To check its status, use the following command:

      # service portmap status

  4. If the portmap service has stopped, use the following command to start it:

      # service portmap start

  5. To start or restart the NFS service, use one of the following commands:
    # service nfs start
    # service nfs restart
  6. To make the NFS service start automatically with the system, use the following command:

      # chkconfig --level 35 nfs on

  7. To check the NFS export directories, use the following command.

      # showmount -e <server_ip>

    You can use this command on both the NFS server and the NFS client.
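If you later change the /etc/exports file, the exports can be refreshed without restarting the NFS service. This assumes the standard exportfs utility shipped with the NFS server packages:

```shell
# Re-export all directories after editing /etc/exports,
# without restarting the NFS service:
exportfs -ra

# Confirm the current export list and options:
exportfs -v
```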


Configure the NFS client.



Next topic: Configure the NFS clients


Configure the NFS clients

The second step in configuring your NFS environment is to configure the NFS clients.

The following example shows how to configure your NFS clients.

For each NFS client:

  1. To mount the corresponding directory to the remote NFS server, use the following commands:
    # mount <server_ip>:/home/machine1 /home/dmgr
    # mount <server_ip>:/home/machine2 /home/db2
    # mount <server_ip>:/home/machine3 /home/custom01

  2. Make these mounts start automatically with the system so that you will not have to run these commands every time that you start your system.
  3. Repeat steps 1 and 2 for all other NFS clients.
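One common way to make the mounts in step 1 persist across restarts (step 2) is to add entries to the /etc/fstab file on each client. The entries below mirror the mount commands above; the option set shown is a typical starting point, not a required configuration.

```
# /etc/fstab entries that remount the NFS directories at boot.
# <server_ip> and the mount options are placeholders to adapt.
<server_ip>:/home/machine1  /home/dmgr      nfs  defaults  0 0
<server_ip>:/home/machine2  /home/db2       nfs  defaults  0 0
<server_ip>:/home/machine3  /home/custom01  nfs  defaults  0 0
```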



Previous topic: Configure the NFS server


Database considerations

Install and configure the database for your primary and secondary environments in accordance with your disaster recovery plan.

The underlying database must be included in the same recovery scope of the BPM and IBM Business Monitor production environment.

In the examples in the topics in this section, DB2 is the underlying database type.


Installation

For the database installation in the primary environment, follow the instructions in the DB2 installation manual to install and create the DB2 instance and related database users.

For the secondary environment, perform the following tasks:

  1. Install DB2 with the same installation path and instance name as in the primary environment.

  2. Use the same user names and passwords used by DB2 in the system.


Configuration

The database configuration involves the creation of the databases and table spaces.

For the primary environment, perform the following tasks:

  1. Manually create all the necessary databases for the environment.

  2. Set the database path to the directory that is mounted on the NFS server.

For the secondary environment, mount the same directory of the database server on the NFS server. No configuration is required before restoration.
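As an illustration of the primary-environment tasks, the DB2 command line can create a database with its storage on the NFS-backed path. The instance owner db2inst1 and the database name BPMDB are example values; repeat the command for each database that your deployment environment requires.

```shell
# Create one BPM database with its storage under the NFS-mounted
# /home/db2 path from the example topology. "db2inst1" and "BPMDB"
# are example values; adapt them to your instance and topology.
su - db2inst1 -c 'db2 "CREATE DATABASE BPMDB ON /home/db2"'
```

Placing the database path on the NFS mount is what allows the database files to be captured in the same snapshot as the profile data.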



Environment considerations

Install and configure your primary and secondary environments in accordance with your disaster recovery plan.


Installation

When you install the BPM or IBM Business Monitor environment as the root user, there are no special instructions for the primary environment.

For the secondary environment, reinstall the environment with the same information, such as installation path, product version, and patch level, as in the primary environment.


Configuration

Configuration includes creating profiles and configuring cluster environments.

When you create profiles in the primary environment, the profile path must be located in a directory that is mounted from the NFS server. In the secondary environment, the same directory of the corresponding operating system must be mounted from the NFS server. No configuration is required before the restoration.

To configure the cluster environment, follow the normal process of cluster configuration. The following figure illustrates the entire test scenario example.

Figure 1. Production environment and backup environment

This example contains a total of ten servers, five for each environment (NFS server, Machine1, Machine2, Machine3, and Machine4).

Each server except the NFS server in the backup environment has the same IP address and host name as the corresponding server in the production environment. The NFS servers in the two environments have different IP addresses and host names.

DB2, BPM, and IBM Business Monitor are all installed in the /opt/ibm directory on their respective servers. DB2 is installed on Machine2, BPM is installed on Machine1 and Machine2, and IBM Business Monitor is installed on Machine1, Machine3, and Machine4. For DB2, the databases related to IBM Business Process Manager and IBM Business Monitor are created under /home/db2, and the dmgr and custom profiles for BPM and IBM Business Monitor are created under /home.

The dmgr files for BPM and IBM Business Monitor are created on Machine1, IBM Business Process Manager custom profiles are created on Machine1 and Machine2, and IBM Business Monitor custom profiles are created on Machine3 and Machine4.

The following figure provides more information about the test scenario example. The structure in the figure is just an example. You can arrange your directories according to the requirements of your system.

Figure 2. Directories on the NFS server

With this structure, to back up all profiles and database files, you can conveniently take a snapshot of the /home directory under the NFS server. Alternatively, you could separate the runtime data from the configuration data and make snapshots for them individually.

Keep the backups as small as possible because otherwise your processing time could be greater than your recovery time objective.



Recovery scenarios

Disaster recovery occurs at various times in a production environment. For example, in addition to the backups that occur according to a regular schedule, you also perform disaster recovery tasks as needed. The production environment might go through various states (for example, processes might be running) while the backup is taking place.

The topics in this section describe the disaster recovery scenario for BPM with IBM Business Monitor, including the installation, configuration, and underlying database. The recovery scope covers only the production environment and no other systems and components that interact with it.

In a complete scenario, the suggestions in this section would be incorporated into the overall disaster recovery document to provide a complete solution.

The following sections describe typical scenarios for backup, restoration, and verification.



Configuration backup and restoration

After a configuration change, such as when you create a profile, configure a deployment environment, or install an application, back up the configuration data of the primary environment. Then verify whether the configuration change can be restored successfully in the secondary environment.

To verify the data for this scenario:

  1. After a configuration change, create a snapshot of the environment.
  2. Restore the snapshot to the secondary environment.
  3. To verify the secondary environment, start the whole environment independently, and make sure the secondary environment does not share any resources with the primary environment.

After you verify the data, you should find that the configuration changes are still valid in the secondary environment.


You can now safely take a snapshot of the configuration changes, because configuration changes are protected through the backup and restoration procedure.



Runtime backup and restoration

After you back up and restore the configuration and runtime data, verify whether the current instances, such as long-running process instances, short-running process instances, SCA invocation instances, and IBM Business Monitor monitored instances, can be restored to the secondary environment.

This is the most challenging scenario, and it requires special design considerations.

Because RAM data will be lost during the backup and restoration procedure, you must depend on global transactions to keep data integrity.

To ensure overall consistency, all modified resources inside the scenario design must be included in the same recovery scope.

For asynchronous invocation, you can get different replay results because you can have different settings on the transaction boundaries. Because the transaction cannot pass through the boundary of caller and partner, a separate transaction context is required for both caller and partner, so they can be restored through the disaster recovery procedure.

The testing scenario is shown in the following figure:


To verify the data for this scenario:

  1. Generate some load on the environment, and make sure that some instances are still running.
  2. Take a snapshot of the environment.
  3. Restore the snapshot to the secondary environment.
  4. To observe the behavior of the restored environment, start the whole environment in an isolated environment that does not share any resources with the primary environment.

After you verify the data, you should find that the running instances continue navigation as normal in the secondary environment and that the instance states from BPM and IBM Business Monitor are consistent.

Through the persistence and transaction support of the underlying implementation, the running instances will continue to run through the backup and restoration procedure.



Verification of the backup and restoration

For production environment and application scenarios, test your backup and restoration procedure so that you can identify any problems that might exist in your procedure.

When your primary environment becomes available again, carry out the following steps:

  1. Perform a clean shutdown of the secondary environment.
  2. Move all the data back to your primary environment.

  3. Start the primary system and switch all the connections back.



21. Searching knowledge bases

You can often find solutions to problems by searching IBM knowledge bases. Optimize your results by using available resources, support tools, and search methods.

To search for solutions to your problems in IBM knowledge bases:

  1. Search with IBM Support Assistant. IBM Support Assistant (ISA) is a free software serviceability workbench that helps you resolve questions and problems with IBM software products. The ISA tool can search multiple knowledge bases simultaneously.

    To search multiple Internet resources for BPM, open the ISA and click Search. From this page, you can search a variety of resources including:

    • IBM Software Support Documents
    • IBM developerWorks
    • IBM newsgroups and forums
    • Google
    • IBM product information centers

    These free newsgroups and forums do not offer any formal IBM product support. They are intended for user-to-user communication. IBM will not be actively participating in these discussions. However, IBM does review these newsgroups periodically to maintain a free flow of accurate information. You may also want to browse the following resources individually.

  2. Search the information center.

    IBM provides extensive documentation in the form of online information centers. An information center can be installed on your local machine or on a local intranet. An information center can also be viewed on the IBM web site. You can use the powerful search function of the information center to query conceptual and reference information and detailed instructions for completing tasks.

  3. Search available technical resources. In addition to this information center, the following technical resources are available to help you answer questions and resolve problems:

    • IBM Business Process Manager technotes
    • IBM Business Process Manager Authorized Program Analysis Reports (APARs)
    • IBM Business Process Manager support web site
    • Redbooks Domain
    • IBM Education Assistant
    • IBM Business Process Manager forums and newsgroups


The following resources describe how to optimize your search results:


IBM Education Assistant


22. Getting fixes

A product fix might be available to resolve your problem.

To get product fixes, perform the steps in the Procedure section.

  1. Determine which fix you need. Check the list of IBM Business Process Manager recommended fixes to confirm that your software is at the latest maintenance level. Check the list of problems fixed in the IBM Business Process Manager fix readme documentation that is available for each listed fix pack and refresh pack to see whether IBM has already published an individual fix to resolve your problem. To determine what fixes are available by using IBM Support Assistant, run a query on "fix" from the search page.

    Individual fixes are published as often as necessary to resolve defects in BPM. In addition, two kinds of cumulative collections of fixes, called fix packs and refresh packs, are published periodically for BPM, in order to bring users up to the latest maintenance level. You should install these update packages as early as possible in order to prevent problems.

    Fixes specific to the underlying WebSphere Application Server product can also be obtained from the WebSphere Application Server Support Site or from the WebSphere Application Server Support team. Fixes for individual APARs for WebSphere Application Server generally can be applied without affecting IBM Business Process Manager. However, consult the software requirements page before updating WAS with cumulative collections of fixes (fix packs). First check whether the cumulative fix has passed certification, or contact the Support team for verification.

  2. Download the fix. Open the download document and follow the link in the Download package section. When downloading the file, ensure the name of the maintenance file is not changed. This includes both intentional changes and inadvertent changes caused by certain web browsers or download utilities.
  3. Apply the fix. Follow the instructions in the Installation Instructions section of the download document. See the "Updating IBM Business Process Manager" topic in the "Install" documentation.

  4. To receive weekly notification of fixes and updates, subscribe to My Support email updates.


Subscribe to My Support e-mail updates

Required interim fixes for the BPM V8.5 products


23. Contacting IBM Software Support

IBM Software Support provides assistance with product defects.

To take advantage of unique Support features, see the BPM support page. The Support Page contains the latest information on fixes and downloads, educational resources, and commonly encountered problems and their solutions.

Before contacting IBM Software Support, your company must have an active IBM software subscription and support contract, and you must be authorized to submit problems to IBM. The type of software subscription and support contract that you need depends on the type of product you have. For information about the types of software subscription and support contracts available, see "Enhanced Support" in the Software Support Handbook site listed in the Related Topics section.

To contact IBM Software Support with a problem, perform the steps in the Procedure section.

  1. Define the problem, gather background information, and determine the severity of the problem. For help, see the "Contacting IBM" section in the Software Support Handbook.
  2. Gather diagnostic information. When explaining a problem to IBM, be as specific as possible. Include all relevant background information so that IBM Software Support specialists can help you solve the problem efficiently.

    For information that IBM Support needs in order to help you solve a problem, see the BPM MustGather technote. You can use the BPM plug-in for the IBM Support Assistant to capture the data and send it to IBM.

    If you are able to determine the problem is purely with underlying WebSphere Application Server functionality, consider requesting assistance specifically from the WAS Support team rather than the BPM team. For information that IBM Support needs in order to help you solve a WebSphere Application Server problem, see the WebSphere Application Server MustGather Technote.

  3. Submit your problem to IBM Software Support in one of the following ways:

    • Use IBM Support Assistant: See the "IBM Support Assistant" topic.
    • Online: Open a service request on the IBM Software Support site using the Electronic Service Request (ESR) tool.
    • By telephone: For the telephone number to call in your country or region, go to the contacts page of the IBM Software Support Handbook on the Web and click the name of your geographic region.

If the problem you submit is for a software defect or for missing or inaccurate documentation, IBM Software Support creates an Authorized Program Analysis Report (APAR). The APAR describes the problem in detail. Whenever possible, IBM Software Support provides a workaround that you can implement until the APAR is resolved. Support works and communicates with you on the progress and delivers the fix when it is completed. IBM also publishes resolved APARs on the Software Support website so that other users who experience the same problem can benefit from the same resolution.


IBM Software Support specialists often use the IBM Assist On-site live, remote-assistance tool to help with problem determination, data collection, and problem resolution. Read the IBM Assist On-site website for information on how to prepare the machine for a remote-assistance session.



Related tasks:

IBM Support Assistant Data Collector

Software Support Handbook

IBM Software Support site

IBM Support Portal Overview

