Network Deployment (Distributed operating systems), v8.0 > Tune performance > Tune Service integration > Tune messaging engine data stores


Tune the detection of database connection loss

If a messaging engine is configured to use a data store and cannot connect to its data store, for example because the database that contains the data store is not running, the messaging engine does not start. We can tune the system to increase the chance of a successful start of the messaging engine.

In a single-server environment, when you start the application server the messaging engine attempts to start. If the database is unavailable for more than 15 minutes, the messaging engine might enter the stopped state and need to be started manually.

In a high availability environment, a messaging engine starts either as part of the server or cluster startup, or as part of the failover process. During messaging engine startup, the messaging engine attempts to connect to the data store, for up to 15 minutes by default. If one of the following statements remains true during that time, the messaging engine cannot start on the server, and the server is disabled for high availability:

This disabled state can propagate to all members of the cluster. We must manually re-enable the servers to maintain your high availability environment.

We can increase the chance of the messaging engine starting successfully by configuring various parameters, such as the 15 minute default timeout, on the database server or application server.


Procedure

  1. On the database server, configure the operating system to minimize the amount of time taken to detect the loss of a network connection to an application server. Refer to the documentation for the operating system for details. For example, the following table lists the relevant parameters for Windows and AIX operating systems:

    TCP/IP parameters. The first column of the table provides the list of TCP/IP parameters for the Windows operating systems. The second column of the table provides the list of TCP/IP parameters for the AIX operating systems. The third column provides the description of the parameters.

    Parameter name on Windows operating systems Parameter name on AIX operating systems Description
    KeepAliveTime tcp_keepidle The amount of time (in milliseconds on Windows operating systems and in 0.5 seconds on AIX operating systems) to wait before sending a keepalive request for an inactive connection.
    KeepAliveInterval tcp_keepintvl The amount of time (in milliseconds on Windows operating systems and in 0.5 seconds on AIX operating systems) to wait for a response.
    TCPMaxDataRetransmissions tcp_keepcnt The number of requests to send before ending the connection.
    We can calculate the total amount of time taken for the database server to detect the failure of the connection to the application server, by using the following formula:

    time to detect connection failure = keep alive time + (keep alive interval x number of requests) For example, for a Windows system with the parameters set according to the following table, the total amount of time taken for the database server to detect the failure of the connection to the application server is 350 seconds.

    Example parameter values. The first column provides the parameter names. The second column provides a sample value for the parameters.

    Parameter Value
    KeepAlive 300000 milliseconds
    KeepAliveInterval 10000 milliseconds
    TCPMaxDataRetransmissions 5
    Your database product might also have relevant parameters that you can configure, for example, the IDLE THREAD TIMEOUT parameter in DB2 for z/OS.

    When the database server detects the loss of the connection to the application server, the database releases the locks on the data store. The messaging engine can now access the data store and can therefore start successfully.

  2. On the application server, tune the messaging engine to wait for an appropriate amount of time for the data store to become available. By default, the messaging engine will attempt to connect to the data store every 2 seconds for 15 minutes. Complete the rest of this step to adjust these timings.

    1. Click Service integration -> Buses -> bus_name -> [Topology] Messaging engines -> engine_name -> [Additional Properties] Custom properties to navigate to the custom properties panel for the messaging engine.

    2. Click New.
    3. Type sib.msgstore.jdbcInitialDatasourceWaitTimeout in the Name field and an appropriate value in the Value field. This property is the time, in milliseconds, to wait for the data store to become available. The default value is 900000 (15 minutes). This time includes the time required to establish a connection to the database and to obtain the required table locks.

      Ensure that the value of this property is greater than the total time taken for the database server to detect the loss of a network connection, as configured in step 1.

    4. Click OK.

    5. Click New.
    6. Type sib.msgstore.jdbcStaleConnectionRetryDelay in the Name field and an appropriate value in the Value field. This property is the time, in milliseconds, to wait between attempts to connect to the data store. The default value is 2000 (2 seconds). For example, if you set the sib.msgstore.jdbcInitialDatasourceWaitTimeout property to 600000, and the sib.msgstore.jdbcStaleConnectionRetryDelay property to 3000, the messaging engine will attempt to connect every 3 seconds until 10 minutes has passed.

    7. Click OK.

    8. Save your changes to the master configuration.

    9. Restart the application server.

    10. If we have a cluster, repeat the previous steps to add these properties for every messaging engine in the cluster.


Results

By configuring these parameters and custom properties, you minimize the amount of time taken for the database server to detect the loss of a network connection, and ensure that the messaging engine waits for a reasonable amount of time for the database connection to recover before attempting to start.


What to do next

You might want to configure the messaging engine and server to restart in the event of a database connection failure. This behavior reduces the risk of the messaging engine being in an inconsistent state when the database connection is restored.
Data store life cycle
Manage high availability when messaging engines fail to start
http://support.microsoft.com/kb/314053
Configure messaging engine and server behavior when a data store connection is lost


Related


Service integration custom properties

+

Search Tips   |   Advanced Search