Verifying shared file system behavior on Multiplatforms

Run amqmfsck to check whether a shared file system on UNIX and IBM® i systems meets the requirements for storing the queue manager data of a multi-instance queue manager. Run the IBM MQ MQI client sample program amqsfhac in parallel with amqmfsck to demonstrate that a queue manager maintains message integrity during a failure.

Before you begin

About this task

Failover of a multi-instance queue manager can be triggered by hardware or software failures, including networking problems that prevent the queue manager from writing to its data or log files. You are mainly interested in causing failures on the file server, but you must also cause the IBM MQ servers to fail, to test that any locks are successfully released. To be confident in a shared file system, test all of the following failures, and any other failures that are specific to your environment:

Create the directory on the networked storage that you are going to use to share queue manager data and logs. The directory owner must be an IBM MQ Administrator, or in other words, a member of the mqm group on UNIX. The user who runs the tests must have IBM MQ Administrator authority.

Procedure

In each of the checks, cause all the failures in the previous list while the file system checker is running. If you intend to run amqsfhac at the same time as amqmfsck, perform the task Running amqsfhac to test message integrity in parallel with this task.

Results

Examples

Table 1. Running the data integrity check on two servers at the same time
IBM MQ server 1	IBM MQ server 2
amqmfsck -a /shared/qmdata `Please start this program on a second machine with the same parameters.` `File lock acquired.` `Start a second copy of this program with the same parameters on another server.` `Writing data into test file.` `To increase the effectiveness of the test, interrupt the writing by ending the process, temporarily breaking the network connection to the networked storage, rebooting the server or turning off the power.`
	amqmfsck -a /shared/qmdata `Waiting for lock...` `Waiting for lock...` `Waiting for lock...` `Waiting for lock...` `Waiting for lock...` `Waiting for lock...`
`Turn the power off here.`
	`File lock acquired.` `Reading test file` `Checking the integrity of the data read.` `Appending data into the test file after data already found.` `The test file is full of data. It is ready to be inspected for data integrity.`

Table 2. Successful locking on two servers
IBM MQ server 1	IBM MQ server 2
> amqmfsck -w /shared/qmdata Please start this program on a second machine with the same parameters. Lock acquired. Press Return or terminate the program to release the lock.
	> amqmfsck -w /shared/qmdata Waiting for lock...
[ Return pressed ] Lock released.
	Lock acquired. The tests on the directory completed successfully

Table 3. Successful locking on two servers - verbose mode
IBM MQ server 1	IBM MQ server 2
> amqmfsck -wv /shared/qmdata Calling 'stat("/shared/qmdata")' Calling 'fd = open("/shared/qmdata/amqmfsck.lkw", O_EXCL | O_CREAT | O_RDWR, 0666)' Calling 'fchmod(fd, 0666)' Calling 'fstat(fd)' Please start this program on a second machine with the same parameters. Calling 'fcntl(fd, F_SETLK, F_WRLCK)' Lock acquired. Press Return or terminate the program to release the lock.
	> amqmfsck -wv /shared/qmdata Calling 'stat("/shared/qmdata")' Calling 'fd = open("/shared/qmdata/amqmfsck.lkw", O_EXCL | O_CREAT | O_RDWR, 0666)' Calling 'fd = open("/shared/qmdata/amqmfsck.lkw", O_RDWR, 0666)' Calling 'fcntl(fd, F_SETLK, F_WRLCK)' Waiting for lock...
[ Return pressed ] Calling 'close(fd)' Lock released.
	Calling 'fcntl(fd, F_SETLK, F_WRLCK)' Lock acquired. The tests on the directory completed successfully
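
The sequence of system calls in the verbose trace can be approximated outside amqmfsck. The following Python sketch is an illustration only, not the amqmfsck implementation. It follows the same order of operations: an exclusive create, a fallback to a plain open if the file already exists, and then a non-blocking write lock. The function names and the lock file name `locktest.lkw` are hypothetical, and `fcntl.lockf` stands in for the raw `fcntl(fd, F_SETLK, F_WRLCK)` call.

```python
import fcntl
import os


def open_lock_file(directory, name="locktest.lkw"):
    """Open the lock file, creating it if this is the first process to arrive.

    The file name is a hypothetical placeholder chosen for this sketch.
    """
    path = os.path.join(directory, name)
    try:
        # First caller: exclusive create, as in the server 1 trace.
        fd = os.open(path, os.O_EXCL | os.O_CREAT | os.O_RDWR, 0o666)
        os.fchmod(fd, 0o666)
    except FileExistsError:
        # Later callers: the file already exists, so open it normally,
        # as in the server 2 trace.
        fd = os.open(path, os.O_RDWR)
    return fd


def try_write_lock(fd):
    """Attempt a non-blocking exclusive lock; return True if it was acquired."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError:
        # Another process holds the lock (EACCES or EAGAIN).
        return False


def release_lock(fd):
    """Close the descriptor, which also releases the lock held through it."""
    os.close(fd)
```

If this sketch is run against the shared directory from two servers, the second caller would be expected to see `try_write_lock` return `False` until the first caller releases the lock. These locks are held per process, so two calls from the same process do not contend with each other.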

Running amqsfhac to test message integrity

amqsfhac checks that a queue manager using networked storage maintains data integrity following a failure.

Before you begin

You require four servers for this test: two for the multi-instance queue manager, one for the file system, and one for running amqsfhac as an IBM MQ MQI client application.

Follow step 1 in Procedure to set up the file system for a multi-instance queue manager.

About this task

Procedure

Create a multi-instance queue manager, QM1, on another server, using the file system you created in step 1 in Procedure.
- See Create a multi-instance queue manager.
Start the queue manager on both servers, making it highly available. On server 1:
```
strmqm -x QM1
```
On server 2:
```
strmqm -x QM1
```
Set up the client connection to run amqsfhac.
1. Use the procedure in Verifying an IBM MQ installation for the platform, or platforms, that your enterprise uses to set up a client connection, or the example scripts in Reconnectable client samples.
2. Modify the client channel to have two IP addresses, corresponding to the two servers running QM1. In the example script, modify:
```
DEFINE CHANNEL(CHANNEL1) CHLTYPE(CLNTCONN) TRPTYPE(TCP) +
CONNAME('LOCALHOST(2345)') QMNAME(QM1) REPLACE
```
  To:
```
DEFINE CHANNEL(CHANNEL1) CHLTYPE(CLNTCONN) TRPTYPE(TCP) +
CONNAME('server1(2345),server2(2345)') QMNAME(QM1) REPLACE
```
  Where server1 and server2 are the host names of the two servers, and 2345 is the port that the channel listener is listening on. The listener port usually defaults to 1414, so you can use 1414 with the default listener configuration.
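
The CONNAME value in the modified definition is a comma-separated list of `host(port)` entries. As a minimal sketch of that format, the following Python helpers build the value from a list of host and port pairs and render the DEFINE CHANNEL statement shown in this step. The function names `build_conname` and `define_clntconn` are hypothetical and are not part of IBM MQ.

```python
def build_conname(endpoints):
    """Join (host, port) pairs into a comma-separated CONNAME value."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    return ",".join(f"{host}({port})" for host, port in endpoints)


def define_clntconn(channel, qmgr, endpoints):
    """Render the MQSC DEFINE CHANNEL statement used in this step."""
    conname = build_conname(endpoints)
    return (
        f"DEFINE CHANNEL({channel}) CHLTYPE(CLNTCONN) TRPTYPE(TCP) +\n"
        f"CONNAME('{conname}') QMNAME({qmgr}) REPLACE"
    )


# Host names and port are the placeholders used in the example script.
print(define_clntconn("CHANNEL1", "QM1", [("server1", 2345), ("server2", 2345)]))
```
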
Create two local queues on QM1 for the test. Run the following MQSC script:
```
DEFINE QLOCAL(TARGETQ) REPLACE
DEFINE QLOCAL(SIDEQ) REPLACE
```
Test the configuration with amqsfhac:
```
amqsfhac QM1 TARGETQ SIDEQ 2 2 2
```
Test message integrity while you are testing file system integrity.
- Run amqsfhac during step 5 of Procedure.
```
amqsfhac QM1 TARGETQ SIDEQ 10 20 0
```
If you stop the active queue manager instance, amqsfhac reconnects to the other queue manager instance once it has become active. Restart the stopped queue manager instance so that you can reverse the failure in your next test. You will probably need to increase the number of iterations, based on experimentation with your environment, so that the test program runs long enough for the failover to occur.
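
The positional arguments on the amqsfhac command line appear to correspond, in order, to the values that the program echoes at the start of its output in Figure 9: qmname, qname, sidename, transize, iterations, and verbose. That mapping is inferred from the sample output rather than from a parameter reference. Under that assumption, the following Python sketch names each argument so that the iteration count is easier to adjust between runs. The class `FhacRun` is hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass
class FhacRun:
    """Named view of the amqsfhac arguments (order assumed from Figure 9)."""
    qmname: str
    qname: str
    sidename: str
    transize: int
    iterations: int
    verbose: int = 0

    def argv(self):
        """Return the argument vector, suitable for subprocess.run."""
        return [
            "amqsfhac", self.qmname, self.qname, self.sidename,
            str(self.transize), str(self.iterations), str(self.verbose),
        ]

    def command_line(self):
        """Return the command as a single shell-quoted string."""
        return shlex.join(self.argv())


# Reproduces the command used in this step.
print(FhacRun("QM1", "TARGETQ", "SIDEQ", 10, 20, 0).command_line())
```

To give the failover more time to occur, raise `iterations` and rerun.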

Results

An example of running amqsfhac in step 6 is shown in Figure 9. The test is a success.

If the test detected a problem, the output would report the failure. In some test runs, the resolution of MQRC_CALL_INTERRUPTED might be reported as Resolving to backed out rather than Resolving to committed. It makes no difference to the result. The outcome depends on whether the write to disk was committed by the networked file storage before or after the failure took place.

Figure 9. Output from a successful run of amqsfhac

Sample AMQSFHAC start
qmname = QM1
qname = TARGETQ
sidename = SIDEQ
transize = 10
iterations = 20
verbose = 0
Iteration 0
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
Iteration 6
Resolving MQRC_CALL_INTERRUPTED
MQGET browse side tranid=14 pSideinfo->tranid=14
Resolving to committed
Iteration 7
Iteration 8
Iteration 9
Iteration 10
Iteration 11
Iteration 12
Iteration 13
Iteration 14
Iteration 15
Iteration 16
Iteration 17
Iteration 18
Iteration 19
Sample AMQSFHAC end
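
When you repeat the test many times, it can help to check the captured output mechanically. The following Python sketch summarizes output shaped like Figure 9: it confirms that the start and end markers are present, that every iteration from 0 up to the echoed iteration count appears in order, and it records how each MQRC_CALL_INTERRUPTED was resolved. The function name `summarize_fhac_output` is hypothetical, and the sketch relies only on the line formats visible in Figure 9 and the Resolving to backed out message mentioned earlier. It does not know how amqsfhac words an actual failure.

```python
import re


def summarize_fhac_output(text):
    """Summarize amqsfhac output that follows the layout shown in Figure 9."""
    iterations = []
    resolutions = []
    started = ended = False
    expected = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Sample AMQSFHAC start":
            started = True
        elif line == "Sample AMQSFHAC end":
            ended = True
        elif m := re.fullmatch(r"iterations = (\d+)", line):
            expected = int(m.group(1))
        elif m := re.fullmatch(r"Iteration (\d+)", line):
            iterations.append(int(m.group(1)))
        elif m := re.fullmatch(r"Resolving to (committed|backed out)", line):
            resolutions.append(m.group(1))
    # Complete means: both markers seen and iterations 0..expected-1 in order.
    complete = (
        started and ended and expected is not None
        and iterations == list(range(expected))
    )
    return {
        "complete": complete,
        "iterations": len(iterations),
        "resolutions": resolutions,
    }
```

A result with `complete` set to `False` only indicates that the output does not match the expected shape; inspect the raw output to find out why.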