Verify shared file system behavior on Multiplatforms

Run amqmfsck to check whether a shared file system on UNIX and IBM i systems meets the requirements for storing the queue manager data of a multi-instance queue manager.


Before starting

You need a server with networked storage, and two other servers connected to it that have IBM MQ installed. You must have administrator (root) authority to configure the file system, and be an IBM MQ Administrator to run amqmfsck.


About this task

Requirements for shared file systems on Multiplatforms describes the file system requirements for using a shared file system with multi-instance queue managers. The IBM MQ technote Testing statement for IBM MQ multi-instance queue manager file systems lists the shared file systems that IBM has already tested. The procedure in this task describes how to test a file system to help you assess whether an unlisted file system maintains data integrity.

Failover of a multi-instance queue manager can be triggered by hardware or software failures, including networking problems that prevent the queue manager from writing to its data or log files. The main interest is in causing failures on the file server, but you must also cause the IBM MQ servers to fail, to test that any locks are successfully released. To be confident in a shared file system, test all of the following failures, and any other failures that are specific to your environment:

  1. Shutting down the operating system on the file server including syncing the disks.
  2. Halting the operating system on the file server without syncing the disks.
  3. Pressing the reset button on each of the servers.
  4. Pulling the network cable out of each of the servers.
  5. Pulling the power cable out of each of the servers.
  6. Switching off each of the servers.

Create the directory on the networked storage that you are going to use to share queue manager data and logs. The directory owner must be an IBM MQ Administrator; that is, a member of the mqm group on UNIX. The user who runs the tests must have IBM MQ Administrator authority.
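
As a minimal sketch of the directory setup described above, the following shell function creates the directory and assigns it to an IBM MQ administrator. The function name prepare_qmdata_dir, the default owner and group of mqm, the /shared/qmdata path, and the 775 mode are assumptions for illustration only. Your file system or security policy might need different values, and chown to another user typically requires root authority.

```shell
#!/bin/sh
# Hypothetical helper: create a directory for queue manager data and
# give ownership to an IBM MQ administrator (assumed: user and group mqm).
# Usage: prepare_qmdata_dir directory [owner] [group]
prepare_qmdata_dir() {
    dir=$1
    owner=${2:-mqm}
    group=${3:-mqm}
    mkdir -p "$dir" || return 1
    chown "$owner:$group" "$dir" || return 1
    # Mode is an assumption: owner and group can write, others can read.
    chmod 775 "$dir" || return 1
}

# Example (run as root on the file server; the path is illustrative):
# prepare_qmdata_dir /shared/qmdata
```

The function stops at the first failing step and returns a non-zero code, so a calling script can tell whether the directory is ready before it exports the file system.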

For help with configuring the file system, use the example of exporting and mounting a file system in Create a multi-instance queue manager on Linux or Mirrored journal configuration on an ASP using ADDMQMJRN. Different file systems require different configuration steps, so read the documentation for your file system.

Note: Run the IBM MQ MQI client sample program amqsfhac in parallel with amqmfsck to demonstrate that a queue manager maintains message integrity during a failure.


Procedure

In each of the checks, cause all the failures in the previous list while the file system checker is running. If you intend to run amqsfhac at the same time as amqmfsck, perform the task Running amqsfhac to test message integrity in parallel with this task.

  1. Mount the exported directory on the two IBM MQ servers.

    On the file system server, create a shared directory, shared, and a subdirectory, qmdata, to hold the data for multi-instance queue managers. For an example of setting up a shared directory for multi-instance queue managers on Linux, see Example in Create a multi-instance queue manager on Linux.

  2. Check basic file system behavior. On one IBM MQ server, run the file system checker with no options other than the directory name. On IBM MQ server 1:
    amqmfsck /shared/qmdata
    
  3. Check concurrently writing to the same directory from both IBM MQ servers. On both IBM MQ servers, run the file system checker at the same time with the -c option. On IBM MQ server 1:
    amqmfsck -c /shared/qmdata
    
    On IBM MQ server 2:
    amqmfsck -c /shared/qmdata
    
  4. Check waiting for and releasing locks on both IBM MQ servers. On both IBM MQ servers, run the file system checker at the same time with the -w option. On IBM MQ server 1:
    amqmfsck -w /shared/qmdata
    
    On IBM MQ server 2:
    amqmfsck -w /shared/qmdata
    
  5. Check for data integrity.
    1. Format the test file. Create a large file in the directory being tested. The file is formatted so that the subsequent phases can complete successfully. The file must be large enough that there is sufficient time to interrupt the second phase to simulate the failover. Try the default value of 262144 pages (1 GB). The program automatically reduces this default on slow file systems so that formatting completes in about 60 seconds. On IBM MQ server 1:
      amqmfsck -f /shared/qmdata
      
      The server responds with the following messages:
      Formatting test file for data integrity test.
      
      
      Test file formatted with 262144 pages of data.
      
    2. Write data into the test file using the file system checker while causing a failure.

      Run the test program on two servers at the same time. Start the test program on the server that is going to experience the failure, then start the test program on the server that is going to survive the failure. Cause the failure you are investigating.

      The first test program stops with an error message. The second test program obtains the lock on the test file and writes data into the test file starting where the first test program left off. Let the second test program run to completion.

      On IBM MQ server 1:
      amqmfsck -a /shared/qmdata

      Please start this program on a second machine
      with the same parameters.

      File lock acquired.

      Start a second copy of this program
      with the same parameters on another server.

      Writing data into test file.

      To increase the effectiveness of the test,
      interrupt the writing by ending the process,
      temporarily breaking the network connection
      to the networked storage,
      rebooting the server or turning off the power.

      On IBM MQ server 2:
      amqmfsck -a /shared/qmdata

      Waiting for lock...
      Waiting for lock...
      Waiting for lock...
      Waiting for lock...
      Waiting for lock...
      Waiting for lock...

      On IBM MQ server 1:
      Turn the power off here.

      On IBM MQ server 2:
      File lock acquired.

      Reading test file

      Check the integrity of the data read.

      Appending data into the test file
      after data already found.

      The test file is full of data.
      It is ready to be inspected for data integrity.

      The timing of the test depends on the behavior of the file system. For example, it typically takes 30 - 90 seconds for a file system to release the file locks obtained by the first program following a power outage. If you have too little time to introduce the failure before the first test program has filled the file, use the -x option of amqmfsck to delete the test file, then retry the test from the start with a larger test file.

    3. Verify the integrity of the data in the test file. On IBM MQ server 2:
      amqmfsck -i /shared/qmdata
      
      The server responds with the following messages:
      File lock acquired
      
      
      Reading test file checking the integrity of the data read.
      
      
      The data read was consistent.
      
      
      The tests on the directory completed successfully.
      

  6. Delete the test files. On IBM MQ server 2:
    amqmfsck -x /shared/qmdata
    

    The server responds with the message:

    Test files deleted.
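
Step 3 asks for the checker to run on both servers at the same time. One way to start the two runs together from a third machine is sketched below. It assumes key-based ssh access to both IBM MQ servers and that amqmfsck is on the PATH of the remote, non-interactive shell. The function name run_on_both, the REMOTE variable, and the host names in the example are hypothetical. The sketch is not suitable for the -w check, which waits for Return to be pressed on the first server.

```shell
#!/bin/sh
# REMOTE is the remote-execution command (assumed: ssh with key-based login).
REMOTE=${REMOTE:-ssh}

# Hypothetical helper: start the same amqmfsck check on two servers at
# roughly the same time, then wait for both runs to finish.
# Usage: run_on_both host1 host2 [amqmfsck options] directory
run_on_both() {
    host1=$1; host2=$2; shift 2
    $REMOTE "$host1" amqmfsck "$@" &
    pid1=$!
    $REMOTE "$host2" amqmfsck "$@" &
    pid2=$!
    rc=0
    # Report failure if either run ends with a non-zero exit code.
    wait "$pid1" || rc=1
    wait "$pid2" || rc=1
    return "$rc"
}

# Example (host names are placeholders):
# run_on_both mqserver1 mqserver2 -c /shared/qmdata
```

Both runs are started in the background before either is waited on, so the second run does not have to wait for the first to finish.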
    


Results

The program returns an exit code of zero if the tests complete successfully, and non-zero otherwise.
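
Because success is reported through the exit code, a script can decide whether a check passed without parsing the messages. The wrapper below is a sketch of that idea. The function name run_check and the AMQMFSCK override variable are assumptions introduced for illustration, and the directory in the example is the one used in this task.

```shell
#!/bin/sh
# AMQMFSCK can be overridden; it defaults to the amqmfsck command on the PATH.
AMQMFSCK=${AMQMFSCK:-amqmfsck}

# Hypothetical wrapper: run one check and report the result from the exit code.
# Usage: run_check [amqmfsck options] directory
run_check() {
    if "$AMQMFSCK" "$@"; then
        echo "PASS: $AMQMFSCK $*"
        return 0
    else
        rc=$?
        echo "FAIL (exit code $rc): $AMQMFSCK $*" >&2
        return "$rc"
    fi
}

# Example: run the basic check and stop a larger script if it fails.
# run_check /shared/qmdata || exit 1
```

The wrapper passes the original exit code back to the caller, so the distinction between zero and non-zero is preserved.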


Examples

The first set of three examples shows the command producing minimal output.

    Successful test of basic file locking on one server
    > amqmfsck /shared/qmdata
    The tests on the directory completed successfully.
    

    Failed test of basic file locking on one server
    > amqmfsck /shared/qmdata
    AMQ6245: Error Calling 'write()[2]' on file '/shared/qmdata/amqmfsck.lck' error '2'.
    

    Successful test of locking on two servers

    On IBM MQ server 1:
    > amqmfsck -w /shared/qmdata
    Please start this program on a second
    machine with the same parameters.
    Lock acquired.
    Press Return
    or terminate the program to release the lock.

    On IBM MQ server 2:
    > amqmfsck -w /shared/qmdata
    Waiting for lock...

    On IBM MQ server 1:
    [ Return pressed ]
    Lock released.

    On IBM MQ server 2:
    Lock acquired.
    The tests on the directory completed successfully
    

The second set of three examples shows the same commands using verbose mode.

    Successful test of basic file locking on one server
    > amqmfsck -v /shared/qmdata
    System call: stat("/shared/qmdata")
    System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
    System call: fchmod(fd, 0666)
    System call: fstat(fd)
    System call: fcntl(fd, F_SETLK, F_WRLCK)
    System call: write(fd)
    System call: close(fd)
    System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
    System call: fcntl(fd, F_SETLK, F_WRLCK)
    System call: close(fd)
    System call: fd1 = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
    System call: fcntl(fd1, F_SETLK, F_RDLCK)
    System call: fd2 = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
    System call: fcntl(fd2, F_SETLK, F_RDLCK)
    System call: close(fd2)
    System call: write(fd1)
    System call: close(fd1)
    The tests on the directory completed successfully.
    

    Failed test of basic file locking on one server
    > amqmfsck -v /shared/qmdata
    System call: stat("/shared/qmdata")
    System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
    System call: fchmod(fd, 0666)
    System call: fstat(fd)
    System call: fcntl(fd, F_SETLK, F_WRLCK)
    System call: write(fd)
    System call: close(fd)
    System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
    System call: fcntl(fd, F_SETLK, F_WRLCK)
    System call: close(fd)
    System call: fd = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
    System call: fcntl(fd, F_SETLK, F_RDLCK)
    System call: fdSameFile = open("/shared/qmdata/amqmfsck.lck", O_RDWR, 0666)
    System call: fcntl(fdSameFile, F_SETLK, F_RDLCK)
    System call: close(fdSameFile)
    System call: write(fd)
    AMQxxxx: Error calling 'write()[2]' on file '/shared/qmdata/amqmfsck.lck', errno 2
    (Permission denied).
    

    Successful test of locking on two servers

    On IBM MQ server 1:
    > amqmfsck -wv /shared/qmdata
    Calling 'stat("/shared/qmdata")'
    Calling 'fd = open("/shared/qmdata/amqmfsck.lkw",
    O_EXCL | O_CREAT | O_RDWR, 0666)'
    Calling 'fchmod(fd, 0666)'
    Calling 'fstat(fd)'
    Please start this program on a second
    machine with the same parameters.
    Calling 'fcntl(fd, F_SETLK, F_WRLCK)'
    Lock acquired.
    Press Return
    or terminate the program to release the lock.

    On IBM MQ server 2:
    > amqmfsck -wv /shared/qmdata
    Calling 'stat("/shared/qmdata")'
    Calling 'fd = open("/shared/qmdata/amqmfsck.lkw",
    O_EXCL | O_CREAT | O_RDWR, 0666)'
    Calling 'fd = open("/shared/qmdata/amqmfsck.lkw",
    O_RDWR, 0666)'
    Calling 'fcntl(fd, F_SETLK, F_WRLCK)'
    Waiting for lock...

    On IBM MQ server 1:
    [ Return pressed ]
    Calling 'close(fd)'
    Lock released.

    On IBM MQ server 2:
    Calling 'fcntl(fd, F_SETLK, F_WRLCK)'
    Lock acquired.
    The tests on the directory completed successfully
    

Parent topic: Requirements for shared file systems on Multiplatforms


Last updated: 2020-10-04