File store disk requirements

The reliability of your storage infrastructure affects the ability of WebSphere Application Server to maintain the integrity of your data.

Consult the documentation for your storage infrastructure for information on the level of reliability that it can be configured to provide. Examples of components that might be included in your storage infrastructure are hard disk drives, RAID controllers, file systems, and network file system protocols.

Input/Output Reliability

The log and store files are written using Java APIs in a way that requires the JVM either to set flags indicating that all writes are synchronous, or to force all previous writes to disk after certain API calls. Consequently, no failure may cause data that was written synchronously, or written before a force, to be lost, corrupted, or written out of order.
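The two write styles described above can be illustrated with standard Java APIs. The following is a minimal sketch, not the messaging engine's actual implementation: the class name, method names, and temporary file are hypothetical, and the choice of RandomAccessFile in "rwd" mode and FileChannel.force() is an assumption about which APIs fit the description. Note that the Java documentation for FileChannel.force() only guarantees that changes reach the device when the file is on a local storage device, which is why the layers beneath the JVM matter.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the file store's real code.
class DurableWriteSketch {

    // Style 1: open the file so that every content update is written
    // synchronously to the underlying storage device ("rwd" mode).
    public static void writeSynchronously(Path file, byte[] record) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rwd")) {
            raf.seek(raf.length()); // append at the end of the file
            raf.write(record);
        }
    }

    // Style 2: write normally, then force all previous writes to the device.
    public static void writeThenForce(Path file, byte[] record) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buffer = ByteBuffer.wrap(record);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            // false: force file content only, not file metadata
            channel.force(false);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("filestore-sketch", ".log"); // hypothetical file
        try {
            writeSynchronously(file, new byte[] {1, 2, 3});
            writeThenForce(file, new byte[] {4, 5});
            System.out.println("Bytes on disk: " + Files.size(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

In both styles the JVM can only pass the request down to the operating system. Whether the data is actually on stable storage when the call returns depends on the file system, controller, and disk configuration.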

A number of layers are involved in honoring these write requests. Only the operating system, file system, and hard disk vendors can indicate whether the various configurations available will provide the level of reliability required. This is also true of other logging systems, such as databases.

In particular, if the storage device is not on the local machine where the messaging engine is running, and the files reside on a network file system such as NFS, then NFS must be configured to ensure that these requirements are met. No test is available to verify that write caching is not taking place.

File Location

If a messaging engine is in a cluster bus member, then it can run on different servers. This requires the file store files to be located on shared storage. The path to the file store files, as configured in the administration console, must be a path to the same files on each machine where the messaging engine can run. This can be achieved using NFS or some other advanced storage mechanism.

If the files in the specified path are not the same files, then when the messaging engine fails over from one server to another, it is effectively a different messaging engine with the same name. None of the persistent data is available to the new instance of the messaging engine.

File Locking

The log file is locked using java.nio.channels.FileChannel.tryLock(), which returns a java.nio.channels.FileLock. The operating system and file system must honor this lock in all cases, and must release it either when requested explicitly or when the Java process in which the messaging engine is running terminates unexpectedly.

Specifically, the lock, if held, must prevent any other process from locking the file, even if that file is being accessed from a different machine. In addition, if the Java process running the messaging engine terminates unexpectedly, the lock must be released so that the other machine can access the file.

This requirement supports the case where a messaging engine is in a cluster bus member, and so can run on several different servers. If the HAManager directs a messaging engine to start because the first instance of the messaging engine failed, then the new instance must be able to lock the log file if the other process has ended. Equally, if the other process has not ended (the so-called split-brain scenario), then the new instance of the messaging engine must not be able to acquire the lock, which prevents it from starting.
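The start-up decision described above can be sketched with the standard java.nio locking API. This is a hedged illustration under stated assumptions, not the messaging engine's code: the class name, the tryLockLog helper, and the temporary file standing in for the log file are all hypothetical. FileChannel.tryLock() returns null when another process holds an overlapping lock, and throws OverlappingFileLockException when the same JVM already holds one. The sketch treats both cases as "do not start".

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not the messaging engine's real code.
class LogLockSketch {

    // Tries to take an exclusive lock on the whole file without blocking.
    // Returns the lock, or null if the file is already locked.
    public static FileLock tryLockLog(FileChannel channel) throws IOException {
        try {
            // null means another process holds an overlapping lock
            return channel.tryLock();
        } catch (OverlappingFileLockException e) {
            // this JVM already holds a lock on the file
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("filestore-sketch", ".log"); // stand-in for the log file
        try (FileChannel channel = FileChannel.open(log,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FileLock lock = tryLockLog(channel);
            if (lock == null) {
                System.out.println("Log file is locked by another instance; not starting.");
                return;
            }
            try {
                System.out.println("Lock acquired; safe to start.");
            } finally {
                lock.release(); // explicit release; process exit must also free the lock
            }
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

The sketch only shows the API contract as the JVM sees it. Whether a lock taken on one machine is visible to a process on another machine, and whether it is freed when the holder terminates unexpectedly, is determined by the operating system and the file system.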

A file locking test tool is available to verify that a file system meets the basic locking requirements. The following IBM Support document contains more details about the file system locking protocol test tool:

IBM File System Locking Protocol Test for WAS