Independent ASPs and high availability
Independent ASPs enable applications and data to be moved between servers. The flexibility of independent ASPs means they are the basis for some IBM® i high availability solutions. When you decide whether to place the queue manager journal in an ASP or an independent ASP, take into account any other high availability configurations that are based on independent ASPs.
Auxiliary storage pools (ASPs) are a building block of IBM i architecture. Disk units are grouped together to form a single ASP. By placing objects in different ASPs, you can protect data in one ASP from being affected by disk failures in another ASP.
Every IBM i server has at least one basic ASP, known as the system ASP. It is designated as ASP1, and is sometimes known as *SYSBAS. You can configure up to 31 additional basic user ASPs that are indistinguishable from the system ASP from the application's point of view, because they share the same namespace. By using multiple basic ASPs to distribute applications over many disks, you can improve performance and reduce recovery time. Multiple basic ASPs can also provide some degree of isolation against disk failure, but they do not improve reliability overall.
Independent ASPs are a special type of ASP, often called independent disk pools. Independent disk pools are a key component of IBM i high availability. You can store data and applications on independent disk pools so that they are independent of the system they are currently attached to. You can configure switchable or non-switchable independent ASPs. From an availability perspective, you are generally concerned only with switchable independent ASPs, which can be transferred automatically from server to server. As a result, you can move the applications and data on the independent ASP from server to server.
Unlike basic user ASPs, independent ASPs do not share the same namespace as the system ASP. Applications that work with basic user ASPs require changes to work with an independent ASP. You need to verify that your own software, and any third-party software you use, works in an independent ASP environment.
When the independent ASP is attached to a different server, the namespace of the independent ASP has to be combined with the namespace of the system ASP. This process is called varying on the independent ASP. You can vary on an independent ASP without IPLing the server. Clustering support is required to transfer independent ASPs automatically from one server to another.
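The namespace behavior described above can be illustrated with a toy Python model. It is a sketch under simplifying assumptions, not an IBM i interface: the classes `Server` and `IndependentASP`, the `switch` helper, and the server and library names (`SYSTEMA`, `SYSTEMB`, `APPLIB`, `QMLIB`) are all hypothetical. Basic ASPs are modeled as one namespace that is always visible, while an independent ASP's libraries join a server's namespace only while the independent ASP is varied on at that server.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class IndependentASP:
    """Toy model (hypothetical): an independent ASP with its own set of libraries."""
    name: str
    libraries: set = field(default_factory=set)
    attached_to: Optional[str] = None  # name of the server it is varied on at, if any


@dataclass(eq=False)
class Server:
    """Toy model (hypothetical): the system ASP and basic user ASPs share one namespace."""
    name: str
    basic_asp_libraries: set = field(default_factory=set)
    varied_on: list = field(default_factory=list)

    def namespace(self) -> set:
        # Libraries in basic ASPs are always visible; libraries in an independent
        # ASP are combined into the namespace only while it is varied on here.
        visible = set(self.basic_asp_libraries)
        for iasp in self.varied_on:
            visible |= iasp.libraries
        return visible

    def vary_on(self, iasp: IndependentASP) -> None:
        # In this model an independent ASP is varied on at only one server at a time.
        if iasp.attached_to is not None:
            raise RuntimeError(f"{iasp.name} is already varied on at {iasp.attached_to}")
        iasp.attached_to = self.name
        self.varied_on.append(iasp)

    def vary_off(self, iasp: IndependentASP) -> None:
        self.varied_on.remove(iasp)
        iasp.attached_to = None


def switch(iasp: IndependentASP, source: Server, target: Server) -> None:
    """Hypothetical switchover: vary off at the source, then vary on at the target."""
    source.vary_off(iasp)
    target.vary_on(iasp)


# Usage: the (hypothetical) queue manager library QMLIB moves with the independent ASP.
iasp = IndependentASP("IASP1", {"QMLIB"})
system_a = Server("SYSTEMA", {"QSYS", "APPLIB"})
system_b = Server("SYSTEMB", {"QSYS", "APPLIB"})
system_a.vary_on(iasp)
print(sorted(system_a.namespace()))  # ['APPLIB', 'QMLIB', 'QSYS']
switch(iasp, system_a, system_b)
print(sorted(system_a.namespace()), sorted(system_b.namespace()))
```

In this model, the objects in the independent ASP are visible on exactly one server at a time, which is why applications must not assume they live in the system namespace.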
Building reliable solutions with independent ASPs
Journaling to an independent ASP, rather than journaling to an ASP and using journal replication, is an alternative way of providing the standby queue manager with a copy of the local journal from the failed queue manager instance. To transfer the independent ASP automatically to another server, you need to have installed and configured clustering support. Several high availability solutions for independent ASPs, based on cluster support and low-level disk mirroring, can be combined with, or used instead of, multi-instance queue managers.
The following list describes the components that are needed to build a reliable solution based on independent ASPs.
- Journaling
- Queue managers, and other applications, use local journals to write persistent data safely to disk to protect against loss of data in memory due to server failure. This is sometimes termed point-in-time consistency. It does not guarantee the consistency of multiple updates that take place over a period of time.
- Commitment control
- By using global transactions, you can coordinate updates to messages and databases so that the data written to the journal is consistent. Commitment control gives consistency over a period of time by using a two-phase commit protocol.
- Switched disk
- Switched disks are managed by the device cluster resource group (CRG) in an HA cluster. The CRG switches independent ASPs automatically to a new server in the case of an unplanned outage. CRGs are geographically limited to the extent of the local I/O bus.
- By configuring your local journal on a switchable independent ASP, you can transfer the journal to a different server and resume processing messages. No changes to persistent messages that were made outside syncpoint control, or committed under syncpoint control, are lost, unless the independent ASP itself fails.
- If you use both journaling and commitment control on switchable independent ASPs, you can transfer database journals and queue manager journals to a different server and resume processing transactions with no loss of consistency or committed transactions.
- Cross-site mirroring (XSM)
- XSM mirrors the primary independent ASP to a geographically remote secondary independent ASP across a TCP/IP network, and transfers control automatically in case of a failure. You can configure either a synchronous or an asynchronous mirror. Synchronous mirroring reduces the performance of the queue manager, because data is mirrored before write operations on the production system complete, but it guarantees that the secondary independent ASP is up to date. If you use asynchronous mirroring, you cannot guarantee that the secondary independent ASP is up to date, although asynchronous mirroring does maintain the consistency of the secondary independent ASP. There are three XSM technologies.
- Geographic mirroring
- Geographic mirroring is an extension of clustering that enables you to switch independent ASPs across a wide area. It has both synchronous and asynchronous modes. High availability can be guaranteed only in synchronous mode, but the distance between the independent ASPs might degrade performance too much. You can combine geographic mirroring with switched disk to provide high availability locally and disaster recovery remotely.
- Metro mirroring
- Metro mirroring is a device-level service that provides fast local synchronous mirroring over longer distances than the local bus. You can combine it with a multi-instance queue manager to give high availability of the queue manager and, because there are two copies of the independent ASP, high availability of the queue manager journal.
- Global mirroring
- Global mirroring is a device-level service that provides asynchronous mirroring. It is suitable for backup and disaster recovery over longer distances, but it is not a normal choice for high availability, because it maintains only point-in-time consistency rather than currency.
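The difference between synchronous and asynchronous mirroring described in the list above can be sketched with a small Python model. It is a simplification for illustration only, not an IBM i or XSM interface: the `MirroredASP` class and its methods are hypothetical, and "consistent" is modeled as "the secondary holds an in-order prefix of the primary's writes", which is one way to read the distinction between consistency and currency.

```python
from collections import deque


class MirroredASP:
    """Toy model (hypothetical) of a primary independent ASP mirrored to a secondary."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []         # writes applied to the production copy
        self.secondary = []       # writes applied to the remote copy
        self.in_flight = deque()  # asynchronous writes not yet sent to the secondary

    def write(self, record) -> None:
        self.primary.append(record)
        if self.synchronous:
            # The write completes only after the remote copy is updated,
            # which is where the performance cost of synchronous mirroring comes from.
            self.secondary.append(record)
        else:
            # The write completes at once; the remote copy catches up later.
            self.in_flight.append(record)

    def transmit(self, n: int = 1) -> None:
        # Asynchronous mirroring ships writes in the order they were made.
        for _ in range(min(n, len(self.in_flight))):
            self.secondary.append(self.in_flight.popleft())

    def secondary_is_current(self) -> bool:
        return self.secondary == self.primary

    def secondary_is_consistent(self) -> bool:
        # Consistent but possibly stale: an in-order prefix of the primary's writes.
        return self.primary[: len(self.secondary)] == self.secondary


sync_mirror = MirroredASP(synchronous=True)
async_mirror = MirroredASP(synchronous=False)
for record in ("put M1", "put M2", "commit"):
    sync_mirror.write(record)
    async_mirror.write(record)
async_mirror.transmit(1)  # only the first write has reached the remote site

print(sync_mirror.secondary_is_current())  # True
print(async_mirror.secondary_is_current(), async_mirror.secondary_is_consistent())  # False True
```

If the primary fails at this point, the asynchronous secondary is usable but lacks the last two writes, which is why asynchronous modes suit disaster recovery better than high availability.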
The key decision points to consider are as follows:
- ASP or independent ASP?
- You do not need to run an IBM i HA cluster to use multi-instance queue managers. You might choose independent ASPs if you are already using them, or if you have availability requirements for other applications that require independent ASPs. It might be worth combining independent ASPs with multi-instance queue managers to replace queue manager monitoring as a means of detecting queue manager failure.
- Availability?
- What is the recovery time objective (RTO)? If the service must appear to be nearly uninterrupted, which solution has the quickest recovery time?
- Journal availability?
- How do you eliminate the journal as a single point of failure? You might adopt a hardware solution, using RAID 1 devices or better, or you might use a software solution, such as replica journals or disk mirroring, either on its own or combined with a hardware solution.
- Distance?
- How far apart are the active and standby queue manager instances? Can your users tolerate the performance degradation of replicating synchronously over distances greater than about 250 meters?
- Skills?
- There is work to be done to automate the administrative tasks involved in maintaining and exercising the solution regularly. The skills required to do the automation are different for the solutions based on ASPs and independent ASPs.
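Some of the decision points above can be captured as a simple checklist in code. The following Python sketch is a hypothetical aid, not an IBM tool: the `Requirements` fields, the `review` function, and the `SYNC_DISTANCE_GUIDE_M` constant are assumptions made for illustration, and the 250 meter figure is only the rough guide quoted in the text. The recovery time objective and skills questions are left out because they depend on judgment rather than on a simple rule.

```python
from dataclasses import dataclass
from typing import List

# Rough guide from the text; not a hard limit (hypothetical constant name).
SYNC_DISTANCE_GUIDE_M = 250


@dataclass
class Requirements:
    """Hypothetical summary of a site's situation for the checklist."""
    already_using_iasps: bool
    other_apps_need_iasps: bool
    distance_m: float
    journal_on_raid1_or_better: bool
    uses_replica_journals_or_mirroring: bool


def review(req: Requirements) -> List[str]:
    """Return checklist notes for the ASP, distance, and journal questions."""
    notes = []
    if req.already_using_iasps or req.other_apps_need_iasps:
        notes.append("Consider placing the queue manager journal on an independent ASP.")
    else:
        notes.append("A basic ASP with journal replication might be enough; "
                     "multi-instance queue managers do not need an HA cluster.")
    if req.distance_m > SYNC_DISTANCE_GUIDE_M:
        notes.append("Synchronous replication might degrade performance at this distance.")
    if not (req.journal_on_raid1_or_better or req.uses_replica_journals_or_mirroring):
        notes.append("The journal is a single point of failure.")
    return notes


req = Requirements(already_using_iasps=False, other_apps_need_iasps=True,
                   distance_m=400, journal_on_raid1_or_better=True,
                   uses_replica_journals_or_mirroring=False)
for note in review(req):
    print("-", note)
```

A checklist like this is only a starting point; the answers still need to be weighed against the recovery time objective and the skills available to automate the solution.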