(ZOS) Optimized local adapters performance considerations
When using the WebSphere Application Server for z/OS optimized local adapters APIs, there are several areas to be considered regarding performance.
The optimized local adapter APIs are designed to provide optimal performance for calls between an external address space and applications on WAS for z/OS, and are expected to establish new kinds of application patterns that support fine-grained interactions between applications in these environments. The following information describes issues to be aware of regarding optimized local adapters and performance. This content is designed to help us understand the configuration options for the optimized local adapters in order to achieve the best performance. Benchmark results comparing optimized local adapters to other technologies for synchronous calling between WAS and external address spaces on the same system, such as SOAP over HTTP, are not documented here. For this information, read the WAS for z/OS Performance Report.
Selecting minimum and maximum connection values for the Register API call
Selecting a value that is too high for the minimum number of connections parameter on the Register API call is not recommended and can degrade performance. For each connection in the minimum, the external address space calls the WAS control region during the Register API call to establish the connection and add it to the optimized local adapters connection pool. Once these connections are established with a server, they remain until an Unregister API call is received, at which time WOLA disconnects them from the server and removes them from the available connection pool. Setting the minimum value too high consumes more memory and adds path length to the Register and Unregister APIs. Select the minimum connections value that best fits our needs. If hundreds of simultaneous threads might share a registration, it can make sense to pay the cost of establishing the connections during registration. If the expected number of concurrent threads sharing a registration is lower, IBM recommends setting the minimum connections value lower.
The maximum connections Register API parameter sets an upper bound on the number of connections in the optimized local adapters connection pool for a registration. This bound cannot be raised during the life of the registration. Once the number of concurrent Connection Get requests exceeds this value, the calling thread waits for a connection to become available, for up to the number of seconds set on the Wait Time parameter of the Connection Get, Invoke, Receive Request Any, or Host Service API. If this time expires, a return code and reason code are passed back indicating that a connection handle could not be acquired before the wait time expired. The maximum connection pool size that any single registration can set is itself limited. This limit is derived from the cell-wide environment variable WAS_DAEMON_ONLY_adapter_max_conn. The shipped default value for this variable is 100. The value can be changed using the administrative console. The daemon must be restarted after the setting is changed.
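The wait behavior described above can be modeled with a bounded semaphore. The following is a minimal sketch of the semantics only, not the optimized local adapters API: the class name `ConnectionPoolModel` and its methods are hypothetical, and the real APIs report a timeout through return and reason codes rather than a Boolean.

```python
import threading


class ConnectionPoolModel:
    """Hypothetical model of a pool bounded by the maximum connections value."""

    def __init__(self, max_connections):
        # One semaphore slot per connection allowed in the pool.
        self._slots = threading.BoundedSemaphore(max_connections)

    def get(self, wait_seconds):
        """Try to obtain a connection, waiting at most wait_seconds.

        Returns True if a connection was obtained, False if the wait expired.
        """
        return self._slots.acquire(timeout=wait_seconds)

    def release(self):
        """Return a connection to the pool."""
        self._slots.release()


pool = ConnectionPoolModel(max_connections=2)
first = pool.get(0.1)    # obtained immediately
second = pool.get(0.1)   # obtained immediately
third = pool.get(0.1)    # pool exhausted, so the wait time expires
pool.release()           # one connection is returned to the pool
fourth = pool.get(0.1)   # obtained again
print(first, second, third, fourth)  # True True False True
```

The third request fails only because no connection was released during its wait. A longer wait time trades caller latency for a better chance of obtaining a connection.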
Effect of Connection Minimum and Maximum settings on the optimized local adapters CICS Link server
When starting a Link server task under Customer Information Control System (CICS) using BBOC START_SRVR, if the minimum connections (MNC) and maximum connections (MXC) parameters are not passed, MNC defaults to 1 and MXC defaults to 10. This means that the starting Link server (BBO$) task can start and run at most 10 Link invocation tasks (BBO# tasks) concurrently, which in turn is the number of concurrent threads from WAS that can start and run CICS target programs. The MXC setting should reflect the expected duration of the typical target CICS programs that are started under this instance of the Link server. If the target CICS programs are mostly long-running, a larger MXC value is appropriate to keep requests flowing efficiently from WAS into CICS. If the target programs are short-lived, a lower MXC setting is more efficient.
Determining the correct MXC setting should also take into account how many WAS servant regions are communicating with a Link server under a specific registration name. If a servant region is running with the threading option set to ISOLATE, only a single thread in that servant can be sending to CICS at once, so setting MXC to the number of servants times 1 ensures that there are no bottlenecks. If running with threading set to LONGWAIT, up to 40 threads can be active per servant. In that case, set MXC according to the expected number of concurrent requests across the servants to the Link server running for a specific registration name, which depends on the expected request volume and on whether the CICS programs being called are long-running or short-lived. It might take some experimentation to determine the optimum MXC setting. Start with a lower number, raise it gradually, and settle where throughput is best.
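The sizing reasoning above can be turned into a small helper that computes the most threads that could ever send to CICS at once, and a list of MXC values to try while tuning. This is a sketch under stated assumptions: the function names are hypothetical, only the two threading options named above are covered, and the doubling step is an arbitrary choice, not an IBM recommendation.

```python
# Maximum active threads per servant for the two threading options named above.
THREADS_PER_SERVANT = {"ISOLATE": 1, "LONGWAIT": 40}


def mxc_upper_bound(servants, threading_option):
    """Most threads that can send to CICS at once across all servants."""
    return servants * THREADS_PER_SERVANT[threading_option]


def mxc_trial_values(servants, threading_option, start=10):
    """MXC values to try, from low to the upper bound, doubling each step.

    start defaults to 10, the MXC default of BBOC START_SRVR.
    """
    if start < 1:
        raise ValueError("start must be at least 1")
    ceiling = mxc_upper_bound(servants, threading_option)
    values = []
    value = min(start, ceiling)
    while value < ceiling:
        values.append(value)
        value *= 2
    values.append(ceiling)  # never try more than the thread ceiling
    return values


print(mxc_upper_bound(3, "ISOLATE"))     # 3
print(mxc_trial_values(3, "LONGWAIT"))   # [10, 20, 40, 80, 120]
```

Each trial value still has to be measured under a representative workload. The helper only bounds the search.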
Shared 64-bit memory
The optimized local adapters support requires that the WAS for z/OS server runs in 64-bit mode. When the daemon group starts for the first time with the WAS_DAEMON_ONLY_enable_adapter value set to true or 1, WAS allocates and initializes a shared memory buffer in 64-bit above-the-bar storage. The default size of this area is 32MB. This is where all of the optimized local adapters shared control structures reside. It is not where message data is cached. Message and context data flows between external address spaces and WAS servant regions using WAS for z/OS local communications inter-address space technology, which stages message data in the server address space. For large messages, this is in 64-bit above-the-bar storage. Currently, WAS local communication supports a maximum message size of 2GB, and in the initial optimized local adapters support this is the largest supported size for a single message.
If a misbehaving application calls the Register API in a loop without calling Unregister and without terminating (termination cleans up its registrations automatically), it can overflow the optimized local adapters shared memory buffer for the daemon group. If this occurs, API calls return with out-of-memory reason codes. To diagnose this situation, issue the following command on one of the WAS servers in the affected daemon group:
F <server>,DISPLAY,ADAPTER,REGISTRATIONS
From the display output, we should be able to determine which job is consuming and not releasing the registrations, and rectify the problem by restarting it.
If, after analysis, the default 32MB is determined to be too small to meet the needs of the daemon group, this value can be changed using the WAS_DAEMON_ONLY_adapter_max_shrmem cell-wide environment variable. Change this value only after careful consideration. The change requires the optimized local adapters shared above-the-bar memory buffer to be re-created, which can only be done with an IPL of the system.
We can get a rough estimate of the amount of memory needed. Each client registration consumes 392 bytes of shared memory, plus 112 bytes of shared memory for each connection. A registration with a maximum of 100 connections consumes about 12 KB of shared memory. Each client thread that must wait for a connection to become available (because all connections are in use) consumes an additional 80 bytes. Each service being hosted by the registration consumes an additional 336 bytes.
For example, suppose we have 200 registrations in our daemon group. Each registration contains 200 connections and has a maximum of 1000 threads waiting for a connection at any time. The total memory consumed by this configuration is about 20 MB. This leaves enough shared memory to host about 38,000 services concurrently, or about 190 concurrent services per registration.
200 registrations x 392 bytes                   =     78,400 bytes
200 registrations x 200 connections x 112 bytes =  4,480,000 bytes
200 registrations x 1000 waiters x 80 bytes     = 16,000,000 bytes
                                                  ----------------
Total                                           = 20,558,400 bytes

33,554,432 bytes - 20,558,400 bytes = 12,996,032 bytes remaining
12,996,032 bytes / 336 bytes per service = 38,678 services

When increasing or decreasing the size of the shared memory, remember that above-the-bar shared memory is allocated in 1MB sections. WAS rounds the value we specify upwards to the nearest MB.
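The estimate above is simple enough to script when comparing configurations. The following sketch uses only the per-item byte costs quoted in this section. The function names are hypothetical, and the result is a rough estimate, not an exact accounting of the shared memory buffer.

```python
# Per-item shared memory costs quoted in the text above, in bytes.
BYTES_PER_REGISTRATION = 392
BYTES_PER_CONNECTION = 112
BYTES_PER_WAITER = 80
BYTES_PER_SERVICE = 336
MB = 1024 * 1024


def estimate_shared_memory(registrations, connections_per_reg,
                           waiters_per_reg=0, services_per_reg=0):
    """Estimate the bytes of shared memory used by a daemon group."""
    per_registration = (
        BYTES_PER_REGISTRATION
        + connections_per_reg * BYTES_PER_CONNECTION
        + waiters_per_reg * BYTES_PER_WAITER
        + services_per_reg * BYTES_PER_SERVICE
    )
    return registrations * per_registration


def services_that_fit(buffer_bytes, used_bytes):
    """Number of hosted services that fit in the remaining shared memory."""
    return max(buffer_bytes - used_bytes, 0) // BYTES_PER_SERVICE


def round_up_to_mb(requested_bytes):
    """Whole megabytes needed, since allocation is in 1MB sections."""
    return (requested_bytes + MB - 1) // MB


used = estimate_shared_memory(200, 200, waiters_per_reg=1000)
print(used)                               # 20558400
print(services_that_fit(32 * MB, used))   # 38678
print(round_up_to_mb(used))               # 20
print(estimate_shared_memory(1, 100))     # 11592, about 12 KB
```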
Controlling the maximum number of concurrent outbound calls from WAS
There is a daemon-wide default setting that controls the maximum number of concurrent outbound calls from WAS that optimized local adapters support for a single registration. The variable that controls this is WAS_DAEMON_ONLY_adapter_max_serv. The default is 100. This means that there can be no more than 100 different target services running under a single registration (concurrent Host Service, Receive Request Any, or Receive Request Specific API calls). If this value is changed, a restart of the daemon is required.
With the default value of 100, an attempt to set up a thread as a 101st server for a specific registration name, using one of the three APIs for this purpose, results in a non-zero return and reason code indicating that the adapter_max_serv limit was reached. If an application in WAS looks for this service and it is not immediately available, the application waits for a default of 30 seconds before receiving an exception indicating that a timeout occurred while waiting for the requested service. In the WAS servant log this appears as a C9C24C15 minor code. The application can modify the default 30-second timeout using the setConnectionWaitTimeout() method on the Java EE Connector Architecture (JCA) ConnectionSpecImpl.
Optimized local adapters CICS Link server performance considerations
The optimized local adapters support for the CICS Link server can be used to provide a simple means for invoking existing CICS application programs from applications running on WAS for z/OS. When we start the Link server using the BBOC transaction, or the CICS PLTPI program BBOACPL2, the optimized local adapters Link server task (BBO$) starts and receives program link requests from WAS. The Program Link task (BBO#) is then initiated, which in turn issues an EXEC CICS LINK to the target program, receives the response, and sends it back to the WAS caller. Part of this support involves propagating and asserting the WAS application thread-level identity onto the target CICS task. The propagation and assertion of the identity is requested using the SEC=Y parameter on the BBOC START_SRVR command.
Running a Link server with SEC=N and using the identity of the user ID that initiated the Link server yields better performance, but might not be aligned with the security and auditing requirements of our organization.
If it is determined that the Link server can run with SEC=N, the best performance is achieved by also running with the REU=Y BBOC START_SRVR parameter. REU=Y results in the Link server reusing the program Link invocation tasks (BBO# transactions) between program invocation requests.
If we run the Link server in this configuration, the support in the optimized local adapters JCA for passing a separate LINK transaction ID for individual requests is disabled, and a request for this results in a ResourceException thrown back to the application. Also, if we attempt to select both REU=Y and SEC=Y, the reuse option is forced to No, because the Link server must start a new Program Link task for each request with the identity that was propagated and asserted.
Running with the REU=Y option means that the Program Link tasks (BBO#s) remain active, once started, until a BBOC STOP_SRVR or BBOC UNREGISTER is entered for the registration. If we are running with a high MXC value on BBOC START_SRVR and a large number of requests arrive concurrently, the number of BBO# tasks can get high and these do not terminate until the Link server is stopped. This is another issue to consider when determining whether to use REU=Y and what is an appropriate MXC value.
If the goal is to achieve the fastest performance for calling into an application under CICS from WAS, consider coding the Host Service (BBOA1SRV), Receive Request Any (BBOA1RCA), or Receive Request Specific (BBOA1RCS) APIs directly in the application program. With this, there is no built-in support for identity propagation as the Link server provides, but if this is not required and performance is a high enough priority, then direct use of the APIs might be the best option.
JCA considerations
When using the optimized local adapters JCA resource adapter, keep in mind that there is additional overhead with each connection that is obtained from the ConnectionFactory object. If the application must make several calls to an external address space or CICS in the same application method, using the same connection for each interaction performs better than obtaining a different connection for each interaction. In addition, a JCA interaction object can be used repeatedly within the same application method.
When creating the JCA ConnectionFactory for optimized local adapters, it is possible to modify the minimum and maximum size of the JCA connection pool for that ConnectionFactory. This connection pool represents logical connections, which are bound to physical connections (those specified on the BBOA1REG register call) during an interaction. For optimal performance, the JCA connection pool should be the same size as the physical connection pool set during BBOA1REG. If our JCA connection pool is set too small, the application might have to wait for a JCA connection object even though physical connections are available. The physical connection pool is shared by all servant regions, so if we have multiple servant regions, we might want to decrease the size of the JCA connection pool for each servant region to keep the total number of JCA connections across all servant regions in line with the size of the physical connection pool.
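As an illustration of the last point, the following sketch divides a physical connection pool evenly across servant regions. The function name is hypothetical, and an even split is an assumption: servants with uneven load might warrant uneven pool sizes.

```python
def jca_pool_size_per_servant(physical_max_connections, servant_regions):
    """JCA connection pool maximum for each servant region.

    Floor division keeps the total across servants at or below the
    physical pool size. The result is floored at 1 so that every servant
    can obtain at least one JCA connection, which can exceed the physical
    pool when there are more servants than physical connections.
    """
    if servant_regions < 1:
        raise ValueError("servant_regions must be at least 1")
    return max(1, physical_max_connections // servant_regions)


print(jca_pool_size_per_servant(100, 1))  # 100
print(jca_pool_size_per_servant(100, 3))  # 33
```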