Dealing with CICS/MQ trigger interface quirks
Despite its simplicity, the CICS/MQ trigger interface has its share of quirks, such as messages left in queues and uneven workload distribution. An expert shares fixes for common CICS/MQ message triggering problems in this tip.
The CICS/MQ trigger interface provides a simple way to drive online messaging without customized code. However, that simplicity comes with a couple of annoying quirks in some situations. The conditions described in this tip regarding message triggering apply to the interface supplied with IBM's CICS Transaction Server 4.1 as well as the one previously shipped with WebSphere MQ.
Message trigger interface review
The CICS/MQ trigger interface begins with the transaction CKTI, which listens on an initiation queue (INITQ). When a CICS-bound message arrives, MQ sends a message through the INITQ to wake up CKTI. Then, CKTI reads the trigger message, starts the target transaction and commits the unit of work, which deletes the trigger message from the INITQ. Finally, CKTI attempts to get the next message from the INITQ. If no message is available, it goes to sleep.
Application transactions can be triggered in one of two ways. "Trigger on first" creates a trigger when an application message arrives on an empty queue. "Trigger on every" sends a trigger every time a message arrives on the queue.
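The difference between the two trigger types can be illustrated with a toy model. The sketch below is plain Python rather than MQ code. The function name `count_triggers` and the event list are hypothetical, and the model ignores any other conditions a real queue manager checks before generating a trigger.

```python
def count_triggers(events, trigger_type):
    """Count trigger messages generated for a sequence of 'put'/'get' events.

    Toy model only: 'FIRST' fires when a message lands on an empty queue,
    'EVERY' fires on each arriving message.
    """
    if trigger_type not in ("FIRST", "EVERY"):
        raise ValueError("trigger_type must be 'FIRST' or 'EVERY'")
    depth = 0
    triggers = 0
    for event in events:
        if event == "put":
            if trigger_type == "EVERY" or depth == 0:
                triggers += 1
            depth += 1
        elif event == "get" and depth > 0:
            depth -= 1
    return triggers


# Three messages arrive, all three are read, then one more arrives.
events = ["put", "put", "put", "get", "get", "get", "put"]
first_count = count_triggers(events, "FIRST")  # 2: only the puts onto an empty queue
every_count = count_triggers(events, "EVERY")  # 4: one trigger per put
```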
CICS/MQ triggering gotcha: The "left behind" message
In this scenario, a trigger message wakes up CKTI. CKTI kicks off the target transaction and then syncpoints, thus deleting the trigger. The target transaction starts, reads its intended message and then either ABENDs or issues a ROLLBACK command. If the application queue is defined as recoverable, MQ requeues the application message.
At this point, the original application message is back on the queue but its accompanying trigger message is gone. The message will languish on the queue until something generates another trigger. Worse, the condition persists: CKTI consumes each newly arriving message's trigger to process the previous message, so every message waits for its successor until the queue drains to zero.
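To make the lag concrete, here is a minimal simulation of the scenario. It is a sketch under stated assumptions, not a model of MQ internals: the queue is "trigger on every," each triggered transaction reads exactly one message, and the names `replay` and `fails_once` are invented for illustration.

```python
from collections import deque


def replay(arrivals, fails_once):
    """Return (log, leftover) for a queue where each arrival produces one
    trigger and each triggered transaction reads exactly one message."""
    queue = deque()
    already_failed = set()
    log = []  # (message whose arrival produced the trigger, message processed)
    for msg in arrivals:
        queue.append(msg)      # arrival generates one trigger
        got = queue.popleft()  # triggered transaction reads the oldest message
        if got in fails_once and got not in already_failed:
            already_failed.add(got)
            queue.appendleft(got)  # rollback: message requeued, trigger already gone
            log.append((msg, None))
        else:
            log.append((msg, got))
    return log, list(queue)


log, leftover = replay(["A", "B", "C"], fails_once={"A"})
# log      == [("A", None), ("B", "A"), ("C", "B")]
# leftover == ["C"]
```

After the single rollback, B's trigger processes A, C's trigger processes B, and C sits on the queue waiting for a message that may not arrive for some time.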
This problem can be difficult to diagnose because most monitors don't directly measure time spent waiting on a queue. In addition, while clients report long response times, the CICS monitors show no problems. One clue may be a non-zero queue depth, although that might be the application's usual behavior. For full diagnosis, tech support might have to find some sort of handle -- possibly data within the messages themselves -- to track a request from the client to CICS and back.
The gravity of this situation depends on the application's volume and importance. If the application is not time-dependent, waiting five minutes for the next message to arrive isn't harmful. However, if the queued message has someone waiting for the answer, the wait can be deadly.
Perhaps the best way to get around the "left behind" problem is to modify the application to read more than one message per invocation, so that it gets past the triggerless messages. However, finding the right number of messages to read at one time may be tricky.
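The effect of reading more than one message can be sketched with a small Python model. Everything here is an assumption made for illustration: the function `drain`, the `max_reads` parameter, the "trigger on every" behavior, and the idea that the transaction commits after each message, so a failure only requeues the message being processed.

```python
from collections import deque


def drain(arrivals, fails_once, max_reads):
    """Return ({message: trigger number that processed it}, leftover messages).

    Each arrival produces one trigger; each triggered transaction reads up
    to max_reads messages and stops early on an empty queue or a rollback.
    """
    queue = deque()
    already_failed = set()
    processed_on = {}
    for trigger_no, msg in enumerate(arrivals, start=1):
        queue.append(msg)
        for _ in range(max_reads):
            if not queue:
                break
            got = queue.popleft()
            if got in fails_once and got not in already_failed:
                already_failed.add(got)
                queue.appendleft(got)  # rollback requeues the message
                break                  # the failing transaction ends here
            processed_on[got] = trigger_no
    return processed_on, list(queue)


one_read = drain(["A", "B", "C"], {"A"}, max_reads=1)
two_reads = drain(["A", "B", "C"], {"A"}, max_reads=2)
# one_read  == ({"A": 2, "B": 3}, ["C"])        every message lags one trigger behind
# two_reads == ({"A": 2, "B": 2, "C": 3}, [])   the backlog clears on the second trigger
```

In this model the rolled-back message still waits for the next arrival, but reading a second message keeps the delay from propagating to everything behind it.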
The workload balancing myth
MQ supports shared queues by storing messages in the Sysplex Coupling Facility (CF), where any application can read them. INITQs may be shared as well, which puts the trigger messages into the CF. MQ then wakes up every CKTI listener, creating a race to see which one gets the trigger. The CKTI that gets the cheese completes the process by starting the target transaction and consuming the trigger message. The rest of the CKTI tasks go back to sleep.
Note that there's no inherent mechanism for workload balancing in this scheme. However, when sharing INITQs between cloned CICS instances running at the same priority on similar processors, there's a reasonable expectation that work will be smoothly distributed.
In my experience, this seems to be true for time intervals of five minutes or greater. At shorter intervals, however, the listener that wins the race for the trigger message will sometimes succeed in reading the next handful as well. Thus, one CICS region may experience brief bursts in which it pulls dozens, if not hundreds, of messages off the queue while other regions get nothing.
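One way to see this effect in your own data is to bucket transaction starts by region at different window sizes. The sketch below assumes you can extract (timestamp in seconds, region name) pairs from monitoring data. The function names, the region names CICSA and CICSB, and the sample data are all synthetic, chosen only to show how bursts that dominate short windows can average out over longer ones.

```python
from collections import Counter, defaultdict


def share_by_window(events, window_seconds):
    """Group (timestamp_seconds, region) pairs into fixed windows and
    count how many transactions each region started in each window."""
    buckets = defaultdict(Counter)
    for ts, region in events:
        start = int(ts // window_seconds) * window_seconds
        buckets[start][region] += 1
    return dict(buckets)


def max_share(counts):
    """Fraction of a window's work taken by its busiest region."""
    return max(counts.values()) / sum(counts.values())


# Synthetic data: the two regions alternate in ten-second bursts.
events = [(t, "CICSA" if (t // 10) % 2 == 0 else "CICSB") for t in range(60)]

short = share_by_window(events, 10)  # six windows, each owned by one region
long = share_by_window(events, 60)   # one window, split evenly
# max_share is 1.0 in every 10-second window but 0.5 over the full minute
```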
Again, the seriousness of this depends on the application. A low-volume application that is not time-dependent can afford to have some controls, such as transaction classes, to buffer any sudden onslaughts. More time-sensitive applications may have to seek other forms of remediation.
IBM will tell you it's purely a race condition without any sort of intelligence controlling the workload distribution. Everybody loves a winner and MQ, apparently, is no exception. If transaction classes or other CICS throttling mechanisms don't work, the best, albeit least elegant, solution is transaction routing, where CKTI runs in a routing region that can spray the target transactions across a group of application-owning regions, or AORs.
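The idea behind the routing region can be illustrated with a round-robin dispatcher. This is a conceptual sketch in Python, not CICS routing code: the names `make_router`, `AOR1` through `AOR3` and the transaction ID `PAYT` are hypothetical, and a real installation would apply whatever routing policy its routing region is configured with rather than strict round-robin.

```python
from itertools import cycle


def make_router(aors):
    """Return a function that assigns each started transaction to the
    next application-owning region in turn."""
    next_aor = cycle(aors)

    def route(tranid):
        return (tranid, next(next_aor))

    return route


route = make_router(["AOR1", "AOR2", "AOR3"])
placements = [route("PAYT") for _ in range(6)]
# Each AOR receives two of the six transactions, no matter which
# listener won the race for the trigger messages.
```

Because a single CKTI in the routing region reads every trigger, the race between listeners disappears and the distribution is decided by the routing step instead.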
Conclusion
The problems outlined above are inherent in the MQ interface and must be dealt with as such. The best defense is a well-designed application along with a few recommendations. First, as IBM suggests, most queues should be defined as "trigger on first." Second, the application should read more than one message at a time, lest something get stuck. Finally, an installation should use recoverable queues only when truly necessary.
ABOUT THE AUTHOR: For 24 years, Robert Crawford has worked off and on as a CICS systems programmer. He is experienced in debugging and tuning applications and has written in COBOL, Assembler and C++ using VSAM, DLI and DB2.