Mar 11, 2009

Error Handling in WebSphere Message Broker V6/V6.1

Error handling is a major issue in WMB message flows.
I have seen many production message flows that lose messages, which means losing business information.

The WMB platform gives the developer a lot of freedom and a variety of error-handling options, but it is very important to understand how they work.

So let's begin.

You can divide the broker elements into two categories: transactional objects and non-transactional objects. For transactional objects, the engine takes a checkpoint before entering each node. So if an exception is thrown in a node, the object you get back carries none of the changes made in that node. If you need the changed object details, you have to catch the exception inside the node itself (Java or ESQL try/catch code). Non-transactional elements keep their state, and the engine does not roll them back.

Transactional objects (those that have the Input/Output prefix):

  • Body - Root
  • LocalEnvironment

Non-transactional objects:

  • Environment
  • ExceptionList
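The checkpoint behaviour described above can be pictured with a small toy model. This is not broker code: run_node, failing_node and the dictionaries standing in for the message tree and the Environment are all hypothetical names chosen for illustration. The sketch only assumes what the text says, namely that the message tree is restored to its pre-node state on failure while the Environment keeps whatever was written to it.

```python
import copy


def run_node(node, tree, environment):
    """Run one node; return (tree seen by the next handler, error or None)."""
    working = copy.deepcopy(tree)       # checkpoint: the node edits a copy
    try:
        node(working, environment)      # Environment is shared, never copied
    except Exception as exc:
        return tree, exc                # the pre-node tree is what survives
    return working, None


def failing_node(tree, environment):
    # Hypothetical node: changes both objects, then fails.
    tree["Root"]["status"] = "changed in node"
    environment["lastStep"] = "failing_node"
    raise RuntimeError("simulated node failure")


tree = {"Root": {"status": "original"}, "LocalEnvironment": {}}
environment = {}
after, error = run_node(failing_node, tree, environment)

print(after["Root"]["status"])   # original
print(environment["lastStep"])   # failing_node
```

This is why the Environment is a handy place for details you want to see in the error handler: whatever the failing node wrote to it is still there.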

Node's Failure Terminal

Each node has a failure terminal. A message is routed to this terminal only if an exception occurred inside the node itself and the failure terminal is connected; otherwise the exception is propagated to the previous node up the flow. If an error occurs downstream, it is not routed to the node's failure terminal. I don't recommend using this pattern, except perhaps when calling a synchronous service or similar.
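To make the propagation rule concrete, here is a minimal sketch in plain Python. It is a toy model rather than broker code: the Node class, the record helper and the node names are hypothetical, and the model lets any node carry a catch handler purely to keep the example short. It shows that an error in node B, whose failure terminal is not connected, skips the connected failure terminal of the upstream node A and lands on the catch handler of the input node.

```python
class Node:
    """Toy node: 'failure' and 'catch' are optional handler callables."""

    def __init__(self, name, action=None, failure=None, catch=None):
        self.name = name
        self.action = action
        self.failure = failure
        self.catch = catch
        self.next = None

    def process(self, msg, log):
        try:
            if self.action is not None:
                self.action(msg)
        except Exception as exc:
            if self.failure is None:
                raise                       # unconnected: goes back upstream
            self.failure(self.name, exc, log)
            return
        if self.next is None:
            return
        try:
            self.next.process(msg, log)
        except Exception as exc:
            if self.catch is None:
                raise                       # keep unwinding toward the input
            self.catch(self.name, exc, log)


def record(terminal):
    def handler(node_name, exc, log):
        log.append(f"{node_name}.{terminal}: {exc}")
    return handler


def broken(msg):
    raise ValueError("B failed")


source = Node("input", catch=record("catch"))
a = Node("A", action=lambda msg: None, failure=record("failure"))
b = Node("B", action=broken)          # failure terminal not connected
source.next, a.next = a, b

log = []
source.process({"body": "order"}, log)
print(log)   # ['input.catch: B failed']
```

Node A's failure handler never runs, because the error did not happen inside A.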

When developing subflows, remember to connect all the terminals of the subflow. During deployment of the flow the compiler ignores unconnected terminals, so if a failure occurs in the subflow and you didn't connect the terminal, the failure is not propagated to the main flow and you lose track of the error.

Try Catch pattern and Catch terminal

My suggestion is to use this pattern; you get it implicitly on the input nodes by choosing the transaction mode.
Don't forget to throw an exception at the end of the catch handling if you are working under a transaction; otherwise no rollback is executed, because the whole transaction is considered to have been handled successfully.
If you are developing a synchronous service, construct an indicative reply message for the caller.
Very important: if you put dump messages for logging through an MQOutput node, set it to non-transactional mode. If you don't, your dump message will also be rolled back by the engine and you won't see it in the queue.
Put your TryCatch nodes around common business parts, and divide your flows wisely.
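The two rules above (re-throw at the end of the catch path, and log outside the transaction) can be sketched with a toy unit of work. Again this is plain Python and not broker code; Transaction, run_flow and the list-based queues are hypothetical stand-ins. The sketch assumes only what the text states: transactional puts vanish on rollback, non-transactional puts do not, and a catch path that swallows the exception lets the transaction commit.

```python
class Transaction:
    """Toy unit of work: transactional puts stay pending until commit."""

    def __init__(self):
        self.pending = []

    def put(self, queue, message, transactional=True):
        if transactional:
            self.pending.append((queue, message))
        else:
            queue.append(message)          # visible at once, never rolled back

    def commit(self):
        for queue, message in self.pending:
            queue.append(message)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


def run_flow(rethrow):
    out, txn_log, plain_log = [], [], []
    txn = Transaction()
    try:
        try:
            txn.put(out, "partial result")
            raise RuntimeError("downstream failure")
        except RuntimeError as exc:        # the catch path
            txn.put(txn_log, f"dump: {exc}")
            txn.put(plain_log, f"dump: {exc}", transactional=False)
            if rethrow:
                raise                      # tell the input node to roll back
        txn.commit()
        status = "committed"
    except RuntimeError:
        txn.rollback()
        status = "rolled back"
    return status, out, txn_log, plain_log


print(run_flow(rethrow=True))
print(run_flow(rethrow=False))
```

With rethrow=True only the non-transactional dump survives the rollback. With rethrow=False the flow commits, so the partial result reaches the output queue even though the processing failed.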

Trace Nodes

A nice feature that lets you dump your trace to log files, the trace, or custom files. You can dump whatever you want, for example ${Root}, and use ESQL functions such as CURRENT_DATE.
The major improvement in version 6.1 is that you no longer need to delete the trace nodes after development; you can just disable them with this command:

mqsichangetrace <brokerName> -n on|off -e <executionGroup> [-f <messageFlow>]

MQInput node error handling when working under a transaction (otherwise the message is discarded) and the error occurs beyond the MQInput node:

  1. If the catch terminal is connected, the exception is propagated there. Remember to throw a custom exception to trigger the rollback, in which all actions against external resources such as databases or MQ queues are backed out.
    It is also a good place for a compensation process if needed.

    The message is rolled back to the input queue and its back-out count is incremented by one.

  2. If the failure terminal is connected and the back-out count of the message equals the threshold property, the message is routed there. If the failure terminal is not connected, the message is routed to the back-out queue (a property of the input queue), and if that is not set, the engine tries to put it on the Qmgr back-out queue.
    If no back-out queue exists, it tries to put the message on the Qmgr DLQ.
    If an error occurs beyond the failure terminal, the engine tries to resend the message up to twice the threshold number (new in version 6.1), then tries to put it on the Qmgr back-out queue and, failing that, on the Qmgr DLQ.
    If that still does not succeed, the message loops infinitely and becomes a poison message, and manual intervention is needed.
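The routing order in the two steps above can be written down as a small decision function. This is only a sketch of the order as described in this post, not the broker's implementation: route_after_rollback and its parameter names are hypothetical, the queue names in the usage lines are examples, the threshold is assumed to be at least 1, and the extra retries for errors beyond the failure terminal are left out.

```python
RETRY = "retry from input queue"


def route_after_rollback(backout_count, threshold, failure_connected,
                         queue_backout=None, qmgr_backout=None, dlq=None):
    """Return where the rolled-back message goes next (toy model)."""
    if backout_count < threshold:
        return RETRY                       # not at the threshold yet
    if failure_connected:
        return "failure terminal"
    # Fall through the back-out targets in the order the text lists them.
    for target in (queue_backout, qmgr_backout, dlq):
        if target:
            return target
    return RETRY                           # nowhere to go: poison message


print(route_after_rollback(1, 3, failure_connected=True))
print(route_after_rollback(3, 3, failure_connected=True))
print(route_after_rollback(3, 3, False, queue_backout="APP.BACKOUT"))
print(route_after_rollback(3, 3, False))
```

The last call shows the poison-message case: the threshold is reached, but no failure terminal, back-out queue or DLQ is available, so the message keeps coming back from the input queue.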


WebSphere MQ gives you a set of queue properties for handling backed-out messages. On the queue you can set the back-out threshold and the back-out queue name. The MQInput node routes the message implicitly once the threshold is reached.


Edmond said...

Hi there

I have a few questions regarding the blog. Maybe you can expand on these:

Node's Failure Terminal
What does 'otherwise the exception will be propagated to the previous up the flow' mean? It sounds as if the exception is propagated backwards; is this possible?

Alex Linder said...

Hi Edmond,

I will try to explain in more detail.
You can see the message flow as one big scope of code.
Each node can define its own catch scope (the failure terminal), and if the exception occurs in that node it is propagated there.
If the exception occurs in a node whose failure terminal is not connected (no catch scope), the exception is routed backward (upward) along the flow to the first node whose catch terminal is connected, or to the first catch node.
It is similar to code scopes with try and catch.

Hope it helps.

shaan said...

Hi there

Could you please explain the Transaction Mode in the MQInput and MQOutput nodes with a simple example? Thanks in advance.

Alex Linder said...

Hi Shaan,
A transaction is a set of actions that completes a unit of work.

Regarding MQInput: if you don't work under a transaction and an exception is thrown, then the message of that transaction is discarded. It is very important to work under a transaction if the message is critical; also keep in mind to use persistent messages.
But if you use custom logging, for example, and the flow writes log messages to a table or a queue, it is crucial that the records are committed even if an exception occurs. To achieve that, the logging process needs to run outside the flow instance's transaction, as its own atomic action; otherwise it will be rolled back and won't leave any trace.

I suggest using the broker event mechanism for the logging process.


Anonymous said...

I am facing a scenario where I haven't set a backout threshold or a backout queue, and a message failed and got stuck in the flow's input queue with a back-out count of 1.

(The Catch and Failure terminals are connected.)

Why does the message get stuck in the flow's input queue itself?

Alex Linder said...

I'm not familiar with the constraints of your scenario, but as a best practice I suggest always setting the backout threshold and the backout queue to avoid unexpected infinite loops.

The scenario you described seems like a bug. If the catch terminal is connected to the MQInput node and there are no exceptions during the catch handling, then the message should have been deleted from the queue.
Which version (including CSD) of WMB are you using? Which platform?