Saturday, November 1, 2008

Learning and implementing the Oracle Fault Management Framework

Why?
A process has external dependencies. Suppose a network failure takes the partner system down for a few days. The message for the partner can be sent later, but building the recovery mechanism in BPEL can be difficult and repetitive.

How was it done before?
Without the framework, the possible solutions were:
1. Assume that the partner link or the partner system will rarely be down, and handle the process manually when it is (data fixes, the test case feature, etc.). This is called wishful thinking.
2. Retry, then create a worklist task whenever there is an error. This requires programming effort and adds redundancy to the process.
3. Roll back using a compensation handler, which may be an option depending on the design of your process.

What is it in a nutshell?
Instead of handling faults in BPEL by adding catch blocks, let the framework handle them for you. Both can also be used together. The framework can retry the failed activity, replay the scope containing the failed activity, ask for human intervention, and handle faults in several other ways.

The details can be read in the references stated below.

What is the advantage?
1. The generic framework can be reused without coding effort.
2. It provides resume, retry, continue and modify functionality.
3. No BPEL change is required.

Lessons learnt and opinions
While implementing fault handling with this framework, I found, rather surprisingly, that it was very easy to use. The only trouble I had was that the post-installation steps had not been executed for my 10.1.3.3 patch. Without these steps the framework does not catch the fault.

It was rather surprising that the framework can override the fault handling defined in the process. It was a bit difficult to digest, but I could not think of any other way it could have been designed.

The best resource I could find was http://www.it-eye.nl/weblog/2007/09/10/oracle-bpel-10133-fault-policy-management/
A very good resource to begin with.
The Oracle documentation at http://www.oracle.com/technology/products/ias/bpel/pdf/10133technotes.pdf is also of great help.

Using these resources, when I started testing my processes I wanted all my remote faults to be retried and then sent for human intervention.

But then I started facing issues: when a synchronous process is invoked and it is waiting for human intervention, the calling process times out. So I had to have a separate policy for synchronous processes, which contains only retries, and another for asynchronous processes, which contains retry and human intervention.
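To make the split between the two policies concrete, here is a minimal sketch in Python. It is only a model of the idea, not the framework's policy file format or API: the names Policy, RemoteFault, SYNC_POLICY, ASYNC_POLICY and invoke_with_policy are all made up for illustration, the retry count of 3 is arbitrary, and the "rethrow" fallback for synchronous processes is an assumption about what happens once the retries run out. The retry interval is left out to keep the sketch short.

```python
from dataclasses import dataclass


class RemoteFault(Exception):
    """Hypothetical stand-in for a fault raised when the partner is unreachable."""


@dataclass(frozen=True)
class Policy:
    """Illustrative model of a policy: a retry count plus a fallback action."""
    retry_count: int
    on_retry_failure: str  # "human_intervention" or "rethrow" (illustrative names)


# Assumed split, following the post: a synchronous caller cannot wait for a
# person, so its policy only retries; the asynchronous one may park the instance.
SYNC_POLICY = Policy(retry_count=3, on_retry_failure="rethrow")
ASYNC_POLICY = Policy(retry_count=3, on_retry_failure="human_intervention")


def invoke_with_policy(invoke, policy):
    """Run invoke(); on RemoteFault retry up to policy.retry_count times,
    then apply the fallback. Returns (status, result, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return ("completed", invoke(), attempts)
        except RemoteFault:
            if attempts <= policy.retry_count:
                continue  # retries left, try the call again
            if policy.on_retry_failure == "human_intervention":
                # Async case: park the instance until someone acts on it.
                return ("pending_human_intervention", None, attempts)
            raise  # Sync case: hand the fault back to the caller right away.
```

With a partner that is always down, the asynchronous policy ends in the pending state after the first call plus three retries, while the synchronous policy hands the fault straight back to the caller instead of leaving it to time out.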

This reminded me of an Oracle document that recommended preferring asynchronous processes over synchronous ones (for performance reasons, I think).

Then I stumbled over an article, http://orasoa.blogspot.com/2008/03/bpel-fault-policies-best-practise.html, which validated my understanding.

If this works in production I will be very pleased. I'm pretty sure it will (it's working on my PC, but I cannot celebrate until it goes to production).

The framework has limited extensibility: only Java tasks can be used to extend it. That leaves very little scope for out-of-the-box thinking.

Another problem I faced: support personnel need access to the Activities tab in the BPEL Console, but none of the other tabs should be accessible to them. I could not resolve this issue :(. From the information I could gather, this can only be done by tweaking the code of the BPEL Console. Ref: http://chintanblog.blogspot.com/2007/12/i-saw-numerous-people-asking-about-bpel_290.html

Special thanks:
To the Oracle team for coming up with this feature, to the consultants whose blogs I have mentioned, and to all the people who answer questions on the Oracle forum.

3 comments:

Calvin said...

A very nice article describing the various faults we can face in real-world scenarios, and more importantly, your "discoveries" regarding HOW to manage them, esp. in the case of Oracle SOA Suite.

Keep IT up (& running)...! :)

KarROX said...

I tried using one of the links. I would say that the link was what I was looking for.

eric said...

Since this post, has anyone gotten anything other than an "=" or "!=" to work on the test condition section of a fault policy?

For example, have you tried to use a "contains" like this?...
.. contains($fault.summary/summary, "a")

Are the options in the condition "test" tag documented anywhere? The Oracle SOA Suite: New Features tech document for 10.1.3.3 just says "XPath Expressions" can be used but I have not been able to get anything other than = or != to work.

Thanks.