Sashwat Gupta's blog: XML


Tuesday, May 20, 2008

WS-BPEL 2.0

Derived from: http://docs.oasis-open.org/wsbpel/2.0/Primer/wsbpel-v2.0-Primer.pdf

Copyright © OASIS Open 2007. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

What’s new in WS-BPEL 2.0
As a result of the OASIS Technical Committee’s issues process, the original BPEL4WS 1.1 specification has received several updates. The following list summarizes the major changes that have been incorporated in the WS-BPEL 2.0 specification.

Data Access

Variables can now be declared using XML schema complex types

XPath expressions are simplified by using the ‘$’ notation for variable access, for example, $myMsgVar.part1/po:poLine[@lineNo=3]

Access to WSDL messages has been simplified by directly mapping WSDL message parts to XML schema element/type variables

Several clarifications have been added to the description of the <assign> activity's semantics

The keepSrcElementName option has been added to <copy> in order to support XSD substitution groups or choices

The ignoreMissingFromData option has been added to <copy>, so that the copy operation is skipped automatically, instead of raising a selectionFailure fault, when the from data is missing

An extension operation has been added to the <assign> activity

A standardized XSLT 1.0 function (bpel:doXslTransform) has been added to XPath expressions

The ability to validate XML data has been added, both as an option of the <assign> activity and as a new <validate> activity

Variable initialization as part of the variable declaration has been added

Scope Model

New scope snapshot semantics have been defined

Fault handling during compensation has been clarified

The interaction between scope isolation and control links has been clarified

Enrichment of fault catching model

A <rethrow> activity has been added to fault handlers

The <terminationHandler> has been added to scopes

The exitOnStandardFault option has been added to processes and scopes

Message Operations

The join option has been added to correlation sets in order to allow multiple participants to rendezvous at the same process with a deterministic order

Partner links can now be declared local to a scope

The initializePartnerRole option has been added to specify whether an endpoint reference must be bound to a partner link during deployment

The messageExchange construct has been added to pair up concurrent <receive> and <reply> activities

These are the important differences between BPEL 1.1 and 2.0, as imported from the wsbpel-v2.0-Primer.

Most of the added features seem to reflect lessons learnt, which is a good way to improve. The new version does not drastically change the specification, but it appears friendlier to developers. Many problems I am facing with the current BPEL 1.1 implementation would go away once product vendors implement this specification.

Sadly, one thing is missing: support for XPath 2.0 and XSLT 2.0. The FAQ [http://www.oasis-open.org/committees/download.php/23858/WS-BPEL-2.0-FAQ.html] clearly states 'WS-BPEL2.0 is based upon XPath 1.0 and XSLT 1.0'. I consider XSLT 2.0 and XPath 2.0 huge advancements, so I was very disappointed that they are not supported. I believe product vendors will still support them alongside XPath 1.0 and XSLT 1.0 as standard extensions.

Monday, April 21, 2008

Oracle SOA suite - Retrieving process information

The problem statement
To retrieve a process's information from the dehydration store without knowing its instance id, using data present in the process request. (This is part of a bigger problem I am trying to resolve.)

I have a unique field in the BPEL input. How can I find the BPEL instance and its related instances? (This BPEL process in turn executes several other BPEL processes.) I could not modify the existing BPEL process, so my only option was the Oracle BPEL Process Manager Client API. This API has very little documentation. Most of the help I got came from blogs, especially http://orasoa.blogspot.com/2007/06/calling-bpelesb-webservice-from.html, and from the API documentation at http://download-uk.oracle.com/docs/cd/B31017_01/integrate.1013/b28986/toc.htm.

Driving to the beach
Marc Kelderman's blog solved one of the biggest problems I faced: 'jar hell'. I had tried several other posts without managing to set up the project correctly, but the jar files he specified worked wonderfully.

Finally, my classpath contained:
connector15.jar, ejb.jar, oc4j-internal.jar, optic.jar, orabpel.jar, orabpel-ant.jar, orabpel-boot.jar, orabpel-common.jar, orabpel-exts.jar, orabpel-thirdparty.jar, oracle_http_client.jar, orawsdl.jar, xmlparserv2.jar. I am using SOA Suite version 10.1.3.3.

I then set up a new Java project in Eclipse with these jars on the classpath, and created a server configuration file:

Getting your feet wet

## server_config.properties
java.naming.factory.initial=com.evermind.server.rmi.RMIInitialContextFactory
java.naming.provider.url=opmn:ormi://someserver.com:6004:oc4j_soadqa/orabpel
java.naming.security.principal=myname
dedicated.connection=true
java.naming.security.credentials=mypwd

Load the properties:

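import java.io.InputStream;
import java.util.Properties;
// Load server_config.properties from the classpath (BPELManagerControl is the author's class)
Properties prop = new Properties();
InputStream resourceAsStream = BPELManagerControl.class
        .getClassLoader().getResourceAsStream("server_config.properties");
prop.load(resourceAsStream);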

The next piece of code I wrote retrieves all the instances of a specific BPEL process:

Locator locator = new Locator("domainname", prop);
// List every instance of the given BPEL process
WhereCondition whereProcessId = new WhereCondition("process_id = ?");
whereProcessId.setString(1, "myprocessname");
IInstanceHandle[] instanceHandles = locator.listInstances(whereProcessId);

Taking the dive

Using the instance handles, I could display the states and instance ids of the processes. This was just the beginning of learning how to use the API, and I needed to adapt it to my requirements. The previous code returns all records in no particular order, which would make my task very difficult. By trial and error I realized that the API queries a view, admin_list_ci, and that all of its fields can be used in the query. So I added an ORDER BY on the creation date, descending, since my instance would be one of the most recent instances on the server.

The next problem was that if there was an error, the listing would keep processing indefinitely. So I decided to assume that my instance would be one of the last 50 instances executed on the server. This was a safe assumption, since I would be searching immediately after submission. The code became:

Locator locator = new Locator("domainname", prop);
WhereCondition whereProcessId = new WhereCondition("process_id = ?");
whereProcessId.setString(1, "myprocessname");
// Newest instances first (CI_Creation_Date is a column of the admin_list_ci view)
whereProcessId.append("ORDER BY CI_Creation_Date desc");
// Fetch only the first 50 rows, i.e. the 50 most recent instances
IInstanceHandle[] instanceHandles = locator.listInstances(whereProcessId, 0, 50);

The next problem was how to tell whether a given IInstanceHandle was the instance I was searching for. I needed to check whether my application-specific id was present in the instance's request.

The IInstanceHandle object has a getField method that seemed to suit my requirements (get the request variable and read the XML from it). However, I found that it only works for a process that has not finished, so I had to drop that idea.
The only other way I could find to get the data was through the audit and debug XMLs. For an executed instance, the BPELConsole displays the Audit and Debug XMLs in addition to the flow. The corresponding methods, getAuditTrail and getDebugTrace, return a dump of the whole BPEL instance data. I chose getAuditTrail because the debug trace referred to the XML by an id (probably a reference to some other table) that I could not find, whereas the audit trail worked.

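import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.*;
import org.xml.sax.InputSource;

// The audit trail is an XML dump of the whole instance; pull out the details
// of the receiveInput event (Utils.xPathEvaluator is the author's helper, not shown)
String auditTrailXML = instanceHandle.getAuditTrail();
String XPATH = "//event[@label=\"receiveInput\"]/details/text()";
String receiveInput = Utils.xPathEvaluator(auditTrailXML, XPATH);

// Bind the ns1 prefix used by the request schema
XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
NamespaceContext ctx = new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        if (prefix.equals("ns1"))
            return "http://www.sash.com/Schema/Declaration";
        return XMLConstants.NULL_NS_URI; // unbound prefix
    }

    public Iterator<String> getPrefixes(String val) {
        return null; // not needed for evaluation
    }

    public String getPrefix(String uri) {
        return null; // not needed for evaluation
    }
};
xPath.setNamespaceContext(ctx);

// Extract the application id from the original request
String XPATH2 = "//ns1:appid/text()";
XPathExpression xPathExpression = xPath.compile(XPATH2);
String appid = xPathExpression.evaluate(new InputSource(
        new StringReader(receiveInput)));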

I then compared the extracted appid with the one I had, looping over the instance handles until the instance was found. This way I was able to retrieve the instance id.
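The loop itself is not shown in the post. Below is a minimal sketch of the matching step only. It uses just the standard javax.xml.xpath API, so it runs without the Oracle jars. It assumes, as the XPath above suggests, that each audit trail contains an <event label="receiveInput"> whose <details> element holds the request XML as escaped text. The class and method names are hypothetical. In the real code, each string would come from instanceHandle.getAuditTrail().

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class AppIdMatcher {
    // Namespace of the request schema, bound to "ns1" as in the post
    static final String NS1 = "http://www.sash.com/Schema/Declaration";

    // Evaluates an XPath 1.0 expression (with ns1 bound) against an XML string
    static String eval(String xml, String expression) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ns1".equals(prefix) ? NS1 : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });
        try {
            return xPath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    // Returns the index of the first audit trail whose receiveInput request
    // carries the given appid, or -1 if none does
    static int findByAppId(List<String> auditTrails, String appId) {
        for (int i = 0; i < auditTrails.size(); i++) {
            // Assumption: <details> holds the request XML as escaped text, so its
            // string value is the unescaped request document
            String receiveInput = eval(auditTrails.get(i),
                    "//event[@label='receiveInput']/details");
            if (receiveInput.isEmpty()) {
                continue; // no receiveInput event in this trail
            }
            if (appId.equals(eval(receiveInput, "//ns1:appid"))) {
                return i;
            }
        }
        return -1;
    }
}
```

The string value of <details> is used instead of details/text(), so the sketch still works if the parser splits the text into several nodes.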

To find all the related instances, first find the current instance and get its handle. Then use the following where condition to retrieve the related instances:

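    // All instances that share the root of the current instance's call chain
    WhereCondition wpi = new WhereCondition("ROOT_ID=?");
    wpi.setString(1, instanceHandle.getRootId());
    IInstanceHandle[] relatedHandles = locator.listInstances(wpi);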

Conclusion

The whole code was mostly based on trial and error and the basic API documentation.
The API can be very helpful, but it is difficult to use.
The view admin_list_ci in the BPEL dehydration store can be used to construct the where clause of the query.
Setting up the project is not simple: the jars have to be exactly right.
Should you use the API? It depends on your requirements. I turned to it when I had no other option, and yes, I was satisfied, as it resolved my problem.

Friday, March 28, 2008

XSLT / XPATH 2.0

In my current project I was trying to replace a Java transformation service with XSLT, and I hit some speed breakers. Searching on Google, I found references to XSLT 2.0, which solves the problems I was encountering. We are using Oracle SOA Suite 10.3.x. The editor (JDeveloper) only supports XSLT 1.0. Interestingly, though, if you change the XSLT version from 1.0 to 2.0 in the text editor, the parser supports the newer version too.

I needed a group-by function like the one databases offer, because I had to group the entries of an array. XSLT 2.0 brings the new xsl:for-each-group construct, and its absence from XSLT 1.0 had been a major drawback.

Another simple requirement I had was a sum of products. Say I have multiple items and I need to calculate the total price.

 <items>
   <line_item>
       <price>20</price>
       <quantity>33</quantity>
   </line_item>
   <line_item>
       <price>10</price>
       <quantity>4</quantity>
   </line_item>
</items>

To calculate the total (20*33 + 10*4 = 700) we can use XPath 2.0: evaluating sum(for $a in (//line_item) return ($a/price * $a/quantity)) gives the result. We can use this expression in our XSLT to calculate the value. The new version of XSLT/XPath brings features that were long awaited.
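The JDK's built-in javax.xml.xpath implements XPath 1.0, which has no for expression, so the expression above cannot be evaluated there. For comparison, here is a minimal sketch of the XPath 1.0 workaround, using the sample items XML above. It selects each line_item, evaluates price * quantity relative to it, and adds up the results in Java. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SumOfProducts {
    // XPath 1.0 has no "for" expression, so evaluate price * quantity
    // per line_item and add up the results in Java
    static double total(String itemsXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(itemsXml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xPath.evaluate(
                    "//line_item", doc, XPathConstants.NODESET);
            double sum = 0;
            for (int i = 0; i < items.getLength(); i++) {
                // Relative to one line_item, "price * quantity" is plain XPath 1.0
                Double lineTotal = (Double) xPath.evaluate(
                        "price * quantity", items.item(i), XPathConstants.NUMBER);
                sum += lineTotal;
            }
            return sum;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate items", e);
        }
    }
}
```

For the sample document, total(...) returns 700.0, the same value as the single XPath 2.0 expression.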

Some other features include:

Output multiple documents from a single transformation

Type awareness

Temporary trees built during the transformation (for example, in a variable) can be queried like any other nodes

Custom functions (xsl:function)

I hope these standards are soon adopted by everyone, with better tool support.

Sunday, November 25, 2007

Non WS-I document literal web service

A document/literal web service whose message specifies more than one part is not WS-I compliant.

For example:

<wsdl:message name="HelloDudeSoapIn">
      <wsdl:part name="name" element="tns:name"/>
      <wsdl:part name="age" element="tns:age"/>
 </wsdl:message>

The SOAP message would look like:

<soap-env:envelope>
     <soap-env:body>
              <m:name>Baby</m:name>
              <m:age>1</m:age>
     </soap-env:body>
</soap-env:envelope>

Why the restriction? Think about validating the message. I would have to extract the name and age elements separately and then validate each of them against an XSD. If the elements were wrapped in a single element...

<soap-env:envelope>
     <soap-env:body>
             <m:person>
                    <m:name>Baby</m:name>
                    <m:age>1</m:age>
             </m:person>
     </soap-env:body>
 </soap-env:envelope>

Here the whole payload can be validated in one step: extract the person element and validate it against the XSD.
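Here is a minimal sketch of that one-step validation with the JDK's javax.xml.validation API. The schema, the urn:example:person namespace and the class name are hypothetical. The sketch also assumes the standard SOAP 1.1 envelope namespace with the Envelope/Body element names, rather than the lowercase names in the sample above.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class PersonValidator {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    // Hypothetical schema for the wrapping <m:person> element
    static final String PERSON_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='urn:example:person' elementFormDefault='qualified'>"
        + "<xs:element name='person'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='age' type='xs:int'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Extracts the single element inside the SOAP Body and validates it in one call
    static boolean isValid(String envelope) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // schema validation needs a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(envelope)));
            Node body = doc.getElementsByTagNameNS(SOAP_NS, "Body").item(0);
            Node payload = body.getFirstChild();
            while (payload != null && payload.getNodeType() != Node.ELEMENT_NODE) {
                payload = payload.getNextSibling(); // skip whitespace text nodes
            }
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(PERSON_XSD)));
            try {
                schema.newValidator().validate(new DOMSource(payload));
                return true;
            } catch (SAXException invalid) {
                return false; // the payload does not match the schema
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse envelope or schema", e);
        }
    }
}
```

With two separate parts in the body, there would be no single element to hand to the validator, and each part would need its own check.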

Wednesday, November 14, 2007

XPath injection

What is XPath?

XPath (XML Path Language) is an expression language for addressing portions of an XML document, or for computing values (strings, numbers, or boolean values) based on the content of an XML document.
For more information see XPath tutorial.

Understanding the attack

XPath injection is an attack in which user input that is not validated (or only partially validated) changes the behavior of an XPath expression, by passing XPath syntax off as data.

Assume that user ids and passwords are stored in an XML file and that we use XPath to validate them. The XML containing the user ids and passwords looks like:

<security-check>
<user>
<id>sash</id>
<password>sash123</password>
</user>
<user>
<id>abhinav</id>
<password>abhinav123</password>
</user>
</security-check>

To validate the user id and password against the XML, we use the XPath expression
//user[id/text()='+ {input user id} +' and password/text()='+ {input password} +']
We execute the XPath and check whether it returns any nodes; if it does, the password is valid. If the entered user id is sash and the password is sash123, the XPath becomes
//user[id/text()='sash' and password/text()='sash123']
which returns the user node, so the password is validated. If a wrong password is used, no node is returned and the validation fails.

Now suppose that, to inject XPath, ' or 'a' = 'a is entered in the password field. The XPath becomes
//user[id/text()='sash' and password/text()='' or 'a' = 'a']
Because "and" binds more tightly than "or", the predicate is always true, so every user node is returned and the validation passes.

Simulating the attack

Sample C# code

// Load the user store (the same XML as above)
XmlDocument XmlDoc = new XmlDocument();
XmlDoc.Load("XPATH_INJECT.xml");
XPathNavigator nav = XmlDoc.CreateNavigator();

// VULNERABLE: the text box values are concatenated straight into the XPath
XPathExpression expr = nav.Compile("//user[id/text()='"
    + textBox1.Text + "' and password/text()='" + textBox2.Text + "']");

// Any matching <user> node means the login passes
XPathNodeIterator iterator = nav.Select(expr);
if (iterator.MoveNext())
{
    result.Text = "passed";
}
else
{
    result.Text = "failed";
}

Sample Java code

import java.io.File;
import java.io.FileInputStream;
import javax.xml.xpath.*;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

XPathFactory factory = XPathFactory.newInstance();
XPath xPath = factory.newXPath();
File xmlDocument = new File("XPATH_INJECT.xml"); // the same XML as above
InputSource inputSource = new InputSource(new FileInputStream(xmlDocument));

String user = jTextField1.getText().trim();
String pwd = jTextField2.getText().trim();

// VULNERABLE: user input is concatenated straight into the XPath
XPathExpression expr = xPath.compile(
    "//user[id/text()='" + user +
    "' and password/text()='" + pwd + "']");
NodeList nodes = (NodeList) expr.evaluate(inputSource, XPathConstants.NODESET);

// Any matching <user> node means the login is accepted
if (nodes.getLength() > 0) {
    jLabel3.setText("Valid");
} else {
    jLabel3.setText("Failed");
}

How to protect against the attack:

There are several ways to prevent this attack:

1. Validate the input.
2. Escape the ' and " characters.
3. Use a parameterized query. This attack is similar to SQL injection, and the most common solution to SQL injection is a prepared statement, but I did not find anything similar in XPath. It can be achieved with XQuery, but XQuery is not supported in .NET or Java without external libraries.

The best solution would be escaping the ['] characters. In our example, if we replace each ['] with [''] in the input, we avoid the attack.
In the previous case, the password text entered was ' or 'a' = 'a, but this would be modified to '' or ''a'' = ''a, giving
//user[id/text()='sash' and password/text()=''' or ''a'' = ''a']
which does not return any results (and is a valid XPath 2.0 expression).

This XPath passes in Altova XMLSpy but not in Java 6 or .NET 2.0. The reason is that doubling a quote to escape it inside a string literal is XPath 2.0 syntax. XPath 1.0, which Java 6 and .NET 2.0 implement, has no escape mechanism inside string literals.

So we still need to find an elegant solution to the problem!

Of all the proposed solutions, solution (3) is the most elegant, but support for it is still very limited.
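One option available on the Java side is not covered above. XPath 1.0 expressions may contain variable references such as $user, and javax.xml.xpath lets the application bind them through an XPathVariableResolver. Because the value is supplied as a string rather than spliced into the expression text, this behaves much like a prepared statement. Below is a minimal sketch under that approach, reusing the security-check XML from above. The class name and the variable names user and pwd are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SafeLogin {
    // Checks credentials with bound XPath variables instead of string concatenation
    static boolean check(String xml, String user, String pwd) {
        Map<String, String> vars = new HashMap<>();
        vars.put("user", user);
        vars.put("pwd", pwd);
        XPath xPath = XPathFactory.newInstance().newXPath();
        // $user and $pwd are looked up in the map; their values are plain strings
        // and are never parsed as XPath syntax
        xPath.setXPathVariableResolver((QName name) -> vars.get(name.getLocalPart()));
        try {
            NodeList nodes = (NodeList) xPath.evaluate(
                    "//user[id/text()=$user and password/text()=$pwd]",
                    new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength() > 0;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this version, the injected password ' or 'a' = 'a is simply compared as a literal string and matches no user.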

What's left

Some databases now support XPath. If yours does, be very careful with the inputs (don't forget the validation).

References

http://www.ibm.com/developerworks/xml/library/x-xpathinjection.html
http://www.packetstormsecurity.org/papers/bypass/Blind_XPath_Injection_20040518.pdf

Sunday, November 26, 2006

XML hints

Here are some things worth knowing about XML.

Fact 1:
Comments must not contain '--'. This is common knowledge, but what is less well known is that '--->' is an invalid end of comment.
ref:- http://www.w3.org/TR/2004/REC-xml-20040204/#sec-comments
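A quick way to confirm this is to feed both forms to a parser. Below is a minimal sketch using the JDK's built-in SAX parser; the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class CommentCheck {
    // Returns true if the string parses as well-formed XML
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false; // well-formedness error, e.g. "--" inside a comment
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

isWellFormed("<a><!-- fine --></a>") is true, while both "<a><!-- bad ---></a>" and "<a><!-- a -- b --></a>" are rejected.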

Fact 2:
Do you know about dtd entities?

<!ENTITY copyright SYSTEM "http://www.w3schools.com/dtd/entities.dtd">

You define an entity in the DTD, and it can then be referenced anywhere in the XML document. It is like declaring a constant.

It can be used in XML as:
<author>&writer;&copyright;</author>
ref:- http://www.w3schools.com/dtd/dtd_entities.asp
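Below is a minimal sketch showing the parser replacing entity references with their declared values. It uses internal (inline) entities rather than the external SYSTEM entity above, so nothing is fetched over the network. The entity values and the class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityDemo {
    // Hypothetical document declaring two internal entities in its DTD
    static final String XML =
          "<!DOCTYPE author [\n"
        + "  <!ENTITY writer \"Jane Doe. \">\n"
        + "  <!ENTITY copyright \"Copyright 2006.\">\n"
        + "]>\n"
        + "<author>&writer;&copyright;</author>";

    // Parses the document and returns the text of the root element
    static String authorText(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setExpandEntityReferences(true); // the default, stated for clarity
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

authorText(XML) returns "Jane Doe. Copyright 2006.": each reference has been replaced by the declared text, just like a constant.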

Fact 3
What is the difference between a/b and a[b] in XPath?

a/b selects the b elements that are children of an a element.
a[b] selects the a elements that have at least one b child.

The difference is subtle, but the selected nodes are different.
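Below is a minimal sketch that makes the difference visible with the JDK's XPath 1.0 engine. The sample document and the class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class PathVsPredicate {
    // Hypothetical document: one <a> has a <b> child, the other does not
    static final String XML = "<root><a><b>1</b></a><a><c/></a></root>";

    // Evaluates an XPath 1.0 expression against the XML and returns the string result
    static String eval(String xml, String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expression, e);
        }
    }
}
```

name(/root/a/b) gives "b" while name(/root/a[b]) gives "a". Likewise, count(/root/a) is 2 but count(/root/a[b]) is 1, because the predicate only filters the a elements.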

Fact 4
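What does the XPath translate function (available in XSLT) do?
translate("abcdef--","ab-","AB")

will result in:
ABcdef

Each character in the second argument is replaced by the character at the same position in the third argument (a becomes A, b becomes B). A character with no counterpart in the third argument, here '-', is removed.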
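The same expression can be checked with the JDK's XPath 1.0 engine. Below is a minimal sketch; the class name is hypothetical, and the trivial <doc/> document only serves as an evaluation context.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class TranslateDemo {
    // Evaluates a standalone XPath 1.0 expression against a trivial document
    static String eval(String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader("<doc/>")));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expression, e);
        }
    }
}
```

eval("translate('abcdef--','ab-','AB')") returns "ABcdef".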

ref:- http://www.w3.org/TR/xpath#section-String-Functions

Fact 5

Stylesheet Inclusion
It is an error if a stylesheet directly or indirectly includes itself.

Including a stylesheet multiple times can cause errors because of duplicate definitions. Such multiple inclusions are less obvious when they are indirect. For example, if stylesheet B includes stylesheet A, stylesheet C includes stylesheet A, and stylesheet D includes both stylesheet B and stylesheet C, then A will be included indirectly by D twice. If all of B, C and D are used as independent stylesheets, then the error can be avoided by separating everything in B other than the inclusion of A into a separate stylesheet B' and changing B to contain just inclusions of B' and A, similarly for C, and then changing D to include A, B', C'.

This can be tested by defining templates with the same name in all the stylesheets and experimenting with importing the stylesheets. You will notice that:
1. the current (importing) stylesheet takes precedence;
2. if more than one stylesheet is imported, the last one imported takes precedence.
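Below is a minimal sketch of that experiment with the JDK's built-in XSLT 1.0 processor. The main stylesheet imports a.xsl and then b.xsl, and each defines a named template "who". A URIResolver serves both imports from memory, so no files are needed. The stylesheet names, labels and class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ImportPrecedence {
    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // A named template "who" that outputs the given label
    static String who(String label) {
        return "<xsl:template name='who'>" + label + "</xsl:template>";
    }

    // Runs main.xsl (imports a.xsl, then b.xsl); optionally main defines "who" too
    static String run(boolean mainDefinesWho) {
        Map<String, String> sheets = new HashMap<>();
        sheets.put("a.xsl", HEAD + who("A") + "</xsl:stylesheet>");
        sheets.put("b.xsl", HEAD + who("B") + "</xsl:stylesheet>");
        String main = HEAD
            + "<xsl:import href='a.xsl'/><xsl:import href='b.xsl'/>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'><xsl:call-template name='who'/></xsl:template>"
            + (mainDefinesWho ? who("MAIN") : "")
            + "</xsl:stylesheet>";
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            // Serve imported stylesheets from the in-memory map instead of files
            tf.setURIResolver((href, base) ->
                    new StreamSource(new StringReader(sheets.get(href)), href));
            Transformer t = tf.newTransformer(new StreamSource(new StringReader(main)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader("<x/>")), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

run(false) should print the label from b.xsl, the last import. run(true) should print MAIN, because the importing stylesheet wins.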

ref:- http://www.w3.org/TR/xslt#import