Wednesday, July 20, 2011

Extreme XML processing with VTD-XML


Last week I ran into a performance problem with the generic DOM XML parser while processing Camel messages in the ServiceMix ESB. For evaluating XPath expressions against XML content, a DOM parser is the generic choice and integrates easily with both the document and the result. But when I used it on huge XML files with a large number of search criteria, the process took hours to finish and became a terrible bottleneck for the whole pipeline. After trying some workarounds, and with the help of Pham Ngoc Hai, I switched the XML parsing to VTD-XML, which gave a tremendous improvement in performance. Thank you, Pham Ngoc Hai and VTD-XML.

The performance was really amazing. VTD-XML is, far and away, the most advanced and powerful XML processing model for SOA and cloud computing! To learn more about VTD-XML, refer to XimpleWare's VTD-XML website.

The following code snippets show how to use VTD-XML instead of the DOM XML parser.

Import the following packages from the VTD-XML library, plus the standard Java I/O and collection classes used below.
    import com.ximpleware.*;
    import com.ximpleware.xpath.*;
    import java.io.*;
    import java.util.*;
Load each XML source file into a VTDGen object, then obtain a VTDNav object from that VTDGen object.
   
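        // One VTDNav per successfully parsed file in xmlDir.
        List<VTDNav> vtdNavList = new ArrayList<VTDNav>();

        File xmlDirectory = new File(xmlDir);
        File[] fileList = xmlDirectory.listFiles();
        // listFiles() returns null if xmlDir is not a readable directory.
        if (fileList != null) {
            for (File xmlFile : fileList) {
                VTDGen vtdGen = new VTDGen();
                // Second argument true = namespace-aware; returns false on failure.
                if (vtdGen.parseFile(xmlFile.getAbsolutePath(), true)) {
                    vtdNavList.add(vtdGen.getNav());
                }
            }
        }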
 

Now we can search the VTDNav objects efficiently. Declare the namespace prefix on the AutoPilot object, then select the XPath expression to search for. In this example I look for every i:MailItem element under i:ips whose ItemId attribute matches a given value, so the XPath is set on the AutoPilot object as follows: ap.selectXPath("//i:ips/i:MailItem[@ItemId='" + itemId + "']");

        for (VTDNav vn : vtdNavList) {
            AutoPilot ap = new AutoPilot(vn);
            // Bind the prefix "i" used in the XPath to its namespace URI.
            ap.declareXPathNameSpace("i", "http://upu.int/ips");
            try {
                ap.selectXPath("//i:ips/i:MailItem[@ItemId='" + itemId + "']");
                int result;
                // evalXPath() moves vn to each match and returns -1 when done.
                while ((result = ap.evalXPath()) != -1) {
                    // Lower 32 bits: offset; upper 32 bits: length.
                    long t = vn.getElementFragment();
                    if (t != -1) {
                        System.out.println(vn.toString((int) t, (int) (t >> 32)));
                    }
                }
            } catch (XPathParseException e) {
                e.printStackTrace();
            } catch (XPathEvalException e) {
                e.printStackTrace();
            } catch (NavException e) {
                e.printStackTrace();
            }
            ap.resetXPath();
        }

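The line System.out.println(vn.toString((int) t, (int) (t >> 32))); writes each match to the console. getElementFragment() returns the fragment's offset in the lower 32 bits of the long and its length in the upper 32 bits, which is why the value is split with a cast and a shift.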
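The cast-and-shift on t can look cryptic at first, so here is a small stand-alone sketch of the same bit layout. It does not use VTD-XML at all; FragmentPacking and its methods are hypothetical names made up for this illustration, and the values 120 and 45 are arbitrary.

```java
// Illustration only: FragmentPacking is a hypothetical helper, not part of VTD-XML.
final class FragmentPacking {

    // Put the length in the upper 32 bits and the offset in the lower 32 bits.
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    // Casting a long to int keeps only the lower 32 bits.
    static int offset(long fragment) {
        return (int) fragment;
    }

    // Shifting right by 32 moves the upper half down before the cast.
    static int length(long fragment) {
        return (int) (fragment >> 32);
    }

    public static void main(String[] args) {
        long t = pack(120, 45);
        System.out.println(offset(t) + " " + length(t)); // prints "120 45"
    }
}
```

So (int) t and (int) (t >> 32) in the search loop are simply the two halves of one packed value.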

Finally, take a look at this benchmark report. It will help you understand the performance advantage of VTD-XML.
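If you would like to measure the difference on your own data, the DOM side of the comparison can be sketched with the JDK's built-in XPath API, roughly as below. This is only a minimal sketch and not the exact code I had in ServiceMix: the class name DomXPathBaseline, the method countMatches and the tiny SAMPLE document are all made up for illustration, and the document shape is assumed from the XPath used above.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: a DOM + XPath baseline using nothing but the JDK.
final class DomXPathBaseline {

    // Made-up sample document; its shape is assumed from the XPath in this post.
    static final String SAMPLE =
        "<i:ips xmlns:i=\"http://upu.int/ips\">"
        + "<i:MailItem ItemId=\"A1\"/><i:MailItem ItemId=\"B2\"/>"
        + "</i:ips>";

    // Parse the whole document into a DOM tree, then count the matching MailItem nodes.
    static int countMatches(String xml, String itemId) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the "i:" prefix to resolve
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "i".equals(prefix) ? "http://upu.int/ips"
                                              : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) {
                    return null; // not needed for XPath evaluation
                }
                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return Collections.<String>emptyIterator();
                }
            });

            NodeList hits = (NodeList) xpath.evaluate(
                "//i:ips/i:MailItem[@ItemId='" + itemId + "']",
                doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM/XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches(SAMPLE, "A1")); // prints 1
    }
}
```

Wrapping the call to countMatches in System.nanoTime() readings, and doing the same around the VTD-XML loop, gives a rough first comparison on real files.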
