Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am trying to use the javax.xml.xpath package to run XPath expressions on a document with multiple namespaces, and I'm having goofy performance problems.

My test document is pulled from a real production example. It is about 600 KB of XML, a fairly complex Atom feed.

I realize that what I'm doing with XPath could be done without. However, the same implementation on other, vastly inferior platforms performs absurdly better. Right now, rebuilding my system to not use XPath is beyond the scope of what I can do in the time that I have.

My test code is something like this:



void testXPathPerformance() throws Exception
{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    Document doc = builder.parse(loadTestDocument());

    XPathFactory xpf = XPathFactory.newInstance();
    XPath xp = xpf.newXPath();

    NamespaceContext names = loadTestNamespaces();
    //there are 12 namespaces in names.  In this example code, I'm using
    //'samplens' instead of the actual namespaces that my application uses
    //for simplicity.  In my real code, the queries are different text, but
    //precisely the same complexity.

    xp.setNamespaceContext(names);

    NodeList nodes = (NodeList) xp.evaluate("/atom:feed/atom:entry",
                     doc.getDocumentElement(), XPathConstants.NODESET);


    for(int i=0;i<nodes.getLength();i++)
    {
        printTimestamp(1);
        xp.evaluate("atom:id/text()", nodes.item(i));
        printTimestamp(2);
        xp.evaluate("samplens:fieldA/text()", nodes.item(i));
        printTimestamp(3);
        xp.evaluate("atom:author/atom:uri/text()", nodes.item(i));
        printTimestamp(4);
        xp.evaluate("samplens:fieldA/samplens:fieldB/@attrC", nodes.item(i));
        printTimestamp(5);

        //etc.  My real example has 10 of these xp.evaluate lines

     }
}
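
The test code above relies on a loadTestNamespaces() helper that is not shown. As a rough sketch only, a map-backed NamespaceContext along the following lines could play that role. The class name MapNamespaceContext, its bind() and sample() methods, and the samplens URI are all made up for illustration; only the Atom namespace URI is real. The interface contract also specifies fixed bindings for the xml and xmlns prefixes, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

/** Map-backed NamespaceContext; a hypothetical stand-in for loadTestNamespaces(). */
final class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> uriByPrefix = new HashMap<>();

    /** Registers a prefix-to-URI binding and returns this so calls can be chained. */
    MapNamespaceContext bind(String prefix, String uri) {
        uriByPrefix.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = uriByPrefix.get(prefix);
        // Unbound prefixes map to the empty string (XMLConstants.NULL_NS_URI).
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> it = getPrefixes(namespaceURI);
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        List<String> prefixes = new ArrayList<>();
        for (Map.Entry<String, String> e : uriByPrefix.entrySet()) {
            if (e.getValue().equals(namespaceURI)) {
                prefixes.add(e.getKey());
            }
        }
        return prefixes.iterator();
    }

    /** Example bindings: the Atom URI is real, the samplens URI is invented. */
    static NamespaceContext sample() {
        return new MapNamespaceContext()
            .bind("atom", "http://www.w3.org/2005/Atom")
            .bind("samplens", "http://example.com/samplens");
    }
}
```

With something like this in place, xp.setNamespaceContext(MapNamespaceContext.sample()) would let the atom: and samplens: prefixes in the queries resolve.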

When I run on a Nexus One (not in the debugger, but with USB connected), each xp.evaluate takes 10ms to 20ms the first time through the loop. By the 15th time through the loop, each xp.evaluate takes 200ms to 300ms. By the end of the loop (there are 150 items in nodes), each xp.evaluate takes about 500ms-600ms.

I've tried using xp.compile(). The compiles all take <5ms. I've tried calling xp.reset() (makes no difference). I've tried creating a new XPath object for each evaluate (adds about 4ms).

Memory usage does not appear to spiral out of control during execution.

I'm running this on a single thread in a JUnit test case that doesn't create an activity or anything.

I'm really puzzled.

Does anybody have any idea what else to try?

Thanks!

update

If I run the for loop backwards (for(int i=nodes.getLength()-1;i>=0;i--)), then the first few nodes take the 500ms-600ms, and the last ones go fast 10ms-20ms. So, this seems like it has nothing to do with the number of calls, but instead that expressions whose context is near the end of the document take longer than expressions whose context is near the beginning of the document.

Does anybody have any thoughts on what I can do about this?


1 Answer

Try adding this code at the top of the loop body:

Node singleNode = nodes.item(i);
singleNode.getParentNode().removeChild(singleNode);

Then run each evaluation against the singleNode variable instead of nodes.item(i) (you can of course change the name).

Doing this detaches the node you are working with from the large main document, which speeds up each evaluate call by a huge amount. Note that it modifies the parsed document: each entry is removed from it as it is processed.

Example:

for(int i=0;i<nodes.getLength();i++)
{
    Node singleNode = nodes.item(i);
    singleNode.getParentNode().removeChild(singleNode);

    printTimestamp(1);
    xp.evaluate("atom:id/text()", singleNode );
    printTimestamp(2);
    xp.evaluate("samplens:fieldA/text()", singleNode );
    printTimestamp(3);
    xp.evaluate("atom:author/atom:uri/text()", singleNode );
    printTimestamp(4);
    xp.evaluate("samplens:fieldA/samplens:fieldB/@attrC", singleNode );
    printTimestamp(5);

    //etc.  My real example has 10 of these xp.evaluate lines

 }
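
For anyone who wants to try the detach approach outside the original test harness, here is a minimal self-contained sketch. The class DetachedEntryDemo, the tiny SAMPLE_FEED string, and the atomOnly() helper are made up for illustration and stand in for the real feed and its 12 namespaces. It also copies the NodeList into a plain list before removing anything, which is an extra precaution on my part in case an implementation hands back a list that reacts to DOM changes; the answer above does not require it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DetachedEntryDemo {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Tiny illustrative stand-in for the 600 KB production feed.
    static final String SAMPLE_FEED =
        "<feed xmlns='http://www.w3.org/2005/Atom'>"
      + "<entry><id>urn:example:1</id></entry>"
      + "<entry><id>urn:example:2</id></entry>"
      + "</feed>";

    /** Hypothetical single-prefix context: binds only "atom". */
    static NamespaceContext atomOnly() {
        return new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "atom".equals(prefix) ? ATOM_NS : "";
            }
            @Override public String getPrefix(String uri) {
                return ATOM_NS.equals(uri) ? "atom" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                String p = getPrefix(uri);
                List<String> result = (p == null)
                    ? Collections.<String>emptyList()
                    : Collections.singletonList(p);
                return result.iterator();
            }
        };
    }

    /** Returns the atom:id text of every entry, detaching each entry before querying it. */
    static List<String> entryIds(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xp = XPathFactory.newInstance().newXPath();
            xp.setNamespaceContext(atomOnly());
            XPathExpression entriesExpr = xp.compile("/atom:feed/atom:entry");
            XPathExpression idExpr = xp.compile("atom:id/text()");

            // Snapshot the result first so removing nodes cannot disturb iteration.
            NodeList nodes = (NodeList) entriesExpr.evaluate(doc, XPathConstants.NODESET);
            List<Node> entries = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                entries.add(nodes.item(i));
            }

            List<String> ids = new ArrayList<>();
            for (Node entry : entries) {
                // Detach the entry so the relative query only sees this small subtree.
                entry.getParentNode().removeChild(entry);
                ids.add(idExpr.evaluate(entry));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("XPath demo failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(entryIds(SAMPLE_FEED));
    }
}
```

If the document must stay intact, querying a deep copy made with entry.cloneNode(true) instead of removing the entry might be an option, at the cost of copying each subtree; I have not measured that variant.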
