Non-sequential processing of large XML files

  • admin
    Dear evn,

    Thank you for your questions:

    1) We have a few components for reading XML. For your purpose, I would recommend XMLExtract.

    XMLExtract
    * http://doc.cloveretl.com/documentation/ ... tract.html
    * reads data in a SAX (streaming) way
    * is memory-efficient and can read XML files a few GB in size without problems
    * has a convenient mapping designer UI

    XMLReader
    * http://doc.cloveretl.com/documentation/ ... eader.html
    * reads the data into memory (DOM) and lets you extract it using XPath selectors
    * cannot be used for large files because the DOM requires a tremendous amount of memory

    XMLXPathReader
    * uses DOM+XPath
    * deprecated, replaced by XMLReader and XMLExtract
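    To illustrate why a SAX/stream reader stays memory-efficient while a DOM reader does not, here is a minimal plain-Java sketch using the JDK's javax.xml.parsers.SAXParser. It is not XMLExtract's implementation; the class name StreamingXmlCount and the element name "record" are hypothetical. The handler is called once per element as the parser streams through the input and keeps only a counter, so memory use does not grow with the size of the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingXmlCount {

    /** Counts elements named {@code elementName} without building a DOM tree. */
    public static long countElements(InputStream in, String elementName) {
        long[] count = {0}; // the only state kept in memory, regardless of input size
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes attrs) {
                    // Callbacks arrive in document order while the input is being streamed
                    if (qName.equals(elementName)) {
                        count[0]++;
                    }
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<records><record id=\"1\"/><record id=\"2\"/><other/></records>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "record")); // prints 2
    }
}
```

    For a real multi-GB file you would pass something like new BufferedInputStream(new FileInputStream(path)) instead of the in-memory string; the handler code stays the same.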

    2) For SHA calculation, I would recommend using an external executable (http://linux.die.net/man/1/sha1sum) via http://doc.cloveretl.com/documentation/ ... cript.html or http://doc.cloveretl.com/documentation/ ... ecute.html instead of writing your own Java code.
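    Whatever component invokes it, the external-tool approach boils down to piping data into sha1sum and taking the first token of its output. The sketch below shows that in plain Java with ProcessBuilder, only to make the mechanics concrete; it is not a CloverETL API, the names ExternalSha1 and sha1Hex are hypothetical, and it assumes sha1sum is available on the PATH (as on most Linux systems).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ExternalSha1 {

    /** Pipes {@code data} to sha1sum's stdin and returns the hex digest it prints. */
    public static String sha1Hex(byte[] data) {
        try {
            Process proc = new ProcessBuilder("sha1sum").redirectErrorStream(true).start();
            try (OutputStream stdin = proc.getOutputStream()) {
                stdin.write(data); // closing stdin tells sha1sum the input is complete
            }
            String output;
            try (InputStream stdout = proc.getInputStream()) {
                output = new String(stdout.readAllBytes(), StandardCharsets.US_ASCII);
            }
            int exit = proc.waitFor();
            if (exit != 0) {
                throw new IllegalStateException("sha1sum exited with code " + exit + ": " + output);
            }
            // When reading stdin, sha1sum prints "<40 hex chars>  -"
            return output.trim().split("\\s+")[0];
        } catch (IOException e) {
            throw new IllegalStateException("could not run sha1sum", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for sha1sum", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("abc".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

    Inside a graph, the components linked above take the place of this Java wrapper; the point of the recommendation is that the hashing itself is left to a well-tested tool.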

    3)

    a] I am not sure I understand the question here. Every received message is sent immediately through the output port to the next component. If the next component is in the same phase as JMSReader, processing also starts immediately. If you would like to stop the graph after receiving, e.g., 10 messages, you can use the "Max msg count" property (http://doc.cloveretl.com/documentation/ ... eader.html).

    b] I am afraid there is no support for an explicit acknowledgement mode instead of AUTO ACK. You may be able to limit the risk of failure by writing the response into a file and processing it in a second step.

    c] SSL should work; the configuration depends on your JMS vendor (for example, http://activemq.apache.org/how-do-i-use-ssl.html). Just use the "ssl://" prefix in the "URL" field of the JMS Connection wizard (http://doc.cloveretl.com/documentation/ ... izard.html) and configure the trust store and key store.

    d] I will answer this part later; I need to check the details.

    I hope you will find my answers useful.
  • admin
    Dear evn,

    Regarding question 3 d]: internally we use the receive(long) method, which indeed indicates a polling approach.

    Regarding possible race conditions:
    * Possible problems are discussed here. CloverETL should be free of them. Each JMSReader uses its own JMS session, so thread-based race conditions (caused by multiple threads sharing the same session) should not occur.
    * A JMS queue itself is designed for concurrent access by multiple consumers, each with its own session; each message should be delivered to only one consumer. Some information can be found here, here and here.
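    The polling loop and the "one message, one consumer" guarantee can be illustrated without a broker. The sketch below is only an analogy, not JMS code: it uses java.util.concurrent.BlockingQueue.poll(long, TimeUnit) in place of MessageConsumer.receive(long) (both return null when the timeout expires with no message available). The class and method names are hypothetical. Two consumer threads compete for the same queue, and every message ends up with exactly one of them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class CompetingConsumers {

    /** Poll loop in the spirit of repeated receive(timeout) calls; stops after one empty timeout. */
    static void consume(BlockingQueue<String> queue, List<String> sink, long timeoutMs) {
        try {
            String msg;
            // poll() returns null once the timeout elapses with the queue still empty
            while ((msg = queue.poll(timeoutMs, TimeUnit.MILLISECONDS)) != null) {
                sink.add(msg);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status and stop consuming
        }
    }

    /** Runs two consumers against one shared queue and returns everything they received. */
    public static List<String> runTwoConsumers(int messageCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messageCount; i++) {
            queue.add("msg-" + i);
        }
        // Each consumer appends only to its own list, so the lists are never shared between threads
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        Thread t1 = new Thread(() -> consume(queue, a, 100));
        Thread t2 = new Thread(() -> consume(queue, b, 100));
        t1.start();
        t2.start();
        try {
            t1.join(); // join() also makes each thread's writes visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for consumers", e);
        }
        List<String> all = new ArrayList<>(a);
        all.addAll(b);
        return all;
    }

    public static void main(String[] args) {
        List<String> received = runTwoConsumers(1000);
        // Every message was taken exactly once, by one consumer or the other
        System.out.println(received.size() + " received, " + new HashSet<>(received).size() + " distinct");
    }
}
```

    How the messages are split between the two consumers varies from run to run, but the total always equals the number sent and there are no duplicates, which is the property a JMS queue provides for competing consumers.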

    Please let me know if you need more information.