Optimizing for ExtSort with very large file

  • Pedro Vazquez Rosario
    Hi anyeone,

    That is quite a strange issue. In most cases, ExtSort should be able to handle that number of records without causing a GC out-of-memory error. For now, I would recommend putting the ExtSort in a different phase, which should force disk swapping and allow some of the memory to be released. Could you also answer the following questions:

    • CloverETL Designer and Server version

    • Your current memory settings

    • If possible, please attach your graph (remove any sensitive data)
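
For readers unfamiliar with why an external sort is expected to cope with large inputs, here is a minimal sketch of the general external merge sort technique. It is not CloverETL's ExtSort implementation; the function name `external_sort` and the parameter `max_in_memory` are hypothetical, and the records are assumed to be strings without embedded newlines.

```python
import heapq
import tempfile
from itertools import islice


def external_sort(records, max_in_memory=100_000):
    """Yield records in sorted order, buffering at most max_in_memory at a time.

    Illustrative only: records are assumed to be strings without newlines.
    """
    run_files = []
    it = iter(records)
    while True:
        # Sort one bounded chunk in memory, then spill it to a temporary file.
        chunk = sorted(islice(it, max_in_memory))
        if not chunk:
            break
        f = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        f.writelines(line + "\n" for line in chunk)
        f.seek(0)
        run_files.append(f)
    try:
        # Lazily merge the sorted runs; only one record per run is held at once.
        runs = [(line.rstrip("\n") for line in f) for f in run_files]
        yield from heapq.merge(*runs)
    finally:
        for f in run_files:
            f.close()


if __name__ == "__main__":
    print(list(external_sort(["d", "a", "c", "b", "e"], max_in_memory=2)))
```

The point of the sketch is that the in-memory buffer size, not the total input size, bounds memory use during the sort itself.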
  • anyeone
    The graphs involved are extremely complicated, and the ExtSort occurs in many places in the process, so it will be hard to send it to you in a usable form.

    We upped the JVM heap to 8GB and that seems to have resolved the issue at least for now, but it seems to me that there should be a way to optimize it, and if we ever get even larger data sets we could end up having a ridiculously sized heap requirement.

    I will try putting the sorts in their own phase and see if that helps as well. I did notice that the process sometimes runs in parallel with a bunch of other tasks, so that could be affecting it.
  • admin
    Hi Anye,

    Yes, indeed, as the size of the files being processed grows, more resources should be made available to the JVM. Increasing the JVM heap memory assigned to the process is therefore the right decision. Just please be aware that you should not use the whole capacity of the physical RAM (especially if CloverETL is not the only thing running on that physical machine). The general recommendation is to assign half of the physical RAM to the CloverETL heap memory, but it depends on many circumstances (direct memory enabled/disabled and so on).
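
As a rough illustration of the "half of the physical RAM" rule of thumb, the following sketch computes a heap size and formats it as the standard JVM `-Xmx` option. The helper names `recommended_heap_bytes` and `as_xmx_flag` are hypothetical, the 16 GiB figure is only an example, and where the option is actually set depends on your installation.

```python
def recommended_heap_bytes(physical_ram_bytes: int, fraction: float = 0.5) -> int:
    """Return a heap size as a fraction of physical RAM (default: one half)."""
    if physical_ram_bytes <= 0:
        raise ValueError("physical_ram_bytes must be positive")
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    return int(physical_ram_bytes * fraction)


def as_xmx_flag(heap_bytes: int) -> str:
    """Format a heap size as a JVM -Xmx option in whole mebibytes."""
    return f"-Xmx{heap_bytes // (1024 * 1024)}m"


if __name__ == "__main__":
    ram = 16 * 1024**3  # example machine with 16 GiB of physical RAM
    print(as_xmx_flag(recommended_heap_bytes(ram)))  # prints -Xmx8192m
```

Lower the `fraction` argument if other services share the machine or if direct memory is enabled.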

    I would also like to point out again that using phases in the design might help distribute resources more efficiently, as Pedro suggested in his update.

    If you want to learn more about the ExtSort component and other sorting options in CloverETL, you can read the following blog posts:

    https://blog.cloveretl.com/sorting-data-extsort-vs-fastsort
    https://blog.cloveretl.com/sorting-data-extsort-vs-fastsort-part-2

    Note that these articles are from 2010 and might not be up to date in all details, but the principle hasn't changed.

    Please let me know if this is what you were looking for or if you have any follow-up questions.

    Best Regards, Eva