ExtHashJoin doesn't work with large recordsets

  • slechtaj
    Hi Craibuc,

    When you use ExtHashJoin, keep in mind that this joiner caches all slave data in memory. For that reason, it should be avoided when the slave port receives a large input. This is also why you hit the issue only with the large dataset.

    I can see that only a few records arrive at ExtHashJoin through the master port. You may be able to resolve the issue by swapping the two ports (so the current master becomes the slave, and the current slave becomes the master). The large input then goes to the master port and the small one to the slave port, so only the small input is cached.
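To illustrate why the port assignment matters, here is a minimal Python sketch of a generic hash join. It is not CloverETL's implementation; the function name `hash_join` and the record layout (plain dicts with an `id` key) are assumptions made for the example. The point is that the whole slave input ends up in the in-memory cache, while master records are streamed one at a time.

```python
from collections import defaultdict


def hash_join(master, slave, key):
    """Inner join that caches the entire slave input, then streams master.

    Hypothetical helper for illustration only; not a CloverETL API.
    """
    # Memory use grows with the size of the slave input.
    cache = defaultdict(list)
    for s in slave:
        cache[s[key]].append(s)

    # Master records are never stored; each one is looked up and emitted.
    for m in master:
        for s in cache.get(m[key], []):
            yield {**s, **m}


# The small lookup table goes on the slave side, the large input on master.
small = [{"id": 1, "name": "a"}]
large = [{"id": 1, "v": 10}, {"id": 2, "v": 20}, {"id": 1, "v": 30}]
joined = list(hash_join(master=large, slave=small, key="id"))
```

With the inputs swapped the other way round, `cache` would hold every record of the large input, which is the situation that exhausts the heap.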

    To avoid memory issues when both the master and slave ports receive large datasets, you can use ExtMergeJoin instead. However, unlike ExtHashJoin, it requires its inputs to be sorted on the join key, and it does not sort them itself, so you have to sort the data in advance. For more information about ExtMergeJoin, see http://doc.cloveretl.com/documentation/UserGuide/index.jsp?topic=/com.cloveretl.gui.docs/docs/extmergejoin.html.
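For comparison, a generic merge join can be sketched as follows. Again, this is an illustration under assumptions, not how ExtMergeJoin is implemented: `merge_join` is a hypothetical name, records are plain dicts, and both inputs must already be sorted ascending on the join key. Because the inputs are sorted, only the slave records for the current key need to be held in memory at any time.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(master, slave, key):
    """Inner join over two inputs that are already sorted on `key`.

    Hypothetical helper for illustration only; not a CloverETL API.
    """
    get = itemgetter(key)
    m_groups = groupby(master, key=get)
    s_groups = groupby(slave, key=get)
    done = (None, None)
    m_key, m_grp = next(m_groups, done)
    s_key, s_grp = next(s_groups, done)

    while m_grp is not None and s_grp is not None:
        if m_key < s_key:
            # No slave record for this master key; skip ahead on master.
            m_key, m_grp = next(m_groups, done)
        elif m_key > s_key:
            # No master record for this slave key; skip ahead on slave.
            s_key, s_grp = next(s_groups, done)
        else:
            # Buffer only the slave records that share the current key.
            s_buf = list(s_grp)
            for m in m_grp:
                for s in s_buf:
                    yield {**s, **m}
            m_key, m_grp = next(m_groups, done)
            s_key, s_grp = next(s_groups, done)


master = [{"id": 1, "v": 10}, {"id": 1, "v": 30},
          {"id": 2, "v": 20}, {"id": 4, "v": 40}]
slave = [{"id": 1, "name": "a"}, {"id": 3, "name": "c"},
         {"id": 4, "name": "d"}]
merged = list(merge_join(master, slave, key="id"))
```

If either input is not sorted on the key, this sketch silently misses matches, which is why the sort step has to come first.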
  • craibuc
    That worked. The video tutorial didn't mention the caching bit; it makes sense, however. When I was testing with a CSV representation of the table (21 MB), I got Java heap exceptions; that should have given me a clue.

    Thanks.

    PS. I don't see the images that I posted.