CloverETL performance issue

  • imriskal
    Hi, odedbobi,

    Can you post your graph and some sample input data? That would be very helpful in solving your issue. It would also help to know the CloverETL Designer version (and Server version, if present), the Java version and vendor, the memory settings of Designer (Server), the OS version, the components used, the number of input records, etc. Thank you in advance.
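    Most of the Java and OS details asked for above can be read from standard Java system properties. A minimal sketch that prints them, plus the maximum heap the JVM will attempt to use (roughly the -Xmx memory setting). The class name `EnvReport` and the `max.heap.mb` key are hypothetical, and the output is only meaningful when run on the same JVM that runs the graph:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnvReport {
    // Collects JVM and OS details from standard system properties, in a fixed order.
    public static Map<String, String> collect() {
        String[] keys = {"java.version", "java.vendor", "java.vm.name", "os.name", "os.version", "os.arch"};
        Map<String, String> info = new LinkedHashMap<String, String>();
        for (String key : keys) {
            info.put(key, System.getProperty(key, "(unknown)"));
        }
        // Maximum heap the JVM will attempt to use; roughly corresponds to -Xmx.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        info.put("max.heap.mb", Long.toString(maxHeapMb));
        return info;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : collect().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```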

    Best regards,
  • odedbobi
    Hi,
    here is some more info:
    1. OS - Red Hat Enterprise Linux Server release 5.7 (Tikanga) - 64-bit
    2. Java version - java version "1.4.2" gij (GNU libgcj) version 4.1.2 20080704 (Red Hat 4.1.2-51)
    3. CloverETL version - how do I check it?

    I have made some progress in my investigation (and I must say that debugging CloverETL and understanding what it does is very difficult):
    I have attached two graphs and a sample data file:
    1. Glr.grf - my full graph; its run time is 19 minutes.
    2. partGrf.grf - the same as Glr.grf with one small modification: I added 'and false' to one of the filters, which effectively disables one of the aggregations. On the same data, this run takes 7.5 minutes.
    3. qna-glr-1... - a sample data file. Of course, the 7.5- and 19-minute runs use much larger input data.

    The strange behavior I see is that no matter which aggregation I disable (by adding the 'false and...' term to a filter), I get the 7.5-minute run. Furthermore, for one of the aggregations I added a filter both before and after it, with the after filter set to false. When the aggregation received data, the run was slow; when the data was blocked by the before filter, the run took 7.5 minutes.
    So I figured that I have hit some kind of limit on the number of aggregations.
    I have tried workarounds, but nothing has helped so far.
    Thanks.
  • imriskal
    Hello again,

    Thanks for the info. You can find out the version of CloverETL Designer by clicking on Help -> About CloverETL Designer.

    I have a few notes:
    1. We support only Java versions 6 and 7. Version 1.4 is too old and is also a possible source of slow processing.
    2. I have noticed that your transformations are written in Java. This is very hard to support, especially without the source code of the transformations. For example, you may import something inappropriate that slows down the whole graph run.
    3. You can divide your graph into phases, e.g. one phase per graph branch. This way, the graph run log shows which phase took most of the processing time, which may help you locate the source of your issue. Then you can send me a simplified graph containing just the problematic part of the original graph.
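    The idea behind point 3 can be illustrated outside CloverETL: when parts of a job run one after another and each is timed on its own, the slowest part stands out. A minimal, hypothetical Java sketch of that idea; it does not use any CloverETL API, and `PhaseTimer`, `runPhases` and `slowest` are made-up names. In CloverETL itself, the per-phase timings come from the graph run log once the graph is divided into phases:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseTimer {
    // Runs each named phase to completion, in insertion order, and records its wall-clock time in ms.
    public static Map<String, Long> runPhases(Map<String, Runnable> phases) {
        Map<String, Long> timings = new LinkedHashMap<String, Long>();
        for (Map.Entry<String, Runnable> phase : phases.entrySet()) {
            long start = System.nanoTime();
            phase.getValue().run();
            timings.put(phase.getKey(), (System.nanoTime() - start) / 1000000L);
        }
        return timings;
    }

    // Returns the name of the phase with the largest recorded time (null for an empty map).
    public static String slowest(Map<String, Long> timings) {
        String worst = null;
        long worstMs = -1;
        for (Map.Entry<String, Long> e : timings.entrySet()) {
            if (e.getValue() > worstMs) {
                worst = e.getKey();
                worstMs = e.getValue();
            }
        }
        return worst;
    }

    // Stand-in workload for a graph branch (hypothetical).
    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Map<String, Runnable> phases = new LinkedHashMap<String, Runnable>();
        phases.put("branch-A", new Runnable() { public void run() { pause(10); } });
        phases.put("branch-B", new Runnable() { public void run() { pause(50); } });
        Map<String, Long> timings = runPhases(phases);
        System.out.println(timings);
        System.out.println("slowest: " + slowest(timings));
    }
}
```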

    Best regards,
  • dpavlis
    A general piece of performance advice: DO NOT use GNU Java, the "gij (GNU libgcj)", regardless of the version. It is just painfully slow. Use the Oracle/Sun JVM, which is properly tuned. In the worst case, use IBM Java, which also has its issues but is still a better choice.
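    A minimal sketch (class and method names are hypothetical) that flags the two runtime problems raised in this thread: a Java version other than the supported 6 and 7, and the GNU gij/libgcj runtime. It assumes gij reports a `java.vm.name` mentioning "gij" or "libgcj", as its `java -version` banner suggests; check the actual property value on the target machine before relying on it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class JvmCheck {
    // Major version from "1.4.2" / "1.7.0_80" (pre-Java 9 scheme) or "11.0.2" (Java 9+ scheme).
    public static int majorVersion(String javaVersion) {
        String[] parts = javaVersion.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // Assumption: GNU gij reports a VM name containing "gij" or "gcj" (e.g. "GNU libgcj").
    public static boolean isGnuGij(String vmName) {
        if (vmName == null) {
            return false;
        }
        String lower = vmName.toLowerCase(Locale.ROOT);
        return lower.contains("gcj") || lower.contains("gij");
    }

    // Warnings based on the advice in this thread: Java 6 or 7 only, and no GNU gij.
    public static List<String> warnings(String javaVersion, String vmName) {
        List<String> result = new ArrayList<String>();
        int major = majorVersion(javaVersion);
        if (major != 6 && major != 7) {
            result.add("Java " + major + " is not one of the supported versions (6, 7)");
        }
        if (isGnuGij(vmName)) {
            result.add("GNU gij/libgcj detected; use the Oracle/Sun JVM instead");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> found = warnings(System.getProperty("java.version"), System.getProperty("java.vm.name"));
        if (found.isEmpty()) {
            System.out.println("No runtime warnings.");
        }
        for (String w : found) {
            System.out.println("WARNING: " + w);
        }
    }
}
```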
  • odedbobi
    Thank you guys for the advice.
    But this graph is running in production environments for huge clients.
    Changing the Java version, the CloverETL version, or any other component is out of the question.
    Plus, you are forgetting that the performance issue I have got worse once I added more aggregations and DB writing.
    Now I have narrowed the problem down to the following amazing fact:
    my graph computes 19 unique counts for a parameter.
    Once I remove one of them, no matter which one, the run time drops to less than half of what it is with all 19 unique counts.

    Now, I suppose this has to do with memory or something like that.
    I can't remove any of my unique counts, but I can split the graph into two different graphs.
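    An exact unique (distinct) count generally has to remember every distinct value it has seen in each group, so its memory use grows with the number of distinct values, and 19 such counts multiply that cost. A minimal Java sketch of hash-set-based exact distinct counting; it is an assumption that CloverETL's aggregation works similarly, `DistinctCounter` is a hypothetical name, and the sketch only shows why the memory hypothesis is plausible, not that it is the cause here:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DistinctCounter {
    // One set of seen values per group key; exact counting must keep all of them in memory.
    private final Map<String, Set<String>> seen = new HashMap<String, Set<String>>();

    public void add(String groupKey, String value) {
        Set<String> values = seen.get(groupKey);
        if (values == null) {
            values = new HashSet<String>();
            seen.put(groupKey, values);
        }
        values.add(value);
    }

    public int count(String groupKey) {
        Set<String> values = seen.get(groupKey);
        return values == null ? 0 : values.size();
    }

    // Total number of values held in memory across all groups.
    public long retainedValues() {
        long total = 0;
        for (Set<String> values : seen.values()) {
            total += values.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical worst case: 19 independent distinct counts, every value distinct.
        int counters = 19;
        DistinctCounter[] all = new DistinctCounter[counters];
        for (int i = 0; i < counters; i++) {
            all[i] = new DistinctCounter();
        }
        for (int record = 0; record < 10000; record++) {
            String group = "g" + (record % 10);
            for (int i = 0; i < counters; i++) {
                all[i].add(group, "v" + i + "_" + record);
            }
        }
        long retained = 0;
        for (DistinctCounter c : all) {
            retained += c.retainedValues();
        }
        System.out.println("values retained: " + retained); // prints: values retained: 190000
    }
}
```

    Each extra count adds its own full set of retained values, which is consistent with (but does not prove) a memory limit being crossed when the nineteenth count is added.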

    Thanks.