Large Files read to create json is slow

Comments 7

  • imriskal
    Dear Prashant,

    For such large data sets, I would definitely recommend increasing the memory limit. In Designer, you can do that via Run -> Run Configurations... -> Java memory size. Please set it as high as your hardware allows. In your case, I would recommend at least 4 GB, or 8 GB if possible.

    Besides that, you can experiment with the properties of JSONWriter, specifically Cache size and Cache in memory. Try various values and observe how they affect overall performance. See the documentation page for more details about these properties:

    http://doc.cloveretl.com/documentation/ ... riter.html

    If nothing from the above helps, please send me some run logs and the graph itself.

    Kind regards,
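To confirm that an increased limit actually reached the graph's JVM, a tiny check like the following can help. This is a generic Java sketch, not part of CloverETL; the class name `HeapCheck` is illustrative.

```java
// Minimal sketch: print the heap ceiling the JVM was actually given.
// Runtime.maxMemory() reports the maximum heap in bytes, which roughly
// corresponds to the -Xmx value of the run configuration.
public class HeapCheck {
    public static void main(String[] args) {
        long maxMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMiB + " MiB");
    }
}
```

If the printed value is far below what was configured, the -Xmx argument never reached the JVM that actually runs the graph.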
  • prashantkk
    I tried the following, with 64 GB of RAM on my machine: I increased the JVM -Xmx size to 16384m and set Cache in memory to true, and Cache size to 8 GB.

    I ran into the following problems:
    1. The JSON output is incomplete.

    2. Setting Cache size on the JSON component gave an error: the cache size cannot be larger than the JVM size.

    3. The performance is still the same.

    Where should I set the JVM size: Run Configuration -> Main or Run Configuration -> Arguments?
  • imriskal
    Please see my notes:

    1. Could you please describe the incomplete output? Some records are missing? How many? Is the output syntactically valid?
    2. Cache size has no meaning if Cache in memory is set to true; please see more details in the documentation link I provided before.
    3. Could you please send your graph and graph run logs?

    Memory settings for the graph are set in the Main tab; see the screenshot.
    memory.png
    Kind regards,
  • prashantkk
    Thanks for your reply. I replaced the UniversalDataReader component with ParallelReader and the performance improved 5 times :)

    Please find my answers to your questions inline.

    1. Could you please describe the incomplete output? Some records are missing? How many? Is the output syntactically valid?
    >> The JSON structure is incomplete and syntactically invalid (missing braces, elements, arrays, etc.). Also, I noticed the behavior is not consistent: if I reduce the number of input records, the JSON is OK, otherwise not :(

    2. Cache size has no meaning if Cache in memory is set to true; please see more details in the documentation link I provided before.

    3. Could you please send your graph and graph run logs?
    >> Please find the graph attached.

    Memory settings for the graph is set in Main tab, see the screenshot.
    >> Yes, I have provided 32 GB to the attached graph.
  • imriskal
    Hello Prashant,

    If the incomplete output comes from a graph run that ended with an error, the situation is understandable: writing stopped in the middle of the process as well. But if the incomplete output comes from a successful graph run, then it is of course an error and we should search for the reason. So which situation is it, please?

    Another possible reason could be insufficient space in the output location. Do you think that is possible?

    Regarding the memory setting, JSONWriter also needs some free space for its internal structures, so it is possible that, in the worst case, even 32 GB may not be enough for 22 GB of input data. In that case, please turn Cache in memory off again and make sure you have enough space on the HDD where CloverETL stores its temp files.

    And one last thing: please try replacing the FastSort components with ExtSort. FastSort can sometimes be extremely demanding on memory (and on CPU and HDD performance), which can lead to slow runs when there is not enough free memory, and with data input as large as yours, this situation is quite possible. Could you please send me some of your graph run logs so we can see where the bottleneck is?

    Kind regards,
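The trade-off behind disk-backed sorters is the classic external merge sort: sort bounded chunks in memory, spill each sorted run to a temp file, then stream-merge the runs, so peak memory stays proportional to one chunk rather than the whole input. A minimal sketch of that idea in plain Java (this is not CloverETL's actual implementation; all names are illustrative):

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Minimal external merge sort sketch: sort bounded chunks in memory,
// spill each sorted run to a temp file, then stream-merge the runs.
// Peak memory is proportional to chunkSize, not to the input size.
public class ExternalSortSketch {

    public static Path sort(Path input, int chunkSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input)) {
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = in.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    runs.add(spill(chunk));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) {
                runs.add(spill(chunk));
            }
        }
        return merge(runs);
    }

    // Sort one chunk in memory and write it out as a sorted "run".
    private static Path spill(List<String> chunk) throws IOException {
        Collections.sort(chunk);
        Path run = Files.createTempFile("run", ".txt");
        Files.write(run, chunk);
        return run;
    }

    // K-way merge: a priority queue holds the current head line of each run.
    private static Path merge(List<Path> runs) throws IOException {
        Path out = Files.createTempFile("sorted", ".txt");
        PriorityQueue<Object[]> heads =
            new PriorityQueue<>(Comparator.comparing((Object[] h) -> (String) h[0]));
        List<BufferedReader> readers = new ArrayList<>();
        for (Path run : runs) {
            BufferedReader r = Files.newBufferedReader(run);
            readers.add(r);
            String first = r.readLine();
            if (first != null) {
                heads.add(new Object[] { first, r });
            }
        }
        try (BufferedWriter w = Files.newBufferedWriter(out)) {
            while (!heads.isEmpty()) {
                Object[] h = heads.poll();
                w.write((String) h[0]);
                w.newLine();
                String next = ((BufferedReader) h[1]).readLine();
                if (next != null) {
                    heads.add(new Object[] { next, h[1] });
                }
            }
        }
        for (BufferedReader r : readers) {
            r.close();
        }
        return out;
    }
}
```

An in-memory sorter avoids the disk round-trips, which is why it can be faster when everything fits in RAM, and much slower once it starts competing with the rest of the graph for memory.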
  • prashantkk
    I will try adding more RAM - 64 GB. I was also trying to pass a zip or gzip input file instead of the large CSV file, but Clover throws an exception like 'not a valid stream'. Any ideas?

    I will provide you the graph with more details tomorrow.

    Thanks,
    Prashant
  • admin
    Hi,

    For reading zip files, please see http://doc.cloveretl.com/documentation/ ... aders.html
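An error like "not a valid stream" typically means raw compressed bytes reached a parser that expected plain text; the file has to be decompressed before it is parsed. A generic illustration of that decompress-then-parse order in plain Java (this is not the CloverETL API; `GzipReadSketch` and its methods are illustrative names):

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: a gzipped CSV must be decompressed before it is parsed as text.
// Feeding the raw .gz bytes straight into a text reader produces garbage,
// or errors in tools that detect the mismatch.
public class GzipReadSketch {

    // Wrap the raw file stream in GZIPInputStream, then read as UTF-8 text.
    public static String readGzippedText(Path gz) throws IOException {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gz)),
                StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = r.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    // Helper for the round trip: write text through a gzip compressor.
    public static void writeGzippedText(Path gz, String text) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

The linked documentation page describes how CloverETL readers address entries inside compressed archives via their file URL, so the decompression happens inside the reader rather than in user code.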
