Customer Portal

Write out to multiple files with partition key

Comments 7

  • Avatar
    slechtaj
    0
    Comment actions Permalink
    Hi Perri,

    In writer components, there are "Records per file" and "Bytes per file" attributes that do exactly what you are looking for. In case you use any of these attributes, you must add dollar signs ($) placeholders (file numbering) into URL.

    For more information about these attributes, please refer to our documentation: http://doc.cloveretl.com/documentation/ ... riter.html

    Hope this helps.
  • Avatar
    yichuansancun
    0
    Comment actions Permalink
    I tried that, it still outputs 1 file per employee. I attached the configuration of my writer component.

    Let me know what I didnt configure correctly.

    Thanks,
    Perri
  • Avatar
    slechtaj
    0
    Comment actions Permalink
    Hi Perry,

    what Clover version do you use? It would be helpful to see your graph and some sample input data if it's possible.

    Thanks,
  • Avatar
    yichuansancun
    0
    Comment actions Permalink
    Hi, I attached the graph. I changed the byes to 5000. Right now, it is still writing out 74 files because there are 74 emplids. For some reason, I cannot attach the test data, it says that "The extension xlsx is not allowed".

    I am using Clover Designer Community version.

    Thanks,
    P
  • Avatar
    slechtaj
    0
    Comment actions Permalink
    Hi Perry,

    Please take a look at the attach graph (whole project is in the zip archive). Could you please try to run the graph and check if the input data in the zip archive are transferred into correctly partitioned output files? You should get three files partitioned per key and per number of records. In the graph you have attached, you were missing dollar sign, which is necessary when you use "Records per file" or "Bytes per file" attributes.

    Let me know if you have any questions.
  • Avatar
    yichuansancun
    0
    Comment actions Permalink
    Hi, I ran your graph and it did output three files. However, what I wanted is that employees with the same emplid should be in the same file. In your example emplid - 123186741 is getting outputed into two files, since you set the records per file being 2.

    Is there a way I can get the output based on 'byes per record', but don't parse same emplid into two different file?

    In sample file you sent, the output should be either both employees John and Martin be in the same file, or John will be in one file and Martin will be in one file (if byes per records is set). My real data set will have over 500K rows of data with 170K emplid. Right now, if it is writing 170K output files because it is putting each emplid into 1 file.

    Hope I explained my issue, do let me know if you have any questions.

    P
  • Avatar
    slechtaj
    0
    Comment actions Permalink
    Hi Perry,

    Okay, I think I got what you are saying. In order to do this, you will have to introduce some auxiliary (artificial) key that will help you partition data into files. For this purpose I have calculated byte length of all records with the same emplid. Then I simply assign the auxiliary id to each row based on previously calculated byte length (for more information take please refer to Denormalizer and ExtMergeJoin transformations in the graph).

    The project is again attached. Hope this helps.

Please sign in to leave a comment.