File Operations and JobFlow patterns

  • admin
    Hi M. Elshami,

    What you need can be done using 3 components in CloverETL. Please see attached JobFlow. It works like this:


    • ListFiles lists files matching a pattern, in my case *.csv.

    • In the output mapping of ListFiles I map the URL of each matched file to custom metadata that contains only the fields from and to.

    • In Reformat I fill in the to field. In my case it just replaces the folder csvs with csvs2 in the URL, but you can apply any logic you want: point to another server, change the filename, add a timestamp to the name, ...

    • So the output metadata holds both the original URL of the file and the desired new URL.

    • Now we can use CopyFiles or MoveFiles with a proper input mapping to copy or move the files.
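
    The Reformat step above could look roughly like the following CTL2 sketch. It assumes the metadata on both edges has the string fields from and to, and that the source folder appears in the URL as /csvs/. Both are illustrative and would need adjusting to your own paths.

    ```ctl
    //#CTL2
    // Sketch of a Reformat transformation: keep the original URL in "from"
    // and derive the destination URL in "to".
    function integer transform() {
        // original file URL, as filled in by the ListFiles output mapping
        $out.0.from = $in.0.from;
        // replace() takes a regular expression; here it swaps the folder name
        // (the "/csvs/" folder is an assumption about the URL layout)
        $out.0.to = replace($in.0.from, "/csvs/", "/csvs2/");
        return ALL;
    }
    ```

    CopyFiles or MoveFiles can then take from as the source and to as the target in its input mapping.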


    If you need to process the files on the Server, the process would be:

    • Use the steps above to copy the files to local Server storage

    • Process the files locally

    • Use the steps above to copy the files to the destination directory


    Alternatively, access the files directly from Readers and Writers using a remote URL (see Supported File URL Formats for Readers and Supported File URL Formats for Writers). You would read the remote file for processing and write the results to a remote file, so no local copy would be necessary.

    You can use JobFlow to simplify the processing logic, for example:

    • Prepare a master JobFlow that only lists the files and then executes a sub JobFlow for each one (passing the file URL as a parameter)

    • In the sub JobFlow, download the file from its original location to a local file, execute the processing graph, and move or copy the result to the destination server or folder

    • In the processing graph, you just process the local file and save the result into a local file


    I hope this helps.
  • melshami
    Thanks a lot Jaroslav, this was helpful.

    I am very new to CloverETL and not sure how the binding works in the Reformat component. It looks like you've assigned out.0.from and out.0.to in the ListFiles component?

    function integer transform() {
        $out.0.from = $in.0.from;
        $out.0.to = $in.0.from.replace("DailyTrans", "DailyTrans_20150102.csv" );
        printLog(info, "out.0.from: " + $out.0.from);
        return ALL;
    }

    Also, how do I specify a more dynamic input pattern? Instead of plain wildcards, I would like to specify the input pattern based on the date, e.g. DailyTrans_<date>*.csv

    Regards,
    Mohamed
  • slechtaj
    Hi Mohamed,

    As you can see, Clover components have input and/or output ports. To work in CTL with the data coming through these ports, you use the following values:
    $in.0 - represents the first input port. The word in tells Clover it is an input port, and zero is the port index. Port indexes start at 0 (0 is the first port, 1 is the second, and so on). A field on that port is addressed by appending its name, for example $in.0.from.
    $out.0 - likewise, this stands for the first (index 0) output port; the word out marks it as an output port.
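
    As a small illustration of that notation, here is a sketch of a Reformat transformation. The field names from and to are taken from the example earlier in this thread; the ".bak" suffix is purely hypothetical and only there to show a single-field assignment.

    ```ctl
    //#CTL2
    function integer transform() {
        // copy every field with a matching name from input port 0 to output port 0
        $out.0.* = $in.0.*;
        // then overwrite one field: port index first, field name last
        // (the ".bak" suffix is a made-up example value)
        $out.0.to = $in.0.from + ".bak";
        return ALL;
    }
    ```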

    Regarding the pattern you would like to use, there are two ways:
    • For simple patterns (like yours) you can still use wildcards, for example DailyTrans_????-??-??*.csv, which matches names such as DailyTrans_2012-11-27_Monaco.csv.

    • For more complicated patterns you can first list all files in a folder (using ListFiles) and then use the ExtFilter component to filter out unwanted records by comparing them against a regular expression.
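
    For the second option, the ExtFilter expression might look something like the sketch below. It assumes that the ListFiles output mapping fills a string field called name with the bare file name (the field name is hypothetical and depends on your metadata) and that the date is written as yyyy-MM-dd.

    ```ctl
    //#CTL2
    // keep only records whose file name is DailyTrans_<date><anything>.csv
    $in.0.name ~= "DailyTrans_[0-9]{4}-[0-9]{2}-[0-9]{2}.*\\.csv"
    ```

    To match only today's files, the date part could probably be built with date2str(today(), "yyyy-MM-dd") and checked with matches() instead of a fixed pattern.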


    Hope this helps.
    Jan
  • melshami
    Thanks a lot Jan

    I made another attempt at creating a basic ListFiles -> CopyFiles flow.

    I can't figure out how the URL metadata is propagated. I thought it was recognised automatically, but looking at your example it seems that I have to define both an input mapping and an output mapping.

    I've been trying the CTL2 below as the CopyFiles input mapping:

    // Transforms input record into output record.
    function integer transform() {
        $out.0.sourceURL = $in.0.URL;

        return ALL;
    }

    I am getting the following error:

    Caused by: java.lang.IllegalArgumentException: Copy source is empty
    at org.jetel.component.fileoperation.FileManager.copy(FileManager.java:271)

    I've attached the jobflow example.

    Regards,
    Mohamed
  • slechtaj
    Hi Mohamed,

    As you can see, you are getting the "Copy source is empty" message, which means the URL string is empty. If you enable debug on the edge between ListFiles and CopyFiles, you can view the data that passes through it. In your case it is only an empty record. The record contains no data because you haven't defined the Output mapping in the ListFiles component. I've prepared a short example for you (it copies all files from data-in to data-out).

    local-files-copy.jbf
  • melshami
    Thanks Jan,

    I got it now.
  • pintail
    Hi - do you have an example jobflow that runs a series of graphs in sequence, where each graph runs only if the previous one executed successfully? I'm looking to automate a series of graphs that take a long time to run, so I'm trying to break them up into smaller parts so the memory can flush itself out, and to join small partitions of data to speed things up a little. I can't find a good example of how to use a jobflow to call further graphs once one has finished without error.

    thanks for any help!
  • slechtaj
    Hi pintail,

    CloverETL Server comes with a set of examples in which you may find the answers to your questions. You might want to start with the jobflows in the JobflowExamples sandbox.

    Hope this helps.
  • pintail
    It does thanks - I didn't even think to look in the example sandboxes...completely slipped my mind. thanks!
