Customer Portal

Easiest way to check a history file and only process new records

Comments 1

  • Avatar
    Lukas Cholasta
    0
    Comment actions Permalink
    Hi pintail

    Probably the most suitable solution is to use the DataIntersection component which is designed to this kind of job. However, you will need to sort the inputs, so if you process large amount of data it can be memory or HDD space consuming (depending on which sort component you choose, your memory settings and number of records). I attached a sample graph, where it is shown.

    The DataIntersection has 3 outputs. Following is the layout of the DataIntersection outputs in the sample graph.

      1. Records that weren't processed yet.
      2. Records that are processed again.
      3. Records that were already processed, but aren't processed again.

    Here are some hints, you may find useful. If you don't need to update the status (or contents) of already processed records, you can save only the key values, which are used to match processed and unprocessed records. This can save you memory or HDD space when the sort component process the records. Using the Clover file format can speed up processing the file. However, if you plan to use it outside of the CloverETL software, it will be wise to use some universal file format.

    Best regards

Please sign in to leave a comment.