Customer Portal

Large map object in dictionary fails with persistence truncation error in Designer, Clover engine (command line) works?

Comments 3

  • Avatar
    Lukas Cholasta
    0
    Comment actions Permalink
    Hi

    This behavior is a bug and I reported it.

    However in your case it should be possible to avoid using dictionary, which is causing this issue. You can declare a whole new map in the component where you fill it with data and then send this map over the edge to the other component. The filling and reading functions will stay almost the same, you just need to replace the dictionary.map with your new map.

    If such solution is not possible, please send me your graph and some input file with a representative sample of the data you are working with.

    Also, graph created in Designer shouldn't be run by the open-source engine as it could behave unexpectedly. E.g. the set of components is not exactly the same and that could cause issues.

    Best regards.
  • Avatar
    fvanderham
    0
    Comment actions Permalink
    Thanks for confirming/creating the bug report!

    I learned something new from your note regarding the ability to define a container (list, map) as a metadata field type. That is really powerful. Thank you for that!

    I think I understand what you meant by sending the map "over the edge" to another component: you mean adding a container field to the metadata and then setting the output port's value for that field to the generated map value:


    map[string,string] theMap;
    function integer transform() {
    theMap[$in.0.KEY] = $in.0.VALUE;
    $out.0.MAPFIELD = theMap;
    return ALL;
    }


    However, implementing it this way would be very memory intensive, wouldn't it? If the map has many elements, then each output row would contain a deep-copy of the (partial) map up to that point and the very last output row would contain the final map.

    So, I started experimenting with using two Reformat components, one that is simply counting the number of rows for the map and storing that count in the dictionary (integer). The next Reformat executes at a later phase and uses that count to SKIP all output rows except the last one:


    map[string, string] theMap;
    integer rowCount = 0;
    function integer transform() {
    theMap[$in.0.KEY] = $in.0.VALUE;
    rowCount++;
    if (rowCount == dictionary.mapCount) {
    $out.0.MAPFIELD = theMap;
    return ALL;
    }
    return SKIP;
    }


    The reduces the output of the "read the map" phase to a single row.

    The next challenge is to pass this map to the "using the map" phase. In the phase where we need to use the map, the metadata is very different from the previous phase. I could not find a suitable component that can combine that single input row that contains the map with an input containing many rows with a different metadata.

    So I resorted to using an "intermediate" metadata that is the target metadata with that map field added. Then, "read the map" phase creates an output with one single row using this metadata, containing our map. I then use a SimpleGather to combine this "fake" single row with the actual data in the subsequent rows. A Reformat then treats that first row differently from the remaining rows, pulls the map off the first row only and uses that map for processing the remaining "real" rows:


    map[string,string] theMap;
    integer rowCounter = 0;
    function integer transform() {
    if (rowCounter == 0) {
    theMap = $in.0.MAPFIELD;
    printLog(info, "The map size: " + length(theMap));
    rowCounter++;
    return SKIP;
    } else {
    $out.0.ORIGORDER = $in.0.ORIGORDER;
    $out.0.CHILD = theMap["SOME_KEY_ON_THE_MAP"];
    }
    rowCounter++;
    return ALL;
    }


    It does work, but it's a lot of moving parts compared to being able to use a dictionary map, so I appreciate any tips or comments you may have!
  • Avatar
    Lukas Cholasta
    0
    Comment actions Permalink
    Hi

    You don't need to use two reformats as CloverETL software has a component to do this kind of job, it's called Denormalizer. It takes all the input records you send to it and creates one output record, a map in this case.

    The workaround you developed to get the map to the second component is ok. Alternatively you can use lookup table to store the data instead of map. This way you can avoid sending the map over the edge completely.

    I attached a sample graph, where the aforementioned is shown.

    map_lookup_sample.grf
    Hope this helps.

Please sign in to leave a comment.