Issue upgrading from 3.0.1 to 3.3.0

Comments 9

  • imriskal
    Hello, Anna,

    I am sorry, but the attached graph is too big for someone who is not involved in the project to follow. It also uses external metadata, which is not attached. Could you please reduce the graph to a minimal one that still produces this error, and also attach all metadata files (and any other necessary files)? Thank you.

    Best regards,
  • anweston
    Heya,

    I will ask if we can cut it down and see if the problem is still there. What other files would you require? What I'm really looking for is some general idea of whether something changed in v3.3.0 that could cause rows to go out of order (multithreading? reallocating buffers?). The relevant nodes leading up to JOIN_3 are:


    SORT_70-----
               |
    SORT_71-- JOIN_0
               |
            FILTER_0
               |
    ---------JOIN_1
               |
            FILTER_2
               |
    ---------JOIN_3


    SORT_70 is

    sortKey="FIELD_FILE_Main_file_txt_parcelid;FIELD_FILE_Main_file_txt_CLOVER_ROW_NUM;"
    sortOrder="A;A;"

    SORT_71 is

    sortKey="FIELD_CMERGE_Prep_Key_Prep_Parcel;FIELD_CMERGE_Prep_Key_SECNBR;FIELD_CMERGE_Prep_Key_BLD_ID;FIELD_CMERGE_Prep_Key_IN_PORT;FIELD_CMERGE_Prep_Key_CLOVER_ROW_NUM;"
    sortOrder="A;A;A;A;A;"

    and JOIN_0 is

    joinKey="FIELD_FILE_Main_file_txt_parcelid=FIELD_CMERGE_Prep_Key_Prep_Parcel"


    so it should be in the right order. There are no further sorts on the port 0 path downstream (and the port 1 sorts shouldn't put anything out of order), so what could cause JOIN_3 to throw this error:

    Data input 0 is not sorted in ascending order. Record #51: Key fields="FIELD_CMERGE_Prep_Key_Prep_Parcel:FIELD_CMERGE_Prep_Key_SECNBR:FIELD_CMERGE_Prep_Key_BLD_ID". Current="FIELD_CMERGE_Prep_Key_Prep_Parcel:0100036 FIELD_CMERGE_Prep_Key_SECNBR:000 FIELD_CMERGE_Prep_Key_BLD_ID:1"; Previous="FIELD_CMERGE_Prep_Key_Prep_Parcel:0100036 FIELD_CMERGE_Prep_Key_SECNBR:000 FIELD_CMERGE_Prep_Key_BLD_ID:2".


    I'm trying to track it down myself, just hoping to get some pointers on where to look...

    Thanks,
    Anna
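The check behind the error message above can be sketched in a few lines. This is only an illustration, not the engine's code: `first_unsorted` is a hypothetical helper, the field names are shortened, keys are compared as plain strings, and records are numbered from 1, all of which are assumptions.

```python
from typing import Dict, Optional, Sequence, Tuple

Record = Dict[str, str]


def first_unsorted(
    records: Sequence[Record], key_fields: Sequence[str]
) -> Optional[Tuple[int, Record, Record]]:
    """Return (record_number, previous, current) for the first record whose
    key is lower than the key of the record before it, or None if the
    input is in ascending order on key_fields."""
    previous = None
    for number, current in enumerate(records, start=1):
        if previous is not None:
            prev_key = tuple(previous[f] for f in key_fields)
            cur_key = tuple(current[f] for f in key_fields)
            if cur_key < prev_key:
                return number, previous, current
        previous = current
    return None


# Hypothetical rows shaped like the ones in the error message.
rows = [
    {"Prep_Parcel": "0100036", "SECNBR": "000", "BLD_ID": "1"},
    {"Prep_Parcel": "0100036", "SECNBR": "000", "BLD_ID": "2"},
    {"Prep_Parcel": "0100036", "SECNBR": "000", "BLD_ID": "1"},
]
violation = first_unsorted(rows, ["Prep_Parcel", "SECNBR", "BLD_ID"])
if violation is not None:
    number, previous, current = violation
    print(f"Record #{number}: current={current} previous={previous}")
```

Running a check like this on the data captured from the edge feeding port 0 of the failing join would show where the order first breaks.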
  • imriskal
    Hi, Anna,

    Are you sure you posted the right graph? As far as I can see, the graph looks a little bit different:


    SORT_70-----
               |
    SORT_71-- JOIN_0
               |
            FILTER_0
               |
    ---------JOIN_17
               |
            FILTER_1
               |
    ---------JOIN_18


    JOIN_3 is not even among the components in the graph outline in the bottom-left pane, so I do not understand how JOIN_3 can throw any exception.

    Regarding your question about files: if we are to reproduce the issue on the minimal graph, we also need all the externalized files in your project (metadata, connections, Java transformations, sequences, ...) and, of course, a sample of your input data.

    Thanks and best regards,
  • anweston
    Heya,

    I double-checked that I posted the right graph and took the time to hand-draw it edge by edge (we don't have the Designer; the graph is autogenerated by code). Following the edges, it is the way I drew it, not the way you drew it. Perhaps there are so many nodes that it does not show up in the Designer correctly (although that makes me worry that if it is showing up wrong in the Designer, something *is* going on)?

    We are not going to be able to cut it down or provide sample data for you to reproduce, so I will try and run it with full debug on to see if I can see what's going on, node-wise. When I find a solution, I will post the results...


    Thanks,
    Anna
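Tracing the graph edge by edge, as described above, can also be done by script. The sketch below assumes that each edge in the graph file is an `Edge` element whose `fromNode` and `toNode` attributes have the form `NODE_ID:port`; that layout and the sample XML are assumptions, so adjust the names to whatever the generated file actually contains.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, heavily reduced graph file used only for illustration.
SAMPLE = """<Graph>
  <Phase number="0">
    <Edge id="E1" fromNode="SORT_70:0" toNode="JOIN_0:0"/>
    <Edge id="E2" fromNode="SORT_71:0" toNode="JOIN_0:1"/>
    <Edge id="E3" fromNode="JOIN_0:0" toNode="FILTER_0:0"/>
    <Edge id="E4" fromNode="FILTER_0:0" toNode="JOIN_1:0"/>
  </Phase>
</Graph>"""


def downstream_map(xml_text):
    """Map each source node ID to a list of (out_port, target_id, in_port)."""
    links = defaultdict(list)
    for edge in ET.fromstring(xml_text).iter("Edge"):
        src, _, src_port = edge.get("fromNode", "").partition(":")
        dst, _, dst_port = edge.get("toNode", "").partition(":")
        links[src].append((src_port, dst, dst_port))
    return dict(links)


links = downstream_map(SAMPLE)
for source in sorted(links):
    for out_port, target, in_port in links[source]:
        print(f"{source}[{out_port}] -> {target}[{in_port}]")
```

Comparing this listing with what the Designer draws makes it easier to pinpoint the node at which the two views diverge.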
  • admin
    Hi Anna,

    We checked your graph in Designer once again, and it really does show something different from what is saved in the graph file. We suspect that duplicate IDs in the XML are the cause. Unfortunately, your graph is too big for us to verify this by changing it manually.

    Can you please generate your graph so that all IDs used in it are unique? At a minimum, we see duplicates between Metadata and Component elements. Both the Engine and Designer require that IDs used in a graph be unique.

    Sample of problem:

    <Metadata id="JOIN_1" fileURL="/home/xxx/clover_staging/work/config/NC117/E6/V1/PFA/202547/JOIN_1.fmt"/>
    ....
    <Node id="JOIN_1" type="EXT_MERGE_JOIN" joinKey="FIELD_CMERGE_Prep_Key_Prep_Parcel=FIELD_FILE_ComPCL_txt_parcel_id" transformClass="com.facorelogic.core.etl.transform.LinkFiles" joinType="fullOuter" joinMetadataID="JOIN_2" slaveKeyFields="FIELD_FILE_ComPCL_txt_parcel_id" includeInOrphanRowReport="true" />
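A generated graph can be screened for this kind of collision before it is run. The following is a minimal sketch that assumes every element carries its identifier in an `id` attribute, as in the sample above; the XML here is a shortened, hypothetical stand-in for a real graph file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, shortened graph file; paths and most attributes are omitted.
SAMPLE = """<Graph>
  <Global>
    <Metadata id="JOIN_1" fileURL="JOIN_1.fmt"/>
    <Metadata id="JOIN_2" fileURL="JOIN_2.fmt"/>
  </Global>
  <Phase number="0">
    <Node id="JOIN_1" type="EXT_MERGE_JOIN" joinMetadataID="JOIN_2"/>
    <Node id="SORT_70"/>
  </Phase>
</Graph>"""


def duplicate_ids(xml_text):
    """Return {id: [tag, ...]} for every id used by more than one element,
    regardless of element type."""
    tags_by_id = defaultdict(list)
    for element in ET.fromstring(xml_text).iter():
        element_id = element.get("id")
        if element_id is not None:
            tags_by_id[element_id].append(element.tag)
    return {i: tags for i, tags in tags_by_id.items() if len(tags) > 1}


duplicates = duplicate_ids(SAMPLE)
print(duplicates)  # {'JOIN_1': ['Metadata', 'Node']}
```

For a real file, the text would be read from disk first; an empty result means no ID is shared between elements.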
  • anweston
    Heya,

    I think I am bumping up against this one: https://bug.javlin.eu/browse/CLD-4137

    On Clover v3.0.1, it apparently does not check the sort order of the master on an ExtMergeJoin. BUT, if the master records are out of order, it silently drops the slave records for the out-of-order master records. For example, I had these sample records:

    Driver

    ID,D_Seq
    1,1
    1,2


    Secondary1

    ID,S1_Seq
    1,1
    1,2


    Secondary2

    ID,S2_Seq
    1,1
    1,2



    I linked Driver to Secondary1 on ID, then linked Secondary1 to Secondary2 on ID and S1_Seq=S2_Seq, with a graph that looks like:

    DRIVER_INPUT --> SORT ON Driver.ID -----------------------------------
                                                                         |
    SECONDARY_1_INPUT --->SORT ON Secondary1.ID, Secondary1.S1_Seq------JOIN_0
                                                                         |
    SECONDARY_2_INPUT --->SORT ON Secondary2.ID, Secondary2.S2_Seq------JOIN_1


    When I run this with Clover 3.0.1, I get:

    Driver.ID,Driver.D_Seq,Secondary1.ID,Secondary1.S1_Seq,Secondary2.ID,Secondary2.S2_Seq
    1,1,1,1,1,1
    1,1,1,2,1,2
    1,2,1,1
    1,2,1,2


    This is bad: the last two rows are missing their Secondary2 fields. In Clover v3.3.0, I get an Exception on the ordering of Secondary1 going into JOIN_1. If I add a sort:


    DRIVER_INPUT --> SORT ON Driver.ID -----------------------------------
                                                                         |
    SECONDARY_1_INPUT --->SORT ON Secondary1.ID, Secondary1.S1_Seq------JOIN_0
                                                                         |
                                                      SORT ON Secondary1.ID, Secondary1.S1_Seq
                                                                         |
    SECONDARY_2_INPUT --->SORT ON Secondary2.ID, Secondary2.S2_Seq------JOIN_1


    I get the correct results:

    Driver.ID,Driver.D_Seq,Secondary1.ID,Secondary1.S1_Seq,Secondary2.ID,Secondary2.S2_Seq
    1,1,1,1,1,1
    1,2,1,1,1,1
    1,1,1,2,1,2
    1,2,1,2,1,2


    I noticed that this bug item is marked as Unresolved, but it looks like the behaviour changed in v3.3.0? An exception is much better than silently dropping records, so I'm glad something was done!

    I will change our autogenerator to add a sort for this kind of condition. It is an extra sort (which is expensive), but I assume that from now on the master port will always have to be sorted, too?

    Thanks,
    Anna
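The reason the output of JOIN_0 is no longer sorted on (Secondary1.ID, Secondary1.S1_Seq) can be reproduced with the sample records above. The sketch uses a simplified in-memory join rather than a real streaming merge join, and it assumes the join emits all matching slave rows for each master row in turn, which is consistent with the row order in the 3.0.1 output above.

```python
from itertools import groupby
from operator import itemgetter

driver = [(1, 1), (1, 2)]      # (ID, D_Seq), sorted on ID
secondary1 = [(1, 1), (1, 2)]  # (ID, S1_Seq), sorted on ID, S1_Seq


def join_on_id(master, slave):
    """Join two inputs on their first field, emitting rows master by master.
    Simplified stand-in for a merge join; both inputs are sorted on ID."""
    slave_groups = {key: list(rows) for key, rows in groupby(slave, key=itemgetter(0))}
    joined = []
    for m in master:
        for s in slave_groups.get(m[0], []):
            joined.append(m + s)
    return joined


joined = join_on_id(driver, secondary1)
print(joined)  # [(1, 1, 1, 1), (1, 1, 1, 2), (1, 2, 1, 1), (1, 2, 1, 2)]

# The key the next join needs on its port 0: (Secondary1.ID, Secondary1.S1_Seq).
s1_keys = [(row[2], row[3]) for row in joined]
print(s1_keys == sorted(s1_keys))  # False: S1_Seq runs 1, 2, 1, 2

# The extra sort restores the order expected by the next join.
resorted = sorted(joined, key=lambda row: (row[2], row[3]))
print(resorted)  # [(1, 1, 1, 1), (1, 2, 1, 1), (1, 1, 1, 2), (1, 2, 1, 2)]
```

The re-sorted rows match the first four columns of the correct results above, which is why the additional sort between JOIN_0 and JOIN_1 is needed whenever the second join key includes a slave field from the first join.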
  • imriskal
    Hi, Anna,

    You are right, this bug is already fixed. Thanks for noticing the inconsistency. The fix is now also recorded in our bug tracking system.

    Best regards,
  • anweston
    Heya,

    Is it now a requirement that IDs must be unique across ALL components? Our understanding when we started using Clover (back in 2007, so it may be very out of date!) was that an ID had to be unique within a type (e.g. you could not have 2 nodes with an ID of "JOIN_1", but you COULD have a metadata with an ID of "JOIN_1" and a node with an ID of "JOIN_1"). We normally have "checkconfig" turned off, but when it is on the engine does not complain about it and the graphs run fine. Is this globally unique ID a requirement of the Engine or the Designer? We will most likely make the change either way in case we use the Designer at some point, but if it's not critical it can wait until our next release cycle. If you think it could cause a nasty side effect, we may need to fix it right away...

    Thanks,
    Anna
  • imriskal
    Hello again,

    Officially, we support only one way of creating graphs: via CloverETL Designer. This is why the requirement for unique IDs is not documented; Designer takes care of it automatically. So the only way to be (relatively) sure when autogenerating a graph is to copy the behavior of Designer as closely as possible. We do not know what happens if IDs are not globally unique, because this situation cannot happen in supported use cases.

    Best regards,
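For a code generator, one way to guarantee globally unique IDs is to hand out every ID through a single registry shared by all element types. The class below is a hypothetical sketch, not part of CloverETL; the name `IdRegistry` and the suffix scheme are illustrative assumptions.

```python
class IdRegistry:
    """Hands out IDs that are unique across all element types of one graph."""

    def __init__(self):
        self._owners = {}  # allocated id -> element type that owns it

    def allocate(self, element_type, base_id):
        """Return base_id if it is still free, otherwise a suffixed variant."""
        candidate = base_id
        counter = 0
        while candidate in self._owners:
            counter += 1
            candidate = f"{base_id}_{element_type}_{counter}"
        self._owners[candidate] = element_type
        return candidate


registry = IdRegistry()
node_id = registry.allocate("Node", "JOIN_1")
metadata_id = registry.allocate("Metadata", "JOIN_1")
print(node_id, metadata_id)  # JOIN_1 JOIN_1_Metadata_1
```

Any attribute that refers to an element by ID (for example `joinMetadataID` in the sample above) has to be written with the value returned by the registry, not with the original base name.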
