Customer Portal

How to read mapping with same xmlFields using XMLExtract

Comments 15

  • Avatar
    twaller
    0
    Comment actions Permalink
    Hello,

    Your mapping should look like the following:


    <Mappings>
    <Mapping element="Item_Number">
    <Mapping element="ID" outPort="0" sequenceField="count" sequenceID="Sequence0" />
    </Mapping>
    <Mapping element="Mill_Description">
    <Mapping element="ID" outPort="1" sequenceField="count" sequenceID="Sequence0" />
    </Mapping>
    <Mapping element="ManufactureName">
    <Mapping element="ID" outPort="2" sequenceField="count" sequenceID="Sequence0" />
    </Mapping>
    </Mappings>


    You need to create a sequence (internal or external), XMLExtract will have three output ports and all edges will have identical metadata:


    <Metadata id="Metadata0" previewAttachmentCharset="ISO-8859-1">
    <Record fieldDelimiter="|" name="recordName1" previewAttachmentCharset="ISO-8859-1" recordDelimiter="\r\n" type="delimited">
    <Field name="count" type="integer"/>
    <Field name="ID" type="string"/>
    </Record>
    </Metadata>


    Then you need to use ExtMergeJoin with the following Transform attribute:


    function transform() {
    $0.field1 := $0.ID;
    $0.field2 := $1.ID;
    $0.field3 := $2.ID;
    }


    The "count" field will serve as Join key field for each of the three edges.

    The output edge of the ExtMergeJoin will have the following metadata:


    <Metadata id="Metadata1">
    <Record fieldDelimiter="|" name="recordName2" recordDelimiter="\r\n" type="delimited">
    <Field name="field1" type="string"/>
    <Field name="field2" type="string"/>
    <Field name="field3" type="string"/>
    </Record>
    </Metadata>


    Then you will obtain the result you request. And you only need to use any writer you want: file writer, db writer, advanced writer, etc.

    Best regards,

    Tomas Waller
  • Avatar
    twaller
    0
    Comment actions Permalink
    Note that the elements without outPort:

    .
    <Mapping element="nameOfTheElement">
    ...
    ...
    </Mapping>


    serve to specify the parent element for each child element (ID, in our case) that already contains some outPort in its mapping.

    Regards,

    Tomas Waller
  • Avatar
    twaller
    0
    Comment actions Permalink
    I forgot mention that you need to create a sequence whose ID will be Sequence0. Its values will be used as values of the "count" fields.
  • Avatar
    pgargdd
    0
    Comment actions Permalink
    Thanks Tomas for your quick reply. Now I am able to run XML to Database transform successfully using clover graph.

    But when I am trying to replicate things in Java, I am facing an issue "how to create Mapping element without specifying output port in java".

    I have gone through the XMLExtract.Mapping javadocs API, but I am not able to found any constructor which works only for element.
    new XMLExtract.Mapping(java.lang.String element) ;

    Also is there any alternate available to do mappings things without the use of Sequence as we need to provide string fileName while creating sequence which can cause IO problem when dealing with concurrent access to the file.

    Thanks
    Pushpendra
  • Avatar
    pgargdd
    0
    Comment actions Permalink
    My java code which I am using to run the above XML transformation are as follows:
    As I not able to get any constructor in XMLExtract.Mapping which works only for element and not for outPort, I am passing outPort as -ve nos.

    // create connection object. Get driver and connect string from cfg file
    // specified as a first argument
    DBConnection dbCon = new DBConnection("Conn0", "D:/postgre.cfg");
    dbCon.setName("Test");

    EngineInitializer.initEngine("plugins", null, null);
    EngineInitializer.forceActivateAllPlugins();
    GraphRuntimeContext runtimeContext = new GraphRuntimeContext();

    TransformationGraph graph = new TransformationGraph();
    graph.setDebugMode(true);
    graph.addConnection(dbCon);

    Phase phase = new Phase(0);

    Sequence sequence = new SimpleSequence("Sequence0",graph,"Sequence0" , "C:/misc/seq.dat",0,1,1000);
    graph.addSequence(sequence);
    sequence.isInitialized();

    DataRecordMetadata metadata1=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
    metadata1.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
    metadata1.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));
    Edge inEdge1 = new Edge("InEdge1",metadata1);

    DataRecordMetadata metadata2=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
    metadata2.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
    metadata2.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));
    Edge inEdge2 = new Edge("InEdge2",metadata2);

    XMLExtract nodeXML=new XMLExtract("Smallest");
    nodeXML.setInputFile("C:/misc/smallest.xml");
    nodeXML.setEnabled("enabled");
    nodeXML.setUseNestedNodes(true);
    nodeXML.addOutputPort(0, inEdge1);
    nodeXML.addOutputPort(1, inEdge2);

    XMLExtract.Mapping childMap1=nodeXML.new Mapping("Item_Number",-1);
    XMLExtract.Mapping childMap2=nodeXML.new Mapping("ID",0);
    childMap2.setSequenceField("count");
    childMap2.setSequenceId("Sequence0");
    childMap1.addChildMapping(childMap2);
    nodeXML.addMapping(childMap1);

    XMLExtract.Mapping childMap4=nodeXML.new Mapping("Mill_Description",-2);
    XMLExtract.Mapping childMap3=nodeXML.new Mapping("ID",1);
    childMap3.setSequenceField("count");
    childMap3.setSequenceId("Sequence0");
    childMap4.addChildMapping(childMap3);
    nodeXML.addMapping(childMap4);

    DataRecordMetadata metadata3=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
    metadata3.addField(new DataFieldMetadata("field1", DataFieldMetadata.STRING_FIELD,"\n"));
    metadata3.addField(new DataFieldMetadata("field2", DataFieldMetadata.STRING_FIELD,"\n"));
    Edge inEdge3 = new Edge("InEdge3",metadata3);

    String[] joinKeys = {"count"};
    MergeJoin join = new MergeJoin("join1","count;#count", null,"com.clover.test.Transform","",MergeJoin.Join.LEFT_OUTER,true,true);
    join.addInputPort(0, inEdge1);
    join.addInputPort(1, inEdge2);
    join.setGraph(graph);
    join.setPhase(phase);
    join.addOutputPort(0, inEdge3);

    DBOutputTable dbOutputTable = new DBOutputTable("DBTABLE",dbCon.getId(),"product");
    dbOutputTable.setSqlQuery("insert into product (product_id,name) values(?,?)");
    dbOutputTable.addInputPort(0, inEdge3);

    String[] cloverFields = {"field1;field2"};
    dbOutputTable.setCloverFields(cloverFields);

    //add nodes to phase
    try {
    graph.addPhase(phase);
    phase.addNode(nodeXML);
    phase.addNode(join);
    phase.addNode(dbOutputTable);
    graph.addEdge(inEdge1);
    graph.addEdge(inEdge2);
    graph.addEdge(inEdge3);

    EngineInitializer.initGraph(graph, runtimeContext);
    } catch (GraphConfigurationException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
    } catch (ComponentNotReadyException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
    }

    IThreadManager threadManager = new SimpleThreadManager();
    WatchDog watchDog = new WatchDog(graph,runtimeContext);
    runtimeContext.setUseJMX(true);
    watchDog.getGraphRuntimeContext().setVerboseMode(true);
    threadManager.executeWatchDog(watchDog);


    and my Transform method code :
    public int transform(DataRecord[] inputRecords, DataRecord[] outputRecords) throws TransformException {
    try {
    // user's code STARTs from here !
    System.err.println("### Running Transformation ");
    (outputRecords[0].getField(0)).setValue((inputRecords[0].getField(0).toString()));
    (outputRecords[0].getField(1)).setValue((inputRecords[1].getField(0).toString()));
    //(outputRecords[0].getField(2)).setValue((inputRecords[2].getField(0).toString()));

    // user's code ENDs here !
    } catch(Exception e) {
    throw new TransformException("Error in transformation class " + Transform.class.getName() + ": " + e.getMessage(), e);
    }
    return ALL;
    }


    I am getting NullRecord at outPort 1 from XMLExtract to MergeJoin when transform method gets invoked. And also because of NullRecord I am getting exception as parameter number: 2 missing in insert query.

    Please suggest some solution for the above issue.

    Thanks
    Pushpendra
  • Avatar
    jausperger
    0
    Comment actions Permalink
    Try this, there was wrong id and maybe something else:



    EngineInitializer.initEngine("plugins", null, null);
    EngineInitializer.forceActivateAllPlugins();
    GraphRuntimeContext runtimeContext = new GraphRuntimeContext();

    TransformationGraph graph = new TransformationGraph();
    graph.setDebugMode(true);

    Phase phase = new Phase(0);

    DataRecordMetadata metadata1=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
    metadata1.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
    metadata1.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));
    Edge inEdge1 = new Edge("InEdge1",metadata1);

    DataRecordMetadata metadata2=new DataRecordMetadata("RecordMetadata0",DataRecordMetadata.DELIMITED_RECORD);
    metadata2.addField(new DataFieldMetadata("ID", DataFieldMetadata.STRING_FIELD,"\n"));
    metadata2.addField(new DataFieldMetadata("count", DataFieldMetadata.INTEGER_FIELD,"\n"));
    Edge inEdge2 = new Edge("InEdge2",metadata2);

    XMLExtract nodeXML=new XMLExtract("Smallest");
    nodeXML.setInputFile("c:/Workspaces/wruntime/examples/smallest.xml");
    nodeXML.setEnabled("enabled");
    nodeXML.setUseNestedNodes(true);
    nodeXML.addOutputPort(0, inEdge1);
    nodeXML.addOutputPort(1, inEdge2);

    XMLExtract.Mapping childMap1=nodeXML.new Mapping("Item_Number",-1);
    childMap1.prepareProcessSkipOrNumRecords();
    XMLExtract.Mapping childMap2=nodeXML.new Mapping("ID",0);
    childMap2.setSequenceField("count");
    childMap2.setSequenceId(null);
    childMap2.setSequenceId("Sequence1");
    childMap1.addChildMapping(childMap2);
    childMap2.setParent(childMap1);
    childMap2.prepareProcessSkipOrNumRecords();
    nodeXML.addMapping(childMap1);

    XMLExtract.Mapping childMap4=nodeXML.new Mapping("Mill_Description",-2);
    childMap4.prepareProcessSkipOrNumRecords();
    XMLExtract.Mapping childMap3=nodeXML.new Mapping("ID",1);
    childMap3.setSequenceField("count");
    childMap3.setSequenceId(null);
    childMap3.setSequenceId("Sequence1");
    childMap4.addChildMapping(childMap3);
    childMap3.setParent(childMap4);
    childMap3.prepareProcessSkipOrNumRecords();
    nodeXML.addMapping(childMap4);

    Trash trash1 = new Trash("trash1");
    trash1.setDebugPrint(true);
    trash1.addInputPort(0, inEdge1);

    Trash trash2 = new Trash("trash2");
    trash2.setDebugPrint(true);
    trash2.addInputPort(0, inEdge2);

    //add nodes to phase
    try {
    graph.addPhase(phase);
    phase.addNode(nodeXML);
    phase.addNode(trash1);
    phase.addNode(trash2);
    graph.addEdge(inEdge1);
    graph.addEdge(inEdge2);

    EngineInitializer.initGraph(graph, runtimeContext);
    } catch (GraphConfigurationException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
    } catch (ComponentNotReadyException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
    }

    IThreadManager threadManager = new SimpleThreadManager();
    WatchDog watchDog = new WatchDog(graph,runtimeContext);
    runtimeContext.setUseJMX(true);
    watchDog.getGraphRuntimeContext().setVerboseMode(true);
    threadManager.executeWatchDog(watchDog);

  • Avatar
    avackova
    0
    Comment actions Permalink
    Hello Pushpendra,
    it may be impossible to configure XMLExtract correctly from java code, while some setters are private (http://bug.cloveretl.org/view.php?id=2819).
    As a workaround I can advice you to use fromXML method of XMLExtract object:
    		XMLExtract nodeXML = null;
    try {
    nodeXML = (XMLExtract) XMLExtract.fromXML(graph, read(new FileInputStream("data-tmp/Smallest.xml")).getDocumentElement());
    } catch (Exception e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
    return;
    }
    with the Tomas Waller's mapping
  • Avatar
    pgargdd
    0
    Comment actions Permalink
    Thanks Agata and Jausperger for your ruick reply.

    Agata, this workaround won't work because size of xml which I am using is more than 500MB, so it won't be feasible to load that much amount of data in memory. I have currently creating object by passing port as negative nos.

    Jausperger, actually the thing which I am looking is to merge the outPort 0 & 1 output of XMLExtract to database table in one single row at a time using ExtMergeJoin as suggested by Tomas. I was able to do so using graph but I am not able to do so using clover Java API.
    You have given example code for outPort to different trash but I am looking to merge those different port output to one single database row. I have tried this using MergeJoin component as it is the replica component of ExtMergeJoin but not able to do so.

    Please suggest some solution to do so. You can refer code for Merge Join in my earlier post.

    Thanks
    Pushpendra
  • Avatar
    avackova
    0
    Comment actions Permalink
    You don't need to load the input file to memory. Create the xml file with node definition:
    <Node id="Smallest" sourceUri="data-in/smallest.xml" type="XML_EXTRACT" useNestedNodes="true">
    <attr name="mapping"><![CDATA[<Mappings>
    <Mapping element="Item_Number">
    <Mapping element="ID" outPort="0" sequenceField="count" sequenceID="Sequence0" />
    </Mapping>
    <Mapping element="Mill_Description">
    <Mapping element="ID" outPort="1" sequenceField="count" sequenceID="Sequence0" />
    </Mapping>
    <Mapping element="ManufactureName">
    <Mapping element="ID" outPort="2" sequenceField="count" sequenceID="Sequence0" />
    </Mapping>
    </Mappings>]]></attr>
    </Node>

    and this file load in fromXML method
  • Avatar
    pgargdd
    0
    Comment actions Permalink
    Thanks again Agata for your quick reply.

    I have tried your code and it works fine. But still it won't work in my case as my xml structure is not defined and also mapping elements are not defined, they are changing dynamically. So with your approach I need to write an xml file with logic catering to defined mapping structure.

    Please send me some solution for my issue regarding Merge Join in my earlier post.

    Thanks in advance

    -Pushpendra
  • Avatar
    avackova
    0
    Comment actions Permalink
    Could you send the error stack trace? I can't reproduce the problem.
  • Avatar
    pgargdd
    0
    Comment actions Permalink
    Thanks Agata, now I am able to successfully executing XML to Database transformation using clover Java API. I have created logic to create mapping stucture dynamically according to my requirement.

    I just want to know why a restrictions is put on Merge Join to have atleast two input port, also if you can give some inputs regarding the transformation language we can use at the time of merging the inputs using Merge Join.

    Thanks
    Pushpendra
  • Avatar
    avackova
    0
    Comment actions Permalink
    MergeJoin has to have at least two input ports as it merges data from two different streams.
    For CTL see CloverETL Transformation Language concept and Clover ETL Transformation Language functions.
  • Avatar
    pgargdd
    0
    Comment actions Permalink
    Hi,

    Using above mapping structure I am not able to read attributes values from xml files. The above mapping structure works fine for reading elements values, but in my XML file I want to read attributes as well. So for reading attributes values as well what are the changes that I need to do in mapping structure.

    Thanks
    Pushpendra
  • Avatar
    twaller
    0
    Comment actions Permalink
    Hello,

    I have an input file with attributes:


    <?xml version="1.0" encoding="UTF-8"?>
    <Rows>
    <Row>
    <Item_Number>
    <ID myattribute="valueItem1">Item 1</ID>
    </Item_Number>
    <GTIN_Number>00erer013616</GTIN_Number>
    <Mill_Code>03</Mill_Code>
    <Mill_Description>
    <ID myattribute="valueProduct1">Product 1</ID>
    </Mill_Description>
    <Style_Number>
    <Style>
    <ManufactureName>
    <ID myattribute="valueMC1">MC1</ID>
    </ManufactureName>
    </Style>
    </Style_Number>
    </Row>
    <Row>
    <Item_Number>
    <ID myattribute="valueItem2">Item 2</ID>
    </Item_Number>
    <GTIN_Number>0004erere3623</GTIN_Number>
    <Mill_Code>03</Mill_Code>
    <Mill_Description>
    <ID myattribute="valueProduct2">Product 2</ID>
    </Mill_Description>
    <Style_Number>
    <Style>
    <ManufactureName>
    <ID myattribute1="valueMC2">MC2</ID>
    </ManufactureName>
    </Style>
    </Style_Number>
    </Row>
    </Rows>


    And use the following graph:


    <?xml version="1.0" encoding="UTF-8"?>
    <Graph author="cloveruser" created="Tue Jan 12 11:32:50 CET 2010" guiVersion="0.0.0.devel" id="1263292668555" licenseType="Evaluation Devel" modified="Tue Jan 12 15:59:35 CET 2010" modifiedBy="cloveruser" name="XMLForumAttributes" revision="1.63">
    <Global>
    <Metadata id="Metadata5" previewAttachmentCharset="ISO-8859-1">
    <Record fieldDelimiter="|" name="recordName6" previewAttachmentCharset="ISO-8859-1" recordDelimiter="\r\n" type="delimited">
    <Field name="ID" type="string"/>
    <Field name="count" type="integer"/>
    <Field name="myattribute" type="string"/>
    </Record>
    </Metadata>
    <Property fileURL="workspace.prm" id="GraphParameter0"/>
    <Sequence cached="1" fileURL="${SEQ_DIR}/seq.txt" id="Sequence0" name="counting" start="1" step="1" type="SIMPLE_SEQUENCE"/>
    </Global>
    <Phase number="0">
    <Node enabled="enabled" guiHeight="0" guiName="Trash" guiWidth="0" guiX="334" guiY="350" id="TRASH0" type="TRASH"/>
    <Node enabled="enabled" guiHeight="0" guiName="Trash" guiWidth="0" guiX="330" guiY="140" id="TRASH1" type="TRASH"/>
    <Node enabled="enabled" guiHeight="0" guiName="Trash" guiWidth="0" guiX="335" guiY="255" id="TRASH2" type="TRASH"/>
    <Node enabled="enabled" guiHeight="0" guiName="XMLExtract" guiWidth="0" guiX="18" guiY="241" id="XML_EXTRACT0" sourceUri="${DATAIN_DIR}/XMLforumFINALComplex.xml" type="XML_EXTRACT">
    <attr name="mapping"><![CDATA[<Mappings>
    <Mapping element="Item_Number">
    <Mapping element="ID" outPort="0" sequenceField="count" sequenceID="Sequence0" />
    </Mapping>
    <Mapping element="Mill_Description">
    <Mapping element="ID" outPort="1" sequenceField="count" sequenceID="Sequence0" />
    </Mapping>
    <Mapping element="ManufactureName">
    <Mapping element="ID" outPort="2" sequenceField="count" sequenceID="Sequence0" />
    </Mapping>
    </Mappings>]]></attr>
    </Node>
    <Edge debugMode="true" fromNode="XML_EXTRACT0:0" guiBendpoints="" id="Edge1" inPort="Port 0 (in)" metadata="Metadata5" outPort="Port 0 (out)" toNode="TRASH1:0"/>
    <Edge debugMode="true" fromNode="XML_EXTRACT0:1" guiBendpoints="" id="Edge6" inPort="Port 0 (in)" metadata="Metadata5" outPort="Port 1 (out)" toNode="TRASH2:0"/>
    <Edge debugMode="true" fromNode="XML_EXTRACT0:2" guiBendpoints="" id="Edge0" inPort="Port 0 (in)" metadata="Metadata5" outPort="Port 2 (out)" toNode="TRASH0:0"/>
    </Phase>
    </Graph>


    As the result, both the values and the attributes are sent to output ports.

    Or did you have other graph or other mapping?

    Best regards,

    Tomas Waller

Please sign in to leave a comment.