Customer Portal

How do you remove bad data?

Comments 10

  • Avatar
    wmdbowler
    0
    Comment actions Permalink
    A colleague of my recommended I use the Universal Data Reader rather than a Delimited Data Reader and use the Quoted String option. I had tried this and still error on the same bad single quote in a field. Here is the error:

    ERROR [WatchDog] - Node DATA_READER0 finished with status: ERROR caused by: Pars
    ing error: Bad quote format in record # 259445 in field # 5
    DEBUG [WatchDog] - Node DATA_READER0 error details:
    org.jetel.exception.BadDataFormatException: Parsing error: Bad quote format in r
    ecord # 259445 in field # 5
    at org.jetel.data.parser.DataParser.parsingErrorFound(DataParser.java:44
    7)
    at org.jetel.data.parser.DataParser.parseNext(DataParser.java:330)
    at org.jetel.data.parser.DataParser.getNext(DataParser.java:140)
    at org.jetel.util.MultiFileReader.getNext(MultiFileReader.java:233)
    at org.jetel.component.DataReader.execute(DataReader.java:181)
    at org.jetel.graph.Node.run(Node.java:371)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec
    utor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
    .java:675)
    at java.lang.Thread.run(Thread.java:595)
  • Avatar
    avackova
    0
    Comment actions Permalink
    Hi,
    Universal Data Reader doesn't help, because start and end quote have to be the same - in other case exception is thrown.
    Problem with your reformat is that you mix Transformation Language and Transformation Language Lite. The last one should be used for mappings with small modifications eventually; code written in TLLite is converted to java code. You can use replace function in CTL:

    //#TL
    string variable1;
    string variable2;
    function transform(){
    variable1 = replace(${stline1},"\'","");
    variable2 = replace(${stline2},"\'","");
    ${curentrecno} := ${curentrecno};
    ${caudrecno} := ${caudrecno};
    ${maudrecno} := ${maudrecno};
    ${stline1} := variable1;
    ${stline2} := variable2;
    ${stline3} := ${stline3};
    ${stline4} := ${stline4};
    ${city} := ${city};
    ${state} := ${state};
    ${zipcode} := ${zipcode};
    }
  • Avatar
    wmdbowler
    0
    Comment actions Permalink
    I tried placing the code you provided in as a CTL tranformation (in the GUI I chose the Clover Transformation Language choice) but I get the following error:

    Graph definition file: testformat.grf
    WARN [main] - Can't resolve reference to graph property: stline1
    WARN [main] - Can't resolve reference to graph property: stline2
    WARN [main] - Can't resolve reference to graph property: curentrecno
    WARN [main] - Can't resolve reference to graph property: curentrecno
    WARN [main] - Can't resolve reference to graph property: caudrecno
    WARN [main] - Can't resolve reference to graph property: caudrecno
    WARN [main] - Can't resolve reference to graph property: maudrecno
    WARN [main] - Can't resolve reference to graph property: maudrecno
    WARN [main] - Can't resolve reference to graph property: stline1
    WARN [main] - Can't resolve reference to graph property: stline2
    WARN [main] - Can't resolve reference to graph property: stline3
    WARN [main] - Can't resolve reference to graph property: stline3
    WARN [main] - Can't resolve reference to graph property: stline4
    WARN [main] - Can't resolve reference to graph property: stline4
    WARN [main] - Can't resolve reference to graph property: city
    WARN [main] - Can't resolve reference to graph property: city
    WARN [main] - Can't resolve reference to graph property: state
    WARN [main] - Can't resolve reference to graph property: state
    WARN [main] - Can't resolve reference to graph property: zipcode
    WARN [main] - Can't resolve reference to graph property: zipcode
    INFO [main] - Checking graph configuration...
    INFO [WatchDog] - Thread started.
    INFO [WatchDog] - Running on 16 CPU(s) max available memory for JVM 3728320 KB
    INFO [WatchDog] - [Clover] Initializing phase: 0
    DEBUG [WatchDog] - initializing edges:
    DEBUG [WatchDog] - all edges initialized successfully...
    DEBUG [WatchDog] - initializing nodes:
    ERROR [WatchDog] - org.jetel.interpreter.ParseException: Encountered "( $" at line 5, column 21.
    Was expecting one of:
    ";" ...
    <OR> ...
    <AND> ...
    <EQUAL> ...
    <NON_EQUAL> ...
    ".in." ...
    <LESS_THAN> ...
    <LESS_THAN_EQUAL> ...
    <GREATER_THAN> ...
    <GREATER_THAN_EQUAL> ...
    <REGEX_EQUAL> ...
    "-" ...
    "+" ...
    "*" ...
    "/" ...
    "%" ...
    "++" ...
    "--" ...
    "(" "isnull(" ...
    "(" "nvl(" ...
    "(" "nvl2(" ...
    "(" "iif(" ...
    "(" "sequence(" ...
    "(" "lookup(" ...
    "(" "lookup_next(" ...
    "(" "lookup_found(" ...
    "(" "lookup_admin(" ...
    "(" "eval(" ...
    "(" "eval_exp(" ...
    "(" "print_err(" ...
    "(" "print_stack(" ...
    "(" "breakpoint(" ...
    "(" "print_log(" ...
    "(" "raise_error(" ...
    "(" <IDENTIFIER> ...
    "(" "int" ...
    "(" "long" ...
    "(" "date" ...
    "(" <DOUBLE_VAR> ...
    "(" "decimal" ...
    "(" "boolean" ...
    "(" "string" ...
    "(" "bytearray" ...
    "(" "list" ...
    "(" "map" ...
    "(" "record" ...
    "(" "object" ...
    "(" "year" ...
    "(" "month" ...
    "(" "week" ...
    "(" "day" ...
    "(" "hour" ...
    "(" "minute" ...
    "(" "second" ...
    "(" "millisec" ...
    "(" <FLOATING_POINT_LITERAL> ...
    "(" <HEX_LITERAL> ...
    "(" <OCTAL_LITERAL> ...
    "(" <INTEGER_LITERAL> ...
    "(" <STRING_LITERAL> ...
    "(" "\'" ...
    "(" <BOOLEAN_LITERAL> ...
    "(" <DATE_LITERAL> ...
    "(" <DATETIME_LITERAL> ...
    "(" "null" ...
    "(" "[" ...
    "(" <UNTERMINATED_STRING_LITERAL> ...
    "(" <FIELD_ID> ...
    "(" <REC_NAME_FIELD_ID> ...
    "(" <REC_NUM_FIELD_ID> ...
    "(" <REC_NUM_FIELD_NUM> ...
    "(" <REC_NAME_FIELD_NUM> ...
    "(" <REC_NAME_ID> ...
    "(" <REC_NUM_ID> ...
    "(" "(" ...
    "(" "++" ...
    "(" "--" ...
    "(" "+" ...
    "(" "-" ...
    "(" "~" ...
    "(" <NOT> ...
    "(" "," ...
    "(" ")" ...

    ERROR [WatchDog] - Phase initialization failed with reason: REFORMAT0 ...FAILED !
    REFORMAT0 ...FAILED !
    at org.jetel.graph.Phase.init(Phase.java:161)
    at org.jetel.graph.runtime.WatchDog.executePhase(WatchDog.java:565)
    at org.jetel.graph.runtime.WatchDog.call(WatchDog.java:155)
    at org.jetel.graph.runtime.WatchDog.call(WatchDog.java:70)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
    at java.util.concurrent.FutureTask.run(FutureTask.java:123)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:595)

    Below is what my reformat node looks like in the graph after changing it to your suggestion. Is there a way to log out the java to review what is generated?

    <Node enabled="enabled" guiHeight="0" guiName="Reformat" guiWidth="0" guiX="453" guiY="283" id="REFORMAT0" type="REFORMAT">
    <attr name="transform"><![CDATA[//#TL
    string variable1;
    string variable2;
    function transform(){
    variable1 = replace(${stline1},"\'","");
    variable2 = replace(${stline2},"\'","");
    ${curentrecno} := ${curentrecno};
    ${caudrecno} := ${caudrecno};
    ${maudrecno} := ${maudrecno};
    ${stline1} := variable1;
    ${stline2} := variable2;
    ${stline3} := ${stline3};
    ${stline4} := ${stline4};
    ${city} := ${city};
    ${state} := ${state};
    ${zipcode} := ${zipcode};
    }]]></attr>
    </Node>

    The graph is using this metadata:
    <Metadata id="memaddr">
    <Record name="memaddr" recordSize="-1" type="delimited">
    <Field delimiter="|" name="curentrecno" nullable="true" shift="0" type="long"/>
    <Field delimiter="|" name="caudrecno" nullable="true" shift="0" type="long"/>
    <Field delimiter="|" name="maudrecno" nullable="true" shift="0" type="long"/>
    <Field delimiter="|" name="stline1" nullable="true" shift="0" type="string"/>
    <Field delimiter="|" name="stline2" nullable="true" shift="0" type="string"/>
    <Field delimiter="|" name="stline3" nullable="true" shift="0" type="string"/>
    <Field delimiter="|" name="stline4" nullable="true" shift="0" type="string"/>
    <Field delimiter="|" name="city" nullable="true" shift="0" type="string"/>
    <Field delimiter="|" name="state" nullable="true" shift="0" type="string"/>
    <Field delimiter="\n" name="zipcode" nullable="true" shift="0" type="string"/>
    </Record>
    </Metadata>
  • Avatar
    avackova
    0
    Comment actions Permalink
    I'm sorry data fields are without brackets (Mapping), so correct code is:
    //#TL
    string variable1;
    string variable2;
    function transform(){
    variable1 = replace($stline1,"\'","");
    variable2 = replace($stline2,"\'","");
    $curentrecno := $curentrecno;
    $caudrecno := $caudrecno;
    .
    .
    .
    }
  • Avatar
    wmdbowler
    0
    Comment actions Permalink
    Thank you for your help. I correctly have this reformat working now with this code:
    //#TL
    string variable1;
    string variable2;
    function transform(){
    if (!isnull($stline1)) {
    variable1 = replace($stline1,"\'","");
    }
    if (!isnull($stline2)) {
    variable2 = replace($stline2,"\'","");
    }
    $curentrecno := $curentrecno;
    $caudrecno := $caudrecno;
    $maudrecno := $maudrecno;
    $stline1 := variable1;
    $stline2 := variable2;
    $stline3 := $stline3;
    $stline4 := $stline4;
    $city := $city;
    $state := $state;
    $zipcode := $zipcode;
    }

    But the issue I am hitting currently is a buffer error when reformating this data. Why is this simple replace adding so much data and then causing my buffer to be exceeded. See log for graph failure below:
    FATAL [WatchDog] - !!! Fatal Error !!! - graph execution is aborting
    ERROR [WatchDog] - Node REFORMAT0 finished with status: ERROR caused by: The size of data buffer is only 262144. Set appropriate parameter in defautProperties file.
    DEBUG [WatchDog] - Node REFORMAT0 error details:
    java.lang.RuntimeException: The size of data buffer is only 262144. Set appropriate parameter in defautProperties file.
    at org.jetel.data.StringDataField.serialize(StringDataField.java:378)
    at org.jetel.data.DataRecord.serialize(DataRecord.java:450)
    at org.jetel.graph.DirectEdge.writeRecord(DirectEdge.java:238)
    at org.jetel.graph.Edge.writeRecord(Edge.java:342)
    at org.jetel.graph.Node.writeRecord(Node.java:678)
    at org.jetel.component.Reformat.execute(Reformat.java:198)
    at org.jetel.graph.Node.run(Node.java:371)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:595)
    Caused by: java.nio.BufferOverflowException
    at java.nio.Buffer.nextPutIndex(Buffer.java:425)
    at java.nio.DirectByteBuffer.putChar(DirectByteBuffer.java:463)
    at org.jetel.data.StringDataField.serialize(StringDataField.java:375)
    ... 9 more
    WARN [WatchDog] - Interrupted node: REFORMAT0
    WARN [WatchDog] - Interrupted node: delimited_addr
    INFO [WatchDog] - ----------------------** Final tracking Log for phase [1] **---------------------
    INFO [WatchDog] - Time: 29/04/08 18:34:04
    INFO [WatchDog] - Node Status Port #Records #KB Rec/s KB/s
    INFO [WatchDog] - ----------------------------------------------------------------------------------
    INFO [WatchDog] - REFORMAT0 RUNNING
    INFO [WatchDog] - %cpu:15.3 In:0 6533 639 294 28
    INFO [WatchDog] - Out:0 6532 779157 294 65578
    INFO [WatchDog] - delimited_addr RUNNING
    INFO [WatchDog] - %cpu:24.7 In:0 6532 779157 294 65600
    INFO [WatchDog] - ---------------------------------** End of Log **--------------------------------

    I have 4.5 million records to process through and I can't even make it through 6500 without exceeding the buffer. Is something incorrect in my reformat or is the answer to just increase my buffer? But what would be reasonable?
  • Avatar
    avackova
    0
    Comment actions Permalink
    Hi,
    it seams that you have found a bug: replace function not only replaces required text but append it to the previous result. To the fix you can use following transformation
    import org.jetel.component.DataRecordTransform;
    import org.jetel.data.DataRecord;
    import org.jetel.exception.TransformException;


    public class Transform extends DataRecordTransform {

    @Override
    public boolean transform(DataRecord[] arg0, DataRecord[] arg1)
    throws TransformException {
    if (defaultTransform(arg0, arg1)){
    arg1[0].getField("stline1").setValue(arg0[0].getField("stline1").toString().replace("\'", ""));
    arg1[0].getField("stline2").setValue(arg0[0].getField("stline2").toString().replace("\'", ""));
    return true;
    }
    return false;
    }

    }
  • Avatar
    wmdbowler
    0
    Comment actions Permalink
    I added the code you provided to the beginning of my reformat and I get this exception error:

    08:07:31,630 DEBUG [WatchDog] - initializing nodes:
    org.jetel.interpreter.ParseException: Encountered "org" at line 2, column 8.
    Was expecting one of:
    <STRING_LITERAL> ...
    "\'" ...

    at org.jetel.interpreter.TransformLangParser.generateParseException(TransformLangParser.java:5263)
    at org.jetel.interpreter.TransformLangParser.jj_consume_token(TransformLangParser.java:5142)
    at org.jetel.interpreter.TransformLangParser.ImportSource(TransformLangParser.java:453)
    at org.jetel.interpreter.TransformLangParser.CompilationUnit(TransformLangParser.java:303)
    at org.jetel.interpreter.TransformLangParser.Start(TransformLangParser.java:216)
    at org.jetel.component.WrapperTL.init(WrapperTL.java:154)
    at org.jetel.component.RecordTransformTL.init(RecordTransformTL.java:77)
    at org.jetel.component.RecordTransformFactory.createTransform(RecordTransformFactory.java:99)
    at org.jetel.component.Reformat.init(Reformat.java:240)
    at org.jetel.graph.Phase.init(Phase.java:158)
    at org.jetel.graph.runtime.WatchDog.executePhase(WatchDog.java:565)
    at org.jetel.graph.runtime.WatchDog.call(WatchDog.java:155)
    at org.jetel.graph.runtime.WatchDog.call(WatchDog.java:70)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
    at java.util.concurrent.FutureTask.run(FutureTask.java:123)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
    at java.lang.Thread.run(Thread.java:595)
    08:07:31,693 ERROR [WatchDog] - org.jetel.interpreter.ParseException: Encountered "org" at line 2, column 8.
    Was expecting one of:
    <STRING_LITERAL> ...
    "\'" ...

    I thought it was related to the "\'" we are searching for in the replace so I just changed to a character (you'll see below) and it still errors with the same exception. Here is my reformat configuration now:


    <Phase number="1">
    <Node enabled="enabled" guiHeight="0" guiName="Reformat" guiWidth="0" guiX="403" guiY="136" id="REFORMAT0" type="REFORMAT">
    <attr name="transform"><![CDATA[//#TL
    import org.jetel.component.DataRecordTransform;
    import org.jetel.data.DataRecord;
    import org.jetel.exception.TransformException;

    public class Transform extends DataRecordTransform {

    @Override
    public boolean transform(DataRecord[] arg0, DataRecord[] arg1)
    throws TransformException {
    if (defaultTransform(arg0, arg1)){
    arg1[0].getField("stline1").setValue(arg0[0].getField("stline1").toString().replace(t, ""));
    arg1[0].getField("stline2").setValue(arg0[0].getField("stline2").toString().replace(t, ""));
    return true;
    }
    return false;
    }
    }

    $curentrecno := $curentrecno;
    $caudrecno := $caudrecno;
    $maudrecno := $maudrecno;
    $stline1 := $stline1;
    $stline2 := $stline2;
    $stline3 := $stline3;
    $stline4 := $stline4;
    $city := $city;
    $state := $state;
    $zipcode := $zipcode;
    }]]></attr>
    </Node>
  • Avatar
    wmdbowler
    0
    Comment actions Permalink
    After a few attempts and changing the suggested code to include all steps of the transformation I was able to get this to work. Here was the final code:


    import java.util.*;
    import org.jetel.data.*;
    import org.jetel.graph.*;
    import org.jetel.metadata.*;
    import org.jetel.component.*;
    import org.jetel.exception.*;
    import org.jetel.data.sequence.*;
    import org.jetel.tlfunction.*;
    import org.jetel.component.DataRecordTransform;
    import org.jetel.data.DataRecord;
    import org.jetel.exception.TransformException;

    public class Transform extends DataRecordTransform {

    //Override
    public boolean transform(DataRecord[] arg0, DataRecord[] arg1)
    throws TransformException {
    if (defaultTransform(arg0, arg1)){
    arg1[0].getField("stline1").setValue(arg0[0].getField("stline1").toString().replace("\'", ""));
    arg1[0].getField("stline2").setValue(arg0[0].getField("stline2").toString().replace("\'", ""));
    return true;
    }
    return false;
    }
    }

    public class TransformTransformREFORMAT0 extends DataRecordTransform {


    // CONSTANTS definition of input fields
    private final static int IN0_CURENTRECNO = 0;
    private final static int IN0_CAUDRECNO = 1;
    private final static int IN0_MAUDRECNO = 2;
    private final static int IN0_STLINE1 = 3;
    private final static int IN0_STLINE2 = 4;
    private final static int IN0_STLINE3 = 5;
    private final static int IN0_STLINE4 = 6;
    private final static int IN0_CITY = 7;
    private final static int IN0_STATE = 8;
    private final static int IN0_ZIPCODE = 9;

    // CONSTANTS definition of output fields
    private final static int OUT0_CURENTRECNO = 0;
    private final static int OUT0_CAUDRECNO = 1;
    private final static int OUT0_MAUDRECNO = 2;
    private final static int OUT0_STLINE1 = 3;
    private final static int OUT0_STLINE2 = 4;
    private final static int OUT0_STLINE3 = 5;
    private final static int OUT0_STLINE4 = 6;
    private final static int OUT0_CITY = 7;
    private final static int OUT0_STATE = 8;
    private final static int OUT0_ZIPCODE = 9;

    /**
    * Initializes reformat class/function. This method is called only once at then
    * beginning of transformation process. Any object allocation/initialization should
    * happen here.
    */
    public boolean init() throws ComponentNotReadyException {
    return true;
    }

    /**
    * Performs reformat of source records to target records.
    * This method is called as one step in transforming flow of
    * records.
    */
    public boolean transform(DataRecord[] inputRecords, DataRecord[] outputRecords) throws TransformException {
    try {
    // user's code STARTs from here !

    ((LongDataField)outputRecords[0].getField(OUT0_CURENTRECNO)).setValue( (((LongDataField)inputRecords[0].getField(IN0_CURENTRECNO)).getLong()));
    ((LongDataField)outputRecords[0].getField(OUT0_CAUDRECNO)).setValue( (((LongDataField)inputRecords[0].getField(IN0_CAUDRECNO)).getLong()));
    ((LongDataField)outputRecords[0].getField(OUT0_MAUDRECNO)).setValue( (((LongDataField)inputRecords[0].getField(IN0_MAUDRECNO)).getLong()));
    (outputRecords[0].getField(OUT0_STLINE1)).setValue( (inputRecords[0].getField(IN0_STLINE1).toString()));
    (outputRecords[0].getField(OUT0_STLINE2)).setValue( (inputRecords[0].getField(IN0_STLINE2).toString()));
    (outputRecords[0].getField(OUT0_STLINE3)).setValue( (inputRecords[0].getField(IN0_STLINE3).toString()));
    (outputRecords[0].getField(OUT0_STLINE4)).setValue( (inputRecords[0].getField(IN0_STLINE4).toString()));
    (outputRecords[0].getField(OUT0_CITY)).setValue( (inputRecords[0].getField(IN0_CITY).toString()));
    (outputRecords[0].getField(OUT0_STATE)).setValue( (inputRecords[0].getField(IN0_STATE).toString()));
    (outputRecords[0].getField(OUT0_ZIPCODE)).setValue( (inputRecords[0].getField(IN0_ZIPCODE).toString()));

    // user's code ENDs here !
    } catch(Exception e) {
    throw new TransformException("Error in extern transformation class " + TransformTransformREFORMAT0.class.getName() + ": " + e.getMessage());
    }
    return true;
    }

    /**
    * Method called at the end of transformation process. No more
    * records will be processed. The implementing class should release
    * any resource reserved during init() or runtime at this point.
    */
    public void finished() {

    }
    }
    //end of transform class
  • Avatar
    avackova
    0
    Comment actions Permalink
    Try node:
    <Node id="REFORMAT0" transformURL="Transform.java" type="REFORMAT"/>

    with Transform.java:
    import org.jetel.component.DataRecordTransform;
    import org.jetel.data.DataRecord;
    import org.jetel.exception.TransformException;


    public class Transform extends DataRecordTransform {

    public boolean transform(DataRecord[] arg0, DataRecord[] arg1)
    throws TransformException {
    if (defaultTransform(arg0, arg1)){
    arg1[0].getField("stline1").setValue(arg0[0].getField("stline1").toString().replace("\'", ""));
    arg1[0].getField("stline2").setValue(arg0[0].getField("stline2").toString().replace("\'", ""));
    return true;
    }
    return false;
    }

    }

    It should do all you need:
    defaultTransform(arg0, arg1) maps all input fields to all output fields and, if it is successful returns true
    if body makes the replacement you need
  • Avatar
    wmdbowler
    0
    Comment actions Permalink
    OK, great thanks for that information. I have changed to your recommended code and it works as expected. The code did error when the @Override was included so I removed it and I am getting my expected results so it does not seem to be required.

Please sign in to leave a comment.