
How to separate binary data when reading documents

9 Comments

  • mzila
I can't figure out what the reason for the problem is. Could you please send me an example data file and graph file (or at least the metadata) to milan.zila at javlinconsulting.cz

    Milan
  • mzila
    Your problem is probably caused by the data in the blob field being too large. Try increasing the following values in the file:
    cloveretl.engine.jar\org\jetel\data\defaultProperties

    Record.MAX_RECORD_SIZE = 8192
    DataParser.FIELD_BUFFER_LENGTH = 512
    DataFormatter.FIELD_BUFFER_LENGTH = 512

    Default values are optimized for speed.

    Milan
  • hwhwhw
    I have tried for two days, without success :(

    I modified defaultProperties:
    +++++++++++++++++++++++++++++++++++++++
    Record.MAX_RECORD_SIZE = 2192000
    DataParser.FIELD_BUFFER_LENGTH = 2192000
    DataFormatter.FIELD_BUFFER_LENGTH = 2192000
    ++++++++++++++++++++++++++++++++++++++++++

    I ran blobfile.grf first, and then ran blobfiletoblobfile.grf and fileblob.grf.

    Running blobfile.grf produces the output file blobfile.txt, and running blobfiletoblobfile.grf produces the output file test_blobfile.txt.
    Running fileblob.grf writes no records into the test1_1 table.
    Please help me.

    The source table t_blob and the target table test1_1 are structured as follows:

    db2 => describe table t_blob

    Column                         Type      Type
    name                           schema    name               Length   Scale Nulls
    ------------------------------ --------- ------------------ -------- ----- ------
    F1                             SYSIBM    DECIMAL                  12     0 Yes
    F2                             SYSIBM    DECIMAL                  10     2 Yes
    F3                             SYSIBM    VARCHAR                  50     0 Yes
    F4                             SYSIBM    BLOB                8388608     0 Yes
    F5                             SYSIBM    DATE                      4     0 Yes

    5 record(s) selected.

    db2 => describe table test1_1

    Column                         Type      Type
    name                           schema    name               Length   Scale Nulls
    ------------------------------ --------- ------------------ -------- ----- ------
    FIELD0                         SYSIBM    DECIMAL                  12     0 Yes
    FIELD1                         SYSIBM    DECIMAL                  12     2 Yes
    FIELD2                         SYSIBM    VARCHAR                  50     0 Yes
    FIELD3                         SYSIBM    BLOB                8388608     0 Yes
    FIELD4                         SYSIBM    DATE                      4     0 Yes

    5 record(s) selected.

    db2 => select f1,f2,f3,length(f4) from t_blob

    F1             F2           F3                                                 F4
    -------------- ------------ -------------------------------------------------- -----------
                2.        88.67 wzhy                                                    515569
                3.       987.40 def                                                    1666156
                1.       100.23 abc                                                     726486

    3 record(s) selected.

    I have sent the details to your mailbox.
  • kaamoss
    I'm experiencing a similar problem with CloverETL right now. I'm parsing a huge CSV file, and one of the columns, called "options" (the options on a car), is an incredibly long string (1060 chars) with options separated by ';'; the field delimiter is ','. I did as you suggested, hwhwhw, and changed my defaultProperties and repacked the jar. Unfortunately I still get the same problem. Is this column just too huge to deal with? Here is the error I am currently getting with a simple Delimited Data Reader -> Broadcast -> Trash:


    INFO [WatchDog] - Sucessfully started all nodes in phase!
    FATAL [WatchDog] - !!! Fatal Error !!! - graph execution is aborting
    ERROR [WatchDog] - Node DELIMITED_DATA_READER0 finished with status: ERROR caused by: java.io.IOException:Field too long or can not find delimiter [,]
    when parsing record #2 field options
    DEBUG [WatchDog] - Node DELIMITED_DATA_READER0 error details:
    java.lang.RuntimeException: java.io.IOException:Field too long or can not find delimiter [,]
    when parsing record #2 field options
    at org.jetel.data.parser.DelimitedDataParser.parseNext(DelimitedDataParser.java:437)
    at org.jetel.data.parser.DelimitedDataParser.getNext0(DelimitedDataParser.java:170)
    at org.jetel.data.parser.DelimitedDataParser.getNext(DelimitedDataParser.java:166)
    at org.jetel.util.MultiFileReader.getNext(MultiFileReader.java:229)
    at org.jetel.component.DelimitedDataReader.execute(DelimitedDataReader.java:148)
    at org.jetel.graph.Node.run(Node.java:366)
    at java.lang.Thread.run(Unknown Source)
    Caused by: java.io.IOException: Field too long or can not find delimiter [,]

    at org.jetel.data.parser.DelimitedDataParser.parseNext(DelimitedDataParser.java:411)
    ... 6 more


    I'd appreciate any help that anyone could give me, as I'm new to CloverETL. Thanks
  • hwhwhw
    If you are reading very large strings, the problem can be solved by modifying the following parameters.
    +++++++++++++++++++++++++++++++
    Record.MAX_RECORD_SIZE = 65536
    DataParser.FIELD_BUFFER_LENGTH = 32768
    DataFormatter.FIELD_BUFFER_LENGTH = 32768
    DEFAULT_INTERNAL_IO_BUFFER_SIZE = 131072
    +++++++++++++++++++++++++++++++
    This should be OK for record sizes up to 64 KB.

    I found that the values of these parameters follow a set of rules.
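The values posted above are consistent with a simple sizing rule, assuming Java stores strings at 2 bytes per character, the record buffer leaves headroom over the largest field buffer, and the I/O buffer is about twice the record size. A sketch under those assumptions (the helper name and the exact factors are my own reading of the numbers, not a documented formula):

```python
# Hypothetical sizing rule derived from the values quoted in this thread.
def suggest_limits(max_field_chars, io_factor=2):
    """Suggest defaultProperties values for a given maximum field length."""
    field_buffer = 2 * max_field_chars          # UTF-16: 2 bytes per char
    record_size = max(8192, 2 * field_buffer)   # headroom over one large field
    return {
        "Record.MAX_RECORD_SIZE": record_size,
        "DataParser.FIELD_BUFFER_LENGTH": field_buffer,
        "DataFormatter.FIELD_BUFFER_LENGTH": field_buffer,
        "DEFAULT_INTERNAL_IO_BUFFER_SIZE": io_factor * record_size,
    }
```

For a 16384-character field this reproduces the 32768 / 65536 / 131072 values quoted above.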
  • kaamoss
    Thanks for the advice, I tried using the values you suggested, and still got the error, so I tried some absurdly high values. Unfortunately I'm still having no luck. Here are the values that I'm currently using:


    Record.MAX_RECORD_SIZE = 10240000
    DataParser.FIELD_BUFFER_LENGTH = 10485760
    DataFormatter.FIELD_BUFFER_LENGTH = 10485760
    DEFAULT_INTERNAL_IO_BUFFER_SIZE = 25600000


    10485760 Bytes is 10240KB
    25600000 Bytes is 25000KB (roughly 2.5x MAX_RECORD_SIZE)
    10240000 Bytes is 10000KB

    I read in the comments in the defaultProperties file that "(java stores strings in unicode - 16bits per character)"

    Since my graph breaks on a field named options, which is a string looking like "Option One; Option Two; Option Another; And So Forth; I think you get the idea,", I know that the end delimiter (',') is at the end of the field for each record, so it must be choking on the size of the string. But the string ranges from 1600 to 2000 chars long, which should only be 3200-4000 bytes.
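That arithmetic can be double-checked against the actual file by measuring the longest field on each line. A minimal sketch, splitting naively on ',' as described in the post (quoted fields are not handled, and the byte count assumes 2 bytes per character as in Java):

```python
# Hypothetical check: find the longest field on a CSV line, in chars and
# in UTF-16 bytes (2 bytes per char), to compare against the buffer limits.
def max_field_sizes(line, delimiter=","):
    fields = line.rstrip("\n").split(delimiter)
    longest = max(fields, key=len)
    return len(longest), 2 * len(longest)  # (chars, bytes)

# e.g. a ~2000-char options field needs only ~4000 bytes of field buffer:
chars, size = max_field_sizes("id," + "Option;" * 285 + ",end")
```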

    Thanks for your help.
  • hwhwhw
    If the problem still exists, you can post the graph file and the fmt file so that everyone can analyze them.
  • kaamoss
    I'm still having the same issue, so I threw the files in question up on my webserver. If anyone could take a look and let me know, I'd greatly appreciate it.

    Graph
    Metadata
    Data
  • hwhwhw
    From what I can see, vehicleImport.fmt.xml defines 86 fields, but record 17 of vehicleImport.csv has fewer than 86 comma-separated fields. Because the parser does not find enough field values, it does not stop at the end of record 17; it continues reading into record 18. Specifically, the Vehicle_ID value from record 18 gets written into the Window_Sticker_Last_Published field of record 17, and likewise the remaining fields of record 18 are consumed as the remaining field values of record 17.
    So I think the problem is in the data in vehicleImport.csv.
