Metadata Question

anweston

May 10, 2010 00:00

Answered

Heya,

We have a series of files that we are running through Clover. The metadata is set up with recordDelimiter="\n", and the DataReader has dataPolicy="Controlled". Every so often, one of the source files does not have a "\n" at the end of its final line, so Clover drops that row. Is there a way for me to indicate that the record delimiter is either "\n" or end-of-file? I can see from the wiki that you can have multiple delimiters (e.g. recordDelimiter="\n\\|\r\n"), but I do not see this case handled.

Thanks,
Anna

Comments 8

avackova

May 11, 2010 07:02

Hello Anna,
Set eofAsDelimiter="true" on the last field.
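
A minimal sketch of what that might look like for a record with a single string field. The record and field names here are hypothetical and used only for illustration; the eofAsDelimiter attribute is the one named in this reply.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical record and field names, for illustration only. -->
<Record name="ExampleRecord" type="delimited" recordDelimiter="\n">
  <!-- eofAsDelimiter="true" is meant to let end-of-file close the last field
       when the final "\n" is missing from the source file. -->
  <Field name="line" type="string" eofAsDelimiter="true" />
</Record>
```
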
anweston

May 11, 2010 19:57

Heya,

This almost works the way I want it to. We are parsing the source file(s) with each row as a single field, then breaking the "field" into separate fields. When I have the FMT as:

<?xml version="1.0" encoding="UTF-8"?>
<Record name="RECORD_ONE_FIELD_RECORD_" type="delimited" recordDelimiter="\n">
<Field name="FIELD_INPUT_ROW_NUM" type="numeric" nullable="false" auto_filling="source_row_count" />
<Field name="FIELD_ROW" type="string" nullable="false" eofAsDelimiter="true" />
</Record>

any source file that does not have a "\n" on its final row of data parses to the correct number of rows. But a source file that does have a "\n" on its final data row (so the last line is just the end-of-file) now parses an extra row with one empty field.

Is there any way to configure the FMT so that both cases will parse the correct number of rows?

Thanks,
Anna
avackova

May 12, 2010 09:07

Hello Anna,
When I added the missing delimiter for the FIELD_INPUT_ROW_NUM field, CloverETL 2.9.2 read the data properly.
anweston

May 12, 2010 18:03

Heya,

Turns out the file I was using as a test was messed up (there was some DOS-to-Linux end-of-line conversion going on). It now works the way I want it to. :-)

I did not have to alter the FMT, though. What do you mean by "I added missing delimiter for FIELD_INPUT_ROW_NUM"? We have defined the delimiter in the <Record> tag as recordDelimiter="\n". Am I missing something? I'm just asking in case something is working that shouldn't be and could cause an issue later on.

Thanks,
Anna
avackova

May 14, 2010 08:04

Hello Anna,
The field has no delimiter, and no default field delimiter is specified either. The checkConfig method reports "Graph configuration is invalid (Field delimiter for the field 'FIELD_INPUT_ROW_NUM' in the record element 'RECORD_ONE_FIELD_RECORD_' not found!)." I can't guarantee that a graph with such metadata will always work.
anweston

May 18, 2010 22:18

Heya,

Interesting.

I am now testing an upgrade to 2.9.2 with this test case. We run with "-skipcheckconfig" because we are auto-generating our graph and we get a small performance gain by not running that. The FMT I provided runs just fine with "-skipcheckconfig" on.

Omit the "-skipcheckconfig" and I get the error you report.

ERROR - Field delimiter for the field 'FIELD_INPUT_ROW_NUM' in the record element 'RECORD_ONE_FIELD_RECORD_' not found!

If I add delimiter="%" to the <Field> tag or fieldDelimiter="%" to the <Record> tag, it runs with no warnings.

This delimiter is bogus: the first field (FIELD_INPUT_ROW_NUM) is an auto-generated field, so there is only one field in the file. If I choose a character that might be in my file, it may try to parse the row as two fields. Is this correct behaviour?

Thanks,
Anna
avackova

May 19, 2010 07:06

Hello Anna,
If you add the delimiter to the <Field> tag, it is not taken into account when reading the file, so it can't cause any problem. But if you use the metadata somewhere else for formatting the data, the graph may fail.
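
For reference, a sketch of the FMT posted earlier in this thread with a placeholder delimiter added to the auto-filled field. The "%" value is the one Anna tried; that it is ignored on reading is taken from the statement above rather than verified separately.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Record name="RECORD_ONE_FIELD_RECORD_" type="delimited" recordDelimiter="\n">
  <!-- Auto-filled field: the delimiter here is a placeholder so that the
       configuration check finds a field delimiter. Per this thread it is
       not used when reading the file. -->
  <Field name="FIELD_INPUT_ROW_NUM" type="numeric" nullable="false" auto_filling="source_row_count" delimiter="%" />
  <!-- The only field actually read from the file; end-of-file may close it. -->
  <Field name="FIELD_ROW" type="string" nullable="false" eofAsDelimiter="true" />
</Record>
```

If the same metadata is reused on an output edge for writing, the placeholder delimiter would presumably be written out, which is the failure case mentioned above.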
anweston

May 19, 2010 15:58

Heya,

OK, I will try adding "delimiter" to the <Field> tag in case we decide to run graphs without "-skipcheckconfig". I was concerned because I found that adding fieldDelimiter="," to the <Record> tag (and there are commas in the file) seemed to cause parsing errors.

Thanks,
Anna
