I have a huge file that contains multiple complete XML documents concatenated one after another. I want to work on these XMLs individually, so I am looking for a way to extract them. I tried 'Universal Reader' with the delimiter '<?', and it worked fine except that each extracted XML was missing the leading '<?' (it was consumed as the delimiter). Can someone please suggest the right approach for this task?
I am using the latest CloverETL community edition.
Input:
<?xml version="1.0" encoding="UTF-8"?>
XML1...
<?xml version="1.0" encoding="UTF-8"?>
XML2...
<?xml version="1.0" encoding="UTF-8"?>
XML3...
Output:
<?xml version="1.0" encoding="UTF-8"?>
XML1...
and
<?xml version="1.0" encoding="UTF-8"?>
XML2...
and
<?xml version="1.0" encoding="UTF-8"?>
XML3...
-
Hi,
The easiest solution would probably be a Reformat component after your UniversalDataReader which would put "<?" back. Something like:
$out.0.field1 = "<?" + $in.0.field1;
I cannot see any other way, as your input file is not a valid XML file. If you knew the length of each XML part, you could use fixed-length metadata instead of delimited metadata. Or, if each XML part were on a separate line, you could use \r\n as the delimiter.
Nevertheless, the Reformat workaround should help you if the two missing characters are the only flaw in your output.
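To illustrate the logic of this workaround outside CloverETL, here is a minimal Python sketch of the same idea: split on the delimiter, then prepend it to each piece again. The function name split_concatenated_xml and the sample data are made up for illustration and are not part of CloverETL. The sketch also assumes the whole input fits in memory, which may not hold for a really huge file.

```python
def split_concatenated_xml(text, delimiter="<?"):
    """Split on the delimiter, then put the delimiter back on each piece.

    Hypothetical helper for illustration only; it mirrors what the
    delimited reader plus the Reformat step do together.
    """
    pieces = text.split(delimiter)
    # The piece before the first delimiter is empty (or only whitespace),
    # so skip blank pieces instead of emitting a lone "<?".
    return [delimiter + piece.strip() for piece in pieces if piece.strip()]


# Made-up sample input: three XML documents joined one after another.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n<a>XML1</a>\n'
    '<?xml version="1.0" encoding="UTF-8"?>\n<b>XML2</b>\n'
    '<?xml version="1.0" encoding="UTF-8"?>\n<c>XML3</c>\n'
)

docs = split_concatenated_xml(sample)
for doc in docs:
    print(doc)
    print("---")
```

Note that a bare "<?" would also split on any other processing instruction inside the documents (for example an xml-stylesheet instruction), so a longer delimiter such as "<?xml version" may be safer if your data contains those.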
Regards,
-
Aren't you guys the best support one can dream of.
Thank you so much.