XML Parser failed due to illegal Unicode characters

Comments 7

  • avackova
    Can you describe the problem in more detail? What did you try, and what went wrong?
  • rstark
    An example would be a control character such as Acknowledge (ACK, "\u0006"), which is not legal in XML.
  • rstark
    I have tried multiple regular expressions to remove these characters before parsing the XML.
  • avackova
    Hello,
    The following expression works for me:
    regex = "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007f]+)"
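A minimal Java sketch of what this expression does, assuming the pattern behaves the same way under `java.util.regex` as it does in the poster's environment. The class name `XmlCharFilter` and the method `strip` are hypothetical names chosen for illustration; they are not part of CloverETL.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a CloverETL class.
class XmlCharFilter {
    // The pattern from the post above. The first alternative matches anything
    // outside tab, line feed, carriage return, U+0020..U+D7FF and U+E000..U+FFFD;
    // the second alternative additionally matches U+0092 and U+007F.
    private static final Pattern INVALID = Pattern.compile(
        "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007f]+)");

    // Returns the input with every matched character removed.
    static String strip(String input) {
        return INVALID.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Build a string containing the ACK control character (U+0006).
        String dirty = "<name>Ry" + (char) 0x06 + "an</name>";
        System.out.println(strip(dirty));
    }
}
```

Tab, line feed and carriage return are kept, because XML 1.0 allows them. One caveat: under `java.util.regex` the negated class also appears to match characters above U+FFFF, which XML 1.0 permits, so this pattern would strip those as well.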
  • rstark
    Agata,

    The provided regex is still not working. Can you send me the Clover syntax, e.g. replace(<field>, regex, '');? Also, what other options do I have besides replacing explicit characters or ranges?

    Thanks,
    Ryan
  • avackova
    Hello Ryan,
    The following code replaces the invalid parts with empty strings:
    //#CTL2

    // Transforms input record into output record.
    function integer transform() {
        string regex = "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007f]+)";
        $0.Data = replace($Data, regex, "");

        return ALL;
    }

    I've also developed a transformation in Java that removes the value of an invalid element and sends it, together with the number of the invalid record and the name of the invalid element, to another output port of the Reformat component (see attached class).
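The attached class is not reproduced in this thread. As a rough illustration of the idea only, here is a plain-Java sketch that blanks invalid element values and collects the original value, record number and element name in a separate reject list. It does not use CloverETL's Java transformation interface, and `InvalidElementRouter`, `Reject` and `clean` are hypothetical names; the reject list merely stands in for the second output port.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch, not the class attached to the original post.
class InvalidElementRouter {
    // One rejected element: where it came from and what it contained.
    static final class Reject {
        final int recordNumber;
        final String elementName;
        final String originalValue;

        Reject(int recordNumber, String elementName, String originalValue) {
            this.recordNumber = recordNumber;
            this.elementName = elementName;
            this.originalValue = originalValue;
        }
    }

    // Same pattern as in the CTL2 code above.
    private static final Pattern INVALID = Pattern.compile(
        "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007f]+)");

    // Returns a copy of the record in which every invalid value is replaced by
    // an empty string; each invalid value is also appended to 'rejects'.
    static Map<String, String> clean(int recordNumber,
                                     Map<String, String> elements,
                                     List<Reject> rejects) {
        Map<String, String> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : elements.entrySet()) {
            String value = e.getValue();
            if (value != null && INVALID.matcher(value).find()) {
                rejects.add(new Reject(recordNumber, e.getKey(), value));
                cleaned.put(e.getKey(), "");
            } else {
                cleaned.put(e.getKey(), value);
            }
        }
        return cleaned;
    }
}
```

Unlike the CTL2 version, which silently strips the offending characters, this variant drops the whole value and keeps a trace of it, which may be preferable when the bad data needs to be reviewed later.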
  • rstark
    Thanks, Agata, this worked.
