Emoji Unicode Characters in a File

  • imriskal
    Unfortunately, CloverETL does not support emojis out of the box, and no CTL function handles them directly. I am also not sure how the emojis are represented in your files or which encoding is used.

    1) Do you have a sample file containing emojis? Could you post it here or send it via email?
    2) How big are your input files?
    3) Do the input files contain just text or also some structured data?
    4) Are they binary files or plain text files?

    There is a Java library that could be useful if you decide to write your own Java transformation.
    You can also use the CTL find function and search with regular expressions.
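
For readers who go the custom Java transformation route, here is a minimal sketch of the regular-expression idea in plain Java. It does not use the CloverETL API or the library mentioned above. The class name `EmojiFind`, the method `findEmoji`, and the code point ranges are assumptions made for illustration; the ranges are deliberately rough and are not the complete Unicode emoji list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmojiFind {
    // Assumption: rough ranges only. U+1F000-U+1FAFF covers most pictographic emoji,
    // U+2600-U+27BF covers older symbols and dingbats.
    static final Pattern EMOJI =
        Pattern.compile("[\\x{1F000}-\\x{1FAFF}\\x{2600}-\\x{27BF}]");

    // Returns every match as a "U+XXXX" label, in order of appearance.
    static List<String> findEmoji(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMOJI.matcher(text);
        while (m.find()) {
            found.add(String.format("U+%X", m.group().codePointAt(0)));
        }
        return found;
    }

    public static void main(String[] args) {
        // Built from the code point so the source file itself stays ASCII.
        String smiley = new String(Character.toChars(0x1F600));
        System.out.println(findEmoji("Jane|Doe|great service " + smiley));
    }
}
```

Java's regex engine matches by code point, so characters above U+FFFF (stored as surrogate pairs in a Java string) can be used directly in a character class range.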
  • hneff1
    ClientFile_20161102.txt

    Thanks for the response.

    In regard to your questions:

    1) Do you have a sample file containing emojis? Could you post it here or send it via email?
       I will attach one here.
    2) How big are your input files?
       Input files can range from kilobytes to a few megabytes.
    3) Do the input files contain just text or also some structured data?
       The files are structured, typically pipe-delimited. The emoji characters have been found in various fields. Other emoji characters have turned up in other fields in other files. Our goal is just to raise an error when any such character is found and return the file to the client, who can correct it as they see fit and send it back to us for reprocessing.
    4) Are they binary files or plain text files?
       Delimited text files.

    Thanks!
    Heather
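
The goal described above (reject a file as soon as any field contains an emoji) could be sketched in plain Java roughly as follows. This is an illustration under assumptions, not a CloverETL component: the class name `EmojiFileCheck`, the method `check`, and the emoji ranges are hypothetical, and the split on `|` does not handle quoted or escaped delimiters.

```java
import java.util.List;
import java.util.regex.Pattern;

final class EmojiFileCheck {
    // Assumption: rough emoji ranges; extend them to match what actually shows up in client files.
    static final Pattern EMOJI =
        Pattern.compile("[\\x{1F000}-\\x{1FAFF}\\x{2600}-\\x{27BF}]");

    // Throws on the first field that contains an emoji, naming its 1-based line and field.
    static void check(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            // The -1 limit keeps trailing empty fields so field numbers stay stable.
            String[] fields = lines.get(i).split("\\|", -1);
            for (int j = 0; j < fields.length; j++) {
                if (EMOJI.matcher(fields[j]).find()) {
                    throw new IllegalArgumentException(
                        "Emoji found at line " + (i + 1) + ", field " + (j + 1));
                }
            }
        }
    }

    public static void main(String[] args) {
        String smiley = new String(Character.toChars(0x1F600));
        try {
            check(List.of("1|Jane|ok", "2|John|thanks " + smiley));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Emoji found at line 2, field 3
        }
    }
}
```

For a real file, the lines could be read with `java.nio.file.Files.readAllLines(path, charset)`, using whatever encoding the client files are actually delivered in.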
  • imriskal
    Thanks for the responses. Do you really want to focus only on emojis? Are other non-ASCII characters allowed?

    If you want to check for all non-ASCII characters, we have CTL functions like isAscii(string arg) or even removeNonAscii(string arg) that could be useful.
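
To make the "all non-ASCII" option concrete, here is a small plain-Java sketch of what such checks amount to. The method names mirror the CTL functions only for readability; this is a stand-in written for illustration, not CloverETL's implementation, and the real functions may differ in edge cases.

```java
final class AsciiCheck {
    // True when every UTF-16 code unit is in the 7-bit ASCII range 0..127.
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    // Drops every character outside 0..127, accented letters and emoji alike.
    static String removeNonAscii(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        String smiley = new String(Character.toChars(0x1F600));
        System.out.println(isAscii("plain text"));  // true
        System.out.println(isAscii("caf\u00e9"));   // false
        System.out.println("[" + removeNonAscii("ok " + smiley) + "]");
    }
}
```

Note that a check this broad also rejects legitimate names and addresses containing accented characters, which is why the question of whether other non-ASCII characters are allowed matters.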

    If other non-ASCII characters are allowed and you want to remove only emojis, I am afraid the only reasonable options that come to mind are the suggested Java library or CTL functions such as find(string arg, string regex) or replace(string arg, string regex, string replacement) used with regular expressions.
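
The replace-with-a-regex idea, removing emojis while leaving other non-ASCII characters alone, might look like this in plain Java. The class name `EmojiStrip`, the method `stripEmoji`, and the ranges are assumptions for illustration. The pattern also removes the variation selector U+FE0F, which often follows an emoji and would otherwise be left behind as an invisible character.

```java
import java.util.regex.Pattern;

final class EmojiStrip {
    // Assumption: rough emoji ranges plus U+FE0F; not the complete Unicode emoji list.
    static final Pattern EMOJI =
        Pattern.compile("[\\x{1F000}-\\x{1FAFF}\\x{2600}-\\x{27BF}\\x{FE0F}]");

    // Removes matched emoji code points and keeps everything else, including accented letters.
    static String stripEmoji(String s) {
        return EMOJI.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String smiley = new String(Character.toChars(0x1F600));
        // The accented letters survive; only the emoji is dropped.
        System.out.println(stripEmoji("Zo\u00eb " + smiley + " M\u00fcller"));
    }
}
```

Whether silently stripping is acceptable depends on the use case; when the file should be returned to the sender instead, the same pattern can be used with `find()` to detect rather than remove.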