S3 and Wildcards

  • Pedro Vazquez Rosario
    Hi DTaylor

    Could you please provide more information about the behavior you are seeing? Also, if possible, could you post an anonymized example graph? We would like to better understand what might be causing the issue.
  • dtaylor
    Sorry for the delay. I'm unfortunately unable to post an anonymized ETL at this time.

    We were accessing a bucket using the S3 protocol with UniversalDataReader. The ETL needed to read the contents of all files matching a particular pattern. Previously, we had used the * character as part of the pattern, and it had matched the files correctly. At the time of posting, however, S3 had started treating * as a literal character rather than a wildcard, so none of the file names matched the pattern I had set up. For example, rather than finding all files that matched the pattern 'ABC123*.txt', the S3 protocol started telling UniversalDataReader that there was no file named 'ABC123*.txt'. Because UniversalDataReader recognized the wildcard even though S3 did not, the ETL did not raise an error, and we had a process that was quietly failing.
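The failure mode described above can be sketched in a few lines of Python. This is a minimal illustration, not CloverETL or Amazon SDK code: the key names in `KEYS` and the helper functions are hypothetical. It contrasts an exact-name lookup (S3 keys have no wildcard semantics, so `*` is just a character) with client-side glob matching via the standard `fnmatch` module, and adds a guard so that a pattern matching zero files raises an error instead of quietly producing no output.

```python
from fnmatch import fnmatchcase

# Hypothetical object keys in a bucket (illustrative only).
KEYS = ["ABC123_01.txt", "ABC123_02.txt", "XYZ999_01.txt"]


def literal_lookup(keys, name):
    """Treat `name` as an exact key, which is how S3 itself treats '*'."""
    return [k for k in keys if k == name]


def wildcard_lookup(keys, pattern):
    """Treat '*', '?' and '[...]' as glob wildcards, case-sensitively like S3 keys."""
    return [k for k in keys if fnmatchcase(k, pattern)]


def require_matches(matches, pattern):
    """Fail loudly instead of silently processing zero files."""
    if not matches:
        raise FileNotFoundError(f"no objects match {pattern!r}")
    return matches


if __name__ == "__main__":
    pattern = "ABC123*.txt"
    print(literal_lookup(KEYS, pattern))   # []
    print(wildcard_lookup(KEYS, pattern))  # ['ABC123_01.txt', 'ABC123_02.txt']
    try:
        require_matches(literal_lookup(KEYS, pattern), pattern)
    except FileNotFoundError as err:
        print("error:", err)  # error: no objects match 'ABC123*.txt'
```

The guard is the part that addresses the "quietly failing" symptom: whichever side expands the wildcard, an empty match list is surfaced as an error rather than treated as a successful run with no input.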

    We managed to work around this by accessing the bucket via HTTPS, but that's not ideal when there is a dedicated protocol.
  • svecp
    If you have a Corporate Server, you can use the ListFiles component to list all available files in an S3 bucket and feed them into UniversalDataReader. Since version 4.2.0, we have been using the official Amazon SDK to access S3. Did you encounter this change after upgrading to a later version? More details are in https://bug.javlin.eu/browse/CLO-7170.
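The list-then-read workflow can be sketched as follows. Amazon S3's object listing (ListObjectsV2) filters only by a literal `Prefix`, not by wildcards, so one common approach is to list by the literal part of the pattern and apply the full glob on the client. This is an assumption about how such a feature could work, not CloverETL's actual algorithm; `fake_list_keys` is a hypothetical in-memory stand-in for a paginated listing call, and its keys are invented for illustration.

```python
import re
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List

# Characters that start a glob wildcard in fnmatch syntax.
_WILDCARD = re.compile(r"[*?\[]")


def literal_prefix(pattern: str) -> str:
    """Return the part of a glob pattern before its first wildcard character."""
    m = _WILDCARD.search(pattern)
    return pattern if m is None else pattern[: m.start()]


def expand_pattern(pattern: str, list_keys: Callable[[str], Iterable[str]]) -> List[str]:
    """List keys by literal prefix, then keep only those matching the full glob."""
    prefix = literal_prefix(pattern)
    return sorted(k for k in list_keys(prefix) if fnmatchcase(k, pattern))


def fake_list_keys(prefix: str) -> List[str]:
    """Hypothetical stand-in for a paginated ListObjectsV2 call with Prefix=prefix."""
    bucket = [
        "Monitored/cust01.dat",
        "Monitored/cust02.dat",
        "Monitored/orders.dat",
        "Archive/cust00.dat",
    ]
    return [k for k in bucket if k.startswith(prefix)]


if __name__ == "__main__":
    for key in expand_pattern("Monitored/cust*.dat", fake_list_keys):
        print(key)  # a real job would hand each key to the reader here
```

Note that `fnmatch` lets `*` match across `/`, so a pattern like `Monitored/*` would also match keys in nested "folders"; a real implementation might choose to stop at the delimiter instead.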

  • dtaylor
    We encountered this issue on 4.3. Unfortunately, we do not have Corporate Server, so ListFiles was not an option.
  • svecp
    I just tried this on the new 4.5.0-M2 release and learned that the algorithm changed in 4.4.0. Would it be possible for you to try a later version?

    s3://***:***@s3.amazonaws.com/cloveretl.svecp/Monitored/cust*.dat - works
    s3://***:***@s3.amazonaws.com/cloveretl.svecp/Monitored/* - works
    s3://***:***@s3.amazonaws.com/* - does not work, because of insufficient privileges
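The three results above differ in where the wildcard sits. The sketch below, a hypothetical helper and not part of CloverETL, splits each path-style URL into bucket and key pattern with Python's `urllib.parse.urlsplit`. In the first two URLs the `*` is inside the object key, so it can be resolved by listing objects in a known bucket; in the third it stands in for the bucket name, which would require listing all buckets. That plausibly explains the privilege error, since listing buckets needs its own permission (`s3:ListAllMyBuckets` in IAM terms), but this is an interpretation, not something confirmed in the thread.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class S3Location(NamedTuple):
    host: str
    bucket: str
    key_pattern: str


def parse_s3_url(url: str) -> S3Location:
    """Split a path-style s3:// URL into host, bucket and key pattern.

    Assumes the layout s3://user:secret@host/bucket/key... used above.
    Only '*' is handled: urlsplit would treat '?' as a query separator.
    """
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3 URL: {url!r}")
    bucket, _, key = parts.path.lstrip("/").partition("/")
    return S3Location(parts.hostname or "", bucket, key)


def wildcard_in_bucket(loc: S3Location) -> bool:
    """True if resolving the pattern would mean listing buckets, not objects."""
    return "*" in loc.bucket


if __name__ == "__main__":
    for url in (
        "s3://***:***@s3.amazonaws.com/cloveretl.svecp/Monitored/cust*.dat",
        "s3://***:***@s3.amazonaws.com/cloveretl.svecp/Monitored/*",
        "s3://***:***@s3.amazonaws.com/*",
    ):
        loc = parse_s3_url(url)
        print(loc, "wildcard in bucket" if wildcard_in_bucket(loc) else "wildcard in key")
```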