Hadoop Reader - Reading log file (JSON format)

Comments 2

  • Lukas Cholasta
    Hi

    The HadoopReader is designed to read sequence files; for more information, please refer to its documentation. To read a JSON file, use the JSONReader instead. Set the File URL in the JSONReader to 'hdfs://HADOOP1/<URL>', where HADOOP1 is the Hadoop connection ID. In general, the ID is the connection name you entered when creating the connection, in uppercase. To confirm it, open the edit dialog of the HadoopReader and click the downward arrow on the Hadoop connection line; the connection ID should be shown there.
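
As a rough illustration of the File URL format described above, here is a minimal Python sketch that assembles the string from a connection ID and a path. The helper name `hdfs_file_url` and the path `/logs/app.json` are made up for this example; only the 'hdfs://HADOOP1/<URL>' shape comes from the reply itself. The helper does not change the case of the ID, on the assumption that you copy it exactly as shown in the connection dialog.

```python
def hdfs_file_url(connection_id: str, path: str) -> str:
    """Build a File URL of the form hdfs://<CONNECTION_ID>/<path>.

    connection_id: the Hadoop connection ID (e.g. "HADOOP1"), used as-is.
    path: location of the file on HDFS (hypothetical in this example).
    """
    if not connection_id:
        raise ValueError("connection_id must not be empty")
    # Strip a leading slash so the result has exactly one separator
    # between the connection ID and the path.
    return f"hdfs://{connection_id}/{path.lstrip('/')}"


# Hypothetical path, for illustration only.
url = hdfs_file_url("HADOOP1", "/logs/app.json")
print(url)
```

The resulting string is what would be entered in the File URL field of the JSONReader.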

    If the solution above doesn't work, please send me the following information:

      1. Is this part of a CloverETL Server project? If so, which version of CloverETL Server are you using?
      2. What version of the Designer are you using?
      3. What version of Hadoop Server is used?
      4. If you now get a different error message, please send me a screenshot and/or the log where the error is visible.


    Thank you.
  • dpavlis
    You may also use JSONExtract if you are processing large (>>100kB) files. JSONReader uses XPath queries to extract data, so it must first build a DOM representation of the whole file in memory. JSONExtract uses SAX-style (event-based) parsing, which allows the file to be processed as a stream.
    Which one to use depends on your particular case. Both components (in fact, all CloverETL readers) can be set to read data directly from HDFS, as described above.
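
To make the memory trade-off above more concrete, here is a small Python sketch contrasting the two approaches. It is only an analogy and not how JSONReader or JSONExtract are implemented: `read_all` parses the whole document into memory before anything can be queried, while `iter_records` handles one record at a time. The function names, the sample log records, and the assumption that the streamed log holds one JSON object per line are all invented for this example.

```python
import io
import json
from typing import Iterator, TextIO


def read_all(stream: TextIO) -> list:
    """Tree-style: parse the entire document into memory, then query it."""
    return json.load(stream)


def iter_records(stream: TextIO) -> Iterator[dict]:
    """Stream-style: yield one record at a time from line-delimited JSON."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


# Hypothetical sample data: the same two log records in both layouts.
whole_doc = '[{"level": "INFO", "msg": "start"}, {"level": "ERROR", "msg": "disk full"}]'
line_delimited = (
    '{"level": "INFO", "msg": "start"}\n'
    '{"level": "ERROR", "msg": "disk full"}\n'
)

# Whole-document parsing holds every record in memory at once.
errors_tree = [r["msg"] for r in read_all(io.StringIO(whole_doc)) if r["level"] == "ERROR"]

# Streaming keeps only the current record in memory.
errors_stream = [r["msg"] for r in iter_records(io.StringIO(line_delimited)) if r["level"] == "ERROR"]

print(errors_tree, errors_stream)
```

Both approaches return the same result on small inputs; the difference only matters once the file is large enough that holding all of it in memory becomes a problem.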
