Hadoop Reader - Reading log file (JSON format)

bhavinultimate

November 22, 2015 00:00

Answered

Hello,

I have a problem reading a log file from HDFS. I have successfully set up the connection to HDFS, but when I read the file with the HadoopReader I get an error saying that this is not a sequence file.

My log file is in JSON format and contains the following data:
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1113}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1112}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1114}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1167}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1116}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1100}
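For illustration only, here is a minimal Python sketch (not a CloverDX graph) of how each line of this log maps to typed fields, roughly what the CloverDX metadata for the records would need to describe. It assumes the file holds one JSON object per line and that "timestamp" is milliseconds since the Unix epoch; neither is stated in the post. The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class LogRecord:  # hypothetical record type mirroring the log fields
    timestamp: datetime
    total: float
    shop_id: str
    items: int


def parse_line(line: str) -> LogRecord:
    """Parse one log line (a single JSON object) into typed fields."""
    obj = json.loads(line)
    return LogRecord(
        # Assumption: "timestamp" is epoch milliseconds, stored as a string.
        timestamp=EPOCH + timedelta(milliseconds=int(obj["timestamp"])),
        # "total" is quoted in the log, so it arrives as a string.
        total=float(obj["total"]),
        shop_id=obj["shop_id"],
        items=int(obj["items"]),
    )


def parse_log(text: str) -> list[LogRecord]:
    """Assumption: one JSON object per non-empty line, not one JSON document."""
    return [parse_line(line) for line in text.splitlines() if line.strip()]


sample = """\
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1113}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1112}
"""

for rec in parse_log(sample):
    print(rec.shop_id, rec.total, rec.items, rec.timestamp.isoformat())
```

Integer milliseconds are added with `timedelta` rather than divided into a float, so the fractional second is exact.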

How can I process this log file with the HadoopReader? It would be great if someone could share a sample graph that processes this log file.

Thanks
Bhavin
Ypoint Analytics

Comments 2

Lukas Cholasta

November 23, 2015 14:55
Hi,

The HadoopReader is designed to read sequence files; for more information, please refer to its documentation. If you want to read a JSON file, please use the JSONReader. Set the File URL in the JSONReader like this: 'hdfs://HADOOP1/<URL>', where HADOOP1 is the Hadoop connection ID. In general, this is the connection name you entered when creating the connection, in uppercase. To be sure, you can open the edit dialog of the HadoopReader and click the downward arrow on the Hadoop connection line; you should see the connection ID there.
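To see why the HadoopReader rejects this log, a quick check helps: a Hadoop SequenceFile begins with the three bytes "SEQ" followed by a one-byte version number, while this JSON log begins with "{". The Python sketch below inspects the first four bytes of a local copy of the file (for example one fetched with `hdfs dfs -get`); the function names are hypothetical and this is a diagnostic aid, not part of CloverDX.

```python
import os
import tempfile

# Hadoop SequenceFiles start with the bytes b"SEQ" followed by a 1-byte version.
SEQ_MAGIC = b"SEQ"


def looks_like_sequence_file(header: bytes) -> bool:
    """Return True if `header` starts with the SequenceFile magic plus a version byte."""
    return len(header) >= 4 and header[:3] == SEQ_MAGIC


def check_local_copy(path: str) -> bool:
    """Inspect the first 4 bytes of a local copy of the HDFS file."""
    with open(path, "rb") as f:
        return looks_like_sequence_file(f.read(4))


if __name__ == "__main__":
    # Simulate a local copy of the JSON log.
    with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
        tmp.write(b'{"timestamp":"1445938229176","total":"340.0"}\n')
    try:
        print("sequence file?", check_local_copy(tmp.name))  # False: starts with '{'
    finally:
        os.remove(tmp.name)
```

A plain-text JSON log fails this check, which is consistent with the "not a sequence file" error.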

If the solution above doesn't work, please send me the following information.
Thank you.

dpavlis

November 23, 2015 15:51
You may also use JSONExtract if you are processing large (>>100 kB) files. JSONReader uses XPath queries to extract data, so it has to build a DOM representation of the file in memory first. JSONExtract uses SAX-style (event-based) parsing, which allows stream-based processing.
Which one to use depends on your particular case. Both (in fact, all CloverETL readers) can be set to read data directly from HDFS, as described above.
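The memory trade-off between building everything in memory and streaming can be illustrated with a small Python analogy. This is not how CloverDX implements JSONReader or JSONExtract. Python's standard `json` module has no event-based (SAX-style) parser, so the streaming variant here relies on the log holding one JSON object per line; a single large JSON document would need an incremental parser such as the third-party `ijson` package. Function names are hypothetical.

```python
import io
import json
from typing import Iterator, TextIO


def load_all(stream: TextIO) -> list[dict]:
    """Whole-file approach: the entire content is read into memory before parsing."""
    content = stream.read()
    return [json.loads(line) for line in content.splitlines() if line.strip()]


def stream_records(stream: TextIO) -> Iterator[dict]:
    """Streaming approach: one line is read and parsed at a time."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


SAMPLE = (
    '{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1113}\n'
    '{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1112}\n'
)

# Aggregate while holding only one record in memory at a time.
total_items = sum(rec["items"] for rec in stream_records(io.StringIO(SAMPLE)))
print(total_items)  # 2225
```

Both functions return the same records; the difference is only in how much of the file must be held in memory at once.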
