Avro Performance Jun 11th, 2013 | Comments Here are some un-scientific results on how Avro performs with various codecs, as well as vs JSON-lzo files in Hive and Impala. This testing was done using a 100 million row table that was generated using random two strings and an integer. 1 2 3 4 5 6 7 8 | Format | Codec | Data Size | Hive count(1) time | Impala count(1) time |-----------|----------------|---------------|--------------------|---------------------- | JSON | null | 686,769,821 | not tested | N/A | JSON | LZO | 285,558,314 | 79s | N/A | JSON | Deflate (gzip) | 175,878,038 | not tested | N/A | Avro | null | 301,710,126 | 40s | .4s | Avro | Snappy | 260,450,980 | 38s | .9s | Avro | Deflate (gzip) | 156,550,144 | 64s | 2.8s So the winner appears to be Avro/Snappy or uncompressed Avro.