As you embark on converting vast quantities of JSON to Avro, you soon discover that things are not as simple as they seem. Here is how it might happen.
A quick Google search eventually leads you to the avro-tools jar, and you find yourself attempting to convert some JSON, such as:
Having read the Avro documentation, and being the clever being that you are, you start out with:
A brief moment of disappointment is followed by the bliss of enlightenment: Duh, the “middle” element needs a default! And so you try again, this time having tacked a default onto the definition of “middle”, so that it looks like {"name":"middle","type":"string","default":""}:
Why doesn’t this work? Well… You don’t understand Avro, as it turns out. You see, JSON is not Avro, and therefore the wonderful Schema Resolution thing you’ve been reading about does not apply.
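To make the distinction concrete, here is a minimal Python sketch of what you were hoping would happen: missing fields in a JSON record get filled in from the schema's defaults before the record is encoded. The schema, the sample records and the `apply_defaults` helper are all made up for illustration (only the “middle” definition comes from above). This is not how avro-tools or json2avro is implemented; it just shows the idea.

```python
import json

# Hypothetical schema; only the "middle" field definition is taken from the text.
SCHEMA = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "first", "type": "string"},
        {"name": "middle", "type": "string", "default": ""},
        {"name": "last", "type": "string"},
    ],
}


def apply_defaults(record, schema):
    """Return a copy of `record` with missing fields filled from schema defaults."""
    out = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in record:
            out[name] = record[name]
        elif "default" in field:
            out[name] = field["default"]
        else:
            # No value and no default: the record cannot satisfy the schema.
            raise ValueError(f"missing required field: {name}")
    return out


# Hypothetical line-delimited JSON input; the second record has no "middle".
lines = [
    '{"first": "John", "middle": "X", "last": "Doe"}',
    '{"first": "Jane", "last": "Doe"}',
]
records = [apply_defaults(json.loads(line), SCHEMA) for line in lines]
```

After this step every record carries all three fields, which is the shape a strict JSON-to-Avro encoder expects to see.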
But do not despair. I wrote a tool just for you: json2avro. It does exactly what you want:
No errors, and we have an output.avro file. Let’s see what’s in it by using the aforementioned avro-tools:
Let me also mention that json2avro is written in C and is fast. It supports the Snappy, Deflate and LZMA compression codecs, lets you pick a custom block size, and is smart enough to (optionally) skip over lines it cannot parse.
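If you are wondering what “skip over lines it cannot parse” means in practice, here is a rough Python sketch of the idea for line-delimited JSON. The `parse_lines` function and its `skip_bad` parameter are hypothetical names invented for this example; json2avro itself is C and its actual logic may well differ.

```python
import json


def parse_lines(lines, skip_bad=True):
    """Parse one JSON document per line.

    Returns (parsed_records, skipped_count). With skip_bad=False the first
    malformed line raises json.JSONDecodeError instead of being skipped.
    """
    good, skipped = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines are ignored, not counted as errors
        try:
            good.append(json.loads(line))
        except json.JSONDecodeError:
            if not skip_bad:
                raise
            skipped += 1
    return good, skipped


# Hypothetical input: the middle line is truncated and cannot be parsed.
sample = [
    '{"first": "John", "last": "Doe"}',
    '{"first": "Jane", "last": ',
    '{"first": "Jim", "last": "Doe"}',
]
parsed, skipped = parse_lines(sample)
```

Counting the skipped lines instead of dropping them silently makes it easier to notice when a large input is messier than you thought.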
Enjoy!