Question
I'm new to Apache Avro (a serialization framework). I know what serialization is, but why are there separate frameworks like Avro, Thrift, and Protocol Buffers?
Why can't we use the Java serialization APIs instead of these separate frameworks? Are there any flaws in the Java serialization APIs?
What is the meaning of the phrase "does not require running a code-generation program when a schema changes" in Avro or in any other serialization framework?
Please help me to understand all of this!
Answer 1:
Why can't we use the Java serialization APIs instead of these separate frameworks? Are there any flaws in the Java serialization APIs?
I would assume you can use Java Serialization unless you know otherwise.
The main reasons not to use it are:
- You know there is a performance problem.
- You need to exchange data across languages; Java Serialization is Java-only.
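For context, a minimal sketch of what plain Java Serialization looks like (the Point class here is just a hypothetical example):

```java
import java.io.*;

// Hypothetical example class; any class can opt in by implementing Serializable.
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class JavaSerializationDemo {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        // Write the object graph to a byte stream.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(new Point(3, 4));
        }

        // Read it back; only JVMs understand this format.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            Point p = (Point) in.readObject();
            System.out.println(p.x + "," + p.y);
        }
    }
}
```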
does not require running a code-generation program when a schema changes
I am guessing this means it can read serialized data with an older or newer model without having to re-generate and compile the code, i.e. it is tolerant of changes in the model.
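To make that concrete, here is a hedged sketch of how Avro handles this via schema resolution: data written with an older schema can be read with a newer one, provided new fields have defaults. The record and field names are made up for illustration.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.*;
import org.apache.avro.io.*;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class SchemaEvolutionDemo {
    public static void main(String[] args) throws IOException {
        // Writer schema: the "old" model with a single field.
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"}]}");

        // Reader schema: the "new" model adds a field with a default value.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"age\",\"type\":\"int\",\"default\":0}]}");

        // Serialize with the old schema.
        GenericRecord oldRecord = new GenericData.Record(writerSchema);
        oldRecord.put("name", "alice");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(oldRecord, encoder);
        encoder.flush();

        // Deserialize with the new schema; "age" is filled from its default.
        Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord newRecord = new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
                .read(null, decoder);
        System.out.println(newRecord); // {"name": "alice", "age": 0}
    }
}
```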
BTW: As the data models I work with are usually a) very simple and b) in need of maximum performance, I write my own serialization without using a framework (or write my own framework). This is fine provided your model is very simple and won't change often.
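A hand-rolled approach of that kind might look like the following sketch; the two-int field layout is just an assumption for illustration:

```java
import java.io.*;

public class HandRolledDemo {
    // Hypothetical simple model: two ints, written in a fixed order.
    static void write(DataOutputStream out, int x, int y) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    static int[] read(DataInputStream in) throws IOException {
        return new int[] { in.readInt(), in.readInt() };
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(new DataOutputStream(buffer), 3, 4);

        int[] point = read(new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray())));
        System.out.println(point[0] + "," + point[1]); // exactly 8 bytes on the wire
    }
}
```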
In short, unless you know you can't, try Java Serialization first.
A comparison I did on different Serialization Methods
Answer 2:
1. The problem with Java serialization is that it is not agnostic of your code; it is tightly coupled to the structure of your classes. Other serialization frameworks give you flexibility and control that help you get around this. Even though the standard Java mechanism lets you control serialization through the writeObject/readObject methods, it is a problem that other frameworks have addressed in a more elegant way.
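For reference, the writeObject/readObject hooks mentioned above look roughly like this sketch; the Session class is a made-up example:

```java
import java.io.*;

// Hypothetical class that customizes the default Java serialization.
class Session implements Serializable {
    private static final long serialVersionUID = 1L;
    private String user;
    private transient long lastAccess;   // excluded from the default serialized form

    Session(String user) { this.user = user; }

    // Called by ObjectOutputStream instead of the default field-by-field dump.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();   // write the non-transient fields
        out.writeInt(1);            // plus custom extra data, e.g. a format version
    }

    // Called by ObjectInputStream; must read back in the same order as written.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int formatVersion = in.readInt();
        this.lastAccess = System.currentTimeMillis(); // rebuild derived/transient state
    }
}
```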
Second, you cannot exchange the output of Java serialization with other languages or platforms.
Last, but not least, Java serialization does not produce the most compact result possible, which might lead to performance degradation when you do things like transfer data over a network. Other protocols (like Oracle's POF or Protocol Buffers) are optimized to produce smaller output.
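A quick, hedged way to see that overhead for yourself; the one-field Payload class below is just an illustrative stand-in:

```java
import java.io.*;

public class SizeCheck {
    // Hypothetical tiny payload: a single int field.
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Payload());
        }
        // The stream carries class metadata, so it is much larger than the 4 bytes
        // a raw int (or a compact binary protocol) would need.
        System.out.println("Java serialization size: " + bytes.size() + " bytes");
        System.out.println("Raw int size: " + Integer.BYTES + " bytes");
    }
}
```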
2. Regarding your second question, I guess what that means is that you don't need to run any pre-compile job that generates code when the structure of your serialized classes changes. I personally hate frameworks that force some kind of compile-time code generation. I hate the hassle of even having to look at generated code, but that is just me and my OCD.
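For example, Avro's generic API lets you work against a schema at runtime with no generated classes at all; the schema below is invented for illustration:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class NoCodegenDemo {
    public static void main(String[] args) {
        // Build the schema programmatically; no .avsc compilation step, no generated class.
        Schema schema = SchemaBuilder.record("User").fields()
                .requiredString("name")
                .requiredInt("age")
                .endRecord();

        // Populate a record by field name at runtime.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);
        System.out.println(user); // {"name": "alice", "age": 30}
    }
}
```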
Answer 3:
Two principal things Avro does well: Hadoop's MapReduce and communication protocol structures. I use it for MapReduce, where I put numerous data instances in a single file, all conforming to a particular schema; each record is stored very efficiently and markers delineate each individual record. Hadoop also uses it to communicate data between the Map and Reduce tasks. Much better than storing field names alongside the data. These files are easy to split into multiple parts for processing in a distributed computing environment.

Since the schema is embedded in the file, a reader doesn't have to know what the data looks like. Avro is not tied to any language, and there are several language APIs for reading Avro data. If you want to write out a single complex object, then Java's serialization or Avro will work. If you want more power and efficiency and are dealing with millions of individual instances, then Avro is a good alternative. I am sure you can do this with the Java API, but why work that hard?
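A sketch of that container-file pattern, with the schema embedded once in the header so readers need no outside knowledge; the file name and schema are assumptions:

```java
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.*;
import java.io.File;
import java.io.IOException;

public class ContainerFileDemo {
    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"long\"}]}");
        File file = new File("events.avro"); // hypothetical output path

        // Write many records into one file; the schema is stored once in the header.
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            for (long i = 0; i < 3; i++) {
                GenericRecord event = new GenericData.Record(schema);
                event.put("id", i);
                writer.append(event);
            }
        }

        // A reader discovers the schema from the file itself.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            System.out.println("Embedded schema: " + reader.getSchema());
            for (GenericRecord event : reader) {
                System.out.println(event);
            }
        }
    }
}
```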
There are mechanisms to evolve schemas through the schema resolution rules. There are also tools that will turn your Java objects into schemas for you.
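One such tool is Avro's reflection support, which derives a schema from an existing Java class; a small hedged sketch, with a hypothetical User class:

```java
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;

public class ReflectSchemaDemo {
    // Hypothetical plain Java class; no Avro-specific code is required.
    static class User {
        String name;
        int age;
    }

    public static void main(String[] args) {
        // Derive an Avro schema from the class's fields via reflection.
        Schema schema = ReflectData.get().getSchema(User.class);
        System.out.println(schema.toString(true)); // pretty-printed JSON schema
    }
}
```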
The best place to start is here: http://avro.apache.org/docs/current/spec.html It may take a couple of reads to get the gist. Read it again after trying some of the tools that come with the Avro package. Avro takes a while to get the hang of. JSON is only used as a data specification language; it isn't used to store the data. You can generate schemas using the API or using a JSON file. Lots of flexibility, and enough rope to easily get into trouble with -- well worth it, though.
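To show the JSON-file side of that choice, here is a small sketch; the user.avsc path and its contents are made up for illustration:

```java
import org.apache.avro.Schema;
import java.io.File;
import java.io.IOException;

public class SchemaFromFileDemo {
    public static void main(String[] args) throws IOException {
        // Contents of a hypothetical user.avsc file -- plain JSON describing the data,
        // not the data itself:
        // {"type": "record", "name": "User", "fields": [
        //   {"name": "name", "type": "string"},
        //   {"name": "age",  "type": "int"}]}
        Schema schema = new Schema.Parser().parse(new File("user.avsc"));
        System.out.println(schema.getName()); // User
    }
}
```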
Source: https://stackoverflow.com/questions/14257505/data-serialization-framework