Why does Hadoop need to introduce these new classes? They just seem to complicate the interface
From Apache documentation page:
Writable
interface is described as
A serializable object which implements a simple, efficient, serialization protocol, based on
DataInput
andDataOutput
.
With this new API, you don't have complications. Serialization process with these new classes is crisp
and compact
.
For effectiveness of Hadoop, the serialization/de-serialization process should be optimized because huge number of remote calls happen between the nodes in the cluster. So the serialization format should be fast, compact, extensible and interoperable. Due to this reason, Hadoop framework has come up with one IO classes to replace java primitive data types. e.g. IntWritbale
for int
, LongWritable
for long
, Text
for String
etc.
You can find more details about this topic in Hadoop The definitive guide : 4th Edition