Why does Hadoop need classes like Text or IntWritable instead of String or Integer?

名媛妹妹 2020-12-23 16:38

Why does Hadoop need to introduce these new classes? They just seem to complicate the interface.

  •  旧时难觅i
    2020-12-23 17:37

    From the Apache documentation page:

    The Writable interface is described as:

    A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.

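    This API does not add complications. Each class implements just two methods, write(DataOutput) and readFields(DataInput), which write its fields as raw bytes and read them back, so serialization with these classes is crisp and compact.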
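    A minimal sketch of that protocol, assuming a plain JDK with no Hadoop on the classpath: the nested `Writable` interface below is a local stand-in that mirrors the two methods of `org.apache.hadoop.io.Writable`, and `IntPair`, `toBytes` and `fromBytes` are hypothetical names used only for illustration. A value type just writes its fields to a `DataOutput` and reads them back in the same order from a `DataInput`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {

    // Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable,
    // so this sketch compiles without Hadoop on the classpath.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical value type: a pair of ints serialized as 8 raw bytes.
    static final class IntPair implements Writable {
        int first;
        int second;

        // No-arg constructor: Hadoop creates Writables reflectively, then calls readFields.
        IntPair() {}

        IntPair(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(first);   // 4 bytes, big-endian, no type metadata
            out.writeInt(second);  // 4 bytes, big-endian
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first = in.readInt();  // read back in the same order they were written
            second = in.readInt();
        }
    }

    // Serializes any Writable into a byte array.
    static byte[] toBytes(Writable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Refills an existing (possibly reused) Writable from a byte array.
    static void fromBytes(byte[] data, Writable w) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = toBytes(new IntPair(3, 7));
        IntPair copy = new IntPair();
        fromBytes(data, copy);
        System.out.println(data.length + " bytes -> (" + copy.first + ", " + copy.second + ")");
    }
}
```

    Running `main` prints `8 bytes -> (3, 7)`. Because `readFields` fills an existing object rather than constructing a new one, the same instance can be reused across many records instead of allocating one per record.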

    For Hadoop to be effective, the serialization/deserialization process must be optimized, because a huge number of remote calls happen between the nodes in the cluster. The serialization format should therefore be fast, compact, extensible, and interoperable. For this reason, the Hadoop framework provides its own IO classes to replace Java's primitive and built-in types, e.g. IntWritable for int, LongWritable for long, Text for String, etc.
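    To make the "compact" point concrete, here is a small comparison, assuming a plain JDK with no Hadoop on the classpath (the class and method names are hypothetical). It measures an Integer written with standard Java serialization against the same value written with `DataOutput.writeInt`, which is the single call `IntWritable.write` makes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class SizeComparison {

    // Bytes produced by standard Java serialization of a boxed Integer:
    // stream header, class descriptor, field metadata, then the value.
    static int javaSerializedSize(Integer value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the ObjectOutputStream flushed everything into 'bytes'.
        return bytes.size();
    }

    // Bytes produced by writing the raw int through DataOutput,
    // i.e. what an IntWritable-style write(DataOutput) emits.
    static int dataOutputSize(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int value = 42;
        System.out.println("Java serialization:  " + javaSerializedSize(value) + " bytes");
        System.out.println("DataOutput.writeInt: " + dataOutputSize(value) + " bytes");
    }
}
```

    The raw form is always 4 bytes, while Java serialization also carries type metadata (class name, serialVersionUID, field descriptors). Writables can omit that metadata because both the writer and the reader already know the type, which is a large saving when millions of small keys and values cross the network.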

    You can find more details about this topic in Hadoop: The Definitive Guide, 4th Edition.
