Why does Hadoop need classes like Text or IntWritable instead of String or Integer?

名媛妹妹 2020-12-23 16:38

Why does Hadoop need to introduce these new classes? They just seem to complicate the interface.

4 Answers
  •  刺人心 (OP)
    2020-12-23 17:41

    Some more good info:

    They have two relevant features.

    First, they implement the “Writable” interface: they know how to write themselves explicitly to a DataOutput stream and read themselves back from a DataInput stream.
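    A minimal sketch of that first point, using only the JDK. SimpleWritable is a local stand-in declaring the same two methods as Hadoop’s org.apache.hadoop.io.Writable (write(DataOutput) and readFields(DataInput)), and IntBox is a hypothetical IntWritable look-alike; neither is Hadoop’s actual class. Writing the value puts exactly its 4 bytes on the stream, and reading refills an existing instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTrip {
    // Local stand-in declaring the same two methods as org.apache.hadoop.io.Writable,
    // so this sketch compiles without Hadoop on the classpath.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical IntWritable look-alike: a mutable holder for one int.
    static class IntBox implements SimpleWritable {
        private int value;

        void set(int value) { this.value = value; }
        int get() { return value; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(value); // just the 4 bytes of the value, no class metadata
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            value = in.readInt(); // refill this existing instance in place
        }
    }

    // Serializes any SimpleWritable into a byte array.
    static byte[] toBytes(SimpleWritable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Deserializes bytes into an already existing instance.
    static void fromBytes(byte[] bytes, SimpleWritable w) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        IntBox original = new IntBox();
        original.set(42);
        byte[] bytes = toBytes(original);

        IntBox copy = new IntBox();
        fromBytes(bytes, copy);
        System.out.println(bytes.length + " bytes, value " + copy.get());
    }
}
```

    Running main prints "4 bytes, value 42". In a real Hadoop job you would implement org.apache.hadoop.io.Writable directly rather than this stand-in.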

    Second, their contents are updated via the set() operation. This lets you reuse the same instance repeatedly instead of creating new ones, which is much more efficient when the same mapper or reducer is called over and over: you create your Writable instances once, in the constructor, and reuse them.
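    A sketch of the reuse pattern under the same assumptions: JDK only, with hypothetical stand-in classes (TextBox uses writeUTF for brevity, which is not the encoding Hadoop’s real Text uses). The mapper-like class creates its output objects once and refills them with set() for each word, which mirrors the WordCount example in the Hadoop MapReduce tutorial, where a single Text and a single IntWritable(1) are reused for every context.write() call. Reuse is safe here because each pair is serialized as soon as it is emitted.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Same two methods as org.apache.hadoop.io.Writable (local stand-in).
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical Text look-alike; writeUTF is used only for brevity.
    static class TextBox implements SimpleWritable {
        private String value = "";
        void set(String value) { this.value = value; }
        String get() { return value; }
        public void write(DataOutput out) throws IOException { out.writeUTF(value); }
        public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    }

    // Hypothetical IntWritable look-alike.
    static class IntBox implements SimpleWritable {
        private int value;
        void set(int value) { this.value = value; }
        int get() { return value; }
        public void write(DataOutput out) throws IOException { out.writeInt(value); }
        public void readFields(DataInput in) throws IOException { value = in.readInt(); }
    }

    // Mapper-like class: output objects are fields created once and refilled with set().
    static class WordCountMapper {
        private final TextBox word = new TextBox();
        private final IntBox one = new IntBox();

        WordCountMapper() { one.set(1); }

        void map(String line, DataOutput out) throws IOException {
            for (String token : line.split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);  // reuse the same instance for every word
                word.write(out);  // serialized right away, so the next set() is safe
                one.write(out);
            }
        }
    }

    // Runs the mapper over the lines, then reads the pairs back with reused objects.
    static List<String> run(String... lines) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        WordCountMapper mapper = new WordCountMapper();
        for (String line : lines) {
            mapper.map(line, out);
        }
        out.flush();

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        TextBox key = new TextBox();   // one instance, refilled by readFields()
        IntBox value = new IntBox();
        List<String> pairs = new ArrayList<>();
        while (in.available() > 0) {
            key.readFields(in);
            value.readFields(in);
            pairs.add(key.get() + "=" + value.get());
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run("to be or", "not to be"));
    }
}
```

    Calling run("to be or", "not to be") returns [to=1, be=1, or=1, not=1, to=1, be=1], even though the mapper and the reading loop each allocate only one key and one value object.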

    In comparison, Java’s Serializable framework “magically” serializes objects, but in a way that is somewhat brittle: reading values written by an older version of a class often fails. The Java object stream is also designed to send back a whole graph of objects, so it has to remember every object reference it has already written and do the same on the way back. Writables are designed to be self-contained.
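    To make the size and bookkeeping difference concrete, a small JDK-only comparison (the class and method names are hypothetical). Writing an int the Writable way costs 4 bytes, while ObjectOutputStream emits class descriptors on the first writeObject() and, because it tracks every object it has already written, emits only a short back-reference when the same Integer is written again.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializationSizes {
    // Bytes used when an int is written the Writable way: just the value.
    static int writableStyleSize(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        }
        return buffer.size();
    }

    // Bytes added by the first and the second writeObject() of the same Integer.
    // Returns {firstWrite, secondWrite}.
    static int[] javaSerializationSizes(Integer value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.flush();
            int afterHeader = buffer.size();  // stream magic + version already written
            out.writeObject(value);           // class descriptors + the int value
            out.flush();
            int afterFirst = buffer.size();
            out.writeObject(value);           // only a back-reference to the first write
            out.flush();
            int afterSecond = buffer.size();
            return new int[] {afterFirst - afterHeader, afterSecond - afterFirst};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] java = javaSerializationSizes(42);
        System.out.println("Writable-style: " + writableStyleSize(42) + " bytes");
        System.out.println("Java serialization, first write: " + java[0] + " bytes");
        System.out.println("Java serialization, repeated write: " + java[1] + " bytes");
    }
}
```

    The exact Java serialization byte counts depend on the class descriptors, so the sketch prints them rather than hard-coding them.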

    This is from: http://hortonworks.com/community/forums/topic/why-hadoop-uses-default-longwritable-or-intwritable/
