Handling Writables fully qualified name changes in Hadoop SequenceFile

Submitted by 六眼飞鱼酱① on 2019-12-05 07:53:10

The org.apache.hadoop.io.WritableName class mentioned in the exception stack trace has some useful methods.

From the doc:

Utility to permit renaming of Writable implementation classes without invalidating files that contain their class name.

// Add an alternate name for a class.
public static void addName(Class writableClass, String name)

In your case you could call this before reading from your SequenceFiles:

WritableName.addName(com.vertebrates.fishes.FishWritable.class, "com.mammals.fishes.FishWritable");

This way, when attempting to read a com.mammals.fishes.FishWritable from an old SequenceFile, the new com.vertebrates.fishes.FishWritable class will be used.
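To make the mechanism concrete, here is a minimal stdlib-only sketch of an alias table of this kind. It is a model of the idea, not Hadoop's actual implementation: `ClassAliases`, `resolve`, and `RenamedFish` are hypothetical names invented for illustration, and the assumption is simply that a reader consults registered aliases before falling back to normal class loading.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stdlib-only model of a class-name alias table. NOT Hadoop's code;
// all names here are hypothetical and exist only for illustration.
final class ClassAliases {
    private static final Map<String, Class<?>> ALIASES = new ConcurrentHashMap<>();

    private ClassAliases() {}

    // Register an alternate (for example, pre-rename) name for a class.
    static void addName(Class<?> current, String alternateName) {
        ALIASES.put(alternateName, current);
    }

    // Resolve a class name as recorded in a file header:
    // registered aliases win, otherwise fall back to normal class loading.
    static Class<?> resolve(String recordedName) {
        Class<?> aliased = ALIASES.get(recordedName);
        if (aliased != null) {
            return aliased;
        }
        try {
            return Class.forName(recordedName);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown class: " + recordedName, e);
        }
    }
}

// Hypothetical stand-in for the class under its new name.
final class RenamedFish {}

final class ClassAliasesDemo {
    public static void main(String[] args) {
        String oldName = "com.mammals.fishes.FishWritable";
        ClassAliases.addName(RenamedFish.class, oldName);
        // The old name now maps to the renamed class.
        System.out.println(ClassAliases.resolve(oldName) == RenamedFish.class);
        // Names without an alias still load the usual way.
        System.out.println(ClassAliases.resolve("java.lang.String") == String.class);
    }
}
```

In this model the alias has to be registered before the first lookup, which mirrors the advice above to call addName before opening any old SequenceFile.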

PS: Why was the fish in the mammals package in the first place? ;)

Looking at the spec for SequenceFile, it seems clear that the file format itself makes no provision for alternative class names.

If I weren't in a position to rewrite the data, one more option is to have com.mammals.fishes.FishWritable extend com.vertebrates.fishes.FishWritable and annotate it as deprecated so that nobody accidentally adds code to the empty wrapper. After a long enough time, the data written with the old class will be obsolete, and you'll be able to safely delete the mammals class.
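The deprecated-wrapper idea can be sketched without Hadoop as follows. This is an illustration under assumptions, not the asker's code: `NewFishWritable` and `OldFishWritable` are hypothetical stand-ins for com.vertebrates.fishes.FishWritable and com.mammals.fishes.FishWritable, and `instantiate` only mimics a reader that creates whatever class name the file header records.

```java
// Hypothetical stand-in for the class under its new name.
class NewFishWritable {
    String species = "unknown";
}

/** Empty shim kept only so that files recording the old class name still load. */
@Deprecated
class OldFishWritable extends NewFishWritable {
}

@SuppressWarnings("deprecation")
final class LegacyShimDemo {
    // Mimics a reader that reflectively instantiates the class named in the file.
    static NewFishWritable instantiate(String recordedName) {
        try {
            Class<?> cls = Class.forName(recordedName);
            return (NewFishWritable) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + recordedName, e);
        }
    }

    public static void main(String[] args) {
        // An old file would record the old class name.
        NewFishWritable fish = instantiate(OldFishWritable.class.getName());
        // The object is usable wherever the new type is expected ...
        System.out.println(fish instanceof NewFishWritable);
        // ... but its runtime class is still the deprecated shim.
        System.out.println(fish.getClass() == OldFishWritable.class);
    }
}
```

One consequence of this design is that objects read from old files keep the shim as their runtime class, so any code comparing exact classes rather than using instanceof would need attention.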
