Re-parsing Blob data stored in HDFS imported from Oracle by Sqoop

Question


Using Sqoop, I've successfully imported a few rows from a table that has a BLOB column. Now the part-m-00000 file contains all the records, with the BLOB field serialized into the CSV.

Questions:

1) As per the docs, knowledge of the Sqoop-specific format can help in reading those blob records. So what does the "Sqoop-specific format" mean?

2) Basically, the blob is a .gz compression of a text file containing some float data. These .gz files are stored in the Oracle DB as BLOBs and imported into HDFS using Sqoop. So how can I get that float data back from the HDFS file? Any sample code would be of great use.


Answer 1:


I see these options:

  1. Sqoop import from Oracle directly into a Hive table with a binary data type. This option may limit your processing options outside Hive (MR, Pig, etc.), i.e. you may need to know how the blob gets stored in Hive as binary, which is the same limitation you described in your question 1. A sketch of such an import command follows this list.

  2. Sqoop import from Oracle into Avro, SequenceFile, or ORC file formats, which can hold binary data. You should then be able to read it by creating a Hive external table on top of the files, and you can write a Hive UDF to decompress the binary data. This option is more flexible, since the data can also be processed easily with MR, especially with the Avro and SequenceFile formats. A sketch of reading the Avro output back in Java appears after this list.
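
For option 1, a minimal sketch of the import command, assuming a hypothetical Oracle table MY_TABLE with a BLOB column DATA_BLOB (host, credentials, and names are placeholders). The --map-column-hive override forces the Hive column to binary instead of Sqoop's default string mapping:

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username SCOTT \
      --password-file /user/etl/.oracle_pw \
      --table MY_TABLE \
      --hive-import \
      --hive-table my_table \
      --map-column-hive DATA_BLOB=binary \
      -m 1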
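
For option 2, a minimal Java sketch of getting the floats back, assuming the table was imported with --as-avrodatafile and the resulting part file has been copied to the local filesystem. The column name DATA_BLOB and the one-float-per-line layout inside the gzipped text are assumptions:

    import java.io.BufferedReader;
    import java.io.ByteArrayInputStream;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.ByteBuffer;
    import java.util.zip.GZIPInputStream;

    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class BlobFloatReader {
        public static void main(String[] args) throws IOException {
            // Avro part file produced by the Sqoop import, copied to the
            // local FS, e.g.: hdfs dfs -get /user/etl/MY_TABLE/part-m-00000.avro
            File avroFile = new File(args[0]);

            try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(avroFile, new GenericDatumReader<GenericRecord>())) {
                for (GenericRecord record : reader) {
                    // Avro "bytes" fields come back as a ByteBuffer
                    ByteBuffer blob = (ByteBuffer) record.get("DATA_BLOB"); // hypothetical column
                    byte[] gz = new byte[blob.remaining()];
                    blob.get(gz);

                    // Each blob is a gzipped text file: unzip it and
                    // parse one float per line
                    try (BufferedReader text = new BufferedReader(new InputStreamReader(
                             new GZIPInputStream(new ByteArrayInputStream(gz))))) {
                        String line;
                        while ((line = text.readLine()) != null) {
                            float value = Float.parseFloat(line.trim());
                            System.out.println(value);
                        }
                    }
                }
            }
        }
    }

The same GZIPInputStream logic could be packaged as the Hive UDF mentioned above instead of a standalone program.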

Hope this helps. How did you resolve it?



Source: https://stackoverflow.com/questions/33214368/re-parsing-blob-data-stored-in-hdfs-imported-from-oracle-by-sqoop
