问题
New user of hadoop and mapreduce, i would like to create a mapreduce job to do some measure on images. this why i would like to know if i can passe an image as input to mapreduce?if yes? any kind of example
thanks
回答1:
No.. you cannot pass an image directly to a MapReduce job as it uses specific types of datatypes optimized for network serialization. I am not an image processing expert but I would recommend to have a look at HIPI framework. It allows image processing on top of MapReduce framework in a convenient manner.
Or if you really want to do it the native Hadoop way, you could do this by first converting the image file into a Hadoop Sequence file and then using the SequenceFileInputFormat to process the file.
回答2:
Yes, you can totally do this.
With the limited information provided, I can only give you a very general answer.
Either way, you'll need to: 1) You will need to write a custom InputFormat that instead of taking chunks of files in HDFS locations (like TextInputFormat and SequenceFileInputFormat do), it actually passes to each map task the Image's HDFS path name. Reading the image from that won't be too hard.
If you plan to have a Reduce phase in which Images are passed around through the framework, you'll need to: 2) You will need to make an "ImageWritable" class that implements Writable (or WritableComparable if you're keying on the image). In your write() method, you'll need to serialize your image to a byte array. When you do this, what I would do is first write to the output an int/long which is the size of the array you're going to write. Lastly, you'll want to write the array as bytes.
In your read() method, you'll read an int/long first (which will describe the payload of the image), create an byte array of this size, and then read the bytes fully into your byte array up to the length of your int/long that you captured.
I'm not entirely sure what you're doing, but that's how I'd go about it.
来源:https://stackoverflow.com/questions/16368512/create-mapreduce-job-with-an-image-as-an-input