Question
I am running the sort example on Hadoop using the RandomWriter
function. By default, this function writes 10 GB of random data per host to DFS using Map/Reduce:
bin/hadoop jar hadoop-*-examples.jar randomwriter <out-dir>
Can anyone please tell me how to change the 10 GB of data that RandomWriter
generates?
Answer 1:
This example has some configurable parameters, which can be supplied to the jar in a config file. To run it with a config file, use
bin/hadoop jar hadoop-*-examples.jar randomwriter <out-dir> [<configuration file>]
or pass the parameters directly on the command line:
bin/hadoop jar hadoop-*-examples.jar randomwriter
-Dtest.randomwrite.bytes_per_map=<value>
-Dtest.randomwriter.maps_per_host=<value> <out-dir> [<configuration file>]
For details about all configurable parameters, see https://wiki.apache.org/hadoop/RandomWriter
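The optional [<configuration file>] argument above is a standard Hadoop configuration XML file. A hypothetical example that sets 1 GB per map and 2 maps per host, using the property names from this answer:

```xml
<?xml version="1.0"?>
<!-- Hypothetical config file for the [<configuration file>] argument.
     Values here are illustrative: 1 GB per map, 2 maps per host. -->
<configuration>
  <property>
    <name>test.randomwrite.bytes_per_map</name>
    <value>1073741824</value>
  </property>
  <property>
    <name>test.randomwriter.maps_per_host</name>
    <value>2</value>
  </property>
</configuration>
```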
Answer 2:
On Hadoop 2 (at least in version 2.7.2), the properties are now mapreduce.randomwriter.mapsperhost
and mapreduce.randomwriter.bytespermap.
You can see them in the source at http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-examples/src/main/java/org/apache/hadoop/examples/RandomWriter.java?view=markup
So the correct invocation on recent Hadoop 2 versions is
bin/hadoop jar hadoop-*-examples.jar randomwriter
-Dmapreduce.randomwriter.bytespermap=<value>
-Dmapreduce.randomwriter.mapsperhost=<value> <out-dir> [<configuration file>]
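The total output is just arithmetic: per-host output equals bytes per map times maps per host. A quick sketch of the calculation, assuming the commonly documented defaults of 1 GB per map and 10 maps per host (which matches the 10 GB per host figure in the question):

```python
# Sketch of how RandomWriter's per-host output size is determined.
# Defaults below are assumptions matching the 10 GB default in the question.
bytes_per_map = 1 * 1024**3   # -Dmapreduce.randomwriter.bytespermap (assumed default: 1 GB)
maps_per_host = 10            # -Dmapreduce.randomwriter.mapsperhost (assumed default: 10)

per_host_bytes = bytes_per_map * maps_per_host
print(per_host_bytes)  # 10737418240 bytes = 10 GB per host
```

So to write, say, 2 GB per host instead, you could set bytespermap to 536870912 (512 MB) and mapsperhost to 4.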
Source: https://stackoverflow.com/questions/22053594/change-the-size-of-random-data-generation-on-hadoop