Difference between mapreduce split and spark paritition

做~自己de王妃 提交于 2019-12-23 08:02:04

问题


I wanted to ask is there any significant difference in data partitioning when working with Hadoop/MapReduce and Spark? They both work on HDFS(TextInputFormat) so it should be same in theory.

Are there any cases where the there procedure of data partitioning can differ? Any insights would be very helpful to my study.

Thanks


回答1:


Is any significant difference in data partitioning when working with Hadoop/mapreduce and Spark?

Spark supports all hadoop I/O formats as it uses same Hadoop InputFormat APIs along with it's own formatters. So, Spark input partitions works same way as Hadoop/MapReduce input splits by default. Data size in a partition can be configurable at run time and It provides transformation like repartition, coalesce, and repartitionAndSortWithinPartition can give you direct control over the number of partitions being computed.

Are there any cases where their procedure of data partitioning can differ?

Apart from Hadoop, I/O APIs Spark does have some other intelligent I/O Formats(Ex: Databricks CSV and NoSQL DB Connectors) which will directly return DataSet/DateFrame(more high-level things on top of RDD) which are spark specific.

Key points on spark partitions when reading data from Non-Hadoop sources

  • The maximum size of a partition is ultimately by the connectors,
    • for S3, the property is like fs.s3n.block.size or fs.s3.block.size.
    • Cassandra property is spark.cassandra.input.split.size_in_mb.
    • Mongo prop is, spark.mongodb.input.partitionerOptions.partitionSizeMB.
  • By default the number of partitions is the max(sc.defaultParallelism, total_data_size / data_block_size). some times number of available cores in the cluster also imflunce the number of partitions like sc.parallelize() without partitions param.

Read more.. link1



来源:https://stackoverflow.com/questions/39651842/difference-between-mapreduce-split-and-spark-paritition

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!