I know the hashing principle behind HashMap in Java, so I wanted to know how hashing works in Hive when data is bucketed into a number of buckets.
Bucketing is used along with partitioning to decompose the data further for later analysis. Because more partitions result in more HDFS files, which can affect NameNode performance, we resort to bucketing. The way bucketing works is: the bucket a row goes to is determined by hashFunction(bucketingColumn) mod numOfBuckets. numOfBuckets is chosen when you create the table (the CLUSTERED BY ... INTO n BUCKETS clause). The output of the hash function depends on the type of the bucketing column. To have Hive set the number of reducers to match the number of buckets and land each row in the right bucket on insert, we set "hive.enforce.bucketing = true". Please refer to this for more information.
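Since you already know HashMap, here is a rough Java sketch of the "hash mod numOfBuckets" rule. It is only an illustration, not Hive's actual source: the class BucketSketch and the method bucketFor are hypothetical names, and it assumes the classic behaviour where an int column hashes to its own value and the sign bit is masked off before taking the modulus. Hash functions for other column types, and newer Hive versions, may differ.

```java
public class BucketSketch {

    // Hypothetical helper: maps a hash code to a bucket index in [0, numBuckets).
    // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
    // hash code can never produce a negative bucket index.
    static int bucketFor(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4; // assumed value, as in "INTO 4 BUCKETS"

        // Assumption: an int column hashes to the value itself.
        for (int userId : new int[] {10, 11, 12, 13, -7}) {
            System.out.println(userId + " -> bucket "
                    + bucketFor(Integer.hashCode(userId), numBuckets));
        }
    }
}
```

So, much like HashMap picks a slot from hashCode(), rows with the same value in the bucketing column always end up in the same bucket.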