Source Code Analysis of HashMap in JDK 1.7

HashMap overview
HashMap in JDK 1.7 is built on an array plus linked lists.
HashMap in JDK 1.8 is built on an array plus linked lists plus red-black trees.

Storage structure of HashMap in JDK 1.7


 static final Entry<?,?>[] EMPTY_TABLE = {};
// HashMap stores its data in an Entry array, which initially points to the shared empty array {}
 transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

// Each Entry wraps a key and a value and links to the next Entry, forming a singly linked list
static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        int hash;

        /**
         * Creates new entry.
         */
        Entry(int h, K k, V v, Entry<K,V> n) {
            value = v;
            next = n;
            key = k;
            hash = h;
        }
.................................
}

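What the source shows about HashMap in JDK 1.7:
It uses the table array to store its entries.
It uses Entry objects, chained as a singly linked list, to wrap each key and value.
So HashMap in JDK 1.7 is built on: array + linked list.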
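To make the array-plus-linked-list layout concrete, here is a minimal sketch. The names `BucketChainDemo`, `Node`, `chainLength` and `sampleTable` are made up for illustration and are not part of the JDK; `Node` is only a simplified stand-in for `Entry`.

```java
// Illustrative only: a tiny bucket array whose slots hold singly linked chains.
class BucketChainDemo {
    // Simplified stand-in for HashMap.Entry (no hash field, no Map.Entry methods).
    static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next; // next node in the same bucket, or null

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Counts the nodes chained in one bucket by following the next pointers.
    static int chainLength(Node<?, ?> head) {
        int n = 0;
        for (Node<?, ?> e = head; e != null; e = e.next) {
            n++;
        }
        return n;
    }

    // Builds a 4-slot table in which two entries share bucket 1.
    @SuppressWarnings("unchecked")
    static Node<String, Integer>[] sampleTable() {
        Node<String, Integer>[] table = (Node<String, Integer>[]) new Node[4];
        table[1] = new Node<>("a", 1, null);
        table[1] = new Node<>("b", 2, table[1]); // the new node becomes the head
        table[3] = new Node<>("c", 3, null);
        return table;
    }

    public static void main(String[] args) {
        Node<String, Integer>[] table = sampleTable();
        for (int i = 0; i < table.length; i++) {
            System.out.println("bucket " + i + " holds " + chainLength(table[i]) + " entries");
        }
    }
}
```

Each array slot is either null or the head of a chain, which is the shape the rest of the analysis relies on.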

Constructors

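// Default initial capacity: 1 << 4 is 2 to the power of 4, i.e. 16
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4;
// Default load factor: 0.75f
static final float DEFAULT_LOAD_FACTOR = 0.75f;
// Maximum capacity: 1 << 30 is 2 to the power of 30, i.e. 1073741824
static final int MAXIMUM_CAPACITY = 1 << 30;

// Resize threshold: once size reaches this value the array is resized. threshold = capacity * load factor
int threshold;
// Load factor actually used by this map
final float loadFactor;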


public HashMap(){
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

/**
*  initialCapacity  the initial capacity
*  loadFactor       the load factor
*/
public HashMap(int initialCapacity, float loadFactor) {
        // Validate the initial capacity: negative values are rejected, values above 1073741824 are capped
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        // Validate the load factor
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        // Store the load factor
        this.loadFactor = loadFactor;
        // threshold temporarily holds the initial capacity; when put() allocates the array it becomes capacity * load factor
        threshold = initialCapacity;
        init();
    }

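Initial capacity: the size the table array is created with, 16 by default.
Resize threshold: although the array is created with 16 slots, it is resized once the number of entries reaches 12. By default, threshold (12) = initial capacity (16) * load factor (0.75).
From the constructors: the default initial capacity of HashMap is 16 and the default load factor is 0.75.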
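The threshold arithmetic can be checked with a short sketch. `ThresholdDemo` and its `threshold` helper are illustrative names; the expression inside mirrors the one `inflateTable()` and `resize()` use in the listings of this article.

```java
// Illustrative only: reproduces the threshold formula for a few capacities.
class ThresholdDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // threshold = capacity * loadFactor, capped at MAXIMUM_CAPACITY + 1
    static int threshold(int capacity, float loadFactor) {
        return (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // default map: 12
        System.out.println(threshold(32, 0.75f)); // after one resize: 24
        System.out.println(threshold(64, 0.5f));  // a lower load factor resizes earlier: 32
    }
}
```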

The put() method


static final Entry<?,?>[] EMPTY_TABLE = {};

transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;

transient int size;  // number of key-value mappings currently stored

public V put(K key, V value) {
        // If the table is still the shared empty array (true on the first put), allocate it
        if (table == EMPTY_TABLE) {
            inflateTable(threshold);  // initialize the array
        }
        // A null key is handled separately
        if (key == null)
            return putForNullKey(value);  // a null key is always stored in bucket table[0]
        // Compute the hash of the key
        int hash = hash(key);
        // Compute the array index: i = hash & (length - 1)
        int i = indexFor(hash, table.length);
        // Walk the chain stored at table[i]
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // Same hash and same key (== checks reference identity, equals() checks logical equality)
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                // The key already exists, so overwrite its value and return the old one
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;  // structural modification counter used by the fail-fast iterators
        // No entry matched (different hash, or same hash but a different key), so add a new entry
        addEntry(hash, key, value, i);
        return null;
    }


 // Initialize the table array
 private void inflateTable(int toSize) {
        int capacity = roundUpToPowerOf2(toSize);  // 16 for the default constructor
        // Compute the threshold: 16 * 0.75 = 12
        threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
        // Allocate the array (capacity 16 by default)
        table = new Entry[capacity];
        initHashSeedAsNeeded(capacity);
  }

  
    // Add an Entry
    void addEntry(int hash, K key, V value, int bucketIndex) { // bucketIndex: array index
        // Resize only if size has reached the threshold and the target bucket is already occupied
        if ((size >= threshold) && (null != table[bucketIndex])) {
            // Double the array length
            resize(2 * table.length);
            // Recompute the index against the new length
            hash = (null != key) ? hash(key) : 0;
            bucketIndex = indexFor(hash, table.length);
        }

        createEntry(hash, key, value, bucketIndex);
    }


    // Create the Entry
    void createEntry(int hash, K key, V value, int bucketIndex) {
        // Current head of the bucket; non-null means an index collision
        Entry<K,V> e = table[bucketIndex];
        // The new Entry becomes the head of the bucket and points to the old head
        table[bucketIndex] = new Entry<>(hash, key, value, e);  // e: the next node
        size++;
    }
    
    // Resize the array
    void resize(int newCapacity) {  // newCapacity = 2 * table.length (32 for a 16-slot table)
        // Keep a reference to the old array
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        // If the array has already reached the maximum capacity, stop resizing
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        // Create the new array
        Entry[] newTable = new Entry[newCapacity];
        // Move the entries of the old array into the new array
        transfer(newTable, initHashSeedAsNeeded(newCapacity));
        table = newTable;
        // Recompute the threshold
        threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
    }

   private static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
  }

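  Flow of put()
1) On the first call, initialize the table array. With the defaults: array size 16, threshold 12.
2) If key == null, store the value in bucket table[0].
3) If key != null, compute the hash of the key and locate the bucket with hash & (length - 1).
4) Walk the chain in that bucket; if an entry has the same hash and an equal key, overwrite its value and return the old value.
5) Otherwise add a new entry at the head of that bucket. Before adding, if size >= threshold (12 by default) and the bucket is not empty, double the array (2 * 16 = 32) and recompute the index.
6) size++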
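The steps above can be sketched as a stripped-down map. `MiniHashMap` and `allocate` are hypothetical names; the sketch assumes a fixed load factor of 0.75 and leaves out `hashSeed`, `modCount`, `MAXIMUM_CAPACITY` and lazy initialization, so it illustrates the flow rather than reproducing the JDK class.

```java
// Illustrative only: a simplified put()/get() in the style of JDK 1.7 HashMap.
class MiniHashMap<K, V> {
    static class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    Entry<K, V>[] table = allocate(16); // step 1, done eagerly here
    int threshold = 12;                 // 16 * 0.75
    int size;

    @SuppressWarnings("unchecked")
    static <K, V> Entry<K, V>[] allocate(int capacity) {
        return (Entry<K, V>[]) new Entry[capacity];
    }

    // A null key hashes to 0, so it always lands in bucket 0 (step 2).
    static int hash(Object key) {
        if (key == null) return 0;
        int h = key.hashCode();
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1); // step 3
    }

    V put(K key, V value) {
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {   // step 4
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        if (size >= threshold && table[i] != null) {               // step 5
            resize(2 * table.length);
            i = indexFor(hash, table.length);
        }
        table[i] = new Entry<>(hash, key, value, table[i]);        // head insertion
        size++;                                                    // step 6
        return null;
    }

    V get(Object key) {
        int hash = hash(key);
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && (e.key == key || (key != null && key.equals(e.key)))) {
                return e.value;
            }
        }
        return null;
    }

    private void resize(int newCapacity) {
        Entry<K, V>[] bigger = allocate(newCapacity);
        for (Entry<K, V> e : table) {
            while (e != null) {
                Entry<K, V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = bigger[i];
                bigger[i] = e;
                e = next;
            }
        }
        table = bigger;
        threshold = (int) (newCapacity * 0.75f);
    }

    public static void main(String[] args) {
        MiniHashMap<String, Integer> map = new MiniHashMap<>();
        map.put("a", 1);
        System.out.println(map.put("a", 2)); // previous value stored for "a"
        System.out.println(map.get("a"));
    }
}
```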

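What the source shows:
With the defaults, the table is created with size 16, load factor 0.75 and size = 0; threshold (12) = array size (16) * load factor (0.75).
When size >= threshold and the target bucket is already occupied, the table is resized to 32 and every entry of the old array is re-indexed into the new array.
  New array size = 2 * table.length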
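Because the initial capacity is rounded up to a power of two and every resize doubles it, the array length stays a power of two. A small sketch of the rounding step follows; `CapacityDemo` is an illustrative wrapper, and the method body is the same expression as `roundUpToPowerOf2` in the listing above.

```java
// Illustrative only: shows how a requested capacity is rounded up.
class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    static int roundUpToPowerOf2(int number) {
        return number >= MAXIMUM_CAPACITY
                ? MAXIMUM_CAPACITY
                : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
    }

    public static void main(String[] args) {
        for (int requested : new int[] {0, 1, 10, 16, 17, 100}) {
            System.out.println(requested + " -> " + roundUpToPowerOf2(requested));
        }
    }
}
```

For example, a requested capacity of 10 becomes 16 and 17 becomes 32, so `new HashMap(17)` would allocate 32 slots on the first put.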

The get() method

    public V get(Object key) {
        // A null key is handled separately
        if (key == null) {
            return getForNullKey(); // look directly in table[0]
        }
        // Otherwise find the Entry that holds this key
        Entry<K,V> entry = getEntry(key);
        // Return its value, or null if there is no such Entry
        return null == entry ? null : entry.getValue();
    }


    final Entry<K,V> getEntry(Object key) {
        if (size == 0) {
            return null;
        }
        // Compute the hash of the key
        int hash = (key == null) ? 0 : hash(key);
        // Compute the index with indexFor(), then walk the chain stored in that bucket
        for (Entry<K,V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            Object k;
            // Match on the same hash and an equal key
            if (e.hash == hash &&((k = e.key) == key || (key != null && key.equals(k))))
                return e;
        }
        return null;
    }
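Seen from the caller's side, the behaviour described above looks like the following sketch. It uses the public `java.util.HashMap` API only; `GetPutDemo` and `sample` are illustrative names.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: observable behaviour of put() and get(), including the null key.
class GetPutDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put(null, 0); // HashMap accepts one null key
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = sample();
        System.out.println(map.get("a"));               // value stored for "a"
        System.out.println(map.get(null));              // value stored for the null key
        System.out.println(map.get("missing"));         // null: no such key
        System.out.println(map.put("a", 2));            // put returns the previous value
        System.out.println(map.containsKey("missing")); // distinguishes "absent" from "mapped to null"
    }
}
```

Since `get()` returns null both for a missing key and for a key mapped to null, `containsKey()` is the way to tell the two cases apart.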

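How hash collisions are reduced
From the source: HashMap resolves hash collisions with linked lists, but lookups in a linked list are slow. HashMap therefore also takes steps to make collisions less likely in the first place, as follows: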
```
    // Compute the hash value
   final int hash(Object k) {
        int h = hashSeed;
        if (0 != h && k instanceof String) {
            return sun.misc.Hashing.stringHash32((String) k);
        }
        h ^= k.hashCode();
        // Mix the high bits into the low bits so that they also influence the index
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Compute the array index   h: the hash returned by hash()   length: current array size
    static int indexFor(int h, int length) {
        return h & (length-1);
    }
```
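To see why the extra shifts help, consider two hash codes that differ only in their high bits. The sketch below is a simplified copy of the mixing steps with `hashSeed` and the String special case left out; `HashSpreadDemo` and `spread` are illustrative names.

```java
// Illustrative only: effect of the JDK 1.7 bit mixing on a 16-slot table.
class HashSpreadDemo {
    // The two mixing lines of hash(), applied to a raw hashCode.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // the low 4 bits of both values are 0
        int b = 0x20000;
        System.out.println("raw index:   " + indexFor(a, 16) + " and " + indexFor(b, 16));
        System.out.println("mixed index: " + indexFor(spread(a), 16) + " and " + indexFor(spread(b), 16));
    }
}
```

Without mixing, both values fall into bucket 0 because only the low four bits take part in the index; after mixing, the high bits influence the low bits and the two values land in different buckets.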
Recap of the & (bitwise AND) operation: 0&0=0, 0&1=0, 1&0=0, 1&1=1
Why does indexFor() use h & (length - 1)? A quick test with length = 16:

| h | h & (length - 1) | h & length |
|---|---|---|
| 103 | 103 & (16 - 1) = 7 | 103 & 16 = 0 |
| 104 | 104 & (16 - 1) = 8 | 104 & 16 = 0 |
| 105 | 105 & (16 - 1) = 9 | 105 & 16 = 0 |
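As the comparison shows, masking with (length - 1) spreads the keys over different buckets, whereas masking with length sends them all to the same bucket, so (length - 1) reduces index collisions.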
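The same comparison can be run over a larger range of hashes. In this sketch, `IndexMaskDemo`, `maskIndex` and `distinctIndexes` are illustrative names; the range 0 to 999 is an arbitrary choice.

```java
// Illustrative only: compares h & (length - 1) with h & length.
class IndexMaskDemo {
    static int maskIndex(int h, int length) {
        return h & (length - 1);
    }

    // Counts how many different indexes the hashes 0..999 produce under each mask.
    static int distinctIndexes(int length, boolean maskWithLengthMinusOne) {
        boolean[] seen = new boolean[length + 1];
        int distinct = 0;
        for (int h = 0; h < 1000; h++) {
            int index = maskWithLengthMinusOne ? (h & (length - 1)) : (h & length);
            if (!seen[index]) {
                seen[index] = true;
                distinct++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        System.out.println("h & (length - 1): " + distinctIndexes(16, true) + " distinct indexes");
        System.out.println("h & length:       " + distinctIndexes(16, false) + " distinct indexes");
    }
}
```

With length = 16, the mask 15 is binary 1111, so every low bit of the hash survives and all 16 buckets are reachable. The mask 16 is binary 10000, so the result can only be 0 or 16. For a power-of-two length and a non-negative h, h & (length - 1) also equals h % length.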
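Why the load factor is 0.75
From the source: array index = hash & (length - 1), so:
The larger the load factor, the better the memory utilization, but the higher the probability of index collisions;
The smaller the load factor, the worse the memory utilization, but the lower the probability of index collisions;

More index collisions mean longer linked lists, and linked lists are slow to search; fewer collisions mean shorter lists and faster lookups. The default of 0.75 is a trade-off between these space and time costs.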
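The trade-off can be illustrated with a rough model. The sketch assumes uniformly random hashes from a seeded `java.util.Random` and a table that is filled up to capacity * loadFactor without resizing; `LoadFactorDemo` and `collisions` are illustrative names, and the numbers it prints describe this model, not a measurement of HashMap itself.

```java
import java.util.Random;

// Illustrative only: counts index collisions when a table is filled to a given load.
class LoadFactorDemo {
    // capacity must be a power of two. Returns how many of the first
    // (capacity * loadFactor) pseudo-random hashes hit an already occupied bucket.
    static int collisions(int capacity, float loadFactor, long seed) {
        int n = (int) (capacity * loadFactor);
        boolean[] occupied = new boolean[capacity];
        Random random = new Random(seed);
        int collisions = 0;
        for (int k = 0; k < n; k++) {
            int index = random.nextInt() & (capacity - 1);
            if (occupied[index]) {
                collisions++;
            } else {
                occupied[index] = true;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        for (float lf : new float[] {0.5f, 0.75f, 1.0f}) {
            System.out.println("load factor " + lf + ": " + collisions(1024, lf, 42L) + " collisions");
        }
    }
}
```

With the same seed, a higher load factor inserts a longer prefix of the same hash sequence, so the collision count can only stay the same or grow.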

The infinite-loop problem during HashMap resizing
Resize source code

 void transfer(Entry[] newTable, boolean rehash) {
        int newCapacity = newTable.length;
        for (Entry<K,V> e : table) {
            while(null != e) {
                Entry<K,V> next = e.next;
                if (rehash) {
                    e.hash = null == e.key ? 0 : hash(e.key);
                }
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            }
        }
    }

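The table[] array is shared state, and transfer() rewrites e.next for every entry while it moves the entries into the new array. If several threads resize the same map at once, their updates can interleave so that two entries end up pointing at each other, and a later traversal of that bucket never terminates.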
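A single-threaded sketch already shows the property that makes the race dangerous: head insertion reverses the order of a chain. `TransferOrderDemo`, `Node`, `transfer` and `describe` are illustrative names, and the sketch assumes all nodes of the chain map to the same new bucket.

```java
// Illustrative only: the loop body of transfer() applied to one chain.
class TransferOrderDemo {
    static class Node {
        final String key;
        Node next;

        Node(String key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Moves every node to a new chain with head insertion, like transfer().
    static Node transfer(Node head) {
        Node newHead = null;
        Node e = head;
        while (e != null) {
            Node next = e.next; // remember the rest of the old chain
            e.next = newHead;   // point the node at the current new head
            newHead = e;        // the node becomes the new head
            e = next;
        }
        return newHead;
    }

    static String describe(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node chain = new Node("A", new Node("B", new Node("C", null)));
        System.out.println(describe(chain));           // A -> B -> C
        System.out.println(describe(transfer(chain))); // C -> B -> A
    }
}
```

If one thread is paused while holding A as `e` and B as `next`, and another thread completes the reversal so that B now points to A, the first thread resumes with stale pointers and can link A and B into a cycle.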

The fix: use ConcurrentHashMap.
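A minimal sketch of concurrent writes with `java.util.concurrent.ConcurrentHashMap` follows. `ConcurrentPutDemo` and `fill` are illustrative names, and the thread and key counts are arbitrary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: several threads insert disjoint keys into one shared map.
class ConcurrentPutDemo {
    // Starts the given number of threads, each inserting perThread distinct keys,
    // waits for all of them and returns the final map size.
    static int fill(int threads, int perThread) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", ex);
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 1000)); // 4000: no insert is lost
    }
}
```

Swapping in a plain HashMap here is not safe: entries can be lost, and on JDK 1.7 a concurrent resize can produce the cycle described above.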

HashMap interview summary

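Is HashMap thread-safe?
No.
What is the default initial capacity of HashMap?
The default initial capacity is 16.
By how much does HashMap grow on each resize?
It doubles: every resize makes the array twice as large as before.
When does HashMap resize, and what is the threshold?
When size >= threshold (and, in JDK 1.7, the target bucket is already occupied), the array is doubled and the threshold is recomputed.
Threshold = current array capacity * load factor (0.75 by default)
How does HashMap store a null key?
If key == null, the entry is stored in bucket table[0].
What is the difference between a hash collision and an index collision in HashMap?
Hash collision: two keys have the same hash value.
Index collision: two keys get the same array index from index = hash & (length - 1), even if their hash values differ.
How does HashMap resolve hash collisions and reduce index collisions?
Resolving hash and index collisions: entries in the same bucket are chained in a singly linked list.
Reducing index collisions: use hash & (length - 1). Because length is a power of two, length - 1 (for example 15, binary 1111) has all of its low bits set, so the '&' keeps every low bit of the hash and fewer keys end up with the same index.
What problems can HashMap resizing cause?
With multiple threads, resizing can end in an infinite loop.
Why is the HashMap load factor 0.75?
The larger the load factor, the better the memory utilization, but the higher the probability of index collisions;
The smaller the load factor, the worse the memory utilization, but the lower the probability of index collisions;
What happens if a chain in HashMap grows too long?
The more hash and index collisions there are, the longer the chain and the slower the lookup; the time complexity is O(n) in the length of the chain.
How efficient are HashMap lookups?
Without collisions, the entry is located directly by array index, which is fast.
With collisions, the chain has to be searched, which is slower.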

Source: CSDN

Author: xiaobo5264063

Link: https://blog.csdn.net/xiaobo5264063/article/details/104524787

