浅析hashCode

匆匆过客 提交于 2020-08-19 16:43:55

写此文的起因是看到一个面试题:默认hashCode()返回的值是什么?GC后hashCode会改变吗?

在以前的认知里,默认hashCode()的返回值是java对象的内存地址没错。然而这又是一个经不起推敲的结论。

首先把一个对象作为key存入hashmap,如果hashCode返回内存地址,那么GC后地址必然改变,那么用这个对象就取不到值了,这显然是不合理的。

不知为何这样的谬误会存在脑子里这么久,从来没有怀疑过它。

既然想到这里,就再简单剖析一下hashCode原理吧。

1、预备知识

1.1 safepoint

GC相关。VM线程触发GC时,需要先stop-the-world,说人话就是设置一个safepoint。工作线程会时不时检查safepoint状态,一旦发现safepoint被设置就将自己挂起。VM线程发现工作线程都挂起以后就执行GC,完事后再唤醒工作线程。

用现实情况举例就是:leader在群里发了一条"一会儿开会"的消息(safepoint),每个成员时不时查看消息,看到消息第一时间放下手中工作前往会议室(线程挂起),然而有个人沉浸编码无法自拔,十分钟后才看到信息,结果导致会议延迟(GC期间程序暂停时间过长)

至于底层具体如何实现,呃,我没看懂,感兴趣的同学自行了解

https://www.jianshu.com/p/c79c5e02ebe6

1.2 Mark Word

就是Java对象头里面的Mark Word

转载,下面给出原文地址
|-------------------------------------------------------|--------------------|
|                  Mark Word (32 bits)                  |       State        |
|-------------------------------------------------------|--------------------|
| identity_hashcode:25 | age:4 | biased_lock:1 | lock:2 |       Normal       |
|-------------------------------------------------------|--------------------|
|  thread:23 | epoch:2 | age:4 | biased_lock:1 | lock:2 |       Biased       |
|-------------------------------------------------------|--------------------|
|               ptr_to_lock_record:30          | lock:2 | Lightweight Locked |
|-------------------------------------------------------|--------------------|
|               ptr_to_heavyweight_monitor:30  | lock:2 | Heavyweight Locked |
|-------------------------------------------------------|--------------------|
|                                              | lock:2 |    Marked for GC   |
|-------------------------------------------------------|--------------------|

需要了解的几个点:

  • Mark Word存储的数据会随着状态(右列)的变化而改变
  • identity_hashcode是延迟加载的
  • 当对象被锁定时(非Normal状态),identity_hashcode会移动到管程Monitor

https://www.jianshu.com/p/3d38cba67f8b

2、源码分析

2.1 FastHashCode

从hashCode源码一路往下找,很容易就找到了FastHashCode

openjdk11\src\hotspot\share\runtime\synchronizer.cpp

intptr_t ObjectSynchronizer::FastHashCode(Thread * Self, oop obj) {
  if (UseBiasedLocking) {
    // == safepoint相关,英文不好就不翻译了 ==
    // NOTE: many places throughout the JVM do not expect a safepoint
    // to be taken here, in particular most operations on perm gen
    // objects. However, we only ever bias Java instances and all of
    // the call sites of identity_hash that might revoke biases have
    // been checked to make sure they can handle a safepoint. The
    // added check of the bias pattern is to avoid useless calls to
    // thread-local storage.
    if (obj->mark()->has_bias_pattern()) {
      // Handle for oop obj in case of STW safepoint
      Handle hobj(Self, obj);
      // Relaxing assertion for bug 6320749.
      assert(Universe::verify_in_progress() ||
             !SafepointSynchronize::is_at_safepoint(),
             "biases should not be seen by VM thread here");
      BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
      obj = hobj();
      assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
    }
  }

  ObjectMonitor* monitor = NULL;
  markOop temp, test;
  intptr_t hash;
  markOop mark = ReadStableMark(obj); // == 读对象头MarkWord ==

  if (mark->is_neutral()) {           // == 判断是否normal状态 ==
    hash = mark->hash();              // == 从mark里取hash ==
    if (hash) {                       // == 已经存在,直接返回 ==
      return hash;
    }
    hash = get_next_hash(Self, obj);  // == 真正生成hash的方法 ==
    temp = mark->copy_set_hash(hash); // == 复制mark串并将新的hash写入 ==
    // use (machine word version) atomic operation to install the hash
    // 下面这段是CAS常规操作,Java源码里也随处可见,一定要熟悉
    // 结合ConcurrentHashMap可以给某些业务加分段锁,非常好用
    test = obj->cas_set_mark(temp, mark); // == 通过CAS将新Mark设置给obj,返回设置前的值 ==
    if (test == mark) { // == 如果设置前的值和刚开始读到的值相等,表明期间没被其他线程修改过 ==
      return hash;
    }
    // If atomic operation failed, we must inflate the header
    // into heavy weight monitor. We could add more code here
    // for fast path, but it does not worth the complexity.
  } else if (mark->has_monitor()) { // == 判断加锁(重量级锁) ==
    // == 前文提到加锁的对象`identity_hashcode`会移动到管程`Monitor`中 ==
    monitor = mark->monitor();
    // == 从monitor中获取hashcode ==
    temp = monitor->header();
    assert(temp->is_neutral(), "invariant");
    hash = temp->hash();
    if (hash) {
      return hash;
    }
    // Skip to the following code to reduce code size
  } else if (Self->is_lock_owned((address)mark->locker())) { // == 判断是否锁重入 ==
    temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
    assert(temp->is_neutral(), "invariant");
    hash = temp->hash();              // by current thread, check if the displaced
    if (hash) {                       // header contains hash code
      return hash;
    }
    // WARNING:
    //   The displaced header is strictly immutable.
    // It can NOT be changed in ANY cases. So we have
    // to inflate the header into heavyweight monitor
    // even the current thread owns the lock. The reason
    // is the BasicLock (stack slot) will be asynchronously
    // read by other threads during the inflate() function.
    // Any change to stack may not propagate to other threads
    // correctly.
  }
  
  // == 无锁或轻量级锁无法获得hash,则升级为重量级锁 ==
  // == 基本逻辑和上面相似,不再赘述 ==
  // Inflate the monitor to set hash code
  monitor = ObjectSynchronizer::inflate(Self, obj, inflate_cause_hash_code);
  // Load displaced header and check it has hash code
  mark = monitor->header();
  assert(mark->is_neutral(), "invariant");
  hash = mark->hash();
  if (hash == 0) {
    hash = get_next_hash(Self, obj);
    temp = mark->copy_set_hash(hash); // merge hash code into header
    assert(temp->is_neutral(), "invariant");
    test = Atomic::cmpxchg(temp, monitor->header_addr(), mark);
    if (test != mark) {
      // The only update to the header in the monitor (outside GC)
      // is install the hash code. If someone add new usage of
      // displaced header, please update this code
      hash = test->hash();
      assert(test->is_neutral(), "invariant");
      assert(hash != 0, "Trivial unexpected object/monitor header usage.");
    }
  }
  // We finally get the hash
  return hash;
}

2.2 get_next_hash

核心逻辑。根据全局hashCode的值决定使用哪种算法。可以通过配置-XX:hashCode=[0-5]修改。

openjdk11\src\hotspot\share\runtime\synchronizer.cpp

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0;
  // == hashCode是一个全局变量,指明要使用的hashCode算法 ==
  if (hashCode == 0) {
    // This form uses global Park-Miller RNG.
    // On MP system we'll have lots of RW access to a global, so the
    // mechanism induces lots of coherency traffic.
    value = os::random(); // == 0 使用系统随机数 ==
  } else if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations.  This can be useful in some of the 1-0
    // synchronization schemes.
    // == 1 在内存地址基础上进行二次运算 ==
    intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom;
  } else if (hashCode == 2) {
    // == 2 测试用途 ==
    value = 1;            // for sensitivity testing
  } else if (hashCode == 3) {
    value = ++GVars.hcSequence;  // == 3 自增序列 ==
  } else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj); // == 4 使用内存地址 ==
  } else {
    // == 5 某个"名字说了你也记不住"的算法,默认使用这个算法 ==
    // Marsaglia's xor-shift scheme with thread-specific state
    // This is probably the best overall implementation -- we'll
    // likely make this the default in future releases.
    unsigned t = Self->_hashStateX;
    t ^= (t << 11);
    Self->_hashStateX = Self->_hashStateY;
    Self->_hashStateY = Self->_hashStateZ;
    Self->_hashStateZ = Self->_hashStateW;
    unsigned v = Self->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    Self->_hashStateW = v;
    value = v;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD;
  assert(value != markOopDesc::no_hash, "invariant");
  TEVENT(hashCode: GENERATE);
  return value;
}

2.3 MarkWord

openjdk11\src\hotspot\share\oops\markOop.cpp

// 通过源码可以简单看出32位与64位的区别
// 本次分析以32位为例
class markOopDesc: public oopDesc {
 private:
  // Conversion
  uintptr_t value() const { return (uintptr_t) this; }

 public:
  // 字段大小
  // Constants
  enum { age_bits                 = 4, // age
         lock_bits                = 2, // lock
         biased_lock_bits         = 1, // biased_lock
         max_hash_bits            = BitsPerWord - age_bits - lock_bits - biased_lock_bits, // 32-4-2-1=25
         hash_bits                = max_hash_bits > 31 ? 31 : max_hash_bits, // identity_hashcode:25
         cms_bits                 = LP64_ONLY(1) NOT_LP64(0), // 64位包含1位unused,32位没有
         epoch_bits               = 2 // Biased状态下的epoch
  };

  // 字段所在位置偏移
  // The biased locking code currently requires that the age bits be
  // contiguous to the lock bits.
  enum { lock_shift               = 0,
         biased_lock_shift        = lock_bits, // << 2
         age_shift                = lock_bits + biased_lock_bits, // << 3
         cms_shift                = age_shift + age_bits, // << 7
         hash_shift               = cms_shift + cms_bits, // << 7 (32位)
         epoch_shift              = hash_shift
  };

  // 计算掩码
  enum { lock_mask                = right_n_bits(lock_bits), // b11
         lock_mask_in_place       = lock_mask << lock_shift, // b11
         biased_lock_mask         = right_n_bits(lock_bits + biased_lock_bits), // b111
         biased_lock_mask_in_place= biased_lock_mask << lock_shift, // b111
         biased_lock_bit_in_place = 1 << biased_lock_shift, // b100
         age_mask                 = right_n_bits(age_bits), // b1111
         age_mask_in_place        = age_mask << age_shift, // b111_1000
         epoch_mask               = right_n_bits(epoch_bits), // b11
         epoch_mask_in_place      = epoch_mask << epoch_shift, // b1_1000_0000
         cms_mask                 = right_n_bits(cms_bits), // b1
         cms_mask_in_place        = cms_mask << cms_shift // b1000_0000
#ifndef _WIN64 // 32位
         ,hash_mask               = right_n_bits(hash_bits), // b1_11111111_11111111_11111111
         hash_mask_in_place       = (address_word)hash_mask << hash_shift // b11111111_11111111_11111111_10000000
#endif
  };

....
}
// 没找到right_n_bits源码,随便写了个Java版的
public static int right_n_bits(int n) {
    int x = 1;
    for (int i = 1; i < n; i++) {
        x = (x << 1) + 1;
    }
    return x;
}

2.4 copy_set_hash

openjdk11\src\hotspot\share\oops\markOop.cpp

// 拷贝Mark值的副本,并将hash填入
// (不确定,从逻辑上看应该是拷贝了,但看tmp变量名和value()的定义又像指针操作)
markOop copy_set_hash(intptr_t hash) const { // 假设传入hash值为 b01010101
  /*
  hash_mask_in_place   b11111111_11111111_11111111_10000000
                   ~   b00000000_00000000_00000000_01111111
             value()   b00010010_00010001_00011000_00001001 假定值
                   &   b00000000_00000000_00000000_00001001 => tmp
  */
  intptr_t tmp = value() & (~hash_mask_in_place);
  /*
  hash        b0_00000000_00000000_01010101
  hash_mast   b1_11111111_11111111_11111111
          &   b0_00000000_00000000_01010101  截掉超出的位数
       << 7   b00000000_00000000_00101010_10000000
        tmp   b00000000_00000000_00000000_00001001
         |=   b00000000_00000000_00101010_10001001 > tmp
  */
  tmp |= ((hash & hash_mask) << hash_shift);
  return (markOop)tmp;
}

2.5 cas_set_mark

markOop oopDesc::cas_set_mark(markOop new_mark, markOop old_mark) {
  // atomic_cmpxchg_at 未找到源码
  // 推测返回值为修改前的值
  return HeapAccess<>::atomic_cmpxchg_at(new_mark, as_oop(), mark_offset_in_bytes(), old_mark);
}

这段代码没什么好说的,和我们常用的AtomicXXX很相似

AtomicReference<String> ref = new AtomicReference<>("A");
String old = ref.compareAndExchange("A", "B"); // V compareAndExchange(V expectedValue, V newValue)
System.out.println(old);  // A
System.out.println(ref.get());  // B

3、总结

  1. 默认情况下,hashCode跟内存地址无关,是通过某种(叫不上名字的)算法计算得来的
  2. hashCode是延迟计算的,一旦设置,不会改变
  3. 可以通过配置-XX:hashCode=[0-5]更改算法

4、Q&A

Q:我了解这个有用吗?

A:

  • 初/中级工作中:没啥用
  • 高级工作中:不知道,没达到
  • 面试中:能把不了解的面试官说懵逼。当然他不懂也不会问
  • 其他:把程序启动命令改成-XX:hashCode=2来坑队友(误。要用在正当用途比如学习)
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!