写此文的起因是看到一个面试题:默认hashCode()返回的值是什么?GC后hashCode会改变吗?
在以前的认知里,默认hashCode()的返回值是java对象的内存地址没错。然而这又是一个经不起推敲的结论。
首先把一个对象作为key存入hashmap,如果hashCode返回内存地址,那么GC后地址必然改变,那么用这个对象就取不到值了,这显然是不合理的。
不知为何这样的谬误会存在脑子里这么久,从来没有怀疑过它。
既然想到这里,就再简单剖析一下hashCode原理吧。
1、预备知识
1.1 safepoint
GC相关。VM线程触发GC时,需要先stop-the-world
,说人话就是设置一个safepoint。工作线程会时不时检查safepoint状态,一旦发现safepoint被设置就将自己挂起。VM线程发现工作线程都挂起以后就执行GC,完事后再唤醒工作线程。
用现实情况举例就是:leader在群里发了一条"一会儿开会"的消息(safepoint),每个成员时不时查看消息,看到消息第一时间放下手中工作前往会议室(线程挂起),然而有个人沉浸编码无法自拔,十分钟后才看到信息,结果导致会议延迟(GC期间程序暂停时间过长)
至于底层具体如何实现,呃,我没看懂,感兴趣的同学自行了解
1.2 Mark Word
就是Java对象头里面的Mark Word
转载,下面给出原文地址
|-------------------------------------------------------|--------------------|
| Mark Word (32 bits) | State |
|-------------------------------------------------------|--------------------|
| identity_hashcode:25 | age:4 | biased_lock:1 | lock:2 | Normal |
|-------------------------------------------------------|--------------------|
| thread:23 | epoch:2 | age:4 | biased_lock:1 | lock:2 | Biased |
|-------------------------------------------------------|--------------------|
| ptr_to_lock_record:30 | lock:2 | Lightweight Locked |
|-------------------------------------------------------|--------------------|
| ptr_to_heavyweight_monitor:30 | lock:2 | Heavyweight Locked |
|-------------------------------------------------------|--------------------|
| | lock:2 | Marked for GC |
|-------------------------------------------------------|--------------------|
需要了解的几个点:
Mark Word
存储的数据会随着状态(右列)的变化而改变identity_hashcode
是延迟加载的- 当对象被锁定时(非Normal状态),
identity_hashcode
会移动到管程Monitor
中
2、源码分析
2.1 FastHashCode
从hashCode源码一路往下找,很容易就找到了FastHashCode
openjdk11\src\hotspot\share\runtime\synchronizer.cpp
intptr_t ObjectSynchronizer::FastHashCode(Thread * Self, oop obj) {
if (UseBiasedLocking) {
// == safepoint相关,英文不好就不翻译了 ==
// NOTE: many places throughout the JVM do not expect a safepoint
// to be taken here, in particular most operations on perm gen
// objects. However, we only ever bias Java instances and all of
// the call sites of identity_hash that might revoke biases have
// been checked to make sure they can handle a safepoint. The
// added check of the bias pattern is to avoid useless calls to
// thread-local storage.
if (obj->mark()->has_bias_pattern()) {
// Handle for oop obj in case of STW safepoint
Handle hobj(Self, obj);
// Relaxing assertion for bug 6320749.
assert(Universe::verify_in_progress() ||
!SafepointSynchronize::is_at_safepoint(),
"biases should not be seen by VM thread here");
BiasedLocking::revoke_and_rebias(hobj, false, JavaThread::current());
obj = hobj();
assert(!obj->mark()->has_bias_pattern(), "biases should be revoked by now");
}
}
ObjectMonitor* monitor = NULL;
markOop temp, test;
intptr_t hash;
markOop mark = ReadStableMark(obj); // == 读对象头MarkWord ==
if (mark->is_neutral()) { // == 判断是否normal状态 ==
hash = mark->hash(); // == 从mark里取hash ==
if (hash) { // == 已经存在,直接返回 ==
return hash;
}
hash = get_next_hash(Self, obj); // == 真正生成hash的方法 ==
temp = mark->copy_set_hash(hash); // == 复制mark串并将新的hash写入 ==
// use (machine word version) atomic operation to install the hash
// 下面这段是CAS常规操作,Java源码里也随处可见,一定要熟悉
// 结合ConcurrentHashMap可以给某些业务加分段锁,非常好用
test = obj->cas_set_mark(temp, mark); // == 通过CAS将新Mark设置给obj,返回设置前的值 ==
if (test == mark) { // == 如果设置前的值和刚开始读到的值相等,表明期间没被其他线程修改过 ==
return hash;
}
// If atomic operation failed, we must inflate the header
// into heavy weight monitor. We could add more code here
// for fast path, but it does not worth the complexity.
} else if (mark->has_monitor()) { // == 判断加锁(重量级锁) ==
// == 前文提到加锁的对象`identity_hashcode`会移动到管程`Monitor`中 ==
monitor = mark->monitor();
// == 从monitor中获取hashcode ==
temp = monitor->header();
assert(temp->is_neutral(), "invariant");
hash = temp->hash();
if (hash) {
return hash;
}
// Skip to the following code to reduce code size
} else if (Self->is_lock_owned((address)mark->locker())) { // == 判断是否锁重入 ==
temp = mark->displaced_mark_helper(); // this is a lightweight monitor owned
assert(temp->is_neutral(), "invariant");
hash = temp->hash(); // by current thread, check if the displaced
if (hash) { // header contains hash code
return hash;
}
// WARNING:
// The displaced header is strictly immutable.
// It can NOT be changed in ANY cases. So we have
// to inflate the header into heavyweight monitor
// even the current thread owns the lock. The reason
// is the BasicLock (stack slot) will be asynchronously
// read by other threads during the inflate() function.
// Any change to stack may not propagate to other threads
// correctly.
}
// == 无锁或轻量级锁无法获得hash,则升级为重量级锁 ==
// == 基本逻辑和上面相似,不再赘述 ==
// Inflate the monitor to set hash code
monitor = ObjectSynchronizer::inflate(Self, obj, inflate_cause_hash_code);
// Load displaced header and check it has hash code
mark = monitor->header();
assert(mark->is_neutral(), "invariant");
hash = mark->hash();
if (hash == 0) {
hash = get_next_hash(Self, obj);
temp = mark->copy_set_hash(hash); // merge hash code into header
assert(temp->is_neutral(), "invariant");
test = Atomic::cmpxchg(temp, monitor->header_addr(), mark);
if (test != mark) {
// The only update to the header in the monitor (outside GC)
// is install the hash code. If someone add new usage of
// displaced header, please update this code
hash = test->hash();
assert(test->is_neutral(), "invariant");
assert(hash != 0, "Trivial unexpected object/monitor header usage.");
}
}
// We finally get the hash
return hash;
}
2.2 get_next_hash
核心逻辑。根据全局hashCode的值决定使用哪种算法。可以通过配置-XX:hashCode=[0-5]
修改。
openjdk11\src\hotspot\share\runtime\synchronizer.cpp
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
intptr_t value = 0;
// == hashCode是一个全局变量,指明要使用的hashCode算法 ==
if (hashCode == 0) {
// This form uses global Park-Miller RNG.
// On MP system we'll have lots of RW access to a global, so the
// mechanism induces lots of coherency traffic.
value = os::random(); // == 0 使用系统随机数 ==
} else if (hashCode == 1) {
// This variation has the property of being stable (idempotent)
// between STW operations. This can be useful in some of the 1-0
// synchronization schemes.
// == 1 在内存地址基础上进行二次运算 ==
intptr_t addrBits = cast_from_oop<intptr_t>(obj) >> 3;
value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom;
} else if (hashCode == 2) {
// == 2 测试用途 ==
value = 1; // for sensitivity testing
} else if (hashCode == 3) {
value = ++GVars.hcSequence; // == 3 自增序列 ==
} else if (hashCode == 4) {
value = cast_from_oop<intptr_t>(obj); // == 4 使用内存地址 ==
} else {
// == 5 某个"名字说了你也记不住"的算法,默认使用这个算法 ==
// Marsaglia's xor-shift scheme with thread-specific state
// This is probably the best overall implementation -- we'll
// likely make this the default in future releases.
unsigned t = Self->_hashStateX;
t ^= (t << 11);
Self->_hashStateX = Self->_hashStateY;
Self->_hashStateY = Self->_hashStateZ;
Self->_hashStateZ = Self->_hashStateW;
unsigned v = Self->_hashStateW;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
Self->_hashStateW = v;
value = v;
}
value &= markOopDesc::hash_mask;
if (value == 0) value = 0xBAD;
assert(value != markOopDesc::no_hash, "invariant");
TEVENT(hashCode: GENERATE);
return value;
}
2.3 MarkWord
openjdk11\src\hotspot\share\oops\markOop.cpp
// 通过源码可以简单看出32位与64位的区别
// 本次分析以32位为例
class markOopDesc: public oopDesc {
private:
// Conversion
uintptr_t value() const { return (uintptr_t) this; }
public:
// 字段大小
// Constants
enum { age_bits = 4, // age
lock_bits = 2, // lock
biased_lock_bits = 1, // biased_lock
max_hash_bits = BitsPerWord - age_bits - lock_bits - biased_lock_bits, // 32-4-2-1=25
hash_bits = max_hash_bits > 31 ? 31 : max_hash_bits, // identity_hashcode:25
cms_bits = LP64_ONLY(1) NOT_LP64(0), // 64位包含1位unused,32位没有
epoch_bits = 2 // Biased状态下的epoch
};
// 字段所在位置偏移
// The biased locking code currently requires that the age bits be
// contiguous to the lock bits.
enum { lock_shift = 0,
biased_lock_shift = lock_bits, // << 2
age_shift = lock_bits + biased_lock_bits, // << 3
cms_shift = age_shift + age_bits, // << 7
hash_shift = cms_shift + cms_bits, // << 7 (32位)
epoch_shift = hash_shift
};
// 计算掩码
enum { lock_mask = right_n_bits(lock_bits), // b11
lock_mask_in_place = lock_mask << lock_shift, // b11
biased_lock_mask = right_n_bits(lock_bits + biased_lock_bits), // b111
biased_lock_mask_in_place= biased_lock_mask << lock_shift, // b111
biased_lock_bit_in_place = 1 << biased_lock_shift, // b100
age_mask = right_n_bits(age_bits), // b1111
age_mask_in_place = age_mask << age_shift, // b111_1000
epoch_mask = right_n_bits(epoch_bits), // b11
epoch_mask_in_place = epoch_mask << epoch_shift, // b1_1000_0000
cms_mask = right_n_bits(cms_bits), // b1
cms_mask_in_place = cms_mask << cms_shift // b1000_0000
#ifndef _WIN64 // 32位
,hash_mask = right_n_bits(hash_bits), // b1_11111111_11111111_11111111
hash_mask_in_place = (address_word)hash_mask << hash_shift // b11111111_11111111_11111111_10000000
#endif
};
....
}
// 没找到right_n_bits源码,随便写了个Java版的
public static int right_n_bits(int n) {
int x = 1;
for (int i = 1; i < n; i++) {
x = (x << 1) + 1;
}
return x;
}
2.4 copy_set_hash
openjdk11\src\hotspot\share\oops\markOop.cpp
// 拷贝Mark值的副本,并将hash填入
// (不确定,从逻辑上看应该是拷贝了,但看tmp变量名和value()的定义又像指针操作)
markOop copy_set_hash(intptr_t hash) const { // 假设传入hash值为 b01010101
/*
hash_mask_in_place b11111111_11111111_11111111_10000000
~ b00000000_00000000_00000000_01111111
value() b00010010_00010001_00011000_00001001 假定值
& b00000000_00000000_00000000_00001001 => tmp
*/
intptr_t tmp = value() & (~hash_mask_in_place);
/*
hash b0_00000000_00000000_01010101
hash_mast b1_11111111_11111111_11111111
& b0_00000000_00000000_01010101 截掉超出的位数
<< 7 b00000000_00000000_00101010_10000000
tmp b00000000_00000000_00000000_00001001
|= b00000000_00000000_00101010_10001001 > tmp
*/
tmp |= ((hash & hash_mask) << hash_shift);
return (markOop)tmp;
}
2.5 cas_set_mark
markOop oopDesc::cas_set_mark(markOop new_mark, markOop old_mark) {
// atomic_cmpxchg_at 未找到源码
// 推测返回值为修改前的值
return HeapAccess<>::atomic_cmpxchg_at(new_mark, as_oop(), mark_offset_in_bytes(), old_mark);
}
这段代码没什么好说的,和我们常用的AtomicXXX
很相似
AtomicReference<String> ref = new AtomicReference<>("A");
String old = ref.compareAndExchange("A", "B"); // V compareAndExchange(V expectedValue, V newValue)
System.out.println(old); // A
System.out.println(ref.get()); // B
3、总结
- 默认情况下,hashCode跟内存地址无关,是通过某种(叫不上名字的)算法计算得来的
- hashCode是延迟计算的,一旦设置,不会改变
- 可以通过配置
-XX:hashCode=[0-5]
更改算法
4、Q&A
Q:我了解这个有用吗?
A:
- 初/中级工作中:没啥用
- 高级工作中:不知道,没达到
- 面试中:能把不了解的面试官说懵逼。当然他不懂也不会问
- 其他:把程序启动命令改成
-XX:hashCode=2
来坑队友(误。要用在正当用途比如学习)
来源:oschina
链接:https://my.oschina.net/u/580483/blog/4304057