What is an “internal address” in Java?

雨燕双飞 提交于 2019-11-26 20:17:27

This is clearly implementation-specific.

Below I include the Object.hashCode() implementation used in OpenJDK 7.

The function supports six different calculation methods, only two of which take any notice of the object's address (the "address" being the C++ oop cast to intptr_t). One of the two methods uses the address as-is, whereas the other does some bit twiddling and then mashes the result with an infrequently-updated random number.

Of the remaining methods, one returns a constant (presumably for testing), one returns sequential numbers, and the rest are based on pseudo-random sequences.

It would appear that the method can be chosen at runtime, and the default seems to be method 0, which is os::random(). The latter is a linear congruential generator, with an alleged race condition thrown in. :-) The race condition is acceptable because at worst it would result in two objects sharing the same hash code; this does not break any invariants.

The computation is performed the first time a hash code is required. To maintain consistency, the result is then stored in the object's header and is returned on subsequent calls to hashCode(). The caching is done outside this function.

In summary, the notion that Object.hashCode() is based on the object's address is largely a historic artefact that has been obsoleted by the properties of modern garbage collectors.

// hotspot/src/share/vm/runtime/synchronizer.hpp

// hashCode() generation :
//
// Possibilities:
// * MD5Digest of {obj,stwRandom}
// * CRC32 of {obj,stwRandom} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
//   2654435761 = 2^32 * Phi (golden ratio)
//   HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stwRandom) is appealing, but can result
//   in undesirable regularity in the hashCode values of adjacent objects
//   (objects allocated back-to-back, in particular).  This could potentially
//   result in hashtable collisions and reduced hashtable efficiency.
//   There are simple ways to "diffuse" the middle address bits over the
//   generated hashCode values:
//

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

It typically is the memory address of the object. However, the first time the hashcode method is called on an object, the integer is stored in the header of that object so that the next invocation will return the same value (as you say, compacting garbage collection can change the address). To my knowledge that is how it is implemented in the Oracle JVM.

EDIT: digging down into the JVM source code, this is what shows up (synchronizer.cpp):

// hashCode() generation :
//
// Possibilities:
// * MD5Digest of {obj,stwRandom}
// * CRC32 of {obj,stwRandom} or any linear-feedback shift register function.
// * A DES- or AES-style SBox[] mechanism
// * One of the Phi-based schemes, such as:
//   2654435761 = 2^32 * Phi (golden ratio)
//   HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ;
// * A variation of Marsaglia's shift-xor RNG scheme.
// * (obj ^ stwRandom) is appealing, but can result
//   in undesirable regularity in the hashCode values of adjacent objects
//   (objects allocated back-to-back, in particular).  This could potentially
//   result in hashtable collisions and reduced hashtable efficiency.
//   There are simple ways to "diffuse" the middle address bits over the
//   generated hashCode values:
//

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}

So 6 different ways to do it in the Oracle JVM, one of them is equivalent to what I said... The value is stored in the header of the object by the method calling get_next_hash (that one is called FastHashCode and is called from the native version of Object.hashCode().

IMHO, although implementation dependent, Object references in JVM are never actual memory addresses, but internal JVM references that point to actual addresses. These internal references ARE generated initially based on memory address, but they remain associated with the object until it is discarded.

I say this because Java HotSpot GCs are copying collectors of some form or the other, and they work by traversing live objects and copying them from one heap to the other, destroying the old heap subsequently. So when GC occurs, the JVM does not have to update the references in all objects, but just change the actual memory address mapped to the internal reference.

The Javadoc's suggestion for Object.hashCode() that it is implemented by deriving the hashcode from the memory address is simply outdated.

Probably nobody bothered (or noticed) that this implementation path is no longer possible whith a copying garbage collector (Since it would change the hashcode when the Object is copied to another memory location).

It made a lot of sense to implement hashcode that way before there was a copying garbage collector, because it would save heap space. A non-copying GC (CMS) may still implement hashcode that way today.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!