Understanding String heap size in JavaScript / V8

Submitted by 久未见 on 2019-12-21 16:18:08

Question


Does anyone have a good understanding/explanation of how the heap size of strings is determined in JavaScript with Chrome (V8)?

Some examples of what I see in a heap dump:

1) Multiple copies of an identical 2-character string (e.g. "dt") with different @ object IDs, all designated as OneByteStrings. The heap dump says each copy has a shallow and retained size of 32 bytes. It isn't clear how a string with two bytes of data has a retained size of 32 bytes, or why the strings don't appear to be interned.

2) A long object path string, 78 characters long. All characters would be a single byte in UTF-8. It is classified as an InternalizedString. It has a 184-byte retained size. Even with a 2-byte character encoding (156 bytes), that would still not account for the remaining 28 bytes. Why are these path strings taking up so much space? I could imagine another 4 bytes (maybe 8) being used for an address and another 4 for storing the string length, but that still leaves 16 bytes even with a 2-byte character encoding.


Answer 1:


Internally, V8 has a number of different representations for strings:

  • SeqOneByteString: The simplest; contains a few header fields and then the string's bytes (not UTF-8 encoded, so it can only contain characters in the first 256 Unicode code points).
  • SeqTwoByteString: Same, but uses two bytes for each character (using surrogate pairs to represent Unicode characters that can't be represented in two bytes).
  • SlicedString: A substring of some other string. Contains a pointer to the "parent" string and an offset and length.
  • ConsString: The result of adding two strings (if over a certain size). Contains pointers to both strings (which may themselves be any of these types of strings).
  • ExternalString: Used for strings that have been passed in from outside of V8.
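As a rough illustration of the list above, the sketch below shows ordinary JavaScript operations that are candidates for each representation. Which representation V8 actually picks is an implementation detail that plain JavaScript cannot observe; only a heap snapshot shows it. The helper `fitsOneByte` is a hypothetical name introduced here, and it only checks whether a string is eligible for one-byte storage, not what V8 chose.

```javascript
// Hypothetical helper: true if every UTF-16 code unit is below 256,
// i.e. the string could be stored with one byte per character.
function fitsOneByte(s) {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

const oneByte = "dt";          // only code points below 256
const twoByte = "d\u0442";     // U+0442 needs two bytes per character
const astral = "\u{1F600}";    // outside the BMP: two code units (a surrogate pair)

// Assumption: these operations are merely candidates for the listed
// representations; V8 may flatten or copy the strings instead.
const base = "a fairly long example string used as a parent";
const joined = base + " plus a suffix that is also fairly long"; // ConsString candidate
const part = base.slice(2, 30);                                  // SlicedString candidate

console.log(fitsOneByte(oneByte), fitsOneByte(twoByte), astral.length);
console.log(joined.length, part.length);
```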

"Internalized" is just a flag, the actual string representation could be any of the above.

All of these have a common parent class String, whose parent is Name, whose parent is HeapObject (which is the root of the V8 class hierarchy for objects allocated on the V8 heap).

  • HeapObject has one field: the pointer to its Map (there's a good explanation of these here).
  • Name adds one additional field: a hash value.
  • String adds another field: the length.

On a 32-bit system, each of these is 4 bytes. On a 64-bit system, each one is 8 bytes.

If you're on a 64-bit system then the minimum size of a SeqOneByteString will be 32 bytes: 24 bytes for the header fields described above plus at least one byte for the string data, rounded up to a multiple of 8.
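The arithmetic above can be written down as a small sketch. This is a model of the description in this answer, not V8's actual code: it assumes three header fields of one word each and rounding up to a multiple of the word size, and `estimateSeqStringSize` is a hypothetical name. Other V8 builds may lay strings out differently.

```javascript
// Model of the size calculation described above (assumptions: three
// one-word header fields for map, hash and length; total rounded up
// to a multiple of the word size).
function estimateSeqStringSize(length, bytesPerChar, wordSize = 8) {
  const headerSize = 3 * wordSize;                 // 24 bytes on 64-bit
  const rawSize = headerSize + length * bytesPerChar;
  return Math.ceil(rawSize / wordSize) * wordSize; // round up to alignment
}

// "dt" as a one-byte string on 64-bit: 24 + 2 = 26, rounded up to 32.
console.log(estimateSeqStringSize(2, 1));

// A single-character one-byte string: 24 + 1 = 25, rounded up to 32.
console.log(estimateSeqStringSize(1, 1));
```

Under this model the 32 bytes reported for each "dt" copy in the question is simply the minimum size of a non-empty one-byte string.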

Regarding your second question, it's difficult to say exactly what's going on. It could be that the string is using a 2-byte representation and its header fields are pushing up the size above what you are expecting, or it could be that it's a ConsString or a SlicedString (whose retained sizes would include the strings that it points to).
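One way to test the first hypothesis is to compute what each flat representation would occupy and compare it with the number from the heap dump. The sketch below does this under the same assumptions as the header description above (24-byte header on 64-bit, size rounded up to a multiple of 8); `explainObservedSize` is a hypothetical helper, not a V8 or DevTools API.

```javascript
// Hypothetical helper: for a string of the given length, compute the
// modelled size of each flat representation and flag the ones that
// equal the size observed in the heap dump.
function explainObservedSize(length, observedSize, wordSize = 8) {
  const headerSize = 3 * wordSize; // map + hash + length (assumed layout)
  const candidates = [
    { name: "SeqOneByteString", bytesPerChar: 1 },
    { name: "SeqTwoByteString", bytesPerChar: 2 },
  ];
  return candidates.map(({ name, bytesPerChar }) => {
    const raw = headerSize + length * bytesPerChar;
    const size = Math.ceil(raw / wordSize) * wordSize;
    return { name, size, matches: size === observedSize };
  });
}

// The 78-character path string from the question, reported as 184 bytes.
console.log(explainObservedSize(78, 184));
```

For the 78-character string, the one-byte model gives 104 bytes (24 + 78 = 102, rounded up) and the two-byte model gives 184 bytes (24 + 156 = 180, rounded up). The two-byte figure equals the reported size, which is consistent with the first hypothesis but does not rule out a ConsString or SlicedString whose retained strings happen to add up to the same total.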

V8 doesn't internalize strings most of the time. It internalizes string constants and identifier names that it finds during parsing, strings that are used as object property keys, and probably a few other cases.



Source: https://stackoverflow.com/questions/40512393/understanding-string-heap-size-in-javascript-v8
