What are the rules for cpython's string interning?

北城余情 submitted on 2019-12-09 00:34:18

Question


In Python 3.5, is it possible to predict when we will get an interned string and when we will get a copy? After reading a few Stack Overflow answers on this issue I've found this one the most helpful, but still not comprehensive. Then I looked at the Python docs, but interning is not guaranteed by default:

Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys.

So my question is about the internal intern() conditions, i.e. the decision-making (whether to intern a string literal or not): why does the same piece of code work on one system and not on another, and what rules did the author of the answer on the mentioned topic mean when saying

the rules for when this happens are quite convoluted


Answer 1:


You think there are rules?

The only rule for interning is that the return value of intern (sys.intern in Python 3) is interned. Everything else is up to the whims of whoever decided some piece of code should or shouldn't do interning. For example, "left" gets interned by PyCode_New:

/* Intern selected string constants */
for (i = PyTuple_GET_SIZE(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!all_name_chars(v))
        continue;
    PyUnicode_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}

The "rule" here is that a string object in the co_consts of a Python code object gets interned if it consists purely of ASCII characters that are legal in a Python identifier. "left" gets interned, but "as,df" wouldn't be, and "1234" would be interned even though an identifier can't start with a digit. While identifiers can contain non-ASCII characters, such characters are still rejected by this check. Actual identifiers don't ever pass through this code; they get unconditionally interned a few lines up, ASCII or not. This code is subject to change, and there's plenty of other code that does interning or interning-like things.
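To make the check described above concrete, here is a rough Python sketch of what the C helper all_name_chars tests. It is an approximation written for illustration, not the CPython source: the Python names NAME_CHARS and all_name_chars below are assumptions that mirror the C code quoted in this answer, and other CPython versions may apply a different check.

```python
import string

# Assumption: the accepted alphabet is ASCII letters, digits and underscore,
# as described in the answer above.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def all_name_chars(s):
    """Return True if every character of s is an ASCII identifier character.

    Note that this does not check whether s is a valid identifier:
    a string that starts with a digit still passes.
    """
    return all(ch in NAME_CHARS for ch in s)


for candidate in ("left", "as,df", "1234", "caf\u00e9"):
    print(repr(candidate), all_name_chars(candidate))
```

Passing this check only says that a constant was a candidate for interning in that code path; it says nothing about strings built at run time.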

Asking us for the "rules" for string interning is like asking a meteorologist what the rules are for whether it rains on your wedding. We can tell you quite a lot about how it works, but it won't be much use to you, and you'll always get surprises.
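The one dependable rule mentioned at the start of this answer can be sketched as follows. The example assumes Python 3, where the function lives at sys.intern; the helper name build is made up for the illustration.

```python
import sys


def build(chars):
    """Assemble a string at run time so it is not a compile-time constant."""
    return "".join(chars)


a = build(["a", "b", "c", "d", "e", "f"])
b = build(["a", "b", "c", "d", "e", "f"])

# Equal values; whether they are the same object is an implementation detail.
print(a == b)
print(a is b)  # usually False in CPython, but not something to rely on

# The documented guarantee: equal strings passed through sys.intern
# come back as one and the same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```

If identity comparison really matters, for example for fast dictionary-key lookups, calling sys.intern explicitly is the only behaviour to depend on; everything else should be compared with ==.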




Answer 2:


From what I understood from the post you linked:

When you use if a == b, you are checking if the value of a is the value of b, whereas when you use if a is b, you are checking if a and b are the same object (or share the same spot in the memory).

Now Python interns constant strings (literals written like "blabla"). So:

>>> a = "abcdef"
>>> a is "abcdef"
True

But when you do:

>>> a = "".join([chr(i) for i in range(ord('a'), ord('g'))])
>>> a
'abcdef'
>>> a is "abcdef"
False

In the C programming language, a string literal written with "" is a constant character array that the compiler may store once and share. I think something similar is happening here.



Source: https://stackoverflow.com/questions/35805768/what-are-the-rules-for-cpythons-string-interning
