Python 3 string index lookup is O(1)?

Submitted by 試著忘記壹切 on 2020-03-22 06:43:05

Question


Short story:

Is Python 3 unicode string lookup O(1) or O(n)?

Long story:

Index lookup of a character in a C char array is constant time O(1) because we can with certainty jump to a contiguous memory location:

const char* mystring = "abcdef";
char its_d = mystring[3];

It's the same as saying:

char its_d = *(mystring + 3);

This works because sizeof(char) is 1 by definition in C99, and because an ASCII character fits in one byte.

Now, in Python 3, where string literals are Unicode strings, we have the following:

>>> mystring = 'ab€cd'
>>> len(mystring)
5
>>> mybytes = mystring.encode('utf-8')
>>> len(mybytes)
7
>>> mybytes
b'ab\xe2\x82\xaccd'
>>> mystring[2]
'€'
>>> mybytes[2]
226
>>> ord(mystring[2])
8364

In the UTF-8 encoding, the byte at index 2 is greater than 127, so the third character uses a multibyte representation.

I can only conclude that an index lookup in a Python string CANNOT be O(1), because of the multibyte representation of characters. Does that mean that mystring[2] is O(n), and that some on-the-fly interpretation of the memory array is performed in order to find the character at that index? If that's the case, did I miss some relevant documentation stating this?

I made a very basic benchmark, but I cannot infer O(n) behaviour from it: https://gist.github.com/carlos-jenkins/e3084a07402ccc25dfd0038c9fe284b5
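To make this concern concrete, here is a minimal sketch of what an index lookup would have to do if a string were stored as raw UTF-8 bytes. The function name utf8_char_at is hypothetical, and the code only illustrates the hypothesis in this question; it makes no claim about what the interpreter actually does.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the character at code-point position `index` in UTF-8 `data`.

    Hypothetical helper: it scans from the start of the buffer, so the
    cost grows with `index`, which is the O(n) behaviour feared above.
    """
    if index < 0:
        raise IndexError("negative indices are not supported in this sketch")
    position = -1   # code-point position of the most recent lead byte
    start = None    # byte offset where the requested character begins
    for offset, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; any other byte starts a character.
        if (byte & 0xC0) != 0x80:
            position += 1
            if position == index:
                start = offset
            elif position == index + 1:
                # The next character begins here, so the requested one ends here.
                return data[start:offset].decode('utf-8')
    if start is None:
        raise IndexError("character index out of range")
    # The requested character is the last one in the buffer.
    return data[start:].decode('utf-8')


mybytes = 'ab€cd'.encode('utf-8')
print(utf8_char_at(mybytes, 2))  # third character, found by scanning bytes
```

The loop has to inspect every byte before the requested character, because the byte offset of a character cannot be computed from its index alone in a variable-length encoding.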

$ python3 lookups.py
Allocating memory...
Go!
String lookup: 0.513942 ms
Bytes lookup : 0.486462 ms
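For readers who want to try something similar, the following is a rough sketch of a length-scaling benchmark. It is not the code from the gist linked above, and the helper name time_lookup and the chosen sizes are assumptions made for illustration. If lookups were O(n), the time to read the last character should grow visibly with the string length.

```python
import timeit


def time_lookup(text, index, repeats=200_000):
    """Return the average time in seconds of one text[index] lookup."""
    total = timeit.timeit(lambda: text[index], number=repeats)
    return total / repeats


# Read the last character of progressively longer non-ASCII strings.
for size in (10**3, 10**5, 10**7):
    text = '€' * size
    nanoseconds = time_lookup(text, size - 1) * 1e9
    print(f"{size:>10} chars: {nanoseconds:.1f} ns per lookup")
```

The absolute numbers depend on the machine and include the overhead of calling the lambda, so only the trend across sizes is meaningful.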

EDIT: Updated with better example.


Answer 1:


UTF-8 is the default source encoding for Python 3, but the source encoding is separate from how string objects are stored in memory. The internal representation uses fixed-size per-character elements in both Python 2 and Python 3. As a result, accessing characters in Python (Unicode) string objects by index has O(1) cost.

The code and results you presented do not demonstrate otherwise. You convert a string to a UTF-8-encoded byte sequence, and we all know that UTF-8 uses variable-length code sequences, but none of that says anything about the internal representation of the original string.
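One way to observe the fixed-size elements is to measure how much memory a string gains per added character. The sketch below assumes CPython 3.3 or later, where PEP 393 stores each string with 1, 2, or 4 bytes per character depending on the widest character it contains; this is an implementation detail, and other interpreters may differ. The helper name bytes_per_char is hypothetical.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by comparing two string lengths.

    Subtracting the two sizes cancels the fixed object header, leaving
    only the cost of the n extra characters.
    """
    small = sys.getsizeof(ch * n)
    large = sys.getsizeof(ch * (2 * n))
    return (large - small) / n


print(bytes_per_char('a'))    # ASCII character
print(bytes_per_char('€'))    # U+20AC, inside the Basic Multilingual Plane
print(bytes_per_char('😀'))   # U+1F600, outside the Basic Multilingual Plane
```

Under the stated assumption, the three values should come out as 1, 2, and 4 bytes. Because every element within one string has the same width, the address of character i can be computed directly from i, which is what makes the lookup constant time.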



Source: https://stackoverflow.com/questions/41172997/python-3-string-index-lookup-is-o1
