What is Python's sequence protocol?

♀尐吖头ヾ 提交于 2019-11-29 11:31:55

问题


Python does a lot with magic methods and most of these are part of some protocol. I am familiar with the "iterator protocol" and the "number protocol" but recently stumbled over the term "sequence protocol". But even after some research I'm not exactly sure what the "sequence protocol" is.

For example the C API function PySequence_Check checks (according to the documentation) if some object implements the "sequence protocol". The source code indicates that this is a class that's not a dict but implements a __getitem__ method which is roughly identical to what the documentation on iter also states:

[...]must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).[...]

But the requirement to start with 0 isn't something that's "implemented" in PySequence_Check.

Then there is also the collections.abc.Sequence type, which basically says the instance has to implement __reversed__, __contains__, __iter__ and __len__.

But by that definition a class implementing the "sequence protocol" isn't necessarily a Sequence, for example the "data model" and the abstract class garantuee that a sequence has a length. But a class just implementing __getitem__ (passing the PySequence_Check) throws an exception when using len(an_instance_of_that_class).

Could someone please clarify for me the difference between a sequence and the sequence protocol (if there's a definition for the protocol besides reading the source code) and when to use which definition?


回答1:


It's not really consistent.

Here's PySequence_Check:

int
PySequence_Check(PyObject *s)
{
    if (PyDict_Check(s))
        return 0;
    return s != NULL && s->ob_type->tp_as_sequence &&
        s->ob_type->tp_as_sequence->sq_item != NULL;
}

PySequence_Check checks if an object provides the C sequence protocol, implemented through a tp_as_sequence member in the PyTypeObject representing the object's type. This tp_as_sequence member is a pointer to a struct containing a bunch of functions for sequence behavior, such as sq_item for item retrieval by numeric index and sq_ass_item for item assignment.

Specifically, PySequence_Check requires that its argument is not a dict, and that it provides sq_item.

Types with a __getitem__ written in Python will provide sq_item regardless of whether they're conceptually sequences or mappings, so a mapping written in Python that doesn't inherit from dict will pass PySequence_Check.


On the other hand, collections.abc.Sequence only checks whether an object concretely inherits from collections.abc.Sequence or whether its class (or a superclass) is explicitly registered with collections.abc.Sequence. If you just implement a sequence yourself without doing either of those things, it won't pass isinstance(your_sequence, Sequence). Also, most classes registered with collections.abc.Sequence don't support all of collections.abc.Sequence's methods. Overall, collections.abc.Sequence is a lot less reliable than people commonly expect it to be.


As for what counts as a sequence in practice, it's usually anything that supports __len__ and __getitem__ with integer indexes starting at 0 and isn't a mapping. If the docs for a function say it takes any sequence, that's almost always all it needs. Unfortunately, "isn't a mapping" is hard to test for for reasons similar to how "is a sequence" is hard to pin down.




回答2:


For a type to be in accordance with the sequence protocol, these 4 conditions must be met:

  • Retrieve elements by index

    item = seq[index]

  • Find items by value

    index = seq.index(item)

  • Count items

    num = seq.count(item)

  • Produce a reversed sequence

    r = reversed(seq)



来源:https://stackoverflow.com/questions/43566044/what-is-pythons-sequence-protocol

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!