Python: Similar functionality in struct and array vs ctypes

Submitted by 蓝咒 on 2021-02-07 22:17:37

Question


Python provides the following three modules that deal with C types and how to handle them:

  • struct for C structs
  • array for arrays such as those in C
  • ctypes for C functions, which necessarily entails dealing with C’s type system

While ctypes seems more general and flexible (its main task being “a foreign function library for Python”) than struct and array, there seems to be significant overlap in functionality between these three modules when the task is to read binary data structures. For example, if I wanted to read a C struct

struct MyStruct {
    int a;
    float b;
    char c[12];
};

I could use struct as follows:

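import struct

# 'if12s' = native int, native float, 12-byte string (native byte order and alignment)
a, b, c = struct.unpack('if12s', b'\x11\0\0\0\x12\x34\x56\x78hello world\0')
print(a, b, c)
# 17 1.7378244361449504e+34 b'hello world\x00'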

On the other hand, using ctypes works equally well (although a bit more verbose):

import ctypes

class MyStruct(ctypes.Structure):
    _fields_ = [
        ('a', ctypes.c_int),
        ('b', ctypes.c_float),
        ('c', ctypes.c_char * 12)
    ]

# Copy the raw bytes into a new MyStruct instance
s = MyStruct.from_buffer_copy(b'\x11\0\0\0\x12\x34\x56\x78hello world\0')
print(s.a, s.b, s.c)
# 17 1.7378244361449504e+34 b'hello world'

(Aside: I do wonder where the trailing '\0' went in this version, though…)

This seems to me like it violates the principles in “The Zen of Python”:

  1. There should be one—and preferably only one—obvious way to do it.

So how did this situation with several of these similar modules for binary data handling arise? Is there a historical or practical reason? (For example, I could imagine omitting the struct module entirely and simply adding a more convenient API for reading/writing C structs to ctypes.)


Answer 1:


Disclaimer: this post is speculation based on my understanding of the "division of labor" in Python stdlib, not on factual referenceable info.

Your question stems from the fact that "C structs" and "binary data" tend to be used interchangeably, which, while correct in practice, is wrong in a technical sense. The struct documentation is also misleading: it claims to work on "C structs", while a better description would be "binary data", with some disclaimers about C compatibility.

Fundamentally, struct, array and ctypes do different things. struct deals with converting Python values into binary in-memory formats. array deals with efficiently storing a lot of values. ctypes deals with the C language(*). The overlap in functionality stems from the fact that for C, the "binary in-memory formats" are native, and that "efficiently storing values" is packing them into a C-like array.
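As a rough illustration of that division of labor, here is a minimal sketch that uses each module for the job described above. The values and variable names are made up for the example, and the explicit '<' byte order in the struct call is my choice to keep the result platform-independent.

```python
import ctypes
import struct
from array import array

# struct: convert Python values to and from a packed byte string.
packed = struct.pack('<if', 17, 1.5)      # little-endian int32 followed by float32
values = struct.unpack('<if', packed)     # back to a tuple of Python values

# array: a compact, typed, list-like container for many values of one type.
numbers = array('d', [1.0, 2.0, 3.0])     # 'd' = C double
numbers.append(4.0)
raw = numbers.tobytes()                   # the underlying machine representation

# ctypes: objects that model C types, intended for calling into C code.
c_value = ctypes.c_int(17)

print(values, len(raw), c_value.value)
```

The struct and array results are plain Python objects and bytes, while the ctypes object exists mainly so that it can be handed to a C function.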

You will also note that struct lets you specify endianness easily, because its whole job is packing and unpacking binary data in the many ways it can be laid out. In ctypes, getting a non-native byte order takes more effort, because by default it uses the byte order that is native to C on the current platform.
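To make the difference concrete, here is a small sketch that reads the same four bytes with both modules. In struct, the first character of the format string selects the byte order. In ctypes, a non-native order is available through the BigEndianStructure and LittleEndianStructure base classes, which means declaring a class and fixing one byte order for the whole structure. The class name BigHeader and the sample bytes are hypothetical.

```python
import ctypes
import struct

data = b'\x00\x00\x00\x11'

# struct: the first format character picks the byte order.
big = struct.unpack('>I', data)[0]       # read as big-endian
little = struct.unpack('<I', data)[0]    # read as little-endian

# ctypes: byte order is chosen per structure through the base class.
class BigHeader(ctypes.BigEndianStructure):   # hypothetical example class
    _fields_ = [('value', ctypes.c_uint32)]

header = BigHeader.from_buffer_copy(data)

print(big, little, header.value)
```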

If your task is reading binary data structures, there are increasing levels of abstraction:

  1. Manually splitting the byte array and converting parts with int.from_bytes and the like
  2. Describing the data with a format string and using struct to unpack in one go
  3. Using a library like Construct to describe the structure declaratively in logical terms.
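The first two levels can be sketched as follows for a made-up record layout (a little-endian 16-bit id, a 32-bit length, and a 4-byte tag). The layout and field names are assumptions for illustration only. The third level is omitted because Construct is a third-party package.

```python
import struct

# Hypothetical record: uint16 id, uint32 length, 4-byte tag, all little-endian.
data = b'\x01\x00\x10\x00\x00\x00DATA'

# Level 1: slice the bytes by hand and convert each part.
rec_id = int.from_bytes(data[0:2], 'little')
length = int.from_bytes(data[2:6], 'little')
tag = data[6:10]

# Level 2: describe the whole layout once and unpack in one go.
# '<' means little-endian with no padding between fields.
rec_id2, length2, tag2 = struct.unpack('<HI4s', data)

print(rec_id, length, tag)
print(rec_id2, length2, tag2)
```

Both approaches yield the same values. The format string simply moves the offsets and sizes out of the code and into a single description.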

ctypes doesn't even figure here, because for this task, using ctypes is pretty much taking a round trip through a different programming language. The fact that it works just as well for your example is incidental; it works because C is natively suited to expressing many ways of packing binary data. But if your struct were mixed-endian, for instance, it would be very difficult to express in ctypes. Another example is the half-precision float, which doesn't have a C equivalent.
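Both cases can be handled with struct alone, as this minimal sketch suggests. The 'e' format code is struct's half-precision float (available since Python 3.6). The mixed-endian record is hypothetical. Because the byte-order prefix applies to an entire format string, the sketch reads the two fields with separate unpack_from calls at explicit offsets.

```python
import struct

# Half-precision float: 2 bytes, format code 'e'.
half_bytes = struct.pack('<e', 1.5)
half_value = struct.unpack('<e', half_bytes)[0]

# Hypothetical mixed-endian record:
# a big-endian uint16 followed by a little-endian uint32.
record = b'\x00\x01\x02\x00\x00\x00'
(kind,) = struct.unpack_from('>H', record, 0)    # bytes 0-1, big-endian
(count,) = struct.unpack_from('<I', record, 2)   # bytes 2-5, little-endian

print(half_bytes, half_value, kind, count)
```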

In this sense, it's also very reasonable that ctypes uses struct: after all, "packing and unpacking binary data" is a subtask of "interfacing with C".

On the other hand, it would make no sense for struct to use ctypes: it would be like using the email library for character encoding conversions because it's a task that an e-mail library can do.

(*) Well, basically. A more precise term would be something like "C-based environments", i.e., how modern computers work at a low level due to co-evolution with C as the primary systems language.



Source: https://stackoverflow.com/questions/52004279/python-similar-functionality-in-struct-and-array-vs-ctypes
