Can I use memcpy to write to multiple adjacent Standard Layout sub-objects?

99封情书 submitted on 2019-12-07 02:59:39

Question


Disclaimer: This is trying to drill down on a larger problem, so please don't get hung up with whether the example makes any sense in practice.

And, yes, if you want to copy objects, please use / provide the copy-constructor. (But note how even the example does not copy a whole object; it tries to blit some memory over a few adjacent(Q.2) integers.)


Given a C++ Standard Layout struct, can I use memcpy to write to multiple (adjacent) sub-objects at once?

Complete example (https://ideone.com/1lP2Gd, https://ideone.com/YXspBk):

#include <vector>
#include <iostream>
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <string.h>  // memcpy

struct MyStandardLayout {
    char mem_a;
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
    char mem_z;

    MyStandardLayout()
    : mem_a('a')
    , num_1(1 + (1 << 14))
    , num_2(1 + (1 << 30))
    , num_3(1LL + (1LL << 62))
    , mem_z('z')
    { }

    void print() const {
        std::cout << 
            "MySL Obj: " <<
            mem_a << " / " <<
            num_1 << " / " <<
            num_2 << " / " <<
            num_3 << " / " <<
            mem_z << "\n";
    }
};

void ZeroInts(MyStandardLayout* pObj) {
    const size_t first = offsetof(MyStandardLayout, num_1);
    const size_t third = offsetof(MyStandardLayout, num_3);
    std::cout << "ofs(1st) =  " << first << "\n";
    std::cout << "ofs(3rd) =  " << third << "\n";
    assert(third > first);
    const size_t delta = third - first;
    std::cout << "delta =  " << delta << "\n";
    const size_t sizeAll = delta + sizeof(MyStandardLayout::num_3);
    std::cout << "sizeAll =  " << sizeAll << "\n";

    std::vector<char> buf( sizeAll, 0 );
    memcpy(&pObj->num_1, &buf[0], sizeAll);
}

int main()
{
    MyStandardLayout obj;
    obj.print();
    ZeroInts(&obj);
    obj.print();

    return 0;
}

Given the wording in the C++ Standard:

9.2 Class Members

...

13 Nonstatic data members of a (non-union) class with the same access control (Clause 11) are allocated so that later members have higher addresses within a class object. (...) Implementation alignment requirements might cause two adjacent members not to be allocated immediately after each other; (...)

I would conclude that it is guaranteed that num_1 to num_3 have increasing addresses and are adjacent modulo padding.
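The ordering and the amount of padding can be inspected at compile time. A minimal sketch, assuming a stripped-down copy of the struct (called `Layout` here, a name not in the original) and two hypothetical helper functions; the concrete numbers are implementation-defined:

```cpp
#include <cassert>      // assert, for callers that check the results at run time
#include <cstddef>      // offsetof, std::size_t
#include <cstdint>      // fixed-width integer types
#include <type_traits>  // std::is_standard_layout

// Hypothetical stand-in for MyStandardLayout: same members, no constructor or print().
struct Layout {
    char         mem_a;
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
    char         mem_z;
};

static_assert(std::is_standard_layout<Layout>::value,
              "offsetof is only guaranteed to work for standard-layout classes");

// 9.2/13: members with the same access control get increasing addresses.
static_assert(offsetof(Layout, num_1) < offsetof(Layout, num_2), "num_1 precedes num_2");
static_assert(offsetof(Layout, num_2) < offsetof(Layout, num_3), "num_2 precedes num_3");

// Number of bytes from the start of num_1 to the end of num_3, padding included.
constexpr std::size_t span_num_1_to_num_3() {
    return offsetof(Layout, num_3) + sizeof(std::int64_t) - offsetof(Layout, num_1);
}

// Number of padding bytes inside that span (implementation-defined, may be 0).
constexpr std::size_t padding_in_span() {
    return span_num_1_to_num_3()
         - (sizeof(std::int16_t) + sizeof(std::int32_t) + sizeof(std::int64_t));
}
```

`span_num_1_to_num_3()` is the same quantity the example computes as `sizeAll`; the sketch only confirms the ordering and the size, it says nothing about whether a single memcpy over that span is defined.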

For the above example to be fully defined, I see these requirements, and I am not sure whether they hold:

  • memcpy must be allowed to write to multiple "memory objects" in this way at once, i.e. specifically

    • Calling memcpy with the target address of num_1 and a size that is larger than the size of the num_1 "object" is legal. (Given that num_1 is not part of an array.) (The question "Is memcpy(&a + 1, &b + 1, 0) defined in C11?" seems a good related question, but doesn't quite fit.)
    • The C++ (14) Standard, AFAICT, defers the description of memcpy to the C99 Standard, and that one states:

    7.21.2.1 The memcpy function

    2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

    So for me the question is whether the target range we have here can be considered "an object" according to the C or C++ Standard. Note: A (part of an) array of chars, declared and defined as such, can certainly be assumed to count as "an object" for the purposes of memcpy, because I'm pretty sure I'm allowed to copy from one part of a char array to another part of (another) char array.

    So then the question would be if it is legal to reinterpret the memory range of the three members as a "conceptual"(?) char array.

  • Calculating sizeAll is legal, that is usage of offsetof is legal as shown.

  • Writing to the padding in between the members is legal.

Do these properties hold? Have I missed anything else?
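Whatever the answer to these questions, the same effect can be had without depending on them by copying into each member separately, so that every memcpy targets exactly one complete object. A minimal sketch, assuming a stripped-down copy of the struct (`Layout`) and a hypothetical helper `blit_ints`, neither of which is in the original; the source buffer is assumed to mirror the in-object layout of num_1 to num_3, padding included:

```cpp
#include <cassert>   // assert, for the usage shown below
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // fixed-width integer types
#include <cstring>   // std::memcpy

// Hypothetical stand-in for MyStandardLayout: same members, no constructor or print().
struct Layout {
    char         mem_a;
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
    char         mem_z;
};

// Copies the bytes for num_1, num_2 and num_3 out of buf, one member at a time.
// Assumption: buf holds at least (offsetof(num_3) + sizeof(num_3) - offsetof(num_1))
// bytes, arranged exactly like the members inside a Layout object.
inline void blit_ints(Layout& obj, const unsigned char* buf) {
    const std::size_t base = offsetof(Layout, num_1);
    std::memcpy(&obj.num_1, buf + (offsetof(Layout, num_1) - base), sizeof obj.num_1);
    std::memcpy(&obj.num_2, buf + (offsetof(Layout, num_2) - base), sizeof obj.num_2);
    std::memcpy(&obj.num_3, buf + (offsetof(Layout, num_3) - base), sizeof obj.num_3);
    // Padding bytes between the members are never written.
}
```

Usage would look like `unsigned char zeros[sizeof(Layout)] = {0}; blit_ints(obj, zeros);`, after which `assert(obj.num_1 == 0)` and the analogous checks for num_2 and num_3 hold, while mem_a and mem_z are untouched.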


Answer 1:


§8.5

(6.2) — if T is a (possibly cv-qualified) non-union class type, each non-static data member and each base-class subobject is zero-initialized and padding is initialized to zero bits;

Now the standard does not actually say that these zero-bits will be writeable, but I can't think of an architecture that has this level of granularity on memory access permissions (nor would we want one to).

So I would say that in practice rewriting these zeros will always be safe, even if not specifically declared so by the Powers that Be.




Answer 2:


is legal to reinterpret the memory range of the three members as a "conceptual"(?) char array

No, arbitrary subsets of members of objects are not themselves an object of any kind. If you can't take the sizeof something, it's not a thing. Similarly, as suggested by the link you provided, if you can't identify the thing to std::is_standard_layout, it's not a thing.

Analogous would be

size_t n = (char*)&num_3 - (char*)&num_1;

It would compile, but it's UB: subtracted pointers must belong to the same object.

That said, I think you're in safe territory even if the standard isn't explicit. If MyStandardLayout is a standard-layout type, it stands to reason that a subset of it also is, even if it has no name and is not an identifiable type of its own.

But I wouldn't do it. Assignment is absolutely safe, and potentially faster than memcpy. If the subset is meaningful and has many members, I would consider making it an explicit struct and using assignment instead of memcpy, taking advantage of the default member-wise copy assignment operator supplied by the compiler.
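A minimal sketch of that suggestion, with hypothetical names (`Nums`, `Record`, `zero_nums`, `copy_nums`) that are not in the original:

```cpp
#include <cassert>   // assert, for the usage shown below
#include <cstdint>   // fixed-width integer types

// The three integers grouped into a named type of their own.
struct Nums {
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
};

struct Record {
    char mem_a;
    Nums nums;   // the group is now a real sub-object
    char mem_z;
};

// Zero all three integers with one assignment.
// Nums() is a value-initialized temporary, so every member of it is zero.
inline void zero_nums(Record& r) {
    r.nums = Nums();
}

// Copy the group from one record to another using the implicitly
// generated member-wise copy assignment operator.
inline void copy_nums(Record& dst, const Record& src) {
    dst.nums = src.nums;
}
```

After `Record r = {'a', {1, 2, 3}, 'z'}; zero_nums(r);` the three integers are zero and `assert(r.mem_a == 'a' && r.mem_z == 'z')` still holds. Because `nums` is a genuine object, no question about writing across several objects arises.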




Answer 3:


Putting this as a partial answer wrt. memcpy(&num_1, buf, sizeAll):

Note: James' answer is much more concise and definitive.

I asked:

  • memcpy must be allowed to write to multiple "memory objects" in this way at once, i.e. specifically

    • Calling memcpy with the target address of num_1 and a size that is larger than the size of the num_1 "object" is legal.
    • The C++ (14) Standard, AFAICT, defers the description of memcpy to the C99 Standard, and that one states:

    7.21.2.1 The memcpy function

    2 The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.

    So for me the question here wrt. this is whether the target range we have here can be considered "an object" according to the C or C++ Standard.

Thinking and searching a bit more, I found in the C Standard:

§ 6.2.6 Representations of types

§ 6.2.6.1 General

2 Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

So at least it is implied that "an object" => "contiguous sequence of bytes".

I'm not so bold as to claim that the inverse -- "contiguous sequence of bytes" => "an object" -- holds, but at least "an object" doesn't seem to be defined more strictly here.

Then, as quoted in Q, §9.2/13 of the C++ Standard (and § 1.8/5) seem to guarantee that we do have a contiguous sequence of bytes (including padding).

Then, §3.9/3 says:

3 For any trivially copyable type T, if two pointers to T point to distinct T objects obj1 and obj2, where neither obj1 nor obj2 is a base-class subobject, if the underlying bytes (1.7) making up obj1 are copied into obj2, obj2 shall subsequently hold the same value as obj1. [ Example:

T* t1p;
T* t2p;       
     // provided that t2p points to an initialized object ...         
std::memcpy(t1p, t2p, sizeof(T));  
     // at this point, every subobject of trivially copyable type in *t1p contains        
     // the same value as the corresponding subobject in *t2p

—end example ]

So this explicitly allows the application of memcpy to whole objects of Trivially Copyable types.
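For reference, the case that §3.9/3 covers without any doubt looks like this. A minimal sketch, assuming a hypothetical struct `Nums` holding the three integers and a hypothetical helper `copy_whole`:

```cpp
#include <cassert>      // assert
#include <cstdint>      // fixed-width integer types
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical type grouping the three integers into one object.
struct Nums {
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
};

static_assert(std::is_trivially_copyable<Nums>::value,
              "3.9/3 only applies to trivially copyable types");

// Copies the complete object representation of src into dst.
// Both pointers refer to whole Nums objects, which is the case 3.9/3 describes.
inline void copy_whole(Nums& dst, const Nums& src) {
    assert(&dst != &src);  // memcpy requires that the two ranges do not overlap
    std::memcpy(&dst, &src, sizeof(Nums));
}
```

After `copy_whole(b, a)`, every member of `b` holds the same value as the corresponding member of `a`.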

In the example, the three members comprise a "trivially copyable sub-object", and indeed I think wrapping them in an actual subobject of distinct type would still mandate exactly the same memory layout for the explicit object as for the three members:

struct MyStandardLayout_Flat {
    char mem_a;
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
    char mem_z;
};

struct MyStandardLayout_Sub {
    int16_t num_1;
    int32_t num_2;
    int64_t num_3;
};

struct MyStandardLayout_Composite {
    char mem_a;
    // Note that the padding here is different from the padding in MyStandardLayout_Flat, but that doesn't change how num_* are laid out.
    MyStandardLayout_Sub nums;
    char mem_z;
};

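By that reasoning, nums in _Composite and the three members of _Flat should be laid out identically, because the same basic rules apply. (This holds only if the padding between num_1 and num_2 comes out the same in both; that padding depends on the offset at which num_1 starts, so it is worth checking with offsetof.)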
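Whether the two layouts really coincide can be checked rather than assumed. A sketch that repeats the three structs and adds hypothetical helpers (`RelOffsets`, `flat_rel`, `sub_rel`, `same_relative_layout`) that are not in the original. Note that on an implementation where int16_t, int32_t and int64_t are aligned to 2, 4 and 8 bytes and each member is placed at the first suitably aligned offset, num_1 starts at offset 2 in _Flat (so num_2 follows 2 bytes later) but at offset 0 in _Sub (so num_2 follows 4 bytes later), and the check reports a difference:

```cpp
#include <cassert>   // assert, for callers that check the results at run time
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // fixed-width integer types

struct MyStandardLayout_Flat {
    char mem_a;
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
    char mem_z;
};

struct MyStandardLayout_Sub {
    std::int16_t num_1;
    std::int32_t num_2;
    std::int64_t num_3;
};

struct MyStandardLayout_Composite {
    char mem_a;
    MyStandardLayout_Sub nums;
    char mem_z;
};

// Hypothetical helper type: distances in bytes from the start of num_1.
struct RelOffsets {
    std::size_t to_num_2;
    std::size_t to_num_3;
};

// Distances measured inside the flat struct.
inline RelOffsets flat_rel() {
    const std::size_t base = offsetof(MyStandardLayout_Flat, num_1);
    RelOffsets r = { offsetof(MyStandardLayout_Flat, num_2) - base,
                     offsetof(MyStandardLayout_Flat, num_3) - base };
    return r;
}

// Distances measured inside the wrapped sub-object.
inline RelOffsets sub_rel() {
    const std::size_t base = offsetof(MyStandardLayout_Sub, num_1);
    RelOffsets r = { offsetof(MyStandardLayout_Sub, num_2) - base,
                     offsetof(MyStandardLayout_Sub, num_3) - base };
    return r;
}

// True only if num_2 and num_3 sit at the same distance from num_1 in both.
inline bool same_relative_layout() {
    return flat_rel().to_num_2 == sub_rel().to_num_2
        && flat_rel().to_num_3 == sub_rel().to_num_3;
}
```

If `same_relative_layout()` returns false on the target platform, bytes prepared for one layout cannot be blitted over the other, regardless of how the Standard question is answered.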

So in conclusion, given that the "sub-object" num_1 to num_3 will be represented by a contiguous sequence of bytes equivalent to that of a full Trivially Copyable sub-object, I:

  • have a very, very hard time imagining an implementation or optimizer that breaks this
  • would say it can be read either as:
    • Undefined Behavior, if we conclude that C++ §3.9/3 implies that only (full) objects of Trivially Copyable type may be treated this way by memcpy, or conclude from C99 §6.2.6.1/2 and the general specification of memcpy in 7.21.2.1 that the contiguous sequence of the num_* bytes does not constitute a "valid object" for the purposes of memcpy.
    • Defined Behavior, if we conclude that C++ §3.9/3 does not normatively limit the applicability of memcpy to other types or memory ranges, and conclude that the definition of memcpy (and of the term "object") in the C99 Standard allows adjacent variables to be treated as a single target object made of contiguous bytes.


Source: https://stackoverflow.com/questions/39026871/can-i-use-memcpy-to-write-to-multiple-adjacent-standard-layout-sub-objects
