Why does push_back() cause crash within malloc()'ed data?

Question

Why does this crash? I found out that malloc() doesn't call constructors, so I called them manually myself, but it still crashes and I don't understand why.

PS. I know std::vector and new[] exist. Do not tell me to use vectors/new[] as an answer.

#include <cstdlib>  // malloc
#include <vector>
using std::vector;

struct MyStruct {
    vector<int> list;
};
void make_crash(){
    MyStruct *array = (MyStruct *)malloc(100*sizeof(MyStruct));
    MyStruct element; // initialize element here since malloc() doesn't do it.
    array[0] = element; // copy, everything should be alright?
    array[0].list.push_back(1337); // nope, BANG!
    // The above line produces these:
    // First-chance exception at 0x7c970441 in test.exe: 0xC0000005: Access violation reading location 0xbaadf005.
    // First-chance exception at 0x00401cd0 in test.exe: 0xC0000005: Access violation reading location 0xbaadf00d.
    // Unhandled exception at 0x00401cd0 in test.exe: 0xC0000005: Access violation reading location 0xbaadf00d.
}

Answer 1:

When you assign to a MyStruct

array[0] = element;

there is first an attempt to destroy the old members of the struct, but there aren't any, because they were never constructed. Boom!

The easiest way to get a hundred MyStructs is to use another vector

vector<MyStruct>  v(100);

No need to use malloc.

Answer 2:

On the line array[0] = element; you're invoking the operator= of array[0]. Since array[0] is uninitialized, this is undefined behavior. Calling any method or operator, including operator=, on an object whose constructor has not been invoked is undefined behavior.

To fix your issue, you'd either need to use placement new to invoke the constructor of array[0] or just use new instead of malloc. Unless you have a good reason to use malloc, the latter is greatly preferable (or even better: use a vector).
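The placement-new route for a single element might look like the sketch below. The function name push_into_malloced_struct is made up for illustration and is not part of the question's code. The sketch also ignores the leak that would occur if push_back threw.

```cpp
#include <cassert>
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new, std::bad_alloc
#include <vector>

struct MyStruct {
    std::vector<int> list;
};

// Hypothetical helper: builds one MyStruct inside malloc()'ed storage,
// uses it, and tears it down again. Returns the value that was pushed.
int push_into_malloced_struct() {
    void* raw = std::malloc(sizeof(MyStruct));
    if (raw == nullptr) throw std::bad_alloc();

    // Placement new runs the constructor in the raw storage, so the
    // vector member is a real, empty vector from here on.
    MyStruct* obj = new (raw) MyStruct;

    obj->list.push_back(1337);   // safe: the object exists now
    assert(obj->list.size() == 1);
    int value = obj->list.back();

    obj->~MyStruct();            // run the destructor by hand
    std::free(raw);              // then release the raw storage
    return value;
}
```

Calling push_into_malloced_struct() returns 1337 instead of crashing, because the vector member was constructed before it was used and destroyed before the memory was freed.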

Answer 3:

That's not how you initialize an element in-place. The (implicitly created) assignment operator, which calls vector's assignment operator, is being called on an object that doesn't exist, which is obviously bad news.

You have to use placement new instead:

new (array) MyStruct;

For arrays:

new (array) MyStruct[100];
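As an alternative to the array form, each slot can be constructed with its own placement new. This may be the more portable option: the language rules have historically allowed array new-expressions to ask for extra bookkeeping space, so a block of exactly 100*sizeof(MyStruct) is not guaranteed to be large enough for the array form on every implementation. The helper names below (make_structs, destroy_structs, demo_array) are made up for this sketch, and it assumes MyStruct's default constructor does not throw.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>   // std::malloc, std::free
#include <new>       // placement new, std::bad_alloc
#include <vector>

struct MyStruct {
    std::vector<int> list;
};

// Hypothetical helper: default-constructs `count` MyStructs, one placement
// new per slot, in storage obtained from malloc().
MyStruct* make_structs(std::size_t count) {
    assert(count > 0);
    void* block = std::malloc(count * sizeof(MyStruct));
    if (block == nullptr) throw std::bad_alloc();

    MyStruct* first = static_cast<MyStruct*>(block);
    for (std::size_t i = 0; i < count; ++i) {
        // Assumption: this constructor does not throw, so no rollback here.
        new (first + i) MyStruct;
    }
    return first;
}

// Hypothetical counterpart: destroys the objects last to first, then
// hands the storage back to free().
void destroy_structs(MyStruct* first, std::size_t count) {
    for (std::size_t i = count; i > 0; --i) {
        first[i - 1].~MyStruct();
    }
    std::free(first);
}

// Usage: every element is a live object, so push_back is fine anywhere.
int demo_array() {
    MyStruct* array = make_structs(100);
    array[0].list.push_back(1337);
    array[99].list.push_back(42);
    int sum = array[0].list[0] + array[99].list[0];
    destroy_structs(array, 100);
    return sum;
}
```

The pointer passed to free() is the same address malloc() returned, since the first element is constructed at the very start of the block.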

Answer 4:

MyStruct *array = (MyStruct *)malloc(100*sizeof(MyStruct));

This is where you go wrong.

array is not a pointer to one or more MyStruct objects, regardless of whatever type you gave it. The return value from malloc is a void*. The rules of C++ do not allow an implicit conversion from a void* to other pointer types, which is why you had to put that (MyStruct*) in there. The need for an explicit cast alone should tell you that you're doing something shady.

The rules of C++ state that if you explicitly cast a void* to some Type* (outside of certain, special types), this is only legal if the void* you're doing the cast on was originally a Type* that was itself cast into a void*. This is not the case here; this void* comes from malloc and never was a MyStruct*. You're lying to the compiler, and therefore provoking undefined behavior. Hence the crashing.

If you want defined behavior, then you need to actually use C++, instead of this "I can't believe it's not C++" language you're inventing. For example:

void *block = malloc(100 * sizeof(MyStruct));
MyStruct* array = new(block) MyStruct[100];

Notice the complete lack of cast operations here.

Of course, deleting this array is a pain:

for(int i = 99; i >= 0; --i)
  array[i].~MyStruct();

free(block);

Notice that you have to destroy them backwards, in the reverse order that they were constructed in.

The asker followed up: "I'm wondering how these things even work internally. Does new[] make some extra memory telling my CPU something that malloc doesn't? And why/what/how to imitate it without new[] or placement new? How do I write it in raw binary code for the CPU? Is it even possible? What exactly is the magic new[] does here?"

How all of that is carried out is implementation dependent. What happens, as far as the language is concerned, is clearly defined. Placement new will, among other things, call the constructor for the object. Array placement new will call the constructors for all of the objects in the array, in order from first to last. If one of them throws, the destructors of the previously constructed objects are called, and then the exception propagates.
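The construct-then-roll-back behavior described above can be imitated by hand with one placement new per element. This is only a sketch of the observable behavior, not what any particular compiler emits. Fragile, construct_all and rollback_demo are hypothetical names invented for the example; Fragile is rigged to throw on its fourth construction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>    // std::malloc, std::free
#include <new>        // placement new, std::bad_alloc
#include <stdexcept>

// Hypothetical test type: counts live instances and throws when its
// constructor is entered for the fourth time.
struct Fragile {
    static int live;
    static int attempts;
    Fragile() {
        if (++attempts == 4) throw std::runtime_error("constructor failed");
        ++live;
    }
    ~Fragile() { --live; }
};
int Fragile::live = 0;
int Fragile::attempts = 0;

// Constructs `count` objects in `storage`, first to last. If a constructor
// throws, the objects built so far are destroyed (last to first) and the
// exception is rethrown to the caller.
template <typename T>
T* construct_all(void* storage, std::size_t count) {
    assert(storage != nullptr);
    T* first = static_cast<T*>(storage);
    std::size_t built = 0;
    try {
        for (; built < count; ++built) {
            new (first + built) T;
        }
    } catch (...) {
        while (built > 0) {
            first[--built].~T();
        }
        throw;
    }
    return first;
}

// Returns true if the failure was reported and no object was left alive.
bool rollback_demo() {
    Fragile::live = 0;
    Fragile::attempts = 0;
    void* block = std::malloc(10 * sizeof(Fragile));
    if (block == nullptr) throw std::bad_alloc();

    bool caught = false;
    try {
        construct_all<Fragile>(block, 10);
    } catch (const std::runtime_error&) {
        caught = true;   // three objects were built, then destroyed again
    }
    std::free(block);
    return caught && Fragile::live == 0 && Fragile::attempts == 4;
}
```

Here the fourth constructor call throws, the three objects already built are destroyed, and only then does the caller see the exception, so the raw block can be freed without leaking live objects.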

The mechanics are implementation dependent, no more so than how inheritance is implemented and so forth. Obviously the compiler emits some code for it, but exactly what is emitted varies by implementation.

Writing it "in raw binary code for CPU" is not possible, so long as you're actually writing C++. To implement placement new, you would have to be able to call the class's constructor. And... well, that's what placement new is for. You're not allowed to get even a member pointer to the constructor (and even if you could, the contents of member pointers are implementation dependent, and they're not always going to be a naked pointer to some assembly function). So there's no way to even identify the constructor code without platform-specific jury-rigging.

You can learn how to do it for a particular system by looking at the generated assembly for a call to placement new. But it would be different on every compiler.

Source: https://stackoverflow.com/questions/10857402/why-does-push-back-cause-crash-within-malloced-data

Tags

c++

visual-c++

malloc

stdvector