I\'m writing a program with a consumer thread and a producer thread, now it seems queue synchronization is a big overhead in the program, and I looked for some lock free que
Adding malloc would kill any performance gain you may make and a lock based structure would be just as effective. This is so because malloc requires some sort of CAS lock over the heap and hence some forms of malloc have their own lock so you may be locking in the Memory Manager.
To use malloc you would need to pre allocate all the nodes and manage them with another queue...
Note you can make some form of expandable array which would need to lock if it was expanded.
Also while interlocked are lock free on the CPU they do placea memory lock and block memory for the duration of the instruction and often stall the pipeline.