I\'m writing a program with a consumer thread and a producer thread, now it seems queue synchronization is a big overhead in the program, and I looked for some lock free que
You should look at FastFlow library
If you are worried about performance, adding malloc() to the mix won't help things. And if you are not worried about performance, why not simply control access to the queue via a mutex. Have you actually measured the performance of such an implementation? It sounds to me as though you are going down the familar route of premature optimisation.
I think the allocator can be a performance problem. You can try to use a custom multithreaded memory allocator, that use a linked-list for maintaing freed blocks. If your blocks are not (nearly) the same size, you can implement a "Buddy system memory allocator", witch is very fast. You have to synchronise your queue (ring buffer) with a mutex.
To avoid too much synchronisation, you can try write/read multiple values to/from the queue at each access.
If you still want to use, lock-free algorithms, then you must use pre-allocated data or use a lock-free allocator. There is a paper about a lock-free allocator "Scalable Lock-Free Dynamic Memory Allocation", and an implementation Streamflow
Before starting with Lock-free stuff, look at:Circular lock-free buffer