I'm writing a program with a consumer thread and a producer thread. Queue synchronization now seems to be a big overhead in the program, and I looked for some lock free que
I think the allocator could be the performance problem. You can try a custom multithreaded memory allocator that uses a linked list to maintain freed blocks. If your blocks are not (nearly) the same size, you can implement a "buddy system" memory allocator, which is very fast. You have to synchronise your queue (ring buffer) with a mutex.
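To make the linked-list idea concrete, here is a minimal sketch of a fixed-size block pool, assuming all blocks are (nearly) the same size. `BlockPool` and its members are names I made up for illustration, and the single mutex is just the simplest way to make it usable from both threads; it is not a tuned allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool. Freed blocks are threaded onto an
// intrusive singly linked list, so allocate/deallocate are O(1).
class BlockPool {
    struct Node { Node* next; };

public:
    BlockPool(std::size_t block_size, std::size_t block_count)
        : block_size_(round_up(block_size)),
          storage_(block_size_ * block_count) {
        // Put every block on the free list up front (no lock needed yet).
        for (std::size_t i = 0; i < block_count; ++i)
            push(storage_.data() + i * block_size_);
    }

    // Returns nullptr when the pool is exhausted.
    void* allocate() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_list_ == nullptr) return nullptr;
        Node* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    // p must have come from allocate() on this pool.
    void deallocate(void* p) {
        assert(p != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        push(p);
    }

private:
    // Round the block size up so each block can hold a Node and stays
    // aligned for any fundamental type (assumes the vector's buffer is
    // itself aligned to max_align_t, as with the default allocator).
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(Node)) n = sizeof(Node);
        return (n + a - 1) / a * a;
    }

    void push(void* p) {
        free_list_ = ::new (p) Node{free_list_};
    }

    std::size_t block_size_;
    std::vector<unsigned char> storage_;
    Node* free_list_ = nullptr;
    std::mutex mutex_;
};
```

The most recently freed block is handed out first, which tends to be cache-friendly. A buddy allocator for mixed sizes needs more bookkeeping and is not shown here.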
To avoid too much synchronisation, you can try writing/reading multiple values to/from the queue at each access.
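A rough sketch of that batching idea, assuming a mutex-protected queue: the lock is taken once per batch instead of once per item. `BatchQueue`, `push_batch` and `pop_batch` are hypothetical names, and I use a `std::deque` here instead of a ring buffer only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical mutex-protected queue that moves items in batches.
template <typename T>
class BatchQueue {
public:
    // Producer side: move a whole batch in under a single lock.
    // The caller's vector is left empty and can be reused.
    void push_batch(std::vector<T>& batch) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (T& item : batch) items_.push_back(std::move(item));
        batch.clear();
    }

    // Consumer side: take up to max_items under a single lock.
    // Returns an empty vector if nothing is queued.
    std::vector<T> pop_batch(std::size_t max_items) {
        assert(max_items > 0);
        std::vector<T> out;
        std::lock_guard<std::mutex> lock(mutex_);
        while (!items_.empty() && out.size() < max_items) {
            out.push_back(std::move(items_.front()));
            items_.pop_front();
        }
        return out;
    }

private:
    std::mutex mutex_;
    std::deque<T> items_;
};
```

The producer fills a local vector without any locking and only calls `push_batch` when it is full (or on a timeout), so the lock cost is spread over many items. The trade-off is extra latency for the items waiting in the local batch.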
If you still want to use lock-free algorithms, then you must use pre-allocated data or a lock-free allocator. There is a paper about a lock-free allocator, "Scalable Lock-Free Dynamic Memory Allocation", and an implementation, Streamflow.
Before starting with lock-free stuff, look at: Circular lock-free buffer
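For the one-producer/one-consumer case, a circular buffer over pre-allocated storage could look roughly like the sketch below. This is my own minimal version, not the code from the linked post; `SpscRing`, `try_push` and `try_pop` are made-up names, and it is only safe with exactly one producer thread and one consumer thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-producer/single-consumer ring buffer.
// All slots are allocated once in the constructor, so push/pop never
// touch the allocator. One slot is kept empty to tell "full" from
// "empty", so it holds at most capacity - 1 items.
template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : slots_(capacity) {
        assert(capacity >= 2);
    }

    // Call from the producer thread only. Returns false if full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % slots_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[tail] = value;
        // Release: the slot write above becomes visible before the new tail.
        tail_.store(next, std::memory_order_release);
        return true;
    }

    // Call from the consumer thread only. Returns false if empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[head];
        // Release: the slot read above completes before the slot is reused.
        head_.store((head + 1) % slots_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> slots_;              // pre-allocated storage
    std::atomic<std::size_t> head_{0};  // next slot to read (consumer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to write (producer-owned)
};
```

Because each index is written by only one thread, plain atomic loads and stores with acquire/release ordering are enough and no compare-and-swap is needed. With more than one producer or consumer this sketch is no longer correct.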