I have a quantifiable & repeatable problem using the Task Parallel Library, BlockingCollection, ConcurrentQueue & GetCo
You shouldn't pass GetConsumingEnumerable() directly to Parallel.ForEach().
Use the GetConsumingPartitioner from the TPL extras (ParallelExtensionsExtras) instead.
The blog post also explains why GetConsumingEnumerable() doesn't work well here:
The partitioning algorithm employed by default by both Parallel.ForEach and PLINQ use chunking in order to minimize synchronization costs: rather than taking the lock once per element, it'll take the lock, grab a group of elements (a chunk), and then release the lock.
I.e., Parallel.ForEach waits until it has received a group of work items before continuing, which is exactly what your experiment shows.
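To make the partitioner idea concrete, here is a minimal sketch of what such a partitioner could look like. It is modeled on the approach described above rather than copied from the library, so treat the class names (BlockingCollectionSketch, ConsumingPartitioner) and the Demo helper as hypothetical. The idea is that every worker gets its own consuming enumerator over the same collection, so each worker takes one item at a time and nothing is buffered into chunks.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingCollectionSketch
{
    // Hypothetical stand-in for the GetConsumingPartitioner extension method.
    public static Partitioner<T> GetConsumingPartitioner<T>(this BlockingCollection<T> collection)
    {
        return new ConsumingPartitioner<T>(collection);
    }

    private sealed class ConsumingPartitioner<T> : Partitioner<T>
    {
        private readonly BlockingCollection<T> _collection;

        public ConsumingPartitioner(BlockingCollection<T> collection)
        {
            if (collection == null) throw new ArgumentNullException("collection");
            _collection = collection;
        }

        // Tell Parallel.ForEach it may ask for partitions on demand.
        public override bool SupportsDynamicPartitions { get { return true; } }

        public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
        {
            if (partitionCount < 1) throw new ArgumentOutOfRangeException("partitionCount");
            IEnumerable<T> shared = GetDynamicPartitions();
            // Each partition is a separate consuming enumerator over the same collection.
            return Enumerable.Range(0, partitionCount)
                             .Select(_ => shared.GetEnumerator())
                             .ToArray();
        }

        public override IEnumerable<T> GetDynamicPartitions()
        {
            // No chunking: each MoveNext() takes exactly one item from the collection.
            return _collection.GetConsumingEnumerable();
        }
    }

    // Hypothetical demo: one producer, Parallel.ForEach as the consumer.
    public static int Demo(int itemCount)
    {
        using (var queue = new BlockingCollection<int>())
        {
            Task producer = Task.Run(() =>
            {
                for (int i = 0; i < itemCount; i++) queue.Add(i);
                queue.CompleteAdding(); // lets the consuming enumerators finish
            });

            int processed = 0;
            Parallel.ForEach(queue.GetConsumingPartitioner(),
                             item => Interlocked.Increment(ref processed));

            producer.Wait();
            return processed;
        }
    }

    public static void Main()
    {
        Console.WriteLine("Processed {0} items", Demo(1000));
    }
}
```

On .NET 4.5 and later, Partitioner.Create(collection.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering) is a built-in way to get similar item-at-a-time behaviour; that is a different technique from the one named in the answer, but it may be worth trying if you would rather not pull in the extras.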