Fastest way to write a bitstream on modern x86 hardware

忘了有多久 2020-12-14 22:59

What is the fastest way to write a bitstream on x86/x86-64? (codeword <= 32 bits)

by writing a bitstream I refer to the process of concatenating variable bit-length

4 answers
  •  温柔的废话
    2020-12-14 23:10

    I don't have time to write it for you (and I'm not sure your sample is complete enough to do so), but if you must, I can think of:

    • Using translation tables for the various input/output bit-shift offsets. This optimization makes sense for fixed units of n bits, with n large enough (8 bits?) to expect performance gains. In essence, you'd be able to do

      destloc |= lookuptable[bits_left_in_buffer][input_offset][codeword];
      

    Disclaimer: this is very sloppy pseudocode; I just hope it conveys my idea of using a lookup table to avoid bit-shift arithmetic.
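For byte-sized units, the table idea could look roughly like the C sketch below. Everything in it is an assumption made for illustration and not part of the question: the names `shift_table`, `lut_writer` and `lut_put_byte`, the MSB-first bit order, and a zero-initialised buffer with one spare byte at the end. The table holds each byte value pre-shifted for each of the 8 possible bit offsets, so the writer only ORs two bytes into place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: shift_table[off][v] is byte v placed `off` bits into
 * a 16-bit window (MSB-first). 8 * 256 * 2 bytes = 4 KiB. */
static uint16_t shift_table[8][256];

static void init_shift_table(void) {
    for (unsigned off = 0; off < 8; off++)
        for (unsigned v = 0; v < 256; v++)
            shift_table[off][v] = (uint16_t)(v << (8 - off));
}

typedef struct {
    uint8_t *buf;    /* output buffer, must start out zeroed */
    size_t   cap;    /* buffer size in bytes */
    size_t   bitpos; /* number of bits written so far */
} lut_writer;

/* Append one 8-bit unit at the current bit position. */
static void lut_put_byte(lut_writer *w, uint8_t v) {
    size_t   byte = w->bitpos >> 3;
    unsigned off  = (unsigned)(w->bitpos & 7);
    uint16_t s    = shift_table[off][v];

    assert(byte + 1 < w->cap);               /* needs one byte of slack */
    w->buf[byte]     |= (uint8_t)(s >> 8);   /* bits that fit in the current byte */
    w->buf[byte + 1] |= (uint8_t)(s & 0xFF); /* bits spilling into the next byte */
    w->bitpos += 8;
}
```

Call `init_shift_table()` once before the first `lut_put_byte()`. Note that splitting the 16-bit entry still costs a shift, so whether this beats a plain shift-based writer is something only a measurement can tell.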

    • Writing it in assembly. I know i386 has XLAT, but then again, a good compiler might already use something like that. Also, XLAT seems limited to 8 bits and the AL register, so it's not really versatile.

    Update

    Warning: be sure to use a profiler and test your optimization for both correctness and speed. A lookup table can end up slower because of poor locality of reference. So you might need to pin the bit-streaming thread to a single core (set thread affinity) to get the benefits, and you might have to adapt the lookup table size to the processor's L2 cache.
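To check an optimized writer for correctness and to have a baseline to time it against, a plain shift-based reference helps. The C sketch below is one possible baseline, not a tuned implementation: the names `ref_writer`, `ref_put` and `ref_flush` are hypothetical, bits are assumed to be emitted MSB-first, and the caller is assumed to provide a large enough buffer. It collects codewords of up to 32 bits in a 64-bit accumulator and emits whole bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t *out;   /* destination buffer (caller guarantees enough room) */
    size_t   pos;   /* bytes written so far */
    uint64_t acc;   /* pending bits, right-aligned; higher bits are stale */
    unsigned nbits; /* valid bits in acc, always < 8 between calls */
} ref_writer;

/* Append the low `len` bits of `code` (1 <= len <= 32), MSB-first. */
static void ref_put(ref_writer *w, uint32_t code, unsigned len) {
    uint64_t mask = (1ULL << len) - 1;
    w->acc = (w->acc << len) | (code & mask);
    w->nbits += len;
    while (w->nbits >= 8) {           /* emit every complete byte */
        w->nbits -= 8;
        w->out[w->pos++] = (uint8_t)(w->acc >> w->nbits);
    }
}

/* Pad the final partial byte with zero bits and emit it. */
static void ref_flush(ref_writer *w) {
    if (w->nbits) {
        w->out[w->pos++] = (uint8_t)(w->acc << (8 - w->nbits));
        w->nbits = 0;
    }
}
```

Feeding the same codeword sequence to this writer and to the optimized one, then comparing the output buffers byte for byte, gives a simple correctness test before any timing is done.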

    Also, have a look at SIMD (e.g. SSE4) or GPU (CUDA) options if you know you'll have certain features at your disposal.
