Truth-table reduction to ternary logic operations, vpternlog

梦谈多话 2021-02-08 19:36

I have many truth tables of many variables (7 or more), and I use a tool (e.g. Logic Friday 1) to simplify the logic formula. I could do that by hand but that is much too error pro

2 answers
  •  刺人心 (original poster)
    2021-02-08 20:16

    Outside of just leaving it to the compiler, or the hand-wavy suggestions in the 2nd section of my answer, see HJLebbink's self-answer using FPGA logic-optimization tools. (This answer ended up with the bounty because it failed to attract such an answer from anyone else; it's not really bounty-worthy. :/ I wrote it before there was a bounty, but don't have anything else useful to add.)


    ICC18 optimizes chained _mm512_and/or/xor_epi32 intrinsics into vpternlogd instructions, but gcc/clang don't.

    On Godbolt for this and a more complicated function using some inputs multiple times:

    #include <immintrin.h>

    // Note: g is unused. The & operator on __m512i relies on GNU C vector extensions.
    __m512i logic(__m512i a, __m512i b, __m512i c,
                  __m512i d, __m512i e, __m512i f, __m512i g) {
        // return _mm512_and_epi32(_mm512_and_epi32(a, b), c);
        return a & b & c & d & e & f;
    }
    

    gcc (nightly build) -O3 -march=skylake-avx512

    logic:
        vpandq  zmm4, zmm4, zmm5
        vpandq  zmm3, zmm2, zmm3
        vpandq  zmm4, zmm4, zmm3
        vpandq  zmm0, zmm0, zmm1
        vpandq  zmm0, zmm4, zmm0
        ret
    

    ICC18 -O3 -march=skylake-avx512

     logic:
        vpternlogd zmm2, zmm0, zmm1, 128                        #6.21
        vpternlogd zmm4, zmm2, zmm3, 128                        #6.29
        vpandd    zmm0, zmm4, zmm5                              #6.33
        ret                                                     #6.33
    

    I don't know how good ICC is at picking optimal solutions when each variable is used more than once in different subexpressions.
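
    For readers unfamiliar with the immediate: the 128 in the ICC output above is an 8-entry truth table, indexed by the three input bits with the first operand as the most significant index bit. A three-input AND has only entry 7 set, hence 0x80 = 128. The sketch below is a scalar model of that lookup on 64-bit integers. `ternlog64` and the `TL_*` constants are names made up for this illustration, not intrinsics.

```c
#include <stdint.h>

/* Bit k of each mask is the value that input takes at truth-table
   index k = (a << 2) | (b << 1) | c. Applying a boolean expression to
   these three masks yields the imm8 for that expression. */
enum { TL_A = 0xF0, TL_B = 0xCC, TL_C = 0xAA };

/* Scalar model of one vpternlog applied to 64 bit positions at once
   (hypothetical helper for illustration only). */
static uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t result = 0;
    for (unsigned k = 0; k < 8; k++) {
        if (!((imm8 >> k) & 1u))
            continue;                     /* table entry k is 0 */
        uint64_t ma = (k & 4u) ? a : ~a;  /* positions where a equals bit 2 of k */
        uint64_t mb = (k & 2u) ? b : ~b;  /* positions where b equals bit 1 of k */
        uint64_t mc = (k & 1u) ? c : ~c;  /* positions where c equals bit 0 of k */
        result |= ma & mb & mc;
    }
    return result;
}
```

    With these masks, `TL_A & TL_B & TL_C` evaluates to 128, matching the ICC output, and `TL_A ^ TL_B ^ TL_C` gives the immediate for a three-input XOR.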


    To check whether the compiler does a good job, you have to do the optimization yourself and compare. You want to find sets of 3 variables that can be combined into a single boolean value, without still needing those 3 variables anywhere else in the expression.

    I think it's possible for a truth table with more than 3 inputs to not simplify this way, i.e. to a smaller truth table in which one column is the result of a ternary combination of 3 of the inputs. For example, I don't think every 4-input function is guaranteed to reduce to vpternlog + AND, OR, or XOR.

    I'd definitely worry that compilers might pick 3 inputs to combine that didn't result in as much simplification as a different choice of 3.

    It might even be optimal for a compiler to start with a binary operation or two on a couple of pairs to set up for a ternary operation, especially if that enables better ILP.

    You could probably write a brute-force truth-table optimizer that looked for triplets of variables that could be combined to make a smaller table for just the ternary result and the rest of the table. But I'm not sure a greedy approach is guaranteed to give the best results. If there are multiple ways to combine with the same total instruction count, they're probably not all equivalent for ILP (Instruction Level Parallelism).
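
    As a rough sketch of one step of such a brute-force search (my own illustration, not a tested optimizer): a triplet can be replaced by a single ternary result when, for every assignment of the remaining inputs, the 8-entry sub-table over the triplet is constant 0, constant 1, one fixed function h, or its complement. The function names and the truth-table encoding (a `uint64_t`, so at most 6 inputs) are assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* tt: truth table of an n-input function, n <= 6. Bit i holds the output for
   the input combination in which variable k has the value of bit k of i.
   On success, *h_out is the 8-entry table of the ternary function, indexed by
   (v2 << 2) | (v1 << 1) | v0; it is 0 if the output ignores the triplet. */
static bool triplet_is_separable(uint64_t tt, int n, int v0, int v1, int v2,
                                 uint8_t *h_out)
{
    unsigned bound = (1u << v0) | (1u << v1) | (1u << v2);
    uint8_t h = 0;
    bool have_h = false;
    for (unsigned r = 0; r < (1u << n); r++) {
        if (r & bound)
            continue;           /* visit each assignment of the other inputs once */
        uint8_t row = 0;
        for (unsigned k = 0; k < 8; k++) {
            unsigned i = r | ((k & 1u) << v0) | (((k >> 1) & 1u) << v1)
                           | (((k >> 2) & 1u) << v2);
            row |= (uint8_t)(((tt >> i) & 1u) << k);
        }
        if (row == 0x00 || row == 0xFF)
            continue;           /* the triplet does not matter for this row */
        if (!have_h) {
            h = row;
            have_h = true;
        } else if (row != h && row != (uint8_t)~h) {
            return false;       /* needs a second, unrelated ternary function */
        }
    }
    *h_out = h;
    return true;
}

/* Tries every triplet. Returns how many are separable and records the first. */
static int count_separable_triplets(uint64_t tt, int n, int first[3],
                                    uint8_t *first_h)
{
    int count = 0;
    for (int v0 = 0; v0 < n; v0++)
        for (int v1 = v0 + 1; v1 < n; v1++)
            for (int v2 = v1 + 1; v2 < n; v2++) {
                uint8_t h;
                if (!triplet_is_separable(tt, n, v0, v1, v2, &h))
                    continue;
                if (count++ == 0) {
                    first[0] = v0; first[1] = v1; first[2] = v2;
                    *first_h = h;
                }
            }
    return count;
}
```

    For a 4-input AND (`tt = 0x8000`) every triplet qualifies, with h = 0x80. For `d ? (a & b & c) : (a | b | c)` (`tt = 0x80FE`) none does, which at least rules out this particular form of reduction for that function. It says nothing about sequences that reuse an input, and taking the first separable triplet is exactly the kind of greedy choice that may not be optimal.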
