What I'm trying to achieve: based on each bit in a byte, set the corresponding dword in a ymm register (or memory location) to all ones.
e.g.
al = 0110 0001
Preface: I know this doesn't fulfill all the requirements of the question, so it isn't an acceptable answer. I'm posting it only for future reference.
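There is a newer AVX-512 instruction named VPMOVM2B (it needs AVX512BW, plus AVX512VL for the ymm form) which does what you want in one instruction once the mask is in a k register. For dword elements, as in the question, the counterpart is VPMOVM2D (AVX512DQ, plus AVX512VL for the ymm form). The byte form looks like this: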
VPMOVM2B ymm1, k1
Sets each byte in YMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1.
I couldn't test it, but it should be what you want.
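For the dword case in the question, here is a minimal C sketch of the same idea. It assumes the `_mm256_movm_epi32` intrinsic (which maps to VPMOVM2D) is available when the compiler targets AVX512DQ and AVX512VL, and otherwise falls back to a plain loop that models the same per-lane result. The function name `expand_mask_to_dwords` is made up for illustration.

```c
#include <stdint.h>
#if defined(__AVX512DQ__) && defined(__AVX512VL__)
#include <immintrin.h>
#endif

/* Hypothetical helper: expand each bit of `mask` into one 32-bit lane.
   Bit i set   -> out[i] = 0xFFFFFFFF
   Bit i clear -> out[i] = 0 */
static void expand_mask_to_dwords(uint8_t mask, uint32_t out[8])
{
#if defined(__AVX512DQ__) && defined(__AVX512VL__)
    /* Intended to compile to VPMOVM2D ymm, k (after moving the mask into k). */
    __m256i v = _mm256_movm_epi32((__mmask8)mask);
    _mm256_storeu_si256((__m256i *)out, v);
#else
    /* Portable fallback: a scalar model of the same result. */
    for (int i = 0; i < 8; ++i)
        out[i] = ((mask >> i) & 1u) ? 0xFFFFFFFFu : 0u;
#endif
}
```

With the question's example value `al = 0110 0001` (0x61), bits 0, 5 and 6 are set, so lanes 0, 5 and 6 come out as all ones and the remaining lanes as zero.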