Say I have implemented all of the ADD, AND, SHF, JUMP, BR, LDW, LDB (load word, load byte, ...) instructions, that is, everything except MUL (multiply), in an assembly machine. Now I want to write a
It seems you are using an 8/16-bit processor similar to the 8080, 6502, 6800 and the like. Yes, an 8-iteration loop of shifts and adds is enough and almost optimal. On the other hand, if you can spare about 1 KB (511 two-byte entries, i.e. 1022 bytes) for a constant table, the approach based on the following formula could be the fastest one:
a*b = square(a+b)/4 - square(a-b)/4
If the arguments are unsigned 8-bit values, the maximum of a+b is 510. You only need to store the integer part of x**2/4 for each x: a+b and a-b always have the same parity, so the fractional parts (0 or 1/4) in the formula cancel each other. If a-b is negative, index the table by its absolute value. The mapping is therefore 0 -> 0, 1 -> 0, 2 -> 1, 3 -> 2, 4 -> 4, ..., 510 -> 65025. For signed arguments, the table is about half the size.
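As a rough illustration of the table approach, here is a minimal C sketch. The names qsq, qsq_init and mul8_qsq are made up for this example, and filling the table at startup is an assumption made for brevity; on a real 8-bit target the table would more likely be a precomputed constant in ROM, and the lookups would be written in assembly.

```c
#include <stdint.h>

/* Quarter-square table: qsq[x] = floor(x*x / 4) for x = 0..510.
   511 entries of 16 bits each; the largest is 510*510/4 = 65025. */
static uint16_t qsq[511];

/* Fill the table once (hypothetical helper; a ROM table would
   normally be generated ahead of time instead). */
void qsq_init(void)
{
    for (uint32_t x = 0; x <= 510; x++) {
        qsq[x] = (uint16_t)((x * x) / 4);
    }
}

/* Multiply two unsigned 8-bit values using two table lookups
   and one 16-bit subtraction. */
uint16_t mul8_qsq(uint8_t a, uint8_t b)
{
    uint16_t sum  = (uint16_t)(a + b);                        /* 0..510 */
    uint16_t diff = (uint16_t)((a >= b) ? (a - b) : (b - a)); /* |a - b| */
    return (uint16_t)(qsq[sum] - qsq[diff]);
}
```

After calling qsq_init() once, mul8_qsq(a, b) costs one add, one subtract with absolute value, two 16-bit loads and one 16-bit subtract, with no loop at all.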
There are many other approaches to fast multiplication, including some with almost linear cost; see e.g. volume 2 of Donald Knuth's legendary book series "The Art of Computer Programming". But all of them have far too much overhead for 8-bit arguments.
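For comparison, the 8-iteration shift-and-add loop mentioned at the start of this answer might look roughly like the following in C. This is only a sketch of the idea, and the name mul8_shift_add is made up; on the actual machine each step would map onto its own shift, AND, ADD and branch instructions.

```c
#include <stdint.h>

/* Multiply two unsigned 8-bit values with eight shift-and-add steps. */
uint16_t mul8_shift_add(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend  = a;      /* a shifted left by the current bit position */

    for (int i = 0; i < 8; i++) {
        if (b & 1) {           /* lowest remaining bit of b is set */
            product = (uint16_t)(product + addend);
        }
        addend = (uint16_t)(addend << 1);
        b >>= 1;               /* move on to the next bit of b */
    }
    return product;
}
```

The loop could exit early once b becomes zero, at the cost of one extra test per iteration.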