You have reinvented RISC
Hmm, the objection you raise against the classic x86 (a CISC design) is exactly what motivated the designers of the RISC CPUs to create simple instruction sets with aligned, fixed-size instructions.
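To make the front-end difference concrete, here is a minimal sketch under made-up assumptions: a fixed-width ISA with 4-byte instructions, and a toy variable-length ISA in which the first byte of each instruction holds its total length. Real x86 length decoding is far more involved (prefixes, opcode, ModRM, SIB, displacement and immediate fields all matter), so treat this only as an illustration. With the fixed encoding every instruction start is known up front; with the variable one, each start depends on decoding the instruction before it.

```python
from typing import List


def fixed_width_starts(code: bytes, width: int = 4) -> List[int]:
    """Start offsets for a fixed-width encoding: simply multiples of width.

    Every offset is computable independently, so several instructions can
    be decoded in parallel.
    """
    return list(range(0, len(code) - width + 1, width))


def variable_length_starts(code: bytes) -> List[int]:
    """Start offsets for a toy variable-length encoding.

    Hypothetical rule: the first byte of each instruction is its total
    length in bytes (1..15, mirroring x86's 15-byte limit). Each start
    depends on the previous instruction's length, so the scan is sequential.
    """
    starts = []
    pc = 0
    while pc < len(code):
        length = code[pc]
        if not 1 <= length <= 15:
            raise ValueError(f"bad length byte {length} at offset {pc}")
        if pc + length > len(code):
            raise ValueError(f"truncated instruction at offset {pc}")
        starts.append(pc)
        pc += length
    return starts


fixed = bytes(16)  # four 4-byte instructions; their contents don't matter here
print(fixed_width_starts(fixed))  # [0, 4, 8, 12]

# Toy stream: instructions of 1, 3, 2 and 5 bytes.
var = bytes([1, 3, 0, 0, 2, 0, 5, 0, 0, 0, 0])
print(variable_length_starts(var))  # [0, 1, 4, 6]
```

The loop in `variable_length_starts` is the sequential dependency that fixed-size RISC encodings avoid; hardware x86 decoders work around it with extra logic rather than eliminating it.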
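It turns out that modern x86 processors do in fact translate the user-visible ISA into a stream of simpler, more RISC-like micro-ops, and many recent designs keep already-decoded micro-ops in an internal cache so that frequently executed code does not have to be decoded again.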
Good observation.
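As a rough illustration of the translation idea, the sketch below "cracks" a toy CISC-style instruction with a memory operand, in the spirit of x86 `add [rbx], rax`, into separate load, ALU and store micro-ops. The `Instr` and `MicroOp` types, the `[reg]` operand syntax and the temporary register `t0` are all invented for this example; real micro-op formats are undocumented and differ between vendors and generations.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Instr:
    """Toy two-operand instruction: op dst, src (dst is also a source)."""
    op: str
    dst: str  # register name, or "[reg]" for a memory operand
    src: str


@dataclass(frozen=True)
class MicroOp:
    op: str
    dst: str
    srcs: Tuple[str, ...]


def is_mem(operand: str) -> bool:
    return operand.startswith("[") and operand.endswith("]")


def crack(instr: Instr) -> List[MicroOp]:
    """Split a toy instruction into RISC-like load/ALU/store micro-ops."""
    dst_mem, src_mem = is_mem(instr.dst), is_mem(instr.src)
    if dst_mem and src_mem:
        raise ValueError("memory-to-memory form is not allowed in this toy ISA")
    if dst_mem:
        addr = instr.dst[1:-1]
        return [
            MicroOp("load", "t0", (addr,)),              # t0 <- mem[addr]
            MicroOp(instr.op, "t0", ("t0", instr.src)),  # t0 <- t0 op src
            MicroOp("store", addr, ("t0",)),             # mem[addr] <- t0
        ]
    if src_mem:
        addr = instr.src[1:-1]
        return [
            MicroOp("load", "t0", (addr,)),                   # t0 <- mem[addr]
            MicroOp(instr.op, instr.dst, (instr.dst, "t0")),  # dst <- dst op t0
        ]
    # The register-register form is already RISC-like: one micro-op.
    return [MicroOp(instr.op, instr.dst, (instr.dst, instr.src))]


print([u.op for u in crack(Instr("add", "[rbx]", "rax"))])  # ['load', 'add', 'store']
print([u.op for u in crack(Instr("add", "rcx", "rax"))])    # ['add']
```

Each resulting micro-op does one simple thing (a load, an ALU operation, or a store), which is the RISC-like property the back end benefits from.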
Notes.
1. Micro-ops are just one available technique. In general, as long as instruction decoding and alignment take place in their own pipeline stage or stages, the time they take is not added to the average instruction execution time. If branch prediction is working and the pipeline is kept full, the extra time needed to decode and align instructions is spent in logic that runs in parallel with the execution of other instructions. With millions of gates available, today's designers can dedicate a lot of logic to decoding the complex x86 ISA.
2. You mentioned the memory bus width; it turns out that the memory path is also typically wider than 32 or 64 bits. The architectural word size simply refers to the width of the ALU and of pointers. The actual memory and cache interfaces are often 2x or 4x the architectural word size.
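To put rough numbers on note 1, here is a minimal sketch of an idealized in-order pipeline. The model is a simplifying assumption, not a description of any real x86 core: one instruction enters per cycle, nothing stalls except flushes, and each flush (e.g. a mispredicted branch) costs `stages - 1` cycles to refill. The 5-stage baseline and the 2 extra decode/align stages are hypothetical values chosen for illustration.

```python
def pipeline_cycles(n_instructions: int, stages: int, flushes: int = 0) -> int:
    """Total cycles for an idealized in-order pipeline (simplified model).

    Assumptions: one instruction enters per cycle, no stalls other than
    flushes, and each flush costs stages - 1 cycles to refill.
    """
    if n_instructions <= 0:
        return 0
    fill = stages - 1                  # extra cycles while the pipeline first fills
    refill = flushes * (stages - 1)    # extra cycles lost to flushes
    return fill + n_instructions + refill


def cycles_per_instruction(n_instructions: int, stages: int, flushes: int = 0) -> float:
    return pipeline_cycles(n_instructions, stages, flushes) / n_instructions


N = 1_000_000
# A 5-stage pipeline vs. the same pipeline with 2 extra decode/align stages.
for stages in (5, 7):
    for flushes in (0, 10_000):
        cpi = cycles_per_instruction(N, stages, flushes)
        print(f"stages={stages} flushes={flushes} CPI={cpi:.6f}")
```

With no flushes, the two extra stages change the CPI only in the sixth decimal place (1.000004 vs 1.000006), which is the point of note 1. Once branches mispredict, the deeper pipeline pays more per flush (1.040004 vs 1.060006 here), which is why the note depends on branch prediction working.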
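For note 2, a small sketch of why a wider path helps with unaligned, variable-length instructions: it counts how many aligned transfers of a given width are needed to cover a byte range. The 8-byte word, the 2x and 4x widths, the hypothetical 6-byte instruction at offset 5, and the 64-byte cache line are all assumed values for illustration, not figures for a particular CPU.

```python
def chunks_touched(addr: int, length: int, width: int) -> int:
    """How many aligned width-byte transfers cover bytes [addr, addr + length)."""
    if length <= 0:
        return 0
    first = addr // width
    last = (addr + length - 1) // width
    return last - first + 1


WORD = 8  # 64-bit architectural word, in bytes
for width in (WORD, 2 * WORD, 4 * WORD):
    # A hypothetical 6-byte instruction starting at byte offset 5.
    insn = chunks_touched(5, 6, width)
    # A 64-byte cache line (a common size, assumed here).
    line = chunks_touched(0, 64, width)
    print(f"width={width}B: instruction needs {insn}, cache line needs {line}")
```

The 6-byte instruction straddles an 8-byte boundary, so a word-wide path needs two transfers for it, while a 16- or 32-byte path fetches it in one; filling the 64-byte line takes 8, 4 or 2 transfers respectively.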