The GNU assembler's AT&T syntax traces its origins to the Unix assembler [1], which itself took its input syntax mostly from the PDP-11 PAL-11 assembler (ca. 1970).
Can anyone explain to me why every constant in AT&T syntax has a '$' in front of it?
It lets the assembler (and the reader) distinguish immediate constants from memory addresses. Intel syntax does it the other way around, marking memory references with brackets, as in [foo].
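A minimal sketch of that distinction, assuming 32-bit x86 code assembled with GNU as; the label foo and the value 42 are made up for illustration:

```asm
        .data
foo:    .long 42                # hypothetical 32-bit variable

        .text
        movl    $foo, %eax      # immediate: eax = the address of foo
        movl    foo, %eax       # memory:    eax = the 32-bit value stored at foo
        movl    $16, %ebx       # immediate: ebx = the constant 16
        movl    16, %ebx        # memory:    ebx = the 32-bit value at absolute address 16
```

Without the '$', a bare number or symbol is always treated as a memory address, which is why the last line is almost never what you want.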
Incidentally, MASM (the Microsoft Assembler) doesn't need a distinction at the syntax level, since it can tell whether the operand is a symbolic constant or a label. Other assemblers for x86 actively avoid such guesses, since they can confuse readers, e.g. TASM in IDEAL mode (it warns on memory references not in brackets), NASM, and FASM.
PAL-11 used # for the Immediate addressing mode, where the operand followed the instruction. A constant without # meant Relative addressing mode, where a relative address followed the instruction.
Unix as used the same syntax for addressing modes as DEC assemblers, with * instead of @, and $ instead of #, since @ and # were apparently inconvenient to type [2].
Why do all registers have a '%'?
In PAL-11, registers were defined as R0=%0, R1=%1, ... with R6 also referred to as SP, and R7 also referred to as PC. The DEC MACRO-11 macro-assembler allowed referring to registers as %x, where x could be an arbitrary expression, e.g. %3+1 referred to %4.
Is this just another attempt to get me to do a lot of lame typing?
Nope.
Also, am I the only one that finds: 16(%esp) really counterintuitive compared to [esp+16]?
This comes from the PDP-11 Index addressing mode, where a memory address is formed by summing the contents of a register and an index word following the instruction.
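For comparison, here is a rough sketch of how the displacement(base) form maps onto the bracketed Intel form, assuming 32-bit x86 code and GNU as; the registers and offsets are arbitrary:

```asm
        movl    16(%esp), %eax          # load from address esp + 16
                                        # Intel syntax: mov eax, [esp+16]

        movl    8(%ebx,%ecx,4), %eax    # load from address ebx + ecx*4 + 8
                                        # Intel syntax: mov eax, [ebx+ecx*4+8]
```

The general AT&T shape is displacement(base, index, scale), so the number outside the parentheses plays the role of the PDP-11 index word.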
I know it compiles to the same thing but why would anyone want to type a lot of '$' and '%'s without a need to? - Why did GNU choose this syntax as the default?
It came from the PDP-11.
Another thing, why is every instruction in at&t syntax preceded by an: l? - I do know its for the operand sizes, however why not just let the assembler figure that out? (would I ever want to do a movl on operands that are not that size?)
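gas can usually figure it out from the register operands, so the suffix is often optional. It is only required when no operand implies a size, such as when storing an immediate to memory. Other assemblers also need help in those cases (MASM, for instance, uses size keywords such as dword ptr).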
The PDP-11 used b for byte instructions, e.g. CLR vs CLRB. Other suffixes appeared in the VAX-11: l for long, w for word, f for float, d for double, q for quad-word, ...
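A short sketch of when the suffix matters, assuming 32-bit x86 code and GNU as (exact diagnostics for the ambiguous case vary between binutils versions):

```asm
        movl    %eax, %ebx      # suffix is redundant here ...
        mov     %eax, %ebx      # ... the 32-bit registers already imply the size

        movl    $0, (%eax)      # suffix needed: store 4 zero bytes at the address in eax
        movb    $0, (%eax)      # same operands, but store only 1 zero byte
                                # Intel syntax: mov dword ptr [eax], 0
                                #               mov byte ptr [eax], 0
```

In the last two instructions neither the immediate nor the memory operand carries a size, so something in the source has to state it, whether a suffix or a size keyword.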
Last thing: why are the mov arguments inverted?
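Arguably it is the other way around: the PDP-11 predates Intel's microprocessors and wrote the source operand first and the destination second, so it is the Intel syntax that inverted the order.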