The usual answers to why data should be aligned are that aligned accesses are more efficient and that alignment simplifies the design of the CPU.
A relevant question and its answers are here. And another s
The answer to your question is in the question itself.
The CPU has an access granularity of 4 bytes, so it can only slurp up data in aligned chunks of 4 bytes.
If you access address 0x0, the CPU gives you the 4 bytes from 0x0 to 0x3.
When you issue an instruction to access data at address 0x1, the CPU takes that as a request for 4 bytes of data starting at 0x1 (i.e., 0x1 to 0x4). Because of the CPU's granularity, this cannot be interpreted in any other way. Hence, the CPU slurps up the data from 0x0 to 0x3 and from 0x4 to 0x7 (ergo, 2 accesses), then puts the bytes from 0x1 to 0x4 together as the final result.