To get a better understanding of parallel processing, I am comparing a set of different pieces of code.
Here is the basic one (code_piece_1).
Q : Why did the parallel computation take longer than the serial one?
Because way more instructions have to be loaded onto the CPU and executed ( "awfully" many of them even before the first step of the intended block of calculations ever reaches the CPU ) than in a pure-[SERIAL] case, where no such add-on costs were added to the flow-of-execution.
These add-on operations ( hidden from the source-code ) are paid for both in the [TIME]-domain ( the duration of all such "preparations" ) and in the [SPACE]-domain ( allocating more RAM to hold all the structures needed for the [PARALLEL]-operated code, or, if we are pedantic and accurate in terminology, most often a still just-[CONCURRENT]-operated code ), which again costs you in [TIME], as each and every RAM-I/O re-fetch costs you roughly 1/3 [us] ~ 300~380 [ns].
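To make these "preparations" visible, here is a minimal timing sketch ( an assumption of mine: CPython with the standard multiprocessing module, not your actual code_piece_1 ) that measures nothing but spawning and tearing down a 4-process pool, with zero useful work inside:

```python
# A minimal sketch ( assumed: CPython + the standard multiprocessing module,
# not the original code_piece_1 ) timing only the hidden add-on costs:
# spawning a process-pool and tearing it down again, with zero useful work.
import time
from multiprocessing import Pool

if __name__ == '__main__':
    t0 = time.perf_counter()
    with Pool( processes = 4 ) as pool:   # [SPACE]-domain: new interpreters + copied state
        pass                              # no useful work at all
    t1 = time.perf_counter()              # [TIME]-domain: this whole span is pure overhead

    print( f"pool spawn + teardown alone took ~ {( t1 - t0 ) * 1e3:.1f} [ms]" )
```

Every such millisecond has to be "paid back" by the useful work that later runs inside the pool, before any Speedup > 1 can appear at all.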
Unless your workload-package carries a sufficient amount of work that can get executed truly in parallel ( non-blocking, having no locks, no mutexes, no sharing, no dependencies, no I/O, ... indeed independent, with a minimum of RAM-I/O re-fetches ), it is very easy to "pay way more than you will ever get back".
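To see where that break-even sits, one can sweep the per-task amount of work. Again only a hedged sketch under the same assumptions ( a hypothetical CPU-bound heavier_work() function and 4 worker processes, not a claim about your code_piece_1 ):

```python
# A minimal sketch ( same assumptions as above ) sweeping the per-task amount
# of work, to show where the [PARALLEL] add-on costs stop dominating and the
# parallel execution finally starts to "pay back" its overheads.
import time
from multiprocessing import Pool

def heavier_work( n ):                    # CPU-bound, no I/O, no sharing, no locks
    s = 0
    for i in range( n ):
        s += i * i
    return s

if __name__ == '__main__':
    for n in ( 10**3, 10**5, 10**7 ):     # growing amount of work per task
        tasks = [ n ] * 8

        t0 = time.perf_counter()
        _  = [ heavier_work( t ) for t in tasks ]
        t1 = time.perf_counter()

        t2 = time.perf_counter()
        with Pool( processes = 4 ) as pool:
            _ = pool.map( heavier_work, tasks )
        t3 = time.perf_counter()

        print( f"n = {n:>10,d}   [SERIAL] {t1 - t0:8.4f} [s]   [PARALLEL] {t3 - t2:8.4f} [s]" )
```

For the smallest n the [SERIAL] column typically wins outright; only once the per-task work is large enough do the spawn / dispatch / collect overheads become a small fraction of the total.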
For details on the add-on costs and the other factors that have such a strong effect on the resulting Speedup, start by reading the criticism of blindly using the original, overhead-naive formulation of Amdahl's law here.
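For orientation only ( the notation below is my assumption, not quoted from that text ): the classical, overhead-naive Amdahl's law versus an overhead-strict re-formulation that charges the setup and termination add-on costs explicitly, with s the serial fraction, N the number of processing units, pSO the parallel-setup overhead and pTO the parallel-termination overhead:

```latex
% classical, overhead-naive formulation
S_{\text{naive}}  \;=\; \frac{1}{\; s \;+\; \dfrac{1 - s}{N} \;}

% overhead-strict re-formulation ( add-on costs charged explicitly )
S_{\text{strict}} \;=\; \frac{1}{\; s \;+\; pSO \;+\; \dfrac{1 - s}{N} \;+\; pTO \;}
```

As soon as ( pSO + pTO ) grows comparable to, or larger than, the useful ( 1 - s ) / N term, the achievable Speedup drops below 1.0, which is exactly the "parallel slower than serial" effect asked about above.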