When I used to program embedded systems and early 8/16-bit PCs (6502, 68K, 8086) I had a pretty good handle on exactly how long (in nanoseconds or microseconds) each instruct
Modern processors do even more tricky things.
Out-of-order execution. If it is possible to do so without affecting correct behavior, the processor may execute instructions in a different order than they are listed in your program. This can hide the latency of long-running instructions, because independent instructions can proceed while a slow one is still waiting for its result.
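To make the latency-hiding effect concrete, here is a rough toy model in Python. It is only a sketch under simplifying assumptions that are mine, not a description of any real processor: at most one instruction issues per cycle, every instruction writes a distinct register, and the names `Instr` and `run` are made up for the illustration.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    dest: str       # register written
    srcs: tuple     # registers read
    latency: int    # cycles until the result is available

def run(program, out_of_order):
    """Return the cycle at which the last result becomes available."""
    # Record which earlier instruction produces each source operand.
    last_writer = {}
    producers = []
    for i, ins in enumerate(program):
        producers.append([last_writer[s] for s in ins.srcs if s in last_writer])
        last_writer[ins.dest] = i

    done_at = {}                         # instruction index -> completion cycle
    pending = list(range(len(program)))  # not yet issued, in program order
    cycle = 0
    while pending:
        # In-order: only the oldest pending instruction may issue.
        # Out-of-order: the oldest *ready* pending instruction may issue.
        window = pending if out_of_order else pending[:1]
        for i in window:
            if all(p in done_at and done_at[p] <= cycle for p in producers[i]):
                done_at[i] = cycle + program[i].latency
                pending.remove(i)
                break                    # at most one issue per cycle
        cycle += 1
    return max(done_at.values(), default=0)

program = [
    Instr("r1", (), 10),       # slow instruction, e.g. a load that misses cache
    Instr("r2", ("r1",), 1),   # needs the slow result
    Instr("r3", (), 1),        # independent work
    Instr("r4", ("r3",), 1),
    Instr("r5", ("r4",), 1),
]

print("in-order cycles:    ", run(program, out_of_order=False))
print("out-of-order cycles:", run(program, out_of_order=True))
```

In this model the in-order run stalls behind the slow instruction, while the out-of-order run finishes the independent chain during the wait. Real hardware has far more moving parts (multiple issue ports, speculation, memory ordering), so the numbers only illustrate the direction of the effect.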
Register renaming. Processors often have more physical registers than addressable registers in their instruction set (so-called "architectural" registers). The architectural set may be kept small either for backward compatibility or simply to enable efficient instruction encodings. As a program runs, the processor will "rename" the architectural registers it uses to whatever physical registers are free. This removes false dependencies caused only by reusing a register name, and so allows the processor to realize more parallelism than appears to exist in the original program.
For instance, if you have a long sequence of operations on EAX and ECX, followed by instructions that re-initialize EAX and ECX to new values and perform another long sequence of operations, the processor can use different physical registers for both tasks, and execute them in parallel.
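The EAX/ECX scenario can be sketched with a small rename table. This is a hypothetical illustration, not how any particular processor implements it: the function `rename`, the physical register names `p0`, `p1`, ... and the pool size are assumptions, and the sketch never returns registers to the free list as real hardware does.

```python
def rename(program, num_physical=8):
    """Give every register write a fresh physical register.

    `program` is a list of (dest, srcs) pairs using architectural names.
    Returns the same instructions expressed in physical register names.
    """
    free = [f"p{i}" for i in range(num_physical)]
    table = {}   # architectural name -> physical register currently holding it
    renamed = []
    for dest, srcs in program:
        phys_srcs = tuple(table[s] for s in srcs)  # read the current mappings
        table[dest] = free.pop(0)                  # new physical reg per write
        renamed.append((table[dest], phys_srcs))
    return renamed

def registers_used(instrs):
    """Collect every physical register an instruction sequence touches."""
    used = set()
    for dest, srcs in instrs:
        used.add(dest)
        used.update(srcs)
    return used

program = [
    ("EAX", ()),               # EAX = 1
    ("ECX", ()),               # ECX = 2
    ("EAX", ("EAX", "ECX")),   # EAX += ECX
    ("EAX", ()),               # EAX = 10  (re-initialized: second task begins)
    ("ECX", ()),               # ECX = 20
    ("EAX", ("EAX", "ECX")),   # EAX += ECX
]

renamed = rename(program)
first_task, second_task = renamed[:3], renamed[3:]
for instr in renamed:
    print(instr)

# The two tasks share no physical registers, so nothing forces the second
# to wait for the first even though both are written in terms of EAX and ECX.
print(registers_used(first_task).isdisjoint(registers_used(second_task)))
```

After renaming, the only dependencies left are the true data dependencies inside each task, which is what lets an out-of-order core overlap the two.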
The Intel P6 microarchitecture does both out-of-order execution and register renaming. The Core 2 architecture is the latest derivative of the P6.
To actually answer your question - it is basically impossible for you to determine performance by hand in the face of all these architectural optimizations.