The problem came up when I looked at the Wikipedia page for Matrix multiplication algorithm.
It says:
This algorithm has a critical path length of
There are two aspects to this question; addressing both answers it completely.
Let's take them one by one.
The simple answer to this point lies in understanding two terms, viz. Task Granularity and Task Dependency.
So, for a process that has four steps A, B, C, D, where D depends on C, C depends on B, and B depends on A, a single processor will work as fast as 2 processors, as 4 processors, or as infinitely many processors: each step has to wait for the previous one to finish, so the extra processors simply sit idle.
This explains the first bullet.
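To make the dependency argument concrete, here is a minimal sketch that computes the critical path length (the longest chain of dependent tasks) of a small task graph. The function name `critical_path_length` and the `independent` example are illustrative assumptions, not something from the Wikipedia page; each task is assumed to take one unit of time.

```python
from functools import lru_cache

def critical_path_length(deps):
    """Return the number of tasks on the longest dependency chain.

    deps maps each task name to the list of tasks it depends on.
    Every task is assumed to take one unit of time.
    """
    @lru_cache(maxsize=None)
    def depth(task):
        # A task can start only after all of its prerequisites finish.
        return 1 + max((depth(d) for d in deps[task]), default=0)

    return max(depth(t) for t in deps)

# The chain from the text: B needs A, C needs B, D needs C.
chain = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
# Four independent tasks, for contrast (hypothetical example).
independent = {"A": [], "B": [], "C": [], "D": []}

print(critical_path_length(chain))
print(critical_path_length(independent))
```

For the chain, the critical path covers all four tasks, so no number of processors can finish in fewer than four steps; for the independent tasks it is a single step, which four processors could achieve.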
If you divide a matrix of size n × n into four blocks of size [n/2] × [n/2] each, and then continue dividing until you reach a single element (a matrix of size 1 × 1), the number of levels this tree-like design has is O(log n). If we go about proving the run-time by using the †Master Theorem, we can calculate the same using the recurrence:
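M(n) = M(n/2) + Θ(log n)
Only one M(n/2) term appears because the eight half-size products at each level run in parallel, so the critical path passes through just one of them; the Θ(log n) term accounts for combining the partial results. This is case 2 of the Master Theorem (in the extended form that allows a logarithmic term) and gives the run-time as Θ(log² n).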
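As a sanity check on the Θ(log² n) bound, the following sketch evaluates the critical-path recurrence numerically. It assumes that the eight half-size products at each level run concurrently (so only one of them lies on the critical path), that combining the results costs exactly log2(n) steps, and that n is a power of two; the function name `span` and the unit base case are my own choices for illustration.

```python
def span(n):
    """Critical path length, in unit steps, for an n x n multiply.

    Assumes n is a power of two, the half-size products run in parallel,
    and combining their results costs log2(n) steps.
    """
    if n == 1:
        return 1  # a single scalar multiplication
    log2_n = n.bit_length() - 1  # exact log2 for a power of two
    return span(n // 2) + log2_n

for k in (4, 8, 16):
    n = 2 ** k
    # Compare the span against k^2 = (log2 n)^2.
    print(n, span(n), k * k)
```

Under these assumptions the span for n = 2^k works out to 1 + k(k+1)/2, which grows like k² = log² n.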
The difference between Big O and Theta is that Big O only says a function will not grow faster than the stated bound, while Theta says the stated bound holds from below as well as from above. Effectively, the plot of the function's complexity is sandwiched between the same bounding function multiplied by two different constants, as depicted in the image below; in other words, the two grow at the same rate:
Image taken from: http://xlinux.nist.gov/dads/Images/thetaGraph.gif
So, I'd say that for your case you can ignore the difference in notation, and you are not "gravely" mistaken in mixing up the two.
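The sandwich picture can be checked numerically. The sketch below uses a made-up cost function f(n) = 3n² + 5n and shows that it stays between 3·n² and 4·n² once n ≥ 5, which is what f(n) = Θ(n²) means. The function and the constants c1, c2, n0 are chosen purely for illustration.

```python
def f(n):
    return 3 * n**2 + 5 * n  # hypothetical cost function

def g(n):
    return n**2  # the bounding function in Theta(n^2)

# Constants witnessing f(n) = Theta(g(n)): c1*g(n) <= f(n) <= c2*g(n) for n >= n0.
c1, c2, n0 = 3, 4, 5

sandwiched = all(c1 * g(n) <= f(n) <= c2 * g(n) for n in range(n0, 10_000))
print(sandwiched)
```

Big O would only require the upper half of that inequality; Theta requires both halves.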
I'd like to define another term called Speedup or Parallelism. It is defined as the ratio of the best sequential execution time (also called work) to the parallel execution time. The best sequential execution time, already given on the Wikipedia page you've linked to, is O(n³). The parallel execution time is O(log² n).
Hence, the speedup is O(n³ / log² n).
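To get a feel for the size of this ratio, here is a small sketch that plugs a few values of n into n³ / log² n. It ignores the constant factors hidden by the asymptotic notation and uses base-2 logarithms, so the numbers are only indicative; `max_speedup` is a name chosen for illustration.

```python
import math

def max_speedup(n):
    """Ratio of n^3 sequential work to a log2(n)^2 critical path (constants ignored)."""
    return n ** 3 / math.log2(n) ** 2

for n in (64, 1024, 4096):
    print(f"n = {n:5d}: speedup bound ~ {max_speedup(n):,.0f}")
```

These figures are an upper limit on an ideal machine with unlimited processors, not something a real system would reach.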
And even though the speedup looks simple and straightforward on paper, achieving it in practice is very difficult because of the communication costs inherent in moving data.
†Master Theorem
Let a be an integer greater than or equal to 1 and b be a real number greater than 1. Let c be a positive real number and d a nonnegative real number. Given a recurrence of the form -
T(n) = a * T(n/b) + n^c when n > 1
then for n a power of b, if