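Consider a whole program that contains m operations, to be executed on a parallel machine with p processors. If a fraction q of the operations can be executed in parallel, and the remaining fraction 1-q must run in serial mode, then (taking one unit of time per operation) the run time falls from m to (1-q)m + qm/p, and we have the speedup

    speedup = 1 / ((1-q) + q/p).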
If q=1, all operations are parallelizable and the speedup is p, the number of processors. If q=0.5, a 32-node machine offers a speedup of just under 2, and 64 nodes offer only negligibly more, because the serial half of the work caps the speedup at 2 however many processors are used. To see real gains, q must be closer to 0.9 or 0.95, fractions that are difficult to obtain.
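The figures quoted above can be checked numerically. What follows is a minimal sketch, assuming the standard form of Amdahl's Law, speedup = 1 / ((1-q) + q/p); the function name `amdahl_speedup` is ours, chosen only for illustration.

```python
def amdahl_speedup(q: float, p: int) -> float:
    """Ideal speedup on p processors when a fraction q of the work is parallel.

    Assumes the idealized model: no communication or parallelization cost.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    # The serial fraction (1 - q) is untouched; the parallel fraction q
    # is divided evenly among the p processors.
    return 1.0 / ((1.0 - q) + q / p)


if __name__ == "__main__":
    # Tabulate the cases discussed in the text.
    for q in (0.5, 0.9, 0.95, 1.0):
        for p in (32, 64):
            print(f"q={q:<4}  p={p:<3}  speedup={amdahl_speedup(q, p):6.2f}")
```

With q=0.5 this gives roughly 1.94 on 32 processors and 1.97 on 64, which is the "negligibly greater" gain mentioned above.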
Moreover, Amdahl's Law as presented ignores many of the realities of computing. Among the idealizations we have made are (i) parallelization is free and communication is free; (ii) parallel operation is just a multiple of sequential operation, with no change to data structures or operational design. Neither of these assumptions holds in practice.
We define the efficiency of a parallel algorithm as the speedup per processor, e = speedup/p.
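To see how quickly efficiency falls off, here is a small sketch that applies e = speedup/p to the idealized Amdahl speedup 1 / ((1-q) + q/p). The function name `amdahl_efficiency` is hypothetical, and the model still assumes free communication, so real efficiencies would be lower.

```python
def amdahl_efficiency(q: float, p: int) -> float:
    """Efficiency e = speedup / p under the idealized Amdahl model."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    speedup = 1.0 / ((1.0 - q) + q / p)
    return speedup / p


if __name__ == "__main__":
    # Efficiency for a few parallel fractions as the machine grows.
    for q in (0.5, 0.9, 0.95):
        row = "  ".join(f"p={p}: {amdahl_efficiency(q, p):.3f}" for p in (2, 8, 32, 64))
        print(f"q={q:<4}  {row}")
```

An efficiency of 1 means every processor is fully used; for any q below 1 the efficiency shrinks as p grows, since the serial fraction leaves all but one processor idle.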
We must ask honestly: how scalable is my job or algorithm? This question must be answered with care. Is memory scaled up or held fixed? Is my job size increasing, with p/m fixed?
If Amdahl's Law shows greatly diminished returns on a parallel computer, we ask again: why supercomputers?
Part of the answer is speedup. But nowadays the larger part of the answer is size. Small machines can handle only small problems. To solve flow around an airplane or to model global climate, one needs massive supercomputers, because only in that way can one get the massive amounts of memory these problems require (recall that a moderate flow around an airplane might require 10^8 unknowns, or about 10^4 Mb, each of which is updated at O(10^3) timesteps).
Because of the tradeoff between Amdahl's Law and scaling, the most effective use of today's supercomputers comes when the problem size is scaled up in proportion to the number of processors, so that p/m is fixed.
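The effect of scaling the problem with the machine can be illustrated with the same formula. The sketch below rests on an assumption not stated in the text: that the number of serial operations stays fixed while the total m grows in proportion to p, so the parallel fraction becomes q = 1 - (1-q1)/p, where q1 is the fraction at p=1. The names `scaled_parallel_fraction` and `q1` are hypothetical.

```python
def amdahl_speedup(q: float, p: int) -> float:
    """Idealized Amdahl speedup: 1 / ((1 - q) + q / p)."""
    return 1.0 / ((1.0 - q) + q / p)


def scaled_parallel_fraction(q1: float, p: int) -> float:
    """Parallel fraction when m grows in proportion to p.

    ASSUMPTION (ours, for illustration): the serial operation count is
    fixed, so the serial fraction shrinks from (1 - q1) to (1 - q1) / p.
    """
    return 1.0 - (1.0 - q1) / p


if __name__ == "__main__":
    q1 = 0.5  # parallel fraction of the original, unscaled problem
    for p in (1, 8, 32, 64):
        fixed = amdahl_speedup(q1, p)  # problem size held fixed
        scaled = amdahl_speedup(scaled_parallel_fraction(q1, p), p)  # p/m fixed
        print(f"p={p:<3}  fixed-size speedup={fixed:5.2f}  scaled-size speedup={scaled:6.2f}")
```

Under this assumption the fixed-size speedup stalls below 2, while the scaled-size speedup keeps growing with p. How closely a real code follows this depends on whether its serial work really does stay constant as the problem grows.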