
Performance

Here is a quick overview of the factors affecting performance.

The theoretical peak performance, r_theory, is the number of operations per second you could conceivably perform if all data were in cache and no bottlenecks occurred. In essence, this comes down to the clock speed times any super-scalar, vector, or parallel scaling factor. This is the speed a salesman will quote to you.
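
As a rough illustration of "scaling times clock speed", here is a minimal Python sketch. The function name, its parameters, and the 500 MHz example chip are all hypothetical, chosen only to show the multiplication; they do not describe any particular machine.

```python
def theoretical_peak_mflops(clock_mhz, ops_per_cycle=1, vector_width=1, processors=1):
    """Peak rate in Mflops: clock rate times every multiplicative scaling factor.

    All parameters are illustrative assumptions:
      clock_mhz      clock speed in MHz
      ops_per_cycle  super-scalar factor (operations issued per cycle)
      vector_width   vector factor (results per vector operation)
      processors     parallel factor (number of processors)
    """
    return clock_mhz * ops_per_cycle * vector_width * processors


# Hypothetical chip: 500 MHz clock, 2 floating-point operations per cycle.
print(theoretical_peak_mflops(clock_mhz=500, ops_per_cycle=2))  # 1000
```

Every factor here assumes perfect conditions, which is why this number is only an upper bound.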

The real speed, r_real, is the performance actually measured on a specific, ideal problem, usually only for brief periods of time.

The sustained speed, r_sustained, is the kind of speed you can see for (most of) an entire job, maybe excepting I/O.

Typically, r_sustained ≈ 1/10 r_theory; for massively parallel clusters, it is more like 1/100 r_theory.

Memory is the usual culprit in slowing down a calculation. Consider solving a nonlinear problem, say with 5 Newton iterations, each requiring an iterative solve with 5000 matrix-vector multiplies (typical kinds of numbers). If we call S the number of non-zero matrix elements to store, the job takes about S*2*5*5000 operations (2 operations, a multiply and an add, per non-zero element in each matrix-vector multiply). In one hour, we could execute about r_real * 10^6 * 3600 operations (measuring r_real in Mflops). Equating and solving for S, we get

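S = 3600/(2*5*5000) * r_real * 10^6 words = 0.072 r_real Mw

Using r_real = 0.2 r_theory and assuming 8-byte words, this is about 0.1 r_theory MB for the matrix alone, so S ≈ 0.1-1 r_theory MB. That is, to within an order of magnitude, 1 Mflop performance requires about 1 MB memory, and 1 Gflop about 1 GB memory.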
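
The estimate can be checked numerically. The following Python sketch simply equates the operations available in the time budget with the operations the job needs; the function names, the default arguments, and the 8-byte word size are assumptions made for illustration.

```python
def max_nonzeros_mwords(r_real_mflops, hours=1.0, newton_iters=5,
                        matvecs_per_solve=5000, ops_per_nonzero=2):
    """Largest S, in millions of words, whose job finishes in the time budget."""
    # Operations the machine can execute in the given time.
    ops_available = r_real_mflops * 1e6 * 3600 * hours
    # Operations the whole job spends on each stored non-zero element.
    ops_per_stored_nonzero = ops_per_nonzero * newton_iters * matvecs_per_solve
    return ops_available / ops_per_stored_nonzero / 1e6


def memory_mbytes(s_mwords, bytes_per_word=8):
    """Convert millions of words to megabytes (8-byte words are an assumption)."""
    return s_mwords * bytes_per_word


# Hypothetical machine with r_real = 100 Mflops and a one-hour job.
s = max_nonzeros_mwords(100)
print(f"S = {s:.1f} Mw, about {memory_mbytes(s):.1f} MB")  # S = 7.2 Mw, about 57.6 MB
```

Changing the iteration counts or the time budget shifts the constant, but the proportionality between speed and memory remains.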



E. Bruce Pitman
Wed Sep 13 22:27:10 EDT 2000