
Computations and memory

Each time we access a piece of data, we should try to get as many floating point operations out of that reference as we can. A useful performance measure is the number of flops per memory reference, and we want that to be high. First, let's look at a simpler set of examples.

Consider, for example, the operations listed below.

calc   flops/pass   operation
1      2            v1_i = v1_i + a*v2_i
2      8            v1_i = v1_i + s2*v2_i + s3*v3_i + s4*v4_i + s5*v5_i
3      1            v1_i = v2_i/v3_i
4      2            v1_i = v1_i + s2*v2_{idx(i)}
5      2            v1_i = v2_i - v3_i*v1_{i-1}
6      2            s = s + v1_i*v2_i
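
As a rough illustration, the kernels in the table can be written as plain Python loops, with flops and memory references tallied per pass. The reference counts are an assumption of this sketch (one reference per array element loaded or stored, with scalars such as a, s2..s5 and s held in registers), and the function and variable names are illustrative only.

```python
# Flops and memory references per loop pass for the six kernels above.
# The reference counts follow an assumed convention: each array element
# loaded or stored counts once; scalars (a, s2..s5, s) stay in registers.
KERNELS = {
    1: {"flops": 2, "refs": 3},  # load v1_i, v2_i; store v1_i
    2: {"flops": 8, "refs": 6},  # load v1_i..v5_i; store v1_i
    3: {"flops": 1, "refs": 3},  # load v2_i, v3_i; store v1_i
    4: {"flops": 2, "refs": 4},  # load v1_i, idx(i), v2_{idx(i)}; store v1_i
    5: {"flops": 2, "refs": 3},  # load v2_i, v3_i; store v1_i (v1_{i-1} assumed in a register)
    6: {"flops": 2, "refs": 2},  # load v1_i, v2_i
}


def flops_per_reference(calc):
    """Ratio of floating point operations to memory references for one pass."""
    k = KERNELS[calc]
    return k["flops"] / k["refs"]


def calc1(v1, v2, a):
    """calc 1: v1_i = v1_i + a*v2_i (updates v1 in place)."""
    for i in range(len(v1)):
        v1[i] = v1[i] + a * v2[i]


def calc4(v1, v2, idx, s2):
    """calc 4: v1_i = v1_i + s2*v2_{idx(i)} (indirect addressing through idx)."""
    for i in range(len(v1)):
        v1[i] = v1[i] + s2 * v2[idx[i]]


def calc5(v1, v2, v3):
    """calc 5: v1_i = v2_i - v3_i*v1_{i-1} (recurrence; v1[0] is the start value)."""
    for i in range(1, len(v1)):
        v1[i] = v2[i] - v3[i] * v1[i - 1]


def calc6(v1, v2):
    """calc 6: s = s + v1_i*v2_i (dot product)."""
    s = 0.0
    for x, y in zip(v1, v2):
        s = s + x * y
    return s
```

Under this counting convention, calc 2 has the highest ratio (8/6, about 1.33 flops per reference) and calc 3 the lowest (1/3).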

More generally, let us study how much memory is required to sustain performance on today's supercomputers. First some terminology. Let us define:

tex2html_wrap_inline144 is the theoretical peak performance

tex2html_wrap_inline146 is the real performance you can achieve on a special problem

tex2html_wrap_inline148 is the sustained performance

Typically, tex2html_wrap_inline150 on massively parallel machines; on shared memory computers, tex2html_wrap_inline152 . tex2html_wrap_inline154 is typical.

Now consider a typical operation, say a nonlinear system solve that involves 5 Newton iterations, where each Newton iteration involves 5000 matrix-vector multiplications. A matrix-vector multiply costs one multiplication and one addition per non-zero matrix element. Let Nm be the number of non-zero elements in the matrix. Then the number of operations to be executed in, say, one hour, is

$$ 5 \times 5000 \times 2 \times N_m = 5 \times 10^4 \, N_m $$

In one hour, we can execute

displaymath160

operations. Equating, we find

displaymath162

Doing the arithmetic, we find

displaymath164

That is, a 500 Mflop/s machine should have about 500 MB of memory for reasonable arithmetic performance.
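
The arithmetic can be sketched in a few lines of Python. Two inputs are assumptions of this sketch rather than figures from the text: the 500 Mflop/s is taken to be the sustained rate, and each non-zero is assumed to cost 12 bytes of storage (an 8-byte value plus a 4-byte column index). The function names are hypothetical.

```python
def total_flops(nnz, newton_iters=5, matvecs_per_iter=5000):
    """Operations for the whole solve: one multiply and one add per
    non-zero for every matrix-vector product."""
    return newton_iters * matvecs_per_iter * 2 * nnz


def max_nonzeros(rate_flops_per_s, seconds=3600.0,
                 newton_iters=5, matvecs_per_iter=5000):
    """Largest number of non-zeros for which the solve fits in the time
    budget, found by equating total_flops(nnz) with rate * seconds."""
    return rate_flops_per_s * seconds / (newton_iters * matvecs_per_iter * 2)


def matrix_memory_mb(nnz, bytes_per_nonzero=12):
    """Storage for the non-zeros in MB (10**6 bytes).
    bytes_per_nonzero=12 is an assumed sparse-storage cost."""
    return nnz * bytes_per_nonzero / 1.0e6


nnz = max_nonzeros(500.0e6)   # assumed sustained rate: 500 Mflop/s for one hour
mem = matrix_memory_mb(nnz)
print(f"non-zeros: {nnz:.3g}, memory: {mem:.0f} MB")
```

Under these assumptions the matrix holds about 3.6 x 10^7 non-zeros and occupies roughly 430 MB, the same order of magnitude as the 500 MB quoted above.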



E. Bruce Pitman
Mon Nov 5 15:15:58 EST 2001