Each time we access a piece of data, we should try to get as many floating point ops out of that reference as we can. A performance monitor is the flops per reference, and we want that to be high. First, lets look at a simplier set of examples.
Consider, for example, the operations listed
calc flops/pass operation
1 2 v1_i = v1_i + a*v2_i
2 8 v1_i = v1_i + s2*v2_i + s3*v3_i + s4*v4_i + s5*v5_i
3 1 v1_i = v2_i/v3_i
4 2 v1_i = v1_i + s2*v2_{idx(i)}
5 2 v1_i = v2_i - v3_i*v1_{i-1}
6 2 s = s + v1_i *v2_i