Here we briefly discuss how data is stored in memory and how that layout affects access speed.
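Once again, matrix entries are stored column-wise in FORTRAN: a 4X4 matrix
with entries aij is laid out in memory as a11, a21, a31, a41, a12, a22, a32,
a42, and so on through a44, i.e. down the first column, then down the second,
and so on. Memory is organized in banks, and after a bank is accessed there is
a latency period before that same bank can be accessed again. For
illustration purposes, let's imagine 8 banks [128 or 256 are common on chips
today], with a bank busy time (bbt) of 8 cycles between accesses to the same
bank (so 8 banks is the minimum if we are to access consecutive data without
waiting on bbt). Thus we have: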
    bank   1    2    3    4    5    6    7    8
    data   a11  a21  a31  a41  a12  a22  a32  a42
    data   a13  a23  a33  a43  a14  a24  a34  a44

If we access data column-wise, we proceed through the banks in order. By the
time we request a13 (cycle 9), bank 1 has just come free, so we (just) avoid
bbt. On the other hand, if we access data row-wise, we get a11 from bank 1,
a12 from bank 5, and then a13 from bank 1 again; so instead of being served on
clock cycle 3, a13 has to wait until cycle 9. a14, in bank 5 again, comes on
cycle 10, and so on.
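To make the timing argument concrete, here is a small Python sketch of the toy machine described above. It assumes 8 banks, a bbt of 8 cycles, low-order interleaving (consecutive words go to consecutive banks), and in-order issue of at most one request per cycle; the names `offset`, `bank`, and `simulate` are hypothetical, chosen only for illustration. It maps each 1-based a(i,j) to its column-major offset and bank, then reports the cycle on which each access is served.

```python
# Toy model of interleaved memory banks (assumptions, not a real machine):
# NBANKS banks, low-order interleaving, bank busy time BBT cycles,
# in-order issue of at most one request per cycle.
NBANKS = 8
BBT = 8
N = 4  # order of the matrix (4x4, as in the text)


def offset(i, j, n=N):
    """0-based column-major (FORTRAN) offset of a(i, j) with 1-based i, j."""
    return (i - 1) + (j - 1) * n


def bank(i, j, n=N, nbanks=NBANKS):
    """1-based bank holding a(i, j); consecutive words go to consecutive banks."""
    return offset(i, j, n) % nbanks + 1


def simulate(order, n=N, nbanks=NBANKS, bbt=BBT):
    """Return the clock cycle (starting at 1) on which each a(i, j) in `order` is served."""
    free_at = {}  # bank -> first cycle on which it may be accessed again
    cycle = 0
    served = []
    for i, j in order:
        b = bank(i, j, n, nbanks)
        # Next request goes out one cycle later, or waits for its bank to come free.
        cycle = max(cycle + 1, free_at.get(b, 1))
        free_at[b] = cycle + bbt
        served.append(cycle)
    return served


column_wise = [(i, j) for j in range(1, N + 1) for i in range(1, N + 1)]
row_wise = [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)]

print(simulate(column_wise)[-1])  # -> 16
print(simulate(row_wise)[:4])     # -> [1, 2, 9, 10]
print(simulate(row_wise)[-1])     # -> 40
```

Under these assumptions the column-wise sweep finishes on cycle 16, while the row-wise sweep serves the first row on cycles 1, 2, 9, 10 (matching the hand count above) and finishes on cycle 40.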
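If addressing is indirect, that is, the subscripts are themselves values read
from another (index) array, the accesses may wind up jumping all over memory,
landing repeatedly on a bank that is still busy, and suffer performance hits
because of it.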
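A similar hedged sketch for indirect addressing: assume a one-dimensional array x spread over the same 8 banks with the same 8-cycle bbt, read through a hypothetical index array `ind` (the name `gather_cycles` is likewise made up for illustration). When the indices happen to be a multiple of 8 apart, every access lands in the same bank and must wait out the full bbt.

```python
# Toy model (assumptions): a 1-D array x whose word k (1-based) lives in bank
# (k - 1) mod NBANKS + 1, bank busy time BBT cycles, in-order issue of at most
# one request per cycle. `ind` plays the role of a hypothetical index array.
NBANKS = 8
BBT = 8


def gather_cycles(ind, nbanks=NBANKS, bbt=BBT):
    """Return the cycle (starting at 1) on which each x(ind(k)) is served."""
    free_at = {}  # bank -> first cycle on which it may be accessed again
    cycle = 0
    served = []
    for k in ind:
        b = (k - 1) % nbanks + 1
        cycle = max(cycle + 1, free_at.get(b, 1))
        free_at[b] = cycle + bbt
        served.append(cycle)
    return served


contiguous = list(range(1, 17))             # x(1), x(2), ..., x(16): banks 1..8, 1..8
same_bank = [1 + 8 * k for k in range(16)]  # x(1), x(9), x(17), ...: always bank 1

print(gather_cycles(contiguous)[-1])  # -> 16
print(gather_cycles(same_bank)[:3])   # -> [1, 9, 17]
print(gather_cycles(same_bank)[-1])   # -> 121
```

In this model the 16 contiguous reads finish on cycle 16 and the 16 same-bank reads on cycle 121. Real index arrays are usually less pathological, but since the access pattern depends on the contents of `ind` at run time, it is hard to arrange in advance for it to avoid bank conflicts.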