As we said earlier, the SGI O2K is a distributed shared memory machine. That is, although memory is physically distributed among the processors, it LOOKS like a shared environment to the user. This trick works because every processor has access to a single global address space, so every processor "knows" where every piece of data resides. To access a piece of data, a processor must send a request over the interconnect network and receive the data back from the correct location. The latency depends on how remote the data is from the requesting processor. On CCR's O2K with 128 processors, it might take as many as 6 'hops' to reach a remote node, each hop taking O(1) nsec. The bisection bandwidth (the total amount of data that can flow in the system concurrently) is some 310 Mb/sec for the system, and about 160 Mb/sec max on any one processor. The data read latency is on the order of 1100 nsec.
In contrast, a distributed memory machine like the IBM SP requires
explicit send and receive messages to be passed among the processors
in order to move data to the correct location. You, the programmer,
must write these message passing calls. On CCR's SP, a message between
two processors takes about 15 μsec. About 125 Mb/sec can be sent
in any one message, so bigger data structures are broken up into
several messages.
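To make the idea of explicit send and receive calls concrete, here is a minimal sketch in Python. It is not how one would program the SP itself (there you would use a message passing library such as MPI); threads stand in for processors and a queue stands in for the interconnect. The names send, recv, run_demo and the CHUNK limit are all hypothetical, chosen only for illustration.

```python
import queue
import threading

CHUNK = 4  # hypothetical limit on items per message (an assumption)


def send(mailbox, data, chunk=CHUNK):
    """Break `data` into pieces and send each piece as its own message."""
    for start in range(0, len(data), chunk):
        mailbox.put(data[start:start + chunk])
    mailbox.put(None)  # sentinel message: nothing more to send


def recv(mailbox):
    """Receive messages until the sentinel arrives, then reassemble them."""
    received = []
    while True:
        msg = mailbox.get()  # blocks until a message is available
        if msg is None:
            return received
        received.extend(msg)


def run_demo(n=10):
    """Move a list of n integers from 'processor 0' to 'processor 1'."""
    mailbox = queue.Queue()   # stands in for the interconnect
    source = list(range(n))   # data that only processor 0 owns
    result = {}

    def processor0():
        send(mailbox, source)

    def processor1():
        result["data"] = recv(mailbox)

    workers = [threading.Thread(target=processor0),
               threading.Thread(target=processor1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return result["data"]
```

Calling run_demo() returns the same list that processor 0 started with, after it has travelled as three separate messages. The point is that nothing moves unless the programmer writes both the send and the matching receive.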
Processors can be arranged in a ring structure, or a mesh, a tree, or a hypercube.
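The arrangement determines how many hops a message needs. As a rough sketch (the function names are made up for illustration), here is how hop counts work out for a ring and a hypercube, assuming processors are numbered from 0 and, for the hypercube, that two processors are neighbors when their numbers differ in exactly one bit.

```python
def ring_hops(src, dst, size):
    """Fewest hops between two processors on a bidirectional ring."""
    d = abs(src - dst) % size
    return min(d, size - d)


def hypercube_neighbors(node, dim):
    """Neighbors of `node` in a hypercube of dimension `dim`.

    Each neighbor differs from `node` in exactly one bit.
    """
    return [node ^ (1 << bit) for bit in range(dim)]


def hypercube_hops(src, dst):
    """Fewest hops in a hypercube: the number of bits that differ."""
    return bin(src ^ dst).count("1")
```

For example, with 8 processors the farthest pair on a ring is 4 hops apart (ring_hops(0, 4, 8)), while in a 3-dimensional hypercube it is only 3 hops (hypercube_hops(0, 7)). The gap widens as the machine grows, at the cost of more links per processor.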
One must beware of deadlock and livelock. Livelock is when a packet of information is sent 'round and 'round the network without ever finding its destination. Deadlock is when every processor in a group is waiting to send information to its neighbor, but that neighbor is itself waiting to send to its own neighbor, and so on around the group. Thus no data gets moved.
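One way to picture deadlock is as a cycle of waiting: processor 0 waits on 1, 1 waits on 2, and so on back to 0. The sketch below looks for such a cycle. It assumes each processor is blocked on at most one other processor, and the name find_deadlock and the dictionary layout are hypothetical, used here only for illustration.

```python
def find_deadlock(waits_for):
    """Return the processors forming a waiting cycle, or [] if there is none.

    `waits_for` maps each processor to the processor it is blocked on,
    or to None if it is free to proceed.
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of "is waiting on" until it ends or repeats.
        while node is not None and node not in seen:
            seen.append(node)
            node = waits_for.get(node)
        if node is not None:
            # The chain came back to a processor already visited: a cycle.
            return seen[seen.index(node):]
    return []
```

For a ring of four processors that all try to send first, find_deadlock({0: 1, 1: 2, 2: 3, 3: 0}) returns [0, 1, 2, 3]. If processor 3 is instead ready to receive (3 maps to None), the cycle is broken and the result is []. This is why a common remedy is to have some processors receive first while the others send first.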
Finally, I/O data movement is terribly slow, and is currently an issue under investigation.