In the prototype shared memory machine, each processor has its own cache and is connected to a bus or a crossbar switch. Through this bus or crossbar, every processor can reach any of the memory modules equally fast. This is a symmetric multiprocessor, or SMP, machine.
Note: in a NUMA (non-uniform memory access) machine, each processor may have its own portion of memory (sometimes with a memory control unit, which helps with cache coherence and other aspects of memory management), but it has fast access only to its local memory. Access to remote data goes over a crossbar.
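The phrase "fast access only to its local memory" can be made concrete with a simple two-level cost model. This is a minimal sketch, assuming every access is either local or remote with fixed access times `t_local` and `t_remote`. The function name and parameters are hypothetical, and the model ignores caches and contention. Setting `t_local == t_remote` recovers the uniform access of the SMP machine.

```python
def average_access_time(local_fraction: float, t_local: float, t_remote: float) -> float:
    """Expected memory access time under a simple two-level NUMA model (hypothetical).

    local_fraction: fraction of accesses served by the processor's own memory, in [0, 1]
    t_local, t_remote: access times for local and remote memory, in any consistent unit
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must lie in [0, 1]")
    # Weighted average: local hits cost t_local, the remaining accesses cost t_remote.
    return local_fraction * t_local + (1.0 - local_fraction) * t_remote


# Hypothetical figures: a remote access three times slower than a local one.
print(average_access_time(0.75, 1.0, 3.0))  # NUMA case
print(average_access_time(0.5, 2.0, 2.0))   # uniform (SMP-like) case: locality does not matter
```

With these made-up numbers, three quarters of accesses being local gives an average of 1.5 time units. Lowering `local_fraction` pushes the average toward `t_remote`, which is why data placement matters on a NUMA machine and not on an SMP.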
In a prototype distributed memory machine, each processor, together with its cache and memory module, is attached to a communication network. Data is moved to the processor(s) that require it by explicit message passing. The communication network and its speed therefore play an important role in a computation. For example, one may ask: will it be faster to fetch a chunk of data from another processor, or to recompute that data from locally stored information? That is, what is the communication/computation trade-off?
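One way to frame this trade-off is a simple latency-bandwidth (often called alpha-beta) cost model. The sketch below assumes that fetching `n_bytes` costs a fixed startup latency plus `n_bytes` divided by the network bandwidth. It also assumes that recomputing costs a given number of floating-point operations at a fixed rate. All function names and parameter values are hypothetical, and real networks add contention and communication/computation overlap that this model ignores.

```python
def fetch_time(n_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    # Latency-bandwidth model: fixed startup cost plus per-byte transfer cost.
    return latency_s + n_bytes / bandwidth_bytes_per_s


def recompute_time(n_flops: float, flops_per_s: float) -> float:
    # Time to regenerate the data locally at a fixed (hypothetical) flop rate.
    return n_flops / flops_per_s


def should_recompute(n_bytes: float, latency_s: float, bandwidth_bytes_per_s: float,
                     n_flops: float, flops_per_s: float) -> bool:
    # Recompute only if it is strictly cheaper than fetching from another processor.
    return recompute_time(n_flops, flops_per_s) < fetch_time(
        n_bytes, latency_s, bandwidth_bytes_per_s)


# Hypothetical machine: 10 microsecond latency, 1 GB/s links, 1 Gflop/s per processor.
print(should_recompute(1_000, 1e-5, 1e9, 1e6, 1e9))  # small message: fetch wins
print(should_recompute(8e6, 1e-5, 1e9, 1e6, 1e9))    # large message: recompute wins
```

With these made-up figures, fetching 1 KB costs about 11 microseconds against 1 ms of recomputation, so fetching wins. Fetching 8 MB costs about 8 ms, so recomputing wins. The crossover depends on the ratio of network speed to processor speed, which is exactly the trade-off the question above asks about.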