MIMD Architectures

MIMD stands for Multiple Instruction stream Multiple Data stream. MIMD architectures usually consist of several independent processors which can execute individual instruction streams. Each processor may be executing a different program. One way of subdividing this type of architectures is by the relationship between processors and memory. This type of division leads to two main types of MIMD architectures. The two types are the shared memory architecture and the distributed memory architecture.

In the shared memory architecture a small number of processors has access to a global memory store through a bus. The processors communicate with each other through the following procedure: one processor writes data into a location in memory and another processor reads the data. With this type of communication the time to access any piece of datum is the same since all communication goes through the bus.

The characteristics of Shared memory are:
- Communication is done by hardware in the memory interface.
- Memory reference latency may be long and variable.
- Collisions are possible on reference to memory.
- Randomization of requests may be used to reduce collisions.

The advantage of shared memory architecture is that it is easy to program, since there is no explicit communication between processors. All communication is done through the global memory store, and access to this memory can be controlled by using semaphores. An essential mechanism for programming shared memory multiprocessors is synchronization. Synchronization guarantees some relationship between the rate of progress of the parallel processes, and it can be divided into two basic classes:
-Control oriented: Progress past some point in the program is controlled (critical section),
-Data oriented: Access to a data item is controlled by the state of the data item (locked and unlocked).

The critical section can be implemented by using software methods. These solutions are all based on the fact that read/write (load/store) are instructions available at the atomic machine level (hardware). Only one proccess at a time is allowed in the critical section, and once a process is executing the critical section, no other process is allowed to enter the critical section.
User accessible hardware locks are available to allow mutual exclusion of shared data structures. There are 16K such hardware locks in a set. One or more sets can be installed in a machine, one/multibus adapter board. Each lock is a 32-bit double word. The least significant bit determines the state of a lock: locked (1), and unlocked (0). Reading the lock returns the value of this bit and sets it to 1, thus locking the lock. Writing 0 to a lock unlocks it. Locks can support a variety of synchronization techniques including: busy waiting, counting/queueing semaphores, and barriers.

There are hardware solutions to the critical section. Some examples are:
-Test and Set: which is a machine-level instruction that can test and modify the contents of a word in one memory cycle.
-Swap: this instruction swaps the contents of two memory locations in one memory cycle. This instruction is common in IBM computers.

The disadvantage of shared memory architectures is that potentially a large number of processors may attempt to access the global memory store at the same time, leading to a deadlock.


The distributed memory architectures get around the drawbacks of the shared memory architecture by giving each processor its own memory. Therefore, distributed memory multiprocessors are also known as explicit communication multiprocessors. A processor can only access the memory which is attached directly to it. If a processor needs data which is contained in the memory of a remote processor, then it must send a message to the remote processor asking it to send it the data. Clearly access to local memory can occur much faster than access to data on a remote processor. In addition to this, the further the physical distance to the remote processor, the longer it can take to access remote data. There are three types of Distributed memory topologies: ring topology, rectangle mesh topology, and hypercube interconnection topology. An N processor ring topology can take up to N/2 steps to transmit a message from one processor to another (assuming bi-directional ring). An N processor mesh topology can take up to square root of N steps to transmit a message from one processor to another. For a hypercube switching network each processor connects directly to Log2(N) others, whose indices are obtainedby changing one bit of the binary number of the reference processor (grey code). Up to Log2(N) steps are needed to transmit a message between processors.

The characteristics of Distributed memory are:
- Communication is done by software using data transmission instructions.
- The instruction set has send/receive (processor) instructions as well as read/write (memory).
- Large messages may mask long variable latency (by using large messages the overhead is minimized).
- Careful management of communication is needed to avoid collisions.

The above classifications are idealised architectures. Often actual machines are mixtures of the different types of architectures.