A superscalar CPU architecture is one that has more than one functional unit capable of performing the computations called for by instructions, and that can 'issue' instructions to more than one of these units simultaneously.
The units may include duplicates (e.g. two floating-point multiplication units) as well as units of different kinds (e.g. an integer adder alongside a floating-point multiplier).
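To make the idea of issuing to several units at once concrete, here is a minimal sketch of how an in-order stream might be grouped into per-cycle issue packets. The unit mix (`UNITS`), the issue width, and the function name `issue_schedule` are hypothetical choices for illustration, and the instructions are assumed to be independent of one another, so only the number of free units of each kind limits how many issue together.

```python
from collections import Counter

# Hypothetical unit mix: two floating-point multipliers (duplicate units)
# plus one integer ALU and one load/store unit (units of different kinds).
UNITS = {"fmul": 2, "alu": 1, "mem": 1}


def issue_schedule(instrs, units=UNITS, width=4):
    """Group an in-order stream of (name, unit_kind) into per-cycle packets.

    Assumes the instructions do not depend on one another, so only unit
    availability and the issue width limit how many issue per cycle.
    Issue is in order: once one instruction cannot issue, every later
    instruction also waits for the next cycle.
    """
    if width < 1:
        raise ValueError("issue width must be at least 1")
    for _, kind in instrs:
        if units.get(kind, 0) < 1:
            raise ValueError(f"no unit can execute {kind!r}")

    cycles = []
    i = 0
    while i < len(instrs):
        used = Counter()  # units of each kind already taken this cycle
        packet = []
        while i < len(instrs) and len(packet) < width:
            name, kind = instrs[i]
            if used[kind] >= units[kind]:
                break  # every unit of this kind is busy this cycle
            used[kind] += 1
            packet.append(name)
            i += 1
        cycles.append(packet)
    return cycles


if __name__ == "__main__":
    stream = [("m1", "fmul"), ("m2", "fmul"), ("a1", "alu"),
              ("m3", "fmul"), ("a2", "alu")]
    print(issue_schedule(stream))  # prints [['m1', 'm2', 'a1'], ['m3', 'a2']]
```

Both multipliers and the ALU are used in the first cycle; the third multiplication has to wait because only two multiplier units exist. With `width=1` the same stream would issue one instruction per cycle, as on a scalar CPU.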
Synchronization between the units is a problem: if the Nth instruction in a stream uses an output of the (N-1)th instruction, the later instruction cannot start until the earlier one has produced that result. Such conflicts usually arise over the CPU's registers. A number of approaches have been used to handle them:
- The CPU can include hardware that detects these conflicts and 'stalls' the later instruction until the earlier one has completed. This was the technique used by the CDC 6600, whose conflict-tracking hardware was called the scoreboard; the 6600 is often cited as the first superscalar design.
- When a later instruction needs to write a register that an earlier instruction also uses, but does not need the data already in that register, a technique called register renaming lets the later instruction start as soon as its own input data is available, by giving its result a different physical register.
- The system may move the handling out of hardware and into software: the compiler emits only instruction sequences in which conflicts cannot occur, for example by reordering instructions or inserting no-ops.
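The hardware stall approach in the first item can be sketched as a small timing model. This is a simplified illustration, not the CDC 6600 scoreboard itself: it assumes one instruction issues per cycle in program order, that operands are read at issue (so write-after-read conflicts cannot arise in this model), and that each instruction has a fixed, hypothetical latency. The function name `issue_cycles` and the tuple layout are illustrative choices.

```python
def issue_cycles(program):
    """Return the cycle in which each instruction issues.

    program: list of (dest, srcs, latency) using register names.
    An instruction stalls until every source register holds the value
    written by the earlier instruction that produces it, and until any
    pending write to its own destination has finished (write-after-write).
    """
    ready = {}      # register -> cycle its latest value becomes available
    cycles = []
    next_free = 0   # earliest cycle the next instruction may issue (in order)
    for dest, srcs, latency in program:
        start = max([next_free, ready.get(dest, 0)]
                    + [ready.get(r, 0) for r in srcs])
        cycles.append(start)
        ready[dest] = start + latency  # result available after the latency
        next_free = start + 1
    return cycles


if __name__ == "__main__":
    prog = [
        ("r1", [], 3),      # long-latency operation producing r1
        ("r2", ["r1"], 1),  # needs r1, so it stalls until cycle 3
        ("r3", [], 1),      # independent, but waits behind the stall
    ]
    print(issue_cycles(prog))  # prints [0, 3, 4]
```

The third instruction does not depend on anything, yet it cannot issue until cycle 4, because in-order issue holds it behind the stalled second instruction. That lost time is part of what motivates register renaming and out-of-order execution.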
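Register renaming from the second item can be illustrated by rewriting a program so that every write goes to a fresh physical register. In this minimal sketch the physical register names `p0`, `p1`, ... and the function `rename` are hypothetical, and physical registers are never freed; real hardware allocates them from a finite free list and recycles them once no instruction still needs the old value.

```python
from itertools import count


def rename(program, arch_regs):
    """Map architectural registers to fresh physical registers.

    program: list of (dest, srcs) using architectural register names.
    Returns the same program rewritten with physical names p0, p1, ...
    """
    fresh = (f"p{i}" for i in count())
    table = {r: next(fresh) for r in arch_regs}  # initial mapping
    out = []
    for dest, srcs in program:
        phys_srcs = [table[r] for r in srcs]  # read current mappings first
        table[dest] = next(fresh)             # new physical register for the result
        out.append((table[dest], phys_srcs))
    return out


if __name__ == "__main__":
    prog = [
        ("r1", ["r2", "r3"]),  # r1 = r2 * r3
        ("r2", ["r1", "r1"]),  # r2 = r1 + r1  (really needs the new r1)
        ("r1", ["r3", "r3"]),  # r1 = r3 + r3  (reuses r1, ignores its old data)
    ]
    print(rename(prog, ["r1", "r2", "r3"]))
    # prints [('p3', ['p1', 'p2']), ('p4', ['p3', 'p3']), ('p5', ['p2', 'p2'])]
```

The third instruction reuses the name `r1` but does not need its old contents. After renaming it writes `p5`, so it no longer has to wait for the first instruction's write or the second instruction's read of `r1`, while the genuine dependency of the second instruction on the first (through `p3`) is preserved.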