Section 2.7, Instruction Timing; Table 2-6, Integer Divide Latency – Freescale Semiconductor MPC8260 User Manual


G2 Core

MPC8260 PowerQUICC II Family Reference Manual, Rev. 2

Freescale Semiconductor


2.7 Instruction Timing

The processor core is a pipelined superscalar processor. In a pipelined processor, the processing of an
instruction is broken into discrete stages, so an instruction does not require the entire resources of an
execution unit at one time. For example, after an instruction completes the decode stage, it can pass on to
the next stage while the subsequent instruction advances into the decode stage. This improves the
throughput of the instruction flow. The instruction pipeline in the processor core has four major stages,
described as follows:

The fetch pipeline stage primarily involves retrieving instructions from the memory system and
determining the location of the next instruction fetch. Additionally, the BPU decodes branches
during the fetch stage and folds out branch instructions before the dispatch stage if possible.

The dispatch pipeline stage is responsible for decoding the instructions supplied by the instruction
fetch stage, and determining which of the instructions are eligible to be dispatched in the current
cycle. In addition, the source operands of the instructions are read from the appropriate register file
and dispatched with the instruction to the execute pipeline stage. At the end of the dispatch pipeline
stage, the dispatched instructions and their operands are latched by the appropriate execution unit.

During the execute pipeline stage, each execution unit that has an executable instruction executes
the selected instruction (perhaps over multiple cycles), writes the instruction's result into the
appropriate rename register, and notifies the completion stage that the instruction has finished
execution.

The execution unit reports any internal exceptions to the completion/writeback pipeline stage and
discontinues execution until the exception is handled. The exception is not signaled until that
instruction is the next to be completed. Execution of most load/store instructions is also pipelined.
The load/store unit has two pipeline stages. The first stage is for effective address calculation and
MMU translation and the second stage is for accessing the data in the cache.

The complete/writeback pipeline stage maintains the correct architectural machine state and
transfers the contents of the rename registers to the GPRs and FPRs as instructions are retired. If
the completion logic detects an instruction causing an exception, all following instructions are
cancelled, their execution results in rename registers are discarded, and instructions are fetched
from the correct instruction stream.
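
As a rough illustration of why staging improves throughput, the following C sketch models an idealized pipeline with the four stages listed above. It assumes that one instruction enters per cycle, that every stage takes exactly one cycle, and that nothing stalls. The real core is superscalar and its execute stage may take multiple cycles, so the numbers are illustrative only. The names PIPELINE_STAGES, unpipelined_cycles, and simulate_pipeline are hypothetical and do not come from the manual.

```c
#include <assert.h>

/* Simplified model: fetch, dispatch, execute, complete/writeback. */
enum { PIPELINE_STAGES = 4 };

/* Cycles needed if each instruction had to pass through every stage
 * before the next one could start (no overlap between instructions). */
unsigned long unpipelined_cycles(unsigned long count)
{
    return count * PIPELINE_STAGES;
}

/* Cycle-by-cycle model of an ideal pipeline: on every cycle each
 * instruction moves one stage forward and a new one enters the first
 * stage. An instruction retires at the end of the cycle it spends in
 * the last stage. Returns the cycles needed to retire `count`
 * instructions. */
unsigned long simulate_pipeline(unsigned long count)
{
    int occupied[PIPELINE_STAGES] = {0};  /* 1 if the stage holds an instruction */
    unsigned long fetched = 0, retired = 0, cycles = 0;

    while (retired < count) {
        /* Advance every instruction to the next stage. */
        for (int s = PIPELINE_STAGES - 1; s > 0; --s)
            occupied[s] = occupied[s - 1];

        /* Fetch a new instruction if any remain. */
        occupied[0] = (fetched < count);
        if (occupied[0])
            ++fetched;

        ++cycles;

        /* The instruction in the last stage completes this cycle. */
        if (occupied[PIPELINE_STAGES - 1])
            ++retired;

        assert(fetched >= retired);  /* nothing retires before it is fetched */
    }
    return cycles;
}
```

Under these assumptions, simulate_pipeline(10) reports 13 cycles where unpipelined_cycles(10) reports 40: once the pipeline is full, one instruction completes per cycle.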

The processor core supports single-cycle store operations and provides an adder/comparator in the SRU
that allows multiple integer add and compare instructions to be dispatched and executed on each cycle.

Performance of integer divide operations has been improved in the processor core. A divide instruction
takes half as many cycles to execute as described in the G2 Core Reference Manual. The new latency is
shown in Table 2-6.

Table 2-6. Integer Divide Latency

Primary Opcode   Extended Opcode   Mnemonic      Form   Unit   Cycles
31               459               divwu[o][.]   xo     IU     20
31               491               divw[o][.]    xo     IU     20
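
To show how the opcode columns of Table 2-6 might be used, for example in a simple cycle estimator, the following C sketch extracts the primary and extended opcode fields from a 32-bit instruction word and looks up the divide latency. It relies on the standard PowerPC XO-form layout (primary opcode in the top 6 bits, OE bit, 9-bit extended opcode, Rc in the lowest bit). The struct, the table name, and the functions make_xo_insn and divide_latency are hypothetical names introduced for this illustration; they are not part of the manual or of any Freescale tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of Table 2-6. */
struct divide_timing {
    unsigned primary_opcode;
    unsigned extended_opcode;
    const char *mnemonic;   /* base mnemonic, without the [o] and [.] suffixes */
    unsigned cycles;
};

static const struct divide_timing divide_table[] = {
    { 31, 459, "divwu", 20 },
    { 31, 491, "divw",  20 },
};

/* Hypothetical helper: assemble an XO-form instruction word with
 * primary opcode 31. Counting from the least significant bit, the
 * fields are: Rc at bit 0, extended opcode at bits 1-9, OE at bit 10,
 * rB at bits 11-15, rA at bits 16-20, rT at bits 21-25. */
uint32_t make_xo_insn(unsigned rt, unsigned ra, unsigned rb,
                      unsigned oe, unsigned xo, unsigned rc)
{
    assert(rt < 32 && ra < 32 && rb < 32 && oe < 2 && xo < 512 && rc < 2);
    return (31u << 26) | ((uint32_t)rt << 21) | ((uint32_t)ra << 16) |
           ((uint32_t)rb << 11) | ((uint32_t)oe << 10) |
           ((uint32_t)xo << 1) | (uint32_t)rc;
}

/* Returns the latency in cycles from Table 2-6 if `insn` is one of the
 * integer divide instructions, or 0 if it is not listed in the table.
 * The OE and Rc bits are ignored, so the [o] and [.] variants match. */
unsigned divide_latency(uint32_t insn)
{
    unsigned primary  = insn >> 26;            /* top 6 bits */
    unsigned extended = (insn >> 1) & 0x1FFu;  /* 9-bit extended opcode */

    for (size_t i = 0; i < sizeof divide_table / sizeof divide_table[0]; ++i) {
        if (divide_table[i].primary_opcode == primary &&
            divide_table[i].extended_opcode == extended)
            return divide_table[i].cycles;
    }
    return 0;
}
```

For instance, divide_latency(make_xo_insn(3, 4, 5, 0, 491, 0)), which corresponds to divw r3,r4,r5, returns 20, while an integer add with the same registers (extended opcode 266) returns 0 because it does not appear in the table.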
