Packed 32*32 multiply, Packed 64-bit add/subtract, Bit shifts – Intel ARCHITECTURE IA-32 User Manual


Optimizing for SIMD Integer Applications


Note that the output is a packed doubleword. If needed, a pack
instruction can be used to convert the result to 16-bit (thereby matching
the format of the input).

Packed 32*32 Multiply

The PMULUDQ instruction performs an unsigned multiply on the lower
pair of double-word operands within each 64-bit chunk from the two
sources; the full 64-bit result from each multiplication is returned to the
destination register. This instruction is added in both a 64-bit and
128-bit version; the latter performs 2 independent operations, on the low
and high halves of a 128-bit register.
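
As a rough illustration of the 128-bit form, the sketch below uses the
SSE2 intrinsic _mm_mul_epu32, which compilers typically map to PMULUDQ.
The helper name mul_low_dwords and its array-based interface are
assumptions made for this example and do not come from the manual. Only
doublewords 0 and 2 of each source take part; doublewords 1 and 3 are
ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: multiplies the low doubleword of each 64-bit half
   of a and b (elements 0 and 2) and stores the two full 64-bit unsigned
   products in out. Elements 1 and 3 of the inputs are not used. */
static void mul_low_dwords(const uint32_t a[4], const uint32_t b[4],
                           uint64_t out[2])
{
    assert(a && b && out);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i prod = _mm_mul_epu32(va, vb);   /* expected to emit PMULUDQ */
    _mm_storeu_si128((__m128i *)out, prod);
}
```

Unaligned loads and stores are used here so the helper accepts any
buffer; with 16-byte aligned data the aligned variants would also work.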

Packed 64-bit Add/Subtract

The PADDQ/PSUBQ instructions add/subtract quad-word operands within
each 64-bit chunk from the two sources; the 64-bit result from each
computation is written to the destination register. Like the integer
ADD/SUB instruction, PADDQ/PSUBQ can operate on either unsigned or
signed (two's complement notation) integer operands. When an
individual result is too large to be represented in 64 bits, the lower
64 bits of the result are written to the destination operand and therefore
the result wraps around. These instructions are added in both a 64-bit
and 128-bit version; the latter performs 2 independent operations, on the
low and high halves of a 128-bit register.
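
The wraparound behavior can be sketched with the SSE2 intrinsics
_mm_add_epi64 and _mm_sub_epi64, which compilers typically map to PADDQ
and PSUBQ. The helper name add_sub_qwords and its interface are
assumptions made for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: computes a + b and a - b independently on the low
   and high quadwords. Results that do not fit in 64 bits wrap around
   (modulo 2^64); no saturation is applied. */
static void add_sub_qwords(const uint64_t a[2], const uint64_t b[2],
                           uint64_t sum[2], uint64_t diff[2])
{
    assert(a && b && sum && diff);
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)sum,  _mm_add_epi64(va, vb)); /* PADDQ */
    _mm_storeu_si128((__m128i *)diff, _mm_sub_epi64(va, vb)); /* PSUBQ */
}
```

Because the same bit pattern is produced for signed and unsigned
operands, the caller decides how to interpret the results, for example by
casting an element of diff to int64_t.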

128-bit Shifts

The pslldq/psrldq instructions shift the first operand to the left/right
by the number of bytes specified by the immediate operand. The empty
low/high-order bytes are cleared (set to zero). If the value specified by
the immediate operand is greater than 15, then the destination is set to
all zeros.
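
A minimal sketch of the byte shifts, using the SSE2 intrinsics
_mm_slli_si128 and _mm_srli_si128 (typically compiled to pslldq and
psrldq). The helper name shift_bytes_by_4 and the fixed count of 4 are
assumptions chosen for illustration; the count must be a compile-time
constant because the instructions take it as an immediate.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: shifts the 16-byte value in `in` left and right
   by 4 bytes. In memory (little-endian), a left shift moves each byte to
   a higher index and zeroes bytes 0..3; a right shift moves each byte to
   a lower index and zeroes bytes 12..15. */
static void shift_bytes_by_4(const uint8_t in[16], uint8_t left[16],
                             uint8_t right[16])
{
    assert(in && left && right);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)left,  _mm_slli_si128(v, 4)); /* pslldq */
    _mm_storeu_si128((__m128i *)right, _mm_srli_si128(v, 4)); /* psrldq */
}
```

Note that these are whole-register byte shifts: bytes cross the boundary
between the low and high quadwords, unlike the per-element shifts such as
psllq.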
