Clipping to an arbitrary unsigned range [high, low], Example 4-21 – Intel ARCHITECTURE IA-32 User Manual


IA-32 Intel® Architecture Optimization


The code above converts values to unsigned numbers first and then clips
them to an unsigned range. The last instruction converts the data back to
signed data and places the data within the signed range. Conversion to
unsigned data is required for correct results when (high - low) < 0x8000.
If (high - low) >= 0x8000, the algorithm can be simplified as shown in
Example 4-21.

This algorithm saves a cycle when it is known that (high - low) >= 0x8000.
The three-instruction algorithm does not work when (high - low) < 0x8000,
because 0xffff minus any number < 0x8000 yields a value of 0x8000 or
greater, which is a negative number when interpreted as a signed word.
When the second instruction, psubssw MM0, (0xffff - high + low), in the
three-step algorithm (Example 4-21) is executed, a negative number is
subtracted. This subtraction causes the values in MM0 to be increased
instead of decreased, as should be the case, and an incorrect answer is
generated.
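The sign problem described above can be checked with a short scalar sketch in C. It is only an illustration and not part of the manual's code: the helper names as_signed_word, psubssw_operand and operand_demo are hypothetical, and the bounds are passed as ordinary integers rather than packed words.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 16-bit pattern as a signed word (two's complement),
   spelled out by hand to avoid implementation-defined conversions. */
int32_t as_signed_word(uint16_t bits)
{
    return bits >= 0x8000u ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Value of the operand (0xffff - high + low) as a signed-saturating
   subtract would read it. high and low are signed word bounds with
   low <= high, so high - low lies in 0..0xffff. */
int32_t psubssw_operand(int32_t high, int32_t low)
{
    uint16_t bits = (uint16_t)(0xffff - (high - low));
    return as_signed_word(bits);
}

/* Hypothetical self-check: one narrow range and one wide range. */
void operand_demo(void)
{
    assert(psubssw_operand(100, -100) < 0);        /* high - low <  0x8000 */
    assert(psubssw_operand(16384, -16384) > 0);    /* high - low == 0x8000 */
}
```

For the narrow range high = 100, low = -100 the operand is 0xff37, which a signed instruction reads as -201, so subtracting it raises the values. For high = 0x4000, low = -0x4000 the operand is 0x7fff, a positive word, and the subtraction lowers the values as intended.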

Clipping to an Arbitrary Unsigned Range [high, low]

Example 4-22 clips an unsigned value to the unsigned range [high, low].
If the value is less than low or greater than high, then clip to low or
high, respectively. This technique uses the packed-add and

Example 4-21 Simplified Clipping to an Arbitrary Signed Range

; Input:   MM0  signed source operands
; Output:  MM0  signed operands clipped to the signed
;               range [high, low]
paddssw MM0, (packed_max - packed_high)                 ; in effect this clips to high
psubssw MM0, (packed_usmax - packed_high + packed_low)  ; clips to low
paddw   MM0, low                                        ; undo the previous two offsets
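The three-instruction sequence can be followed one word at a time with a scalar model in C. This is a sketch under stated assumptions, not the manual's implementation: sat16, wrap_add16, clip_signed_simplified and clip_demo are hypothetical names, packed_max is taken to be 0x7fff, and packed_usmax is taken to be 0xffff, matching the (0xffff - high + low) operand quoted in the text. One assumption deserves emphasis: the listing writes the last operand simply as low, while this sketch adds low + 0x8000 modulo 2^16 (low with its sign bit flipped), because the two saturating steps leave the result biased by 0x8000. The text does not state that reading.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate to the signed word range, as paddssw and psubssw do per lane. */
int32_t sat16(int32_t v)
{
    if (v > 32767) return 32767;
    if (v < -32768) return -32768;
    return v;
}

/* Wraparound 16-bit add (paddw), returned as a signed word value. */
int32_t wrap_add16(int32_t a, int32_t b)
{
    uint32_t sum = ((uint32_t)a + (uint32_t)b) & 0xffffu;
    return sum >= 0x8000u ? (int32_t)sum - 0x10000 : (int32_t)sum;
}

/* One lane of the simplified sequence.
   Precondition from the text: high - low >= 0x8000. */
int32_t clip_signed_simplified(int32_t x, int32_t high, int32_t low)
{
    /* paddssw: anything above high saturates at 0x7fff. */
    int32_t v = sat16(x + (0x7fff - high));
    /* psubssw: anything below low saturates at -0x8000. */
    v = sat16(v - (0xffff - high + low));
    /* paddw: undo both offsets. ASSUMPTION: the constant is
       low + 0x8000 (mod 2^16), not stated in the text. */
    return wrap_add16(v, low + 0x8000);
}

/* Hypothetical self-check: above, inside, and below the range. */
void clip_demo(void)
{
    assert(clip_signed_simplified(30000, 28672, -8192) == 28672);
    assert(clip_signed_simplified(0, 28672, -8192) == 0);
    assert(clip_signed_simplified(-10000, 28672, -8192) == -8192);
}
```

The bounds in clip_demo satisfy high - low = 0x9000, so the subtracted operand is a positive signed word. With a narrower range, the same model shows the subtraction raising the values, as described for the case (high - low) < 0x8000.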
