Output Register Masks, Argument and Instruction Modifiers

To give you more control over how an individual register or instruction is used, pixel shaders provide an array of masks, selectors, and modifiers that let you manipulate exactly how an instruction works and which register channels are read and/or written.

 

source negation

 

Negation can be used to negate an entire source register before it is used. Source negation is indicated by placing a minus sign, “-”, in front of the source register to be negated. The source register values are unchanged.

 

Rules for using source negation:

 

mov    t0, -v0      // t0 = -1.0 * v0

mul    t0, -v0, -c3     

add    r0,  v0, -v1

mul    r1, 1-v0, -v1 // with invert modifier

texkill -t0         // Error! Texture instruction!
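The negation rule can be modeled in a few lines of Python. This is only a sketch of the arithmetic, assuming a register is represented as a tuple of four floats (r, g, b, a). The helper names are hypothetical, and nothing here reflects how the assembler or the hardware actually implements the modifier.

```python
# Hypothetical model: a register is a tuple of four floats (r, g, b, a).

def negate(reg):
    """Source negation (-reg): every channel is multiplied by -1.0.
    A new tuple is returned, so the source register itself is unchanged."""
    return tuple(-c for c in reg)

def mul(a, b):
    """Per-channel multiply, like the mul instruction with no modifiers."""
    return tuple(x * y for x, y in zip(a, b))

v0 = (0.25, 0.5, 0.75, 1.0)
c3 = (0.5, 0.5, 0.5, 0.5)

t0 = negate(v0)                   # mov t0, -v0
print(t0)                         # (-0.25, -0.5, -0.75, -1.0)
t0 = mul(negate(v0), negate(c3))  # mul t0, -v0, -c3: the two signs cancel
print(t0)                         # (0.125, 0.25, 0.375, 0.5)
```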

 

source invert

 

Source invert subtracts each element of a source register from one and uses the result in place of the register. Source invert is indicated by placing a “1-” (the number one followed by a minus sign) in front of the source register to be inverted. The source register values are unchanged.

 

Rules for using source invert:

 

mov  r0,  1-v0    // inverts colors: r0 = 1 - v0

mul   r1,  1-v0, -v1
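Using the same hypothetical 4-tuple register model, source invert is a per-channel subtraction from 1.0, and it combines with negation exactly as the second example suggests. The helper names are illustrative only.

```python
# Hypothetical model: a register is a tuple of four floats (r, g, b, a).

def invert(reg):
    """Source invert (1-reg): each channel is subtracted from 1.0."""
    return tuple(1.0 - c for c in reg)

def negate(reg):
    """Source negation (-reg)."""
    return tuple(-c for c in reg)

def mul(a, b):
    """Per-channel multiply, like the mul instruction with no modifiers."""
    return tuple(x * y for x, y in zip(a, b))

v0 = (0.25, 0.5, 0.75, 0.5)
v1 = (0.5, 0.5, 0.5, 0.5)

print(invert(v0))                   # mov r0, 1-v0      -> (0.75, 0.5, 0.25, 0.5)
print(mul(invert(v0), negate(v1)))  # mul r1, 1-v0, -v1 -> (-0.375, -0.25, -0.125, -0.25)
```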

 

source bias

 

The bias modifier shifts the range of the input register from [0,1] to [-0.5,+0.5]. It is indicated by adding a “_bias” suffix to a register. Essentially, the modifier subtracts 0.5 from the register’s values before they are used. Be careful when using this modifier with the color registers: their range is [0,1], so you’ll get an implicit clamping. The source register values are unchanged.

 

If you combine it with the _x2 instruction modifier (as in mov_x2), you can convert a register range from [0,1] to [-1,1], the same result as the signed scaling (_bx2) source modifier.

 

Note: If you used the D3DTOP_ADDSIGNED texture operation in one of your DirectX texture stages, the bias modifier performs the same operation.

 

Rules for using source bias:

           

// Shift range from [0,1] to [-0.5, 0.5]

mov       r0,  r0_bias   // r0 =  r0 - 0.5

 

// Shift range from [0,1] to [-0.5, 0.5]

// Then shift sign

mov       r0,  -r0_bias   // r0 =  0.5 - r0

 

// shift range from [0,1] to [-1,1]

mov_x2    r0,  r0_bias
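The three bias examples above can be checked with the same kind of hypothetical 4-tuple model. Here the x2 helper stands in for the _x2 instruction modifier on mov_x2, so the last line shows bias followed by a doubling producing the [-1,1] range.

```python
# Hypothetical model: a register is a tuple of four floats (r, g, b, a).

def bias(reg):
    """_bias source modifier: subtract 0.5 from every channel."""
    return tuple(c - 0.5 for c in reg)

def negate(reg):
    """Source negation (-reg)."""
    return tuple(-c for c in reg)

def x2(reg):
    """_x2 instruction modifier: double the instruction's result."""
    return tuple(2.0 * c for c in reg)

r0 = (0.0, 0.25, 0.75, 1.0)
print(bias(r0))          # mov    r0, r0_bias  -> (-0.5, -0.25, 0.25, 0.5)
print(negate(bias(r0)))  # mov    r0, -r0_bias -> (0.5, 0.25, -0.25, -0.5)
print(x2(bias(r0)))      # mov_x2 r0, r0_bias  -> (-1.0, -0.5, 0.5, 1.0)
```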

 

 

source signed scaling

 

The signed scaling modifier (also called “bias times two”) shifts the range of the input register from [0,1] to [-1,+1], typically when you want to use the full signed range that registers are capable of. The signed scaling modifier is indicated by adding a “_bx2” suffix to a register. Essentially, the modifier subtracts 0.5 from the register’s values and then multiplies the result by 2 before they are used. The source register values are unchanged.

 

In PS 1.0 and 1.1, the arguments of the texm3x2* and texm3x3* instructions can use the _bx2 modifier.

 

In PS 1.2 and 1.3, the arguments of any tex* instruction can use the _bx2 modifier.

 

Note: If you used the D3DTOP_ADDSIGNED2X texture operation in one of your DirectX texture stages, the signed scaling modifier performs the same operation.

 

Rules for using signed source scaling:

           

mov       t0,  t0_bx2     // t0 =  2.0* (t0 - 0.5)

mov       r0,  r0_bx2     // r0 = 2.0*(r0 - 0.5): values below 0.5 become negative (darker)
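A hypothetical one-function model of _bx2 makes the [0,1] to [-1,1] mapping concrete, and the final assertion confirms that it matches _bias followed by a doubling, as the bias section states.

```python
# Hypothetical model: a register is a tuple of four floats (r, g, b, a).

def bias(reg):
    """_bias source modifier: subtract 0.5 from every channel."""
    return tuple(c - 0.5 for c in reg)

def bx2(reg):
    """_bx2 source modifier: 2 * (c - 0.5), mapping [0,1] onto [-1,1]."""
    return tuple(2.0 * (c - 0.5) for c in reg)

t0 = (0.0, 0.25, 0.5, 1.0)
print(bx2(t0))   # mov t0, t0_bx2 -> (-1.0, -0.5, 0.0, 1.0)

# Same result as _bias followed by a doubling (mov_x2 r0, r0_bias).
assert bx2(t0) == tuple(2.0 * c for c in bias(t0))
```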

 

source scale 2X

The scale by two modifier multiplies the input register’s values by 2 before they are used, so the [0,1] range becomes [0,2] and the [-1,+1] range becomes [-2,+2]. Unlike signed scaling, no 0.5 is subtracted first. The scale by two modifier is indicated by adding a “_x2” suffix to a register. The source register values are unchanged.

 

Rules for using scale by two:

 

mov  r0, r0_x2 // 2x r0 
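As a quick check on the resulting range, here is the same style of hypothetical model for the _x2 source modifier: doubling a value in [0,1] yields a value in [0,2].

```python
# Hypothetical model: a register is a tuple of four floats (r, g, b, a).

def x2_src(reg):
    """_x2 source modifier: each channel is doubled before it is used."""
    return tuple(2.0 * c for c in reg)

r0 = (0.0, 0.25, 0.5, 1.0)
print(x2_src(r0))   # mov r0, r0_x2 -> (0.0, 0.5, 1.0, 2.0)
```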

 

 

source replication/selection

 

Just as vertex shaders let you select particular elements of a source register, so do pixel shaders, with some differences: you can select only a single element, and that element is replicated to all channels. You specify the channel to replicate by adding a “.n” suffix to the register, where n is r, g, b, or a (or x, y, z, or w).

 

Source Register Selectors

                 Register channel
PS Version       red    green   blue    alpha
1.0                                     x
1.1                             x       x
1.2                             x       x
1.3                             x       x
1.4 Phase 1      x      x       x       x
1.4 Phase 2      x      x       x       x
2.0              x      x       x       x

 

 

mov  r0,   v0.a     // ps.1.0

mov  r0.a, v0.b     // ps.1.1 ps.1.2 ps.1.3

 

// these commands are an error in versions before ps.1.4

mov  r0,  v0.b    

mov  r0,  v0.g    
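A hypothetical Python sketch of replication, plus a lookup transcribed from the version table above, might look like this. The CHANNELS and ALLOWED names are illustrative only, and the model ignores any other restrictions on where a selector may appear.

```python
# Map both spellings of a channel name to its index in (r, g, b, a).
CHANNELS = {"r": 0, "g": 1, "b": 2, "a": 3, "x": 0, "y": 1, "z": 2, "w": 3}

# Replicate selectors each version accepts, transcribed from the table above
# (both 1.4 phases accept the same set).
ALLOWED = {"1.0": "a", "1.1": "ba", "1.2": "ba", "1.3": "ba",
           "1.4": "rgba", "2.0": "rgba"}

def replicate(reg, selector):
    """Source selector .n: copy one channel into all four channels."""
    value = reg[CHANNELS[selector]]
    return (value, value, value, value)

def selector_allowed(version, selector):
    """True if the version's row in the table has an x for this channel."""
    return "rgba"[CHANNELS[selector]] in ALLOWED[version]

v0 = (0.1, 0.2, 0.3, 0.4)
print(replicate(v0, "a"))             # mov r0, v0.a -> (0.4, 0.4, 0.4, 0.4)
print(selector_allowed("1.0", "b"))   # False
print(selector_allowed("1.4", "g"))   # True
```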

 

 

texture register modifiers (PS 1.4 only)

PS 1.4 has its own set of modifiers for texture instructions. Since only the texcrd and texld instructions are used to load or sample textures in PS 1.4, these modifiers are unique to those instructions. Note that you can interchange rgba syntax with xyzw syntax; thus _dz is the same as _db.

 

Source Texture Register Selectors

These selectors let you swizzle the source register to a limited extent. They are added as a suffix to the register and can be used anytime texcrd or texld can be used. Since these instructions read only three components, the selectors let you fill the register’s last two channels with either the .z value or the .w value, instead of leaving the last channel uninitialized. This allows you to map a 4D texture into 3D texture space so it can be manipulated in the shader.

 

 

PS 1.4 Source Register Selectors

Description                        Syntax
Source register looks like .xyzz   .xyz
Source register looks like .xyww   .xyw

 

texld     r0,  t0.xyz    // r0.xyzw = t0.xyzz

texld     r0,  t0.rgb    // alternate syntax

texld     r0,  t0_dz.xyz // with a register modifier
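In the same hypothetical 4-tuple model, the two selectors in the table reduce to simple reshuffles of the source components. This only shows which values the instruction sees; it does not model the texture lookup itself.

```python
# Hypothetical model: a register is a tuple of four floats (x, y, z, w).

def select_xyz(reg):
    """.xyz (or .rgb) selector: the source reads as .xyzz."""
    x, y, z, _w = reg
    return (x, y, z, z)

def select_xyw(reg):
    """.xyw selector: the source reads as .xyww."""
    x, y, _z, w = reg
    return (x, y, w, w)

t0 = (1.0, 2.0, 3.0, 4.0)
print(select_xyz(t0))   # texld r0, t0.xyz -> source seen as (1.0, 2.0, 3.0, 3.0)
print(select_xyw(t0))   # texld r0, t0.xyw -> source seen as (1.0, 2.0, 4.0, 4.0)
```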

 

Once you use a particular selector on a texture register, you cannot use a different one on the same source register in the same shader. For example, the following is a legal set of instructions. Register t2 is used with the .xyz selector twice.

 

texld     r0,  t2.xyz   

texld     r1,  t2.xyz

 

However, the following, which uses register t2 with the .xyz selector and then with the .xyw selector, is an error.

 

texld     r0,  t2.xyz   

texld     r1,  t2.xyw   // Error! Register t2 used again with a different selector
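The one-selector-per-register rule is easy to express as a check over a list of uses. The sketch below is a hypothetical helper, not part of any real assembler; it assumes the uses are given in program order and treats .rgb as the alternate spelling of .xyz, as in the t0.rgb example above.

```python
# Hypothetical helper; "rgb" is treated as the alternate spelling of "xyz".
ALIASES = {"rgb": "xyz"}

def check_selectors(uses):
    """uses: list of (texture_register, selector) pairs in program order.
    Returns an error message for every use whose selector differs from
    the first selector seen on that register."""
    first_seen = {}
    errors = []
    for reg, sel in uses:
        sel = ALIASES.get(sel, sel)
        if reg not in first_seen:
            first_seen[reg] = sel
        elif first_seen[reg] != sel:
            errors.append(f"{reg}.{sel}: {reg} was already used with .{first_seen[reg]}")
    return errors

print(check_selectors([("t2", "xyz"), ("t2", "xyz")]))  # []
print(check_selectors([("t2", "xyz"), ("t2", "xyw")]))  # ['t2.xyw: t2 was already used with .xyz']
```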

 

Source Texture Register Modifiers

These modifiers let you perform a perspective divide (by either the .z or the .w element) in the pixel shader. They are added as a suffix to the register and can be used anytime texcrd or texld can be used. Only the .xy channels of the destination are modified. If the divisor is zero, the destination is set to one. The _dw modifier is for Phase 1; the _dz modifier is for Phase 2.

 

PS 1.4 Source Register Modifiers

Description         Syntax
Divide x,y by z     _dz
Divide x,y by w     _dw

 

texld     r0,  t0_dz

 

// these are the same as above

 

texld     r0,  t0_dz.xyz

texld     r0,  t0_db.xyz

texld     r0,  t0_db.rgb
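The divide itself can be sketched in the same hypothetical 4-tuple model. Two details are assumptions of this sketch rather than statements from the text: z and w pass through unchanged, and "the destination is set to one" is read as x and y both becoming 1.0 when the divisor is zero.

```python
# Hypothetical model: a register is a tuple of four floats (x, y, z, w).

def divide(reg, modifier):
    """_dz/_db divide x and y by z; _dw divides x and y by w.
    Assumed: z and w pass through unchanged, and a zero divisor
    sets x and y to 1.0."""
    if modifier not in ("_dz", "_db", "_dw"):
        raise ValueError(f"unknown modifier {modifier}")
    x, y, z, w = reg
    divisor = w if modifier == "_dw" else z
    if divisor == 0.0:
        return (1.0, 1.0, z, w)
    return (x / divisor, y / divisor, z, w)

t0 = (2.0, 4.0, 2.0, 8.0)
print(divide(t0, "_dz"))   # (1.0, 2.0, 2.0, 8.0)
print(divide(t0, "_dw"))   # (0.25, 0.5, 2.0, 8.0)
```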


You can mix the .xyw selector with the _dw modifier. The _dw modifier can be used as many times as necessary in Phase 1. After Phase 1 the .w channel is invalid, so you can’t use the modifier. You can use the _dz modifier only on a temporary register (thus, only in Phase 2), and not more than twice per shader. The following shows in which phase each instruction is valid or invalid. I’ve ignored usage restrictions on texture registers, etc.

 

// Phase 1

 

texld     r0,  t0_dz // Invalid – dz Phase 2 only

texld     r0,  t0_dw // Valid

 

phase

 

// Phase 2

 

texld     r0,  t1_dz.xyz // Invalid – texture register

texld     r0,  t1_db.xyz // Invalid - _db == _dz

texld     r0,  r0_dz.xyz // Valid – temp register

texld     r0,  t0_dw.xyz // Invalid – w is undefined
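The phase rules above can be collected into a small hypothetical checker. It encodes only what the text states: _dw is valid only in Phase 1, _dz (or _db) is valid only in Phase 2 on a temporary register, and _dz may be used at most twice per shader. Running it on the six examples above flags the same four instructions.

```python
def check_divides(uses):
    """Hypothetical checker for the _dz/_dw rules described above.
    uses: list of (phase, register, modifier) tuples, e.g. (2, "r0", "_dz").
    Returns a list of error messages; an empty list means every use is valid."""
    errors = []
    dz_count = 0
    for phase, reg, mod in uses:
        if mod == "_db":                 # _db is the rgba spelling of _dz
            mod = "_dz"
        if mod == "_dw" and phase != 1:
            errors.append(f"{reg}_dw: .w is undefined after Phase 1")
        elif mod == "_dz":
            if phase != 2:
                errors.append(f"{reg}_dz: _dz is Phase 2 only")
            elif not reg.startswith("r"):
                errors.append(f"{reg}_dz: _dz needs a temporary (r#) register")
            else:
                dz_count += 1
                if dz_count > 2:
                    errors.append(f"{reg}_dz: _dz used more than twice")
    return errors

# The six examples above: only t0_dw in Phase 1 and r0_dz in Phase 2 pass.
examples = [(1, "t0", "_dz"), (1, "t0", "_dw"),
            (2, "t1", "_dz"), (2, "t1", "_db"),
            (2, "r0", "_dz"), (2, "t0", "_dw")]
for message in check_divides(examples):
    print(message)
```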

Destination Write Masks

These write masks control which channel(s) are written. They can be used anytime texcrd or texld can be used. Omitting the mask is the same as specifying all channels. Only the combinations shown in the table can be used.

 

PS 1.4 Destination Write Masks

Description                    Syntax
Writes to the xyzw channels    .xyzw
Writes to the xyz channels     .xyz
Writes to the xy channels      .xy

 

texcrd    r0.xy,    t0_dz

texcrd    r0.rg,    t0_dz // same as previous

 

texcrd    r0.xyzw,  t0_dz

texcrd    r0,       t0_db // same as previous   
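A masked texcrd/texld write can be modeled as "copy only the named channels". This hypothetical sketch assumes unwritten channels keep their previous values and accepts only the three masks in the table, in either spelling.

```python
# Hypothetical model: registers are tuples of four floats (x, y, z, w).
TEX_MASKS = {"xyzw", "xyz", "xy"}          # the only masks in the table above
RGBA_TO_XYZW = str.maketrans("rgba", "xyzw")

def tex_write(dest, result, mask="xyzw"):
    """Write result into dest through a texcrd/texld destination mask.
    Channels not named in the mask keep their old values; omitting the
    mask is the same as .xyzw."""
    mask = mask.translate(RGBA_TO_XYZW)
    if mask not in TEX_MASKS:
        raise ValueError(f".{mask} is not a legal texcrd/texld mask")
    return tuple(result[i] if "xyzw"[i] in mask else dest[i] for i in range(4))

r0 = (0.0, 0.0, 0.0, 0.0)
coords = (0.5, 0.25, 1.0, 1.0)
print(tex_write(r0, coords, "xy"))   # texcrd r0.xy, ... -> (0.5, 0.25, 0.0, 0.0)
print(tex_write(r0, coords, "rg"))   # same as previous
```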

 

 

 

destination write mask

 

Note the word destination above: masks can only be used to select which elements of a register are written to. Unlike in vertex shaders, however, all you can do is select all channels (.rgba), the color channels only (.rgb), or the alpha channel (.a), though later pixel shaders allow more control. This mimics the traditional lighting pipeline, in which color and alpha channels can be processed separately. Omitting a mask is the same as specifying the full mask. The alpha mask is also referred to as the scalar mask, since it uses a scalar value. The color write mask is sometimes referred to as the vector mask. An alternate syntax is to use .xyzw instead of .rgba.

 

Apart from the texcrd and texld instructions, destination write masks are supported only on arithmetic instructions. In PS 1.0-1.3, the dp3 instruction can use only the .rgb or .rgba masks.

 

Destination masks are particularly important once you start setting up instructions for pairing.

 

Note that with PS 1.4 shaders you have the ability to operate on individual channels, giving you a lot more flexibility.

 

Destination write mask Descriptions

Mask             Operation
.rgb             The operation works on the color channels (rgb) and is scheduled for execution in the vector pipeline.
.a               The operation works on the alpha channel and is scheduled for execution in the scalar pipeline.
.r, .g, .b       Lets you select the destination channel to write to.
.rgba            The operation works on the color and alpha channels and is scheduled for parallel execution in the vector and scalar pipelines. This is the default if a mask is not specified.
.(r)(g)(b)(a)    Arbitrary mask. Channels must be listed in .rgba order, but any combination of them can be used.

 

 

Destination write mask Selectors

                 Selector
PS Version       r    g    b    a    rgb    rgba    (r)(g)(b)(a)
1.0                              x    x      x
1.1                              x    x      x
1.2                              x    x      x
1.3                              x    x      x
1.4 Phase 1      x    x    x    x    x      x       x
1.4 Phase 2      x    x    x    x    x      x       x
2.0              x    x         x    x      x       x
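The ordering rule for arbitrary masks and the PS 1.0-1.4 rows of the table can be turned into a hypothetical validity check. The 2.0 row is deliberately not modeled here, and xyzw names are mapped to rgba before checking.

```python
XYZW_TO_RGBA = str.maketrans("xyzw", "rgba")

def is_ordered_mask(mask):
    """True if mask names distinct channels in .rgba order (e.g. "rb", not "br")."""
    remaining = iter("rgba")
    # Each `in` test consumes the iterator up to the match, enforcing order.
    return mask != "" and all(ch in remaining for ch in mask)

def mask_allowed(version, mask):
    """Hypothetical check against the PS 1.0-1.4 rows of the table above."""
    mask = mask.translate(XYZW_TO_RGBA)
    if not is_ordered_mask(mask):
        return False
    if version in ("1.0", "1.1", "1.2", "1.3"):
        return mask in ("a", "rgb", "rgba")
    if version == "1.4":
        return True          # any ordered combination, in either phase
    raise ValueError(f"version {version} not modeled")

print(mask_allowed("1.1", "rgb"))   # True
print(mask_allowed("1.1", "r"))     # False
print(mask_allowed("1.4", "rb"))    # True
print(mask_allowed("1.4", "br"))    # False: not in .rgba order
```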

 

Here are some examples of using the write mask.

 

// color channel is modulated

mul r0.rgb,  t0, v0

// alpha is added using a different source register

add r0.a,    t1, v1

// color and alpha operations paired with the + prefix

mul r0.rgb,  t0,  v0

+add r0.a,    t0,  v0 // note instruction pairing

 

// variations that have the same effect

// no mask is equivalent to

mul r0,      t0,   v0

mul r0.rgba, t0,   v0 // full specification

 

 

Note that specifying exactly the same operation (including registers) on the color and alpha channels automatically causes pairing to occur. The following code fragments cause the same code to be assembled in the pixel shader.

 

// no masks, a single operation

mul r0,      t0,   v0

 

This is the same as writing:

 

// full mask with a single operation

mul r0.rgba, t0,   v0

 

This is the same as writing:

 

// color and alpha mask with the same operation

mul r0.rgb,  t0,   v0 // on color

mul r0.a,    t0,   v0 // on alpha, same arguments

 

except that it takes up an extra instruction slot and will run slower. However, you can rewrite it as:

 

// color and alpha mask with the same operation

// with pairing

mul  r0.rgb,  t0,   v0 // on color

+mul r0.a,    t0,   v0 // on alpha, same arguments

 

Now the instructions are paired, which frees one slot and reduces the run time. The point is that you can now change the alpha manipulations and perform something different in the scalar (alpha) pipeline.
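Ignoring timing, the equivalence between the single .rgba instruction and the paired .rgb/.a instructions can be checked with a small hypothetical model of masked writes. The model compares results only; it says nothing about instruction slots or how the vector and scalar pipelines are actually scheduled.

```python
# Hypothetical model: registers are tuples of four floats (r, g, b, a).

def mul(a, b):
    """Per-channel multiply, like the mul instruction with no modifiers."""
    return tuple(x * y for x, y in zip(a, b))

def add(a, b):
    """Per-channel add, like the add instruction with no modifiers."""
    return tuple(x + y for x, y in zip(a, b))

def write(dest, result, mask):
    """Masked write: only channels named in mask (rgba letters) change."""
    return tuple(result[i] if "rgba"[i] in mask else dest[i] for i in range(4))

t0 = (0.5, 0.25, 1.0, 0.5)
v0 = (0.5, 0.5, 0.5, 0.25)
r0 = (0.0, 0.0, 0.0, 0.0)

single = write(r0, mul(t0, v0), "rgba")        # mul  r0.rgba, t0, v0
paired = write(write(r0, mul(t0, v0), "rgb"),  # mul  r0.rgb,  t0, v0
               mul(t0, v0), "a")               # +mul r0.a,    t0, v0
print(single == paired)   # True

# With the pair in place, the alpha half can do something different:
mixed = write(write(r0, mul(t0, v0), "rgb"), add(t0, v0), "a")  # +add r0.a, t0, v0
print(mixed)              # (0.25, 0.125, 0.5, 0.75)
```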

 

instruction modifiers

 

Note that these are placed on the instructions themselves, not on the arguments. The pixel shader assembler supports shift/scale modifier flags and a saturation modifier flag that affect the generated output result. The modifiers can be thought of as shift left (power-of-two multiply), shift right (power-of-two divide), and saturate (clamp the output range to [0,1]).

 

Rules for using instruction modifiers:

 

Instruction modifiers Description

Modifier    Operation
_x2         2X modifier. Multiply the results by 2 before storing in the register.
_x4         4X modifier. Multiply the results by 4 before storing in the register.
_x8         8X modifier. Multiply the results by 8 before storing in the register.
_d2         Half modifier. Divide the results by 2 before storing in the register.
_d4         Quarter modifier. Divide the results by 4 before storing in the register.
_d8         Eighth modifier. Divide the results by 8 before storing in the register.
_sat        Saturation modifier. Clamps the results to the range [0,1] before storing.
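The arithmetic in the table can be sketched as a scale factor followed by an optional clamp. This hypothetical model assumes the shift/scale is applied before the saturate, which matches the rule in the examples below that _sat is written last; it does not model hardware precision or range limits.

```python
# Scale factor for each shift/scale instruction modifier.
SCALE = {"_x2": 2.0, "_x4": 4.0, "_x8": 8.0,
         "_d2": 0.5, "_d4": 0.25, "_d8": 0.125}

def apply_modifiers(result, scale=None, sat=False):
    """Apply instruction modifiers to an instruction's result: an optional
    shift/scale first, then an optional _sat clamp to [0,1]."""
    out = result
    if scale is not None:
        out = tuple(c * SCALE[scale] for c in out)
    if sat:
        out = tuple(min(max(c, 0.0), 1.0) for c in out)
    return out

def add(a, b):
    """Per-channel add, like the add instruction with no modifiers."""
    return tuple(x + y for x, y in zip(a, b))

v1 = (0.125, 0.25, 0.5, 0.75)
print(apply_modifiers(add(v1, v1), "_x2"))            # add_x2     r0, v1, v1 -> (0.5, 1.0, 2.0, 3.0)
print(apply_modifiers(add(v1, v1), "_x2", sat=True))  # add_x2_sat r0, v1, v1 -> (0.5, 1.0, 1.0, 1.0)
```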

 

Instruction modifiers usage

                 Modifier
PS Version       _x2    _x4    _x8    _d2    _d4    _d8    _sat
1.0              x      x             x                    x
1.1              x      x             x                    x
1.2              x      x             x                    x
1.3              x      x             x                    x
1.4 Phase 1      x      x      x      x      x      x      x
1.4 Phase 2      x      x      x      x      x      x      x
2.0              x      x      (?)    x      (?)    (?)    x

 

Here are some examples of using instruction modifiers.

 

add_x2    r0, v1, v1

add_d2     r0, v1, v0

add_sat    r0, v1, v0

add_x2_sat r0, v1, v1

add_d2_sat r0, v1, v1

add_sat_d2 r0, v1, v1 // Error! _sat must be last
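The ordering rule in the last example can be checked by splitting an opcode on underscores. This is a hypothetical parser, not the real assembler's grammar; besides the "_sat must be last" rule shown above, it also rejects a second shift/scale modifier, which is an assumption of this sketch.

```python
SHIFTS = {"x2", "x4", "x8", "d2", "d4", "d8"}

def parse_opcode(token):
    """Hypothetical parser: split e.g. "add_x2_sat" into ("add", "x2", True).
    Rejects _sat anywhere but last, and (an assumption here) a second
    shift/scale modifier."""
    name, *mods = token.split("_")
    scale, sat = None, False
    for i, mod in enumerate(mods):
        if mod == "sat":
            if i != len(mods) - 1:
                raise ValueError(f"{token}: _sat must be last")
            sat = True
        elif mod in SHIFTS and scale is None:
            scale = mod
        else:
            raise ValueError(f"{token}: unexpected modifier _{mod}")
    return name, scale, sat

for token in ["add_x2", "add_d2", "add_sat", "add_x2_sat", "add_d2_sat"]:
    print(parse_opcode(token))
try:
    parse_opcode("add_sat_d2")
except ValueError as err:
    print(err)   # add_sat_d2: _sat must be last
```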

 

 

partial precision declaration modifier (PS 2.0)

 

DirectX 9 introduced the partial pre