Click on an instruction to jump to that page.

 

abs      dp3      lit     mov
add      dp4      log     mova
call     dst      logp    mul
callnz   else     loop    nop
crs      endif    lrp     nrm
dcl      endloop  m3x2    pow
def      endrep   m3x3    rcp
defb     exp      m3x4    rep
defi     expp     m4x3    ret
         frc      m4x4    rsq
         if       mad     sge
         label    max     sgn
                  min     sincos
                          slt
                          sub
                          vs

abs (macro)

 vs 2.0

 

This macro computes the absolute value of the input register.


One slot


abs Dest0, Source0

 

This macro is equivalent to:

 

max  Dest0, Source0, -Source0

 

which you can use if you are writing a shader for a vertex shader version earlier than 2.0. In either case, you end up with the absolute value of Source0 in Dest0.

 

Setup:

One source register, Source0.

 

Results:

Dest0 is filled with the absolute value of Source0.

 

abs  r0  , r0

abs r0.z, r0.z
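
The equivalence between abs and max can be checked with a small C sketch. The Vec4 type and the helper names vs_max and vs_abs are hypothetical stand-ins for a shader register and the two instructions; they are not part of any DirectX API.

```c
/* Hypothetical stand-in for a four-element shader register. */
typedef struct { float x, y, z, w; } Vec4;

/* Element-by-element maximum, modeling the max instruction. */
static Vec4 vs_max(Vec4 a, Vec4 b)
{
    Vec4 r;
    r.x = a.x > b.x ? a.x : b.x;
    r.y = a.y > b.y ? a.y : b.y;
    r.z = a.z > b.z ? a.z : b.z;
    r.w = a.w > b.w ? a.w : b.w;
    return r;
}

/* abs modeled as max(Source0, -Source0), as in the macro expansion. */
static Vec4 vs_abs(Vec4 s)
{
    Vec4 neg = { -s.x, -s.y, -s.z, -s.w };
    return vs_max(s, neg);
}
```

Under these assumptions, passing a register holding (-1.5, 2, 0, -4) to vs_abs yields (1.5, 2, 0, 4).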

 

 

add

 vs 1.0, 1.1, 2.0

 

Adds two sources into the destination register.


One slot


add  Dest0, Source0, Source1

 

Adds the Source0 and Source1 registers and places the result in the Dest0 register.

 

Setup:

Two source registers, Source0 and Source1.

 

Results:

Each element of Dest0 is filled with the element-by-element addition of the elements of Source0 and Source1.

 

add  r0  , r0  ,   c2

add r0.z, r0.z,  -r0.z

 

SetSourceRegisters();

 

// Simulate the add instruction

TempReg.x = Source0.x + Source1.x;

TempReg.y = Source0.y + Source1.y;

TempReg.z = Source0.z + Source1.z;

TempReg.w = Source0.w + Source1.w;

 

WriteDestinationRegisters();

 

call

 vs 2.0

 

Makes an unconditional function call to the instruction label.


One slot


call_InstructionLabelID

 

Pushes the address of the following instruction onto the internal shader stack, and then sets the current instruction address to the address of the instruction that follows the label instruction with the name InstructionLabelID. The instruction label ID will be an integer in the range [1,16]. TODO – particular format of the statement??

 

Typically you’d create a shader subroutine that terminates with the ret instruction.

 

Setup:

Requires a valid, existing instruction label.

 

Results:

Shader execution is transferred to the instruction following the instruction label.

 

call_1

call_16

call_Fred // Error! Invalid label

call_0    // Error! Invalid label (out of range)

 

 

// Simulate the call instruction

 

// make a cast to a bare function pointer

typedef void (*fp)(void);

 

// take address of the label

fp pFP = (fp)InstructionLabelID;

pFP(); // call the function

// returns here only when ret is executed
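
The function-pointer analogy above hides the return-address stack. A minimal C sketch of that bookkeeping follows, using a toy instruction stream. The opcode names, the Instr type, the stack depth, and the rule that a ret with an empty stack ends the program are all assumptions made for illustration, not a description of the real hardware.

```c
/* Hypothetical opcodes for a toy instruction stream. */
enum { OP_NOP, OP_CALL, OP_RET, OP_LABEL };

typedef struct { int op; int label; } Instr;

#define STACK_DEPTH 4   /* assumed depth, for illustration only */

/* Runs the program; returns the number of NOPs executed, or -1 on error. */
static int run(const Instr *prog, int count)
{
    int stack[STACK_DEPTH];
    int sp = 0, pc = 0, executed = 0;

    while (pc < count) {
        Instr in = prog[pc];
        switch (in.op) {
        case OP_CALL: {
            int target = -1;
            /* label IDs outside [1,16] are rejected, as in the text */
            if (in.label < 1 || in.label > 16) return -1;
            for (int i = 0; i < count; ++i)
                if (prog[i].op == OP_LABEL && prog[i].label == in.label)
                    target = i;
            if (target < 0 || sp == STACK_DEPTH) return -1;
            stack[sp++] = pc + 1;   /* push address of following instruction */
            pc = target + 1;        /* continue just after the label */
            break;
        }
        case OP_RET:
            if (sp == 0) return executed;  /* assumed: ret in main ends the run */
            pc = stack[--sp];              /* pop and resume after the call */
            break;
        case OP_NOP:
            ++executed;
            ++pc;
            break;
        default:                    /* labels themselves do nothing */
            ++pc;
            break;
        }
    }
    return executed;
}
```

For a program of the form nop, call 1, nop, ret, label 1, nop, nop, ret, this sketch executes four nops: one before the call, two in the subroutine, and one after the return.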

 

callnz

 vs 2.0

 

Call if Not-Zero. Makes a function call to the instruction label.


One slot


callnz InstructionLabelID BoolSource0

 

If the boolean register Source0 is not zero, then the address of the following instruction is pushed onto the internal shader stack, and then the current instruction address is set to the address of the instruction that follows the label instruction with the name InstructionLabelID. The instruction label ID will be an integer in the range [1,16].

 

Typically you’d create a shader subroutine that terminates with the ret instruction.

 

Setup:

Source0 is a Boolean register. Requires a valid, existing instruction label.

 

Results:

If the source register is not zero, shader execution is transferred to the instruction following the instruction label.

 

callnz 1 b0 // transfer execution to label 1 if 0 != b0

callnz 2 r0 // Error! Not a Boolean register

 

 

// Simulate the callnz instruction

 

// make a cast to a bare function pointer

typedef void (*fp)(void);

 

if ( 0 != BoolSource0 )

{

fp pFP = (fp)InstructionLabelID;

pFP(); // call the function

}

 

crs (macro)

 vs 2.0

 

Computes the three-component cross product.


Two slots


crs Dest0, Source0, Source1

 

Computes the three-component cross product using the right-hand rule. There are fairly severe restrictions on the use of swizzles. The w element of every register is ignored.

 

This macro is equivalent to:

 

mul Dest0.xyz,  Source0.yzxw, Source1.zxyw

mad Dest0.xyz, -Source1.yzxw, Source0.zxyw, Dest0

 

Setup:

Two source registers, Source0 and Source1. These registers must not be the same as the destination register. The source registers must not have any swizzles. TODO – is this checked or just gonna produce wrong values?

 

The destination register must have a destination mask, and that mask must not contain a reference to the w element of the destination register.

 

Results:

The cross product of the two input registers is stored into the specified elements of the destination register.

 

crs   r0.xyz,  r1, r2  // fill r0.xyz with the cross product of r1 and r2
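
The two-instruction expansion above can be traced with a C sketch that performs the mul step and then the mad step on the swizzled sources. The Vec4 type and the function name vs_crs are hypothetical; only the arithmetic follows the macro expansion given in the text.

```c
/* Hypothetical stand-in for a four-element shader register. */
typedef struct { float x, y, z, w; } Vec4;

/* Models crs as the mul + mad pair from the macro expansion. */
static Vec4 vs_crs(Vec4 s0, Vec4 s1)
{
    Vec4 d = { 0.0f, 0.0f, 0.0f, 0.0f };  /* w is never written */

    /* mul Dest0.xyz, Source0.yzxw, Source1.zxyw */
    d.x = s0.y * s1.z;
    d.y = s0.z * s1.x;
    d.z = s0.x * s1.y;

    /* mad Dest0.xyz, -Source1.yzxw, Source0.zxyw, Dest0 */
    d.x = -s1.y * s0.z + d.x;
    d.y = -s1.z * s0.x + d.y;
    d.z = -s1.x * s0.y + d.z;

    return d;
}
```

With Source0 = (1, 0, 0) and Source1 = (0, 1, 0), the sketch produces (0, 0, 1), as expected for a right-handed cross product of the x and y axes.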

 

 

 

dcl

 vs 2.0

 

Declare. Map a vertex element to an input register.


Takes no slots


dcl Dest0

 

In order to make it easier to optimize and verify shaders, VS 2.0 requires a declaration statement for every input register. Thus all texture or vertex input registers must be declared before use in the shader. Dest0 will be a specific input register. The partial precision modifier (_pp) can be applied to the declaration statement to indicate that lower precision is acceptable when using this register. You must supply a component mask on Dest0 to indicate which elements are in use and valid. dcl statements must appear before the first executable instruction.

 

dcl    t1.rg // using a 2D texture

dcl    t2    // using a 4D texture (default mask)

dcl_pp t3    // indicate partial precision is OK

 

 

 

def

 vs 1.0, 1.1, 2.0

 

Sets the value of vertex shader float constants, but leaves it up to the programmer to insert these into the shader code.


No slot


def  Dest0, value0, value1, value2, value3

 

Stores four floating-point values in the elements of the Dest0 register. If def instructions are used in a shader, they must follow the vs instruction and precede any other instructions.

 

Setup:

Four floating-point values separated by commas.

 

Results:

Has no effect upon the shader code that follows; you must manually insert the returned code fragment into your shader.

 

Note:  If you use def in a shader, then when the shader is compiled you will have to use the 4th parameter returned from D3DXAssembleShader. This parameter will contain an ID3DXBuffer interface, which will contain a compiled shader code fragment. You will have to manually insert this fragment into your shader declaration.


def       c0,  0.0, 0.5, 0.25, -1.0

def       c1,  1.0, 2.0, 5.0, 10.0

 

 

defi

 vs 2.0

 

Sets the value of vertex shader integer constants.


No slot


defi  IntDest0, value0, value1, value2, value3

 

Stores four integer values in the elements of IntDest0 register for use in this shader.

 

Setup:

Four integer values separated by commas.

 

Results:

Locally sets these values into the register. A local call takes precedence over an external SetVertexShaderConstantI() call to set a shader constant. The previous values of the register are restored upon exit from the shader.

 

defi i0,  0, 2, 4, 8

defi i1,  -2, -1, 1, 2

 

 

defb

 vs 2.0

 

Sets the value of vertex shader boolean constants.


No slot


defb  BoolDest0, value0, value1, value2, value3

 

Stores four boolean values in the elements of BoolDest0 register for use in this shader. Zero indicates false. Nonzero indicates true.

 

Setup:

Four booleans separated by commas.

 

Results:

Locally sets these values into the register. A local call takes precedence over an external SetVertexShaderConstantB() call to set a shader constant. The previous values of the register are restored upon exit from the shader.

 

defb b0,  0, 1, 0, 2 // false, true, false, true

 

 

 

 

dp3

 vs 1.0, 1.1, 2.0

 

The three-component dot product (a.k.a. dot-product three) is computed and the result is replicated in all specified channels of the destination register.


One slot


dp3  Dest0, Source0, Source1

 

Computes the dot product of the Source0 and Source1 registers and places the result in the Dest0 register. Only the x, y, and z values are used to compute the dot product; the w component is ignored.

 

Setup:

Two source registers, Source0 and Source1.

 

Results:

Unless otherwise masked, each element of Dest0 is filled with the dot product of the first three elements of registers Source0 and Source1.

 

dp3      r0  ,  v3,  c2 // fill r0 with dp3

dp3      r1.x,  v3,  c2 // just fill r1.x

 

SetSourceRegisters();

 

// Simulate the dp3 instruction

TempReg.x = TempReg.y = TempReg.z = TempReg.w =

Source0.x * Source1.x +

Source0.y * Source1.y +

Source0.z * Source1.z;

// note w component ignored

 

WriteDestinationRegisters();

 

 

dp4

 vs 1.0, 1.1, 2.0

 

The four-component dot product (a.k.a. dot-product four) is computed and the result is stored in all specified channels of the destination register.


One slot


dp4  Dest0, Source0, Source1

 

Computes the dot product of the Source0 and Source1 registers and places the result in the Dest0 register. If no mask is specified on the destination, then the entire register is filled with the dot product.

 

Setup:

Two source registers, Source0 and Source1.

 

Results:

Unless otherwise masked, each element of Dest0 is filled with the dot product of the four elements of registers Source0 and Source1.

 

dp4      r0,    v3,  c2

dp4      r1.x,  v3,  c2 // just fill r1.x

 

SetSourceRegisters();

 

// Simulate the dp4 instruction

TempReg.x = TempReg.y = TempReg.z = TempReg.w =

Source0.x * Source1.x +

Source0.y * Source1.y +

Source0.z * Source1.z +

Source0.w * Source1.w;

 

WriteDestinationRegisters();

 

 

dst

 vs 1.0, 1.1

 

Computes a distance vector in the format typically used for attenuated lighting calculations.


One slot


dst  Dest0, Source0, Source1

 

Creates a distance vector from a set of distance-squared and reciprocal-distance values, and puts them in a format that can be used for attenuated lighting calculations.

 

Setup:

Two source registers are required to be set up. Source0 should be set up as [n/a, d^2, d^2, n/a]. Source1 should be set up as [n/a, 1/d, n/a, 1/d]. Elements noted as “n/a” are not used and their values are ignored.

 

Results:

Dest0 will be filled with elements that correspond to [1, d, d^2, 1/d]. Dest0.y is computed from the product of Source0.y and Source1.y.

 

dst      r2,  r0,  r1

 

SetSourceRegisters();

 

// Simulate the dst instruction

TempReg.x = 1;

TempReg.y = Source0.y * Source1.y;

TempReg.z = Source0.z;

TempReg.w = Source1.w;

 

WriteDestinationRegisters();
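
As an illustration of the setup described above, the following C sketch builds Source0 and Source1 from a distance d, applies the dst rule, and then forms an attenuation factor of the form 1/(a0 + a1*d + a2*d^2). The Vec4 type, the helper names, and the attenuation coefficients a0, a1, a2 are assumptions made for this example. In a shader, the weighted sum would typically be a dp3 against a constant register, followed by a reciprocal.

```c
/* Hypothetical stand-in for a four-element shader register. */
typedef struct { float x, y, z, w; } Vec4;

/* Models the dst instruction exactly as in the simulation above. */
static Vec4 vs_dst(Vec4 s0, Vec4 s1)
{
    Vec4 r;
    r.x = 1.0f;
    r.y = s0.y * s1.y;   /* d^2 * (1/d) = d */
    r.z = s0.z;          /* d^2 */
    r.w = s1.w;          /* 1/d */
    return r;
}

/* Builds the two source registers from a distance d (assumed d > 0). */
static Vec4 distance_vector(float d)
{
    float d2 = d * d;
    float inv_d = 1.0f / d;
    Vec4 s0 = { 0.0f, d2, d2, 0.0f };        /* [n/a, d^2, d^2, n/a] */
    Vec4 s1 = { 0.0f, inv_d, 0.0f, inv_d };  /* [n/a, 1/d, n/a, 1/d] */
    return vs_dst(s0, s1);
}

/* Attenuation 1 / (a0*1 + a1*d + a2*d^2) from the dst result. */
static float attenuation(Vec4 dist, float a0, float a1, float a2)
{
    float denom = a0 * dist.x + a1 * dist.y + a2 * dist.z;
    return 1.0f / denom;
}
```

For d = 2 the sketch yields the distance vector (1, 2, 4, 0.5). With coefficients (1, 0.5, 0.25) the denominator is 3, so the attenuation factor is one third.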

 

 

else

 vs 2.0

 

Provides an alternate path of execution for an if-else-endif block.


One slot


else

 

Must be inside of an if-endif block. If the Boolean argument of the if statement is false, then execution will skip to the else instruction and continue to the terminating endif statement. If the Boolean argument was true, then execution will skip over the code enclosed by the else-endif block. There can be only one else statement in an if-endif block.

 

 

Setup:

The else statement must be between an if and endif statement.

 

Results:

If the argument provided to the if statement was false, then the code inside the else-endif block will be executed.

 

else  

 

 

 

 

endloop

 vs 2.0

 

The termination point for a loop-endloop block.


One slot


endloop

 

When used with the loop instruction, creates a block of instructions that can be executed a variable number of times.

 

Setup:

You must have a loop instruction in your shader prior to this instruction.

 

Results:

When execution reaches the endloop instruction, the loop counter (specified in the loop instruction) is incremented by the increment value (also specified in the loop instruction).

 

endloop

 

// simulate the endloop instruction

// assume that LoopCounter, LoopStep, LoopIterator

// were defined in the loop instruction and

// StartLoopOffset is the instruction following

// the loop instruction

 

LoopCounter += LoopStep;

 

--LoopIterator;

if ( LoopIterator > 0 )

     goto StartLoopOffset;

 

// fall through
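
The bookkeeping in the pseudocode above can be written as a small C sketch. The LoopState type and both function names are hypothetical, and the decision to skip the body entirely when the iteration count is zero is an assumption, not something stated here.

```c
/* Hypothetical model of the loop state named in the pseudocode above. */
typedef struct {
    int counter;     /* LoopCounter: current loop counter value       */
    int step;        /* LoopStep: added to the counter at each endloop */
    int iterations;  /* LoopIterator: passes remaining                 */
} LoopState;

/* Models endloop: returns 1 to jump back to the loop start, 0 to fall through. */
static int end_loop(LoopState *s)
{
    s->counter += s->step;
    --s->iterations;
    return s->iterations > 0;
}

/* Runs an empty loop body, recording the counter seen on each pass.
   Returns the number of passes made. */
static int run_loop(LoopState s, int *seen, int max_seen)
{
    int passes = 0;
    if (s.iterations <= 0)
        return 0;                 /* assumed: a zero count skips the body */
    do {
        if (passes < max_seen)
            seen[passes] = s.counter;
        ++passes;
    } while (end_loop(&s));
    return passes;
}
```

Starting with a counter of 2, a step of 3, and four iterations, the body sees the counter values 2, 5, 8, and 11 before execution falls through.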

 

 

 

 

endif

 vs 2.0

 

The termination point for an if-endif or ifc-endif block.


Zero slots


endif

 

When used with the if or ifc instruction, creates a block of instructions that is executed conditionally.

 

Setup:

You must have an if or ifc instruction in your shader prior to this instruction.

 

Results:

Execution is controlled by the if or ifc instruction that precedes this instruction. When the argument of that statement is false, execution will jump to the statement following the endif (or following the else, if one is present).

 

if  b1

   // if b1 != 0, this section gets executed

else // optional else statement

   // if b1 = 0, this section gets executed

endif

 

 

 

endrep

 vs 2.0

 

The termination point for a rep-endrep block.


Zero slots


endrep

 

When used with the rep instruction, creates a block of instructions that can be executed a specified number of times.

 

Setup:

You must have a rep instruction in your shader prior to this instruction.

 

Results:

Execution is controlled by the rep instruction that precedes this instruction. When the iteration count of that statement is zero then execution will jump to the statement following the endrep.

 

defi i0, 20, 0, 0, 0

 

rep    i0 // i0.x is used = 20

 

   // this section gets executed 20 times

 

endrep
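
The rep-endrep example above can be mimicked in C as follows. The IVec4 type, the rep_block function, and the callback used as the loop body are hypothetical names introduced for this sketch; the only behavior taken from the text is that the x element of the integer register supplies the count and that a zero count skips the block.

```c
/* Hypothetical integer constant register; only x is used as the repeat count. */
typedef struct { int x, y, z, w; } IVec4;

/* Runs body(ctx) once per iteration of a rep-endrep block.
   Returns the number of passes made. */
static int rep_block(IVec4 count_reg, void (*body)(void *), void *ctx)
{
    int remaining = count_reg.x;
    int passes = 0;
    while (remaining > 0) {   /* a zero count jumps straight past endrep */
        body(ctx);
        --remaining;
        ++passes;
    }
    return passes;
}

/* Example body: increments an int counter supplied through ctx. */
static void add_one(void *ctx)
{
    ++*(int *)ctx;
}
```

With a register holding (20, 0, 0, 0), as in the defi example, the body runs 20 times.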

 

 

exp (macro)

 vs 1.0, 1.1, 2.0

 

This macro computes a power of two to at least 20 bits of precision.  By default, only the source register’s w element is used. The results are replicated in the entire destination register. Note that the expp instruction, by contrast, sets the destination’s w element to 1.


Takes at least 12 instruction slots.


exp  Dest0, Source0

 

Calculates 2^Source0.w and writes the result in Dest0. Unless otherwise specified, Source0.w is the input value, and all elements of Dest0 are written with the exponentiated value. This is somewhat different from the expp instruction, which always sets Dest0.w to 1. [TODO: what happens for 0 and negative arguments??]

 

exp      r0,   c1  // fill all of r0 with exp2(c1.w)

exp      r0.x, c1.y // store exp2(c1.y) in r0.x

 

expp

 vs 1.0, 1.1, 2.0

 

Computes a power of two, with the results broken into a partial-precision part and higher-precision integer and fractional parts. This allows you to use the lower-precision single element, or to use a more complicated integer/fractional calculation when you need higher precision. The destination’s w element is set to 1. Only the integer part of the source register’s w element is used. If Source0.w < 0 then the results are undefined.

Note: Don’t confuse this with the exp macro!


One slot


expp  Dest0, Source0

 

Computes low- and higher-precision values for 2^Source0.w, where Dest0.z contains the low-precision single-element approximation, and Dest0.x and Dest0.y contain the integer and fractional parts. Dest0.w is set to 1.

 

You have a choice of which part of the results to use. The low-precision part will contain 2 raised to the input value, to 10 bits of precision. The two-part higher-precision result will contain 2 raised to the integer part of the input value, and the fractional part of the input value. For the fractional part, you will have to provide a function that computes 2^n for 0 <= n <= 1 to your desired precision, and then multiply that by the integer part’s power of two.

 

Setup:

Store the value you want the exponent of in Source0.w. The value should be positive. The other register elements are ignored.

 

Results:

Dest0.z will contain a low precision exponential value.

Dest0.x will contain the exponential of the integer part of the input.

Dest0.y will contain the fractional part of the input, not the exponential of the fractional part. You have to do the conversion yourself.

Dest0.w is set to 1.0.

 

expp      r0,  r1
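
The higher-precision path described above combines Dest0.x and Dest0.y: since 2^(i + f) = 2^i * 2^f, the integer part's power of two is multiplied by 2^f for the fractional part f. The C sketch below models only the x, y, and w outputs and reconstructs the full value. The Vec4 type, all function names, and the choice of a truncated series for 2^f are assumptions for illustration; the input is assumed non-negative, as the text requires.

```c
/* Hypothetical stand-in for a four-element shader register. */
typedef struct { float x, y, z, w; } Vec4;

/* 2^k for a non-negative integer k, by repeated multiplication. */
static float pow2_int(int k)
{
    float r = 1.0f;
    int i;
    for (i = 0; i < k; ++i)
        r *= 2.0f;
    return r;
}

/* Models the x, y, and w outputs of expp for a non-negative input w.
   The low-precision z output is not modeled here. */
static Vec4 expp_parts(float w)
{
    Vec4 d;
    int whole = (int)w;          /* integer part (w assumed >= 0) */
    d.x = pow2_int(whole);       /* 2^(integer part)              */
    d.y = w - (float)whole;      /* fractional part, not 2^frac   */
    d.z = 0.0f;
    d.w = 1.0f;
    return d;
}

/* 2^f for 0 <= f <= 1, from a truncated series for e^(f * ln 2). */
static float pow2_frac(float f)
{
    float t = f * 0.69314718f;   /* ln 2 */
    float term = 1.0f, sum = 1.0f;
    int i;
    for (i = 1; i <= 8; ++i) {
        term *= t / (float)i;
        sum += term;
    }
    return sum;
}

/* Combines the two parts: 2^w = 2^int * 2^frac. */
static float exp2_from_parts(Vec4 d)
{
    return d.x * pow2_frac(d.y);
}
```

For an input of 3.5 the parts are x = 8 and y = 0.5, and the combined result is close to 11.3137, the value of 2^3.5.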

 

// DirectX 8 version

SetSourceRegisters();

 

// Simulate the expp instruction

float wWhole = Source0.w; // take all

float wInt   = (int)Source0.w; // take integer part

 

// compute the higher-precision parts

TempReg.x = pow(2,wInt);

TempReg.y = Source0.w - wInt; // fractional part of w

 

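// calculate 2^(Source0.w), then chop

// to 10 bits precision by masking the float's bits

float approx = (float)pow(2,wWhole);

unsigned long bits = *(unsigned long*)&approx & 0xffffff00;

TempReg.z = *(float*)&bits;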

// set w to 1

TempReg.w = 1;

 

WriteDestinationRegisters();


 

// DirectX 9 version

SetSourceRegisters();

 

// Simulate the logp instruction

float v = abs(Source0.w); // only positive values

float logValue;

 

if ( 0 == v )

{

logValue = MINUS_INFINITY;

}

else

{

logValue = (float)(log(v)/log(2));

logValue = (int)::floor( logValue );

// store low-precision part to 10-bits

unsigned long temp = *(unsigned long*)&logValue;

temp &= 0xFFFFFF00;

logValue = *(float*)&temp;

}

 

TempReg.x = TempReg.y =

TempReg.z = TempReg.w =

logValue;

 

WriteDestinationRegisters();

 

frc (macro)

 vs 1.0, 1.1, 2.0

 

This macro removes the integer part of the input register’s x and y elements and places the fractional remainder into the destination register’s x and y elements. The results are always positive. A write mask on the destination is required.


Takes 3 instruction slots


frc  Dest0, Source0

 

Takes the fractional parts of Source0’s x and y elements and places them in Dest0’s x and y elements. Dest0’s z and w elements are unaltered. The sign of the input arguments is ignored. You must specify a write mask on Dest0. This can be either xy or just y. (Not just x, for some reason).

 

Note: Early versions of the SDK documentation incorrectly stated that the entire source register was used, and made no mention of the fact that write masks were required.

 

frc    r0.xy,  r1 // use r1.xy and store fractions in r0.xy

 

// use r1.x and store fraction in r0.y; r0.x (and z & w)

// remain unchanged

frc    r0.y ,  r1.x

 

// this has no effect on the results, since the

// sign is ignored

frc    r0.y ,  -r1.x

 

frc    r0, r1 // Error! No write mask.

 

 

 

 

 

if

 vs 2.0

 

The start of an if-else-endif block. Conditionally execute a block of code.


One slot.


if BoolReg0

 

The argument must be a boolean constant register. There must be a terminating endif that follows the if instruction. The else instruction is optional and must be between the if and endif statements. If the boolean argument is true, then execution will continue immediately after the if statement, until either the else or