vs 2.0 |
This macro computes the absolute value of the input register.
One slot
abs Dest0, Source0
This macro is equivalent to:
max Dest0, Source0, -Source0
which you can use if you’re writing a shader for a version earlier than vertex shader 2.0. In either case, you’ll end up with the absolute value of Source0 in Dest0.
Setup:
One source register, Source0.
Results:
Dest0 is filled with the absolute value of Source0.
abs r0 , r0
abs r0.z, r0.z
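As a rough illustration of the equivalence above, the following C++ sketch mimics the max-based expansion on a four-element register. The Vec4 type and the function names maxReg and absReg are hypothetical stand-ins chosen for this example; they are not part of any shader API.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for a four-element shader register.
struct Vec4 { float x, y, z, w; };

// Element-wise maximum, mirroring what the max instruction does.
Vec4 maxReg(const Vec4& a, const Vec4& b) {
    return { std::max(a.x, b.x), std::max(a.y, b.y),
             std::max(a.z, b.z), std::max(a.w, b.w) };
}

// abs expressed as max(Source0, -Source0).
Vec4 absReg(const Vec4& s) {
    Vec4 neg = { -s.x, -s.y, -s.z, -s.w };
    return maxReg(s, neg);
}
```

Because each element is compared against its own negation, the larger of the two is always the non-negative one.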
|
vs 1.0, 1.1, 2.0 |
Adds two sources into the destination register.
One slot
add Dest0, Source0, Source1
Adds the Source0 and Source1 registers and places the result in the Dest0 register.
Setup:
Two source registers, Source0 and Source1.
Results:
Each element of Dest0 is filled with the element-by-element addition of the elements of Source0 and Source1.
add r0 , r0 , c2
add r0.z, r0.z, -r0.z
SetSourceRegisters();
// Simulate the add instruction
TempReg.x = Source0.x + Source1.x;
TempReg.y = Source0.y + Source1.y;
TempReg.z = Source0.z + Source1.z;
TempReg.w = Source0.w + Source1.w;
WriteDestinationRegisters();
|
vs 2.0 |
Makes an unconditional function call to the instruction label.
One slot
call_InstructionLabelID
Pushes the address of the following instruction onto the internal shader stack, and then sets the current instruction address to the address of the instruction that follows the label instruction with the name InstructionLabelID. The instruction label ID will be an integer in the range [1,16]. TODO – particular format of the statement??
Typically you’d create a shader subroutine that terminates with the ret instruction.
Setup:
Requires a valid, existing instruction label.
Results:
The shader execution is transferred to the instruction following the instruction label.
call_1
call_16
call_Fred // Error! Invalid label
call_0 // Error! Invalid label (out of range)
// Simulate the call instruction
// make a cast to a bare function pointer
typedef void (*fp)(void);
// take address of the label
fp pFP = (fp)InstructionLabelID;
pFP(); // call the function
// returns here only when ret is executed
|
vs 2.0 |
Call if Not-Zero. Makes a function call to the instruction label.
One slot
callnz InstructionLabelID BoolSource0
If the Boolean register BoolSource0 is not zero, then the address of the following instruction is pushed onto the internal shader stack, and then the current instruction address is set to the address of the instruction that follows the label instruction with the name InstructionLabelID. The instruction label ID will be an integer in the range [1,16].
Typically you’d create a shader subroutine that terminates with the ret instruction.
Setup:
BoolSource0 is a Boolean register. Requires a valid, existing instruction label.
Results:
If the source register is not zero, the shader execution is transferred to the instruction following the instruction label.
callnz 1 b0 // transfer execution to label 1 if b0 != 0
callnz 2 r0 // Error! Not a Boolean register
// Simulate the callnz instruction
// make a cast to a bare function pointer
typedef void (*fp)(void);
if ( 0 != BoolSource0 )
{
    fp pFP = (fp)InstructionLabelID;
    pFP(); // call the function
}
|
vs 2.0 |
Computes the three-component cross product.
Two slots
crs Dest0, Source0, Source1
Computes the three-component cross product using the right-hand rule. There are fairly severe restrictions on the use of swizzles. The w element of every register is ignored.
This macro is equivalent to:
mul Dest0.xyz, Source0.yzxw, Source1.zxyw
mad Dest0.xyz, -Source1.yzxw, Source0.zxyw, Dest0
Setup:
Two source registers, Source0 and Source1. These registers must not be the same as the destination register. The source registers must not have any swizzles. TODO – is this checked or just gonna produce wrong values?
The destination register must have a destination mask, and that mask must not contain a reference to the w element of the destination register.
Results:
The cross product of the two input registers is stored into the specified elements of the destination register.
crs r0.xyz, r1, r2 // fill r0.xyz with the cross product of r1 and r2
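To make the mul/mad expansion easier to follow, here is a small C++ sketch that applies the same two steps with the swizzles written out element by element. Vec4 and crsReg are hypothetical names used only for illustration, and the sketch assumes the destination mask is .xyz.

```cpp
#include <cassert>

// Hypothetical stand-in for a four-element shader register.
struct Vec4 { float x, y, z, w; };

// Emulates the two-instruction expansion of crs. The w element of dest is
// left untouched because the destination mask is assumed to be .xyz.
Vec4 crsReg(Vec4 dest, const Vec4& s0, const Vec4& s1) {
    // mul Dest0.xyz, Source0.yzxw, Source1.zxyw
    dest.x = s0.y * s1.z;
    dest.y = s0.z * s1.x;
    dest.z = s0.x * s1.y;
    // mad Dest0.xyz, -Source1.yzxw, Source0.zxyw, Dest0
    dest.x = -s1.y * s0.z + dest.x;
    dest.y = -s1.z * s0.x + dest.y;
    dest.z = -s1.x * s0.y + dest.z;
    return dest;
}
```

The second step still reads both sources after the first step has written the destination, which is one way to see why the sources must not be the same register as the destination.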
|
vs 2.0 |
Declare. Map a vertex element to an input register.
Takes no slots
dcl Dest0
To make it easier to optimize and verify shaders, VS 2.0 now requires a declaration statement on all input registers. Thus all texture or vertex input registers must be declared before use in the shader. Dest0 will be a specific input register. The partial precision modifier (_pp) can be applied to the declaration statement to indicate that a lower precision is acceptable when using this register. You must supply a component mask on Dest0 to indicate which elements are in use and valid. dcl statements must appear before the first executable instruction.
dcl t1.rg // using a 2D texture
dcl t2 // using a 4D texture (default mask)
dcl_pp t3 // indicate partial precision is OK
|
vs 1.0, 1.1, 2.0 |
Sets the value of vertex shader float constants, but leaves it up to the programmer to insert these into the shader code.
No slot
def Dest0, value0, value1, value2, value3
Stores four floating-point values in the elements of Dest0 register. If these instructions are used in a shader, these instructions must follow the vs instruction and precede any other instructions.
Setup:
Four floating-point values separated by commas.
Results:
Has no effect upon the shader code to follow. You must manually insert the returned code fragment into your shader.
Note: If you use the def instruction in a shader, then when the shader is compiled you will have to use the 4th parameter returned from D3DXAssembleShader. This parameter will contain an ID3DXBuffer interface, which will contain a compiled shader code fragment. You will have to manually insert this fragment into your shader declaration.
def c0, 0.0, 0.5, 0.25, -1.0
def c1, 1.0, 2.0, 5.0, 10.0
|
vs 2.0 |
Sets the value of vertex shader integer constants.
No slot
defi IntDest0, value0, value1, value2, value3
Stores four integer values in the elements of IntDest0 register for use in this shader.
Setup:
Four integer values separated by commas.
Results:
Locally sets these values into the register. A local call takes precedence over an external SetVertexShaderConstantI() call to set a shader constant. The previous values of the register are restored upon exit from the shader.
defi i0, 0, 2, 4, 8
defi i1, -2, -1, 1, 2
|
vs 2.0 |
Sets the value of vertex shader boolean constants.
No slot
defb BoolDest0, value0, value1, value2, value3
Stores four boolean values in the elements of BoolDest0 register for use in this shader. Zero indicates false. Nonzero indicates true.
Setup:
Four booleans separated by commas.
Results:
Locally sets these values into the register. A local call takes precedence over an external SetVertexShaderConstantB() call to set a shader constant. The previous values of the register are restored upon exit from the shader.
defb b0, 0, 1, 0, 2 // false, true, false, true
|
vs 1.0, 1.1, 2.0 |
Three-component dot product (a.k.a. dot-product three) is computed and the result replicated in all specified channels of the destination register.
One slot
dp3 Dest0, Source0, Source1
Computes the dot product of the Source0 and Source1 registers and places the result in the Dest0 register. Only the x, y, and z values are used to compute the dot product; the w component is ignored.
Setup:
Two source registers, Source0 and Source1.
Results:
Unless otherwise masked, each element of Dest0 is filled with the dot product of the first three elements of registers Source0 and Source1.
dp3 r0 , v3, c2 // fill r0 with dp3
dp3 r1.x, v3, c2 // just fill r1.x
SetSourceRegisters();
// Simulate the dp3 instruction
TempReg.x = TempReg.y = TempReg.z = TempReg.w =
    Source0.x * Source1.x +
    Source0.y * Source1.y +
    Source0.z * Source1.z;
// note w component ignored
WriteDestinationRegisters();
|
vs 1.0, 1.1, 2.0 |
Four-component dot product (a.k.a. dot-product four) is computed and the result stored in all specified channels of the destination register.
One slot
dp4 Dest0, Source0, Source1
Computes the dot product of the Source0 and Source1 registers and places the result in the Dest0 register. If no mask is specified on the destination, then the entire register is filled with the dot product.
Setup:
Two source registers, Source0 and Source1.
Results:
Unless otherwise masked, each element of Dest0 is filled with the dot product of the four elements of registers Source0 and Source1.
dp4 r0, v3, c2
dp4 r1.x, v3, c2 // just fill r1.x
SetSourceRegisters();
// Simulate the dp4 instruction
TempReg.x = TempReg.y = TempReg.z = TempReg.w =
    Source0.x * Source1.x +
    Source0.y * Source1.y +
    Source0.z * Source1.z +
    Source0.w * Source1.w;
WriteDestinationRegisters();
|
vs 1.0, 1.1 |
Computes a distance vector in the format typically used for attenuated lighting calculations.
One slot
dst Dest0, Source0, Source1
Creates a distance vector from a set of distance-squared and reciprocal-distance values, and puts them in a format that can be used for attenuated lighting calculations.
Setup:
Two source registers are required to be set up. Source0 should be set up as [n/a, d^2, d^2, n/a]. Source1 should be set up as [n/a, 1/d, n/a, 1/d]. Elements noted as “n/a” are not used and their values are ignored.
Results:
Dest0 will be filled with elements that correspond to [1, d, d^2, 1/d]. Dest0.y is computed from the product of Source0.y and Source1.y.
dst r2, r0, r1
SetSourceRegisters();
// Simulate the dst instruction
TempReg.x = 1;
TempReg.y = Source0.y * Source1.y;
TempReg.z = Source0.z;
TempReg.w = Source1.w;
WriteDestinationRegisters();
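One possible use of the [1, d, d^2, 1/d] layout is a quadratic attenuation term. The C++ sketch below assumes, purely for illustration, that the attenuation coefficients (a0, a1, a2) are packed in the x, y, and z elements of a constant register; Vec4, dstReg, and attenuation are hypothetical names, not part of the instruction set.

```cpp
#include <cassert>

// Hypothetical stand-in for a four-element shader register.
struct Vec4 { float x, y, z, w; };

// Mirrors the dst simulation: the result is [1, d, d^2, 1/d].
Vec4 dstReg(const Vec4& s0, const Vec4& s1) {
    return { 1.0f, s0.y * s1.y, s0.z, s1.w };
}

// Assumed attenuation model: 1 / (a0 + a1*d + a2*d^2), with the
// coefficients packed as (a0, a1, a2) in coeff.x, coeff.y, coeff.z.
float attenuation(const Vec4& distVec, const Vec4& coeff) {
    // A three-component dot product (as dp3 would compute) ...
    float denom = distVec.x * coeff.x + distVec.y * coeff.y + distVec.z * coeff.z;
    // ... followed by a reciprocal.
    return 1.0f / denom;
}
```

With d = 2, Source0 would carry d^2 = 4 in y and z, and Source1 would carry 1/d = 0.5 in y and w.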
|
vs 2.0 |
Provides an alternate path of execution for an if-else-endif block.
One slot
else
Must be inside of an if-endif block. If the Boolean argument of the if statement is false, then execution will skip to the else instruction and continue to the terminating endif statement. If the Boolean was true, then execution will skip over the code enclosed by the else-endif block. There can be only one else statement in an if-endif block.
Setup:
The else statement must be between an if and endif statement.
Results:
If the argument provided to the if statement was false, then the code inside the else-endif block will be executed.
else
|
vs 2.0 |
The termination point for a loop-endloop block.
One slot
endloop
When used with the loop instruction, creates a block of instructions that is executed a variable number of times.
Setup:
You must have a loop instruction in your shader prior to this instruction.
Results:
When execution reaches the endloop instruction, the loop counter (specified in the loop instruction) is incremented by the increment value (also specified in the loop instruction).
endloop
// simulate the endloop instruction
// assume that LoopCounter, LoopStep, LoopIterator
// were defined in the loop instruction and
// StartLoopOffset is the instruction following
// the loop instruction
LoopCounter += LoopStep;
--LoopIterator;
if ( LoopIterator > 0 )
    goto StartLoopOffset;
// fall through
|
vs 2.0 |
The termination point for an if-endif or ifc-endif block.
Zero slots
endif
When used with the if or ifc instruction, marks the end of a block of instructions that is executed conditionally.
Setup:
You must have an if or ifc instruction in your shader prior to this instruction.
Results:
Execution is controlled by the if or ifc instruction that precedes this instruction. When the argument of that statement is false, execution will jump to the else statement if one is present, or otherwise to the statement following the endif.
if b1
// if b1 != 0, this section gets executed
else // optional else statement
// if b1 = 0, this section gets executed
endif
|
vs 2.0 |
The termination point for a rep-endrep block.
Zero slots
endrep
When used with the rep instruction, creates a block of instructions that is executed a specified number of times.
Setup:
You must have a rep instruction in your shader prior to this instruction.
Results:
Execution is controlled by the rep instruction that precedes this instruction. When the iteration count of that statement is zero then execution will jump to the statement following the endrep.
defi i0, 20, 0, 0, 0
rep i0 // i0.x is used = 20
// this section gets executed 20 times
endrep
|
vs 1.0, 1.1, 2.0 |
This macro computes a power of two to at least 20 bits of precision. By default, only the source register’s w element is used. The results are replicated in the entire destination register. Note that the expp instruction, by contrast, sets the destination’s w element to 1.
Takes at least 12 instruction slots.
exp Dest0, Source0
Calculates 2^Source0.w and writes the result in Dest0. Unless otherwise specified, Source0.w is the input value, and all elements of Dest0 are written with the exponentiated value. This is somewhat different from the expp instruction, which always sets Dest0.w to 1. [TODO: what happens for 0 and negative arguments??]
exp r0, c1 // fill all of r0 with exp2(c1.w)
exp r0.x, c1.y // store exp2(c1.y) in r0.x
|
vs 1.0, 1.1, 2.0 |
Computes power of two with the results being broken into a partial precision part and a higher precision integer and fractional parts. This allows you to use the lower precision single element or use a more complicated integer/fractional calculation when you need higher precision. The destination’s w element is set to 1. Only the integer part of the source register’s w element is used. If Source0.w < 0 then the results are undefined.
Note: Don’t confuse this with the exp macro!
One slot
expp Dest0, Source0
Computes low and higher precision values for 2^Source0.w, where Dest0.z contains the low precision single element approximation, and Dest0.x and Dest0.y contain the integer and fractional parts. Dest0.w is set to 1.
You have a choice in which part of the results to use. The low precision part will contain 2 raised to the input value, to 10 bits of precision. The two-part higher precision result contains 2 raised to the integer part of the input value (in Dest0.x) and the fractional part of the input value (in Dest0.y). To use it, you must supply a function that computes 2^n for 0 <= n <= 1 to your desired precision, apply it to the fractional part, and multiply the result by the value in Dest0.x.
Setup:
Store the value you want the exponent of in Source0.w. The value should be positive. The other register elements are ignored.
Results:
Dest0.z will contain a low precision exponential value.
Dest0.x will contain the exponential of the integer part of the input.
Dest0.y will contain the fractional part of the input, not the exponential of the fractional part. You have to do the conversion yourself.
Dest0.w is set to 1.0.
expp r0, r1
// DirectX 8 version
SetSourceRegisters();
// Simulate the expp instruction
float wWhole = Source0.w; // take all
float wInt = (int)Source0.w; // take integer part
// compute the higher-precision parts
TempReg.x = pow(2,wInt);
TempReg.y = Source0.w - wInt; // fractional part of w
// calculate the 2^(Source0.w) then chop
// to 10 bits precision
float fullValue = (float)pow(2,wWhole);
unsigned long temp = *(unsigned long*)&fullValue & 0xffffff00;
TempReg.z = *(float*)&temp;
// set w to 1
TempReg.w = 1;
WriteDestinationRegisters();
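The higher-precision results described above can be combined as sketched below in C++. This is only an illustration: Vec4, exppParts, exp2Fraction, and exp2Combined are hypothetical names, the low precision z element is not modeled, and the fourth-order Taylor polynomial is just one assumed choice for the 2^n function that the text says you must supply yourself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a four-element shader register.
struct Vec4 { float x, y, z, w; };

// Mirrors the higher-precision outputs: x = 2^(integer part of w),
// y = fractional part of w. The low precision z element is not modeled here.
Vec4 exppParts(float w) {
    float wInt = std::floor(w);
    return { std::pow(2.0f, wInt), w - wInt, 0.0f, 1.0f };
}

// Assumed user-supplied approximation of 2^f for f in [0,1]:
// a fourth-order Taylor expansion of e^(f * ln 2).
float exp2Fraction(float f) {
    const float t = f * 0.69314718f; // f * ln(2)
    return 1.0f + t * (1.0f + t * (0.5f + t * (1.0f / 6.0f + t * (1.0f / 24.0f))));
}

// 2^w = 2^(integer part) * 2^(fractional part).
float exp2Combined(float w) {
    Vec4 p = exppParts(w);
    return p.x * exp2Fraction(p.y);
}
```

The two parts are multiplied rather than added because 2^(i + f) equals 2^i times 2^f.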
|
vs 1.0, 1.1, 2.0 |
This macro removes the integer part of the input register’s x and y elements and places the fractional remainder into the destination register’s x and y elements. The results are always positive. A write mask on the destination is required.
Takes 3 instruction slots
frc Dest0, Source0
Takes the fractional parts of Source0’s x and y elements and places them in Dest0’s x and y elements. Dest0’s z and w elements are unaltered. The sign of the input arguments is ignored. You must specify a write mask on Dest0. This can be either xy or just y. (Not just x, for some reason).
Note: Early versions of the SDK documentation incorrectly stated that the entire source register was used, and made no mention of the fact that write masks were required.
frc r0.xy, r1 // use r1.xy and store fractions in r0.xy
// use r1.x and store fraction in r0.y, r0.x (and z & w)
// remain unchanged
frc r0.y , r1.x
// this has no effect on the results, since the
// sign is ignored
frc r0.y , -r1.x
frc r0, r1 // Error! No write mask.
|
vs 2.0 |
The start of an if-else-endif block. Conditionally execute a block of code.
One slot.
if BoolReg0
The argument must be a Boolean constant register. There must be a terminating endif that follows the if instruction. The else instruction is optional and must be between the if and endif statements. If the Boolean argument is true, then execution will continue immediately after the if statement, until either the else or the endif statement is reached.
if blocks can be placed inside an if-endif or a loop-endloop block, but they must be entirely inside them.
Setup:
The argument must be a Boolean register. You must have an endif instruction in your shader following this instruction.
Results:
Execution is controlled by the if instruction. When the Boolean argument is false, execution will jump past the if block to the matching else statement if there is one, or otherwise to the matching endif statement.
if b1
// if b1 != 0, this section gets executed
else // optional else statement
// if b1 = 0, this section gets executed
endif
|
vs 2.0 |
Defines a label for use with a call or callnz instruction.
Zero slots.
label <n>
The label instruction marks the next instruction as having the specified label, thus making it a target for a subroutine call. The argument <n> must be an integer label in the range [0,15]; that is, there can be a total of sixteen labels.
Setup:
The argument must be an integer.
Results:
When a call or callnz instruction calls the integer label, execution immediately (and conditionally for the callnz instruction) goes to the instruction following the label statement. Execution will return when a ret instruction is encountered.
// VS 2.0
call 12 // somewhere in shader call subroutine
label 12 // label 12
// the subroutine instructions go here
ret // execution returns after the call
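The interaction of call, label, and ret can be sketched as a tiny interpreter in C++. Everything here is hypothetical scaffolding for illustration (the Op enum, the Instr struct, the Work and End placeholder instructions, and the run function); only the control flow is modeled, and the sketch assumes every call has a matching label and every ret has a matching call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stack>
#include <vector>

// Hypothetical miniature instruction stream. Work stands for any ordinary
// instruction; End stands for the end of the main shader body.
enum class Op { Call, Label, Ret, Work, End };
struct Instr { Op op; int arg; };

// Runs the program and returns the arguments of the Work instructions
// in the order they executed.
std::vector<int> run(const std::vector<Instr>& program) {
    std::map<int, std::size_t> labels; // label id -> instruction index
    for (std::size_t i = 0; i < program.size(); ++i)
        if (program[i].op == Op::Label) labels[program[i].arg] = i;

    std::vector<int> trace;
    std::stack<std::size_t> returnStack; // plays the role of the internal shader stack
    std::size_t pc = 0;
    while (pc < program.size()) {
        const Instr& in = program[pc];
        switch (in.op) {
        case Op::Call:
            returnStack.push(pc + 1);    // address of the following instruction
            pc = labels.at(in.arg) + 1;  // continue after the label
            break;
        case Op::Ret:
            pc = returnStack.top();      // resume after the call
            returnStack.pop();
            break;
        case Op::Work:
            trace.push_back(in.arg);
            ++pc;
            break;
        case Op::Label:
            ++pc;
            break;
        case Op::End:
            return trace;
        }
    }
    return trace;
}
```

Placing the subroutine after the End marker keeps the main body from falling into it, which matches the layout of the example above where the label follows the main shader code.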
|
vs 1.0, 1.1 |
Computes the traditional diffuse and specular lighting coefficients when passed the dot products N · L and N · H and a power coefficient.
One slot
lit Dest0, Source0
You’ll need to calculate normalized N · L and N · H dot products and specify a specular power value prior to using this instruction. The results will be the traditional diffuse component in Dest0.y:
Dest0.y = i_d = ( N · L )
And the traditional specular component (i.e., Blinn’s equation) in Dest0.z:
Dest0.z = i_s = ( N · H )^m_s
Note that there are no k parameters in the equations. If you need them you’ll have to do the multiplication in your shader.
Setup:
Source0.x should contain the normalized dot product between the normal and the direction from the vertex to the light. Source0.y should contain the normalized dot product between the normal and the half angle vector. Source0.w should contain the power value in the range -128 to +128. Source0.z is ignored.
Results:
Dest0.x and Dest0.w are set to 1. If Source0.x (N · L) is positive, it’s stored in Dest0.y, else Dest0.y is set to 0. If both Source0.x and Source0.y (N · H) are positive then Dest0.z is set to Source0.y raised to the Source0.w power, else Dest0.z is set to zero. The power value (Source0.w) is clamped to the range [-128, 128].
Note: Early versions of the SDK documentation incorrectly stated that negative exponential values would cause undefined results.
lit r0, r1
SetSourceRegisters();
// Simulate the lit instruction
// these are constants
TempReg.x = TempReg.w = 1;
// if N dot L is positive...
if ( Source0.x > 0 )
{
    TempReg.y = Source0.x;
    // if N dot H is positive
    if ( Source0.y > 0 )
    {
        // clamp the power value to an 8.8
        // fixed point representation of the
        // maximum allowable value
        const float kPowerMax = 127.9961f;
        float ClampedPower = Source0.w;
        if ( ClampedPower < -kPowerMax )
            ClampedPower = -kPowerMax;
        else if ( ClampedPower > kPowerMax )
            ClampedPower = kPowerMax;
        // actual value in shader math is only
        // good to seven fractional bits of
        // precision
        TempReg.z = pow( Source0.y, ClampedPower );
    }
    else
    {
        TempReg.z = 0; // if N dot H was negative/zero
    }
}
else
{
    TempReg.y = 0; // if N dot L was negative/zero
    TempReg.z = 0; // no specular without a positive N dot L
}
WriteDestinationRegisters();
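Since the lit results carry no k parameters, the material coefficients have to be applied afterwards. The C++ sketch below shows one way that multiplication might look. Vec4, litReg, and shade are hypothetical names, kd and ks are assumed diffuse and specular coefficients, and the clamp uses the [-128, 128] range from the text rather than the fixed-point limit in the simulation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a four-element shader register.
struct Vec4 { float x, y, z, w; };

// Mirrors the lit results described above.
Vec4 litReg(const Vec4& s) {
    Vec4 d = { 1.0f, 0.0f, 0.0f, 1.0f };
    if (s.x > 0.0f) {
        d.y = s.x; // diffuse term, N dot L
        if (s.y > 0.0f) {
            float power = std::min(std::max(s.w, -128.0f), 128.0f);
            d.z = std::pow(s.y, power); // specular term, (N dot H)^power
        }
    }
    return d;
}

// Applies assumed material coefficients kd (diffuse) and ks (specular).
float shade(const Vec4& litResult, float kd, float ks) {
    return kd * litResult.y + ks * litResult.z;
}
```

In a shader the same step could be a single multiply or dot product against a constant register holding the coefficients.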
|
vs 1.0, 1.1, 2.0 |
This macro computes log2 of the input argument to at least 20 bits of precision. The absolute value of the source register’s w element is used. Unlike the logp instruction, the destination’s w element is not set to 1.
Note: Don’t confuse this with the logp instruction.
Takes at least 12 instruction slots
log Dest0, Source0
Computes log2 of the absolute value of the Source0.w element (unless otherwise specified) and places the result into all elements of Dest0. If the argument is equal to zero then all result elements are set to minus infinity. This is somewhat different from the logp instruction, which always sets Dest0.w to 1.
log r0, r1
|
vs 1.0, 1.1, 2.0 |
Computes log2 with the results being broken into a single 10-bit precision part and a higher precision dual-element part. This allows you to use the lower precision single element or use a more complicated exponent/mantissa calculation if you need higher precision.
One slot
logp Dest0, Source0
Computes low and higher precision values for log2(Source0.w). The destination’s w element is set to 1. Only the source register’s w element is used. If Source0.w is negative then its absolute value is used. If Source0.w is zero then the results are negative infinity in Dest0.x and Dest0.z and 1.0 in Dest0.y.
You have a choice in which part of the results to use. The low precision part will contain the log2 of the input value to 10 bits of precision.
The two-part higher precision part represents the exponent and mantissa of the input value.
To use the high precision results you’ll need to provide a function that computes log2 in the range [1,2) with your desired precision. You’d then apply it to the mantissa in Dest0.y and add the result to the value returned in Dest0.x to get the log2 of your input value.
Note: Don’t confuse this with the log macro!
Setup:
Store the value you want the log2 of in Source0.w. The value should be positive. The other register elements are ignored.
Results:
Dest0.z contains the low precision (10-bit) single element approximation.
Dest0.x contains the most significant part of the dual-element result. This value can be negative.
Dest0.y contains the mantissa of the dual-element result in the form of a value in the range [1,2). You have to do the conversion yourself.
logp r0, r1
// DirectX 8 version
SetSourceRegisters();
// Simulate the logp instruction
float v = fabs(Source0.w); // only positive values
TempReg.y = TempReg.w = 1.0f;
if ( 0 == v )
{
    TempReg.x = TempReg.z = MINUS_INFINITY;
}
else
{
    float logValue = (float)(log(v)/log(2));
    // store exponent
    TempReg.x = (int)::floor( logValue );
    // store mantissa: keep the mantissa bits of v and
    // force the exponent so the value lies in [1,2)
    unsigned long p = (*(unsigned long*)&v & 0x7FFFFF) | 0x3F800000;
    TempReg.y = *(float*)&p;
    // store low-precision part to 10-bits
    unsigned long temp = *(unsigned long*)&logValue & 0xFFFFFF00;
    TempReg.z = *(float*)&temp;
}
WriteDestinationRegisters();
// DirectX 9 version
SetSourceRegisters();
// Simulate the logp instruction
float v = fabs(Source0.w); // only positive values
float logValue;
if ( 0 == v )
{
    logValue = MINUS_INFINITY;
}
else
{
    logValue = (float)(log(v)/log(2));
    logValue = (int)::floor( logValue );
    // store low-precision part to 10-bits
    unsigned long temp = *(unsigned long*)&logValue & 0xFFFFFF00;
    logValue = *(float*)&temp;
}
TempReg.x = TempReg.y = TempReg.z = TempReg.w = logValue;
WriteDestinationRegisters();
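The exponent/mantissa recombination described for the dual-element result can be sketched in C++ as follows. Vec4, logpParts, log2Mantissa, and log2Full are hypothetical names; std::log2 merely stands in for whatever [1,2) approximation you would supply, the low precision z element is not modeled, and the sketch assumes a nonzero input.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a four-element shader register.
struct Vec4 { float x, y, z, w; };

// Mirrors the dual-element result: x = exponent, y = mantissa in [1,2).
// Assumes w is nonzero; the low precision z element is not modeled here.
Vec4 logpParts(float w) {
    float v = std::fabs(w);
    int exponent = 0;
    float m = std::frexp(v, &exponent); // v = m * 2^exponent, with m in [0.5,1)
    return { float(exponent - 1), m * 2.0f, 0.0f, 1.0f };
}

// Stand-in for a user-supplied log2 over the range [1,2).
float log2Mantissa(float m) {
    return std::log2(m);
}

// log2(v) = exponent + log2(mantissa).
float log2Full(float w) {
    Vec4 p = logpParts(w);
    return p.x + log2Mantissa(p.y);
}
```

For an input of 10, the parts come out as an exponent of 3 and a mantissa of 1.25, since 10 = 1.25 * 2^3.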
|
vs 2.0 |
The starting point for a loop-endloop block. Iterate over a block of code a number of times.
One slot
loop IntSource0
When used with the endloop instruction, creates a block of instructions that is executed a variable number of times. Each time through the loop the loop counter is incremented by the specified amount. Compare this to the rep instruction, which does not increment the loop counter independently.
Setup:
The argument must be an integer register. IntSource0.x holds the number of times the loop is to execute. The loop counter register gets incremented at the endloop. The counter can be used to index into the constant register array. IntSource0.y is the initial value for the loop counter register. IntSource0.z specifies the step for the loop counter register. Execution will go to the statement following the matching endloop instruction when IntSource0.x <= 0. The loop counter register, aL, is available inside the loop.
You must not nest loops, nor jump out of or into a loop-endloop block. The endloop instruction must follow the loop instruction.
Results:
The instructions in the loop-endloop block are executed IntSource0.x times, with the loop counter getting incremented each time through.
loop i0 // assume i0.x, i0.y, and i0.z are set up
// aL is available and incremented each time
// through the loop
endloop
// simulate the loop instruction
// -- loop statement begin
aL = IntSource0.y; // loop counter
StartLoopOffset: // <- label
if ( IntSource0.x <= 0 )
    goto EndLoopOffset;
// -- loop statement end
// section of code between the loop/endloop
// -- endloop statement begin
aL += IntSource0.z; // increment loop counter
IntSource0.x--; // decrement iteration count
goto StartLoopOffset;
EndLoopOffset: // <- label
// -- endloop statement end
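The same behavior can be written as a structured loop. The C++ sketch below assumes a loop body that uses aL to index an array standing in for the constant registers; IntReg and sumIndexedConstants are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical integer register: x = iteration count,
// y = initial value of aL, z = step applied to aL at each endloop.
struct IntReg { int x, y, z, w; };

// Sums constants[aL] once per iteration, the way a loop body might
// index the constant register array with aL.
float sumIndexedConstants(const std::vector<float>& constants, IntReg i0) {
    float total = 0.0f;
    int aL = i0.y;                   // loop: initialize the loop counter
    for (int n = i0.x; n > 0; --n) { // the body runs i0.x times
        total += constants[aL];      // loop body using aL as an index
        aL += i0.z;                  // endloop: step the loop counter
    }
    return total;
}
```

With a count of 3, a start of 1, and a step of 2, the body sees aL values of 1, 3, and 5.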
|
vs 2.0 |
Linear interpolation between two registers (a.k.a. “lerp”) using a fraction specified in a third register. This is done on an element-by-element basis.
Two slots.
lrp Dest0, Source0, Source1, Source2
This macro instruction interpolates between two floating-point values, Source1 and Source2, using Source0 as the blend fraction. When Source0 is zero, Source2 is placed in the destination. When Source0 is one, Source1 is placed in the destination. Values in the [0,1] range interpolate between Source2 and Source1. If the value is outside the range [0,1] the result is indeterminate. Dest0 must be a temporary register, and cannot be the same as Source0 or Source2.
The macro expands to the following code:
add Dest0, Source1, -Source2
mad Dest0, Dest0, Source0, Source2
Setup:
Dest0 must be a temporary register, and Source0 should be in the range [0,1].
Results:
The value between Source2 and Source1 is interpolated using the fraction in Source0. The result is written in Dest0.
lrp r0, c14, r1, r2
lrp r0, c14.x, r1, r2 // lrp using a single fraction value
SetSourceRegisters();
// simulate lrp instruction
TempReg.x = Source0.x * (Source1.x - Source2.x) + Source2.x;
TempReg.y = Source0.y * (Source1.y - Source2.y) + Source2.y;
TempReg.z = Source0.z * (Source1.z - Source2.z) + Source2.z;
TempReg.w = Source0.w * (Source1.w - Source2.w) + Source2.w;
WriteDestinationRegisters();
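The two-instruction expansion listed for this macro can be traced with the C++ sketch below, in which the first source acts as the blend fraction. Vec4, negReg, addReg, madReg, and lrpReg are hypothetical helper names introduced for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for a four-element shader register.
struct Vec4 { float x, y, z, w; };

Vec4 negReg(const Vec4& a) { return { -a.x, -a.y, -a.z, -a.w }; }

Vec4 addReg(const Vec4& a, const Vec4& b) {
    return { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
}

Vec4 madReg(const Vec4& a, const Vec4& b, const Vec4& c) {
    return { a.x * b.x + c.x, a.y * b.y + c.y, a.z * b.z + c.z, a.w * b.w + c.w };
}

// lrp written as the add/mad pair from the macro expansion.
Vec4 lrpReg(const Vec4& s0, const Vec4& s1, const Vec4& s2) {
    Vec4 dest = addReg(s1, negReg(s2)); // add Dest0, Source1, -Source2
    return madReg(dest, s0, s2);        // mad Dest0, Dest0, Source0, Source2
}
```

The mad step reads Source0 and Source2 after Dest0 has already been overwritten by the add, which is consistent with the rule that Dest0 cannot be the same register as either of them.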
|
vs 1.0, 1.1, 2.0 |
Matrix 3 by 2. Performs a matrix multiply on the input vector and input matrix and stores the result. This macro is typically used for 2D transformation calculations.
Takes 2 instruction slots
m3x2 Dest0, Source0, Source1
Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1. Only those elements actually used in the calculations are read, only those that are calculated are written to the destination registers. The w elements in the source matrix and vector are unused and only the x and y elements of the destination are written.
Warning!: Make sure that your Dest0 and Source0 registers aren’t the same registers. It will compile and run but your results will be incorrect.
Note: You are not allowed to use the swizzle or negate modifiers on Source1.
m3x2 r0, v0, c6 // will use c7 as well
m3x2 r0, v0, c6.yzxw // Error! Can’t use a swizzle
This macro:
m3x2 Dest0, Source0, Source1
expands to the following, where Source2 is the register that follows Source1:
dp3 Dest0.x, Source0, Source1
dp3 Dest0.y, Source0, Source2
Note: You can use the swizzle or negate modifiers if you expand this macro yourself.
|
vs 1.0, 1.1, 2.0 |
Matrix 3 by 3. Performs a matrix multiply on the input vector and input matrix and stores the result. This macro is typically used for normal transformations during lighting calculations.
Takes 3 instruction slots
m3x3 Dest0, Source0, Source1
Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1. Only those elements actually used in the calculations are read, only those that are calculated are written to the destination registers. The w elements in the source matrix and vector are unused and only the x, y, and z elements of the destination register are written.
Warning!: Make sure that your Dest0 and Source0 registers are different. It will compile but your results will be incorrect.
Note: You are not allowed to use the swizzle or negate modifiers on Source1.
m3x3 r0, v0, c6 // will use c7 & c8 as well
m3x3 r0, v0, c6.yzxw // Error! Can’t use a swizzle
This macro:
m3x3 Dest0, Source0, Source1
expands to the following, where Source2 and Source3 are the registers that follow Source1:
dp3 Dest0.x, Source0, Source1
dp3 Dest0.y, Source0, Source2
dp3 Dest0.z, Source0, Source3
Note: You can use the swizzle or negate modifiers if you expand this macro yourself.
|
vs 1.0, 1.1, 2.0 |
Matrix 3 by 4. Performs a matrix multiply on the input vector and input matrix and stores the result.
Takes 4 instruction slots
m3x4 Dest0, Source0, Source1
Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1. Only those elements actually used in the calculations are read, only those that are calculated are written to the destination registers. The w elements in the source matrix and vector are unused.
Warning!: Make sure that your Dest0 and Source0 registers are different. It will compile but your results will be incorrect.
Note: You are not allowed to use the swizzle or negate modifiers on Source1.
m3x4 r0, v0, c6 // will use c7, c8, & c9 as well
m3x4 r0, v0, c6.yzxw // Error! Can’t use a swizzle
This macro:
m3x4 Dest0, Source0, Source1
expands to the following, where Source2 through Source4 are the registers that follow Source1:
dp3 Dest0.x, Source0, Source1
dp3 Dest0.y, Source0, Source2
dp3 Dest0.z, Source0, Source3
dp3 Dest0.w, Source0, Source4
Note: You can use the swizzle or negate modifiers if you expand this macro yourself.
|
vs 1.0, 1.1, 2.0 |
Matrix 4 by 3. Performs a matrix multiply on the input vector and input matrix and stores the result.
Takes 3 instruction slots
m4x3 Dest0, Source0, Source1
Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1. All source register elements are used, but Dest0.w will be unmodified.
Warning!: Make sure that your Dest0 and Source0 registers are different. It will compile but your results will be incorrect.
Note: You are not allowed to use the swizzle or negate modifiers on Source1.
m4x3 r0, v0, c6 // will use c7 & c8 as well
m4x3 r0, v0, c6.yzxw // Error! Can’t use a swizzle
This macro:
m4x3 Dest0, Source0, Source1
expands to the following, where Source2 and Source3 are the registers that follow Source1:
dp4 Dest0.x, Source0, Source1
dp4 Dest0.y, Source0, Source2
dp4 Dest0.z, Source0, Source3
Note: You can use the swizzle or negate modifiers if you expand this macro yourself.
|
vs 1.0, 1.1, 2.0 |
Matrix 4 by 4. Performs a matrix multiply on the input vector and input matrix and stores the result.
Takes 4 instruction slots
m4x4 Dest0, Source0, Source1
Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1. All source register elements are used, and all destination register elements are written.
Warning!: Make sure that your Dest0 and Source0 registers are different. It will compile but your results will be incorrect.
Note: You are not allowed to use the swizzle or negate modifiers on Source1.
m4x4 r0, v0, c6 // will use c7, c8, & c9 as well
m4x4 r0, v0, c6.yzxw // Error! Can’t use swizzle
This macro;
m4x4 Dest0, Source0, Source1
expands to the following;
dp4 Dest0.x, Source0, Source1
dp4 Dest0.y, Source0, Source2
dp4 Dest0.z, Source0, Source3
dp4 Dest0.w, Source0, Source4
Note: You can use the swizzle or negate modifiers if you expand this macro yourself.
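The warning about keeping Dest0 and Source0 different follows directly from the expansion: each dp4 writes one destination element before the next dp4 reads the source. The C++ sketch below illustrates this under assumed names (`Vec4`, `dp4`, and `m4x4_expanded` are hypothetical helpers, not a real API).

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for a four-element shader register.
struct Vec4 { float x, y, z, w; };

inline float dp4(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Writes each element as soon as it is computed, like the expanded macro.
// If dest and v are the same object, later dp4s read overwritten elements.
inline void m4x4_expanded(Vec4& dest, const Vec4& v,
                          const std::array<Vec4, 4>& rows) {
    dest.x = dp4(v, rows[0]);
    dest.y = dp4(v, rows[1]);
    dest.z = dp4(v, rows[2]);
    dest.w = dp4(v, rows[3]);
}
```

For example, with a matrix that swaps x and y, the input (1, 2, 3, 4) should produce (2, 1, 3, 4). If the same variable is passed as both destination and source, the second dp4 reads the already overwritten x and the result is (2, 2, 3, 4).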
|
vs 1.0, 1.1, 2.0 |
Multiply and add. Multiplies two registers and then adds a
third to the result then stores the result.
One slot
mad Dest0, Source0, Source1, Source2
Multiplies Source0 by Source1, then adds Source2 to the result. The result is stored in Dest0.
Setup:
Source0 and Source1 are the registers to be multiplied. Source2 is the register to be added to the result of the multiplication.
Results:
Dest0 contains (Source0 * Source1) + Source2.
mad r0, r0, r1, r2
SetSourceRegisters();
// Simulate the mad instruction
TempReg.x = Source0.x * Source1.x + Source2.x;
TempReg.y = Source0.y * Source1.y + Source2.y;
TempReg.z = Source0.z * Source1.z + Source2.z;
TempReg.w = Source0.w * Source1.w + Source2.w;
WriteDestinationRegisters();
|
vs 1.0, 1.1, 2.0 |
Stores the maximum value from comparing two source registers into the destination register.
One slot
max Dest0, Source0, Source1
Finds the maximum between corresponding elements of Source0 and Source1, then stores the results in the elements of Dest0. Because the comparison is done per element, the resulting register may not be equal to either input register.
Setup:
Source0 and Source1 are the registers to be compared.
Results:
Dest0 contains the maximum of the two input registers, computed on an element-by-element basis.
max r0, r1, r2
SetSourceRegisters();
// Simulate the max instruction
TempReg.x = Source0.x > Source1.x ? Source0.x : Source1.x;
TempReg.y = Source0.y > Source1.y ? Source0.y : Source1.y;
TempReg.z = Source0.z > Source1.z ? Source0.z : Source1.z;
TempReg.w = Source0.w > Source1.w ? Source0.w : Source1.w;
WriteDestinationRegisters();
|
vs 1.0, 1.1, 2.0 |
Stores the minimum value from comparing two source registers into the destination register.
One slot
min Dest0, Source0, Source1
Finds the minimum between corresponding elements of Source0 and Source1, then stores the results in the elements of Dest0. Because the comparison is done per element, the resulting register may not be equal to either input register.
Setup:
Source0 and Source1 are the registers to be compared.
Results:
Dest0 contains the minimum of the two input registers, computed on an element-by-element basis.
min r0, r1, r2
SetSourceRegisters();
// Simulate the min instruction
TempReg.x = Source0.x < Source1.x ? Source0.x : Source1.x;
TempReg.y = Source0.y < Source1.y ? Source0.y : Source1.y;
TempReg.z = Source0.z < Source1.z ? Source0.z : Source1.z;
TempReg.w = Source0.w < Source1.w ? Source0.w : Source1.w;
WriteDestinationRegisters();
|
vs 1.0 - 2.0 |
Stores the source registers into the destination register. Useful for moving from a temporary register into an output register or for swizzling. The source and destination registers can be the same.
The mov instruction is the only instruction that can use the address register as a destination, and only in vertex shader version 1.1. If the address register is the destination, the value is rounded down to the integer that is less than or equal to the initial value. In VS 2.0 you must use the mova instruction to set the address register.
One slot
mov Dest0, Source0
Moves Source0 into Dest0. A special case is when Dest0 is the address register. In this case the value stored is the largest integer that is less than or equal to the initial value; that is, the number is rounded toward negative infinity. Thus 1.5 is stored as 1, while –1.5 is stored as –2.
Setup:
Source0 is the register to be copied.
Results:
Dest0 contains a copy of Source0, unless Dest0 is the address register, in which case it contains the nearest integer that is less than or equal to the source value. If the destination is the address register then, unless otherwise specified, only the Source0.x element is used.
mov r0 , r1
mov a0.x , c1.w // initializing address register
SetSourceRegisters();
// Simulate the mov instruction
if ( Dest0 == a0 ) // the destination is the address register
{
    // round toward negative infinity
    TempReg.x = (int)::floor( Source0.x );
}
else
{
    TempReg.x = Source0.x;
    TempReg.y = Source0.y;
    TempReg.z = Source0.z;
    TempReg.w = Source0.w;
}
WriteDestinationRegisters();
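The difference between rounding toward negative infinity and simple truncation only shows up for negative values. A small C++ sketch, where `mov_to_address` is a hypothetical helper name used for illustration only:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the integer that a vs 1.1 mov would store in a0.x.
inline int mov_to_address(float value) {
    // std::floor rounds toward negative infinity before the conversion.
    return static_cast<int>(std::floor(value));
}
```

A plain C++ cast such as `static_cast<int>(-1.5f)` truncates toward zero and yields -1, whereas `mov_to_address(-1.5f)` yields -2, matching the behavior described for the address register.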
|
vs 2.0 |
Move data from a floating point register into the address register.
One slot
mova Dest0, Source0
This instruction rounds Source0 to the nearest integer and places the result in Dest0. Dest0 must be the address register. Rounding is typically to nearest even, but this is not exactly specified and applications should not rely on this behavior. That is, for values equidistant between two integers, some implementations may round up, round down, or pick a direction at random. The _sat modifier is not supported.
Setup:
Source0 is the floating point register to be rounded then placed in the address register.
Results:
The rounded value from Source0 is placed in the address register.
mova a0.x, c1.w // move one element
mova a0, c1 // move all
SetSourceRegisters();
// Simulate the mova instruction
// Note: RoundToNearestInteger() is
// implementation dependent
a0.x = RoundToNearestInteger( Source0.x );
a0.y = RoundToNearestInteger( Source0.y );
a0.z = RoundToNearestInteger( Source0.z );
a0.w = RoundToNearestInteger( Source0.w );
WriteDestinationRegisters();
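As one possible stand-in for RoundToNearestInteger(), the C++ sketch below uses `std::nearbyint`, which rounds halfway cases to the nearest even integer under the default floating-point rounding mode. The helper name `round_to_nearest_even` is an assumption for this sketch, and real hardware may break ties differently.

```cpp
#include <cassert>
#include <cmath>

// One possible RoundToNearestInteger(): std::nearbyint follows the current
// floating-point rounding mode, which is round-half-to-even by default.
inline int round_to_nearest_even(float value) {
    return static_cast<int>(std::nearbyint(value));
}
```

Under this assumption 0.5 becomes 0, 1.5 becomes 2, and 2.5 becomes 2. Since the tie-breaking rule is not guaranteed, avoid feeding mova values that sit exactly halfway between two integers.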
|
vs 1.0, 1.1, 2.0 |
Multiplies the two source registers element by element and stores them in the destination register.
One slot
mul Dest0, Source0, Source1
Multiplies Source0 by Source1 and stores the result in Dest0.
Setup:
Source0 and Source1 are the two registers to be multiplied.
Results:
Dest0 contains the result of the multiplication of Source0 and Source1.
mul r0, r1, r2
SetSourceRegisters();
// Simulate the mul instruction
TempReg.x = Source0.x * Source1.x;
TempReg.y = Source0.y * Source1.y;
TempReg.z = Source0.z * Source1.z;
TempReg.w = Source0.w * Source1.w;
WriteDestinationRegisters();
|
vs 1.0 - 2.0 |
Defines the null instruction (No-Operation).
One slot
nop
You might use this instruction for timing. You can use it to create a shader that does nothing but take up time as it executes to see how a shader of that length would affect your rendering. It’s possible that a driver might optimize away this instruction.
Setup:
None.
Results:
Takes up one slot/clock cycle.
nop
|
vs 2.0 |
This macro will normalize all elements of a register.
Takes three slots
nrm Dest0, Source0
This macro will take all elements of Source0 and normalize them so that the square root of the sum of squares of all elements in Dest0 is one. Dest0 cannot be the same register as Source0.
nrm r0, v0
This macro;
nrm Dest0, Source0
is equivalent to the following;
dp4 Dest0.x, Source0, Source0
rsq Dest0.x, Dest0.x
mul Dest0, Source0, Dest0.x
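The three steps can be followed in C++ as a rough sketch. It mirrors the expansion as given here (a four-element dot product, a reciprocal square root, then a multiply); the `Vec4` type and `nrm` function name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a four-element shader register.
struct Vec4 { float x, y, z, w; };

// Follows the three-step expansion: dp4, rsq, mul.
inline Vec4 nrm(const Vec4& s) {
    float lenSq  = s.x * s.x + s.y * s.y + s.z * s.z + s.w * s.w;  // dp4
    float invLen = 1.0f / std::sqrt(lenSq);                        // rsq
    return Vec4{ s.x * invLen, s.y * invLen,
                 s.z * invLen, s.w * invLen };                     // mul
}
```

For the input (3, 0, 4, 0) the squared length is 25, so every element is scaled by 1/5 and the result is approximately (0.6, 0, 0.8, 0).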
|
vs 2.0 |
Computes the power function for a scalar value.
Takes 3 slots
pow Dest0, Source0, Source1
Only the .w elements of the source registers are used, and only the absolute value of Source0 is used. Dest0 is filled with abs(Source0.w) raised to the Source1.w power. The result is replicated in all elements of the destination.
pow r0, r3, c6 // assume r3.w and c6.w are set
This macro;
pow Dest0, Source0, Source1
is equivalent to the following;
log Dest0.w, Source0 // takes absolute value
mul Dest0.w, Dest0.w, Source1.w
exp Dest0, Dest0.w
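The expansion relies on the identity |a|^b = 2^(b * log2|a|). A minimal C++ sketch of the same three steps, assuming base-2 log and exp; `shader_pow` is a hypothetical name used only here:

```cpp
#include <cassert>
#include <cmath>

// pow via log/mul/exp: |base|^exponent = 2^(exponent * log2(|base|)).
inline float shader_pow(float base, float exponent) {
    float l = std::log2(std::fabs(base));  // log (takes the absolute value)
    float m = l * exponent;                // mul
    return std::exp2(m);                   // exp
}
```

Because the absolute value is taken first, a base of -2 with an exponent of 3 gives 8, not -8. If the sign matters, it has to be handled separately.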
|
vs 1.0, 1.1, 2.0 |
Computes the reciprocal of an element of the source register and stores it in the destination register.
One slot
rcp Dest0, Source0
Computes the reciprocal of a single element of the source register and stores it in all elements of the destination register. Only one element of the source is used. If no element is specified then Source0.w is used. A value of exactly 1 on input returns 1 on output (no round-off error), while a value of 0 on input returns positive infinity.
Setup:
Source0 contains the element to take the reciprocal of. If unspecified, Source0.w is used.
Results:
Dest0 contains the reciprocal of the specified element copied in all elements.
Performance Note: This is one of the few instructions that will take more than one clock to execute. Use it sparingly, and when you use it try to arrange your code so that you don’t need the results immediately.
rcp r0, r1
SetSourceRegisters();
// Simulate the rcp instruction
if ( 0.0f == Source0.w ) // if 0
{
    TempReg.w = PLUS_INFINITY;
}
else if ( 1.0f == Source0.w ) // if 1
{
    TempReg.w = 1.0f;
}
else
{
    TempReg.w = 1.0f/Source0.w;
}
TempReg.x = TempReg.y = TempReg.z = TempReg.w;
WriteDestinationRegisters();
|
vs 2.0 |
Indicates the start of a rep-endrep block.
One slot
rep IntSource0
IntSource0 must be an integer register. Only the .x element is used. The maximum initial value can be 255. Execution over the block will continue for IntSource0.x times, as long as the number is positive. Compare this to the loop instruction, which additionally increments over the loop counter independently.
Setup:
IntSource0 must be an integer register with the .x element initialized to the number of times to iterate through the block.
Results:
The instructions in the rep - endrep block are executed IntSource0.x times.
defi i0, 10, 0, 0, 0 // i0.x is set to the count
rep i0
// the instructions here will get executed i0.x times
endrep
// Simulate the rep instruction
int LoopCounter = IntSource0.x;
TopLoop:
if ( LoopCounter <= 0 ) goto EndLoop;
// the instructions following the rep
// instruction would go here
// Simulate the endrep instruction
LoopCounter--;
goto TopLoop;
EndLoop:
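The same control flow reads more naturally as a structured loop. The C++ sketch below is an illustration only; `run_rep` is a hypothetical helper, and the callable `body` stands for the instructions between rep and endrep.

```cpp
#include <cassert>

// Runs `body` the way a rep/endrep block would: `count` times, or not at
// all when count is zero or negative. `run_rep` is a hypothetical name.
template <typename Body>
int run_rep(int count, Body body) {
    assert(count <= 255);  // maximum initial value stated for IntSource0.x
    int executed = 0;
    for (int remaining = count; remaining > 0; --remaining) {
        body();
        ++executed;
    }
    return executed;  // number of times the block actually ran
}
```

Calling `run_rep(10, body)` runs the body ten times, while a count of 0 or a negative count skips the block entirely. Unlike the loop instruction, nothing here touches a separate loop counter register.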
|
vs 2.0 |
Indicates the end of a subroutine.
One slot
ret
This instruction will return to the calling instruction (a call or callnz instruction) or return from the main function.
Setup:
Returns to the address following the most recent call or callnz instruction, or returns from the main function.
Results:
The path of execution is changed to the next instruction on the instruction stack.
ret
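How call and ret cooperate can be pictured with a small return-address stack. The C++ sketch below is a hypothetical model: the `ShaderStack` type, the integer instruction addresses, and the use of -1 to mean "leave the main function" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the internal return-address stack.
struct ShaderStack {
    std::vector<int> returnAddresses;

    // call: remember the instruction after the call, then jump to the target.
    int call(int currentAddress, int targetAddress) {
        returnAddresses.push_back(currentAddress + 1);
        return targetAddress;
    }

    // ret: resume after the most recent call, or return -1 (used here to
    // mean "leave the main function") when no call is pending.
    int ret() {
        if (returnAddresses.empty()) return -1;
        int next = returnAddresses.back();
        returnAddresses.pop_back();
        return next;
    }
};
```

Nested calls unwind in reverse order: after a call from address 5 and a nested call from address 42, the first ret resumes at 43 and the second at 6.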
|
vs 1.0, 1.1, 2.0 |
Computes the reciprocal square root of one element of the source register and stores it in all elements of the destination register.
One slot
rsq Dest0, Source0
Computes the reciprocal square root of the specified element of the source register and stores it in all elements of the destination register. If no element is specified then Source0.w is used. The absolute value of the input is used. A value of exactly 1 on input returns 1 on output (no round-off), while a value of 0 on input returns positive infinity.
Setup:
Source0 contains the element to take the reciprocal square root of. If unspecified, Source0.w is used.
Results:
Dest0 contains the reciprocal square root of the absolute value of the specified element copied in all elements.
Performance Note: This is one of the few instructions that will take more than one clock to execute. Use it sparingly, and when you use it try to arrange your code so that you don’t need the results immediately.
rsq r0, r1
SetSourceRegisters();
// Simulate the rsq instruction
float v = ::fabs( Source0.w );
if ( 0.0f == v ) // if 0
{
    TempReg.w = PLUS_INFINITY;
}
else if ( 1.0f == v ) // if 1
{
    TempReg.w = 1.0f;
}
else
{
    TempReg.w = 1.0f/::sqrt( v );
}
TempReg.x = TempReg.y = TempReg.z = TempReg.w;
WriteDestinationRegisters();
|
vs 1.0, 1.1, 2.0 |
Set Greater-than or Equal-to. Stores 1 in the destination register if the first source register is greater or equal to the second source register. If not it stores 0 in the destination register. Does an element-by-element comparison.
One slot
sge Dest0, Source0, Source1
Compares the two source registers element by element. If the first source register’s element is greater than or equal to the second source register’s element, the value 1 is placed in the destination register’s element. If not, 0 is stored in the destination register’s element. The resulting register may not be equal to either input register.
Setup:
Source0 and Source1 are the registers to be compared.
Results:
The element Dest0.n contains 1.0 if the Source0.n is greater than or equal to Source1.n, otherwise it contains 0.0. This is done for all elements of Dest0.
sge r0, r1, r2
SetSourceRegisters();
// Simulate the sge instruction
TempReg.x = Source0.x >= Source1.x ? 1.0f : 0.0f;
TempReg.y = Source0.y >= Source1.y ? 1.0f : 0.0f;
TempReg.z = Source0.z >= Source1.z ? 1.0f : 0.0f;
TempReg.w = Source0.w >= Source1.w ? 1.0f : 0.0f;
WriteDestinationRegisters();
|
vs 2.0 |
Computes the sign of each element in a register.
Takes 3 slots
sgn Dest0, Source0, Source1, Source2
Computes the sign of the elements of Source0, using two temporary scratch registers. All elements of the source registers are compared. The comparison is done element-by-element. Source1 and Source2 should be temporary registers and should not be the same. If an element in Source0 was > 0 then the corresponding element in Dest0 will be 1. If it was <0 then the result will be –1. If it was 0 the result will be 0.
Note: Source1 and Source2 will be modified after this macro!
sgn r3, r1, r2, r4 // r2 and r4 are scratch registers
This macro;
sgn Dest0, Source0, Source1, Source2
is equivalent to the following;
slt Source1, Source0, -Source0
slt Source2, -Source0, Source0
add Dest0, Source2, -Source1
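The expansion works because each slt produces a 1.0 or 0.0 flag, and the difference of the two flags is the sign. A per-element C++ sketch, with `slt` and `sgn` as hypothetical helper names:

```cpp
#include <cassert>

// slt on one element: 1.0 if a < b, otherwise 0.0.
inline float slt(float a, float b) { return a < b ? 1.0f : 0.0f; }

// sgn on one element, following the three-instruction expansion.
inline float sgn(float s) {
    float isNegative = slt(s, -s);   // slt Source1, Source0, -Source0
    float isPositive = slt(-s, s);   // slt Source2, -Source0, Source0
    return isPositive - isNegative;  // add Dest0, Source2, -Source1
}
```

For a positive input only the second flag is set, giving 1. For a negative input only the first is set, giving -1. For zero neither comparison is true, so the result is 0.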
|
vs 2.0 |
Computes the sine and cosine values for a scalar argument.
Takes 8 slots
sincos Dest0, Source0, Source1, Source2
Estimates the sine and cosine value inside a shader with a maximum error of 0.002 through the use of a Taylor series expansion. Source0 must have a replicate swizzle to indicate which element to use. This should be a value in radians between ±π. Dest0 should be a temporary register. The destination must have .x, .y or .xy as a write mask.
Setup:
One element of Source0 has to have the value in radians. Source1 and Source2 have to be set up with the following values to perform the expansion.
Source1 = [ 1/(7!*128), 1/(6!*64), 1/(4!*16), 1/(5!*16) ]
Source2 = [ 1/(3!*8), 1/(2!*8), 1, 0.5 ]
Results:
The resulting sine and cosine values are written in Dest0.x and Dest0.y respectively.
// setup values
def c1, 1.0f/(7!*128), 1.0f/(6!*64), 1.0f/(4!*16), 1.0f/(5!*16)
def c2, 1.0f/(3!*8), 1.0f/(2!*8), 1.0f, 0.5f
// assume value to take sin/cos of is in r0.x
sincos r0.xy, r0.x, c1, c2
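The powers of two in the setup constants suggest a Taylor series evaluated at half the angle, followed by the double-angle identities. The C++ sketch below follows that reading. It is an assumption about the scheme, not the documented expansion, and `SinCos` and `sincos_taylor` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct SinCos { float sine, cosine; };

// Assumed scheme: Taylor series for the half angle h = x/2, then the
// double-angle identities. Intended for x in [-pi, pi].
inline SinCos sincos_taylor(float x) {
    float h  = 0.5f * x;
    float h2 = h * h;
    // sin(h) ~ h - h^3/3! + h^5/5! - h^7/7!
    float s = h * (1.0f - h2 / 6.0f * (1.0f - h2 / 20.0f * (1.0f - h2 / 42.0f)));
    // cos(h) ~ 1 - h^2/2! + h^4/4! - h^6/6!
    float c = 1.0f - h2 / 2.0f * (1.0f - h2 / 12.0f * (1.0f - h2 / 30.0f));
    // sin(x) = 2 sin(h) cos(h), cos(x) = 2 cos(h)^2 - 1
    return SinCos{ 2.0f * s * c, 2.0f * c * c - 1.0f };
}
```

With these truncated series the largest error occurs at the ends of the range, where the sine estimate is off by a little under 0.002, in line with the stated maximum error.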
|
vs 1.0, 1.1, 2.0 |
Set Less-Than. Stores 1 in the destination register if the first source register is less than the second source register. If not it stores 0 in the destination register. Does an element-by-element comparison.
One slot
slt Dest0, Source0, Source1
Compares the two source registers element by element. If the first source register’s element is less than the second source register’s element, the value 1 is placed in the destination register’s element. If not, 0 is stored in the destination register’s element. The resulting register may not be equal to either input register.
Setup:
Source0 and Source1 are the registers to be compared.
Results:
The element Dest0.n contains 1.0 if the Source0.n is less than Source1.n, otherwise it contains 0.0. This is done for all elements of Dest0.
slt r0, r1, r2
SetSourceRegisters();
// Simulate the slt instruction
TempReg.x = Source0.x < Source1.x ? 1.0f : 0.0f;
TempReg.y = Source0.y < Source1.y ? 1.0f : 0.0f;
TempReg.z = Source0.z < Source1.z ? 1.0f : 0.0f;
TempReg.w = Source0.w < Source1.w ? 1.0f : 0.0f;
WriteDestinationRegisters();
|
vs 1.0, 1.1, 2.0 |
Subtracts two sources into the destination register.
One slot
sub Dest0, Source0, Source1
Subtracts the Source1 register from the Source0 register and places the result in the Dest0 register.
Setup:
Two source registers, Source0 and Source1.
Results:
Each element of Dest0 is filled with the element-by-element subtraction of the elements of Source1 from Source0.
sub r0, r0, c2
SetSourceRegisters();
// Simulate the sub instruction
TempReg.x = Source0.x - Source1.x;
TempReg.y = Source0.y - Source1.y;
TempReg.z = Source0.z - Source1.z;
TempReg.w = Source0.w - Source1.w;
WriteDestinationRegisters();
|
vs 1.0, 1.1, 2.0 |
Defines the version of the vertex shader code you are using.
No Slots
vs.integer1.integer2
The argument is of the form vs.x.y, where x is the main version number and y is the minor version number. Both values are integers.
Setup:
Two integers that form the major.minor version of the shader version you want to use. This must be the first instruction in your shader.
Results:
Tells the assembler which features to allow in the shader instructions that follow.
vs.1.0 //not using the address register in this one
vs.1.1 //uses address register
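As a rough illustration of the vs.x.y form, the C++ sketch below pulls the two integers out of a version string. `ShaderVersion` and `parse_vs_version` are hypothetical names; this is not how any particular assembler is implemented.

```cpp
#include <cassert>
#include <cstdio>

struct ShaderVersion { int majorVersion, minorVersion; bool valid; };

// Parses text of the form "vs.x.y". Hypothetical helper; a real assembler
// would also check that the version is one it supports.
inline ShaderVersion parse_vs_version(const char* text) {
    ShaderVersion v{0, 0, false};
    v.valid = (std::sscanf(text, "vs.%d.%d",
                           &v.majorVersion, &v.minorVersion) == 2);
    return v;
}
```

Parsing "vs.1.1" gives major version 1 and minor version 1, while text that does not start with "vs." is reported as invalid.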