Click on an instruction to jump to that page.

 

abs      dp3      lit    mov
add      dp4      log    mova
call     dst      logp   mul
callnz   else     loop   nop
crs      endif    lrp    nrm
dcl      endloop  m3x2   pow
def      endrep   m3x3   rcp
defb     exp      m3x4   rep
defi     expp     m4x3   ret
         frc      m4x4   rsq
         if       mad    sge
         label    max    sgn
                  min    sincos
                         slt
                         sub
                         vs

abs (macro)

 vs 2.0

 

This macro computes the absolute value of the input register.


One slot


abs Dest0, Source0

 

This macro is equivalent to:

max  Dest0, Source0, -Source0

which you can use if you’re targeting a shader version earlier than vs 2.0. Either way, you end up with the absolute value of Source0 in Dest0.

 

Setup:

One source register, Source0.

 

Results:

Dest0 is filled with the absolute value of Source0.

 

abs  r0  , r0

abs r0.z, r0.z
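The equivalence between abs and max can be checked with a small C++ sketch. The Vec4 type and the absViaMax helper are hypothetical stand-ins for a shader register and the macro expansion; they are not part of any DirectX API.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical stand-in for a four-element shader register (x, y, z, w).
using Vec4 = std::array<float, 4>;

// Sketch of "max Dest0, Source0, -Source0": the element-wise maximum of a
// value and its negation is the absolute value of that element.
Vec4 absViaMax(const Vec4& src) {
    Vec4 dest{};
    for (std::size_t i = 0; i < 4; ++i) {
        dest[i] = std::max(src[i], -src[i]);
    }
    return dest;
}
```

Calling absViaMax on a register holding (-1.5, 2, 0, -8) would give (1.5, 2, 0, 8), which is what the abs macro is described as producing.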

 

 

add

 vs 1.0, 1.1, 2.0

 

Adds two sources into the destination register.


One slot


add  Dest0, Source0, Source1

 

Adds the Source0 and Source1 registers and places the result in the Dest0 register.

 

Setup:

Two source registers, Source0 and Source1.

 

Results:

Each element of Dest0 is filled with the element-by-element addition of the elements of Source0 and Source1.

 

add  r0  , r0  ,   c2

add r0.z, r0.z,  -r0.z

 

SetSourceRegisters();

 

// Simulate the add instruction

TempReg.x = Source0.x + Source1.x;

TempReg.y = Source0.y + Source1.y;

TempReg.z = Source0.z + Source1.z;

TempReg.w = Source0.w + Source1.w;

 

WriteDestinationRegisters();

 

call

 vs 2.0

 

Makes an unconditional function call to the instruction label.


One slot


call_InstructionLabelID

 

Pushes the address of the following instruction onto the internal shader stack, and then sets the current instruction address to the address of the instruction that follows the label instruction with the name InstructionLabelID. The instruction label ID will be an integer in the range [1,16]. TODO – particular format of the statement??

 

Typically you’d create a shader subroutine that terminates with the ret instruction.

 

Setup:

Requires a valid, existing instruction label.

 

Results:

Shader execution is transferred to the instruction following the instruction label.

 

call_1

call_16

call_Fred // Error! Invalid label

call_0    // Error! Invalid label (out of range)

 

 

// Simulate the call instruction

 

// declare a bare function pointer type
typedef void (*fp)(void);

// take the address of the label
fp pFP = (fp)InstructionLabelID;
pFP(); // call the function
// returns here only when ret is executed

 

callnz

 vs 2.0

 

Call if Not-Zero. Makes a function call to the instruction label.


One slot


callnz InstructionLabelID BoolSource0

 

If the boolean register Source0 is not zero, then the address of the following instruction is pushed onto the internal shader stack, and then the current instruction address is set to the address of the instruction that follows the label instruction with the name InstructionLabelID. The instruction label ID will be an integer in the range [1,16].

 

Typically you’d create a shader subroutine that terminates with the ret instruction.

 

Setup:

Source0 is a Boolean register. Requires a valid, existing instruction label.

Results:

If the source register is not zero, shader execution is transferred to the instruction following the instruction label.

callnz 1 b0 // transfer execution to label 1 if b0 != 0

callnz 2 r0 // Error! Not a Boolean register

 

 

// Simulate the callnz instruction

 

// declare a bare function pointer type
typedef void (*fp)(void);

if ( 0 != BoolSource0 )
{
    fp pFP = (fp)InstructionLabelID;
    pFP(); // call the function
}

 

crs (macro)

 vs 2.0

 

Computes the three-component cross product.


Two slots


crs Dest0, Source0, Source1

 

Computes the three-component cross product using the right-hand rule. There are fairly severe restrictions on the use of swizzles. The w elements of all registers are ignored.

This macro is equivalent to:

 

mul Dest0.xyz,  Source0.yzxw, Source1.zxyw

mad Dest0.xyz, -Source1.yzxw, Source0.zxyw, Dest0

 

Setup:

Two source registers, Source0 and Source1. These registers must not be the same as the destination register. The source registers must not have any swizzles. TODO – is this checked or just gonna produce wrong values?

 

The destination register must have a destination mask, and that mask must not contain a reference to the w element of the destination register.

 

Results:

The cross product of the two input registers is stored into the specified elements of the destination register.

 

crs   r0.xyz,  r1, r2  // fill r0.xyz with the cross product of r1 and r2
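As a cross-check of the mul/mad expansion, the following C++ sketch applies the same swizzles step by step. Vec4 and crsExpanded are hypothetical names used only for illustration, and the sketch leaves the w element at zero, whereas the macro does not write Dest0.w at all.

```cpp
#include <array>

// Hypothetical stand-in for a shader register; index 0..3 = x, y, z, w.
using Vec4 = std::array<float, 4>;

// Mirrors the two-instruction expansion of crs.
Vec4 crsExpanded(const Vec4& s0, const Vec4& s1) {
    Vec4 dest{};

    // mul Dest0.xyz, Source0.yzxw, Source1.zxyw
    dest[0] = s0[1] * s1[2];
    dest[1] = s0[2] * s1[0];
    dest[2] = s0[0] * s1[1];

    // mad Dest0.xyz, -Source1.yzxw, Source0.zxyw, Dest0
    dest[0] = -s1[1] * s0[2] + dest[0];
    dest[1] = -s1[2] * s0[0] + dest[1];
    dest[2] = -s1[0] * s0[1] + dest[2];

    return dest;
}
```

With the unit x axis as Source0 and the unit y axis as Source1, the result is the unit z axis, as the right-hand rule requires. Because the second instruction reads Dest0, the sketch also shows why the sources must not alias the destination.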

 

 

 

dcl

 vs 2.0

 

Declare. Map a vertex element to an input register.


Takes no slots


dcl Dest0

 

To make shaders easier to optimize and verify, VS 2.0 requires a declaration statement for every input register. Thus all texture or vertex input registers must be declared before use in the shader. Dest0 will be a specific input register. The partial precision modifier (_pp) can be applied to the declaration statement to indicate that lower precision is acceptable when using this register. You must supply a component mask on Dest0 to indicate which elements are in use and valid. dcl statements must appear before the first executable instruction.

 

dcl    t1.rg // using a 2D texture

dcl    t2    // using a 4D texture (default mask)

dcl_pp t3    // indicate partial precision is OK

 

 

 

def

 vs 1.0, 1.1, 2.0

 

Sets the value of vertex shader float constants, but leaves it up to the programmer to insert these into the shader code.


No slot


def  Dest0, value0, value1, value2, value3

 

Stores four floating-point values in the elements of Dest0 register. If these instructions are used in a shader, these instructions must follow the vs instruction and precede any other instructions.

 

Setup:

Four floating-point values separated by commas.

 

Results:

Has no effect on the shader code that follows; you must manually insert the returned code fragment into your shader.

Note:  If you use def in a shader, then when the shader is compiled you will have to use the 4th parameter returned from D3DXAssembleShader. This parameter will contain an ID3DXBuffer interface, which will contain a compiled shader code fragment. You will have to manually insert this fragment into your shader declaration.

def       c0,  0.0, 0.5, 0.25, -1.0

def       c1,  1.0, 2.0, 5.0, 10.0

 

 

defi

 vs 2.0

 

Sets the value of vertex shader integer constants.


No slot


defi  IntDest0, value0, value1, value2, value3

 

Stores four integer values in the elements of IntDest0 register for use in this shader.

 

Setup:

Four integer values separated by commas.

 

Results:

Locally sets these values into the register. A local call takes precedence over an external SetVertexShaderConstantI() call to set a shader constant. The previous values of the register are restored upon exit from the shader.

 

defi i0,  0, 2, 4, 8

defi i1,  -2, -1, 1, 2

 

 

defb

 vs 2.0

 

Sets the value of vertex shader boolean constants.


No slot


defb  BoolDest0, value0, value1, value2, value3

 

Stores four boolean values in the elements of BoolDest0 register for use in this shader. Zero indicates false. Nonzero indicates true.

 

Setup:

Four booleans separated by commas.

 

Results:

Locally sets these values into the register. A local call takes precedence over an external SetVertexShaderConstantB() call to set a shader constant. The previous values of the register are restored upon exit from the shader.

 

defb b0,  0, 1, 0, 2 // false, true, false, true

 

 

 

 

dp3

 vs 1.0, 1.1, 2.0

 

Computes the three-component dot product (a.k.a. dot-product three) and replicates the result in all specified channels of the destination register.


One slot


dp3  Dest0, Source0, Source1

 

Computes the dot product of the Source0 and Source1 registers and places the result in the Dest0 register. Only the x,y and z values are used to compute the dot product, the w component is ignored.

 

Setup:

Two source registers, Source0 and Source1.

 

Results:

Unless otherwise masked, each element of Dest0 is filled with the dot product of the first three elements of registers Source0 and Source1.

 

dp3      r0  ,  v3,  c2 // fill r0 with dp3

dp3      r1.x,  v3,  c2 // just fill r1.x

 

SetSourceRegisters();

 

// Simulate the dp3 instruction

TempReg.x = TempReg.y = TempReg.z = TempReg.w =

Source0.x * Source1.x +

Source0.y * Source1.y +

Source0.z * Source1.z;

// note w component ignored

 

WriteDestinationRegisters();

 

 

dp4

 vs 1.0, 1.1, 2.0

 

Computes the four-component dot product (a.k.a. dot-product four) and stores the result in all specified channels of the destination register.


One slot


dp4  Dest0, Source0, Source1

 

Computes the dot product of the Source0 and Source1 registers and places the result in the Dest0 register. If no mask is specified on the destination, then the entire register is filled with the dot product.

 

Setup:

Two source registers, Source0 and Source1

 

Results:

Unless otherwise masked, each element of Dest0 is filled with the dot product of the four elements of registers Source0 and Source1.

 

dp4      r0,    v3,  c2

dp4      r1.x,  v3,  c2 // just fill r1.x

 

SetSourceRegisters();

 

// Simulate the dp4 instruction

TempReg.x = TempReg.y = TempReg.z = TempReg.w =

Source0.x * Source1.x +

Source0.y * Source1.y +

Source0.z * Source1.z +

Source0.w * Source1.w;

 

WriteDestinationRegisters();

 

 

dst

 vs 1.0, 1.1

 

Computes a distance vector in the format typically used for attenuated lighting calculations.


One slot


dst  Dest0, Source0, Source1

 

Creates a distance vector from a set of distance-squared and reciprocal-distance values, and puts them in a format that can be used for attenuated lighting calculations.

Setup:

Two source registers are required to be set up. Source0 should be set up as [n/a, d^2, d^2, n/a]. Source1 should be set up as [n/a, 1/d, n/a, 1/d]. Elements noted as “n/a” are not used and their values are ignored.

Results:

Dest0 will be filled with elements that correspond to [1, d, d^2, 1/d]. Dest0.y is computed from the product of Source0.y and Source1.y.

 

dst      r2,  r0,  r1

 

SetSourceRegisters();

 

// Simulate the dst instruction

TempReg.x = 1;

TempReg.y = Source0.y * Source1.y;

TempReg.z = Source0.z;

TempReg.w = Source1.w;

 

WriteDestinationRegisters();
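To show why the [1, d, d^2, 1/d] layout suits attenuated lighting, here is a minimal C++ sketch. It assumes the common attenuation form 1 / (a0 + a1*d + a2*d^2); that formula, the Vec4 type, and the function names dstSim and attenuation are illustrative assumptions, not something the instruction itself defines.

```cpp
#include <array>

// Hypothetical stand-in for a shader register; index 0..3 = x, y, z, w.
using Vec4 = std::array<float, 4>;

// Same arithmetic as the dst simulation: [1, d, d^2, 1/d].
Vec4 dstSim(const Vec4& s0, const Vec4& s1) {
    return {1.0f, s0[1] * s1[1], s0[2], s1[3]};
}

// Assumed attenuation model: a three-element dot product of the distance
// vector with the coefficients (a0, a1, a2), followed by a reciprocal.
float attenuation(const Vec4& distVec, float a0, float a1, float a2) {
    float denom = a0 * distVec[0] + a1 * distVec[1] + a2 * distVec[2];
    return 1.0f / denom;
}
```

In shader terms the dot product would be a single dp3 against a constant register holding the three coefficients, followed by an rcp.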

 

 

else

 vs 2.0

 

Provides an alternate path of execution for an if-else-endif block.


One slot


else

 

Must be inside of an if-endif block. If the Boolean argument of the if statement is false, then execution will skip to the else instruction and continue to the terminating endif statement. If the Boolean was true, then execution will skip over the code enclosed by the else-endif block. There can be only one else statement in an if-endif block.

 

 

Setup:

The else statement must be between an if and endif statement.

 

Results:

If the argument provided to the if statement was false, then the code inside the else-endif block will be executed.

 

else  

 

 

 

 

endloop

 vs 2.0

 

The termination point for a loop-endloop block.


One slot


endloop

 

When used with the loop instruction, creates a block of instructions that is executed a variable number of times.

Setup:

You must have a loop instruction in your shader prior to this instruction.

Results:

When the loop reaches the endloop instruction, the loop counter (specified in the loop instruction) is incremented by the step value (also specified in the loop instruction).

 

endloop

 

// simulate the endloop instruction
// assume that LoopCounter, LoopStep, and LoopIterator
// were defined in the loop instruction and
// StartLoopOffset is the instruction following
// the loop instruction

LoopCounter += LoopStep;

--LoopIterator;

if ( LoopIterator > 0 )
     goto StartLoopOffset;

// fall through

 

 

 

 

endif

 vs 2.0

 

The termination point for an if-endif or ifc-endif block.


Zero slots


endif

 

When used with the if or ifc instruction, creates a block of instructions that is executed conditionally.

 

Setup:

You must have an if or ifc instruction in your shader prior to this instruction.

 

Results:

Execution is controlled by the if or ifc instruction that precedes this instruction. When the argument of that statement is false, execution jumps to the else statement if one is present, and otherwise to the statement following the endif.

 

if  b1

   // if b1 != 0, this section gets executed

else // optional else statement

   // if b1 = 0, this section gets executed

endif

 

 

 

endrep

 vs 2.0

 

The termination point for a rep-endrep block.


Zero slots


endrep

 

When used with the rep instruction, creates a block of instructions that is executed a specified number of times.

 

Setup:

You must have a rep instruction in your shader prior to this instruction.

 

Results:

Execution is controlled by the rep instruction that precedes this instruction. When the iteration count of that statement is zero then execution will jump to the statement following the endrep.

 

defi i0, 20, 0, 0, 0

 

rep    i0 // i0.x is used = 20

 

   // this section gets executed 20 times

 

endrep
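The rep-endrep behavior can be sketched in C++ as a simple counted loop. The function name simulateRep and its callback parameter are hypothetical; the sketch only assumes what the text states, namely that the block runs i0.x times and is skipped when the count is zero.

```cpp
// Runs "body" once per iteration, the way the instructions between rep and
// endrep would execute, and returns how many times the block ran.
// "count" plays the role of the x element of the integer register.
template <typename Body>
int simulateRep(int count, Body body) {
    int executed = 0;
    while (count > 0) {   // a zero (or negative) count skips the block
        body();
        --count;
        ++executed;
    }
    return executed;
}
```

Unlike loop-endloop, nothing here corresponds to the aL loop counter register; the block simply repeats.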

 

 

exp (macro)

 vs 1.0, 1.1, 2.0

 

This macro computes a power of two to at least 20 bits of precision.  By default, only the source register’s w element is used. The result is replicated across the entire destination register. Note that the expp instruction, by contrast, sets the destination’s w element to 1.


Takes at least 12 instruction slots.


exp  Dest0, Source0

 

Calculates 2^(Source0.w) and writes the result in Dest0. Unless otherwise specified, Source0.w is the input value, and all elements of Dest0 are written with the result. This is somewhat different from the expp instruction, which always sets Dest0.w to 1. [TODO: what happens for 0 and negative arguments??]

 

exp      r0,   c1  // fill all of r0 with exp2(c1.w)

exp      r0.x, c1.y // store exp2(c1.y) in r0.x

 

expp

 vs 1.0, 1.1, 2.0

 

Computes a power of two, with the results broken into a partial-precision part and higher-precision integer and fractional parts. This allows you to use the lower precision single element or use a more complicated integer/fractional calculation when you need higher precision. The destination’s w element is set to 1. Only the source register’s w element is used. If Source0.w < 0 then the results are undefined.

Note: Don’t confuse this with the exp macro!


One slot


expp  Dest0, Source0

 

Computes low- and higher-precision values for 2^(Source0.w), where Dest0.z contains the low-precision single element approximation, and Dest0.x and Dest0.y contain the integer and fractional parts. Dest0.w is set to 1.

You have a choice of which part of the results to use. The low-precision part will contain 2 raised to the input value, to 10 bits of precision. The two-part higher-precision result contains 2 raised to the integer part of the input value, and the fractional part of the input value. For the latter you must provide a function that computes 2^n for 0 <= n < 1 to your desired precision, and then multiply that by the integer part’s result, since 2^(i+f) = 2^i * 2^f.

Setup:

Store the value you want the exponent of in Source0.w. The value should be positive. The other register elements are ignored.

 

Results:

Dest0.z will contain a low precision exponential value.

Dest0.x will contain the exponential of the integer part of the input.

Dest0.y will contain the fractional part of the input, not the exponential of the fractional part. You have to do the conversion yourself.

Dest0.w is set to 1.0.

 

expp      r0,  r1
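The two-part result can be recombined as in the C++ sketch below, which relies on the identity 2^(i+f) = 2^i * 2^f. All names are hypothetical, the input is assumed to be non-negative, and exp2Fraction simply calls the standard library where a real shader would substitute its own approximation (for example a short polynomial). The z element is not truncated to 10 bits here.

```cpp
#include <cmath>

// Hypothetical container for the four expp outputs.
struct ExppResult {
    float x;  // 2 raised to the integer part of the input
    float y;  // fractional part of the input (not yet exponentiated)
    float z;  // 2 raised to the input (full precision in this sketch)
    float w;  // always 1
};

ExppResult exppSim(float w) {
    float wInt = std::floor(w);
    return {std::exp2(wInt), w - wInt, std::exp2(w), 1.0f};
}

// Placeholder for the user-supplied approximation of 2^f, f in [0,1).
float exp2Fraction(float f) {
    return std::exp2(f);
}

// High-precision path: multiply the integer part's power by 2^fraction.
float exp2HighPrecision(float w) {
    ExppResult r = exppSim(w);
    return r.x * exp2Fraction(r.y);
}
```

For an input of 2.5 this yields 4 * 2^0.5, roughly 5.657, matching 2^2.5.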

 

// DirectX 8 version

SetSourceRegisters();

 

// Simulate the expp instruction

float wWhole = Source0.w;             // take all
float wInt   = (float)(int)Source0.w; // take integer part

// compute the higher-precision parts
TempReg.x = (float)pow(2, wInt);
TempReg.y = Source0.w - wInt; // fractional part of w

// calculate 2^(Source0.w), then chop
// to 10 bits precision by masking the
// low bits of the float's representation
float fullValue = (float)pow(2, wWhole);
unsigned long temp = *(unsigned long*)&fullValue & 0xffffff00;
TempReg.z = *(float*)&temp;

// set w to 1
TempReg.w = 1;

 

WriteDestinationRegisters();


 

// DirectX 9 version

SetSourceRegisters();

 

// Simulate the expp instruction
// (a single low-precision result is replicated)
float expValue = (float)pow(2, Source0.w);

// store low-precision part to 10-bits
unsigned long temp = *(unsigned long*)&expValue & 0xFFFFFF00;
expValue = *(float*)&temp;

TempReg.x = TempReg.y =
TempReg.z = TempReg.w =
expValue;

 

WriteDestinationRegisters();

 

frc (macro)

 vs 1.0, 1.1, 2.0

 

This macro removes the integer part of the input register’s x and y elements and places the fractional remainder into the destination register’s x and y elements. The sign of the results is always positive. A write mask on the destination is required.


Takes 3 instruction slots


frc  Dest0, Source0

 

Takes the fractional parts of Source0’s x and y elements and places them in Dest0’s x and y elements. Dest0’s z and w elements are unaltered. The sign of the input arguments is ignored. You must specify a write mask on Dest0. This can be either xy or just y. (Not just x, for some reason).

 

Note: Early versions of the SDK documentation incorrectly stated that the entire source register was used, and made no mention of the fact that write masks were required.

 

frc    r0.xy,  r1 // use r1.xy and store fractions in r0.xy

 

// use r1.x and store fraction in r0.y, r0.x (and z & w)

// remain unchanged

frc    r0.y ,  r1.x

 

// this has no effect on the results, since the

// sign is ignored

frc    r0.y ,  -r1.x

 

frc    r0, r1 // Error! No write mask.

 

 

 

 

 

if

 vs 2.0

 

The start of an if-else-endif block. Conditionally execute a block of code.


One slot.


if BoolReg0

 

The argument must be a boolean constant register. There must be a terminating endif that follows the if instruction. The else instruction is optional and must be between the if and endif statements. If the boolean argument is true, then execution will continue immediately after the if statement, until either the else or endif statement are reached.

 

if blocks can be placed inside an if-endif or a loop-endloop block, but they must be entirely inside them. 

 

Setup:

The argument must be a boolean register. You must have an endif instruction in your shader following this instruction.

Results:

Execution is controlled by the if instruction. When the boolean argument is false, execution jumps past the if block to the matching else or endif statement.

 

if  b1

   // if b1 != 0, this section gets executed

else // optional else statement

   // if b1 = 0, this section gets executed

endif

 

 

 

 

 

label

 vs 2.0

 

Defines a label for use with a call or callnz instruction.


Zero slots.


label <n>

 

The label instruction marks the next instruction as having the specified label, thus making it a target for a subroutine call. The argument <n> must be an integer label in the range [0,15] - that is, there can be a total of sixteen labels.

 

Setup:

The argument must be an integer.

 

Results:

When a call or callnz instruction calls the integer label, execution immediately (and conditionally for the callnz instruction) goes to the instruction following the label statement. Execution will return when a ret instruction is encountered.

 

// VS 2.0

 

call 12 // somewhere in shader call subroutine

 

 

label 12   // label 12

 

// the subroutine instructions go here

 

ret        // execution returns after the call

 

 

 

 

 

 

lit

 vs 1.0, 1.1

 

Computes the traditional diffuse and specular lighting coefficients when passed the dot products N · L and N · H and a power coefficient.


One slot


lit  Dest0, Source0

 

You’ll need to calculate normalized N · L and N · H dot products and specify a specular power value prior to using this instruction. The results will be the traditional diffuse component in Dest0.y:

Dest0.y = i_d = ( N · L )

And the traditional specular component (i.e. Blinn’s equation) in Dest0.z:

Dest0.z = i_s = ( N · H )^(m_s)

 

Note that there are no k parameters in the equations. If you need them you’ll have to do the multiplication in your shader.

 

Setup:

Source0.x should contain the normalized dot product between the normal and the direction from the vertex to the light. Source0.y should contain the normalized dot product between the normal and the half angle vector. Source0.w should contain the power value in the range -128 to +128. Source0.z is ignored.

 

Results:

Dest0.x and Dest0.w are set to 1. If Source0.x (N · L) is positive, it’s stored in Dest0.y, else Dest0.y is set to 0. If both Source0.x and Source0.y (N · H) are positive then Dest0.z is set to Source0.y raised to the Source0.w power, else is set to zero. The power value (Source0.w) is clamped to the range [–128, 128].

 

Note: Early versions of the SDK documentation incorrectly stated that negative exponential values would cause undefined results.

 

lit      r0,  r1

 

SetSourceRegisters();

 

// Simulate the lit instruction

 

// these are constants
TempReg.x = TempReg.w = 1;

// defaults when a dot product is negative or zero
TempReg.y = TempReg.z = 0;

// if N dot L is positive…
if ( Source0.x > 0 )
{
    TempReg.y = Source0.x;

    // if N dot H is positive
    if ( Source0.y > 0 )
    {
        // clamp the power value to an 8.8
        // fixed point representation of the
        // maximum allowable value
        const float kPowerMax = 127.9961f;
        float ClampedPower = Source0.w;
        if      ( ClampedPower < -kPowerMax )
            ClampedPower = -kPowerMax;
        else if ( ClampedPower >  kPowerMax )
            ClampedPower = kPowerMax;

        // actual value in shader math is only
        // good to seven fractional bits of
        // precision
        TempReg.z = (float)pow( Source0.y, ClampedPower );
    }
}

 

WriteDestinationRegisters();

 

log (macro)

 vs 1.0, 1.1, 2.0

 

This macro computes log2 of the input argument to at least 20 bits of precision. The absolute value of the source register’s w element is used. Unlike the logp instruction, the destination’s w element is not set to 1.

 

Note: Don’t confuse this with the logp instruction.


Takes at least 12 instruction slots


log  Dest0, Source0

 

Computes log2 of the absolute value of the Source0.w element (unless otherwise specified) and places the result into all elements of Dest0. If the argument is equal to zero then all result registers are set to minus infinity. This is somewhat different from the logp instruction, which always sets Dest0.w to 1.

 

log     r0,  r1

 

 

 

 

 

logp

 vs 1.0, 1.1, 2.0

 

Computes log2 with the results broken into a single 10-bit precision part and a higher-precision dual-element part. This allows you to use the lower precision single element or use a more complicated integer/fractional calculation if you need higher precision.


One slot


logp  Dest0, Source0

 

Computes low- and higher-precision values for log2(Source0.w).  The destination’s w element is set to 1. Only the source register’s w element is used. If Source0.w is negative then its absolute value is used. If Source0.w is zero then the results are negative infinity in Dest0.x and Dest0.z and 1.0 in Dest0.y.

You have a choice of which part of the results to use. The low-precision part will contain the log2 of the input value to 10 bits of precision.

The two-part higher-precision result represents the exponent and mantissa of the input value.

To use the high-precision results you’ll need to provide a function that computes log2 in the range [1,2) with your desired precision. You’d then add this result to the value returned in Dest0.x to get the log2 of your input value.

 

Note: Don’t confuse this with the log macro!

 

Setup:

Store the value you want the log2 of in Source0.w. The value should be positive. The other register elements are ignored.

Results:

Dest0.z contains the low precision (10-bit) single element approximation.

Dest0.x contains the most significant part of the dual-element result. This value can be negative.

Dest0.y contains the mantissa of the dual-element result, as a value in the range [1,2). You have to compute its log2 yourself.

 

logp      r0,  r1
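The dual-element result can be recombined as in this C++ sketch, which uses log2(v) = exponent + log2(mantissa) with the mantissa in [1,2). The names are hypothetical, the input is assumed to be nonzero, and std::log2 stands in for whatever limited-range approximation you would actually supply.

```cpp
#include <cmath>

// Hypothetical container for the dual-element part of the logp result.
struct LogpParts {
    float exponent;  // what the text describes for Dest0.x
    float mantissa;  // what the text describes for Dest0.y, in [1,2)
};

// Splits |v| into exponent and mantissa so that |v| = mantissa * 2^exponent.
// std::frexp returns a fraction in [0.5,1), so it is rescaled to [1,2).
LogpParts logpParts(float v) {
    int e = 0;
    float m = std::frexp(std::fabs(v), &e);
    return {static_cast<float>(e - 1), m * 2.0f};
}

// High-precision path: add the exponent to log2 of the mantissa.
float log2HighPrecision(float v) {
    LogpParts p = logpParts(v);
    return p.exponent + std::log2(p.mantissa);
}
```

For an input of 10, the parts are exponent 3 and mantissa 1.25, and 3 + log2(1.25) is approximately 3.3219.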

 

// DirectX 8 version

SetSourceRegisters();

 

// Simulate the logp instruction

float v = (float)fabs(Source0.w); // only positive values

TempReg.y = TempReg.w = 1.0f;

if ( 0 == v )
{
    TempReg.x = TempReg.z = MINUS_INFINITY;
}
else
{
    float logValue = (float)(log(v)/log(2));

    // store exponent
    TempReg.x = (float)floor( logValue );

    // store mantissa: keep the mantissa bits of v
    // and force the exponent bits so the value
    // lies in the range [1,2)
    unsigned long p = (*(unsigned long*)&v
                       & 0x7FFFFF) | 0x3F800000;
    TempReg.y = *(float*)&p;

    // store low-precision part to 10-bits
    unsigned long temp = *(unsigned long*)&logValue & 0xFFFFFF00;
    TempReg.z = *(float*)&temp;
}

 

WriteDestinationRegisters();

 


 

// DirectX 9 version

SetSourceRegisters();

 

// Simulate the logp instruction

float v = (float)fabs(Source0.w); // only positive values
float logValue;

if ( 0 == v )
{
    logValue = MINUS_INFINITY;
}
else
{
    logValue = (float)(log(v)/log(2));
    logValue = (float)floor( logValue );

    // store low-precision part to 10-bits
    unsigned long temp = *(unsigned long*)&logValue & 0xFFFFFF00;
    logValue = *(float*)&temp;
}

TempReg.x = TempReg.y =
TempReg.z = TempReg.w =
logValue;

 

WriteDestinationRegisters();

 

loop

 vs 2.0

 

The starting point for a loop-endloop block. Iterate over a block of code a number of times.


One slot


loop IntSource0

 

When used with the endloop instruction, creates a block of instructions that is executed a variable number of times. Each time through the loop, the loop counter is incremented by the specified amount. Compare this to the rep instruction, which does not increment the loop counter independently.

 

Setup:

The argument must be an integer register. IntSource0.x holds the number of times the loop is to execute. The loop counter register gets incremented at the endloop. The counter can be used to index into the constant register array. IntSource0.y is the initial value for the loop counter register. IntSource0.z specifies the step for the loop counter register. Execution will go to the statement following the matching endloop instruction when IntSource0.x <= 0. The loop counter register, aL, is available inside the loop.

You must not nest loops, nor jump out of or into a loop-endloop block. The endloop instruction must follow the loop instruction.

Results:

The instructions in the loop-endloop block are executed IntSource0.x times, with the loop counter getting incremented each time through.

 

loop  i0 // assume i0.x, i0.y, and i0.z are set up

 

// aL is available and incremented each time

// through the loop

 

endloop

 

// simulate the loop instruction

 

// -- loop statement begin

aL  = IntSource0.y; // loop counter

StartLoopOffset: // <- label

 

if ( IntSource0.x <= 0 )

     goto EndLoopOffset;

// -- loop statement end

 

//  section of code between the loop/endloop

 

// -- endloop statement begin

aL += IntSource0.z; // increment loop counter

IntSource0.x--; // decrement

goto StartLoopOffset;

EndLoopOffset: // <- label

// -- endloop statement end

 

 

 

 

lrp (macro)

 vs 2.0

 

Linear interpolation between two registers (a.k.a. “lerp”) using a per-element fraction held in a separate register. This is done on an element-by-element basis.


Two slots.


lrp  Dest0, Source0, Source1, Source2

 

This macro instruction interpolates between two floating point values, Source1 and Source2, based upon a fraction in Source0. When Source0 is zero, Source2 is placed in the destination. When Source0 is one, Source1 is placed in the destination. Fractions in the [0,1] range interpolate between Source2 and Source1. If the fraction is outside the range [0,1] the result is indeterminate. Dest0 must be a temporary register, and cannot be the same as Source0 or Source2.

 

The macro expands to the following code:

 

add  Dest0, Source1, -Source2

mad  Dest0, Dest0, Source0, Source2

 

Setup:

Dest0 must be a temporary register, and Source0 should be in the range [0,1].

Results:

The value between Source2 and Source1 is interpolated using the fraction in Source0. The result is written in Dest0.

 

lrp  r0, r1, r2, c14

lrp  r0, r1, r2, c14.x // lrp using a single value

 

SetSourceRegisters();

 

// simulate lrp instruction

TempReg.x =
     Source0.x * (Source1.x - Source2.x) + Source2.x;

TempReg.y =
     Source0.y * (Source1.y - Source2.y) + Source2.y;

TempReg.z =
     Source0.z * (Source1.z - Source2.z) + Source2.z;

TempReg.w =
     Source0.w * (Source1.w - Source2.w) + Source2.w;

 

WriteDestinationRegisters();

 

m3x2 (macro)

 vs 1.0, 1.1, 2.0

 

Matrix 3 by 2. Performs a matrix multiply on the input vector and input matrix and stores the result. This macro is typically used for 2D transformation calculations.


Takes 2 instruction slots


m3x2  Dest0, Source0, Source1

 

Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1. Only those elements actually used in the calculations are read, only those that are calculated are written to the destination registers. The w elements in the source matrix and vector are unused and only the x and y elements of the destination are written.

 

Warning!: Make sure that your Dest0 and Source0 registers aren’t the same registers. It will compile and run but your results will be incorrect.

 

Note: You are not allowed to use the swizzle or negate modifiers on Source1.

m3x2   r0,  v0, c6      // will use c7 as well

m3x2   r0,  v0, c6.yzxw // Error! Can’t use swizzle

 

This macro:

m3x2   Dest0, Source0, Source1

expands to the following:

 

dp3       Dest0.x, Source0, Source1

dp3       Dest0.y, Source0, Source2

 

Note: You can use the swizzle or negate modifiers if you expand this macro yourself.

 

m3x3 (macro)

 vs 1.0, 1.1, 2.0

 

Matrix 3 by 3. Performs a matrix multiply on the input vector and input matrix and stores the result. This macro is typically used for normal transformations during lighting calculations.


Takes 3 instruction slots


m3x3  Dest0, Source0, Source1

 

Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1. Only those elements actually used in the calculations are read, only those that are calculated are written to the destination registers. The w elements in the source matrix and vector are unused and only the x, y, and z elements of the destination register are written.

 

Warning!: Make sure that your Dest0 and Source0 registers are different. It will compile but your results will be incorrect.

 

Note: You are not allowed to use the swizzle or negate modifiers on Source1.

m3x3   r0,  v0, c6      // will use c7 & c8 as well

m3x3   r0,  v0, c6.yzxw // Error! Can’t use swizzle

 

This macro:

m3x3   Dest0, Source0, Source1

expands to the following:

dp3       Dest0.x, Source0, Source1

dp3       Dest0.y, Source0, Source2

dp3       Dest0.z, Source0, Source3

 

Note: You can use the swizzle or negate modifiers if you expand this macro yourself.

 

m3x4 (macro)

 vs 1.0, 1.1, 2.0

 

Matrix 3 by 4. Performs a matrix multiply on the input vector and input matrix and stores the result.


Takes 4 instruction slots


m3x4  Dest0, Source0, Source1

 

Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1. Only those elements actually used in the calculations are read, only those that are calculated are written to the destination registers. The w elements in the source matrix and vector are unused.

 

Warning!: Make sure that your Dest0 and Source0 registers are different. It will compile but your results will be incorrect.

 

Note: You are not allowed to use the swizzle or negate modifiers on Source1.

 

m3x4   r0,  v0, c6 ; // will use c7, c8, & c9 as well

m3x4   r0,  v0, c6.yzxw // Error! Can’t use swizzle

 

This macro:

 

m3x4   Dest0, Source0, Source1

 

expands to the following:

 

dp3       Dest0.x, Source0, Source1

dp3       Dest0.y, Source0, Source2

dp3       Dest0.z, Source0, Source3

dp3       Dest0.w, Source0, Source4

 

Note: You can use the swizzle or negate modifiers if you expand this macro yourself.

 

m4x3 (macro)

 vs 1.0, 1.1, 2.0

 

Matrix 4 by 3. Performs a matrix multiply on the input vector and input matrix and stores the result.


Takes 3 instruction slots


m4x3  Dest0, Source0, Source1

 

Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1.  All source register elements are used, but Dest0.w will be unmodified.

 

Warning!: Make sure that your Dest0 and Source0 registers are different. It will compile but your results will be incorrect.

 

Note: You are not allowed to use the swizzle or negate modifiers on Source1.

 

m4x3   r0,  v0, c6 ; // will use c7 & c8 as well

m4x3   r0,  v0, c6.yzxw // Error! Can’t use swizzle

 

This macro:

 

m4x3   Dest0, Source0, Source1

 

expands to the following:

 

dp4       Dest0.x, Source0, Source1

dp4       Dest0.y, Source0, Source2

dp4       Dest0.z, Source0, Source3

 

Note: You can use the swizzle or negate modifiers if you expand this macro yourself.

 

m4x4 (macro)

 vs 1.0, 1.1, 2.0

 

Matrix 4 by 4. Performs a matrix multiply on the input vector and input matrix and stores the result.


Takes 4 instruction slots


m4x4  Dest0, Source0, Source1

 

Does a matrix multiply assuming that Source0 is the input vector and the matrix starts at element Source1 and that there are the correct number of registers available after Source1. All source register elements are used, and all elements of the destination register are written.

 

Warning!: Make sure that your Dest0 and Source0 registers are different. It will compile but your results will be incorrect.

 

Note: You are not allowed to use the swizzle or negate modifiers on Source1.

 

m4x4   r0,  v0, c6 ; // will use c7, c8, & c9 as well

m4x4   r0,  v0, c6.yzxw // Error! Can’t use swizzle

 

This macro:

 

m4x4   Dest0, Source0, Source1

 

expands to the following:

 

dp4       Dest0.x, Source0, Source1

dp4       Dest0.y, Source0, Source2

dp4       Dest0.z, Source0, Source3

dp4       Dest0.w, Source0, Source4

 

Note: You can use the swizzle or negate modifiers if you expand this macro yourself.
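
The matrix macros above all follow one pattern: each destination element is the dot product of the input vector with one of the consecutive matrix registers. The following is a minimal C++ sketch of m4x4 and m3x3 under that reading. The names Vec4, Dot3, Dot4, M3x3, and M4x4 are hypothetical and exist only for this illustration; they are not part of any shader API.

```cpp
#include <array>

// Hypothetical four-element register type used only for this sketch.
struct Vec4 { float x, y, z, w; };

// dp4: four-element dot product.
inline float Dot4(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// dp3: three-element dot product; the w elements are ignored.
inline float Dot3(const Vec4& a, const Vec4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// m4x4 written out as its four dp4 instructions. `rows` stands in for
// Source1 and the three registers that follow it.
// dest is written one element at a time, so if dest and src are the same
// object, the later dot products read elements that were already overwritten.
inline void M4x4(Vec4& dest, const Vec4& src, const std::array<Vec4, 4>& rows) {
    dest.x = Dot4(src, rows[0]);
    dest.y = Dot4(src, rows[1]);
    dest.z = Dot4(src, rows[2]);
    dest.w = Dot4(src, rows[3]);
}

// m3x3 written out as its three dp3 instructions; dest.w is left untouched.
inline void M3x3(Vec4& dest, const Vec4& src, const std::array<Vec4, 3>& rows) {
    dest.x = Dot3(src, rows[0]);
    dest.y = Dot3(src, rows[1]);
    dest.z = Dot3(src, rows[2]);
}
```

Calling M4x4( v, v, rows ) with the same object as destination and source reproduces the hazard described in the warnings above: the first dot product overwrites v.x before the remaining dot products have read it.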

 

 

mad

 vs 1.0, 1.1, 2.0

 

Multiply and add. Multiplies two registers and then adds a third to the result then stores the result.


One slot


mad  Dest0, Source0, Source1, Source2

 

Multiplies Source0 by Source1, then adds Source2 to the result. The result is stored in Dest0.

 

Setup:

Source0 and Source1 are the registers to be multiplied. Source2 is the register to be added to the result of the multiplication.

 

Results:

Dest0 contains (Source0 * Source1) + Source2.

 

mad      r0,  r0,  r1, r2

 

SetSourceRegisters();

 

// Simulate the mad instruction

TempReg.x = Source0.x * Source1.x + Source2.x;

TempReg.y = Source0.y * Source1.y + Source2.y;

TempReg.z = Source0.z * Source1.z + Source2.z;

TempReg.w = Source0.w * Source1.w + Source2.w;

 

WriteDestinationRegisters();

 

max

 vs 1.0, 1.1, 2.0

 

Stores the maximum value from comparing two source registers into the destination register.


One slot


max  Dest0, Source0, Source1

 

Compares Source0 and Source1 element by element and stores the larger value of each pair in the corresponding element of Dest0. Because each element is chosen independently, the resulting register may not be equal to either input register.

 

Setup:

Source0 and Source1 are the registers to be compared.

 

Results:

Dest0 contains the maximum of the two input registers done on an element-by-element basis.

 

max      r0,  r1, r2

 

SetSourceRegisters();

 

// Simulate the max instruction

TempReg.x = Source0.x > Source1.x ? Source0.x : Source1.x;

TempReg.y = Source0.y > Source1.y ? Source0.y : Source1.y;

TempReg.z = Source0.z > Source1.z ? Source0.z : Source1.z;

TempReg.w = Source0.w > Source1.w ? Source0.w : Source1.w;

 

WriteDestinationRegisters();

 

min

 vs 1.0, 1.1, 2.0

 

Stores the minimum value from comparing two source registers into the destination register.


One slot


min  Dest0, Source0, Source1

 

Compares Source0 and Source1 element by element and stores the smaller value of each pair in the corresponding element of Dest0. Because each element is chosen independently, the resulting register may not be equal to either input register.

 

Setup:

Source0 and Source1 are the registers to be compared.

 

Results:

Dest0 contains the minimum of the two input registers done on an element-by-element basis.

 

min      r0,  r1, r2

 

SetSourceRegisters();

 

// Simulate the min instruction

TempReg.x = Source0.x < Source1.x ? Source0.x : Source1.x;

TempReg.y = Source0.y < Source1.y ? Source0.y : Source1.y;

TempReg.z = Source0.z < Source1.z ? Source0.z : Source1.z;

TempReg.w = Source0.w < Source1.w ? Source0.w : Source1.w;

 

WriteDestinationRegisters();

 

mov

 vs 1.0 - 2.0

 

Stores the source registers into the destination register. Useful for moving from a temporary register into an output register or for swizzling. The source and destination registers can be the same.

 

In vertex shader version 1.1, mov is the only instruction that can use the address register as a destination. When the address register is the destination, the value is rounded to the integer that is less than or equal to the source value. In vs 2.0 you must use the mova instruction to set the address register instead.


One slot


mov  Dest0, Source0

 

Moves Source0 into Dest0. A special case is when Dest0 is the address register. In this case the value stored is the largest integer that is less than or equal to the source value; that is, the number is rounded toward negative infinity. Thus 1.5 is stored as 1, while -1.5 is stored as -2.

 

Setup:

Source0 is the register to be copied.

 

Results:

Dest0 contains a copy of Source0, unless Dest0 is the address register, in which case it contains the nearest integer that is less than or equal to the source value. If the destination is the address register, then, unless otherwise specified, only the Source0.x element is used.

 

mov  r0   ,  r1

mov  a0.x ,  c1.w // initializing address register

 

SetSourceRegisters();

 

// Simulate the mov instruction

if ( Dest0 == a0 ) // the destination is the address register

{

// round toward negative infinity (floor), not toward zero

TempReg.x = (int)::floor( Source0.x );

}

else

{

TempReg.x = Source0.x;

TempReg.y = Source0.y;

TempReg.z = Source0.z;

TempReg.w = Source0.w;

}

 

WriteDestinationRegisters();

 

mova

 vs 2.0

 

Move data from a floating point register into the address register.


One slot


mova  Dest0, Source0

 

This instruction rounds Source0 to the nearest integer and places the result in Dest0. Dest0 must be the address register. Rounding is nominally to nearest even, but this is not strictly specified and applications should not rely on it: for values equidistant between two integers, an implementation may round up, round down, or pick a direction arbitrarily. The _sat modifier is not supported.

 

Setup:

Source0 is the floating point register to be rounded then placed in the address register.

 

Results:

The rounded value from Source0 is placed in the address register.

 

mova a0.x,  c1.w // move one element

mova a0,    c1   // move all

 

SetSourceRegisters();

 

// round each element to the nearest integer

// Note: RoundToNearestInteger() is

// implementation dependent

a0.x = RoundToNearestInteger( Source0.x );

a0.y = RoundToNearestInteger( Source0.y );

a0.z = RoundToNearestInteger( Source0.z );

a0.w = RoundToNearestInteger( Source0.w );

 

WriteDestinationRegisters();
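
The difference between the two ways of loading the address register (mov rounds toward negative infinity, mova rounds to the nearest integer) can be sketched in C++ as follows. The function names MovToAddress and MovaToAddress are hypothetical. The sketch uses std::nearbyint, which under the default floating-point rounding mode resolves ties to even; that is only one of the tie behaviors the text allows, so treat it as an assumption rather than the hardware rule.

```cpp
#include <cmath>

// mov into a0 (vs 1.1): round toward negative infinity.
inline int MovToAddress(float value) {
    return static_cast<int>(std::floor(value));
}

// mova into a0 (vs 2.0): round to the nearest integer.
// Tie handling is implementation dependent on real hardware; this sketch
// simply uses whatever std::nearbyint does in the current rounding mode.
inline int MovaToAddress(float value) {
    return static_cast<int>(std::nearbyint(value));
}
```

For example, a source value of -1.5 becomes -2 under mov, while 1.6 becomes 1 under mov but 2 under mova, so indices computed for one shader version may need adjusting when ported to the other.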

 

mul

 vs 1.0, 1.1, 2.0

 

Multiplies the two source registers element by element and stores them in the destination register.


One slot


mul  Dest0, Source0, Source1

 

Multiplies Source0 by Source1 and stores the result in Dest0.

 

Setup:

Source0 and Source1 are the two registers to be multiplied.

 

Results:

Dest0 contains the result of the multiplication of Source0 and Source1.

 

mul      r0,  r1,  r2

 

SetSourceRegisters();

 

// Simulate the mul instruction

TempReg.x = Source0.x * Source1.x;

TempReg.y = Source0.y * Source1.y;

TempReg.z = Source0.z * Source1.z;

TempReg.w = Source0.w * Source1.w;

 

WriteDestinationRegisters();

nop

 vs 1.0 - 2.0

 

Defines the null instruction (No-Operation).


One slot


nop

 

You might use this instruction for timing. You can use it to create a shader that does nothing but take up time as it executes to see how a shader of that length would affect your rendering. It’s possible that a driver might optimize away this instruction.

 

Setup:

None.

 

Results:

Takes up one slot/clock cycle.

 

nop 

 

 

 

nrm (macro)

 vs 2.0

 

This macro will normalize all elements of a register.


Takes three slots


nrm  Dest0, Source0

 

This macro will take all elements of Source0 and normalize them so that the square root of the sum of squares of all elements in Dest0 is one. Dest0 cannot be the same register as Source0.

 

 

nrm   r0,  v0

 

This macro:

 

nrm   Dest0, Source0

 

is equivalent to the following:

 

dp4  Dest0.x, Source0, Source0

rsq  Dest0.x, Dest0.x

mul  Dest0, Source0, Dest0.x
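
The three steps of the expansion above (sum of squares, reciprocal square root, scale) can be sketched in C++ as follows. This follows the four-element, dp4-based expansion as given in this section; Vec4 and Nrm are hypothetical names used only for illustration.

```cpp
#include <cmath>

// Hypothetical four-element register type used only for this sketch.
struct Vec4 { float x, y, z, w; };

// Normalizes all four elements, mirroring the dp4 / rsq / mul sequence.
inline Vec4 Nrm(const Vec4& src) {
    // dp4: sum of the squares of all four elements
    const float sumOfSquares =
        src.x * src.x + src.y * src.y + src.z * src.z + src.w * src.w;
    // rsq: reciprocal square root (infinite for an all-zero input)
    const float invLength = 1.0f / std::sqrt(sumOfSquares);
    // mul: scale every element by the reciprocal length
    return { src.x * invLength, src.y * invLength,
             src.z * invLength, src.w * invLength };
}
```

Note that an all-zero input is not handled specially in this sketch: the reciprocal square root is infinite and the products are not meaningful numbers.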

 

 

 

pow  (macro)

 vs 2.0

 

Computes the power function for a scalar value.


Takes 3 slots


pow  Dest0, Source0, Source1

 

Only the .w elements of the source registers are used, and only the absolute value of Source0 is used. Dest0 is filled with abs(Source0.w) raised to the Source1.w power. The result is replicated in all elements of the destination.

 

pow  r0, r3, c6 // assume r3.w and c6.w are set

 

This macro:

 

pow   Dest0, Source0, Source1

 

is equivalent to the following:

 

log  Dest0.w, Source0 // takes absolute value

mul  Dest0.w, Dest0.w, Source1.w

exp  Dest0, Dest0.w
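
The log / mul / exp sequence above relies on the identity x^y = 2^(y * log2(x)). A minimal C++ sketch of the same three steps follows, assuming that log and exp work in base 2 (any base works as long as both use the same one). PowMacro is a hypothetical name used only for illustration.

```cpp
#include <cmath>

// Scalar sketch of the pow macro: abs(base) raised to the exponent.
inline float PowMacro(float base, float exponent) {
    // log: logarithm of the absolute value of the base
    const float logOfBase = std::log2(std::fabs(base));
    // mul: scale the logarithm by the exponent
    const float scaled = logOfBase * exponent;
    // exp: raise 2 to the scaled logarithm
    return std::exp2(scaled);
}
```

Because the absolute value is taken first, a negative base produces the same result as the matching positive base, which matches the behavior described for the macro.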

 

 

 

 

rcp

 vs 1.0, 1.1, 2.0

 

Computes the reciprocal of an element of the source register and stores it in the destination register.


One slot


rcp  Dest0, Source0

 

Computes the reciprocal of a single element of the source register and stores it in all elements of the destination register. Only one element of the source is used. If no element is specified then Source0.w is used. A value of exactly 1 on input returns 1 on output (no round-off error) while a value of 0 on input returns positive infinity.

 

Setup:

Source0 contains the element to take the reciprocal of. If unspecified, Source0.w is used.

 

Results:

Dest0 contains the reciprocal of the specified element copied in all elements.


Performance Note: This is one of the few instructions that will take more than one clock to execute. Use it sparingly, and when you use it try to arrange your code so that you don’t need the results immediately.

 

rcp      r0,  r1

 

SetSourceRegisters();

 

// Simulate the rcp instruction

if ( 0.0f == Source0.w ) // if 0

{

TempReg.w = PLUS_INFINITY;

}

else if ( 1.0f == Source0.w ) // if 1

{

TempReg.w = 1.0f;

}

else

{

TempReg.w = 1.0f/Source0.w;

}

 

TempReg.x = TempReg.y = TempReg.z = TempReg.w;

 

WriteDestinationRegisters();

rep

 vs 2.0

 

Indicates the start of a rep-endrep block.


One slot


rep IntSource0

 

IntSource0 must be an integer register. Only the .x element is used, and its initial value can be at most 255. The block is executed IntSource0.x times, provided the number is positive. Compare this to the loop instruction, which additionally steps its own loop counter on each pass.

 

Setup:

IntSource0 must be an integer register with the .x element initialized to the number of times to iterate through the block.

 

Results:

The instructions in the rep - endrep block are executed IntSource0.x times.

 

defi i0, 10, 0, 0, 0  // i0.x is set to the count

 

rep   i0

 

  // the instructions here will get executed i0.x times

 

endrep

 

// Simulate the rep instruction

int LoopCounter = IntReg0.x;

TopLoop:

if ( LoopCounter <= 0 ) goto EndLoop;

 

// the instructions following the rep

// instruction would go here

 

// Simulate the endrep instruction

LoopCounter--;

goto TopLoop;

EndLoop:
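
The same control flow can be written as a structured loop. The following C++ sketch runs a block a given number of times and skips it entirely for a count that is zero or negative, as described for rep. RunRep is a hypothetical helper used only for illustration; the upper limit of 255 on the count is not modeled here.

```cpp
#include <functional>

// Runs `body` once per repetition, mirroring a rep - endrep block.
// `count` plays the role of IntSource0.x.
// Returns the number of times the body was actually executed.
inline int RunRep(int count, const std::function<void()>& body) {
    int executed = 0;
    for (int remaining = count; remaining > 0; --remaining) {
        body();      // the instructions between rep and endrep
        ++executed;
    }
    return executed;
}
```

Unlike a loop block, nothing besides the repetition count changes between passes, so the body cannot tell which pass it is on unless it keeps its own counter.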

 

 

 

ret

 vs 2.0

 

Indicates the end of a subroutine.


One slot


ret

 

This instruction will return to the calling instruction (a call or callnz instruction) or return from the main function.

 

Setup:

Returns to the address following the most recent call or callnz instruction, or returns from the main function.

 

Results:

The path of execution is changed to the next instruction on the instruction stack.

 

ret  

 

 

 

 

 

rsq

 vs 1.0, 1.1, 2.0

 

Computes the reciprocal square root of one element of the source register and stores it in all elements of the destination register.


One slot


rsq  Dest0, Source0

 

Computes the reciprocal square root of the specified element of the source register and stores it in all elements of the destination register. If no element is specified then Source0.w is used. The absolute value of the input is used. A value of exactly 1 on input returns 1 on output (no round-off) while a value of 0 on input returns positive infinity.

 

Setup:

Source0 contains the element to take the reciprocal square root of. If unspecified, Source0.w is used.

 

Results:

Dest0 contains the reciprocal square root of the absolute value of the specified element copied in all elements.


Performance Note: This is one of the few instructions that will take more than one clock to execute. Use it sparingly, and when you use it try to arrange your code so that you don’t need the results immediately.

 

rsq      r0,  r1

 

SetSourceRegisters();

 

// Simulate the rsq instruction

float v = abs(Source0.w);

if ( 0.0f == v ) // if 0

{

TempReg.w = PLUS_INFINITY;

}

else if ( 1.0f == v ) // if 1

{

TempReg.w = 1.0f;

}

else

{

TempReg.w = 1.0f/sqrt(v);

}

 

 

TempReg.x = TempReg.y = TempReg.z = TempReg.w;

 

WriteDestinationRegisters();

 

 

sge

 vs 1.0, 1.1, 2.0

 

Set Greater-than or Equal-to. Stores 1 in the destination register if the first source register is greater or equal to the second source register. If not it stores 0 in the destination register. Does an element-by-element comparison.


One slot


sge  Dest0, Source0, Source1

 

Compares the two source registers element by element. If the first source register’s element is greater than or equal to the second source register’s element, the value 1 is placed in the destination register’s element. If not, 0 is stored in the destination register’s element. The resulting register may not be equal to either input register.

 

Setup:

Source0 and Source1 are the registers to be compared.

 

Results:

The element Dest0.n contains 1.0 if the Source0.n is greater than or equal to Source1.n, otherwise it contains 0.0. This is done for all elements of Dest0.

 

sge      r0,  r1,  r2

 

SetSourceRegisters();

 

// Simulate the sge instruction

TempReg.x = Source0.x >= Source1.x ? 1.0f : 0.0f;

TempReg.y = Source0.y >= Source1.y ? 1.0f : 0.0f;

TempReg.z = Source0.z >= Source1.z ? 1.0f : 0.0f;

TempReg.w = Source0.w >= Source1.w ? 1.0f : 0.0f;

 

WriteDestinationRegisters();

 

sgn (macro)

 vs 2.0

 

Computes the sign of each element in a register.


Takes 3 slots


sgn  Dest0, Source0, Source1, Source2

 

Computes the sign of the elements of Source0, using two temporary scratch registers. All elements of the source registers are compared. The comparison is done element-by-element. Source1 and Source2 should be temporary registers and should not be the same.   If an element in Source0 was > 0 then the corresponding element in Dest0 will be 1. If it was <0 then the result will be –1. If it was 0 the result will be 0.

 

Note: Source1 and Source2 will be modified by this macro!

 

sgn  r0,  r1, r2, r3 // r2 and r3 are scratch registers and will be overwritten

 

This macro:

 

sgn  Dest0, Source0, Source1, Source2

 

is equivalent to the following:

 

slt Source1,  Source0, -Source0

slt Source2, -Source0,  Source0

add Dest0, Source2, -Source1
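
To see why two slt instructions and an add yield the sign, the expansion above can be traced for a single element in C++. Slt and SgnElement are hypothetical names used only for this sketch.

```cpp
// slt for one element: 1.0 if a < b, otherwise 0.0.
inline float Slt(float a, float b) {
    return a < b ? 1.0f : 0.0f;
}

// One element of the sgn macro, following the three-instruction expansion.
inline float SgnElement(float source) {
    const float isNegative = Slt(source, -source);  // slt Source1,  Source0, -Source0
    const float isPositive = Slt(-source, source);  // slt Source2, -Source0,  Source0
    return isPositive - isNegative;                 // add Dest0, Source2, -Source1
}
```

A positive input sets only the second flag (result 1), a negative input sets only the first (result -1), and zero sets neither (result 0).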

 

 

 

sincos (macro)

 vs 2.0

 

Computes the sine and cosine values for a scalar argument.


Takes 8 slots


sincos  Dest0, Source0, Source1, Source2

 

Estimates the sine and cosine value inside a shader with a maximum error of 0.002 through the use of a Taylor series expansion. Source0 must have a replicate swizzle to indicate which element to use. This should be a value in radians between ±π. Dest0 should be a temporary register. The destination must have .x, .y or .xy as a write mask.

 

Setup:

One element of Source0 has to have the value in radians. Source1 and Source2 have to be set up with the following values to perform the expansion.

 

Source1 = [ 1/(7!*128), 1/(6!*64), 1/(4!*16), 1/(5!*16)  ]

Source2 = [ 1/(3!*8), 1/(2!*8), 1, 0.5 ]

 

Results:

The resulting sine and cosine values are written in Dest0.x and Dest0.y respectively.

 

// setup values

def c1, 1.0f/(7!*128), 1.0f/(6!*64),

            1.0f/(4!*16), 1.0f/(5!*16)

def c2, 1.0f/(3!*8), 1.0f/(2!*8), 1.0f, 0.5f

 

// assume value to take sin/cos of is in r0.x

sincos  r0.xy,  r0.x, c1, c2
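
As a rough illustration of how a short Taylor series can stay within the stated 0.002 error over the range ±π, the C++ sketch below evaluates the series at the half angle and then applies the double-angle identities. This is an assumption suggested by the power-of-two factors in the setup constants, not a description of the macro's actual instruction sequence, and the sketch computes its own series coefficients rather than reading them from Source1 and Source2. SinCos and SinCosSketch are hypothetical names.

```cpp
#include <cmath>

// Result pair: sine goes to Dest0.x and cosine to Dest0.y in the macro.
struct SinCos { float sine; float cosine; };

// Approximates sin(x) and cos(x) for x in [-pi, pi].
inline SinCos SinCosSketch(float x) {
    const float h  = 0.5f * x;   // half angle, in [-pi/2, pi/2]
    const float h2 = h * h;
    // sin(h) = h - h^3/3! + h^5/5! - h^7/7!, in nested (Horner) form
    const float s = h * (1.0f - h2 / 6.0f * (1.0f - h2 / 20.0f * (1.0f - h2 / 42.0f)));
    // cos(h) = 1 - h^2/2! + h^4/4! - h^6/6!, in nested (Horner) form
    const float c = 1.0f - h2 / 2.0f * (1.0f - h2 / 12.0f * (1.0f - h2 / 30.0f));
    // double-angle identities recover the values for the full angle
    return { 2.0f * s * c, 2.0f * c * c - 1.0f };
}
```

Evaluating at the half angle keeps the series argument within ±π/2, where the truncated terms are small; expanding the same number of terms directly at ±π would be far less accurate.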

 

 

slt

 vs 1.0, 1.1, 2.0

 

Set Less-Than. Stores 1 in the destination register if the first source register is less than the second source register. If not it stores 0 in the destination register. Does an element-by-element comparison.


One slot


slt  Dest0, Source0, Source1

 

Compares the two source registers element by element. If the first source register’s element is less than the second source register’s element, the value 1 is placed in the destination register’s element. If not, 0 is stored in the destination register’s element. The resulting register may not be equal to either input register.

 

Setup:

Source0 and Source1 are the registers to be compared.

 

Results:

The element Dest0.n contains 1.0 if the Source0.n is less than Source1.n, otherwise it contains 0.0. This is done for all elements of Dest0.

 

slt      r0,  r1,  r2

 

SetSourceRegisters();

 

// Simulate the slt instruction

TempReg.x = Source0.x < Source1.x ? 1.0f : 0.0f;

TempReg.y = Source0.y < Source1.y ? 1.0f : 0.0f;

TempReg.z = Source0.z < Source1.z ? 1.0f : 0.0f;

TempReg.w = Source0.w < Source1.w ? 1.0f : 0.0f;

 

WriteDestinationRegisters();

 

sub

 vs 1.0, 1.1, 2.0

 

Subtracts two sources into the destination register.


One slot


sub  Dest0, Source0, Source1

 

Subtracts the Source1 register from the Source0 register and places the result in the Dest0 register.

 

Setup:

Two source registers, Source0 and Source1.

 

Results:

Each element of Dest0 is filled with the element-by-element subtraction of the elements of Source1 from Source0.

 

sub  r0, r0, c2

 

SetSourceRegisters();

 

// Simulate the sub instruction

TempReg.x = Source0.x - Source1.x;

TempReg.y = Source0.y - Source1.y;

TempReg.z = Source0.z - Source1.z;

TempReg.w = Source0.w - Source1.w;

 

WriteDestinationRegisters();

vs

 vs 1.0, 1.1, 2.0

 

Defines the version of the vertex shader code you are using.


No Slots


vs.integer1.integer2

 

The argument is of the form vs.x.y, where x is the main version number and y is the minor version number. Both values are integers.

 

Setup:

Two integers that form the major.minor version of the shader version you want to use. This must be the first instruction in your shader.

 

Results:

Tells the assembler which features to allow in the shader instructions that follow.

 

vs.1.0    //not using the address register in this one

vs.1.1    //uses address register