Quote: "Odd it runs at 250fps here with pixel shader 2.0 and 350 with 1.4 0_o DBP's blendmapping runs at 350 so I'll be using 1.4 as my card seems to like the older shaders. Is there any way to allow it to be set by an option?"
Odd, I'll see what happens on my machine (my GFX card is a GeForce FX 5200 128MB) - and look at the two versions in FX Composer (it tells you the pixel processing rate for different cards which is useful). I wouldn't have expected such a large difference between 2.0 and 1.4. Is your GFX card info in your signature up-to-date?
The usual way of allowing the option to use 2.0 or 1.4, etc, is to have two techniques which have identical code except in the compile instructions, e.g.
technique t0
{ pass p0
{ VertexShader = compile vs_1_1 VShader();
PixelShader = compile ps_1_4 PShader();
alphablendenable = false;
blendop = add;
srcBlend = srcAlpha;
destBlend = invSrcAlpha;
cullmode = none;
}
}
technique t1
{ pass p0
{ VertexShader = compile vs_1_1 VShader();
PixelShader = compile ps_2_0 PShader();
alphablendenable = false;
blendop = add;
srcBlend = srcAlpha;
destBlend = invSrcAlpha;
cullmode = none;
}
}
You can choose which to use with, for example:
set effect technique effectNum, "t0"
I believe DarkBASIC will use the first technique that is supported by your graphics card - but that doesn't guarantee that it will use the best when there is a choice.
Sometimes the extra techniques for different "compile targets" need different pixel or vertex shader code - or extra passes, etc. This can happen with complicated shaders which exceed the instruction count for lower VS or PS versions.
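As a rough sketch of that situation, here is a cut-down effect where the two techniques call different pixel shader functions. The names PShaderSimple, PShaderFull, lightMapTexture and baseTexture are made up for the illustration, and the extra greyscale step is only a stand-in for work you might want in the PS2.0 version but not the PS1.4 one - it isn't from the original shader.

```hlsl
// Sketch only - function and texture names are hypothetical.
float4x4 wvp : WorldViewProjection;

texture lightMapTexture;   // assumed to be the stage 0 texture
texture baseTexture;       // assumed to be the stage 1 texture

sampler2D lightMapSample = sampler_state { Texture = <lightMapTexture>; };
sampler2D baseSample     = sampler_state { Texture = <baseTexture>; };

struct VS_INPUT  { float4 Pos : POSITION; float2 UV0 : TEXCOORD0; float2 UV1 : TEXCOORD1; };
struct VS_OUTPUT { float4 Pos : POSITION; float2 UV0 : TEXCOORD0; float2 UV1 : TEXCOORD1; };

VS_OUTPUT VShader(VS_INPUT In)
{ VS_OUTPUT Out;
  Out.Pos = mul(In.Pos, wvp);
  Out.UV0 = In.UV0;
  Out.UV1 = In.UV1;
  return Out;
}

// Short version for the lower compile target: lightmap * base * 2
float4 PShaderSimple(float2 UV0 : TEXCOORD0, float2 UV1 : TEXCOORD1) : COLOR
{ return tex2D(lightMapSample, UV0) * tex2D(baseSample, UV1) * 2.0;
}

// Longer version: same blend, then a slight desaturation (illustrative extra work)
float4 PShaderFull(float2 UV0 : TEXCOORD0, float2 UV1 : TEXCOORD1) : COLOR
{ float4 col = tex2D(lightMapSample, UV0) * tex2D(baseSample, UV1) * 2.0;
  float grey = dot(col.rgb, float3(0.30, 0.59, 0.11));
  col.rgb = lerp(col.rgb, float3(grey, grey, grey), 0.25);
  return col;
}

technique t0
{ pass p0
  { VertexShader = compile vs_1_1 VShader();
    PixelShader  = compile ps_1_4 PShaderSimple();
  }
}
technique t1
{ pass p0
  { VertexShader = compile vs_1_1 VShader();
    PixelShader  = compile ps_2_0 PShaderFull();
  }
}
```

The techniques are still selected by name in the same way; only the function named in each compile line changes.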
Quote: "The shader works great now if the stage 0 and stage 1 UV's are matched up. Would it be possible to modify it to work when they don't"
Yes - and you told me how to set up the DBP side of things at the beginning of this thread!
What you need to do in the shader is something like:
struct VS_INPUT
{ float4 Pos : POSITION;
float2 UV0 : TEXCOORD0; // stage 0 coords
float2 UV1 : TEXCOORD1; // stage 1 coords
};
struct VS_OUTPUT
{ float4 Pos : POSITION;
float2 UV0 : TEXCOORD0; // stage 0 coords
float2 UV1 : TEXCOORD1; // stage 1 coords
};
VS_OUTPUT VShader(VS_INPUT In, VS_OUTPUT Out)
{ Out.Pos = mul(In.Pos, wvp);
Out.UV0 = In.UV0;
Out.UV1 = In.UV1;
return Out;
}
struct PS_INPUT
{ float2 UV0 : TEXCOORD0; // stage 0 coords
float2 UV1 : TEXCOORD1; // stage 1 coords
};
struct PS_OUTPUT { float4 col : COLOR; };
PS_OUTPUT PShader(PS_INPUT In, PS_OUTPUT Out)
{ float4 baseColour = tex2D(lightMapSample, In.UV0);
baseColour *= tex2D(baseSample, In.UV1) * 2.0; // lightmap * base texture, doubled
Out.col = baseColour;
return Out;
}
Alternatively, if you are just using tiling, you can pass an extra float4 parameter to the shader, such as UVscale, and multiply the UV coords by it when you do the look-up (the final two entries of the float4 need to be set - but won't be used in the shader), e.g.
tex2D(baseSample, In.UV1 * UVscale)
You need an extra float2 declaration in the shader - but I guess you can sort that out.
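In case it helps, a minimal sketch of that tiling version, assuming the samplers and the PS_INPUT / PS_OUTPUT structs from the code above. The {1.0, 1.0} default (i.e. no extra tiling) is my own assumption for the sketch; the real value would be set from your program.

```hlsl
// Tiling factors for the base texture, set from the program.
// The default of {1.0, 1.0} is an assumption for this sketch.
float2 UVscale = {1.0, 1.0};

PS_OUTPUT PShader(PS_INPUT In, PS_OUTPUT Out)
{ // the lightmap keeps its own coords
  float4 baseColour = tex2D(lightMapSample, In.UV0);
  // the base texture coords are scaled, so the texture tiles
  baseColour *= tex2D(baseSample, In.UV1 * UVscale) * 2.0;
  Out.col = baseColour;
  return Out;
}
```

Only the first two entries of the float4 you pass in end up in UVscale, which is why the last two are ignored by the shader.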
I'll go and check the fps thing now.
Edit: Just done the tests. I get similar results to you (but slower for both): 190fps for PS1.4 and 168fps for PS2.0. FX Composer shows that, for this shader, the same pixel shader code compiles to fewer asm instructions using 1.4 compared to 2.0. FX Composer also gives the pixel throughput as 800MPS for 1.4 and 267MPS for 2.0, which is a big difference - we don't see that, of course, because of the other things going on in the system when we run the program.
Looking at the pixel shader asm code, the gain seems to be that the two multiplications are combined into a single instruction (i.e. multiply and shift) in the PS1.4 version. This suggests to me that you won't get that gain if you multiply by, say, 3 instead of by 2. Will experiment now. It could of course be a compiler inefficiency - something else to look into.
Isn't computing fun?
2nd Edit: I was wrong - after that change the two versions still give 800MPS and 267MPS. The PS1.4 asm code is different, and longer, but by some GFX card magic it still takes the same number of machine cycles (just one!). With a more complicated pixel shader you might need PS2.0 because of the instruction count limits. I'll have to watch this issue when I'm writing my shaders - I tend to routinely use PS2.0 because of its extra features, but this example shows it's worth stopping to think every now and then.
Incidentally, for simple blending a texture with a lightmap you don't need the alphablending stuff in the code - I merely included that to show you could get alpha transparency as well. You just need, for example:
technique t0
{ pass p0
{ VertexShader = compile vs_1_1 VShader();
PixelShader = compile ps_2_0 PShader();
}
}