Shawn Hargreaves: Possibly... but are you sure this app is GPU limited?
The Xbox has a very fast GPU, so most games tend to be CPU limited. If that is the case, it may not hurt you at all to have the GPU doing redundant calculations, and would actually slow things down if you manually moved that work over to the CPU.
(this is one of the reasons we didn't do the work to implement preshaders: moving computations from GPU to CPU tends not to be a performance win on Xbox)
While I agree with this to some extent, I still feel preshaders can be an overall benefit, especially for new developers who may not realise they have duplicated logic in their shaders. On the flip side, they can also reduce CPU usage, since there are fewer shader constants to copy (although I have no idea whether that would be a net win).
I have my own shader system, built from decompiled Effects, and in it I actually convert the preshaders into .NET code :-)
I thought it would be an interesting experiment to test how significant the difference could be. Of course, it depends on whether you are CPU or GPU limited, and because the Xbox 360's shader units are unified, it is tricky to measure without a real-world example.
The most common thing I see is people doing something like this in their shader:
float4x4 mvp = mul(mul(World,View),Projection);
The preshader will obviously pull that out, since it depends only on shader constants and is the same for every vertex.
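To make the hoisting concrete, here is a minimal sketch in plain Python. It is not shader code and not how XNA or Xen actually implement preshaders. The naive path rebuilds World × View × Projection for every vertex, as the shader source literally asks for. The hoisted path computes it once per draw and only does the vector-matrix multiply per vertex. All function names and the multiply counter are hypothetical, for illustration only. `vec_mat` mirrors HLSL's `mul(v, M)` with a row vector.

```python
# Hypothetical illustration of preshader-style hoisting (not XNA/Xen code).
# Matrices are 4x4 lists of rows; vectors are row vectors, matching HLSL mul(v, M).

def mat_mul(a, b, counter):
    """4x4 matrix product a*b; adds the scalar multiply count to counter[0]."""
    counter[0] += 64
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def vec_mat(v, m, counter):
    """Row vector times 4x4 matrix, like HLSL mul(v, M)."""
    counter[0] += 16
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]

def transform_naive(verts, world, view, proj, counter):
    # What the shader source asks for: mvp rebuilt for every single vertex.
    out = []
    for v in verts:
        mvp = mat_mul(mat_mul(world, view, counter), proj, counter)
        out.append(vec_mat(v, mvp, counter))
    return out

def transform_hoisted(verts, world, view, proj, counter):
    # What a preshader effectively does: mvp depends only on constants,
    # so evaluate it once per draw and reuse it for every vertex.
    mvp = mat_mul(mat_mul(world, view, counter), proj, counter)
    return [vec_mat(v, mvp, counter) for v in verts]

if __name__ == "__main__":
    def translate(x, y, z):
        # Row-vector convention: the translation lives in the last row.
        return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [x, y, z, 1]]

    def scale(s):
        return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

    world, view, proj = translate(1, 2, 3), scale(2), translate(0, 0, -5)
    verts = [[float(i), 0.0, 0.0, 1.0] for i in range(1000)]

    naive_count, hoisted_count = [0], [0]
    a = transform_naive(verts, world, view, proj, naive_count)
    b = transform_hoisted(verts, world, view, proj, hoisted_count)
    assert a == b  # same arithmetic in the same order, so identical results
    print(naive_count[0], hoisted_count[0])
```

The naive path costs 144 scalar multiplies per vertex, against 16 per vertex plus 128 once per draw for the hoisted path. On real hardware that work runs on shader ALUs rather than in a Python loop, so treat the counts as an illustration of the ratio, not a performance model.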
The test case I set up was very simple: 40 instances of a ~25k-triangle sphere drawn with a very simple shader (using the mvp matrix above).
In my opinion, even on the 360, 40-80 matrix multiplies per frame are hardly going to have a meaningful impact on performance. I tried my best to create a measurable impact, but I couldn't: pegging the CPU and then adding redundant multiplies didn't produce a measurable result until I got up to 400 extra multiplies.
So the question was: how much would the preshader help?
With this very, very simple setup I expected most of the impact to be masked by other bottlenecks, but there was still an improvement. In the test case I mentioned, the system ran at 401.3 fps with the preshader (or with a single precomputed matrix), and 396.4 fps without the preshader. :-)
I wasn't happy with this, so I tried something else: I pushed the pixel shader until it was the bottleneck by adding ~150 useless instructions (basically cos(cos(cos(cos(...)))) :-). That changed things a lot: 385 fps with the preshader versus 314 fps without it, nearly 20% slower. So at least in this case, the GPU cost of the redundant math far outweighs the CPU cost of removing it (yes, I know it doesn't mean a thing if you are CPU limited, but I still thought it was interesting).
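Frame rates are easier to compare as per-frame times, since fps is just the reciprocal of frame time. Here is a small sketch that converts the two measurements quoted above. The helper names `frame_ms` and `compare` are hypothetical. They are only arithmetic on the posted numbers, not part of any benchmark code.

```python
# Convert the posted fps figures into milliseconds per frame and compare them.

def frame_ms(fps):
    """Milliseconds per frame at a given frame rate."""
    return 1000.0 / fps

def compare(label, fps_with, fps_without):
    """Report the cost of losing the preshader as both fps drop and frame-time growth."""
    t_with, t_without = frame_ms(fps_with), frame_ms(fps_without)
    fps_drop = (fps_with - fps_without) / fps_with   # fraction of fps lost
    time_increase = t_without / t_with - 1.0         # fraction of extra time per frame
    print(f"{label}: {t_with:.3f} ms vs {t_without:.3f} ms "
          f"(+{t_without - t_with:.3f} ms/frame, fps -{fps_drop:.1%}, "
          f"frame time +{time_increase:.1%})")
    return fps_drop, time_increase

compare("simple shader", 401.3, 396.4)
compare("pixel-shader bound", 385.0, 314.0)
```

In the simple case the difference is only about 0.03 ms per frame. In the pixel-shader-bound case it is roughly 2.60 ms versus 3.18 ms per frame. That is about 18% fewer frames per second, or equivalently about 23% more time per frame, which is where "nearly 20% slower" comes from.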
:-)
Xen: Graphics API for XNA
www.codeplex.com/xen