-
|
|
Vector2 maths slow on 360
|
Hi,
I've been having speed issues running my physics code on the 360; on PC it works just fine. I ended up disabling everything and found that just adding a few Vector2s brings the console to a halt. For example:
Vector2 a = new Vector2(0, 0);
Vector2 b = new Vector2(1, 1);
for (int i = 0; i < 200; i++)
    a += b;
Any ideas on this?
|
|
-
|
|
Re: Vector2 maths slow on 360
|
There are two main reasons this code executes slowly:
- The Xbox CLR is downright horrible with floating-point code. Expect an order of magnitude difference between PC and Xbox when working with C#.
- Overloaded operators incur significant overhead on the Xbox CLR (and on the Desktop CLR to an extent). Where a good C++ compiler will have no trouble inlining that operator, the Xbox JIT'er turns it into a function call to the Vector2::op_Addition method. That costs not only a subroutine call but also a copy of each operand. The result is calculated in the method, a copy of the result is returned from the method, and that copy is then copied to your 'a' variable. Notice the frequent copying, just to add two vectors. :)
My advice would be to manually inline the operator. It's very easy and clean in this case, just "a.X = a.X + b.X; a.Y = a.Y + b.Y". This will eliminate point 2 above, and just leave you with point 1.
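As a rough sketch of that advice, here is the loop from the first post written both ways. Vec2 and InlineSketch are hypothetical stand-ins (not XNA types) so the snippet compiles without the XNA assemblies; with the real Vector2 the field names are likewise X and Y. Nothing here measures speed; it only shows that the two forms produce the same result.

```csharp
using System;

// Hypothetical stand-in for XNA's Vector2 so the sketch compiles without XNA.
public struct Vec2
{
    public float X;
    public float Y;

    public Vec2(float x, float y)
    {
        X = x;
        Y = y;
    }

    // Operator form: both operands are passed by value and the
    // result is returned by value.
    public static Vec2 operator +(Vec2 left, Vec2 right)
    {
        return new Vec2(left.X + right.X, left.Y + right.Y);
    }
}

public static class InlineSketch
{
    // The loop from the original post, using the overloaded operator.
    public static Vec2 SumWithOperator(Vec2 b, int count)
    {
        Vec2 a = new Vec2(0, 0);
        for (int i = 0; i < count; i++)
            a += b;
        return a;
    }

    // The same loop with the addition written out per component,
    // so no operator method is called inside the loop.
    public static Vec2 SumInlined(Vec2 b, int count)
    {
        Vec2 a = new Vec2(0, 0);
        for (int i = 0; i < count; i++)
        {
            a.X += b.X;
            a.Y += b.Y;
        }
        return a;
    }
}
```

Calling InlineSketch.SumInlined(new Vec2(1, 1), 200) gives the same vector as the operator version; whether it is faster on a given runtime is something to profile rather than assume.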
Microsoft DirectX/XNA MVP
|
|
-
|
|
Re: Vector2 maths slow on 360
|
200 seems to be too few to cause a real problem, unless you do it each frame for each of 100 objects.
However, yes, the Xbox CLR JIT is really poor at generating code for operator overloads. It is also pretty poor at floating-point code and at passing structs as arguments or return values.
Jon Watte, Direct3D MVP
kW X-port 3ds Max .X exporter
kW Animation source code
|
|
-
|
|
Re: Vector2 maths slow on 360
|
While reading the GDC 2008 "Understanding XNA Framework Performance" slides, I found that methods and operators perform identically (when not using references), as in the example below:
// Both perform identically
Position = Vector3.Add(Position, Velocity);
Position += Velocity;
Does that only apply to Windows?
Bruno Evangelista, Homepage | XNAnimation | Ziggyware Article: Playing Nice Animations with XNA
"For what will it profit a man if he gains the whole world and forfeits his soul? Or what will a man give in exchange for his soul?", Matthew 16:26
|
|
-
|
|
Re: Vector2 maths slow on 360
|
Bruno Evangelista: Does that only apply to Windows?
No, it applies to the Xbox as well.
Position += Velocity;

compiles to:

Position = Vector3.op_Addition(Position, Velocity);
As you can see, both result in a static method call. While the Desktop CLR may decide to inline that (probably not), the Xbox CLR will not.
To see why these static non-reference methods can lead to poor performance, compare the memory operations of the method call with those of the manually inlined version:
- Position += Velocity
  - Copy Position to temporary variable.
  - Copy Velocity to temporary variable.
  - Call Vector3::op_Addition
    - Create new Vector3 instance
    - Add two parameters and place result in new Vector3 instance
    - Copy the new Vector3 instance into a temporary return variable
  - Copy returned temporary variable into Position.
- Manually inlined
  - Add Position.X and Velocity.X and place in Position.X
  - Repeat for Y and Z
You can see where the big win is here. Now, if Position and Velocity are properties, add four more memory copies to both.
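The three shapes discussed so far (by-value method, by-reference method, manual inlining) can be sketched side by side. Vec3 and Mover below are hypothetical stand-ins, not XNA types; the ref/out overload mirrors the shape of XNA's Vector3.Add(ref Vector3, ref Vector3, out Vector3). Position and Velocity are deliberately public fields here, because C# does not allow a property to be passed as a ref or out argument.

```csharp
using System;

// Hypothetical stand-in for XNA's Vector3.
public struct Vec3
{
    public float X;
    public float Y;
    public float Z;

    public Vec3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }

    // By-value form: both arguments and the result are copied.
    public static Vec3 Add(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    // By-reference form: the method works on the caller's storage.
    // Each component reads only its own index, so result may alias a or b.
    public static void Add(ref Vec3 a, ref Vec3 b, out Vec3 result)
    {
        result.X = a.X + b.X;
        result.Y = a.Y + b.Y;
        result.Z = a.Z + b.Z;
    }
}

// Hypothetical object that integrates a position each frame.
public class Mover
{
    // Fields rather than properties, so they can be passed by ref.
    public Vec3 Position;
    public Vec3 Velocity;

    public void StepByValue()
    {
        Position = Vec3.Add(Position, Velocity);
    }

    public void StepByRef()
    {
        Vec3.Add(ref Position, ref Velocity, out Position);
    }

    public void StepInlined()
    {
        Position.X += Velocity.X;
        Position.Y += Velocity.Y;
        Position.Z += Velocity.Z;
    }
}
```

All three Step methods leave Position with the same value. The by-reference form keeps a single method call but avoids the argument and return copies listed above, which makes it a middle ground when inlining by hand would hurt readability.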
Microsoft DirectX/XNA MVP
|
|
-
|
|
Re: Vector2 maths slow on 360
|
Interesting. Looking at your "code diagram", the non-inlined code looks really bad, worse than I thought.
ShawMishrak: You can see where the big win is here. Now, if Position and Velocity are properties, add four more memory copies to both.
And this looks even worse. Why can't it inline simple things like "get { return field; }"? =(
It would be great if someone created a "math code inline tool" that takes your code and outputs it with all the math inlined! =D
Bruno Evangelista
|
|
-
|
|
Re: Vector2 maths slow on 360
|
To put this in perspective, take the following code:
Matrix a = new Matrix();
Matrix b = new Matrix();

public Matrix A
{
    get { return a; }
    set { a = value; }
}

public Matrix B
{
    get { return b; }
    set { b = value; }
}

public void Func()
{
    A *= B;
}
The Func() method does a simple matrix-multiply using properties and operators. Now look at the generated x86 assembly (this is from the Desktop CLR with optimizations enabled [mode JitOptimizations 1 in CorDbg]):
| 046: A *= B; |
| (cordbg) dis 100 |
| Function WindowsGame1.Game1.Func (code starts at 0x5063010). |
| Offsets are relative to function start. |
| [0000] push ebp |
| [0001] mov ebp,esp |
| [0003] push edi |
| [0004] push esi |
| [0005] sub esp,0C0h |
| [000b] mov esi,ecx |
| [000d] cmp dword ptr ds:[00152E1Ch],0 |
| [0014] je 00000007 |
| [0016] call 750C5321 |
| *[001b] mov edi,esi |
| [001d] lea edx,[ebp-48h] |
| [0020] mov ecx,esi |
| [0022] call dword ptr ds:[00D90F74h] ;;; A's Getter |
| [0028] lea edx,[ebp+FFFFFF78h] |
| [002e] mov ecx,esi |
| [0030] call dword ptr ds:[00D90F7Ch] ;;; B's Getter |
| [0036] lea eax,[ebp-48h] |
| [0039] sub esp,40h |
| [003c] movq xmm0,mmword ptr [eax] ;;; Copy matrix to send to operator |
| [0040] movq mmword ptr [esp],xmm0 |
| [0045] movq xmm0,mmword ptr [eax+8] |
| [004a] movq mmword ptr [esp+8],xmm0 |
| [0050] movq xmm0,mmword ptr [eax+10h] |
| [0055] movq mmword ptr [esp+10h],xmm0 |
| [005b] movq xmm0,mmword ptr [eax+18h] |
| [0060] movq mmword ptr [esp+18h],xmm0 |
| [0066] movq xmm0,mmword ptr [eax+20h] |
| [006b] movq mmword ptr [esp+20h],xmm0 |
| [0071] movq xmm0,mmword ptr [eax+28h] |
| [0076] movq mmword ptr [esp+28h],xmm0 |
| [007c] movq xmm0,mmword ptr [eax+30h] |
| [0081] movq mmword ptr [esp+30h],xmm0 |
| [0087] movq xmm0,mmword ptr [eax+38h] |
| [008c] movq mmword ptr [esp+38h],xmm0 |
| [0092] lea eax,[ebp+FFFFFF78h] |
| [0098] sub esp,40h |
| [009b] movq xmm0,mmword ptr [eax] ;;; Copy other matrix to send to operator |
| [009f] movq mmword ptr [esp],xmm0 |
| [00a4] movq xmm0,mmword ptr [eax+8] |
| [00a9] movq mmword ptr [esp+8],xmm0 |
| [00af] movq xmm0,mmword ptr [eax+10h] |
| [00b4] movq mmword ptr [esp+10h],xmm0 |
| [00ba] movq xmm0,mmword ptr [eax+18h] |
| [00bf] movq mmword ptr [esp+18h],xmm0 |
| [00c5] movq xmm0,mmword ptr [eax+20h] |
| [00ca] movq mmword ptr [esp+20h],xmm0 |
| [00d0] movq xmm0,mmword ptr [eax+28h] |
| [00d5] movq mmword ptr [esp+28h],xmm0 |
| [00db] movq xmm0,mmword ptr [eax+30h] |
| [00e0] movq mmword ptr [esp+30h],xmm0 |
| [00e6] movq xmm0,mmword ptr [eax+38h] |
| [00eb] movq mmword ptr [esp+38h],xmm0 |
| [00f1] lea ecx,[ebp+FFFFFF38h] |
| [00f7] call dword ptr ds:[00D90EA8h] ;;; Matrix.op_Multiply() |
| [00fd] lea eax,[ebp+FFFFFF38h] |
| [0103] sub esp,40h |
| [0106] movq xmm0,mmword ptr [eax] ;;; Copy return value into temporary |
| [010a] movq mmword ptr [esp],xmm0 |
| [010f] movq xmm0,mmword ptr [eax+8] |
| [0114] movq mmword ptr [esp+8],xmm0 |
| [011a] movq xmm0,mmword ptr [eax+10h] |
| [011f] movq mmword ptr [esp+10h],xmm0 |
| [0125] movq xmm0,mmword ptr [eax+18h] |
| [012a] movq mmword ptr [esp+18h],xmm0 |
| [0130] movq xmm0,mmword ptr [eax+20h] |
| [0135] movq mmword ptr [esp+20h],xmm0 |
| [013b] movq xmm0,mmword ptr [eax+28h] |
| [0140] movq mmword ptr [esp+28h],xmm0 |
| [0146] movq xmm0,mmword ptr [eax+30h] |
| [014b] movq mmword ptr [esp+30h],xmm0 |
| [0151] movq xmm0,mmword ptr [eax+38h] |
| [0156] movq mmword ptr [esp+38h],xmm0 |
| [015c] mov ecx,edi |
| [015e] call dword ptr ds:[00D90F78h] ;;; A's Setter |
| [0164] nop |
| [0165] lea esp,[ebp-8] |
| [0168] pop esi |
| [0169] pop edi |
| [016a] pop ebp |
| [016b] ret |
Look at all of that copying going on! Granted, the memory copying in the getters/setters is fairly well optimized on the Desktop CLR using the rep instruction prefix, but it's still unnecessary copying.
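One way to sidestep the getter, setter and operator calls inside the owning class is to operate on the backing fields through a ref/out method. The sketch below uses a hypothetical 2x2 Mat2 struct and Holder class (not XNA types) to keep it short; XNA's Matrix offers a Multiply overload with the same ref/ref/out shape. Unlike addition, a matrix product reads every input element for several outputs, so this Multiply computes into locals first and is therefore safe when result aliases an input. No timings are claimed for any particular JIT.

```csharp
using System;

// Hypothetical 2x2 matrix, a small stand-in for XNA's Matrix.
public struct Mat2
{
    public float M11, M12, M21, M22;

    public Mat2(float m11, float m12, float m21, float m22)
    {
        M11 = m11; M12 = m12;
        M21 = m21; M22 = m22;
    }

    // By-value operator: copies both operands and the result.
    public static Mat2 operator *(Mat2 l, Mat2 r)
    {
        return new Mat2(
            l.M11 * r.M11 + l.M12 * r.M21, l.M11 * r.M12 + l.M12 * r.M22,
            l.M21 * r.M11 + l.M22 * r.M21, l.M21 * r.M12 + l.M22 * r.M22);
    }

    // By-reference form. The products go into locals before anything is
    // written, so result may be the same variable as l or r.
    public static void Multiply(ref Mat2 l, ref Mat2 r, out Mat2 result)
    {
        float m11 = l.M11 * r.M11 + l.M12 * r.M21;
        float m12 = l.M11 * r.M12 + l.M12 * r.M22;
        float m21 = l.M21 * r.M11 + l.M22 * r.M21;
        float m22 = l.M21 * r.M12 + l.M22 * r.M22;
        result.M11 = m11; result.M12 = m12;
        result.M21 = m21; result.M22 = m22;
    }
}

// Hypothetical owner of two matrices, mirroring the class in the post.
public class Holder
{
    private Mat2 a = new Mat2(1, 2, 3, 4);
    private Mat2 b = new Mat2(0, 1, 1, 0);

    public Mat2 A { get { return a; } set { a = value; } }
    public Mat2 B { get { return b; } set { b = value; } }

    // Goes through two getters, the operator and a setter.
    public void FuncWithProperties()
    {
        A *= B;
    }

    // Works directly on the backing fields; properties cannot be passed by ref.
    public void FuncWithFields()
    {
        Mat2.Multiply(ref a, ref b, out a);
    }
}
```

Both methods leave A holding the same product. The properties stay available for outside callers, while the hot path inside the class skips them.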
Bruno Evangelista: It would be great if someone created a "math code inline tool" that takes your code and outputs it with all the math inlined! =D
It's called a macro processor, commonly found in C++. :)
You can write #define() macros in your C# source and run the code through the C++ macro processor as a pre-build step. (The C# compiler itself will not expand them; its own #define only declares conditional-compilation symbols.)
Microsoft DirectX/XNA MVP
|
|
-
|
|
Re: Vector2 maths slow on 360
|
I found the relevant GameFest 2007 slide. It is slide 26 of the Costs of Managed Code presentation.
"inliner does not inline methods with value-type parameters."
Whether "parameters" includes return values, I'm not sure. But remember that value-type setters have an implicit value-type parameter.
Microsoft DirectX/XNA MVP
|
|
-
|
|
Re: Vector2 maths slow on 360
|
I think another problem is that the JIT doesn't pass arguments in registers. On the x86 that's not so bad, because it is largely stack based anyway, but on the PPC it really hurts performance, because the PPC is register based (and the first 8 function arguments are supposed to go in registers, AFAICR).
Jon Watte, Direct3D MVP
|
|
-
|
|
Re: Vector2 maths slow on 360
|
First, just a simple question. =)
ShawMishrak: [0022] call dword ptr ds:[00D90F74h]
Is DS the data segment? And is call calling a method at data segment position [00D90F74h]?
ShawMishrak: It's called a macro processor, commonly found in C++. :)
You can use #define() macros in C#, and run the code through the C++ macro processor as a pre-build step.
But C++ macros are not "type safe", right? Also, I didn't know C# supported macros. I was thinking of a pre-build step that just inlines everything in the math classes. That would be helpful for as long as the CLR does not inline it.
Bruno Evangelista
|
|
-
|