XNA Creators Club Online

Vector2 maths slow on 360

Last post 07-06-2008 10:14 PM by Bruno Evangelista. 29 replies.
  • 06-30-2008 10:17 PM

    Vector2 maths slow on 360

    Hi,

    I've been having speed issues running my physics code on the 360; on PC it works just fine. I ended up disabling everything and found that just adding a few Vector2s will bring the console to a halt. For example:

     

                        Vector2 a = new Vector2(0, 0);
                        Vector2 b = new Vector2(1, 1);
                        for (int i = 0; i < 200; i++)
                                a += b;

     Any ideas on this?

     

     
  • 06-30-2008 10:58 PM In reply to

    Re: Vector2 maths slow on 360

    There are two main reasons this code executes slowly:


    1. The Xbox CLR is downright horrible with floating-point code.  Expect an order of magnitude difference between PC and Xbox when working with C#.
    2. Overloaded operators incur significant overhead on the Xbox CLR (and on the Desktop CLR to an extent).  Where a good C++ compiler will have no trouble inlining that operator, the Xbox JIT'er will make that a function call to the Vector2::op_Addition method, which causes not only a subroutine call but copies to be made of both operands.  The result is calculated in the method and a copy of the result is returned from the method.  This copy is then copied to your 'a' variable.  Notice the frequent copying, just to add two vectors. :)


    My advice would be to manually inline the operator.  It's very easy and clean in this case, just "a.X = a.X + b.X; a.Y = a.Y + b.Y".  This will eliminate point 2 above, and just leave you with point 1.
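
    As a rough sketch of that advice: the Vec2 struct below is a hypothetical stand-in for XNA's Vector2 (which exposes public X and Y fields), used only so the snippet compiles without the framework. InlineDemo and its method names are likewise made up for illustration. Both loops produce the same result; the second simply avoids the op_Addition call and the struct copies that come with it.

```csharp
using System;

// Hypothetical stand-in for Microsoft.Xna.Framework.Vector2.
public struct Vec2
{
    public float X;
    public float Y;

    public Vec2(float x, float y) { X = x; Y = y; }

    // Compiles to a static op_Addition call: both operands are passed
    // by value and a new struct is returned by value.
    public static Vec2 operator +(Vec2 l, Vec2 r)
    {
        return new Vec2(l.X + r.X, l.Y + r.Y);
    }
}

public static class InlineDemo
{
    // The original loop, using the overloaded operator.
    public static Vec2 SumWithOperator(int iterations)
    {
        Vec2 a = new Vec2(0, 0);
        Vec2 b = new Vec2(1, 1);
        for (int i = 0; i < iterations; i++)
            a += b;
        return a;
    }

    // The same loop with the operator inlined by hand:
    // plain float adds on the fields, no call and no temporaries.
    public static Vec2 SumInlined(int iterations)
    {
        Vec2 a = new Vec2(0, 0);
        Vec2 b = new Vec2(1, 1);
        for (int i = 0; i < iterations; i++)
        {
            a.X += b.X;
            a.Y += b.Y;
        }
        return a;
    }
}
```

    How much this saves on the 360 is something to measure rather than assume; the sketch only shows the shape of the rewrite.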

    Microsoft DirectX/XNA MVP
  • 06-30-2008 11:12 PM In reply to

    Re: Vector2 maths slow on 360

    200 seems to be too few to cause a real problem, unless you do it each frame for each of 100 objects.

    However, yes, the Xbox CLR JIT is really poor at generating code for operator overloads, and is also pretty poor at floating point code, and it is also poor at passing structs as arguments or return values.


    Jon Watte, Direct3D MVP kW X-port 3ds Max .X exporter kW Animation source code
  • 07-01-2008 4:12 PM In reply to

    Re: Vector2 maths slow on 360

    While reading the GDC 2008 "Understanding XNA Framework Performance" slides, I found that methods and operators perform identically (when not using references), as in the example below:

    // Both perform identically
    Position = Vector3.Add(Position, Velocity);
    Position += Velocity;

    Does that apply only to Windows?

    Bruno Evangelista,
    Homepage | XNAnimation
    Ziggyware Article: Playing Nice Animations with XNA
    For what will it profit a man if he gains the whole world and forfeits his soul? Or what will a man give in exchange for his soul?, Matthew 16:26
  • 07-01-2008 4:38 PM In reply to

    Re: Vector2 maths slow on 360

    Bruno Evangelista:
    Does that apply only to Windows?


    No, it applies on Xbox as well.


    Position += Velocity 
     
    compiles to: 
     
    Position = Vector3.op_Addition(Position, Velocity) 


    As you can see, both result in a static method call. While the Desktop CLR may decide to inline that (probably not), the Xbox CLR will not.

    To see why these static non-reference methods can lead to poor performance, compare the memory operations performed by the method call with those of the manually inlined version:


    • Position += Velocity
      • Copy Position to temporary variable.
      • Copy Velocity to temporary variable.
      • Call Vector3::op_Addition
        • Create new Vector3 instance
        • Add two parameters and place result in new Vector3 instance
        • Copy the new Vector3 instance into a temporary return variable
      • Copy returned temporary variable into Position.
    • Manually inlined
      • Add Position.X and Velocity.X and place in Position.X
      • Repeat for Y and Z


    You can see where the big win here is.  Now if Position and Velocity are properties, add four more memory copies to both.
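
    A middle ground between the two cases above, as a hedged sketch: XNA's math types also provide ref/out overloads such as Vector3.Add(ref value1, ref value2, out result), which keep the method call but avoid the operand and return-value copies. Vec3 below is a hypothetical stand-in for Vector3 so the snippet compiles on its own, and RefDemo is an illustrative name. Note that this only works on fields and locals, since C# does not allow a property to be passed by ref or out.

```csharp
using System;

// Hypothetical stand-in for Microsoft.Xna.Framework.Vector3.
public struct Vec3
{
    public float X, Y, Z;

    public Vec3(float x, float y, float z) { X = x; Y = y; Z = z; }

    // By-value version: both operands copied in, result copied out.
    public static Vec3 Add(Vec3 a, Vec3 b)
    {
        return new Vec3(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
    }

    // By-reference version: operands are read in place and the result
    // is written straight into the caller's variable.
    public static void Add(ref Vec3 a, ref Vec3 b, out Vec3 result)
    {
        result.X = a.X + b.X;
        result.Y = a.Y + b.Y;
        result.Z = a.Z + b.Z;
    }
}

public static class RefDemo
{
    public static Vec3 Step(Vec3 position, Vec3 velocity)
    {
        // Same effect as "position += velocity", without the temporaries.
        Vec3.Add(ref position, ref velocity, out position);
        return position;
    }
}
```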
    Microsoft DirectX/XNA MVP
  • 07-01-2008 5:05 PM In reply to

    Re: Vector2 maths slow on 360

    Interesting. Looking at your "code diagram", non-inlined code looks really bad, worse than I thought.

    ShawMishrak:
    You can see where the big win here is.  Now if Position and Velocity are properties, add four more memory copies to both.
    And this looks even worse.  Why can't it inline simple things like "get { return field; }"? =(

    It would be great if someone created a "math code inline tool" that takes your code and outputs it with all the math inlined! =D

    Bruno Evangelista,
  • 07-01-2008 5:44 PM In reply to

    Re: Vector2 maths slow on 360

    To put this in perspective, take the following code:



            Matrix a = new Matrix(); 
            Matrix b = new Matrix(); 
     
            public Matrix A 
            { 
                get { return a; } 
                set { a = value; } 
            } 
     
            public Matrix B 
            { 
                get { return b; } 
                set { b = value; } 
            } 
     
            public void Func() 
            { 
                A *= B; 
            } 
     
     


    The Func() method does a simple matrix-multiply using properties and operators. Now look at the generated x86 assembly (this is from the Desktop CLR with optimizations enabled [mode JitOptimizations 1 in CorDbg]):


    046:             A *= B; 
    (cordbg) dis 100 
    Function WindowsGame1.Game1.Func (code starts at 0x5063010). 
    Offsets are relative to function start. 
     [0000] push        ebp 
     [0001] mov         ebp,esp 
     [0003] push        edi 
     [0004] push        esi 
     [0005] sub         esp,0C0h 
     [000b] mov         esi,ecx 
     [000d] cmp         dword ptr ds:[00152E1Ch],0 
     [0014] je          00000007 
     [0016] call        750C5321 
    *[001b] mov         edi,esi 
     [001d] lea         edx,[ebp-48h] 
     [0020] mov         ecx,esi 
     [0022] call        dword ptr ds:[00D90F74h]     ;;;  A's Getter 
     [0028] lea         edx,[ebp+FFFFFF78h] 
     [002e] mov         ecx,esi 
     [0030] call        dword ptr ds:[00D90F7Ch]     ;;;  B's Getter 
     [0036] lea         eax,[ebp-48h] 
     [0039] sub         esp,40h 
     [003c] movq        xmm0,mmword ptr [eax]        ;;; Copy matrix to send to operator 
     [0040] movq        mmword ptr [esp],xmm0 
     [0045] movq        xmm0,mmword ptr [eax+8] 
     [004a] movq        mmword ptr [esp+8],xmm0 
     [0050] movq        xmm0,mmword ptr [eax+10h] 
     [0055] movq        mmword ptr [esp+10h],xmm0 
     [005b] movq        xmm0,mmword ptr [eax+18h] 
     [0060] movq        mmword ptr [esp+18h],xmm0 
     [0066] movq        xmm0,mmword ptr [eax+20h] 
     [006b] movq        mmword ptr [esp+20h],xmm0 
     [0071] movq        xmm0,mmword ptr [eax+28h] 
     [0076] movq        mmword ptr [esp+28h],xmm0 
     [007c] movq        xmm0,mmword ptr [eax+30h] 
     [0081] movq        mmword ptr [esp+30h],xmm0 
     [0087] movq        xmm0,mmword ptr [eax+38h] 
     [008c] movq        mmword ptr [esp+38h],xmm0 
     [0092] lea         eax,[ebp+FFFFFF78h] 
     [0098] sub         esp,40h 
     [009b] movq        xmm0,mmword ptr [eax]        ;;;  Copy other matrix to send to operator 
     [009f] movq        mmword ptr [esp],xmm0 
     [00a4] movq        xmm0,mmword ptr [eax+8] 
     [00a9] movq        mmword ptr [esp+8],xmm0 
     [00af] movq        xmm0,mmword ptr [eax+10h] 
     [00b4] movq        mmword ptr [esp+10h],xmm0 
     [00ba] movq        xmm0,mmword ptr [eax+18h] 
     [00bf] movq        mmword ptr [esp+18h],xmm0 
     [00c5] movq        xmm0,mmword ptr [eax+20h] 
     [00ca] movq        mmword ptr [esp+20h],xmm0 
     [00d0] movq        xmm0,mmword ptr [eax+28h] 
     [00d5] movq        mmword ptr [esp+28h],xmm0 
     [00db] movq        xmm0,mmword ptr [eax+30h] 
     [00e0] movq        mmword ptr [esp+30h],xmm0 
     [00e6] movq        xmm0,mmword ptr [eax+38h] 
     [00eb] movq        mmword ptr [esp+38h],xmm0 
     [00f1] lea         ecx,[ebp+FFFFFF38h] 
     [00f7] call        dword ptr ds:[00D90EA8h]     ;;;  Matrix.op_Multiply() 
     [00fd] lea         eax,[ebp+FFFFFF38h] 
     [0103] sub         esp,40h 
     [0106] movq        xmm0,mmword ptr [eax]        ;;;  Copy return value into temporary 
     [010a] movq        mmword ptr [esp],xmm0 
     [010f] movq        xmm0,mmword ptr [eax+8] 
     [0114] movq        mmword ptr [esp+8],xmm0 
     [011a] movq        xmm0,mmword ptr [eax+10h] 
     [011f] movq        mmword ptr [esp+10h],xmm0 
     [0125] movq        xmm0,mmword ptr [eax+18h] 
     [012a] movq        mmword ptr [esp+18h],xmm0 
     [0130] movq        xmm0,mmword ptr [eax+20h] 
     [0135] movq        mmword ptr [esp+20h],xmm0 
     [013b] movq        xmm0,mmword ptr [eax+28h] 
     [0140] movq        mmword ptr [esp+28h],xmm0 
     [0146] movq        xmm0,mmword ptr [eax+30h] 
     [014b] movq        mmword ptr [esp+30h],xmm0 
     [0151] movq        xmm0,mmword ptr [eax+38h] 
     [0156] movq        mmword ptr [esp+38h],xmm0 
     [015c] mov         ecx,edi 
     [015e] call        dword ptr ds:[00D90F78h]     ;;;  A's Setter 
     [0164] nop 
     [0165] lea         esp,[ebp-8] 
     [0168] pop         esi 
     [0169] pop         edi 
     [016a] pop         ebp 
     [016b] ret 


    Look at all of that copying going on!  Granted, the memory copying is fairly optimized in the getters/setters on the Desktop CLR using the rep instruction prefix, but it's still unnecessary copying.
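
    For contrast, here is a sketch of how Func() could avoid most of that copying by working on the backing fields directly and using a ref/out multiply, in the style of XNA's Matrix.Multiply(ref matrix1, ref matrix2, out result). Mat2 is a hypothetical 2x2 stand-in for Matrix so the snippet stays short and compiles without the framework; Body is an illustrative class name.

```csharp
using System;

// Hypothetical 2x2 stand-in for Microsoft.Xna.Framework.Matrix.
public struct Mat2
{
    public float M11, M12, M21, M22;

    public Mat2(float m11, float m12, float m21, float m22)
    {
        M11 = m11; M12 = m12; M21 = m21; M22 = m22;
    }

    public static void Multiply(ref Mat2 l, ref Mat2 r, out Mat2 result)
    {
        // Compute into locals first so that result may alias l or r.
        float m11 = l.M11 * r.M11 + l.M12 * r.M21;
        float m12 = l.M11 * r.M12 + l.M12 * r.M22;
        float m21 = l.M21 * r.M11 + l.M22 * r.M21;
        float m22 = l.M21 * r.M12 + l.M22 * r.M22;
        result.M11 = m11; result.M12 = m12;
        result.M21 = m21; result.M22 = m22;
    }
}

public class Body
{
    private Mat2 a = new Mat2(1, 2, 3, 4);
    private Mat2 b = new Mat2(0, 1, 1, 0);

    // The property stays for outside callers.
    public Mat2 A
    {
        get { return a; }
        set { a = value; }
    }

    public void Func()
    {
        // Equivalent to "A *= B", but no getter, setter or by-value
        // operator call, so no whole-matrix copies on the stack.
        Mat2.Multiply(ref a, ref b, out a);
    }
}
```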


    Bruno Evangelista:
    It would be great if someone created a "math code inline tool" that takes your code and outputs it with all the math inlined! =D


    It's called a macro processor, commonly found in C++. :)

    You can write C-style #define macros in your C# source and run the code through the C++ preprocessor as a pre-build step.
    Microsoft DirectX/XNA MVP
  • 07-01-2008 6:02 PM In reply to

    Re: Vector2 maths slow on 360

    I found the relevant GameFest 2007 slide.  It is slide 26 of the Costs of Managed Code presentation.

    "inliner does not inline methods with value-type parameters."

    Whether "parameters" include return values, I'm not sure.  But remember that value-type setters have an implicit value-type parameter.

    Microsoft DirectX/XNA MVP
  • 07-01-2008 6:11 PM In reply to

    Re: Vector2 maths slow on 360

    I think another problem is that the JIT doesn't pass arguments in registers. On the x86, that's not so bad, because it is largely stack based anyway, but on the PPC, that really hurts performance a lot because it's register based (and the first 8 function arguments are supposed to go in registers AFAICR).
    Jon Watte, Direct3D MVP kW X-port 3ds Max .X exporter kW Animation source code
  • 07-01-2008 6:41 PM In reply to

    Re: Vector2 maths slow on 360

    First, just a simple question. =)

    ShawMishrak:
     [0022] call        dword ptr ds:[00D90F74h]
    Is DS the data segment? And is call calling a method at data segment position [00D90F74h]?

     

    ShawMishrak:
    It's called a macro processor, commonly found in C++. :)

    You can write C-style #define macros in your C# source and run the code through the C++ preprocessor as a pre-build step.
    But macros in C++ are not "type safe", right? Also, I didn't know C# supported macros. I was thinking of a pre-build step that just inlines everything in the math classes. That would be helpful as long as the CLR does not inline it.

     

    Bruno Evangelista,