XNA Creators Club Online
Page 2 of 2 (42 items)

How fast is it?

Last post 1/16/2008 4:19 PM by Jim Perry. 41 replies.
  • 12/20/2007 11:34 AM In reply to

    Re: How fast is it?

    Being a professional C# programmer since the .NET 1.0 beta, I can share a few tidbits about managed code and C# compared to native C++.

    When working with the out-of-the-box namespaces in C#, you come across a lot of iterators.  Iterators create clones of objects (a pointer object pointing to the real object) to be used in loops and such.  By avoiding LinkedList<T> and the like, you can achieve speeds that are very close to C++.  You can go even further, if you know what you are doing, and dive into the world of pointers in C# to make it JUST AS FAST AS C++.
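    To illustrate the LinkedList<T> point, here is a minimal C++ sketch (not C#, and not the poster's code) of the same trade-off: summing values held in a node-per-element std::list versus a contiguous std::vector. It only shows the memory-layout difference; it does not model the iterator-cloning behavior described above, and sum_list, sum_vector and sum_both are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <numeric>
#include <vector>

// Node-based storage: every element is a separate allocation, so each step of
// the loop follows a pointer to wherever the next node happens to live.
long long sum_list(const std::list<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0LL);
}

// Contiguous storage: elements sit side by side, so the loop reads memory in
// order, which is what caches and hardware prefetchers handle best.
long long sum_vector(const std::vector<int>& values) {
    long long total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Demo helper: fill both containers with 1..n and check that they agree.
long long sum_both(int n) {
    assert(n >= 0);
    std::list<int> as_list;
    std::vector<int> as_vector;
    as_vector.reserve(static_cast<std::size_t>(n));
    for (int i = 1; i <= n; ++i) {
        as_list.push_back(i);
        as_vector.push_back(i);
    }
    const long long a = sum_list(as_list);
    const long long b = sum_vector(as_vector);
    assert(a == b);
    return a;
}
```

    Timing the two sums over a few million elements is the usual way to see the gap; how large it is depends on the allocator and the hardware.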

    I've written a Motion Capture Detection system in both C# and C++ (originally in C++), and I achieve the SAME performance in C# (using pointers in unsafe code) as I do in C++.

    As many people have stated above, the slowdown in managed code comes from the fact that .NET has to obtain a pointer to the object in the managed memory pools.  Getting these pointers can be costly for high-performance code, and that work would be better left to you.

    A List<Model> is slower to iterate through than its pointer-based equivalent in unsafe code (conceptually, unsafe { List<Model*> }); however, you can really damage your system if you don't take care of your pointer objects or mess up the pointer math.  Use with caution.  This is how I am able to manage over 4,000 models at ~1,200 polys each on screen in my game engine (running at 900 fps with vsync off and all that jazz).
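    Here is a rough C++ analogue of the pointer-walking idea (C#'s unsafe syntax differs, and Model is a hypothetical stand-in, not the poster's engine type). It walks a contiguous array with a begin/end pointer pair, which is also where the "mess up the pointer math" risk comes from: stepping past end is undefined behavior.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a renderable model; not from any real engine.
struct Model {
    float x, y, z;
    int polygon_count;
};

// Walk a contiguous array with a begin/end pointer pair instead of an index.
// `end` points one past the last element: dereferencing it, or letting `p`
// run beyond it, is undefined behavior.
long long total_polygons(const Model* begin, const Model* end) {
    assert(begin <= end);
    long long total = 0;
    for (const Model* p = begin; p != end; ++p) {
        total += p->polygon_count;
    }
    return total;
}

// Convenience overload: a std::vector guarantees contiguous storage, so
// data() .. data() + size() is a valid pointer range.
long long total_polygons(const std::vector<Model>& models) {
    return total_polygons(models.data(), models.data() + models.size());
}
```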
  • 12/20/2007 2:20 PM In reply to

    Re: How fast is it?

    As Aeon hinted above, the issue with managed code is not the general case; it is rather the specific cases where the .NET runtime is not as optimized as some would wish.  In the general case, you're not going to see much of a difference, if any.  IL is compiled to machine code, after all, and in the general case the CLR does not do a bad job of generating that machine code.  However, you do get differences in special cases, like floating-point vector operations.  I have yet to see the desktop CLR generate proper vector code for operations such as 4x4 matrix multiplication.

    For example, the optimized JIT'ed code for XNA's Matrix.Multiply will use fmul and faddp on x86 to do the multiplication operation-by-operation (disassembly given by cordbg).  Clearly, properly vectorized SSE instructions would perform better (see table below), but there are a variety of reasons the CLR does not do this (yet), the greatest of which is the time vs. benefit ratio.  Very rarely will you see traditional .NET programs bottlenecked by floating-point performance.  Games and other math-heavy applications are an exception here, but they are the minority in the managed world.  The majority of .NET applications are business applications that are much more concerned with productivity and reliability than vectorized math operations.  So why should the CLR team cater to the minority?

    This is where native languages like C++ score their big wins in special-case performance.  The Visual Studio 2005/2008 C++ compiler can somewhat vectorize code like 4x4 matrix multiplication and outperform CLR-generated code.  The big win, however, is in programmer control.  When the C++ compiler fails to generate sufficiently fast code, the programmer can drop down into intrinsics or even inline assembly to squeeze the last bit of performance out of the code.  For instance, you can rewrite the matrix multiplication routine with hand-optimized code using SSE2 intrinsics and score a substantial win.  The big difference here between the managed world and the native world is that this is impossible in the managed world.  A programmer cannot drop down to a lower-level to help the CLR generate better code.  What is generated for you is all you get in the managed world.  In other words, you're at the mercy of the run-time code generator.
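    A minimal sketch of what dropping down to intrinsics can look like, assuming a row-major 4x4 float matrix like XNA's (Mat4 and multiply_sse are hypothetical names; this is not the poster's benchmarked SSE2 routine). It uses only SSE single-precision intrinsics from <xmmintrin.h> and falls back to scalar code when the compiler doesn't report SSE support, so it also builds on non-x86 targets.

```cpp
#include <array>
#include <cassert>

// Use SSE when the compiler targets it (GCC/Clang define __SSE__; MSVC
// defines _M_X64 on x64 and _M_IX86_FP on 32-bit x86).
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define MAT4_HAVE_SSE 1
#else
#define MAT4_HAVE_SSE 0
#endif

// Row-major 4x4 matrix: element (row, col) lives at m[row * 4 + col].
struct Mat4 {
    std::array<float, 16> m{};

    float at(int row, int col) const {
        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        return m[row * 4 + col];
    }
};

// r = a * b. Row i of r is a weighted sum of the rows of b:
//   r.row(i) = a(i,0)*b.row(0) + a(i,1)*b.row(1) + a(i,2)*b.row(2) + a(i,3)*b.row(3)
// Each b.row(k) fits in one 128-bit register, so an output row takes four
// vector multiplies and three vector adds instead of 16 scalar multiplies
// and 12 scalar adds.
Mat4 multiply_sse(const Mat4& a, const Mat4& b) {
    Mat4 r;
#if MAT4_HAVE_SSE
    // Unaligned loads, because std::array<float, 16> only guarantees float alignment.
    const __m128 b0 = _mm_loadu_ps(&b.m[0]);
    const __m128 b1 = _mm_loadu_ps(&b.m[4]);
    const __m128 b2 = _mm_loadu_ps(&b.m[8]);
    const __m128 b3 = _mm_loadu_ps(&b.m[12]);
    for (int i = 0; i < 4; ++i) {
        // _mm_set1_ps broadcasts one element of a into all four lanes.
        __m128 row = _mm_mul_ps(_mm_set1_ps(a.at(i, 0)), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.at(i, 1)), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.at(i, 2)), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.at(i, 3)), b3));
        _mm_storeu_ps(&r.m[i * 4], row);
    }
#else
    // Portable fallback: same arithmetic, one float at a time.
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a.at(i, k) * b.at(k, j);
            }
            r.m[i * 4 + j] = sum;
        }
    }
#endif
    return r;
}
```

    Whether this beats the compiler's own output depends on the compiler and CPU; the 138 ms figure in this thread came from the poster's own routine, not from this sketch.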

    That said, I'm not trying to bash .NET or C# or any managed language.  I use XNA and .NET for some of my work, and I enjoy using them.  It's fun and productive, but you also have to use it for what it is and know its limitations, just like with any language/product.  Unless you're rewriting Havok and are knowledgeable about the x86 architecture and instruction set, chances are you will write C# code that is just as fast as C++, and be more productive doing it.  That's the big win for the majority of .NET developers, who really don't care and don't need to know what's going on internally.  Writing GUI front-ends and database interaction back-ends will practically never require these sorts of low-level optimizations.


    To give some concrete numbers to the vector math case, I compared XNA's Matrix.Multiply (in .NET) to what the Visual Studio 2008 C++ compiler will generate.  The (native) C++ routine I used is a direct copy of what Reflector gives as MC++ output for XNA's Matrix.Multiply to make the comparison as accurate as possible.  I then wrote the matrix multiplication using SSE2 intrinsics, including a version in C++/CLI.  The times below are the total running time over 10,000,000 successive multiplies, where the same matrices are used each iteration to ensure cache hits and remove the dependency on possible memory access latency differences.
    Machine: Core 2 Duo E6600 clocked at 3.0GHz
    Managed XNA - Matrix.Multiply(ref,ref,out): 557ms
    Managed w/ P/Invoke to SSE2 function: 341ms
    Native C++ - Full Compiler Optimizations: 300ms
    Native C++ - hand-optimized SSE2: 138ms
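    The measurement described above (total time over 10,000,000 multiplies of the same matrices) can be sketched with a small std::chrono harness. total_time_ms is a hypothetical helper, not the poster's benchmark code. It sums each result into a checksum so the calls aren't discarded as dead code; an optimizer may still hoist a loop-invariant computation out of the loop, so check the generated code before trusting very small numbers.

```cpp
#include <cassert>
#include <chrono>

// Runs `op` `iterations` times and returns the total elapsed wall-clock time
// in milliseconds. Each result is added into `checksum`, which the caller can
// print, so the compiler cannot simply drop the calls as unused.
template <typename Op>
double total_time_ms(Op op, long long iterations, float& checksum) {
    assert(iterations >= 0);
    using Clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const Clock::time_point start = Clock::now();
    float sum = 0.0f;
    for (long long i = 0; i < iterations; ++i) {
        sum += op();
    }
    const Clock::time_point stop = Clock::now();
    checksum = sum;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

    To mirror the setup above, op would wrap one multiply of two fixed matrices and return one element of the result, with iterations set to 10,000,000, built in release mode and run outside the IDE.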

    Again, the point of this test is not to "prove" that C#/.NET is slower than C++ in the general case!  The point is to show that there do exist special cases where native code clearly and substantially outperforms managed code.  It is up to the developer/team to decide if the performance requirements of the program are better suited to C# (which is very productive and fast in the general case) or C++ (which can be less productive, but gives more opportunity for low-level optimizations).

    There are more special cases than just the vector math case, but the important thing to remember from all this is that the cases where you will notice a speed difference and can achieve significantly better performance in C++ are just that: special cases.  In general, you will not notice the difference in everyday usage. 


    For the purpose of this discussion, the Xbox CLR has been intentionally left out.  We are all aware of the performance issues regarding it, and just bringing them up again is an exercise in futility.

    Microsoft DirectX/XNA MVP
  • 1/15/2008 11:56 PM In reply to

    Re: How fast is it?

    There is also a HUGE HUUUUUGE speed penalty for compiling in debug mode, since the JIT compiler then generates unoptimized code and performance drops drastically, so it's important to run in release mode!  (I had an octree color-palettizer that took up to 3 minutes in debug mode and less than a second in release mode... due to recursion!)
  • 1/16/2008 12:02 AM In reply to

    Re: How fast is it?

    True, though that applies to both C++ and C#.  Neither is optimized when compiled (either directly or by the CLR) with debug settings.
    Microsoft DirectX/XNA MVP
  • 1/16/2008 12:27 AM In reply to

    Re: How fast is it?

    I've been writing "managed" code since the Java 1.0.2 SDK, and C# since before it went public.

    Back in the day, people said that Java MIGHT someday be only 5 times slower than C++.

    That old thinking all changed when Microsoft released the IE4 virtual machine.  That VM revolutionized what was possible with managed code.  It included the first real JIT, and it made it possible to perform low-level primitive operations at essentially the same speed as C/C++.  After that point, Java remained slower, not so much because of fundamental limitations (though the lack of primitives was one real limitation) but mainly because of poor API design.

    Back then, I had "challenges" with several C++ developers to write the fastest application that would reverse the words in a file.  My app in Java was significantly faster (because my algorithm was better).  Once they implemented the same algorithm, the C++ app was only 10% faster (due to lower I/O overhead).  Maximum raw speed doesn't really matter UNTIL you reach the optimal algorithm.
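    The post doesn't say which algorithm won. As an assumption, here is one common linear-time approach for reversing the order of words in a buffer, sketched in C++ (reverse_words is a hypothetical name, and words are assumed to be separated by plain spaces only): reverse the whole buffer, then reverse each word back. Each byte is touched a constant number of times, and nothing is allocated beyond the buffer itself.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Reverses the order of space-separated words in place, in O(n):
// 1) reverse the whole buffer, which puts the words in reverse order
//    but spells each word backwards;
// 2) reverse each word again to restore its spelling.
void reverse_words(std::string& text) {
    const std::string::size_type original_size = text.size();
    std::reverse(text.begin(), text.end());
    auto word_begin = text.begin();
    while (word_begin != text.end()) {
        auto word_end = std::find(word_begin, text.end(), ' ');
        std::reverse(word_begin, word_end);
        // Skip the separator, unless we've reached the end of the buffer.
        word_begin = (word_end == text.end()) ? word_end : word_end + 1;
    }
    assert(text.size() == original_size);  // purely in-place rearrangement
}
```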

    As of today, with C# you can perform low-level primitive operations on floats, ints, arrays, etc. that are basically just as fast as the same operations in C or C++.

    It's not really fair to compare "hand-optimized SSE" code to C# performance on matrix multiplication.  "Hand-optimized SSE" is really processor-specific assembly language, and has NOTHING to do with the C++ language.  None of the SSE operations are defined in the C++ standard.  It just so happens that C++ has a better ability to call assembly instructions.  (Managed C++ can do the same thing, btw.)

    Do this:  Implement your own matrix multiplication code in C#, and then implement it again in C++ using pointers or whatever, but NO assembly instructions or SSE extensions, and see what the performance difference is.
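    A plain C++ version of that request, with no assembly or SSE, might look like the sketch below. It assumes a row-major 4x4 float matrix (Mat4 and multiply_scalar are hypothetical names, not XNA's Matrix.Multiply). Each of the 16 results is four multiply-accumulate steps done one float at a time, the same shape of code that the earlier post describes the JIT emitting with fmul/faddp.

```cpp
#include <array>
#include <cassert>

// Row-major 4x4 matrix: element (row, col) lives at m[row * 4 + col].
struct Mat4 {
    std::array<float, 16> m{};

    float at(int row, int col) const {
        assert(row >= 0 && row < 4 && col >= 0 && col < 4);
        return m[row * 4 + col];
    }
};

// r = a * b, computed element by element with scalar float arithmetic:
// r(i, j) = sum over k of a(i, k) * b(k, j).
Mat4 multiply_scalar(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a.at(i, k) * b.at(k, j);
            }
            r.m[i * 4 + j] = sum;
        }
    }
    return r;
}
```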


    Anyways, all of this is pretty much MOOT, since all of the real pro coding is now done in HLSL anyways.


  • 1/16/2008 1:06 AM In reply to

    Re: How fast is it?


    You guys are smart. I can't wait till I can actually contribute to these conversations. :p

    How far along in the industry in years are you guys? Please don't tell me you are a bunch of 23 year olds. :(
    Try Not! Do, or Do Not. There is no try.
  • 1/16/2008 1:08 AM In reply to

    Re: How fast is it?

    I don't know about the other guys, but I'm 14.
  • 1/16/2008 1:34 AM In reply to

    Re: How fast is it?

    What kind of school do you go to where you learn all this stuff at 14? When I was 14 I was too distracted by the female population in my class to get much of anything done haha.

    I swear, my little brother is learning stuff in 5th grade that I was taught in 10th.
    Try Not! Do, or Do Not. There is no try.
  • 1/16/2008 9:07 AM In reply to

    Re: How fast is it?

    Heh, my 7 year old son is learning stuff in second grade that I don't remember learning until several years later. Of course, he's also been using a computer since kindergarten and I didn't start using one until high school (way back in the 80s). :(

    That's just the way the world's changed. I expect in another decade every newborn will be given a laptop. :D

    Oh, and I'm probably the old man around here at 41 (although I think Z-Man is catching up to me ;) ).

    Jim Perry - Microsoft XNA MVP
    If people spent a minute searching the forums and reading the FAQs before posting I'd be out of a job.
    Got some XNA Game Studio/XNA Framework development info to share with the community? Put it on the XNA Wiki.
    Please mark posts as Answers or Good Feedback when appropriate.
  • 1/16/2008 10:32 AM In reply to

    Re: How fast is it?

    Machaira:

    Oh, and I'm probably the old man around here at 41 (although I think Z-Man is catching up to me ;) ).

    Nope. 45 here.

     

  • 1/16/2008 11:02 AM In reply to

    Re: How fast is it?

    Kris:

    It's not really fair to compare "hand-optimized SSE" code to C# performance on matrix multiplication.  "Hand-optimized SSE" is really processor-specific assembly language, and has NOTHING to do with the C++ language.  None of the SSE operations are defined in the C++ standard.  It just so happens that C++ has a better ability to call assembly instructions.  (Managed C++ can do the same thing, btw.)


    How is it not fair to make this comparison?  I'm showing something you can do in C++ that you cannot do in C#.  Using SSE intrinsics is processor-specific, yes, but that doesn't matter.  The point is this is something C++ can do that C# cannot do.

    I think you missed the point that I was trying to make.  I said I wasn't comparing C++ to C# in terms of overall "power."  I was stating an example of a case where the low-level power of C++ can be helpful.  C++ allows you to drop down into assembly (which is a part of the language), and C# does not.  Does this make C++ a better language for everyone and everything?  No.  It's just an example.  If you don't need that power, then don't use it.

    Processor features like SIMD instructions are not a part of any language.  They are just instructions used by compilers/programmers to generate faster code.  They're no more or less processor-specific than mov, push, or pop.

    If you want to be technical about it, I didn't even use real inline assembly.  I used compiler intrinsics, which are definitely a part of the C++ language.  The compiler just so happens to treat these "function calls" as directives to emit specific instructions.

    Yes, Managed C++ can do the same thing, at a cost.  In fact, I showed just that with the P/Invoke measurement.  That was C# code calling into a Managed C++ multiplier using intrinsics (and special data types to get the proper memory alignment).  The result?  341 ms, slower than just letting the .NET CLR do it naively (yes, I meant naive, not native) due to the P/Invoke costs.
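    On the "special data types to get the proper memory alignment" point: the aligned SSE load/store intrinsics (_mm_load_ps/_mm_store_ps) require 16-byte-aligned addresses, while _mm_loadu_ps/_mm_storeu_ps do not. A minimal C++ sketch, assuming a hypothetical AlignedVec4 type (this is not the poster's C++/CLI code and does not model the P/Invoke cost), uses alignas(16) to get that guarantee and falls back to scalar code on non-SSE targets.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define VEC4_HAVE_SSE 1
#else
#define VEC4_HAVE_SSE 0
#endif

// alignas(16) makes every AlignedVec4 start on a 16-byte boundary, which the
// aligned SSE load/store intrinsics require.
struct alignas(16) AlignedVec4 {
    float v[4];
};

static_assert(alignof(AlignedVec4) == 16, "expected 16-byte alignment");

// Adds two 4-float vectors using aligned loads and stores when SSE is available.
AlignedVec4 add_aligned(const AlignedVec4& a, const AlignedVec4& b) {
    assert(reinterpret_cast<std::uintptr_t>(a.v) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b.v) % 16 == 0);
    AlignedVec4 r;
#if VEC4_HAVE_SSE
    _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(a.v), _mm_load_ps(b.v)));
#else
    for (int i = 0; i < 4; ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
#endif
    return r;
}
```

    Passing a misaligned address to _mm_load_ps typically faults at run time, which is why the type carries the alignment rather than trusting each caller.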

    Kris:

    Do this:  Implement your own matrix multiplication code in C#, and then implement it again in C++ using pointers or whatever, but NO assembly instructions or SSE extensions, and see what the performance difference is.


    That's almost exactly what I did!  The only difference is that I used XNA's Matrix.Multiply routine instead of my own, which is a pretty standard 4x4 multiplier.  The exact same code (as shown by Reflector, ignoring slight syntax differences) was used between the C# and compiler-optimized C++ versions.  The best I could get out of the .NET run-time was 557 ms, while the Visual C++ 2008 compiler was able to do the same in 300 ms, with the compiler/linker configured to maximize speed optimizations.  Exact same code.

    But that's just differences in compilers.  The whole point was that C++ allows you to go further and hand-optimize the code with assembly/intrinsics for your target platform.  The result?  138 ms.

    These measurements were taken on a Core 2 Duo E6600.  All execution was done on the command-line outside of any IDE, and all binaries were built in release mode.

    Again, I am stressing the fact that this is not meant as "proof" of C++'s superiority!  It is merely an example of its strengths over C# and managed languages in general.  There are also strengths of C# over C++.  It's important to keep your perspective on these issues.

    Microsoft DirectX/XNA MVP
  • 1/16/2008 11:04 AM In reply to

    Re: How fast is it?

    Kris:

    Anyways, all of this is pretty much MOOT, since all of the real pro coding is now done in HLSL anyways.


    Only in the graphics domain.
    Microsoft DirectX/XNA MVP
  • 1/16/2008 11:06 AM In reply to

    Re: How fast is it?

    ArcaneDreams:

    How far along in the industry in years are you guys? Please don't tell me you are a bunch of 23 year olds. :(


    I'm 22.

    I'm not "in" the games industry.  I'm a student, and game programming is just a hobby.  At least for right now.


    Microsoft DirectX/XNA MVP
  • 1/16/2008 1:26 PM In reply to

    Re: How fast is it?

    ArcaneDreams:

    You guys are smart. I can't wait till I can actually contribute to these conversations. :p

    How far along in the industry in years are you guys? Please don't tell me you are a bunch of 23 year olds. :(

    I'm 25, just got out of college, only been in the industry about half a year.

    XNA QuickStart Engine (3D Game Engine for XNA) | My site
    "I'll be whatever I want to do!", Philip J. Fry
  • 1/16/2008 1:35 PM In reply to

    Re: How fast is it?

    Shaw-  How well do those SSE instructions execute on the Xbox 360?

    btw- No, I'm not really 14.  But I'm teaching my 9-year-old son how to program and use Modo right now, so he should be where I'm at now by the time he's 14. :0
  • 1/16/2008 4:03 PM In reply to

    Re: How fast is it?

    Kris:

    Shaw-  How well do those SSE instructions execute on the Xbox 360?


    On Xbox, you use AltiVec/VMX.  But you can't do that without a dev kit.  It's the same deal: with C++ you can, with C# you cannot.
    Microsoft DirectX/XNA MVP
  • 1/16/2008 4:19 PM In reply to

    Re: How fast is it?

    David Hunt:
    Machaira:

    Oh, and I'm probably the old man around here at 41 (although I think Z-Man is catching up to me ;) ).

    Nope. 45 here.

    Yea!!!!

     * Machaira feels better about himself

    Jim Perry - Microsoft XNA MVP