As Aeon hinted above, the issue with managed code is not the general case but the specific cases where the .NET runtime is not as optimized as some would wish it were. In the general case, you're not going to see much of a difference, if any. IL is compiled to machine code, after all, and the CLR does a decent job of generating that machine code. However, you do get differences in special cases, like floating-point vector operations. I have yet to see the Desktop CLR generate proper vector code for operations such as 4x4 matrix multiplication.
For example, the optimized JIT'ed code for XNA's Matrix.Multiply will use fmul and faddp on x86 to do the multiplication operation-by-operation (disassembly given by cordbg). Clearly, properly vectorized SSE instructions would perform better (see table below), but there are a variety of reasons the CLR does not do this (yet), the greatest of which is the time vs. benefit ratio. Very rarely will you see traditional .NET programs bottlenecked by floating-point performance. Games and other math-heavy applications are an exception here, but they are the minority in the managed world. The majority of .NET applications are business applications that are much more concerned with productivity and reliability than vectorized math operations. So why should the CLR team cater to the minority?
This is where native languages like C++ score their big wins in special-case performance. The Visual Studio 2005/2008 C++ compiler can somewhat vectorize code like 4x4 matrix multiplication and outperform CLR-generated code.
The big win, however, is in programmer control. When the C++ compiler fails to generate sufficiently fast code, the programmer can drop down into intrinsics or even inline assembly to squeeze the last bit of performance out of the code. For instance, you can rewrite the matrix multiplication routine with hand-optimized code using SSE2 intrinsics and score a substantial win. The big difference here between the managed world and the native world is that this is impossible in the managed world. A programmer cannot drop down to a lower level to help the CLR generate better code. What is generated for you is all you get in the managed world. In other words, you're at the mercy of the run-time code generator.
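To make "dropping down into intrinsics" concrete, here is a minimal sketch of a 4x4 multiply written with SSE intrinsics. It is not the routine that was timed for this post. The Mat4 struct, its row-major layout, and the MultiplySse name are assumptions made for illustration, and the single-precision operations used here only need the original SSE header.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Hypothetical 4x4 row-major float matrix, 16-byte aligned so that the
// aligned load/store intrinsics below are valid. Not XNA's Matrix type.
struct alignas(16) Mat4 {
    float m[4][4];
};

// Computes out = a * b, producing one full output row per loop iteration.
// Row i of the result is a[i][0]*b.row0 + a[i][1]*b.row1 + a[i][2]*b.row2
// + a[i][3]*b.row3, and each of those terms is a single 4-wide multiply.
inline void MultiplySse(const Mat4& a, const Mat4& b, Mat4& out) {
    // Load the four rows of b once; they are reused for every output row.
    const __m128 b0 = _mm_load_ps(b.m[0]);
    const __m128 b1 = _mm_load_ps(b.m[1]);
    const __m128 b2 = _mm_load_ps(b.m[2]);
    const __m128 b3 = _mm_load_ps(b.m[3]);

    for (int i = 0; i < 4; ++i) {
        // Broadcast each scalar of row i of a across all four lanes.
        __m128 r = _mm_mul_ps(_mm_set1_ps(a.m[i][0]), b0);
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a.m[i][1]), b1));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a.m[i][2]), b2));
        r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(a.m[i][3]), b3));
        _mm_store_ps(out.m[i], r);
    }
}
```

The sketch issues 16 packed multiplies and 12 packed adds, where the operation-by-operation version needs 64 scalar multiplies and 48 scalar adds. A tuned routine would likely go further, for example by unrolling the loop and scheduling instructions by hand.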
That said, I'm not trying to bash .NET or C# or any managed language. I use XNA and .NET for some of my work, and I enjoy using them. They're fun and productive, but you also have to use them for what they are and know their limitations, just like with any language/product. Unless you're rewriting Havok and are knowledgeable in the x86 architecture and instruction set, chances are you will write C# code that is just as fast as C++, and be more productive doing it. That's the big win for the majority of .NET developers who really don't care and don't need to know what's going on internally. Writing GUI front-ends and database interaction back-ends will practically never require these sorts of low-level optimizations.
To give some concrete numbers to the vector math case, I compared XNA's Matrix.Multiply (in .NET) to what the Visual Studio 2008 C++ compiler will generate. The (native) C++ routine I used is a direct copy of what Reflector gives as MC++ output for XNA's Matrix.Multiply to make the comparison as accurate as possible. I then wrote the matrix multiplication using SSE2 intrinsics, including a version in C++/CLI. The times below are the total running time over 10,000,000 successive multiplies, where the same matrices are used each iteration to ensure cache hits and remove the dependency on possible memory access latency differences.
Machine: Core 2 Duo E6600 clocked at 3.0GHz
Managed XNA - Matrix.Multiply(ref,ref,out): 557ms
Managed w/ P/Invoke to SSE2 function: 341ms
Native C++ - Full Compiler Optimizations: 300ms
Native C++ - hand-optimized SSE2: 138ms
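For anyone who wants to reproduce this kind of measurement, the timing loop described above can be sketched roughly as follows. This is not the harness used for the numbers in this post. Mat4f, MultiplyScalar, and TimeMultipliesMs are hypothetical names, and the scalar routine is a plain triple loop rather than the Reflector-derived copy of XNA's code.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical 4x4 row-major float matrix used only for this sketch.
struct Mat4f {
    float m[4][4];
};

// Plain scalar multiply: out = a * b, one multiply and one add at a time.
inline void MultiplyScalar(const Mat4f& a, const Mat4f& b, Mat4f& out) {
    Mat4f r;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a.m[i][k] * b.m[k][j];
            }
            r.m[i][j] = sum;
        }
    }
    out = r;  // written last so out may alias a or b
}

// Runs `multiply` on the same two matrices `iterations` times and returns
// the total elapsed wall-clock time in milliseconds.
template <typename MultiplyFn>
double TimeMultipliesMs(MultiplyFn multiply, const Mat4f& a, const Mat4f& b,
                        std::size_t iterations, Mat4f& result) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        multiply(a, b, result);
    }
    const auto stop = Clock::now();

    // Read the result through a volatile so it counts as used. This alone
    // does not stop an optimizer from collapsing the loop; a real benchmark
    // needs more care (and a look at the generated code).
    volatile float sink = result.m[0][0];
    (void)sink;

    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A call such as TimeMultipliesMs(MultiplyScalar, a, b, 10000000, out) would mirror the iteration count used above. Absolute times will of course depend on the machine and compiler settings.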
Again, the point of this test is not to "prove" that C#/.NET is slower than C++ in the general case! The point is to show that there do exist special cases where native code clearly and substantially outperforms managed code. It is up to the developer/team to decide if the performance requirements of the program are better suited to C# (which is very productive and fast in the general case) or C++ (which can be less productive, but gives more opportunity for low-level optimizations).
There are more special cases than just the vector math case, but the important thing to remember from all this is that the cases where you will notice a speed difference and can achieve significantly better performance in C++ are just that: special cases. In general, you will not notice the difference in everyday usage.
For the purpose of this discussion, the Xbox CLR has been intentionally left out. We are all aware of the performance issues regarding it, and just bringing them up again is an exercise in futility.