I've noticed that when I run an XNA app under NVPerfHUD, my Model index and vertex buffers get created with D3DUSAGE_SOFTWAREPROCESSING. This is not the case when running normally, and native C++ apps do not exhibit this behavior under NVPerfHUD, so I believe it is an XNA issue.
I'm using the PreparingDeviceSettings hook to select the PerfHUD adapter and set the device type to DeviceType.Reference, which PerfHUD requires in order to work (this is consistent with the NVIDIA documentation).
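For reference, here is a minimal sketch of the kind of hook I mean. It assumes an XNA Game Studio version in which GraphicsDeviceInformation still exposes DeviceType (2.0/3.x). The class name PerfHudGame, the handler name, and the substring match on "PerfHUD" in the adapter description are illustrative choices, not my exact code.

```csharp
using System;
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Hypothetical game class, used only to show where the hook is attached.
public class PerfHudGame : Game
{
    private readonly GraphicsDeviceManager graphics;

    public PerfHudGame()
    {
        graphics = new GraphicsDeviceManager(this);
        // Called before the device is created, so the settings can still be changed.
        graphics.PreparingDeviceSettings += OnPreparingDeviceSettings;
    }

    private static void OnPreparingDeviceSettings(object sender, PreparingDeviceSettingsEventArgs e)
    {
        // When the app is launched through PerfHUD, an extra adapter shows up
        // whose description contains "PerfHUD" (assumption: substring match).
        foreach (GraphicsAdapter adapter in GraphicsAdapter.Adapters)
        {
            if (adapter.Description.Contains("PerfHUD"))
            {
                e.GraphicsDeviceInformation.Adapter = adapter;
                // PerfHUD requires the reference device type on its adapter.
                e.GraphicsDeviceInformation.DeviceType = DeviceType.Reference;
                break;
            }
        }
    }
}
```

When the app is not launched through PerfHUD, no adapter matches and the default device settings are left untouched.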
I wonder if this is XNA assuming that a reference device does not support hardware vertex processing -- even though the caps say it does.
This really diminishes the usefulness of PerfHUD because the timings are all off: I see draw calls taking 5 or 6 ms of CPU time, which would not happen if the vertex processing were done on the GPU.
Does anyone know a workaround, or a way to force XNA to use hardware vertex processing under PerfHUD?