XAudio2 is an API aimed at games, so it has to handle many voices at once, potentially hundreds at a time. Windows Media Player plays one voice, so it can use a much higher quality, but much more CPU intensive, sample rate converter without worrying about CPU cost. That's one possible reason the quality seems different. It's a trade-off, and not a small one when you consider the number of voices we expect XAudio2 to be able to handle.
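To give a feel for the scale of that trade-off, here is a rough back-of-the-envelope sketch. The function name and the tap counts are purely illustrative assumptions (about 2 taps for a cheap linear interpolator, 64 for a long windowed-sinc filter); they are not the actual figures for XAudio2's or WMP's converters.

```cpp
#include <cstdint>

// Rough estimate of multiply-accumulate operations per second spent on
// sample rate conversion alone. Hypothetical helper for illustration only.
// tapsPerSample is an assumed filter length, not a measured XAudio2/WMP value.
constexpr std::uint64_t resamplerMacsPerSecond(std::uint64_t voices,
                                               std::uint64_t outputRateHz,
                                               std::uint64_t channels,
                                               std::uint64_t tapsPerSample) {
    // Each output sample of each channel of each voice costs one MAC per tap.
    return voices * outputRateHz * channels * tapsPerSample;
}
```

With these assumed numbers, one stereo voice at 48 kHz through a 64-tap filter costs about 6.1 million MACs per second, while 200 such voices would cost about 1.2 billion. The same 200 voices through a 2-tap interpolator cost about 38 million.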
Another reason could simply be that the playback levels aren't exactly the same, or that WMP is doing a bit of sweetening even if every effect seems to be off. We often hear that something played in WMP doesn't sound the same as it does in other apps, regardless of what the other app is. This isn't limited to XAudio2.
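One way to rule out a level mismatch is to capture the output of both players for the same source file and compare RMS levels. The following is a minimal sketch, assuming you already have both captures as float PCM buffers normalized to [-1, 1]; the function names are hypothetical, not part of any XAudio2 or WMP API.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// RMS level of a float PCM buffer in dBFS, where full scale is 1.0.
// Returns negative infinity for an empty or completely silent buffer.
double rmsDbfs(const std::vector<float>& samples) {
    if (samples.empty()) {
        return -std::numeric_limits<double>::infinity();
    }
    double sumSquares = 0.0;
    for (float s : samples) {
        sumSquares += static_cast<double>(s) * static_cast<double>(s);
    }
    const double rms = std::sqrt(sumSquares / static_cast<double>(samples.size()));
    return 20.0 * std::log10(rms);
}

// Positive result: capture A is louder than capture B, in dB.
double levelDifferenceDb(const std::vector<float>& captureA,
                         const std::vector<float>& captureB) {
    return rmsDbfs(captureA) - rmsDbfs(captureB);
}
```

If the difference is not close to 0 dB, match the levels before judging quality, since small loudness differences are easy to mistake for quality differences.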
There are a lot of things that can make WMP, or any other app, sound a bit different that have little to do with the underlying code. Until those are ruled out, we can only speculate.