I'm trying to localize sounds for 2-channel (left/right) headphones. After listening to some binaural recordings, it's apparent that 5.1 surround sound headphones are an unnecessary gimmick, as the human auditory system uses a number of audio cues to localize sounds with just 2 channels.
Two approaches for producing such 3D audio are:
1. Record with binaural microphones (one placed inside each ear or the ears of a mannequin head), capturing the sound as it would be heard if one were actually there. This I would call the brute force approach; the auditory analogue of motion capture.
2. Use HRTF (head-related transfer functions) and position data to localize monaural sounds synthetically.
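To make the second approach concrete, here is a rough Python sketch of what I understand HRTF rendering to boil down to: the mono signal is convolved with one head-related impulse response (HRIR) per ear. The function names and the two toy impulse responses are made up for illustration; real HRIRs are measured per direction and are much longer.

```python
def convolve(signal, kernel):
    """Direct-form FIR convolution; output length is len(signal) + len(kernel) - 1."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Return (left, right) channels by filtering the mono signal once per ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses, NOT measured data: the source is assumed to be on the
# listener's right, so the right ear hears it first and at full level, while
# the left ear hears it 3 samples later at half amplitude.
TOY_HRIR_RIGHT = [1.0, 0.0, 0.0, 0.0]
TOY_HRIR_LEFT = [0.0, 0.0, 0.0, 0.5]

left, right = spatialize([1.0, -1.0], TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
print(left)
print(right)
```

The toy filters only encode a delay and a level difference between the ears. A measured HRIR pair would also carry the direction-dependent filtering of the outer ear, which as far as I understand is what lets you tell front from back.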
What, if any, HRTF data does XNA's audio system use when localizing sounds with Cue.Apply3D?
Does or can XNA itself distinguish between 2-channel speakers and headphones when rendering 3D audio?
If not, is it able to send position data to the sound card driver so the driver can use its own HRTF data for localizing sounds correctly for 2-channel headphones?
It seems to me that XNA is not capable of either of the above, and the results are not impressive: I can't tell whether a sound is coming from in front of me or behind me. If a proper HRTF were used anywhere at all between the C# code and my eardrums, the audio should sound noticeably different. If XNA doesn't use HRTFs, then it probably should. It also really should have settings for targeting headphones specifically rather than stereo speakers, unless of course all this information is passed to the sound card driver and handled properly there.
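To illustrate the front/back problem, here is a small Python sketch of a plain constant-power stereo pan. This is only my assumption about how a simple panner works, not a statement about what XNA does internally, and `pan_gains` is a made-up name. Because the gains depend only on the left/right component of the direction, a source at 30 degrees (front right) and one at 150 degrees (back right) end up with the same left and right levels.

```python
import math


def pan_gains(azimuth_deg):
    """Constant-power pan; azimuth 0 = straight ahead, 90 = right, 180 = behind."""
    x = math.sin(math.radians(azimuth_deg))  # left/right component in -1..1
    angle = (x + 1.0) * math.pi / 4.0        # map -1..1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)  # (left gain, right gain)


front = pan_gains(30.0)   # source in front, to the right
back = pan_gains(150.0)   # source behind, to the right
print(front)
print(back)
```

With nothing but level differences, the two positions are indistinguishable, which matches what I'm hearing.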
The most realistic 3D sounds you'll ever hear, besides actually being there, are going to come from in-ear 2-channel headphones. Example:
http://youtube.com/watch?v=wT1XuB95qMk (use in-ear headphones or it will suck)
External speakers, unfortunately, have "sweet spots", and 5.1 and even 7.1 surround is cheesy and sounds flat if you're outside the sweet spot. It's also lazy to handle 3D audio by simply balancing the sound across 5 or 7 speakers, and I certainly hope such an approach is never used for 2 speakers (that would be stereo, the stuff that sounds like it's inside your head rather than all around you). All modern games should use HRTFs to properly localize sounds for 2-channel headphones. Binaural recordings won't cut it, because characters are always spinning around relative to sound sources (unless of course you record the sound a hundred times from various positions, and cue/transition to the correct recording as the player turns while keeping them all synchronized).
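For what it's worth, the "record it from many positions" workaround would need bookkeeping roughly like the Python sketch below: work out where the source sits relative to the direction the player is facing, then crossfade between the two nearest recordings. The function names and the 10 degree spacing are hypothetical and only there to show the idea.

```python
import math


def relative_azimuth(listener_xy, heading_deg, source_xy):
    """Angle of the source relative to the listener's facing direction, in [0, 360).

    Heading and result are measured clockwise from the +y axis
    (0 = straight ahead, 90 = to the right).
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    bearing = math.degrees(math.atan2(dx, dy))  # clockwise from +y
    return (bearing - heading_deg) % 360.0


def crossfade_pair(azimuth_deg, step_deg=10.0):
    """Pick the two recordings bracketing the azimuth and the blend weight of the upper one."""
    lower = math.floor(azimuth_deg / step_deg) * step_deg
    upper = (lower + step_deg) % 360.0
    weight = (azimuth_deg - lower) / step_deg
    return lower, upper, weight


# The source is straight up the +y axis, but the player has turned to face +x,
# so the sound should now appear on the player's left.
azimuth = relative_azimuth((0.0, 0.0), 90.0, (0.0, 1.0))
pair = crossfade_pair(274.0)
print(azimuth)
print(pair)
```

Every sound would need a full set of synchronized recordings for this to work, which is exactly why filtering a single mono source with an HRTF looks like the more practical route.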