Well, to be honest, it is not clear to me how XNA handles audio internally. But I do know XAudio2,
and it seems to me that the problem isn't calling Play(...) at the exact time of the beat:
I made a small test where I update every 20 milliseconds ( TargetElapsedTime = new TimeSpan(0, 0, 0, 0, 20) )
and call Play() using a 1-sample impulse ...