XNA’s new VisualizationData class opens up many possibilities in terms of authoring visualizations and (what I’m personally more interested in) creating music-generated gameplay. However, aside from the bundled XNA documentation, there isn’t a lot of information out there on how to use the VisualizationData class in a useful way. I will attempt to address this issue through this thread. I hope to document what I’ve learned about visualization data in a way that makes sense to the musically inclined and otherwise. If time allows, I’ll supplement my findings with code samples, links to information I’ve found around the web, and some video demos on YouTube. In the tradition of NeHe’s OpenGL tutorials that I’m sure many of us grew up on, I’ll try to keep my code as simple and clean as possible so that even novice XNA developers can understand what I’m doing. Also in that tradition, we’ll start with the very basics of getting access to a song’s visualization data and steadily progress to doing some cooler stuff with it. So let’s jump right in!
Getting Started
The VisualizationData class lives under the Microsoft.Xna.Framework.Media namespace. The easiest way to get started playing with visualization data is to play some music from your Windows Media Player library through the MediaPlayer, which we’ll do now. If you want to follow along, this first project is going to take you from an empty project to building your first music visualization. So create a new Windows Game (extending this to the Xbox 360 requires a little more footwork, which we’ll talk about in future posts), and insert the following code where appropriate (remember, this stuff requires XNA 3.0):
| MediaLibrary mediaLibrary; |
| SongCollection songCollection; |
| VisualizationData visualizationData; |
| |
| public VisualizationDataGame() |
| { |
| mediaLibrary = new MediaLibrary(); |
| visualizationData = new VisualizationData(); |
| songCollection = mediaLibrary.Songs; |
| } |
The MediaLibrary object represents the music that’s been added to your Windows Media Player library. So before this will work, you’ll need to make sure Media Player has detected some songs. These songs can be MP3s, WMAs, or WAVs, as long as they aren’t DRM’d. (Side note: for some reason, I can’t get MP3s to play through XNA’s MediaPlayer on my desktop, but they work fine on my laptop. They also play fine on the desktop through actual media players such as Windows Media Player, Zune, iTunes, and WinAmp. I’ve talked to the XNA development team, and we haven’t arrived at an explanation for this behavior. So far, my desktop is the only instance of this happening, so I’d be interested in hearing from anyone else who has this problem.) Anyway, back to the code.
The SongCollection is a list of all of the songs contained in your media library, obtained through the MediaLibrary.Songs property. We’ll use this collection to tell the MediaPlayer what to play soon. VisualizationData is the object that will contain the frequency and sample data of the currently playing song.
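| protected override void Initialize() |
| { |
| // Visualization data is only filled in while this flag is on (it may default to off). |
| MediaPlayer.IsVisualizationEnabled = true; |
| MediaPlayer.Play(songCollection); |
| base.Initialize(); // lets the framework go on to call LoadContent |
| } |
In our Initialize function, we simply tell the MediaPlayer (which is a static class, so there’s no need to instantiate it) to play all of the songs in our media library from start to finish. We also call base.Initialize() so that the framework finishes its own setup. All that’s left is to get the visualization data: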
| protected override void Update(GameTime gameTime) |
| { |
| // Refresh the frequency and sample data for the current frame |
| MediaPlayer.GetVisualizationData(visualizationData); |
| base.Update(gameTime); |
| } |
Our Update function fills the visualizationData object with the currently playing song’s frequency and sample information each frame. We need to call this each frame to keep the data up to date.
Easy stuff, right? If you were to insert this code into a new Windows game and hit the Run button, you’d see the customary cornflower blue screen and the first song in your media library playing. But this thread’s supposed to be about visualizing music, not just listening to it. So let’s get to the fun stuff.
Seeing Sounds
We can go about this one of two ways: I can try my best to explain what the frequency and sample data mean in words and hope you get it. Or we can get something on the screen for us to look at, which should make our discussion of the frequency and sample data much clearer. So let’s get through a little more code, and then we’ll talk about what we’re seeing.
The first thing you’ll want to do is open up your favorite graphics editor (mine’s Paint.NET) and create a 1x1 blank (solid white) image. I called mine Blank.png. I then created a Textures subdirectory in the Content folder of my project. Place your blank image in this new subdirectory. Now we’re going to declare a new texture in our project:
| Texture2D texture; |
Simple enough. Next, I like to stick to the common screen resolution of 1280x720. This creates a widescreen viewport on which to display our visualization data. So add the following to the game’s constructor:
| public VisualizationDataGame() |
| { |
| graphics = new GraphicsDeviceManager(this); |
| graphics.PreferredBackBufferWidth = 1280; |
| graphics.PreferredBackBufferHeight = 720; |
| graphics.ApplyChanges(); |
| |
| ... |
| } |
We’ll want to initialize the texture object in the LoadContent function, so it should look something like this:
| protected override void LoadContent() |
| { |
| // Create a new SpriteBatch, which can be used to draw textures |
| spriteBatch = new SpriteBatch(GraphicsDevice); |
| texture = Content.Load<Texture2D>("Textures/Blank"); |
| } |
Finally, our Draw function is where most of the work is being done. What we’re going to do is graph our frequency and sample data on the top and bottom halves of the screen, respectively.
| protected override void Draw(GameTime gameTime) |
| { |
| GraphicsDevice.Clear(Color.Black); |
| Viewport viewport = graphics.GraphicsDevice.Viewport; |
| int x, y, width, height; |
| |
| spriteBatch.Begin(); |
We start this function off by clearing the screen to black and declaring a few variables to make things more readable later. We then tell our sprite batch to begin drawing.
| for(int f = 0; f < visualizationData.Frequencies.Count; f++) |
| { |
| x = viewport.Width * f / visualizationData.Frequencies.Count; |
| y = (int)(viewport.Height / 2 - visualizationData.Frequencies[f] * viewport.Height / 2); |
| width = 1; |
| height = (int)(visualizationData.Frequencies[f] * viewport.Height / 2); |
| spriteBatch.Draw(texture, new Rectangle(x, y, width, height), Color.White); |
| } |
In this first loop, we iterate through each element in our frequency array. For each of these elements, we draw a line, with the lines laid out side by side from left to right across the screen. Each line runs from halfway down the screen up to that frequency band’s power level (again, I’ll explain all the technical terms soon). (Technically, we’re really drawing each line from its frequency band’s power level down to the halfway point, because SpriteBatch’s Draw method takes the top-left corner of the rectangle and y grows downward in 2D screen coordinates… semantics :))
| for (int s = 0; s < visualizationData.Samples.Count; s++) |
| { |
| x = viewport.Width * s / visualizationData.Samples.Count; |
| width = 1; |
| if (visualizationData.Samples[s] > 0.0f) |
| { |
| y = (int)(0.75f * viewport.Height - visualizationData.Samples[s] * viewport.Height / 4); |
| height = (int)(visualizationData.Samples[s] * viewport.Height / 4); |
| } |
| else |
| { |
| y = (int)(0.75f * viewport.Height); |
| height = (int)(-1.0f * visualizationData.Samples[s] * viewport.Height / 4); |
| } |
| spriteBatch.Draw(texture, new Rectangle(x, y, width, height), Color.White); |
| } |
| spriteBatch.End(); |
| |
| base.Draw(gameTime); |
| } |
In this second loop, we iterate through each element in the sample array. However, in this case, sample values can be negative, so we have to do a little extra footwork to draw their lines since you can’t specify a line with a negative height in the SpriteBatch’s Draw function. (Later, when we convert this to 3D, this will be a lot more elegant).
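The if/else in that loop can also be folded into one small helper. This is just a sketch of the same arithmetic, not anything from the XNA API: the BarLayout class and its Compute method are names I made up for illustration, and it assumes values in the range -1.0 to 1.0.

```csharp
using System;

// Hypothetical helper (not part of XNA): turns a signed value into the
// top edge and height of a bar anchored at a baseline.
public static class BarLayout
{
    // value:     the sample or frequency value, assumed to be in [-1, 1]
    // baseline:  y coordinate (in pixels) of the zero line; y grows downward
    // maxHeight: height in pixels of a bar whose value is 1.0 or -1.0
    public static void Compute(float value, int baseline, int maxHeight,
                               out int y, out int height)
    {
        // The height is never negative, whichever way the bar points.
        height = (int)(Math.Abs(value) * maxHeight);

        // Positive values rise above the baseline; negative ones hang below it.
        y = value > 0.0f ? baseline - height : baseline;
    }
}
```

In the sample loop you would call BarLayout.Compute(visualizationData.Samples[s], 3 * viewport.Height / 4, viewport.Height / 4, out y, out height) and then draw the rectangle as before. The frequency loop fits the same helper with a baseline and maxHeight of viewport.Height / 2.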
And there you have it, the Hello World of music visualization! If you’ve been following along, you can build your solution and run it to watch little white lines dance along to your music. Once you get over this first sense of accomplishment, you’ll undoubtedly ask yourself what these lines mean. And we’ll explore that next. But for now, take some time to observe the visualization for yourself so that our next discussion makes sense.
If you’re lazy like me, you can download the project here (coming soon). And for the truly lethargic, you can see the results here (also coming soon).
The Science of Sound
So now that we’ve got something to look at, let’s make some sense of this visualization data. But first, a disclaimer: what you are about to read is based largely on observation and speculation. I am not on the XNA development team. I do not claim to know exactly how all this stuff works. If others know more about this stuff than I do, this is where I invite them to chime in. Now, on with the show.
If you stayed awake during Physics class, you’ll remember that sound, in its most basic form, is vibration, and that most sounds we hear are made up of many frequencies at once. Sound waves are distinguished by properties such as frequency, wavelength, and amplitude. A change in a sound wave’s frequency is heard as a change in pitch, and a change in its amplitude is heard as a change in loudness.
Music visualization is driven largely by these changes in pitch and loudness. VisualizationData’s Frequencies and Samples properties give us access to this data. Each of these properties is a collection of 256 floats (so in the example above, we could have done all of the drawing with one for loop, but for readability’s sake, I kept them separated).
Each collection of sample data gives you a snapshot (at a very low resolution) of the waveform of the currently playing song at that instant in time. The values of these elements range from -1.0 to 1.0. Unfortunately, I haven’t discovered a whole lot to do with this raw data. The real magic comes from applying a Fast Fourier Transform (FFT) to compute the frequency components that make up the sound you’re hearing. Computing the FFT involves a lot of math that I have to admit I slept through my senior year in college, but the good news is that we get this for free in the frequency data!
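One modest thing the raw samples can give you is a single loudness number per frame. Here is a minimal sketch, assuming the samples really do stay within -1.0 to 1.0. SampleStats and Rms are names I made up for illustration and are not part of XNA. The method takes an IList<float> so that it accepts a plain array as well as the Samples collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of XNA): summarizes one frame of samples.
public static class SampleStats
{
    // Root-mean-square of the samples: 0 for silence, larger for louder frames,
    // and at most 1 when every sample is within [-1, 1].
    public static float Rms(IList<float> samples)
    {
        if (samples.Count == 0)
        {
            return 0.0f;
        }

        double sumOfSquares = 0.0;
        for (int i = 0; i < samples.Count; i++)
        {
            sumOfSquares += samples[i] * samples[i];
        }

        return (float)Math.Sqrt(sumOfSquares / samples.Count);
    }
}
```

Calling SampleStats.Rms(visualizationData.Samples) once per Update would give a rough volume meter that you could map to a color or a scale factor.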
Each element of the frequency data represents one frequency band, and together the bands span 20 Hz to 20 kHz (roughly the range of sound audible to the human ear). For the less musically inclined, the sounds near the 20 Hz end would be low, like a bass drum, a tuba, or the lower notes of a piano (the ones on the left side). The sounds near the 20 kHz end would be high, like a flute, a violin, or the high notes of a piano (the ones on the right side). What complicates things a bit is that the distribution of these bands is logarithmic, which means that elements at the higher end of the spectrum each cover a wider range of frequencies than those at the lower end.
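To get a feel for what a logarithmic distribution does to the band widths, here is a rough sketch. It rests on an assumption I can’t confirm: that the bands are spaced evenly on a logarithmic scale between 20 Hz and 20 kHz. BandMath and BandEdgeHz are names I made up for illustration and are not part of XNA.

```csharp
using System;

// Hypothetical helper (not part of XNA). ASSUMPTION: the bands are spaced
// evenly on a log scale from 20 Hz to 20 kHz; the exact mapping is a guess.
public static class BandMath
{
    public const double LowHz = 20.0;
    public const double HighHz = 20000.0;

    // Approximate lower edge, in Hz, of band 'index' out of 'bandCount' bands.
    // Passing index == bandCount gives the upper edge of the last band.
    public static double BandEdgeHz(int index, int bandCount)
    {
        double fraction = (double)index / bandCount;
        return LowHz * Math.Pow(HighHz / LowHz, fraction);
    }

    // Approximate width, in Hz, of a single band under the same assumption.
    public static double BandWidthHz(int index, int bandCount)
    {
        return BandEdgeHz(index + 1, bandCount) - BandEdgeHz(index, bandCount);
    }
}
```

Under that assumption, with 256 bands the first band is only about half a hertz wide while the last one covers roughly 530 Hz, which is why the right side of the graph lumps so many more frequencies together.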
The value of each element (from 0.0 to 1.0) represents the power level of that frequency band. Take a look at this video (coming soon), which is our basic visualizer playing another song. Notice how the lower frequencies (on the left side of the screen) bounce up and down in sync with the bass drum of the beat at the beginning of the song. Similarly, the upper frequencies (on the right side of the screen) bounce up and down in sync with the snare drum of the beat. Cool!
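If you want a single number for “how hard is the bass hitting right now”, one simple approach is to average the power of a run of neighboring bands. This is only a sketch: BandLevels and Average are names I made up for illustration, and which indices count as bass is something you’d have to find by experiment.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of XNA): averages a run of frequency bands.
public static class BandLevels
{
    // Mean power of 'count' bands starting at 'start'. The range is clipped to
    // the end of the collection; an empty or invalid range yields 0.
    public static float Average(IList<float> bands, int start, int count)
    {
        int end = Math.Min(start + count, bands.Count);
        if (start < 0 || start >= end)
        {
            return 0.0f;
        }

        float sum = 0.0f;
        for (int i = start; i < end; i++)
        {
            sum += bands[i];
        }

        return sum / (end - start);
    }
}
```

For example, BandLevels.Average(visualizationData.Frequencies, 0, 32) would average the 32 lowest bands. The 32 here is an arbitrary cutoff picked for illustration, not a recommended value.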
Even if you’ve never heard the Mario Bros. theme song in your entire life (in which case you must not be from around here), you can probably tap your foot to the song in the above video if you have the slightest semblance of rhythm. So if hearing and reacting to a beat is so simple for humans, we should be able to teach our games to hear and react to beats in music as well. This is one of the topics we’ll explore in future posts. Some other topics I hope to discuss include:
- Porting our example to the Xbox 360 (only involves a few extra steps)
- Filtering out some audio data to more closely align what we hear with what we see
- On the same note, emphasizing some audio data for some purpose
- Some more applications of visualization data as visualizers in games
- Some experimental uses of visualization data to drive gameplay
Hopefully this post has been helpful in generating more thought around XNA’s new VisualizationData class. If you’ve been playing around with music visualization, add your experiences to this thread. Stay tuned for more!
Happy Holidays!
Ron