XNA Creators Club Online
Page 1 of 1 (12 items)

Idea for GPU Physics

Last post 04/11/2009 22:03 by skytigercube. 11 replies.
  • 03/11/2009 19:58

    Idea for GPU Physics

    Targeting Xbox 360

    Has anybody used this approach?

    LOOP {

    - render frame to texture F
    - render F to backbuffer
    - process physics into texture P
    - present

    - render F to backbuffer
    - present

    - render F to backbuffer
    - present

    - getdata on texture P

    }

    The goals are:

    - reduce the rendering load on the CPU and GPU to a minimum
    - keep the physics update rate as high as possible
    - keep the actual framebuffer update rate at around 30 fps
    - avoid a pipeline stall by waiting 3 frames before calling GetData()

    Would this work?
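A toy Python expansion of the proposed loop. This is a sketch only: no XNA calls are made, and the action strings and helper names are hypothetical. It lists the actions of each presented frame and counts how many presents separate the physics pass from its GetData:

```python
# Frame-by-frame expansion of the proposed LOOP (toy model, no real GPU calls).
PHYSICS = "process physics into texture P"
READBACK = "GetData on texture P"


def proposed_loop(iterations):
    """Return one list of actions per presented frame."""
    frames = []
    for _ in range(iterations):
        frames.append(["render frame to texture F", "render F to backbuffer",
                       PHYSICS, "present"])
        frames.append(["render F to backbuffer", "present"])
        frames.append(["render F to backbuffer", "present", READBACK])
    return frames


def presents_before_readback(frames):
    """Count presents issued between the physics pass and its GetData."""
    count, counting = 0, False
    for actions in frames:
        for action in actions:
            if action == PHYSICS:
                count, counting = 0, True
            elif action == "present" and counting:
                count += 1
            elif action == READBACK and counting:
                return count
    return None


frames = proposed_loop(2)
scene_renders = sum("render frame to texture F" in actions for actions in frames)
print(presents_before_readback(frames), scene_renders, len(frames))  # 3 2 6
```

The count of 3 is the "wait 3 frames" assumption from the goals list; whether 3 presents actually give the GPU enough time is the question the rest of the thread is about.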
  • 03/11/2009 20:32 In reply to

    Re: Idea for GPU Physics

    I don't understand what you mean by rendering to a texture followed by present. The present operation means "make the current contents of the backbuffer visible on the screen", so that isn't a meaningful thing when you are drawing to a surface other than the backbuffer, and especially not when the stuff you are drawing represents physics data rather than graphics.

    But yes, the longer you wait after rendering something before you read it back to the CPU, the less chance of a stall. The maximum degree of pipeline latency from CPU to GPU is determined by how long you wait, so if you read back immediately, you force zero latency and thus get zero parallelism. If you wait a couple of frames before forcing synchronisation, you allow the GPU to run at most a couple of frames behind the CPU.
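The latency versus parallelism trade-off can be checked with a toy timeline in Python. This is a minimal sketch under simplifying assumptions (fixed per-frame CPU and GPU costs, no driver limit on frames in flight, a readback that costs nothing by itself); the function and its parameters are hypothetical:

```python
def simulate(num_frames, cpu_ms, gpu_ms, readback_delay):
    """Toy CPU/GPU pipeline: return (total_ms, cpu_stall_ms).

    Each frame the CPU spends cpu_ms building commands and submits them;
    the GPU then needs gpu_ms to process that frame, strictly in order.
    After submitting frame n, the CPU reads back the result of frame
    n - readback_delay and blocks if the GPU has not finished it yet.
    """
    cpu_time = 0.0   # current position on the CPU timeline
    gpu_free = 0.0   # when the GPU finishes everything submitted so far
    gpu_done = []    # gpu_done[n]: when the GPU finished frame n
    stall = 0.0
    for n in range(num_frames):
        cpu_time += cpu_ms                      # CPU builds and submits frame n
        start = max(cpu_time, gpu_free)         # GPU starts when free and submitted
        gpu_free = start + gpu_ms
        gpu_done.append(gpu_free)
        m = n - readback_delay                  # frame whose result we read back
        if m >= 0 and gpu_done[m] > cpu_time:   # not finished yet: pipeline stall
            stall += gpu_done[m] - cpu_time
            cpu_time = gpu_done[m]
    return max(cpu_time, gpu_free), stall


print(simulate(10, 10.0, 10.0, readback_delay=0))  # (200.0, 100.0)
print(simulate(10, 10.0, 10.0, readback_delay=1))  # (110.0, 0.0)
```

With immediate readback the CPU and GPU take turns, so ten frames cost 200 ms in this model; delaying the readback by one frame lets them overlap and brings it down to 110 ms.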
    XNA Framework Developer - blog - homepage
  • 03/11/2009 20:52 In reply to

    Re: Idea for GPU Physics

    Hi Shawn

    By "present" I mean whatever the XNA Framework does after the Draw() method has finished,

    which I guess is this (copied from the MDX Device and SwapChain metadata):

            // 
            // Summary: 
            //     Presents the display with the contents of the next buffer in the sequence 
            //     of back buffers owned by the device / swapchain. 
            public void Present(); 
     

    When you say "the longer you wait", must this be measured in frames,
    depending on how many back buffers XNA is using in the swapchain behind the scenes?

    The point being that once a given frame has made it down the swapchain to the framebuffer, all the associated textures will be ready for reading.

    What I am trying to ascertain is:

    - how many back buffers are in the XNA Xbox 360 swapchain? Can I safely assume 3?
    - that calling GetData() after the texture is completely rendered will not result in a pipeline stall


  • 03/11/2009 21:23 In reply to

    Re: Idea for GPU Physics

    skytigercube:
    by "present" I mean whatever XNA Framework does after the Draw() method has finished


    Sure. I don't understand what that has to do with GPU physics, though. Present is all about making graphics appear on the screen. When you are doing GPU physics, this is irrelevant and you would never call it.

    Likewise the concept of a swapchain is also about making graphics appear on the screen, and therefore irrelevant to GPU physics.

    What matters in this context is that there is a command buffer being passed from CPU to GPU, which contains some sequence of actions involving some combination of textures, buffers, rendertargets, shaders, etc. The GPU processes these commands in order at some time after the CPU submits them. If the CPU tries to access a resource while there are outstanding references to it in the not-yet-processed part of the command buffer, you get a stall. The only thing that matters here is what commands are in the buffer, and how far behind the GPU is in processing these commands: the concepts of swapchain and present have nothing to do with this.
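The rule in the paragraph above (a stall happens only when the CPU touches a resource that not-yet-processed commands still reference) can be written down as a toy Python model. It is a sketch, not how any driver is implemented; the class, method, and resource names are made up for illustration:

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    resources: frozenset  # textures, buffers, rendertargets the command touches


class CommandBuffer:
    """Toy command buffer: the CPU appends, the GPU consumes strictly in order."""

    def __init__(self):
        self.pending = deque()  # submitted but not yet processed by the GPU

    def submit(self, name, *resources):
        self.pending.append(Command(name, frozenset(resources)))

    def gpu_process(self, count):
        """The GPU finishes the oldest `count` commands (or all of them)."""
        for _ in range(min(count, len(self.pending))):
            self.pending.popleft()

    def cpu_access_would_stall(self, resource):
        """True while any unprocessed command still references `resource`."""
        return any(resource in cmd.resources for cmd in self.pending)


cb = CommandBuffer()
cb.submit("draw physics pass", "P", "physicsEffect")
cb.submit("draw scene", "F")
cb.submit("present", "backbuffer")      # just another command in the queue
print(cb.cpu_access_would_stall("P"))   # True: GetData on P now would stall
cb.gpu_process(1)
print(cb.cpu_access_would_stall("P"))   # False: the GPU has consumed the physics pass
```

Note that nothing in the model knows about swapchains: only the queue contents and how far the GPU has got decide whether an access stalls.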
    XNA Framework Developer - blog - homepage
  • 03/11/2009 22:24 In reply to

    Re: Idea for GPU Physics



    As I understand it:

    The commands that I send to the GPU for physics will be associated with the current backbuffer,

    in the same way that calling Clear() clears the current backbuffer,

    and the only guarantee available is that by the time that backbuffer is resolved and swapped to the frontbuffer,

    those commands *will* have been executed.

    Although the driver/GPU might see that there is no actual dependency between the end result of the backbuffer and my physics commands,
    it cannot defer execution of those commands until after the associated backbuffer has been swapped, because that would be crazy.

    Also, there is never any way to guarantee the commands are executed *before* the associated backbuffer has been swapped to the front.

    That is why (I believe) the execution of the DirectX Present() function is relevant to my physics processing.

    If I were using DirectX, maybe I could set up a separate swapchain purely for physics, with a swap rate much higher than for rendering.

    But in XNA my only choice is to use every frame for physics and 1 in N frames for rendering.

    Which leads me back round (like a swapchain) to the two critical pieces of information I need:

    - how many backbuffers are in the XNA Xbox 360 swapchain
    - whether reading back a texture that is fully resolved on the GPU will cause stalls or slowdowns

    Here is a picture:

    -- frame 1 draw() method --------------------

    swapchain = {1, 2, 3} current back buffer = 1, 3 is visible

    render to physics texture
    render to texture
    render texture to backbuffer

    -- frame 2 draw() method --------------------

    swapchain = {3, 1, 2} current back buffer = 3, 2 is visible (GPU *may* have started processing my physics commands)
    render texture to backbuffer

    -- frame 3 draw() method --------------------

    swapchain = {2, 3, 1} current back buffer = 2, 1 is visible *** (my physics texture has been resolved!)
    render texture to backbuffer

    -- frame 4 draw() method --------------------

    swapchain = {1, 2, 3} current back buffer = 1, 3 is visible
    read from physics texture
    process physics
    render to physics texture
    render to texture
    render texture to backbuffer

    -- etc. --------------------

    [EDIT] Maybe this explains it better:

    Using two backbuffer textures F1 F2 
    and two physics textures P1 P2 
     
    BackBuffer  |2|1|1|1|1|2|2|2|2|1|1|1|1|2|2|2|       which backbuffer texture is rendered 
    Render      |1| | | |2| | | |1| | | |2| | | |       when we render to backbuffer texture# 
    PWrite      | |1| | | |2| | | |1| | | |2| | |       which physics texture we write 
    PRead       | | |2| | | |1| | | |2| | | |1| |       which physics texture we read 
    Update      | | | |x| | | |x| | | |x| | | |x|       when we update game state 
     
    This should spread the workload as evenly as possible
    and ensure plenty of swapchain presents before reading the physics data back to the CPU.
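The table follows a fixed 4-frame pattern, so it can be regenerated from the frame number alone. This Python sketch only checks the schedule's logic under the table's own conventions (texture numbers 1 and 2); the function name is hypothetical:

```python
def schedule(num_frames):
    """Rebuild the 4-frame, double-buffered schedule table row by row."""
    rows = {"BackBuffer": [], "Render": [], "PWrite": [], "PRead": [], "Update": []}
    for f in range(num_frames):
        cycle, phase = divmod(f, 4)
        current = cycle % 2 + 1   # texture pair used by this cycle: 1 or 2
        other = 3 - current       # the other texture of the pair
        # Show the most recently finished scene texture: in phase 0 the
        # current one is being rendered, so the other one is shown.
        rows["BackBuffer"].append(str(other if phase == 0 else current))
        rows["Render"].append(str(current) if phase == 0 else " ")
        rows["PWrite"].append(str(current) if phase == 1 else " ")
        rows["PRead"].append(str(other) if phase == 2 else " ")
        rows["Update"].append("x" if phase == 3 else " ")
    return {name: "|" + "|".join(cells) + "|" for name, cells in rows.items()}


for name, row in schedule(16).items():
    print(f"{name:<11}{row}")
```

In this pattern each PRead reads the physics texture written 5 frames earlier (PWrite in phase 1 of one cycle, PRead in phase 2 of the next), but the schedule by itself cannot guarantee that the GPU has finished that texture by then.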
     
     


  • 03/11/2009 23:26 In reply to

    Re: Idea for GPU Physics

    skytigercube:
    also there is never anyway to guarantee the commands are executed *before* the associated backbuffer has been swapped to the front

    that is why (I believe) the execution of the DirectX Present() function is relevant to my physics processing


    That's not how it works.

    The GPU processes a series of drawing commands. It processes these commands in the same order they are submitted (but usually some time later than they are submitted). Some of these commands draw data. Others move data around. Present is just one specific command that makes data visible on the screen. Just like any other command, it is processed in order whenever the GPU gets around to it. Commands aren't tied to buffers like you seem to think: they're just processed in order. The backbuffer is not special: it's just a rendertarget similar to any other rendertarget you might create yourself, except with the special property that its contents can be displayed on the screen by the present command.

    So the question is not what buffer you are drawing to or whether you call present or what swapchain you are using. The question is how many commands are in the GPU command buffer, what order they are in, and making sure you leave enough time for the GPU to have finished processing a command before you try to access the output of that command.

    skytigercube:
    - how many backbuffers in the xna xbox360 swapchain


    This is 100% irrelevant to what you are doing.
    XNA Framework Developer - blog - homepage
  • 04/11/2009 0:31 In reply to

    Re: Idea for GPU Physics


    What other approach is there "to make sure you leave enough time for the GPU" than counting frames?

    And the number of frames you count equals the number of backbuffers in the swapchain.

    Are you suggesting that I wait a number of milliseconds?

    In one of your GPU particle articles (the one with the circular buffer) you mentioned waiting 2 frames "to ensure the gpu has finished reading that part of the texture".

    Is that not the exact same logic I am describing here?
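One way to make "wait N frames" concrete is a ring of render targets used round robin: each frame writes the newest slot and reads back the oldest. A minimal Python sketch, where the class is hypothetical and a slot stands in for a physics render target:

```python
class ReadbackRing:
    """Round-robin ring of `size` hypothetical physics render targets.

    Each frame the GPU writes one slot and the CPU reads back the oldest
    slot, which was written size - 1 frames earlier.
    """

    def __init__(self, size):
        if size < 2:
            raise ValueError("need at least two slots to delay the readback")
        self.size = size
        self.written_on = [None] * size  # frame on which each slot was last written

    def frame(self, n):
        """Return (slot written, slot read back, frame the read slot was written)."""
        write_slot = n % self.size
        read_slot = (n + 1) % self.size  # the oldest slot in the ring
        self.written_on[write_slot] = n
        return write_slot, read_slot, self.written_on[read_slot]


ring = ReadbackRing(3)
for n in range(5):
    print(n, ring.frame(n))
# 0 (0, 1, None) and 1 (1, 2, None): nothing old enough to read yet
# from frame 2 on, each readback is of data written 2 frames earlier
```

The ring fixes how many frames you wait (size - 1), but it says nothing about whether the GPU has actually finished within that many frames.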

  • 04/11/2009 1:11 In reply to

    Re: Idea for GPU Physics

    There isn't any scientific way to know exactly how long to wait, because you never know exactly how far behind the CPU the GPU is running. This is entirely determined by the GPU hardware, driver, and what workload you are giving it.

    So you're basically just picking an arbitrary amount of time, waiting that long, and crossing your fingers that you waited long enough. If it turns out this wasn't long enough for your particular hardware/driver/workload, you'll get a stall, in which case you need to wait longer to fix that.

    Whether you measure the wait time in frames, seconds, CPU instructions, etc, is irrelevant since the amount of time you are waiting was an arbitrary choice in the first place.

    Yes, this means it is not possible to read back data from GPU to CPU in a timely fashion while maintaining good CPU/GPU parallelism. You have three choices: either don't read back data, or accept that your data will be significantly late, or give up on parallelism and take the performance hit of a pipeline stall.
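The three choices trade stalls against data age. This Python sketch counts, for a made-up trace of how far the GPU lags, how many readbacks would stall for each fixed wait; the trace and the function are hypothetical:

```python
def stalls_per_wait(lag_trace, max_wait):
    """For each fixed wait k (in frames), count frames on which readback stalls.

    lag_trace[n] means: when the CPU reads back on frame n, the GPU has
    finished every frame submitted `lag` or more frames ago. Data submitted
    k frames ago is therefore ready only if k >= lag; otherwise GetData
    would stall. The wait k is also how old the data is when it arrives.
    """
    return {k: sum(1 for lag in lag_trace if lag > k) for k in range(max_wait + 1)}


# Made-up lag trace; a real one depends on hardware, driver and workload.
trace = [1, 1, 2, 1, 3, 1, 2, 1]
print(stalls_per_wait(trace, 3))  # {0: 8, 1: 3, 2: 1, 3: 0}
```

For this particular trace a wait of 3 frames never stalls, but that number is a property of the trace: a heavier workload or a different GPU produces a different trace and needs a different wait.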

    XNA Framework Developer - blog - homepage
  • 04/11/2009 11:51 In reply to

    Re: Idea for GPU Physics

    In DirectX, the Present() function blocks until the swap is complete (the swap of the next backbuffer to the frontbuffer).

    That means that if there is work outstanding before the next backbuffer can be swapped to the front, your CPU will block on Present().

    That is why a good trick in DX is to issue the GPU commands, do some other work, then call Present().

    This increases the probability that Present() will not block, because the backbuffer that is next to be swapped to the front
    is ready to be swapped once all the commands in the queue have been processed.

    In other words, the CPU is always synced to the GPU based on the number of backbuffers in the swapchain.

    Quote from DX Docs:
    To enable maximal parallelism between the CPU and the graphics accelerator, it is advantageous to call IDirect3DDevice9::EndScene as far ahead of calling present as possible
    because EndScene() flushes commands to the queue and Present() blocks until the commands (for the next present that will be visible) are processed

    Quote from another forum:
    Present() is often used as a throttle - it can't flip the buffers until any *pending* draw operations are out of the queue, and it'll also stop the application getting too far ahead (usually 2-3 frames ahead is the most you'll be allowed)...

    The reason this stops you from adding commands to the GPU command queue is:

    - until Present() returns, you cannot call BeginScene()
    - without a call to BeginScene(), you cannot draw anything

    Even if you call Present with the DONOTWAIT option, you still have to call it repeatedly until it stops returning WASSTILLDRAWING.

    To sum up:

    Present() forces a synchronization between the CPU and GPU for the *next* frame to be swapped to the *front*,

    which means that if the number of backbuffers is 3, you need to wait 3 frames before you read back your texture
    to ensure it is resolved.

    The number of backbuffers is likely to be 2 or 3, with 4 at most.


  • 04/11/2009 15:55 In reply to

    Re: Idea for GPU Physics

    I wrote a test application with a simulated physics and rendering load
    (only tested on PC, Athlon 3870),

    re-rendering the scene to a texture once every 3 frames,

    rendering the texture to the backbuffer every frame,

    and calling GetData() after 3 frames have passed.

    So far so good.

    The biggest bottleneck is still GetData(), at between 10 and 25 milliseconds,

    but with a target of 25 to 30 fps that is OK (40 to 33.333 milliseconds available per frame),

    and I get around 150 fps, which equates to 150 / 3 = 50 fps for rendering.

    frame 0 = render scene to texture
    frame 1 = read physics data
    frame 2 = write physics data

    (every frame also renders the scene texture to the backbuffer)

    If I combine everything into a single frame, I get around 20 fps = dead duck.
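The frame-rate arithmetic from this post, together with its 3-frame cycle, as a small Python check (the helper names are made up for illustration):

```python
def frame_budget_ms(fps):
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps


# The 3-frame cycle from the post; every frame also draws the scene texture.
CYCLE = ("render scene to texture", "read physics data", "write physics data")


def task_for_frame(n):
    return CYCLE[n % len(CYCLE)]


print(frame_budget_ms(25), round(frame_budget_ms(30), 3))  # 40.0 33.333
print(150 / len(CYCLE))                                    # 50.0
print(round(frame_budget_ms(150) * len(CYCLE), 1))         # 20.0: one full cycle at 150 fps
```

At 150 fps one full cycle, including its single GetData, has about 20 ms to work with.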


  • 04/11/2009 16:35 In reply to

    Re: Idea for GPU Physics

    skytigercube:
    in DirectX the Present() function will block until the swap is complete (swap of next backbuffer to frontbuffer)


    This is not correct. The Present function kicks off the D3D command buffer to the driver, and blocks until the driver reports that it has scheduled the swap. That doesn't mean the swap has actually taken place yet! Drivers can and do defer the actual command processing in order to maximize parallelism.
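This reply can be sketched as a toy model in which Present only queues a swap command and returns. The blocking rule below (a cap on how many presented frames may be queued) is an assumption borrowed from the "2-3 frames ahead" quote earlier in the thread, not documented driver behavior, and all names are hypothetical:

```python
from collections import deque


class ToyDriver:
    """Present queues a swap; it does not wait for the swap to happen.

    max_frames_in_flight is an ASSUMED driver cap on presented frames that
    are queued but not yet processed by the GPU.
    """

    def __init__(self, max_frames_in_flight=2):
        self.max_frames = max_frames_in_flight
        self.queue = deque()  # commands the GPU has not processed yet
        self.swaps_done = 0   # swaps the GPU has actually carried out

    def queued_frames(self):
        return sum(1 for cmd in self.queue if cmd == "swap")

    def gpu_step(self):
        """The GPU processes the oldest queued command."""
        if self.queue.popleft() == "swap":
            self.swaps_done += 1

    def draw(self, name):
        self.queue.append(name)

    def present(self):
        """Queue a swap; return how many GPU commands the CPU waited for."""
        waited = 0
        while self.queued_frames() >= self.max_frames:
            self.gpu_step()   # CPU blocks only while the cap is reached
            waited += 1
        self.queue.append("swap")
        return waited


driver = ToyDriver()
for frame in range(3):
    driver.draw("scene")
    print(frame, driver.present(), driver.swaps_done)
# 0 0 0 and 1 0 0: Present returned, yet no swap has happened
# 2 2 1: the cap was reached, so the CPU waited for the GPU
```

Returning from Present in this model says only that the swap has been scheduled; how far behind the GPU runs is set by the queue, not by the call.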

    skytigercube:
    that is why a good trick in DX is to issue the gpu commands, do some other work, then call Present()


    I'm afraid you are mistaken. This would be a good technique if the graphics runtime used a significantly shorter pipeline than the amount of work being issued per frame, as was the case on e.g. the Nintendo GameCube, but this is not the case for DirectX. When the pipeline is deep enough to hold one or more entire frames' worth of drawing commands, so the GPU can run an entire frame or more behind the CPU, it makes no difference how you interleave your rendering commands with other CPU work.

    skytigercube:
    Quote from DX Docs:
    To enable maximal parallelism between the CPU and the graphics accelerator, it is advantageous to call IDirect3DDevice9::EndScene as far ahead of calling present as possible
    because EndScene() flushes commands to the queue and Present() blocks until the commands (for the next present that will be visible) are processed


    Heh. This was probably true some time round about the turn of the century, but it is no longer correct. These days, BeginScene/EndScene are no-ops. They are completely ignored by every modern driver. The last people who tried to make use of them as a driver hint were PowerVR with their Kyro tiled architecture, but they had so many compatibility issues as a result of games calling them incorrectly that they ended up ignoring these hints and automatically inferring when to resolve tiles by analyzing rendertarget usage patterns instead. Every subsequent driver/hardware combo that I know of has either not needed this concept at all (most Windows GPUs), or figured out what to do automatically (e.g. XNA Framework resolve behaviors on Xbox).


    XNA Framework Developer - blog - homepage
  • 04/11/2009 22:03 In reply to

    Re: Idea for GPU Physics

    OK

    I see what you are driving at ...

    ... the number of frames I need to wait can vary with the workload, regardless of the swapchain configuration, because of the command queue.

    I can write my application to wait N frames, test it on the Xbox, and if it works in the wild, I am lucky!

    I could use an occlusion query to detect when the physics texture has been rendered.

    Is the precise point in time the occlusion query transitions to IsComplete = true documented?
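The occlusion-query idea amounts to polling a marker placed in the command stream after the physics pass, and calling GetData only once the marker reports complete. The Python sketch below is a stand-in (ToyGpu and its methods are hypothetical, not the XNA OcclusionQuery API); it assumes the marker completes exactly when the GPU has processed every command before it, which is precisely the timing the post asks to have documented:

```python
class ToyGpu:
    """Counts submitted and processed commands; processing is strictly in order."""

    def __init__(self):
        self.submitted = 0
        self.processed = 0

    def submit(self, count=1):
        self.submitted += count

    def issue_query(self):
        """Place a completion marker after everything submitted so far."""
        self.submit()
        return self.submitted  # position of the marker in the command stream

    def run(self, count):
        """Let the GPU process up to `count` more commands."""
        self.processed = min(self.submitted, self.processed + count)

    def is_complete(self, query):
        return self.processed >= query


def try_readback(gpu, query):
    """Non-blocking check, done once per frame instead of a fixed wait."""
    return "GetData now" if gpu.is_complete(query) else "try again next frame"


gpu = ToyGpu()
gpu.submit(5)              # e.g. the draw calls of the physics pass
query = gpu.issue_query()
gpu.run(5)
print(try_readback(gpu, query))  # try again next frame
gpu.run(1)
print(try_readback(gpu, query))  # GetData now
```

Polling never blocks, so the cost of a late GPU is stale physics data for a frame or two rather than a stall.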

    Thanks for your efforts to educate me :-)
