Hello.
My app uses one thread to render and the main thread to update. After seeing some Gamefest presentations, I am trying to implement multiple buffering. When I create the swap chain with more than one buffer (I assume that passing 1 as the BufferCount means just the front buffer), performance improves considerably.
So, is that how multiple buffering is implemented, with DXGI just managing it?
-
There are multiple forms of multi-buffering. The multi-buffering the swap chain does is mainly about avoiding rendering to the front buffer (which is typically slow and can produce tearing). Multi-buffering is also used to enable parallelism between multiple processing units. Traditionally, multi-buffering was used to ensure the GPU could work on frame N while the CPU worked on frame N+1. Now, with multicore CPUs, the same technique is being discussed as a way to enable parallelism between CPU cores. DXGI manages the swap chain aspects, while the application must manage the parallelism aspects.
-
Ok, thank you.
I am interested in enabling parallelism between threads (the typical Update/Render paradigm). Could you give me some advice about it?
The thing is, I have researched this, but every page I found talks about double buffering in DirectX 9: the application updates one frame while the runtime manages the front buffer, with nothing about managing multiple buffers.
-
I'm unsure what your 'typical' Update/Render paradigm entails, but the concept is rather simple. The Update thread usually hands off data to the Render thread. If you only have one piece of data, it is usually protected with synchronization (which lets only one thread manipulate it at a time), so the Update thread cannot generate new data while the Render thread is working on it. The general concept of multi-buffering is to have two pieces of data and ping-pong between them, so that Update can generate new data concurrently while Render works on the previous data. In the past, it has been intuitive to multi-buffer D3D resources, such as textures and queries. But recently it has become more popular to attempt to multi-buffer the graphics command stream itself.
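A minimal sketch of that ping-pong idea in standard C++, assuming std::thread, std::mutex and std::condition_variable stand in for whatever threading API the application actually uses. FrameData and PingPongBuffer are hypothetical names. Update writes into one buffer while Render reads the other; Publish() flips them and only waits if Render has not finished with the previously published buffer.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-frame data produced by Update and consumed by Render.
struct FrameData {
    int frameNumber = 0;
    std::vector<float> positions;
};

// Two buffers: Update fills one while Render consumes the other, then they swap roles.
class PingPongBuffer {
public:
    // Update side: the buffer Update may write without taking the lock.
    FrameData& WriteBuffer() { return buffers_[writeIndex_]; }

    // Update side: hand the finished buffer to Render and flip.
    // Waits only if Render is still busy with the previously published buffer.
    void Publish() {
        std::unique_lock<std::mutex> lock(mutex_);
        consumed_.wait(lock, [this] { return !ready_; });
        readIndex_ = writeIndex_;
        writeIndex_ = 1 - writeIndex_;
        ready_ = true;
        produced_.notify_one();
    }

    // Render side: wait for a published buffer, render it outside the lock, then release it.
    template <typename RenderFn>
    void Consume(RenderFn&& render) {
        std::unique_lock<std::mutex> lock(mutex_);
        produced_.wait(lock, [this] { return ready_; });
        const int index = readIndex_;
        lock.unlock();
        render(buffers_[index]);   // Update is writing the other buffer meanwhile
        lock.lock();
        ready_ = false;
        consumed_.notify_one();
    }

private:
    std::array<FrameData, 2> buffers_;
    int writeIndex_ = 0;   // only changed by the Update thread (inside Publish)
    int readIndex_ = 1;
    bool ready_ = false;   // true while a published buffer belongs to Render
    std::mutex mutex_;
    std::condition_variable produced_, consumed_;
};
```

The render thread calls Consume() in its loop; the update thread fills WriteBuffer() and calls Publish() once per frame.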
-
Brian Klamik: The general concept of multibuffering is to have two pieces of data and ping-pong between the two, such that Update can concurrently generate data, while Render works on the previous data.
This is what I was referring to. Any hints on how to achieve it?
-
Hi trogo,
I want to implement double buffering.
Could you tell me how I can create a thread for rendering?
Thank you.
-
duodevil: Hi trogo,
I want to implement double buffering.
Could you tell me how I can create a thread for rendering?
Thank you.
Sure.
I have posted the code that creates the thread on pastebin.com, along with the thread function itself.
-
It sounds like you need to double-buffer at the application layer, which has nothing to do with Direct3D.
Your Update function will want to mutate the state of the game (a bunch of combinations of position/orientation/velocity/model/effect information in a scene graph, typically).
Your Render function will want to present a scene based on that same scene graph information.
To avoid a read/write hazard, you have to make a copy of your scene graph data each frame and hand it off to the render thread, while the Update function goes on mutating the scene graph in place. Alternatively, you can keep two scene graphs and copy from the old one to the new one each frame, but it boils down to the same thing.
So, your update will look something like:
```
update_loop() {
    if (!rendering_running) {
        copy_data_to_renderer(game_state);
        rendering_kickoff();
    }
    update_one_physics_step();
}

render_loop() {
    wait_for_kickoff();
    render_state(game_state_copy);
    rendering_running = false;
}

rendering_kickoff() {
    rendering_running = true;
    set_event(render_event);
}
```
Yes, you'll have to figure out what the proper primitive is for each part of those loops (rendering_running needs to be volatile, you'll need to handle termination requests, etc), but this sketch should show you how it can be made to work at the application level.
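A sketch of the same loops in standard C++, assuming std::atomic<bool> plays the role of rendering_running (in standard C++, volatile alone does not order memory between threads, although MSVC's /volatile:ms behaviour adds such guarantees) and a mutex plus condition variable replaces set_event. GameState and the Game class are hypothetical. The update side never waits: if the renderer is still busy, it simply skips the hand-off for that physics step.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical game state; a real scene graph would go here.
struct GameState { int step = 0; };

class Game {
public:
    // update_loop() body: runs once per physics step on the update thread.
    void UpdateOnce() {
        if (!renderingRunning_.load()) {
            stateCopy_ = state_;               // copy_data_to_renderer(game_state)
            renderingRunning_.store(true);     // rendering_kickoff() ...
            {
                std::lock_guard<std::mutex> lock(mutex_);
                kicked_ = true;
            }
            kickoff_.notify_one();             // ... set_event(render_event)
        }
        ++state_.step;                         // update_one_physics_step()
    }

    // render_loop(): runs on the render thread until Stop() is called.
    void RenderLoop() {
        for (;;) {
            {
                std::unique_lock<std::mutex> lock(mutex_);
                kickoff_.wait(lock, [this] { return kicked_ || quit_; });   // wait_for_kickoff()
                if (!kicked_) return;          // quit requested and no frame pending
                kicked_ = false;
            }
            lastRenderedStep_ = stateCopy_.step;   // render_state(game_state_copy)
            ++framesRendered_;
            renderingRunning_.store(false);        // the copy may now be overwritten
        }
    }

    // Termination request for the render loop.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            quit_ = true;
        }
        kickoff_.notify_one();
    }

    int FramesRendered() const { return framesRendered_; }
    int LastRenderedStep() const { return lastRenderedStep_; }

private:
    GameState state_;        // mutated only by the update thread
    GameState stateCopy_;    // written by update only while renderingRunning_ is false
    std::atomic<bool> renderingRunning_{false};
    std::mutex mutex_;
    std::condition_variable kickoff_;
    bool kicked_ = false;
    bool quit_ = false;
    int framesRendered_ = 0;      // render-thread only (read after join)
    int lastRenderedStep_ = -1;
};
```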
Jon Watte, Direct3D MVP
-
OMG! Thanks a lot, jwatte.
I am currently restructuring my codebase (switching it to OOP), so it will be some time before I implement this algorithm.
-
I'm trying to implement multithreading in my application as well, and after reading this post I wonder if I'm doing it the right way.
I'm not using multiple buffers on the D3D side, but working with a self-made buffer scheme.
I'm still testing what works best.
First of all, my render thread uses a "render queue", which is a list of commands to be executed by the render thread.
For example, when I call RenderManager::SetVertexBuffer(), it adds a new command to the list rather than executing the code directly.
The command code is pretty simple, using derived classes:
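```cpp
#include <d3d10.h>

class IRenderCommand
{
public:
    virtual ~IRenderCommand() {}
    virtual void ExecuteCommand() = 0;
};
```
A sample command for SetIndexBuffer:
```cpp
class CmdSetIB : public IRenderCommand
{
public:
    void ExecuteCommand()
    {
        // ID3D10Device::IASetIndexBuffer also takes the index format and a byte offset
        pDevice->IASetIndexBuffer( pBuffer, Format, Offset );
    }
public:
    ID3D10Device* pDevice;
    ID3D10Buffer* pBuffer;
    DXGI_FORMAT   Format;   // e.g. DXGI_FORMAT_R16_UINT or DXGI_FORMAT_R32_UINT
    UINT          Offset;   // byte offset of the first index
};
```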
The render thread just loops through the command list and executes each command.
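A self-contained sketch of such a render queue, assuming a hypothetical CmdLog command that records its name instead of calling Direct3D, so the example runs anywhere. RenderQueue is also a hypothetical name. The queue itself is not thread-safe: the hand-off between the update and render threads still relies on the BeginRender/RenderComplete events.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Same shape as the IRenderCommand interface above.
class IRenderCommand {
public:
    virtual ~IRenderCommand() = default;
    virtual void ExecuteCommand() = 0;
};

// Hypothetical command that records its name instead of calling Direct3D.
class CmdLog : public IRenderCommand {
public:
    CmdLog(std::vector<std::string>& log, std::string name) : log_(log), name_(std::move(name)) {}
    void ExecuteCommand() override { log_.push_back(name_); }
private:
    std::vector<std::string>& log_;
    std::string name_;
};

// Filled by the update thread, drained by the render thread (never at the same time).
class RenderQueue {
public:
    template <typename Cmd, typename... Args>
    void Add(Args&&... args) {
        commands_.push_back(std::make_unique<Cmd>(std::forward<Args>(args)...));
    }

    // Executes every command in submission order, then empties the queue.
    void ExecuteAll() {
        for (auto& cmd : commands_) cmd->ExecuteCommand();
        commands_.clear();
    }

    std::size_t Size() const { return commands_.size(); }

private:
    std::vector<std::unique_ptr<IRenderCommand>> commands_;
};
```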
Currently the update and render threads run at the same speed, so the update thread does:
- Wait for 'FrameTick' event
- Update renderable objects (fill the command queue)
- Fire 'BeginRender' event
- Update non-renderable objects
- Wait for 'RenderComplete' event
This makes my update and render threads run at the same speed.
So I fill my render command list and fire up the render thread (which should be idle at this point), which processes and empties the command list.
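A portable sketch of this lock-step scheme. AutoResetEvent is a hypothetical stand-in for a Win32 auto-reset event, built on std::condition_variable; the render queue is reduced to a vector of ints, and the 'FrameTick' wait is left out. Because the update thread waits for 'RenderComplete' every frame, a single queue is enough: the two threads never touch it at the same time.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal auto-reset event: Set() releases one Wait(), then the event resets itself.
class AutoResetEvent {
public:
    void Set() {
        { std::lock_guard<std::mutex> lock(m_); signaled_ = true; }
        cv_.notify_one();
    }
    void Wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;   // auto-reset: consume the signal
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Runs `frames` lock-stepped frames; returns how many commands the render thread executed.
int RunLockStep(int frames, int commandsPerFrame) {
    AutoResetEvent beginRender, renderComplete;
    std::vector<int> commandQueue;   // the single render queue (one buffer)
    int executed = 0;
    bool quit = false;

    std::thread render([&] {
        for (;;) {
            beginRender.Wait();
            if (quit) break;
            executed += static_cast<int>(commandQueue.size());   // "execute" the commands
            commandQueue.clear();
            renderComplete.Set();
        }
    });

    for (int f = 0; f < frames; ++f) {
        for (int c = 0; c < commandsPerFrame; ++c) commandQueue.push_back(c);  // update renderables
        beginRender.Set();                                                     // fire BeginRender
        // ... update non-renderable objects here, in parallel with rendering ...
        renderComplete.Wait();                                                 // wait for RenderComplete
    }
    quit = true;          // published to the render thread by the event's mutex
    beginRender.Set();
    render.join();
    return executed;
}
```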
Now I wonder: how much would buffering help my game code?
I could do something like this:
- Wait for 'FrameTick' event
- Update renderable objects (fill the command queue)
- Fire 'SwapRenderBuffer' event (this makes the render thread copy the command queue)
- Update non-renderable objects
This would let the render thread keep looping without delays, processing the same command queue until the 'SwapRenderBuffer' event is fired to make it copy the new command list.
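One way to sketch this second method, assuming a published command list can be treated as an immutable snapshot: the update thread publishes a new list through a std::shared_ptr, and the render thread grabs the newest one each frame, re-rendering the same list until a newer one arrives. The lock is held only long enough to swap a pointer, not to copy the list. CommandList and LatestCommandList are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

using CommandList = std::vector<int>;   // hypothetical stand-in for the command queue

// Holds the most recently published command list.
class LatestCommandList {
public:
    // Update thread: publish a finished list (the old one stays alive while Render holds it).
    void Publish(CommandList list) {
        auto fresh = std::make_shared<const CommandList>(std::move(list));
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(fresh);
        ++version_;
    }

    // Render thread: returns the newest list (possibly the same one as last frame)
    // and, optionally, its version so the renderer can tell whether anything changed.
    std::shared_ptr<const CommandList> Acquire(unsigned* version = nullptr) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (version) *version = version_;
        return latest_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const CommandList> latest_ = std::make_shared<const CommandList>();
    unsigned version_ = 0;
};
```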
So the first method, which I'm currently testing with, is pretty simple: it just lets the update thread do some work while I'm rendering, instead of waiting for the render code to complete before continuing. The second method would probably give a higher render FPS than update FPS (is this a good thing or asking for problems?)
Besides my update/render threads, I also want to add a parallel thread for the networking system and some kind of thread pool with background workers for loading data in the background.
So my main questions: should I sync the threads or use buffering, and will rendering and updating at different FPS improve performance, or is that a bad idea?
-
Nightmare: So the update thread does:
- Wait for 'FrameTick' event
- Update renderable objects (fill the command queue)
- Fire 'BeginRender' event
- Update non-renderable objects
- Wait for 'RenderComplete' event
So, with this algorithm you are using just one buffer, aren't you?
Nightmare: Now I wonder: how much would buffering help my game code?
I could do something like this:
- Wait for 'FrameTick' event
- Update renderable objects (fill the command queue)
- Fire 'SwapRenderBuffer' event (this makes the render thread copy the command queue)
- Update non-renderable objects
This would let the render thread keep looping without delays, processing the same command queue until the 'SwapRenderBuffer' event is fired to make it copy the new command list.
And with this one you would be using at least two buffers.
Nightmare: So the first method, which I'm currently testing with, is pretty simple: it just lets the update thread do some work while I'm rendering, instead of waiting for the render code to complete before continuing. The second method would probably give a higher render FPS than update FPS (is this a good thing or asking for problems?)
That's it. With multithreading you can do other things while another thread is rendering. Using two buffers makes no sense to me if you are not using multithreading. So, since you are using multithreading, multi-buffering should work fine for you.
Multi-buffering allows you to make a scene more complex: while the rendered buffer is being drawn, the update buffer can be updated, and the scene can grow more complex as more buffers are added to your engine. Two buffers are recommended for the update thread and one for the rendering thread. This way, your render thread can consume a just-processed buffer while you are working on the other one, all while maintaining or even improving your FPS.
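The three-buffer arrangement described here is commonly implemented as a triple buffer. The sketch below is one lock-free variant, assuming a single writer (update) thread and a single reader (render) thread: the writer always owns one slot, the reader owns another, and the third slot is exchanged through a std::atomic index carrying a "fresh" flag, so neither side ever waits for the other. TripleBuffer is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

template <typename T>
class TripleBuffer {
public:
    // Writer: the slot it may fill freely.
    T& WriteSlot() { return slots_[writeIdx_]; }

    // Writer: publish the filled slot by swapping it with the shared middle slot.
    void Publish() {
        unsigned old = middle_.exchange(writeIdx_ | kFresh, std::memory_order_acq_rel);
        writeIdx_ = old & kIndexMask;
    }

    // Reader: if a fresh slot was published, take it; returns the slot to read this frame.
    const T& ReadSlot() {
        if (middle_.load(std::memory_order_acquire) & kFresh) {
            unsigned old = middle_.exchange(readIdx_, std::memory_order_acq_rel);
            readIdx_ = old & kIndexMask;
        }
        return slots_[readIdx_];
    }

private:
    static constexpr unsigned kFresh = 4;      // "new data" flag stored next to the index
    static constexpr unsigned kIndexMask = 3;
    T slots_[3]{};
    unsigned writeIdx_ = 0;                    // touched only by the writer
    unsigned readIdx_ = 1;                     // touched only by the reader
    std::atomic<unsigned> middle_{2};          // shared slot index (+ kFresh flag)
};
```

If the writer publishes twice before the reader looks, the reader simply skips the older frame; if nothing new was published, it re-reads the same slot.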
Nightmare: Besides my update/render threads, I also want to add a parallel thread for the networking system and some kind of thread pool with background workers for loading data in the background.
Just test it. But note that you should not create more hard-working threads than there are cores in the system. So, if running an Athlon 64 X2 (for example), you should not create more than 2 threads. Side note: I am using a single-core Athlon 64 and my engine works fine. Obviously the game using it doesn't use very complex scenes.
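A small helper for that rule of thumb, using std::thread::hardware_concurrency(). Note that it reports hardware threads (logical processors), which can exceed the number of physical cores, and that it may return 0 when the value is not known. WorkerThreadCount is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Number of busy ("hard-working") threads to create, capped at the hardware thread count.
unsigned WorkerThreadCount(unsigned requested) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;                  // unknown: assume a single core
    return std::max(1u, std::min(requested, cores));
}
```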
Nightmare: So my main questions: should I sync the threads or use buffering, and will rendering and updating at different FPS improve performance, or is that a bad idea?
Nope. Always sync threads, because otherwise you are going to lose control over your FPS. The main question here is: sync by using just one buffer (no sync needed at all) or sync by using multiple buffers? I tested both, and performance improved a lot with multiple buffers.
Finally, there are people who know much more than I do, so wait for an answer from them while testing your solution.
-
Time to dig up an old topic :D
I've been too busy with other stuff over the last few months to finish testing the different multithreading approaches.
Recently I started working on my engine again, but I'm considering rewriting the entire render part. Currently the engine supports both D3D9 and D3D10 through abstract classes and a separate render DLL for each API. The abstract classes impose some limitations and require a lot of extra work to make both APIs behave the same way. The render queue made this a bit more flexible, but required an insane amount of extra programming: basically, I had to create a class for each command in the D3D SDKs.
While working on other projects, I collected some ideas to improve the overall engine. One of those projects was a high-performance network server using IOCP, which also uses threading.
So instead of starting from the render engine and seeing how to make it thread-safe, I started from the threads and worked out how to design my render libraries to be compatible with, and safe to use from, those threads.
The basics of the threading model pretty much stay the same. We start with 2 threads:
1) Update thread
2) Render thread
Update thread
1) Wait for an event (shutdown or frametick event)
2) Prepare the render thread (different implementations are possible here, see below for ideas)
3) Set the render event (this will trigger the render thread to start drawing stuff to the screen)
4) Game logic
Render thread
1) Wait for an event (shutdown or render event)
2) Render everything
3) Set the frametick event (this will trigger the update thread to prepare the next frame)
Now, the command queue was a really bad idea. To reduce the work, I created some abstract classes that are far more flexible across multiple render APIs and the threading model, and are compatible with both simple and complex scene managers.
The base class for this is the IRenderObject class: any drawable object in the engine implements it. This gives a major advantage for both the multi-render-API model and scene management.
I still have a few issues to figure out, such as object loading: when and how it should happen.
This is the basic idea behind the abstract class.
In step 2 of the update thread, every visible render object is added to a list. This list has no lock or synchronization, as it is only written by step 2 of the update thread; at this point the render thread is idle and will not try to read the list. Once all visible objects have been added to the render object list, we continue to step 3 and tell the render thread to start drawing each render object. Here we no longer have to worry about the multiple render libraries. Each render library can have render object classes like StaticObject, SkyBox, ..., with all the D3D (or even OpenGL) code written inside the class. So all our render thread has to do is loop through the objects and call the Draw method.
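A minimal sketch of this render list, assuming a hypothetical IsVisible() query on IRenderObject (the post does not say how visibility is decided) and a hypothetical CountingObject standing in for a real StaticObject or SkyBox. BuildRenderList runs in step 2 while the render thread is idle, so the list needs no lock; RenderAll is what the render thread does after step 3.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Interface every drawable object implements.
class IRenderObject {
public:
    virtual ~IRenderObject() = default;
    virtual bool IsVisible() const = 0;   // hypothetical visibility query
    virtual void Draw() = 0;              // API-specific (D3D9/D3D10/OpenGL) code lives in derived classes
};

// Hypothetical stand-in for e.g. a StaticObject in one render library.
class CountingObject : public IRenderObject {
public:
    explicit CountingObject(bool visible) : visible_(visible) {}
    bool IsVisible() const override { return visible_; }
    void Draw() override { ++drawCount; }
    int drawCount = 0;
private:
    bool visible_;
};

// Step 2 (update thread, render thread idle): collect the visible objects. No lock needed.
void BuildRenderList(const std::vector<std::unique_ptr<IRenderObject>>& scene,
                     std::vector<IRenderObject*>& renderList) {
    renderList.clear();
    for (const auto& obj : scene)
        if (obj->IsVisible()) renderList.push_back(obj.get());
}

// Render thread: walk the list and call Draw(); it never needs the concrete type.
void RenderAll(const std::vector<IRenderObject*>& renderList) {
    for (IRenderObject* obj : renderList) obj->Draw();
}
```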
Now we have a good design for drawing the objects, but then comes the issue of object creation. The data itself is not the problem, as it is loaded in step 4 of the update thread or even by background workers. This data is stored in a memory buffer in the class before the D3D buffers are created.
The main issue with the render buffers is that they must be created inside the render thread. The easiest implementation would be:
```cpp
if(pRenderObject->IsDeviceObjectsCreated() == false)
{
    pRenderObject->CreateDeviceObjects();
}
pRenderObject->Render();
```
But how will this affect the render thread for large objects? During a "loading scene", where a simple GUI screen with a progress bar is shown, this should not matter much. But in some cases, especially with a multithreaded system, you want to stream content and don't want object creation to take too much time.
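A sketch of the lazy-creation pattern, including a SetData() that resets the "created" flag, as this post describes further on. LazyRenderObject and its counters are hypothetical, and "device objects" are simulated so the example runs without Direct3D. The sketch assumes SetData() is only called while the render thread is idle (step 2); if it is called during step 4, while the render thread may be inside CreateDeviceObjects(), the data buffer would need its own lock or a pending copy that is swapped in during step 2.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class LazyRenderObject {
public:
    // Update thread: hand over new CPU-side data; the render thread will rebuild
    // the device objects the next time it draws this object.
    void SetData(std::vector<float> data) {
        dataBuffer_ = std::move(data);
        deviceObjectsCreated_ = false;
    }
    bool IsDeviceObjectsCreated() const { return deviceObjectsCreated_; }

    // Render thread only. A real render library would create D3D buffers from dataBuffer_ here.
    void CreateDeviceObjects() {
        uploadedElements = dataBuffer_.size();   // stand-in for the GPU upload
        ++createCalls;
        deviceObjectsCreated_ = true;
    }
    void Render() { ++renderCalls; }

    std::size_t uploadedElements = 0;
    int createCalls = 0;
    int renderCalls = 0;

private:
    std::vector<float> dataBuffer_;
    bool deviceObjectsCreated_ = false;
};

// The render-thread snippet from the post.
void RenderObject(LazyRenderObject* pRenderObject) {
    if (!pRenderObject->IsDeviceObjectsCreated())
        pRenderObject->CreateDeviceObjects();
    pRenderObject->Render();
}
```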
The last part of the IRenderObject class is the class parameters, such as position. For this we need some kind of double-buffer system, although it's not really a double buffer, as the values used by the game logic and by rendering are not the same. This buffering also happens in step 2 of the update thread, so it does not need any locks or synchronization objects.
The basic idea for info like position and rotation of objects is simple. You create some values on the object, like position, rotation, scale, ...: the kind of info the render engine needs to draw the object in the correct place. These values are updated in step 4 of the update thread, so they change while the render thread is actually drawing the object. A simple way to avoid synchronization is to add an extra function to IRenderObject called UpdateRenderData(). This function does nothing but build the matrices used by the render thread from the values you added to the class (position, scale, ...). As this happens in step 2 of the update thread, the game logic is idle, so it is not trying to modify any of the values needed to update the render data.
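A sketch of UpdateRenderData() for a transform, assuming a hypothetical row-major 4x4 Matrix4 and only position plus uniform scale (rotation is left out for brevity). The translation goes in the last row, matching the row-vector convention used by D3DX math. Game logic writes position/scale in step 4; the render thread only reads world, which is rebuilt in step 2.

```cpp
#include <array>
#include <cassert>

using Matrix4 = std::array<float, 16>;   // hypothetical row-major 4x4 matrix

struct RenderObjectTransform {
    // Written by game logic (step 4, in parallel with rendering).
    float position[3] = {0.0f, 0.0f, 0.0f};
    float scale = 1.0f;

    // Read by the render thread.
    Matrix4 world{};

    // Step 2 only (render thread idle): bake the game-logic values into the render matrix.
    void UpdateRenderData() {
        world = {scale, 0, 0, 0,
                 0, scale, 0, 0,
                 0, 0, scale, 0,
                 position[0], position[1], position[2], 1};   // translation in the last row
    }
};
```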
Now the trickier parts of the multithreading model.
I'm using IOCP for networking. It's a pretty complicated part to learn, but once you have created your IOCP framework, it's easy to use and implement.
Basically, when the network threads receive some data (for example, player movement), they parse the network code and create a network message from it. We don't want to do too much work in the network threads, to make sure they don't cause any delays. These network messages are forwarded to the message dispatcher. The dispatcher is nothing more than a simple queue. This is one of the few objects in the engine that requires locks and synchronization, because more than one network thread can try to access the message queue, and the main thread is not aware of a network thread reading or writing it. To reduce the time the main thread needs to get and process the messages, the best way is to lock the queue, copy it, clear it, unlock it and then process the copy. The processing of the network messages also happens in step 4 of the update thread, so it does not require additional locks/syncs. When data needs to be sent back, no locking/sync is needed with the IOCP model. You call the BeginSend method and the Windows kernel / IOCP mechanism does the rest and notifies your network worker threads when the send has completed.
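A sketch of that dispatcher queue, assuming a hypothetical Message type. Instead of copying and then clearing, TakeAll() swaps the pending vector with an empty one, which does the "copy, clear" in constant time inside the lock; processing then happens outside the lock in step 4.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical network/loader message.
struct Message {
    int type;
    std::string payload;
};

// Shared by network workers, background workers and the update thread.
class MessageQueue {
public:
    // Any producer thread.
    void Post(Message msg) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(msg));
    }

    // Update thread, step 4: take everything in one short critical section,
    // then process the returned batch without holding the lock.
    std::vector<Message> TakeAll() {
        std::vector<Message> taken;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            taken.swap(pending_);
        }
        return taken;
    }

private:
    std::mutex mutex_;
    std::vector<Message> pending_;
};
```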
When adding more worker threads, like background file I/O, this should work without locking, except for the message queue that is also used by the network threads.
The game logic detects that a new render object must be created. The first thing it does is add a new task for the background worker threads (like the network queue, this needs locking). One of the background worker threads picks up the task, starts reading the file data and stores it in an internal buffer. So far this buffer is only used by the worker thread, so again, no locks. Once the buffer is completely filled with the file data, the worker thread creates a new message and adds it to the main thread's message queue. In step 4 of the update thread, the game logic picks up and processes this message, which copies the data into the IRenderObject data buffer. The SetData function automatically resets the DeviceObjectsCreated flag, so after the next frame triggers the render thread, the render device objects will be created from the data buffer stored in our class.
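A sketch of the task list and a background worker loop in standard C++. TaskList, ResultQueue, LoadedData and the LoadFile lambda are hypothetical; LoadFile just turns the file name into bytes instead of doing real I/O. As the post describes, the worker's buffer is private until it is posted, so the only locks are on the two shared queues.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Task list shared by the update thread (producer) and background workers (consumers).
class TaskList {
public:
    void Push(std::string fileName) {
        { std::lock_guard<std::mutex> lock(mutex_); tasks_.push(std::move(fileName)); }
        cv_.notify_one();
    }
    // Blocks until a task is available; returns std::nullopt once Shutdown() was called
    // and no tasks remain.
    std::optional<std::string> Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !tasks_.empty() || shutdown_; });
        if (tasks_.empty()) return std::nullopt;
        std::string task = std::move(tasks_.front());
        tasks_.pop();
        return task;
    }
    void Shutdown() {
        { std::lock_guard<std::mutex> lock(mutex_); shutdown_ = true; }
        cv_.notify_all();
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> tasks_;
    bool shutdown_ = false;
};

// Finished load handed back to the main thread through a locked queue.
struct LoadedData {
    std::string fileName;
    std::vector<char> bytes;
};

class ResultQueue {
public:
    void Post(LoadedData d) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(d));
    }
    std::vector<LoadedData> TakeAll() {
        std::vector<LoadedData> out;
        { std::lock_guard<std::mutex> lock(mutex_); out.swap(items_); }
        return out;
    }
private:
    std::mutex mutex_;
    std::vector<LoadedData> items_;
};

// Worker loop: fill a private buffer (no lock), then post the finished result.
void WorkerLoop(TaskList& tasks, ResultQueue& results) {
    auto LoadFile = [](const std::string& name) {   // hypothetical stand-in for file I/O
        return std::vector<char>(name.begin(), name.end());
    };
    while (auto task = tasks.Pop()) {
        std::vector<char> buffer = LoadFile(*task);
        results.Post({*task, std::move(buffer)});
    }
}
```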
Now it's time for even more threads... our physics engine.
Personally I'm using PhysX, which implements its own threading model. What we do here is let the physics engine run in the background. PhysX uses double buffering, so we don't have to care much about synchronization. All we have to do is, during step 4 of the update thread, read information like actor positions from the physics engine and set it on our render object class.
The actors can be stored on the render object itself, so calling the UpdateRenderData() method updates the matrices needed for rendering directly from the physics engine. This depends on what kind of physics engine you use, or whether you use one at all.
P.S.: This is purely a brainstorming idea; I haven't found the time yet to write the code. But so far the idea seems pretty solid and flexible.
The only synchronization needed is in the worker threads. The main and render threads are synchronized through the events (based on the lockless programming articles in the DXSDK), so the sync/locks are pretty limited.
These are the few locks required:
1) MessageQueue (can be locked by network workers, background workers and during step 4 of the update thread)
2) TaskList (can be locked by step 4 of the update thread and the background worker threads)
3) Internal locks during the double buffering of PhysX???
I will draw a diagram of how it all works later and put it online :)
-
```cpp
#include <windows.h>
#include <tchar.h>

HANDLE g_hEventShutdown = NULL;
HANDLE g_hEventSimulate = NULL;
HANDLE g_hEventRender = NULL;

DWORD WINAPI UpdateThread(LPVOID lpParam)
{
    HANDLE hEvents[] = {g_hEventShutdown, g_hEventSimulate};

    // Run the thread loop
    while(true)
    {
        // Wait for thread signals (simulate/shutdown). Shutdown is first in the array,
        // so it wins if both are signaled; anything other than the simulate event
        // (shutdown or WAIT_FAILED) ends the loop.
        if(WaitForMultipleObjects(2, hEvents, FALSE, INFINITE) != WAIT_OBJECT_0 + 1)
        {
            break;
        }

        // Step 2: fill the render object list (the render thread is idle here)

        // Step 3: start the next render step (tell the render thread to draw the next frame)
        SetEvent(g_hEventRender);

        // Step 4: process the game logic (runs in parallel with rendering)
    }

    // Thread exit successful
    return 0;
}

DWORD WINAPI RenderThread(LPVOID lpParam)
{
    HANDLE hEvents[] = {g_hEventShutdown, g_hEventRender};

    // Run the thread loop
    while(true)
    {
        // Wait for thread signals (render/shutdown)
        if(WaitForMultipleObjects(2, hEvents, FALSE, INFINITE) != WAIT_OBJECT_0 + 1)
        {
            break;
        }

        // Process the render object list

        // Notify the update thread that it may prepare the next frame
        SetEvent(g_hEventSimulate);
    }

    // Thread exit successful
    return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
    // Create the events: shutdown is manual-reset so both threads see it,
    // simulate/render are auto-reset so each signal releases exactly one step
    g_hEventShutdown = CreateEvent(NULL, TRUE, FALSE, NULL);
    g_hEventSimulate = CreateEvent(NULL, FALSE, FALSE, NULL);
    g_hEventRender   = CreateEvent(NULL, FALSE, FALSE, NULL);

    // Create the threads
    DWORD dwDummy = 0;
    HANDLE hThreadUpdate = CreateThread(NULL, 0, UpdateThread, NULL, 0, &dwDummy);
    HANDLE hThreadRender = CreateThread(NULL, 0, RenderThread, NULL, 0, &dwDummy);

    // Start simulating
    SetEvent(g_hEventSimulate);

    // Let the threads run for a while, then request shutdown
    // (an application would typically do this when its window closes)
    Sleep(1000);
    SetEvent(g_hEventShutdown);

    // Wait for all threads to exit
    HANDLE hWait[] = {hThreadUpdate, hThreadRender};
    WaitForMultipleObjects(2, hWait, TRUE, INFINITE);

    // Cleanup the thread and event handles
    CloseHandle(hThreadRender);
    CloseHandle(hThreadUpdate);
    CloseHandle(g_hEventRender);
    CloseHandle(g_hEventSimulate);
    CloseHandle(g_hEventShutdown);

    // Application exit successful
    return 0;
}
```