Earlier this year I became interested in how programs like Fraps and OBS are able to record game windows. To learn more about this I decided to write my own screen recording application. This is still something I’m working on but I wanted to write up what I’d done so far to reinforce my understanding.
Fraps and OBS both work by injecting themselves – through DLL injection – into a process and then installing hooks in the graphics API functions. The reason they hook the graphics functions is that doing so gives them access to the graphics objects used by the application. This enables them to grab the images presented each frame which can then be used to produce a video file.
There are many different graphics APIs available and quite a bit of effort is involved to support each one. I initially looked at supporting DirectX12 since it’s the newest API my graphics card supports. I later decided against this since its API is much lower level than previous generations and my goal was to learn about screen recording rather than graphics APIs. I did, however, make good progress with DirectX12 and I hope to come back to it once I’m more familiar with DirectX in general.
After deciding that I’d be using DirectX11, I had to figure out which functions I needed to hook. This led me to DirectX11’s swapchain, which is responsible for handling framebuffers. The functions that were of initial interest to me were Present, ResizeBuffers, GetBuffer and SetFullscreenState. The most important of these is Present. It’s called once for every frame that is presented to the user, so hooking it gives you access to the swapchain each frame. The ResizeBuffers function is also important since applications are expected to call it whenever the output window is resized. Hooking it lets my screen recorder react to any change in the frame size. GetBuffer and SetFullscreenState ended up mostly being useful for debugging in the beginning.
The first step I needed to take to hook these functions was to find their location in memory. I did this by creating my own swapchain using mostly nonsense (but valid) values as shown in the code block below.
D3D_FEATURE_LEVEL featLevel;
DXGI_SWAP_CHAIN_DESC sd{ 0 };
sd.BufferCount = 1;
sd.BufferUsage = DXGI_USAGE_BACK_BUFFER;
sd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
sd.BufferDesc.Height = 1;
sd.BufferDesc.Width = 1;
sd.OutputWindow = (HWND)1;
sd.Windowed = TRUE;
sd.SampleDesc.Count = 1;
ID3D11Device* pDevice = nullptr;
IDXGISwapChain* pSwapchain = nullptr;
HRESULT hr = D3D11CreateDeviceAndSwapChain(nullptr, D3D_DRIVER_TYPE_REFERENCE, nullptr, 0, nullptr, 0, D3D11_SDK_VERSION, &sd, &pSwapchain, &pDevice, &featLevel, nullptr);
I was then able to retrieve the locations of the functions I was interested in using my swapchain’s virtual method table.
uintptr_t* pSwapVMT = *(uintptr_t**)pSwapchain;
This code works since the pointer to the virtual method table is located at pSwapchain+0. So, by casting my swapchain pointer to a pointer to a pointer and then dereferencing it we get our virtual method table as an array of pointers. I found the swapchain virtual method table indices online and created an enum I could use to easily reference them. I could then initialise pointers to the functions as shown below.
void* pPresent = (void*)pSwapVMT[(int)SwapChainVMTIndices::Present];
void* pResizeBuffers = (void*)pSwapVMT[(int)SwapChainVMTIndices::ResizeBuffers];
void* pGetBuffer = (void*)pSwapVMT[(int)SwapChainVMTIndices::GetBuffer];
void* pSetFullScreenState = (void*)pSwapVMT[(int)SwapChainVMTIndices::SetFullscreenState];
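The enum itself isn't shown in this post, so here is a minimal sketch of what it could look like. The slot order follows the interface inheritance chain declared in dxgi.h (IUnknown, then IDXGIObject, then IDXGIDeviceSubObject, then IDXGISwapChain). FakeInterface and ReadVMT are hypothetical names that only exist so the offset-0 idea can be tried without the Windows SDK. Where the table pointer lives is up to the compiler's ABI, so treat that part as an assumption that happens to hold for COM objects and for simple classes on common compilers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Virtual method table slots of IDXGISwapChain, in declaration order:
// IUnknown (0-2), IDXGIObject (3-6), IDXGIDeviceSubObject (7), IDXGISwapChain (8-17).
enum class SwapChainVMTIndices : int {
    QueryInterface = 0, AddRef, Release,
    SetPrivateData, SetPrivateDataInterface, GetPrivateData, GetParent,
    GetDevice,
    Present, GetBuffer, SetFullscreenState, GetFullscreenState, GetDesc,
    ResizeBuffers, ResizeTarget, GetContainingOutput, GetFrameStatistics,
    GetLastPresentCount
};
static_assert(static_cast<int>(SwapChainVMTIndices::Present) == 8, "Present slot");
static_assert(static_cast<int>(SwapChainVMTIndices::ResizeBuffers) == 13, "ResizeBuffers slot");

// Hypothetical stand-in for a COM interface: an object whose only hidden
// member is the pointer to its virtual method table.
struct FakeInterface {
    virtual int First() { return 1; }
    virtual int Second() { return 2; }
};

// Reads the table pointer stored in the first bytes of the object.
// memcpy is used instead of a cast-and-dereference to avoid aliasing issues.
inline std::uintptr_t* ReadVMT(const void* pObject) {
    assert(pObject != nullptr);
    std::uintptr_t* pVMT = nullptr;
    std::memcpy(&pVMT, pObject, sizeof(pVMT));
    return pVMT;
}
```

Two objects of the same class share one table, which is why a throwaway swapchain is enough to locate the functions used by the game's real swapchain.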
Now that I had a way to get my function addresses, I could go about installing the hooks. I decided to do this manually as a programming exercise, but an easier and more reliable way would be to use Microsoft’s Detours library. The basic idea behind doing this manually is that you overwrite the bytes at the beginning of the function you want to hook (the ‘target’ function) with an assembly jmp instruction that points at a trampoline function. Each function I hook has its own trampoline, which is created dynamically at runtime and handles jumps to and from its hook. The trampolines are also responsible for running the instructions that were overwritten at the beginning of the target function. This is done immediately before jumping back and ensures that the registers and stack are in the correct state before the rest of the target function’s code runs. An example of how the trampoline for ResizeBuffers could look is shown below (labelled and commented for clarity):
ReturnToTarget: ; Run the instructions replaced in ResizeBuffers and jump back to it
push rbp ; An instruction replaced in the target function. This could be anything. (i bytes in length)
mov rbp, rsp ; An instruction replaced in the target function. This could be anything. (j bytes in length)
mov rdx, 1 ; An instruction replaced in the target function. This could be anything. (k bytes in length)
jmp ResizeBuffers+(i+j+k) ; i+j+k comes from the number of bytes of the preceding 3 instructions. (See above instruction comments). Basically skips to the instructions after the above instructions in the original ResizeBuffers function code
GoToHook:
jmp ResizeBuffersDisplacement ; Relative number of bytes to jump to get to the ResizeBuffers hook
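To make the displacement jumps above more concrete, here is a minimal sketch of encoding one. It only fills a byte buffer and does not patch any live code. WriteRelJmp and kRelJmpSize are hypothetical names, not taken from my repo. The encoding itself (opcode 0xE9 followed by a 32-bit little-endian displacement measured from the end of the instruction) is the standard x86/x64 near jmp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kRelJmpSize = 5;  // 0xE9 opcode + 4-byte displacement

// Encodes a `jmp rel32` that, when placed at address `from`, lands on `to`.
// The caller is assumed to have checked that the distance fits in 32 bits.
inline void WriteRelJmp(std::uint8_t* out, std::uint64_t from, std::uint64_t to) {
    assert(out != nullptr);
    // The CPU adds the displacement to the address of the NEXT instruction,
    // which is why the size of the jmp itself is subtracted here.
    const std::uint32_t disp =
        static_cast<std::uint32_t>(to - (from + kRelJmpSize));
    out[0] = 0xE9;
    for (int i = 0; i < 4; ++i)
        out[1 + i] = static_cast<std::uint8_t>(disp >> (8 * i));  // little-endian
}
```

For example, a jump placed at 0x1000 that targets 0x2000 encodes as E9 FB 0F 00 00, since 0x2000 - 0x1005 = 0xFFB. In a real hook these bytes would be written over the start of the target function after making that memory writable (for example with VirtualProtect).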
To get to my hook functions from the target functions I installed a jmp instruction which points to the GoToHook section of my trampolines. On the way back I couldn’t easily use a jmp, since MSVC doesn’t support inline assembly (__asm blocks) when compiling for x64. To get around this, I created a function pointer for each hooked function which points to the ReturnToTarget label in its trampoline and has the same signature as the target function. I could then call these function pointers in my hooks when I wanted to return to my target functions. As an example, here is how I return to ResizeBuffers within my ResizeBuffers hook:
HRESULT __stdcall ResizeBuffersHook(IDXGISwapChain* pThis, UINT BufferCount, UINT Width, UINT Height, DXGI_FORMAT NewFormat, UINT SwapChainFlags)
{
return ReturnToResizeBuffers(pThis, BufferCount, Width, Height, NewFormat, SwapChainFlags);
}
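The function pointer idea can be tried out without DirectX. The sketch below is an assumption-laden stand-in: FakeSwapChain, Result, OriginalResizeBuffers and the g_last* variables are hypothetical, and the pointer is aimed at an ordinary function rather than at a trampoline. The shape of the hook is the same, though: do the extra work, then hand control back through a pointer with the target's signature.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins so the sketch compiles without the Windows SDK.
using Result = std::int32_t;                       // plays the role of HRESULT
struct FakeSwapChain { unsigned width = 0, height = 0; };

// Same signature as the (simplified) target function.
using ResizeBuffersFn = Result (*)(FakeSwapChain*, unsigned, unsigned);

// In a real hook this would point at the ReturnToTarget label of the trampoline.
ResizeBuffersFn ReturnToResizeBuffers = nullptr;

// Stand-in for the original function's body.
Result OriginalResizeBuffers(FakeSwapChain* pThis, unsigned width, unsigned height) {
    pThis->width = width;
    pThis->height = height;
    return 0;  // 0 plays the role of S_OK
}

// Last size seen by the hook, e.g. for resizing the recorder's own buffers.
unsigned g_lastWidth = 0;
unsigned g_lastHeight = 0;

Result ResizeBuffersHook(FakeSwapChain* pThis, unsigned width, unsigned height) {
    g_lastWidth = width;
    g_lastHeight = height;
    assert(ReturnToResizeBuffers != nullptr);
    return ReturnToResizeBuffers(pThis, width, height);  // continue into the target
}
```

With the real DirectX types the pointer type would be declared along the lines of HRESULT(__stdcall*)(IDXGISwapChain*, UINT, UINT, UINT, DXGI_FORMAT, UINT), matching the hook above.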
Since I wanted to support both the x86 and x64 architectures I had to take into account the ±2 GB range of displacement (relative) jmp instructions. This is only a potential problem on x64, since on x86 the addressable space is 4 GB (2^32 bytes) and a ±2^31 displacement can reach any address thanks to wrap-around. I handled cases where a jump exceeded this limit by using the following:
mov rax, jumpAddress
jmp rax
This method enables you to jump anywhere in a 64-bit application’s address space, with the downside that it overwrites a register value. This could be an issue when jumping back to the target function if any of the instructions run before jumping back set rax to a particular value that is used later in the function. As an example:
ReturnToTarget:
push rbp ; (Copied from the start of the target function)
mov rbp, rsp ; (Copied from the start of the target function)
mov rax, SomethingImportant ; This instruction writes something to rax that is later used in the target function. (Copied from the start of the target function)
mov rax, Target+X ; rax is overwritten with the address we're returning to in the target function. This means we lose the data written to rax in the previous instruction. This could cause issues
jmp rax
GoToHook:
jmp TargetHookDisplacement ; Relative number of bytes to jump to get to the hook function
I solved this problem by making it so that jumps back to the target function would always be displacement jumps. I did this by creating my trampoline functions in the closest free pages I could find to the target function. This meant that only jumps to my hooks would potentially need to use this workaround which wouldn’t cause any problems.
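The choice between the two kinds of jump can be sketched as follows. This is an illustration under assumptions rather than the code from my repo: FitsRel32, WriteAbsJmp and the size constants are hypothetical names. The byte values are the standard x64 encodings of mov rax, imm64 (48 B8 followed by the 8-byte address) and jmp rax (FF E0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

constexpr std::size_t kRelJmpSize = 5;   // jmp rel32
constexpr std::size_t kAbsJmpSize = 12;  // mov rax, imm64 (10 bytes) + jmp rax (2 bytes)

// True when a `jmp rel32` placed at `from` can reach `to`.
inline bool FitsRel32(std::uint64_t from, std::uint64_t to) {
    // Unsigned subtraction wraps, so the cast yields the signed distance.
    const std::int64_t disp = static_cast<std::int64_t>(to - (from + kRelJmpSize));
    return disp >= std::numeric_limits<std::int32_t>::min() &&
           disp <= std::numeric_limits<std::int32_t>::max();
}

// Encodes `mov rax, target` followed by `jmp rax` into a 12-byte buffer.
// Note that this clobbers rax, which is the drawback discussed above.
inline void WriteAbsJmp(std::uint8_t* out, std::uint64_t target) {
    assert(out != nullptr);
    out[0] = 0x48;  // REX.W prefix
    out[1] = 0xB8;  // mov rax, imm64
    for (int i = 0; i < 8; ++i)
        out[2 + i] = static_cast<std::uint8_t>(target >> (8 * i));  // little-endian
    out[10] = 0xFF;  // jmp r/m64 ...
    out[11] = 0xE0;  // ... with rax as the operand
}
```

Allocating the trampoline close to the target keeps FitsRel32 true for the jump back, so the 12-byte form is only ever needed for the jump from the trampoline to the hook.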
Now that I had all the graphics API hooks in place I could look at displaying a user interface. There are quite a lot of DirectX11 UI frameworks but I decided to use ImGui since it has a lot of support behind it and offers a wide variety of widgets. I did all of the rendering for my user interface in my Present hook since it runs every frame. This means that my user interface would run as smoothly as the host application. The code for the Present function hook currently looks like:
HRESULT __stdcall PresentHook(IDXGISwapChain* pThis, UINT SyncInterval, UINT Flags)
{
static bool init = false;
if (!init)
{
if (!InitImGui(pThis))
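return ReturnToPresent(pThis, SyncInterval, Flags); // Init failed: skip the overlay but still present the frame (FALSE would read as S_OK and drop it)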
init = true;
}
// Run per-frame code here
DrawOverlay();
return ReturnToPresent(pThis, SyncInterval, Flags);
}
The first time my Present hook is run, ImGui is initialised with the graphics objects it needs to draw to the screen. These are the device and device context, both of which can easily be obtained from the swapchain with the following:
ID3D11Device* pDevice = nullptr;
ID3D11DeviceContext* pContext = nullptr;
HRESULT hr = pSwapchain->GetDevice(__uuidof(ID3D11Device), (void**)&pDevice);
if (SUCCEEDED(hr)) pDevice->GetImmediateContext(&pContext); // Only touch pDevice if GetDevice succeeded
Once ImGui is initialised it’s not difficult to get a simple window drawn to the screen. I’ve created a basic window with a ‘Start Recording’ button on it for now. This was easy to do after following some of the tutorials in the ImGui documentation.
void DrawMainWindow()
{
// Specify a default position/size in case there's no data in the .ini file.
ImGui::SetNextWindowPos(ImVec2(0, 0), ImGuiCond_FirstUseEver);
ImGui::SetNextWindowSize(ImVec2(130, 37), ImGuiCond_FirstUseEver);
// Main body of the main window starts here.
ImGui::Begin("Screen Recorder", nullptr, ImGuiWindowFlags_NoTitleBar | ImGuiWindowFlags_NoResize); // nullptr: no close button, so no open/closed flag is needed
ImGui::Button("Start Recording");
ImGui::End();
}
Adding this finally gave me visual feedback that my Present hook was working correctly. The screenshot below shows my window being rendered within a sample DirectX11 application (top left).
A clear potential issue with this window is that it could appear in screen recordings. I got around this by capturing the frame and writing it to the video file before I drew the window onto the frame. In the video below you can see that my window is not present.
I’ve created a repo on my Bitbucket which contains just the code discussed in this post. It does not contain any screen recording functionality since I still have quite a lot I’d like to do with that before releasing it. I’ve heavily commented the code so it should be reasonably approachable for someone new to the concepts. If you want to run this for yourself, you will need a DLL injector and the Microsoft DirectX SDK.