In this tutorial, you will be introduced to several classes that will help you to create a robust and flexible framework for building DirectX 12 applications. Some of the problems that are solved with the classes introduced in this lesson are managing CPU descriptors, copying CPU descriptors to GPU visible descriptor heaps, managing resource state across multiple threads, and uploading dynamic buffer data to the GPU. To automatically manage the state and descriptors for resources, a custom command list class is also provided.
Contents
- 1 Introduction
- 2 Upload Buffer
- 2.1 UploadBuffer Class
- 2.1.1 UploadBuffer Header
- 2.1.2 UploadBuffer Preamble
- 2.1.3 UploadBuffer::UploadBuffer
- 2.1.4 UploadBuffer::Allocate
- 2.1.5 UploadBuffer::RequestPage
- 2.1.6 UploadBuffer::Reset
- 2.1.7 UploadBuffer::Page::Page
- 2.1.8 UploadBuffer::Page::~Page
- 2.1.9 UploadBuffer::Page::HasSpace
- 2.1.10 UploadBuffer::Page::Allocate
- 2.1.11 UploadBuffer::Page::Reset
- 3 Descriptor Allocator
- 3.1 DescriptorAllocator Class
- 3.2 DescriptorAllocatorPage Class
- 3.2.1 DescriptorAllocatorPage Header
- 3.2.2 DescriptorAllocatorPage Preamble
- 3.2.3 DescriptorAllocatorPage::DescriptorAllocatorPage
- 3.2.4 DescriptorAllocatorPage::GetHeapType
- 3.2.5 DescriptorAllocatorPage::NumFreeHandles
- 3.2.6 DescriptorAllocatorPage::HasSpace
- 3.2.7 DescriptorAllocatorPage::AddNewBlock
- 3.2.8 DescriptorAllocatorPage::Allocate
- 3.2.9 DescriptorAllocatorPage::ComputeOffset
- 3.2.10 DescriptorAllocatorPage::Free
- 3.2.11 DescriptorAllocatorPage::FreeBlock
- 3.2.12 DescriptorAllocatorPage::ReleaseStaleDescriptors
- 3.3 DescriptorAllocation Class
- 3.3.1 DescriptorAllocation Header
- 3.3.2 DescriptorAllocation Preamble
- 3.3.3 DescriptorAllocation Default Constructor
- 3.3.4 DescriptorAllocation Parameterized Constructor
- 3.3.5 DescriptorAllocation Destructor
- 3.3.6 DescriptorAllocation Move Constructor
- 3.3.7 DescriptorAllocation Move Assignment
- 3.3.8 DescriptorAllocation::Free
- 3.3.9 DescriptorAllocation::IsNull
- 3.3.10 DescriptorAllocation::GetDescriptorHandle
- 3.3.11 DescriptorAllocation::GetNumHandles
- 3.3.12 DescriptorAllocation::GetDescriptorAllocatorPage
- 4 Dynamic Descriptor Heap
- 4.1 DynamicDescriptorHeap Class
- 4.1.1 DynamicDescriptorHeap Header
- 4.1.2 DynamicDescriptorHeap Preamble
- 4.1.3 DynamicDescriptorHeap::DynamicDescriptorHeap
- 4.1.4 DynamicDescriptorHeap::ParseRootSignature
- 4.1.5 DynamicDescriptorHeap::StageDescriptors
- 4.1.6 DynamicDescriptorHeap::ComputeStaleDescriptorCount
- 4.1.7 DynamicDescriptorHeap::RequestDescriptorHeap
- 4.1.8 DynamicDescriptorHeap::CreateDescriptorHeap
- 4.1.9 DynamicDescriptorHeap::CommitStagedDescriptors
- 4.1.10 DynamicDescriptorHeap::CommitStagedDescriptorsForDraw
- 4.1.11 DynamicDescriptorHeap::CommitStagedDescriptorsForDispatch
- 4.1.12 DynamicDescriptorHeap::CopyDescriptor
- 4.1.13 DynamicDescriptorHeap::Reset
- 5 Resource State Tracking
- 5.1 ResourceStateTracker Class
- 5.1.1 ResourceStateTracker Header
- 5.1.2 ResourceStateTracker Preamble
- 5.1.3 ResourceStateTracker::ResourceBarrier
- 5.1.4 ResourceStateTracker::TransitionResource
- 5.1.5 ResourceStateTracker::UAVBarrier
- 5.1.6 ResourceStateTracker::AliasBarrier
- 5.1.7 ResourceStateTracker::FlushResourceBarriers
- 5.1.8 ResourceStateTracker::FlushPendingResourceBarriers
- 5.1.9 ResourceStateTracker::CommitFinalResourceStates
- 5.1.10 ResourceStateTracker::Reset
- 5.1.11 ResourceStateTracker::Lock
- 5.1.12 ResourceStateTracker::Unlock
- 5.1.13 ResourceStateTracker::AddGlobalResourceState
- 5.1.14 ResourceStateTracker::RemoveGlobalResourceState
- 6 Custom Command List
- 7 Conclusion
- 8 Download the Source
- 9 References
The design of these classes prioritizes convenience for the graphics programmer when creating demos (for research purposes) but may not reflect the most optimized implementations that would be used in production game engines. Feel free to share your thoughts in the comments below about how to improve the design of the classes shown here.
Introduction
As you have learned in the previous lessons, compared to DirectX 11 or OpenGL, DirectX 12 introduces a few architectural changes that create some challenges for the graphics programmer. These architectural changes provide a lower-level rendering API but also require a lot of additional code to be written just to get anything to appear on screen. When I first started working with DirectX 12, I really struggled with issues such as memory management, descriptors, and resource state management. What’s the best memory management scheme to use to store resources? How do I make sure I have enough descriptors to render a frame?
In this lesson, I will introduce several classes that will greatly simplify the development of DirectX 12 applications. The first of these classes is the UploadBuffer. The UploadBuffer is a linear allocator that creates resources in an Upload Heap. The purpose of this class is to provide the ability to upload dynamic constant, vertex, and index buffer data (or any buffer data for that matter) to the GPU. The most common use-case for the UploadBuffer class is to upload uniform data to a ConstantBuffer used in a shader. Another typical use-case for the UploadBuffer is particle effects. If the particles are simulated on the CPU, the computed particle attributes need to be uploaded to the GPU every frame. Instead of creating a new upload buffer every frame, the UploadBuffer is used to upload the particle data to the GPU. Another use-case for the UploadBuffer is rendering a User Interface (UI). If the UI is dynamic (for example, if you want to show run-time performance profiling), then the UI needs to be generated every frame with the new output. For each of these use cases, the ideal approach is to create a large resource in an upload heap, map a CPU pointer to the underlying resource, and copy the required data (using a memcpy, for example).
The next class that I will discuss is the DescriptorAllocator class, which is used to allocate a number of CPU visible descriptors. CPU visible descriptors are used for Render Target Views (RTV) and Depth-Stencil Views (DSV). CPU visible descriptors are also used to create Constant Buffer Views (CBV), Shader Resource Views (SRV), Unordered Access Views (UAV), and Samplers, but CBVs, SRVs, UAVs, and Samplers require a corresponding GPU visible descriptor before they can be used in a shader.
Whenever a Draw or Dispatch command is executed on a command list, any resource that is read from or written to in the shader needs to be bound to the graphics or compute pipeline using a GPU visible descriptor. Although buffer resources can be bound to the GPU using inline descriptors (see Lesson 2), texture resources cannot be bound using inline descriptors and must be bound to the GPU using a descriptor table. If the shader uses a lot of textures (this is the case if you are doing Physically Based Rendering, for example), then all of the textures needed during the draw or dispatch call must be bound to the graphics or compute pipeline at the same time. Usually all of the SRVs for the textures are bound in a contiguous block of GPU visible descriptors in a single descriptor table range. But if textures are loaded in random order, or the same texture is being used for multiple draw calls, then how can one ensure that all of the textures are bound in a contiguous block of GPU visible descriptors? Another issue is that only a single descriptor heap of each type (CBV_SRV_UAV or SAMPLER) can be bound on the command list at any moment. So all GPU visible descriptors of a given type must come from a single descriptor heap (the descriptor heaps can only be changed between Draw or Dispatch calls)! Yet another issue arises since descriptors cannot be reused until the command list that is using them has completed executing on the GPU. So how do you know how many GPU visible descriptors need to be allocated up-front? In all but the simplest cases, it is impossible to know how many GPU visible descriptors will be needed for an entire frame (or 3 frames in the case of triple-buffering). The DynamicDescriptorHeap class described in this lesson solves the problem of ensuring that all of the GPU visible descriptors are copied to a single GPU visible descriptor heap before a Draw or Dispatch command is executed on the GPU.
Another tricky problem to solve in a DirectX 12 renderer is ensuring that resources are always in the correct state when they need to be. In order to perform a resource transition, both the before and after states of the resource need to be specified in the transition barrier. But if a resource is being used in different states in multiple command lists, then the graphics programmer needs to know exactly which state it was in at the end of the previously executed command list. A naïve approach would be to create a class that stores both the resource and the current state of that resource. Anytime a transition barrier is performed on the resource, the current resource state is checked and used as the before state. This approach would work in a single-threaded renderer but wouldn’t work if the command lists are being built on different threads! In this case, there is no way to guarantee the state of the resource across multiple threads. The graphics programmer should only be concerned with implementing the graphics application and not with synchronizing the state of a resource across multiple command lists, multiple command queues, and multiple threads! The ResourceStateTracker class introduced in this lesson strives to solve the problem of tracking the resource state in a multi-threaded renderer.
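To make the problem concrete, here is a minimal, CPU-only sketch of the naïve approach described above. The ResourceState enum and TransitionBarrier struct are hypothetical stand-ins for D3D12_RESOURCE_STATES and a D3D12 transition barrier; they are not part of the DirectX 12 API or of the classes in this lesson.

```cpp
#include <cassert>

// Hypothetical stand-in for a few D3D12_RESOURCE_STATES values.
enum class ResourceState { Common, RenderTarget, PixelShaderResource, CopyDest };

// Hypothetical stand-in for a transition barrier (before/after states only).
struct TransitionBarrier
{
    ResourceState before;
    ResourceState after;
};

// The naive approach: the resource remembers its "current" state and uses
// it as the before state of the next transition.
class NaiveTrackedResource
{
public:
    explicit NaiveTrackedResource(ResourceState initial) : m_State(initial) {}

    TransitionBarrier TransitionTo(ResourceState after)
    {
        TransitionBarrier barrier{ m_State, after };
        // The state is updated when the barrier is *recorded*,
        // not when the command list executes on the GPU.
        m_State = after;
        return barrier;
    }

    ResourceState State() const { return m_State; }

private:
    ResourceState m_State;
};
```

This works when one thread records command lists in the same order they are executed. If two threads record command lists that transition the same resource, the before state each one reads depends on which thread happened to record first, which need not match the order in which the command lists are executed on the queue (and the unsynchronized m_State is a data race on top of that). That gap is what the ResourceStateTracker class is meant to close.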
In order to bring everything together and make the life of a graphics programmer as easy as possible, a custom CommandList class is introduced, which uses the aforementioned classes to simplify loading of texture and buffer resources, tracking resource state and minimizing transition barriers, and ensuring that all of the resources used in a shader are correctly bound to GPU visible descriptors. The goal of the custom CommandList class described in this lesson is to abstract away the complications of using DirectX 12 and reduce the game-specific code from thousands of lines of (user) code to just a few hundred.
Upload Buffer
The UploadBuffer class provides a simple wrapper around a resource that is created in an upload heap. The UploadBuffer is implemented as a linear allocator that allocates chunks or blocks of memory from memory pages. If a memory page cannot satisfy an allocation request, another page is taken from the list of available pages or, if none are available, a new page is created. A linear allocator can’t grow indefinitely, so when a page of memory is no longer in use (for example, when the command list that uses an allocation from that page is finished executing on the GPU), the page can be returned to the list of available pages in the heap. The image below shows an example of a linear allocator.
A linear allocator is probably the simplest allocator to implement since it only needs to store two pointers per memory page (the base pointer and the current offset in the page). The above image shows an example of a linear allocator after several allocations have been made. The red blocks represent allocated blocks while the green blocks represent free blocks within the page. Allocated blocks are not individually freed or returned to the memory page, but once all of the allocations are no longer being used, the entire page of memory can be returned to the available pages for the allocator and the offset pointer within the page is reset to the base pointer. The green chunks of free memory between the allocated blocks are a result of external fragmentation created by the alignment of allocated blocks. For example, if the first allocation is a block of 64 bytes and the next allocation needs to be aligned to 256 bytes (constant buffers are required to be aligned to 256 bytes), then there are 192 bytes of unused space in the memory page between the first and second allocations.
The linear allocator also suffers from internal fragmentation when the size of a requested block is not a multiple of the requested alignment (for example, when it is smaller than the alignment). For example, suppose a 64-byte block is requested with 256-byte alignment (this is typical of a constant buffer that contains only a single 4×4 matrix). The allocation consumes 256 bytes even though only 64 bytes will ever be used.
The shaded area in the second allocation shown in the image above is unused memory resulting from internal fragmentation: only 64 bytes were requested, but the 256-byte alignment rounds the allocation up to 256 bytes, so 192 bytes remain unused.
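The arithmetic behind both kinds of fragmentation can be checked in a few lines. This is a small sketch assuming power-of-two alignments; AlignUp and Simulate are illustrative helpers written for this example, not functions from DX12Lib. The static_assert lines reproduce the scenario from the image: the first 64-byte block ends at offset 64 and the next block is a 64-byte constant buffer with 256-byte alignment.

```cpp
#include <cassert>
#include <cstddef>

// Round value up to the next multiple of alignment.
// Assumes alignment is a non-zero power of two (256 for constant buffers).
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// The padding cost of a single allocation from a linear allocator.
struct FragmentationReport
{
    std::size_t start;           // aligned start offset of the new block
    std::size_t externalPadding; // gap between the old offset and the start
    std::size_t internalWaste;   // bytes reserved but not requested
    std::size_t newOffset;       // page offset after the allocation
};

// Simulate allocating sizeInBytes at the given alignment when the page
// offset is currently at currentOffset.
constexpr FragmentationReport Simulate(std::size_t currentOffset,
                                       std::size_t sizeInBytes,
                                       std::size_t alignment)
{
    const std::size_t start = AlignUp(currentOffset, alignment);
    const std::size_t alignedSize = AlignUp(sizeInBytes, alignment);
    return { start, start - currentOffset, alignedSize - sizeInBytes, start + alignedSize };
}

// The scenario from the image, checked at compile time.
static_assert(Simulate(64, 64, 256).start == 256, "next block starts at 256");
static_assert(Simulate(64, 64, 256).externalPadding == 192, "192-byte gap");
static_assert(Simulate(64, 64, 256).internalWaste == 192, "192 bytes unused");
```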
Regardless of the internal and external fragmentation issues, the linear allocator is ideal due to its simplicity and speed. Allocating from the linear allocator only requires the offset pointer to be updated which can be performed in constant time (\(\mathcal{O}(1)\)).
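The following is a minimal CPU-only sketch of a single linear allocator page, assuming a std::vector stands in for the mapped memory of an upload heap. LinearPage is a hypothetical name used only for this example. Each allocation aligns the offset, hands out an address, and bumps the offset, so the cost does not depend on how many allocations have already been made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// CPU-only sketch of one linear (bump) allocator page.
// Offsets are aligned relative to the start of the page. The std::vector
// storage is not guaranteed to be 256-byte aligned; in D3D12 the GPU virtual
// address of a committed buffer is 64 KB aligned, which is why aligning the
// offset within the page is enough for the GPU address there.
class LinearPage
{
public:
    explicit LinearPage(std::size_t pageSize) : m_Memory(pageSize), m_Offset(0) {}

    // O(1): align the offset, hand out a pointer, bump the offset.
    void* Allocate(std::size_t sizeInBytes, std::size_t alignment)
    {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::size_t start = AlignUp(m_Offset, alignment);
        const std::size_t alignedSize = AlignUp(sizeInBytes, alignment);
        if (start + alignedSize > m_Memory.size())
        {
            throw std::bad_alloc();
        }
        m_Offset = start + alignedSize;
        return m_Memory.data() + start;
    }

    // Blocks are never freed individually; the whole page is recycled at once.
    void Reset() { m_Offset = 0; }

    std::size_t Offset() const { return m_Offset; }

private:
    static std::size_t AlignUp(std::size_t value, std::size_t alignment)
    {
        return (value + alignment - 1) & ~(alignment - 1);
    }

    std::vector<std::uint8_t> m_Memory;
    std::size_t m_Offset;
};
```

Calling Allocate(64, 64) followed by Allocate(64, 256) on a fresh 1024-byte page places the second block 256 bytes after the first and leaves the offset at 512, which is the layout shown in the image.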
UploadBuffer Class
As mentioned in the introduction, the UploadBuffer class is used to satisfy requests for memory that must be uploaded to the GPU. When the data in the upload buffer is no longer required, the memory pages can be reused. A page only becomes available again when the command list that is using memory from that page is finished executing on the GPU. In order to simplify the implementation of the UploadBuffer class, it is assumed that each UploadBuffer instance is associated with a single command list/allocator. In the first tutorial, you learned that a command allocator can’t be reset unless it is no longer “in-flight” on the command queue. Similar to the command allocator, the UploadBuffer is only reset when none of its memory allocations are still “in-flight” on the command queue. This is shown later in this lesson when describing the custom CommandList class.
The implementation of this UploadBuffer class is inspired by the implementation of the LinearAllocator class in the MiniEngine provided with the DirectX-Graphics-Samples repository available on GitHub [1].
The UploadBuffer class provides the following functionality:
- Allocate: Allocates a chunk of memory that can be used to upload data to the GPU.
- Reset: Releases all allocations for reuse.
This provides a very simple interface definition for the UploadBuffer class.
The header file for the UploadBuffer class is shown next.
UploadBuffer Header
The UploadBuffer header file defines the public and private members of the class. The preamble, which defines the header file dependencies for the class, is shown first.
```cpp
/**
 * An UploadBuffer provides a convenient method to upload resources to the GPU.
 */
#pragma once

#include <Defines.h>

#include <wrl.h>
#include <d3d12.h>

#include <memory>
#include <deque>
```
The Defines.h header file included in the preamble contains a few useful macro definitions. This file is local to the project, but its contents are not shown here for brevity. The source code for this file is available on GitHub here: Defines.h
The wrl.h header file provides access to the ComPtr template class.
The d3d12.h header file contains the interfaces for the DirectX 12 API.
The memory header contains std::shared_ptr, which is used to track the lifetime of memory pages in the allocator. The deque header contains the std::deque container class, which is used to store a pool of memory pages.
```cpp
class UploadBuffer
{
public:
    // Use to upload data to the GPU
    struct Allocation
    {
        void* CPU;
        D3D12_GPU_VIRTUAL_ADDRESS GPU;
    };
```
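The Allocation structure defined at the top of the class is used to return an allocation from the UploadBuffer::Allocate method, which is shown later. The CPU member is the mapped CPU address that the data is copied to, and the GPU member is the corresponding GPU virtual address that is bound to the pipeline.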
```cpp
    /**
     * @param pageSize The size to use to allocate new pages in GPU memory.
     */
    explicit UploadBuffer(size_t pageSize = _2MB);
```
The UploadBuffer class declares only a single constructor, which takes the size of a memory page as its only argument. The default size of a page of memory is 2MB, which should be sufficient for most cases, depending on usage. A memory page should be approximately large enough to contain all of the allocations for a single command list. If a lot of dynamic memory allocations are made in the command list, then it may be worthwhile to make larger pages. It is important to understand that the memory pages are never returned to the system. Once a page is allocated, it is not deallocated until the UploadBuffer instance is destroyed. The intention of the UploadBuffer is that it is reused each frame, so the same allocations will likely be made the next frame, but with different data. Since the pages are never freed, the cost of creating the pages each frame is avoided.
```cpp
    /**
     * The maximum size of an allocation is the size of a single page.
     */
    size_t GetPageSize() const { return m_PageSize; }
```
The GetPageSize method simply returns the size of a single page of the allocator. This can be used to check if an allocation can be satisfied by the UploadBuffer. If an allocation can’t be satisfied (for example, if the page size is too small), then this might be an indication that the page size needs to be larger.
```cpp
    /**
     * Allocate memory in an Upload heap.
     * An allocation must not exceed the size of a page.
     * Use a memcpy or similar method to copy the
     * buffer data to CPU pointer in the Allocation structure returned from
     * this function.
     */
    Allocation Allocate(size_t sizeInBytes, size_t alignment);
```
The Allocate method allocates a chunk of memory with the specified size and alignment. The CPU pointer in the Allocation structure returned from this method is used to copy data into the upload heap, and the GPU address is used to access that data on the GPU.
```cpp
    /**
     * Release all allocated pages. This should only be done when the command list
     * is finished executing on the CommandQueue.
     */
    void Reset();
```
The Reset method is used to reset any allocations so that the memory can be reused for the next frame.
To keep track of the memory pages, an internal Page struct is defined. The Page struct stores a base CPU pointer, the offset within the page, and the ID3D12Resource that holds the GPU memory.
```cpp
private:
    // A single page for the allocator.
    struct Page
    {
        Page(size_t sizeInBytes);
        ~Page();
```
The Page structure has only a single constructor, which takes the size of the page as its only argument. This is the same as the pageSize argument that is passed to the constructor of the UploadBuffer class.
```cpp
        // Check to see if the page has room to satisfy the requested
        // allocation.
        bool HasSpace(size_t sizeInBytes, size_t alignment) const;
```
The Page::HasSpace method is used to check if the page can satisfy the requested allocation. If the allocation cannot be satisfied by the current page, the current page is retired and another page is requested (either a previously created page that is available for reuse or a newly created page).
```cpp
        // Allocate memory from the page.
        // Throws std::bad_alloc if the allocation size is larger
        // than the page size or the size of the allocation exceeds the
        // remaining space in the page.
        Allocation Allocate(size_t sizeInBytes, size_t alignment);
```
The Page::Allocate method is used to perform the actual allocation within the memory page.
```cpp
        // Reset the page for reuse.
        void Reset();
```
The Page::Reset method is used to reset the page for reuse. This resets the offset within the page to 0.
```cpp
    private:
        Microsoft::WRL::ComPtr<ID3D12Resource> m_d3d12Resource;

        // Base pointer.
        void* m_CPUPtr;
        D3D12_GPU_VIRTUAL_ADDRESS m_GPUPtr;

        // Allocated page size.
        size_t m_PageSize;
        // Current allocation offset in bytes.
        size_t m_Offset;
    };
```
The data that is private to the Page structure is the ID3D12Resource that contains the GPU memory for the page, the CPU and GPU base pointers, and the current offset within the page. The m_PageSize variable is also stored to make sure the requested allocation can be satisfied.
The UploadBuffer class needs to keep track of a pool of pages and provide a method to create new pages as required.
```cpp
    // A pool of memory pages.
    using PagePool = std::deque< std::shared_ptr<Page> >;
```
The PagePool type alias defines a std::deque container that stores shared pointers to the memory pages.
```cpp
    // Request a page from the pool of available pages
    // or create a new page if there are no available pages.
    std::shared_ptr<Page> RequestPage();
```
The RequestPage private method returns a page from the pool of available pages if there is one. If there are no more available pages, a new one is created and added to the page pool.
```cpp
    PagePool m_PagePool;
    PagePool m_AvailablePages;
```
The m_PagePool member variable is a PagePool used to hold all of the pages that have ever been created by the UploadBuffer class. The m_AvailablePages member variable, on the other hand, is a pool of pages that are available for allocation.
```cpp
    std::shared_ptr<Page> m_CurrentPage;

    // The size of each page of memory.
    size_t m_PageSize;
};
```
The m_CurrentPage member variable is used to store a pointer to the current memory page. The m_PageSize variable stores the size of a memory page. This is set to the pageSize constructor argument and is used for allocating new pages.
View the full source code for UploadBuffer.h
UploadBuffer Preamble
The preamble for the source file of the UploadBuffer class contains the header file dependencies that are specific to the implementation of the class.
```cpp
#include <DX12LibPCH.h>

#include <UploadBuffer.h>

#include <Application.h>
#include <Helpers.h>

#include <d3dx12.h>

#include <new> // for std::bad_alloc
```
The DX12LibPCH.h header file is the precompiled header file for the DX12Lib project. All of the classes described in this article are part of the DX12Lib project.
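The UploadBuffer.h header is the header file that was just described in the previous section.
The Application.h header provides the Application class, which is used in the Page constructor to retrieve the ID3D12Device (Application::Get().GetDevice()).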
The Helpers.h header file contains some helper functions that are used by the UploadBuffer class. The source code for this file can be retrieved here: Helpers.h.
The d3dx12.h header provides some helper functions and structs specific to DirectX 12. This file is hosted on GitHub and is not distributed with the Windows 10 SDK. It is good practice to check GitHub for a new version of this file and to use the latest version in your own projects.
The new header contains the std::bad_alloc exception class, which is thrown if an allocation larger than the size of a page is requested.
UploadBuffer::UploadBuffer
The UploadBuffer class provides a single parameterized constructor. The constructor takes the size of a memory page as its only argument.
```cpp
UploadBuffer::UploadBuffer(size_t pageSize)
    : m_PageSize(pageSize)
{}
```
Besides setting the m_PageSize member variable, the constructor does nothing. Memory pages are only allocated when an allocation is requested. The UploadBuffer class is intended to be used as an internal class for the custom CommandList class (which is shown later in the lesson). If dynamic allocations are not required by the command list, then no pages will be allocated. This is a typical example of lazy initialization.
UploadBuffer::Allocate
The Allocate method is used to allocate a chunk (or block) of memory from a memory page. This method returns an UploadBuffer::Allocation struct that was defined in the header file.
```cpp
UploadBuffer::Allocation UploadBuffer::Allocate(size_t sizeInBytes, size_t alignment)
{
    if (sizeInBytes > m_PageSize)
    {
        throw std::bad_alloc();
    }

    // If there is no current page, or the requested allocation exceeds the
    // remaining space in the current page, request a new page.
    if (!m_CurrentPage || !m_CurrentPage->HasSpace(sizeInBytes, alignment))
    {
        m_CurrentPage = RequestPage();
    }

    return m_CurrentPage->Allocate(sizeInBytes, alignment);
}
```
The Allocate method takes two arguments:
- size_t sizeInBytes: The size of the allocation in bytes.
- size_t alignment: The memory alignment of the allocation in bytes. For example, allocations for constant buffers must be aligned to 256 bytes.
If the size of the allocation exceeds the size of a memory page, the method throws a std::bad_alloc exception.
If there is either no memory page (this is the case when the UploadBuffer is first created) or the current page cannot satisfy the request, a new page is requested.
Finally, the actual allocation is made from the current memory page and the resulting allocation is returned to the caller.
UploadBuffer::RequestPage
If either the allocator does not have a page to make an allocation from, or the current page does not have the available space to satisfy the allocation request, a page must be retrieved from the list of available pages or a new page must be created. The RequestPage method returns a memory page that can be used to satisfy allocation requests.
```cpp
std::shared_ptr<UploadBuffer::Page> UploadBuffer::RequestPage()
{
    std::shared_ptr<Page> page;

    if (!m_AvailablePages.empty())
    {
        page = m_AvailablePages.front();
        m_AvailablePages.pop_front();
    }
    else
    {
        page = std::make_shared<Page>(m_PageSize);
        m_PagePool.push_back(page);
    }

    return page;
}
```
If there are pages available in the m_AvailablePages queue, the Page at the front of the queue is retrieved and popped off the queue.
If there are no available pages, then a new page is created and pushed to the back of the m_PagePool queue. The m_PagePool queue stores all of the pages created by the allocator. In this case, the page is not added to the m_AvailablePages queue because it is going to be used to satisfy the allocation request. When the UploadBuffer is reset, the m_PagePool queue is used to reset the m_AvailablePages queue (which is shown later when the Reset function is described).
UploadBuffer::Reset
The Reset method is used to reset all of the memory allocations so that they can be reused for the next frame (or more specifically, the next command list recording).
```cpp
void UploadBuffer::Reset()
{
    m_CurrentPage = nullptr;
    // Reset all available pages.
    m_AvailablePages = m_PagePool;

    for (const auto& page : m_AvailablePages)
    {
        // Reset the page for new allocations.
        page->Reset();
    }
}
```
The Reset method makes all of the pages available again by copying the m_PagePool to the m_AvailablePages queue.
Each available page is then reset to prepare it for new allocations.
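The page-pool bookkeeping of Allocate, RequestPage, and Reset can be exercised without a D3D12 device. The sketch below is a hypothetical CPU-only mirror of the UploadBuffer logic (CpuUploadBuffer is not a DX12Lib class): a std::vector replaces the mapped ID3D12Resource, and the allocation carries an offset instead of a D3D12_GPU_VIRTUAL_ADDRESS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <new>
#include <vector>

// CPU-only sketch of the UploadBuffer page pool (no D3D12 device needed).
class CpuUploadBuffer
{
public:
    struct Allocation
    {
        void* CPU;                // where the caller would memcpy its data
        std::size_t offsetInPage; // stands in for the GPU virtual address
    };

    explicit CpuUploadBuffer(std::size_t pageSize) : m_PageSize(pageSize) {}

    Allocation Allocate(std::size_t sizeInBytes, std::size_t alignment)
    {
        if (sizeInBytes > m_PageSize) throw std::bad_alloc();
        if (!m_CurrentPage || !m_CurrentPage->HasSpace(sizeInBytes, alignment))
        {
            m_CurrentPage = RequestPage();
        }
        return m_CurrentPage->Allocate(sizeInBytes, alignment);
    }

    // Make every page ever created available again.
    void Reset()
    {
        m_CurrentPage = nullptr;
        m_AvailablePages = m_PagePool;
        for (const auto& page : m_AvailablePages) page->Reset();
    }

    std::size_t NumPagesCreated() const { return m_PagePool.size(); }

private:
    static std::size_t AlignUp(std::size_t v, std::size_t a) { return (v + a - 1) & ~(a - 1); }

    struct Page
    {
        explicit Page(std::size_t size) : memory(size), offset(0) {}

        bool HasSpace(std::size_t size, std::size_t alignment) const
        {
            return AlignUp(offset, alignment) + AlignUp(size, alignment) <= memory.size();
        }

        Allocation Allocate(std::size_t size, std::size_t alignment)
        {
            if (!HasSpace(size, alignment)) throw std::bad_alloc();
            offset = AlignUp(offset, alignment);
            Allocation allocation{ memory.data() + offset, offset };
            offset += AlignUp(size, alignment);
            return allocation;
        }

        void Reset() { offset = 0; }

        std::vector<std::uint8_t> memory; // stands in for the mapped resource
        std::size_t offset;
    };

    using PagePool = std::deque<std::shared_ptr<Page>>;

    // Reuse an available page if there is one; otherwise create a new page
    // and record it in the pool of all pages.
    std::shared_ptr<Page> RequestPage()
    {
        std::shared_ptr<Page> page;
        if (!m_AvailablePages.empty())
        {
            page = m_AvailablePages.front();
            m_AvailablePages.pop_front();
        }
        else
        {
            page = std::make_shared<Page>(m_PageSize);
            m_PagePool.push_back(page);
        }
        return page;
    }

    PagePool m_PagePool;
    PagePool m_AvailablePages;
    std::shared_ptr<Page> m_CurrentPage;
    std::size_t m_PageSize;
};
```

With a 1024-byte page size, two 512-byte allocations fill the first page exactly, so a further 256-byte allocation creates a second page. After Reset, the same two pages are handed out again for the next round of allocations instead of new pages being created, which is the reuse pattern described above.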
UploadBuffer::Page::Page
The constructor for a Page takes the size of the page as its only argument.
```cpp
UploadBuffer::Page::Page(size_t sizeInBytes)
    : m_CPUPtr(nullptr)
    , m_GPUPtr(D3D12_GPU_VIRTUAL_ADDRESS(0))
    , m_PageSize(sizeInBytes)
    , m_Offset(0)
{
    auto device = Application::Get().GetDevice();

    // Use named locals: taking the address of a temporary
    // (&CD3DX12_HEAP_PROPERTIES(...)) is not standard C++.
    CD3DX12_HEAP_PROPERTIES heapProperties(D3D12_HEAP_TYPE_UPLOAD);
    CD3DX12_RESOURCE_DESC resourceDesc = CD3DX12_RESOURCE_DESC::Buffer(m_PageSize);

    ThrowIfFailed(device->CreateCommittedResource(
        &heapProperties,
        D3D12_HEAP_FLAG_NONE,
        &resourceDesc,
        D3D12_RESOURCE_STATE_GENERIC_READ,
        nullptr,
        IID_PPV_ARGS(&m_d3d12Resource)
    ));

    m_GPUPtr = m_d3d12Resource->GetGPUVirtualAddress();
    // Upload heap resources can stay mapped for the lifetime of the page.
    ThrowIfFailed(m_d3d12Resource->Map(0, nullptr, &m_CPUPtr));
}
```
The Page constructor also creates the ID3D12Resource as a committed resource in an upload heap. The creation of committed resources is described in Lesson 2 and, for brevity, is not described again here.
After the resource is created, the GPU and CPU addresses are retrieved using the ID3D12Resource::GetGPUVirtualAddress and ID3D12Resource::Map methods respectively. As long as the resource is created in an upload heap, it is safe to leave the resource mapped until the resource is no longer needed.
UploadBuffer::Page::~Page
The destructor for the Page struct unmaps the resource memory using the ID3D12Resource::Unmap method and resets the CPU and GPU pointers to 0. Since the m_d3d12Resource is stored using a ComPtr, there is no need to explicitly release it; it is automatically released when the Page is destroyed.
```cpp
UploadBuffer::Page::~Page()
{
    m_d3d12Resource->Unmap(0, nullptr);
    m_CPUPtr = nullptr;
    m_GPUPtr = D3D12_GPU_VIRTUAL_ADDRESS(0);
}
```
Before allocating memory from a Page, the Page must have enough space to satisfy the allocation request. The Page::HasSpace method is used to check if the page can satisfy the requested allocation.
UploadBuffer::Page::HasSpace
The Page::HasSpace method checks to see if the page can satisfy the requested allocation. This method returns true if the allocation can be satisfied, or false if the allocation cannot be satisfied.
```cpp
bool UploadBuffer::Page::HasSpace(size_t sizeInBytes, size_t alignment) const
{
    size_t alignedSize = Math::AlignUp(sizeInBytes, alignment);
    size_t alignedOffset = Math::AlignUp(m_Offset, alignment);

    return alignedOffset + alignedSize <= m_PageSize;
}
```
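The HasSpace method must take the alignment into consideration: both the current offset and the size of the allocation are aligned up to the requested alignment before they are compared against the page size. If the requested aligned allocation can be satisfied, this method returns true.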
UploadBuffer::Page::Allocate
The Page::Allocate method is where the actual allocation occurs. This method returns an Allocation structure that can be used to directly copy (using memcpy, for example) CPU data to the GPU and bind that GPU address to the pipeline.
```cpp
UploadBuffer::Allocation UploadBuffer::Page::Allocate(size_t sizeInBytes, size_t alignment)
{
    if (!HasSpace(sizeInBytes, alignment))
    {
        // Can't allocate space from page.
        throw std::bad_alloc();
    }

    size_t alignedSize = Math::AlignUp(sizeInBytes, alignment);
    m_Offset = Math::AlignUp(m_Offset, alignment);

    Allocation allocation;
    allocation.CPU = static_cast<uint8_t*>(m_CPUPtr) + m_Offset;
    allocation.GPU = m_GPUPtr + m_Offset;

    m_Offset += alignedSize;

    return allocation;
}
```
If the Page does not have enough space to satisfy the allocation request, this method will throw a std::bad_alloc exception.
Since UploadBuffer::Allocate already checks HasSpace before allocating from the current page, the HasSpace check at the beginning of the Page::Allocate method shown here is redundant. Feel free to remove this check in your own implementation. Both the size and the starting address of an allocation should be aligned to the requested alignment. In most cases the size of the allocation will already be aligned to the requested alignment (for example, when allocating memory for a vertex or index buffer), but to ensure correctness, the requested allocation size is explicitly aligned up to the requested alignment.
Next, the current offset within the page is also aligned up to the requested alignment.
The aligned CPU and GPU addresses are then written to the Allocation structure that is returned by this method.
The page’s pointer offset is then incremented by the aligned size of the allocation.
Finally, the Allocation structure is returned to the caller.
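If a page were shared between threads, the alignment, the space check, and the offset update in Allocate would all have to happen as one step; otherwise two threads could both pass the check and hand out overlapping blocks. The following is a hypothetical sketch (ThreadSafeLinearPage is not a DX12Lib class) that guards all three with a std::lock_guard. It tracks offsets only, with no backing memory.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>

// Hypothetical linear page whose Allocate is guarded by a mutex.
class ThreadSafeLinearPage
{
public:
    explicit ThreadSafeLinearPage(std::size_t pageSize) : m_PageSize(pageSize), m_Offset(0) {}

    // Returns the byte offset of the new block within the page.
    std::size_t Allocate(std::size_t sizeInBytes, std::size_t alignment)
    {
        // Align, check, and bump under one lock so the check cannot go stale.
        std::lock_guard<std::mutex> lock(m_Mutex);
        const std::size_t start = AlignUp(m_Offset, alignment);
        const std::size_t alignedSize = AlignUp(sizeInBytes, alignment);
        if (start + alignedSize > m_PageSize)
        {
            throw std::bad_alloc();
        }
        m_Offset = start + alignedSize;
        return start;
    }

    std::size_t Offset() const
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        return m_Offset;
    }

private:
    static std::size_t AlignUp(std::size_t v, std::size_t a) { return (v + a - 1) & ~(a - 1); }

    mutable std::mutex m_Mutex;
    std::size_t m_PageSize;
    std::size_t m_Offset;
};
```

Locking inside the page alone is not enough if several threads share one UploadBuffer: UploadBuffer::Allocate also reads and replaces m_CurrentPage and modifies the page pools, so that method would need the same protection.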
The Page::Allocate method is not thread safe! If you require thread safety for this method, then you may want to insert a std::lock_guard at the beginning of this method. Since I do not use the same instance of an UploadBuffer class across multiple threads, I consider this to be unnecessary overhead (there is some cost associated with locking and unlocking mutexes that I do not want to pay for here).
UploadBuffer::Page::Reset
The Page::Reset method simply resets the page’s pointer offset to 0 so that it can be used to make new allocations.
```cpp
void UploadBuffer::Page::Reset()
{
    m_Offset = 0;
}
```
This concludes the implementation of the UploadBuffer class. In the next section, the DescriptorAllocator class is described. As the name implies, the DescriptorAllocator class is used to allocate (CPU visible) descriptors. CPU visible descriptors are used to create views for resources (for example Render Target Views (RTV), Depth-Stencil Views (DSV), Constant Buffer Views (CBV), Shader Resource Views (SRV), Unordered Access Views (UAV), and Samplers). Before a CBV, SRV, UAV, or Sampler can be used in a shader, it must be copied to a GPU visible descriptor. The DynamicDescriptorHeap class handles copying of CPU visible descriptors to GPU visible descriptor heaps and is the subject of a later section.
View the full source code for UploadBuffer.cpp
Descriptor Allocator
The DescriptorAllocator class is used to allocate descriptors from a CPU visible descriptor heap. CPU visible descriptors are useful for "staging" resource descriptors in CPU memory before they are copied to a GPU visible descriptor heap for use in a shader.
CPU visible descriptors are used for describing:
- Render Target Views (RTV)
- Depth-Stencil Views (DSV)
- Constant Buffer Views (CBV)
- Shader Resource Views (SRV)
- Unordered Access Views (UAV)
- Samplers
The DescriptorAllocator class is used to allocate descriptors to the application when loading new resources (like textures). In a typical game engine, resources may need to be loaded and unloaded from memory at sporadic moments while the player moves around the level. To support large dynamic worlds, it may be necessary to initially load some resources, unload them from memory, and later load different resources. The DescriptorAllocator manages all of the descriptors that are required to describe those resources. Descriptors that are no longer used (for example, when a resource is unloaded from memory) are automatically returned to the descriptor heap for reuse.
The DescriptorAllocator class uses a free list memory allocation scheme inspired by the Variable Size Memory Allocations Manager by Diligent Graphics [2] to manage the descriptors. A free list keeps track of the available (free) blocks of memory in a page. Each entry of the free list stores the offset of a free block from the beginning of the memory page and the size of that block. To satisfy an allocation request, the free list is searched for an entry that is large enough. If the request cannot be satisfied by the current page, a new page is created.
The above image shows an example of pages of memory that are allocated using a free list allocation strategy. The top image shows the initial state of the page before any allocations are made. In this case, the free list contains only a single entry which refers to the entire memory page. The bottom image shows an example of a memory page after several allocations have been made. In this case, the free list contains several entries which represent the available blocks of memory in the page.
To make a new allocation from the page, all of the entries in the free list are searched and the first block that is large enough to satisfy the request is used. If there are no free blocks that can satisfy the request, then a new page is allocated.
This strategy for allocating memory is called first-fit (find the first free block that fits). It is the easiest strategy to implement because it consists only of a linear search through the free list. However, it is not the most efficient allocation method. A linear search has \(\mathcal{O}(n)\) worst-case complexity, where \(n\) is the number of entries in the free list.
A better technique is to sort the free blocks by size and perform a binary search through the sizes to find a block that is large enough to satisfy the request. As you may remember from your algorithm analysis class, a binary search has \(\mathcal{O}(\log_2 n)\) complexity (where \(n\) is the number of values to search), which is better than \(\mathcal{O}(n)\).
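The difference between the two search strategies can be sketched with standard containers. In this hypothetical example, `FirstFit` scans a `std::vector` of free blocks in order. `BestFit` looks up a `std::multimap` keyed by block size. `lower_bound` on an ordered associative container runs in logarithmic time and returns the smallest block that is large enough (a best-fit choice). That block is not necessarily the one a first-fit scan would pick. The `Block` struct and both function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

struct Block
{
    uint32_t Offset;
    uint32_t Size;
};

// First fit: O(n) scan, returns the first block (in list order) that is large enough.
std::optional<Block> FirstFit(const std::vector<Block>& freeList, uint32_t request)
{
    for (const Block& block : freeList)
    {
        if (block.Size >= request)
            return block;
    }
    return std::nullopt;
}

// Size-ordered lookup: O(log n) in a multimap keyed by block size (value = offset).
// lower_bound finds the smallest block whose size is >= request (best fit).
std::optional<Block> BestFit(const std::multimap<uint32_t, uint32_t>& freeListBySize,
                             uint32_t request)
{
    auto it = freeListBySize.lower_bound(request);
    if (it == freeListBySize.end())
        return std::nullopt;
    return Block{ it->second, it->first };
}
```

Take free blocks of 64, 32, 256, 160, and 96 descriptors, listed in offset order. First fit serves a request for 100 from the 256-descriptor block, while the size-ordered lookup serves it from the 160-descriptor block.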
The above image shows a memory page after several allocations have been made. The binary tree at the bottom of the image represents the entries of the free list sorted by size. Using the binary tree, an allocation of 160 bytes can be satisfied by searching just three nodes. Using the linear list would require five entries to be searched before the allocation could be satisfied. With only six entries in the free list, this may not seem like a significant improvement, but with thousands (or millions) of entries, the difference becomes substantial.
Three different classes are used to implement this strategy:
- DescriptorAllocator: This is the main interface to the application for requesting descriptors. The DescriptorAllocator class manages the descriptor pages.
- DescriptorAllocatorPage: This class is a wrapper for an ID3D12DescriptorHeap. The DescriptorAllocatorPage also keeps track of the free list for the page.
- DescriptorAllocation: This class wraps an allocation that is returned from the DescriptorAllocator::Allocate method. The DescriptorAllocation class also stores a pointer back to the page it came from and will automatically free itself if the descriptor(s) are no longer required.
The DescriptorAllocator class is described first.
DescriptorAllocator Class
The implementation of the DescriptorAllocator class is very similar to the UploadBuffer class shown in the previous section. The DescriptorAllocator class stores a pool of DescriptorAllocatorPage objects. If there are no pages that can satisfy a request, a new page is created and added to the pool. Like the UploadBuffer class, the DescriptorAllocator class has a very simple public interface. It only provides methods to allocate descriptors and to release stale descriptors.
DescriptorAllocator Header
The header file for the DescriptorAllocator class declares the public and private members of the class. The preamble for the header file is shown first; it includes the dependencies for the class.
```cpp
#include "DescriptorAllocation.h"

#include "d3dx12.h"

#include <cstdint>
#include <mutex>
#include <memory>
#include <set>
#include <vector>

class DescriptorAllocatorPage;
```
The DescriptorAllocator::Allocate method returns a DescriptorAllocation by value, which requires the DescriptorAllocation.h header file to be included in this file (on line 40).
The ubiquitous d3dx12.h header file included on line 42 is required for the DirectX 12 API and its helper structures and functions.
The cstdint header file included on line 44 is required for the fixed-width integer types (uint32_t and uint64_t).
The mutex header file is included for the std::mutex synchronization primitive. The mutex is used in the Allocate method to allow allocations to be safely made across multiple threads.
The memory header file is required for the std::shared_ptr pointer class. Shared pointers are used to track the lifetime of the pages. Each allocation also stores a shared pointer back to the page it came from.
The set header file includes the std::set container class. A set is used to store an ordered list of indices to the available pages in the page pool.
The vector header file includes the std::vector container class, which is used to store the pool of pages.
The DescriptorAllocatorPage class is used by the DescriptorAllocator class, but its header file does not need to be included because the DescriptorAllocatorPage class is only used through a pointer within the DescriptorAllocator class. In this case, it is sufficient to provide a forward declaration of the class (on line 50) without including the header file.
The DescriptorAllocator class defines two public member functions:
- DescriptorAllocator::Allocate: Allocates a number of contiguous descriptors from a CPU visible descriptor heap.
- DescriptorAllocator::ReleaseStaleDescriptors: Frees any stale descriptors so that they can be returned to the list of available descriptors for reuse. This method should only be called after the freed descriptors are no longer referenced by any command list executing on the command queue.
These methods are declared in the header file for the DescriptorAllocator class. Their definitions are shown later.
```cpp
class DescriptorAllocator
{
public:
    DescriptorAllocator(D3D12_DESCRIPTOR_HEAP_TYPE type, uint32_t numDescriptorsPerHeap = 256);
    virtual ~DescriptorAllocator();

    /**
     * Allocate a number of contiguous descriptors from a CPU visible descriptor heap.
     *
     * @param numDescriptors The number of contiguous descriptors to allocate.
     * Cannot be more than the number of descriptors per descriptor heap.
     */
    DescriptorAllocation Allocate(uint32_t numDescriptors = 1);

    /**
     * When the frame has completed, the stale descriptors can be released.
     */
    void ReleaseStaleDescriptors( uint64_t frameNumber );
```
The DescriptorAllocator constructor declared on line 55 takes two parameters. The first is the type of descriptors that the DescriptorAllocator will allocate. This can be one of the CBV_SRV_UAV, SAMPLER, RTV, or DSV types.
The second parameter to the constructor is the number of descriptors per descriptor heap. By default, descriptor heaps will be created with 256 descriptors. This value is arbitrary and only needs to be as large as the maximum number of contiguous descriptors that will ever be needed. If all of the descriptors in a descriptor heap have been exhausted, a new heap will be created to satisfy the allocation request.
The DescriptorAllocator::Allocate method allocates a number of contiguous descriptors from a descriptor heap. By default, only a single descriptor is allocated. The numDescriptors argument can be specified if more than one descriptor is required. This method returns a DescriptorAllocation, which is a wrapper for the allocated descriptors. The DescriptorAllocation class is described later.
```cpp
private:
    using DescriptorHeapPool = std::vector< std::shared_ptr<DescriptorAllocatorPage> >;

    // Create a new heap with a specific number of descriptors.
    std::shared_ptr<DescriptorAllocatorPage> CreateAllocatorPage();

    D3D12_DESCRIPTOR_HEAP_TYPE m_HeapType;
    uint32_t m_NumDescriptorsPerHeap;

    DescriptorHeapPool m_HeapPool;
    // Indices of available heaps in the heap pool.
    std::set<size_t> m_AvailableHeaps;

    std::mutex m_AllocationMutex;
};
```
The DescriptorHeapPool defined on line 72 is a type alias for a std::vector of shared pointers to DescriptorAllocatorPage objects.
The DescriptorAllocator::CreateAllocatorPage method declared on line 75 is an internal method that is used to create a new allocator page if there are no pages in the page pool that can satisfy the allocation request.
The m_HeapType variable stores the type of descriptors to allocate. This variable is also used to create new descriptor heaps.
The m_NumDescriptorsPerHeap variable stores the number of descriptors to create per descriptor heap.
The m_HeapPool variable is a std::vector of shared pointers to DescriptorAllocatorPage objects. It is used to keep track of all allocated pages.
The m_AvailableHeaps variable is a std::set of indices of available pages in the m_HeapPool vector. If all of the descriptors in a DescriptorAllocatorPage have been exhausted, the index of that page in the m_HeapPool vector is removed from the m_AvailableHeaps set. This ensures that exhausted pages are skipped when looking for a DescriptorAllocatorPage that can satisfy the allocation request.
Since the DescriptorAllocator class is intended to be thread safe, a std::mutex is used to guard against multiple threads allocating or deallocating from the DescriptorAllocator at the same time.
In the next sections, the implementation of the DescriptorAllocator is shown.
View the full source code for DescriptorAllocator.h
DescriptorAllocator Preamble
Before defining the methods of the DescriptorAllocator class, a few header files used by the class need to be included.
```cpp
#include <DX12LibPCH.h>

#include <DescriptorAllocator.h>
#include <DescriptorAllocatorPage.h>
```
The DX12LibPCH.h header is the precompiled header file for the DX12Lib project.
The DescriptorAllocator.h header file included on line 3 was just described in the previous section. The DescriptorAllocatorPage.h header file contains the declaration of the DescriptorAllocatorPage class (which will be shown later).
DescriptorAllocator::DescriptorAllocator
Similar to the constructor for the UploadBuffer class shown previously, the constructor for the DescriptorAllocator class does very little other than initialize the class's member variables.
```cpp
DescriptorAllocator::DescriptorAllocator(D3D12_DESCRIPTOR_HEAP_TYPE type, uint32_t numDescriptorsPerHeap)
    : m_HeapType(type)
    , m_NumDescriptorsPerHeap(numDescriptorsPerHeap)
{
}
```
The m_HeapType and m_NumDescriptorsPerHeap member variables are initialized from the arguments passed to the constructor.
DescriptorAllocator::CreateAllocatorPage
The CreateAllocatorPage method is used to create a new page of descriptors. The DescriptorAllocatorPage class (which will be shown later) is a wrapper for the ID3D12DescriptorHeap and manages the actual descriptors.
```cpp
std::shared_ptr<DescriptorAllocatorPage> DescriptorAllocator::CreateAllocatorPage()
{
    auto newPage = std::make_shared<DescriptorAllocatorPage>( m_HeapType, m_NumDescriptorsPerHeap );

    m_HeapPool.emplace_back( newPage );
    m_AvailableHeaps.insert( m_HeapPool.size() - 1 );

    return newPage;
}
```
The DescriptorAllocator::CreateAllocatorPage method is very simple. On line 17, a new DescriptorAllocatorPage is created and added to the pool. On line 20, the index of the page in the pool is added to the m_AvailableHeaps set.
On line 22, the new page is returned to the calling function.
DescriptorAllocator::Allocate
The Allocate method allocates a contiguous block of descriptors from a descriptor heap. The method iterates through the available descriptor heaps (pages) and tries to allocate the requested number of descriptors until one of them is able to fulfill the request. If no descriptor heap can fulfill the request, a new descriptor heap (page) that can fulfill it is created.
```cpp
DescriptorAllocation DescriptorAllocator::Allocate(uint32_t numDescriptors)
{
    std::lock_guard<std::mutex> lock( m_AllocationMutex );

    DescriptorAllocation allocation;

    for ( auto iter = m_AvailableHeaps.begin(); iter != m_AvailableHeaps.end(); )
    {
        auto allocatorPage = m_HeapPool[*iter];

        allocation = allocatorPage->Allocate( numDescriptors );

        if ( allocatorPage->NumFreeHandles() == 0 )
        {
            // erase() returns the iterator that follows the erased element,
            // so the iterator must not be incremented again.
            iter = m_AvailableHeaps.erase( iter );
        }
        else
        {
            ++iter;
        }

        // A valid allocation has been found.
        if ( !allocation.IsNull() )
        {
            break;
        }
    }
```
Before allocating any descriptors, the m_AllocationMutex mutex is locked to ensure that the current thread has exclusive access to the allocator.
The result of the allocation is stored in the allocation variable defined on line 29.
On lines 31-47, the available descriptor heaps are iterated, and on line 35 an allocation of the requested number of descriptors is made. If the allocator page was able to satisfy the request, a valid descriptor allocation is returned. If the allocation left the allocator page exhausted (its number of free descriptor handles reached 0), the index of the current page is removed from the set of available heaps (on line 39).
If a valid descriptor handle was allocated from the allocator page (the descriptor handle is not null) then the loop breaks on line 45.
If there were no available allocator pages (which is the case when the DescriptorAllocator is first created), or none of the available allocator pages could satisfy the request, a new allocator page is created.
```cpp
    // No available heap could satisfy the requested number of descriptors.
    if ( allocation.IsNull() )
    {
        m_NumDescriptorsPerHeap = std::max( m_NumDescriptorsPerHeap, numDescriptors );
        auto newPage = CreateAllocatorPage();

        allocation = newPage->Allocate( numDescriptors );
    }

    return allocation;
}
```
On line 50, the descriptor allocation is checked for validity. If it is still invalid (a null descriptor), a new descriptor page that is at least as large as the number of requested descriptors is created on line 53 using the DescriptorAllocator::CreateAllocatorPage method described earlier.
On line 55, the requested allocation is made (which should be guaranteed to succeed), and the resulting allocation is returned to the caller on line 58.
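The page-pool logic of `DescriptorAllocator::Allocate` can be modeled without Direct3D. The sketch makes these assumptions:

- The hypothetical `MockPage` only counts free descriptors and hands them out linearly. The real `DescriptorAllocatorPage` uses a free list.
- `MockAllocator` owns its pages through `std::unique_ptr` instead of `std::shared_ptr`.
- An allocation is reduced to a page index and an offset.

The sketch also shows a safe way to drop exhausted pages from the `std::set` while iterating: the iterator returned by `erase` is used as-is and is not incremented a second time. Unlike the code above, this sketch adds a newly created page to the available set only if the page still has free descriptors. That is a design choice of the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical page: hands out descriptors linearly and never frees them.
class MockPage
{
public:
    explicit MockPage(uint32_t numDescriptors) : m_NumFree(numDescriptors), m_Next(0) {}

    std::optional<uint32_t> Allocate(uint32_t count)
    {
        if (count > m_NumFree)
            return std::nullopt;
        uint32_t offset = m_Next;
        m_Next += count;
        m_NumFree -= count;
        return offset;
    }

    uint32_t NumFreeHandles() const { return m_NumFree; }

private:
    uint32_t m_NumFree;
    uint32_t m_Next;
};

// Hypothetical allocator mirroring the pool + available-set structure.
class MockAllocator
{
public:
    explicit MockAllocator(uint32_t numPerPage) : m_NumPerPage(numPerPage) {}

    // Returns {page index, offset within that page}.
    std::pair<std::size_t, uint32_t> Allocate(uint32_t count)
    {
        for (auto iter = m_Available.begin(); iter != m_Available.end(); )
        {
            std::size_t index = *iter;
            std::optional<uint32_t> offset = m_Pool[index]->Allocate(count);

            // Drop exhausted pages; erase() already returns the next iterator.
            if (m_Pool[index]->NumFreeHandles() == 0)
                iter = m_Available.erase(iter);
            else
                ++iter;

            if (offset)
                return { index, *offset };
        }

        // No available page could satisfy the request: create a large enough page.
        m_NumPerPage = std::max(m_NumPerPage, count);
        m_Pool.push_back(std::make_unique<MockPage>(m_NumPerPage));
        std::size_t index = m_Pool.size() - 1;

        std::optional<uint32_t> offset = m_Pool[index]->Allocate(count);
        assert(offset);  // a fresh page with at least 'count' descriptors always succeeds
        if (m_Pool[index]->NumFreeHandles() > 0)
            m_Available.insert(index);
        return { index, *offset };
    }

    std::size_t NumPages() const { return m_Pool.size(); }
    std::size_t NumAvailablePages() const { return m_Available.size(); }

private:
    uint32_t m_NumPerPage;
    std::vector<std::unique_ptr<MockPage>> m_Pool;
    std::set<std::size_t> m_Available;
};
```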
DescriptorAllocator::ReleaseStaleDescriptors
The last method of the DescriptorAllocator class is the ReleaseStaleDescriptors method. This method iterates over all of the descriptor heap pages and calls each page's ReleaseStaleDescriptors method. If the page has free handles after its stale descriptors are released, it is added to the set of available heaps.
```cpp
void DescriptorAllocator::ReleaseStaleDescriptors( uint64_t frameNumber )
{
    std::lock_guard<std::mutex> lock( m_AllocationMutex );

    for ( size_t i = 0; i < m_HeapPool.size(); ++i )
    {
        auto page = m_HeapPool[i];

        page->ReleaseStaleDescriptors( frameNumber );

        if ( page->NumFreeHandles() > 0 )
        {
            m_AvailableHeaps.insert( i );
        }
    }
}
```
To prevent other threads from modifying the DescriptorAllocator, the m_AllocationMutex mutex is locked on line 63.
On lines 65-75, the pages of the heap pool are iterated, and each page's ReleaseStaleDescriptors method is called. The implementation of the DescriptorAllocatorPage::ReleaseStaleDescriptors method is shown in the following sections.
Pages that have free descriptor handles are added to the set of available heaps on line 73. It is okay to add the same index to the set multiple times because a std::set only stores unique values.
View the full source code for DescriptorAllocator.cpp
DescriptorAllocatorPage Class
The purpose of the DescriptorAllocatorPage class is to provide the free list allocation strategy for an ID3D12DescriptorHeap. The DescriptorAllocatorPage class is not intended to be used outside of the DescriptorAllocator class, so the library end user doesn't need to know the details of this class. These details are more interesting to someone who is writing their own DirectX 12 library, or to someone who wants to understand the implementation details of the DX12Lib project created for these tutorials. As previously mentioned, the implementation of this class is heavily inspired by the Variable Size Memory Allocations Manager from Diligent Graphics [2].
The DescriptorAllocatorPage class must be able to satisfy descriptor allocation requests. It also needs to provide functions to query the number of free handles and to check whether it has sufficient space to satisfy a request. The DescriptorAllocatorPage provides the following (public) methods:
- HasSpace: Checks whether the DescriptorAllocatorPage has a contiguous block of descriptors that is large enough to satisfy a request.
- NumFreeHandles: Returns the number of available descriptor handles in the descriptor heap. Note that due to fragmentation of the free list, allocations that are less than or equal to the number of free handles could still fail.
- Allocate: Allocates a number of contiguous descriptors from the descriptor heap. If the DescriptorAllocatorPage is not able to satisfy the request, this function returns a null DescriptorAllocation.
- Free: Returns a DescriptorAllocation back to the heap. Descriptors can't be reused until the command list that references them has finished executing on the command queue. For that reason, the descriptors are not returned to the heap until the render frame has finished executing.
- ReleaseStaleDescriptors: Returns any freed descriptors back to the descriptor heap for reuse.
DescriptorAllocatorPage Header
The declaration of the DescriptorAllocatorPage class is slightly more elaborate than that of the DescriptorAllocator class described in the previous section. The DescriptorAllocatorPage class is not only a wrapper for an ID3D12DescriptorHeap but also implements a free list allocator to manage the descriptors in the heap.
```cpp
#include "DescriptorAllocation.h"

#include <d3d12.h>

#include <wrl.h>

#include <map>
#include <memory>
#include <mutex>
#include <queue>
```
Since the DescriptorAllocatorPage::Allocate method (shown later) returns a DescriptorAllocation object by value, the header file for the DescriptorAllocation class needs to be included on line 37 (a forward declaration is not sufficient).
The d3d12.h header file is required for the ID3D12DescriptorHeap interface.
The wrl.h header file included on line 41 is required for the ComPtr template class.
The map, memory, mutex, and queue headers are required for the STL types that are used by the DescriptorAllocatorPage class.
```cpp
class DescriptorAllocatorPage : public std::enable_shared_from_this<DescriptorAllocatorPage>
{
public:
    DescriptorAllocatorPage( D3D12_DESCRIPTOR_HEAP_TYPE type, uint32_t numDescriptors );

    D3D12_DESCRIPTOR_HEAP_TYPE GetHeapType() const;

    /**
     * Check to see if this descriptor page has a contiguous block of descriptors
     * large enough to satisfy the request.
     */
    bool HasSpace( uint32_t numDescriptors ) const;
```
The DescriptorAllocatorPage class publicly inherits from the std::enable_shared_from_this template class. That template class provides the shared_from_this member function, which lets the DescriptorAllocatorPage class retrieve a std::shared_ptr to itself (used in the DescriptorAllocatorPage::Allocate method shown later). This requires the DescriptorAllocatorPage object to be owned by a shared pointer, created with either std::make_shared or std::shared_ptr<T>( new T(...) ). The requirement is acceptable here because the DescriptorAllocatorPage class should only be used by the DescriptorAllocator class. On line 17 of the DescriptorAllocator::CreateAllocatorPage method shown previously, the DescriptorAllocatorPage is created using the std::make_shared function.
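A minimal sketch of this pattern follows. A hypothetical `Page` hands out `Handle` objects, and each handle keeps a `std::shared_ptr` back to the page that created it, obtained with `shared_from_this`. The page is created with `std::make_shared`, so the call is valid, and the page stays alive as long as any handle refers to it. Calling `shared_from_this` on an object that is not owned by a `std::shared_ptr` throws `std::bad_weak_ptr` since C++17 and is undefined behavior in earlier standards.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

class Page;

// Hypothetical allocation that remembers which page it came from.
struct Handle
{
    uint32_t Offset = 0;
    std::shared_ptr<Page> Owner;  // keeps the page alive while the handle exists
};

class Page : public std::enable_shared_from_this<Page>
{
public:
    Handle Allocate(uint32_t offset)
    {
        // shared_from_this() shares ownership with the shared_ptr that already owns *this.
        return Handle{ offset, shared_from_this() };
    }
};
```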
The parameterized constructor for the DescriptorAllocatorPage class is declared on line 51. The constructor takes two arguments: the type of descriptor heap to create and the number of descriptors to allocate in the descriptor heap.
The GetHeapType method declared on line 53 simply returns the descriptor heap type that was used to construct the DescriptorAllocatorPage.
The HasSpace method declared on line 59 checks whether the DescriptorAllocatorPage has a contiguous block of descriptors in the descriptor heap that is large enough to satisfy a request. It is often more efficient to check whether an allocation request will succeed before making it than to make the request and then check for failure.
```cpp
    /**
     * Get the number of available handles in the heap.
     */
    uint32_t NumFreeHandles() const;

    /**
     * Allocate a number of descriptors from this descriptor heap.
     * If the allocation cannot be satisfied, then a NULL descriptor
     * is returned.
     */
    DescriptorAllocation Allocate( uint32_t numDescriptors );
```
The NumFreeHandles method declared on line 64 returns the number of descriptor handles the DescriptorAllocatorPage still has available. Due to fragmentation of the free list, a request for a contiguous block of descriptors that is smaller than the total number of free handles could still fail. For example, the fragmented free list shown in the previous image has 544 free descriptors, but the largest contiguous block is only 128 descriptors wide.
The Allocate method declared on line 71 is used to allocate a number of descriptors from the descriptor heap. It returns a DescriptorAllocation. If the allocation fails, the returned descriptor is null. To check whether the descriptor is valid, use the DescriptorAllocation::IsNull method, which is shown later in the section about the DescriptorAllocation class.
```cpp
    /**
     * Return a descriptor back to the heap.
     * @param frameNumber Stale descriptors are not freed directly, but put
     * on a stale allocations queue. Stale allocations are returned to the heap
     * using the DescriptorAllocatorPage::ReleaseStaleDescriptors method.
     */
    void Free( DescriptorAllocation&& descriptorHandle, uint64_t frameNumber );

    /**
     * Return the stale descriptors back to the descriptor heap.
     */
    void ReleaseStaleDescriptors( uint64_t frameNumber );
```
The Free method declared on line 79 is used to free a DescriptorAllocation that was previously allocated using the DescriptorAllocatorPage::Allocate method. It is not necessary to call this method directly, because the DescriptorAllocation class automatically frees itself back to the DescriptorAllocatorPage it came from when it is no longer in use. This method takes the DescriptorAllocation as an r-value reference. The DescriptorAllocation is therefore moved into the function, which leaves the original DescriptorAllocation invalid.
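The ownership hand-off can be sketched with a hypothetical move-only `Allocation` type. Copying is disabled, and the move constructor and move assignment operator leave the source null, so each range is released exactly once. The `FreeRecorder` struct is a stand-in that only counts freed descriptors instead of returning them to a real descriptor heap. Both names are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical receiver of freed ranges; it only records how many descriptors came back.
struct FreeRecorder
{
    uint32_t NumFreed = 0;
    void Free(uint32_t offset, uint32_t count) { (void)offset; NumFreed += count; }
};

class Allocation
{
public:
    Allocation() = default;
    Allocation(FreeRecorder* owner, uint32_t offset, uint32_t count)
        : m_Owner(owner), m_Offset(offset), m_Count(count) {}

    // Copying would allow a range to be freed twice, so it is disabled.
    Allocation(const Allocation&) = delete;
    Allocation& operator=(const Allocation&) = delete;

    // Moving transfers ownership and leaves 'other' null.
    Allocation(Allocation&& other) noexcept
        : m_Owner(std::exchange(other.m_Owner, nullptr))
        , m_Offset(std::exchange(other.m_Offset, 0u))
        , m_Count(std::exchange(other.m_Count, 0u)) {}

    Allocation& operator=(Allocation&& other) noexcept
    {
        if (this != &other)
        {
            Release();  // free whatever this object currently owns
            m_Owner  = std::exchange(other.m_Owner, nullptr);
            m_Offset = std::exchange(other.m_Offset, 0u);
            m_Count  = std::exchange(other.m_Count, 0u);
        }
        return *this;
    }

    ~Allocation() { Release(); }

    bool IsNull() const { return m_Owner == nullptr; }

private:
    void Release()
    {
        if (m_Owner)
        {
            m_Owner->Free(m_Offset, m_Count);
            m_Owner = nullptr;
        }
    }

    FreeRecorder* m_Owner = nullptr;
    uint32_t m_Offset = 0;
    uint32_t m_Count = 0;
};
```

A function that takes `Allocation&&` and moves the argument into a local object behaves like the `Free` method described above. After the call, the caller's object is null.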
The ReleaseStaleDescriptors method declared on line 84 releases the stale descriptors back to the descriptor heap for reuse. This method takes the completed frame number as its only argument. All of the descriptors that were released during that frame will be returned to the heap.
The DescriptorAllocatorPage class defines a few additional methods that are internal to this class.
```cpp
protected:

    // Compute the offset of the descriptor handle from the start of the heap.
    uint32_t ComputeOffset( D3D12_CPU_DESCRIPTOR_HANDLE handle );

    // Adds a new block to the free list.
    void AddNewBlock( uint32_t offset, uint32_t numDescriptors );

    // Free a block of descriptors.
    // This will also merge free blocks in the free list to form larger blocks
    // that can be reused.
    void FreeBlock( uint32_t offset, uint32_t numDescriptors );
```
The ComputeOffset method computes the number of descriptors from the base descriptor to the specified descriptor handle. This method is used to determine where a descriptor needs to be placed back in the heap when the descriptor is freed.
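`ComputeOffset` boils down to pointer arithmetic on the handle's `ptr` member. The sketch below uses a hypothetical `CpuHandle` struct in place of `D3D12_CPU_DESCRIPTOR_HANDLE`, whose only member is a `SIZE_T ptr`, so that it compiles without the Windows SDK. The hypothetical `OffsetHandle` mirrors what `CD3DX12_CPU_DESCRIPTOR_HANDLE::Offset` computes. The increment size is passed in because the real value must be queried from the device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for D3D12_CPU_DESCRIPTOR_HANDLE.
struct CpuHandle
{
    std::size_t ptr;
};

// Handle of the descriptor 'index' slots after 'base'.
CpuHandle OffsetHandle(CpuHandle base, uint32_t index, uint32_t incrementSize)
{
    return CpuHandle{ base.ptr + static_cast<std::size_t>(index) * incrementSize };
}

// Inverse operation: the number of descriptors between 'base' and 'handle'.
uint32_t ComputeOffset(CpuHandle base, CpuHandle handle, uint32_t incrementSize)
{
    assert(handle.ptr >= base.ptr);
    assert((handle.ptr - base.ptr) % incrementSize == 0);
    return static_cast<uint32_t>((handle.ptr - base.ptr) / incrementSize);
}
```

For example, take a base address of 0x1000 and an increment size of 32 bytes (an arbitrary value chosen for illustration). The descriptor at index 5 is then at 0x10A0, and `ComputeOffset` maps that address back to 5.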
The AddNewBlock method adds a block of descriptors to the free list. This method is used in three places:
- to initialize the free list with a single block containing all descriptors,
- to split a block of descriptors during allocation,
- to merge neighboring blocks when descriptors are freed.
The FreeBlock method is used to free a block of descriptors. The ReleaseStaleDescriptors method calls it to commit the stale descriptors back to the descriptor heap. The FreeBlock method also checks whether neighboring blocks in the free list can be merged. Merging free blocks reduces the fragmentation of the free list.
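The merge step can be illustrated with only the offset-ordered half of the free list. In this simplified sketch, the hypothetical `FreeBlock` function works on a `std::map` from offset to block size. It omits the size-ordered multimap that the real class keeps in sync. A freed range is merged with the free block that ends exactly where the range starts and with the free block that starts exactly where the range ends.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

using FreeMap = std::map<uint32_t, uint32_t>;  // offset -> number of descriptors

// Return [offset, offset + size) to the free map, merging with adjacent free blocks.
void FreeBlock(FreeMap& freeList, uint32_t offset, uint32_t size)
{
    // First free block that starts after the freed range (potential right neighbor).
    auto next = freeList.upper_bound(offset);

    // Merge with the left neighbor if it ends exactly at 'offset'.
    if (next != freeList.begin())
    {
        auto prev = std::prev(next);
        if (prev->first + prev->second == offset)
        {
            offset = prev->first;
            size += prev->second;
            freeList.erase(prev);  // 'next' stays valid: only 'prev' is erased
        }
    }

    // Merge with the right neighbor if it starts exactly at the end of the range.
    if (next != freeList.end() && offset + size == next->first)
    {
        size += next->second;
        freeList.erase(next);
    }

    freeList.emplace(offset, size);
}
```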
The DescriptorAllocatorPage class also defines some private data members.
```cpp
private:
    // The offset (in descriptors) within the descriptor heap.
    using OffsetType = uint32_t;
    // The number of descriptors that are available.
    using SizeType = uint32_t;
```
To improve code readability and reduce ambiguity, the OffsetType type alias is defined to refer to an offset (in descriptors) within the descriptor heap. The SizeType type alias is defined to refer to the number of descriptors in a block of the free list.
```cpp
    struct FreeBlockInfo;
    // A map that lists the free blocks by the offset within the descriptor heap.
    using FreeListByOffset = std::map<OffsetType, FreeBlockInfo>;

    // A map that lists the free blocks by size.
    // Needs to be a multimap since multiple blocks can have the same size.
    using FreeListBySize = std::multimap<SizeType, FreeListByOffset::iterator>;

    struct FreeBlockInfo
    {
        FreeBlockInfo( SizeType size )
            : Size( size )
        {}

        SizeType Size;
        FreeListBySize::iterator FreeListBySizeIt;
    };
```
The FreeBlockInfo struct is forward declared on line 105 and defined on line 113. The forward declaration is required to create the FreeListByOffset type alias on line 107. The FreeListByOffset type is an alias of a std::map that maps the offset of each free block within the descriptor heap to its FreeBlockInfo.
The FreeListBySize type is an alias of a std::multimap that provides a mechanism to quickly find the first block in the free list that can satisfy an allocation request. The FreeListBySize type needs to be a std::multimap because there can be many blocks in the free list with the same size.
The FreeBlockInfo struct simply stores the size of the block in the free list and a reference (iterator) to its entry in the FreeListBySize map. Storing this iterator allows the entry to be removed quickly (without searching) when neighboring blocks in the free list are merged.
The image above shows an example of a free list after several allocations have been made. Each entry in the FreeListByOffset map stores (through its FreeBlockInfo) a reference to the corresponding entry in the FreeListBySize map. Similarly, each entry in the FreeListBySize map stores a reference to the corresponding entry in the FreeListByOffset map. This solution resembles a bi-directional map (Bimap in Boost), which provides optimized searching on both the offset and the size of each entry in the free list.
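The following sketch shows one way to keep the two maps in sync when a block is added and when an allocation splits a block. It is a simplified, CPU-only model built from the type aliases and the `FreeBlockInfo` struct shown above. The `FreeList` class and its method bodies are illustrative, not the actual `DescriptorAllocatorPage` implementation. Freeing and merging are omitted. The allocation step picks the smallest adequate block using `lower_bound` on the size map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

class FreeList
{
public:
    using OffsetType = uint32_t;
    using SizeType = uint32_t;

    explicit FreeList(SizeType numDescriptors) { AddNewBlock(0, numDescriptors); }

    // Returns the offset of the allocated range, or std::nullopt if no block is large enough.
    std::optional<OffsetType> Allocate(SizeType numDescriptors)
    {
        // Smallest free block whose size is >= numDescriptors.
        auto smallestBlockIt = m_FreeListBySize.lower_bound(numDescriptors);
        if (smallestBlockIt == m_FreeListBySize.end())
            return std::nullopt;

        SizeType blockSize = smallestBlockIt->first;
        auto offsetIt = smallestBlockIt->second;  // iterator into m_FreeListByOffset
        OffsetType offset = offsetIt->first;

        // Remove the block from both maps.
        m_FreeListBySize.erase(smallestBlockIt);
        m_FreeListByOffset.erase(offsetIt);

        // Return any remainder to the free list as a new, smaller block.
        if (blockSize > numDescriptors)
            AddNewBlock(offset + numDescriptors, blockSize - numDescriptors);

        return offset;
    }

    std::size_t NumBlocks() const { return m_FreeListByOffset.size(); }

private:
    struct FreeBlockInfo;
    using FreeListByOffset = std::map<OffsetType, FreeBlockInfo>;
    using FreeListBySize = std::multimap<SizeType, FreeListByOffset::iterator>;

    struct FreeBlockInfo
    {
        FreeBlockInfo(SizeType size) : Size(size) {}
        SizeType Size;
        FreeListBySize::iterator FreeListBySizeIt;
    };

    void AddNewBlock(OffsetType offset, SizeType numDescriptors)
    {
        auto offsetIt = m_FreeListByOffset.emplace(offset, numDescriptors).first;
        auto sizeIt = m_FreeListBySize.emplace(numDescriptors, offsetIt);
        offsetIt->second.FreeListBySizeIt = sizeIt;  // link the two entries
    }

    FreeListByOffset m_FreeListByOffset;
    FreeListBySize m_FreeListBySize;
};
```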
The StaleDescriptorInfo struct is used to keep track of descriptors in the descriptor heap that have been freed but can't be reused until the frame in which they were freed has finished executing on the GPU.
```cpp
    struct StaleDescriptorInfo
    {
        StaleDescriptorInfo( OffsetType offset, SizeType size, uint64_t frame )
            : Offset( offset )
            , Size( size )
            , FrameNumber( frame )
        {}

        // The offset within the descriptor heap.
        OffsetType Offset;
        // The number of descriptors
        SizeType Size;
        // The frame number that the descriptor was freed.
        uint64_t FrameNumber;
    };
```
The StaleDescriptorInfo struct tracks the offset of the first descriptor and the number of descriptors in the descriptor range. The FrameNumber member stores the frame in which the descriptors were freed.
```cpp
    // Stale descriptors are queued for release until the frame that they were freed
    // has completed.
    using StaleDescriptorQueue = std::queue<StaleDescriptorInfo>;

    FreeListByOffset m_FreeListByOffset;
    FreeListBySize m_FreeListBySize;
    StaleDescriptorQueue m_StaleDescriptors;
```
The StaleDescriptorQueue is a type alias for a std::queue of StaleDescriptorInfo objects.
The m_FreeListByOffset, m_FreeListBySize, and m_StaleDescriptors member variables are the data structures needed to track the state of the free list.
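The deferred release behind `m_StaleDescriptors` can be sketched on its own. In this minimal model, a hypothetical `StaleQueue` records each freed range together with the frame number in which it was freed. `ReleaseStale` hands a range back through a callback (standing in for `FreeBlock`) only once that frame has completed. Ranges are queued in frame order, so the loop can stop at the first entry that is still too recent. This assumes frame numbers never decrease.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>

struct StaleRange
{
    uint32_t Offset;
    uint32_t Size;
    uint64_t FrameNumber;  // frame in which the range was freed
};

class StaleQueue
{
public:
    // Called when an allocation is freed: defer the release.
    void Free(uint32_t offset, uint32_t size, uint64_t frameNumber)
    {
        m_Stale.push(StaleRange{ offset, size, frameNumber });
    }

    // Called once 'completedFrame' has finished executing on the GPU.
    // Every range freed in or before that frame is passed to 'freeBlock'.
    void ReleaseStale(uint64_t completedFrame,
                      const std::function<void(uint32_t, uint32_t)>& freeBlock)
    {
        while (!m_Stale.empty() && m_Stale.front().FrameNumber <= completedFrame)
        {
            const StaleRange& range = m_Stale.front();
            freeBlock(range.Offset, range.Size);
            m_Stale.pop();
        }
    }

    std::size_t NumStale() const { return m_Stale.size(); }

private:
    std::queue<StaleRange> m_Stale;
};
```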
```cpp
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> m_d3d12DescriptorHeap;
    D3D12_DESCRIPTOR_HEAP_TYPE m_HeapType;
    CD3DX12_CPU_DESCRIPTOR_HANDLE m_BaseDescriptor;
    uint32_t m_DescriptorHandleIncrementSize;
    uint32_t m_NumDescriptorsInHeap;
    uint32_t m_NumFreeHandles;

    std::mutex m_AllocationMutex;
};
```
On line 147, the underlying ID3D12DescriptorHeap interface is declared.
The m_HeapType variable defines the type of descriptor heap used by the DescriptorAllocatorPage class.
Since the increment size of a descriptor within a descriptor heap is vendor specific, it must be queried at runtime (see Tutorial 1 for more information on descriptor heaps). The descriptor increment size is stored in the m_DescriptorHandleIncrementSize member variable.
The total number of descriptors in the descriptor heap is saved in the m_NumDescriptorsInHeap member variable, and the number of remaining descriptors in the heap is stored in the m_NumFreeHandles member variable.
The m_AllocationMutex declared on line 154 is used to make allocations and deallocations safe across multiple threads.
View the full source code for DescriptorAllocatorPage.h
DescriptorAllocatorPage Preamble
The DescriptorAllocatorPage class requires a few additional headers in order to compile.
```cpp
#include <DX12LibPCH.h>

#include <DescriptorAllocatorPage.h>
#include <Application.h>
```
The DX12LibPCH.h header provides the precompiled header for the DX12Lib project.
The DescriptorAllocatorPage.h header file was described in the previous section.
The Application.h header file provides access to the Application class, which was briefly described in Tutorial 2. The Application class is used to get access to the ID3D12Device object.
DescriptorAllocatorPage::DescriptorAllocatorPage
The parameterized constructor for the DescriptorAllocatorPage class takes the heap type and the number of descriptors to allocate in the heap as arguments.
```cpp
DescriptorAllocatorPage::DescriptorAllocatorPage( D3D12_DESCRIPTOR_HEAP_TYPE type, uint32_t numDescriptors )
    : m_HeapType( type )
    , m_NumDescriptorsInHeap( numDescriptors )
{
    auto device = Application::Get().GetDevice();

    D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
    heapDesc.Type = m_HeapType;
    heapDesc.NumDescriptors = m_NumDescriptorsInHeap;

    ThrowIfFailed( device->CreateDescriptorHeap( &heapDesc, IID_PPV_ARGS( &m_d3d12DescriptorHeap ) ) );

    m_BaseDescriptor = m_d3d12DescriptorHeap->GetCPUDescriptorHandleForHeapStart();
    m_DescriptorHandleIncrementSize = device->GetDescriptorHandleIncrementSize( m_HeapType );
    m_NumFreeHandles = m_NumDescriptorsInHeap;

    // Initialize the free lists
    AddNewBlock( 0, m_NumFreeHandles );
}
```
On line 10, a pointer to the ID3D12Device is retrieved from the Application class.
Before the ID3D12DescriptorHeap object can be created, it must be described. The D3D12_DESCRIPTOR_HEAP_DESC structure is used to describe the ID3D12DescriptorHeap and has the following members [3]:
- D3D12_DESCRIPTOR_HEAP_TYPE Type: Specifies the types of descriptors in the heap.
- UINT NumDescriptors: The number of descriptors in the heap.
- D3D12_DESCRIPTOR_HEAP_FLAGS Flags: A combination of D3D12_DESCRIPTOR_HEAP_FLAGS values that are combined by using a bitwise OR operation. The following flags are currently available:
  - D3D12_DESCRIPTOR_HEAP_FLAG_NONE: Indicates default usage of a heap.
  - D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE: This flag can optionally be set on a descriptor heap to indicate that it is to be bound on a command list for reference by shaders. Descriptor heaps created without this flag allow applications the option to stage descriptors in CPU memory before copying them to a shader visible descriptor heap, as a convenience. But it is also fine for applications to directly create descriptors into shader visible descriptor heaps with no requirement to stage anything on the CPU. This flag only applies to CBV, SRV, UAV and samplers. It does not apply to other descriptor heap types since shaders do not directly reference the other types.
- UINT NodeMask: For single-adapter operation, set this to zero. If there are multiple adapter nodes, set a bit to identify the node (one of the device’s physical adapters) to which the descriptor heap applies. Each bit in the mask corresponds to a single node. Only one bit must be set.
On line 16, the actual ID3D12DescriptorHeap is created using the ID3D12Device::CreateDescriptorHeap method. Since the Flags member of heapDesc is left zero-initialized (D3D12_DESCRIPTOR_HEAP_FLAG_NONE), this heap is not shader visible; it only stores CPU visible descriptors.
On line 18, the m_BaseDescriptor member variable is initialized to the first descriptor handle in the heap, and on line 19 the increment size of a descriptor in the descriptor heap is queried using the ID3D12Device::GetDescriptorHandleIncrementSize method. On line 20, the number of free handles in the DescriptorAllocatorPage is initialized to the number of handles in the ID3D12DescriptorHeap.
On line 23, a single block of descriptors is added to the free list using the AddNewBlock method. The new block has an offset of 0 and a size of m_NumFreeHandles.
DescriptorAllocatorPage::GetHeapType
The GetHeapType method is simply a getter method that returns the heap type.

```cpp
D3D12_DESCRIPTOR_HEAP_TYPE DescriptorAllocatorPage::GetHeapType() const
{
    return m_HeapType;
}
```
DescriptorAllocatorPage::NumFreeHandles
The NumFreeHandles method is simply a getter method that returns the number of free handles that are currently available in the heap.

```cpp
uint32_t DescriptorAllocatorPage::NumFreeHandles() const
{
    return m_NumFreeHandles;
}
```
DescriptorAllocatorPage::HasSpace
The HasSpace method is used to check if the DescriptorAllocatorPage has a free block of descriptors that is large enough to satisfy a request for a particular number of descriptors.

```cpp
bool DescriptorAllocatorPage::HasSpace( uint32_t numDescriptors ) const
{
    return m_FreeListBySize.lower_bound( numDescriptors ) != m_FreeListBySize.end();
}
```
The std::multimap::lower_bound method is used to find the first entry in the free list that is not less than (in other words, greater than or equal to) the requested number of descriptors. If no such element exists, then the past-the-end iterator is returned, which indicates that the free list cannot satisfy the requested number of descriptors. If the DescriptorAllocatorPage is not able to satisfy the request, then the DescriptorAllocator will create a new page (as was shown previously in the DescriptorAllocator::Allocate method).
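The following is a minimal, standalone sketch (not the DX12Lib code) of why lower_bound answers the question "is there a free block big enough?". The FreeListBySize alias, HasSpace, and BestFitSize here are illustrative stand-ins; for brevity the multimap maps a block size to a plain offset rather than to an iterator into the offset map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Stand-in for m_FreeListBySize: block size -> block offset.
// (Simplified: the real class maps to an iterator into the offset map.)
using FreeListBySize = std::multimap<uint32_t, uint32_t>;

// True if some free block holds at least numDescriptors descriptors.
// lower_bound returns the first key that is not less than numDescriptors,
// or end() if every block is too small.
bool HasSpace(const FreeListBySize& freeList, uint32_t numDescriptors)
{
    return freeList.lower_bound(numDescriptors) != freeList.end();
}

// Size of the smallest block that can satisfy the request (best fit),
// or 0 if no block is large enough.
uint32_t BestFitSize(const FreeListBySize& freeList, uint32_t numDescriptors)
{
    auto it = freeList.lower_bound(numDescriptors);
    return it != freeList.end() ? it->first : 0;
}
```

Because the multimap is ordered by size, the same lower_bound call also yields the smallest block that fits, which is the best-fit choice the Allocate method relies on.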
DescriptorAllocatorPage::AddNewBlock
The AddNewBlock method adds a block to the free list. The block is added to both the FreeListByOffset map and the FreeListBySize map. Both lists are linked to create the bi-directional map for optimized lookups.

```cpp
void DescriptorAllocatorPage::AddNewBlock( uint32_t offset, uint32_t numDescriptors )
{
    auto offsetIt = m_FreeListByOffset.emplace( offset, numDescriptors );
    auto sizeIt = m_FreeListBySize.emplace( numDescriptors, offsetIt.first );
    offsetIt.first->second.FreeListBySizeIt = sizeIt;
}
```
On line 43, the std::map::emplace method is used to emplace an element into the m_FreeListByOffset map. This method returns a std::pair whose first element is an iterator to the inserted element. That iterator is used to add an entry to the m_FreeListBySize multimap on line 44 (std::multimap::emplace always inserts and returns the iterator to the new element directly).
On line 45, the FreeListBySizeIt member variable of the FreeBlockInfo needs to be patched to point to the corresponding iterator in the m_FreeListBySize multimap.
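To make the cross-linking concrete, here is a simplified, self-contained sketch of the two free lists and AddNewBlock. It assumes 32-bit offsets and sizes and wraps the maps in a hypothetical FreeList struct; the type aliases follow the structure described for DescriptorAllocatorPage, but this is an illustration rather than the library's header.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// The offset map owns the block info; the size multimap points back into it.
struct FreeBlockInfo;
using FreeListByOffset = std::map<uint32_t, FreeBlockInfo>;
using FreeListBySize   = std::multimap<uint32_t, FreeListByOffset::iterator>;

struct FreeBlockInfo
{
    explicit FreeBlockInfo(uint32_t size) : Size(size) {}
    uint32_t Size;
    FreeListBySize::iterator FreeListBySizeIt{}; // patched after insertion
};

// Hypothetical container for the two linked free lists.
struct FreeList
{
    FreeListByOffset byOffset;
    FreeListBySize   bySize;

    void AddNewBlock(uint32_t offset, uint32_t numDescriptors)
    {
        // std::map::emplace returns std::pair<iterator, bool>.
        auto offsetIt = byOffset.emplace(offset, numDescriptors);
        assert(offsetIt.second && "a free block already starts at this offset");
        // std::multimap::emplace always inserts and returns the iterator directly.
        auto sizeIt = bySize.emplace(numDescriptors, offsetIt.first);
        // Patch the offset entry so it can reach its size entry in O(1).
        offsetIt.first->second.FreeListBySizeIt = sizeIt;
    }
};
```

Naming FreeListByOffset::iterator while FreeBlockInfo is still incomplete is not formally guaranteed by the C++ standard for std::map, but mainstream standard library implementations accept this pattern in practice.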
DescriptorAllocatorPage::Allocate
The Allocate method is used to allocate descriptors from the free list. When a block of descriptors is allocated from the free list, the existing free block may need to be split, with the remaining descriptors “returned” to the free list. For example, if only a single descriptor is requested by the caller and the free list has a free block of 100 descriptors, then the free block of 100 descriptors is removed from the free list, 1 descriptor is allocated from that block, and a free block of 99 descriptors is added back to the free list.

```cpp
DescriptorAllocation DescriptorAllocatorPage::Allocate( uint32_t numDescriptors )
{
    std::lock_guard<std::mutex> lock( m_AllocationMutex );
```
In order to prevent any race conditions that may occur when multiple threads make allocations on the same DescriptorAllocatorPage, the m_AllocationMutex is locked on line 50.
```cpp
    // There are fewer than the requested number of descriptors left in the heap.
    // Return a NULL descriptor and try another heap.
    if ( numDescriptors > m_NumFreeHandles )
    {
        return DescriptorAllocation();
    }

    // Get the first block that is large enough to satisfy the request.
    auto smallestBlockIt = m_FreeListBySize.lower_bound( numDescriptors );
    if ( smallestBlockIt == m_FreeListBySize.end() )
    {
        // There was no free block that could satisfy the request.
        return DescriptorAllocation();
    }
```
On line 54, the total number of free handles in the page is checked, and on line 61, the free list is searched for a single block that is large enough to satisfy the request. If either check fails, a default (null) DescriptorAllocation is returned to the calling function. If both checks pass, then smallestBlockIt contains an iterator to the first entry in the FreeListBySize multimap that is not less than the requested number of descriptors.
```cpp
    // The size of the smallest block that satisfies the request.
    auto blockSize = smallestBlockIt->first;

    // The pointer to the same entry in the FreeListByOffset map.
    auto offsetIt = smallestBlockIt->second;

    // The offset in the descriptor heap.
    auto offset = offsetIt->first;
```
The smallestBlockIt iterator is used to retrieve the size of the free block and the iterator to the corresponding entry in the FreeListByOffset map in \(\mathcal{O}(1)\) constant time (which is better than the \(\mathcal{O}(\log_2{n})\) logarithmic time complexity of the std::map::find method).
The free block that was found needs to be removed from the free list and a new block that results from splitting the free block needs to be added back to the free list.
```cpp
    // Remove the existing free block from the free list.
    m_FreeListBySize.erase( smallestBlockIt );
    m_FreeListByOffset.erase( offsetIt );

    // Compute the new free block that results from splitting this block.
    auto newOffset = offset + numDescriptors;
    auto newSize = blockSize - numDescriptors;

    if ( newSize > 0 )
    {
        // If the allocation didn't exactly match the requested size,
        // return the left-over to the free list.
        AddNewBlock( newOffset, newSize );
    }
```
On lines 77-78, the free block that was found is removed from the free list.
On lines 81-82, the offset and size of the new free block that results from splitting the current block are computed, and if the size is not 0, the new block is added to the free list using the AddNewBlock method on line 88.
```cpp
    // Decrement free handles.
    m_NumFreeHandles -= numDescriptors;

    return DescriptorAllocation(
        CD3DX12_CPU_DESCRIPTOR_HANDLE( m_BaseDescriptor, offset, m_DescriptorHandleIncrementSize ),
        numDescriptors, m_DescriptorHandleIncrementSize, shared_from_this() );
}
```
The total number of free handles is decremented by the number of requested descriptors on line 92, and the resulting DescriptorAllocation is returned to the calling function on line 94.
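The split-and-return logic can be exercised in isolation with the following sketch. It is a simplification under stated assumptions: the hypothetical SplittingFreeList keeps two plain maps (offset to size, and size to offset) in sync by hand instead of cross-linking iterators, and it returns a std::optional offset instead of a DescriptorAllocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical best-fit free list used only to illustrate block splitting.
class SplittingFreeList
{
public:
    explicit SplittingFreeList(uint32_t numDescriptors)
        : m_NumFreeHandles(numDescriptors)
    {
        if (numDescriptors > 0)
            AddBlock(0, numDescriptors);
    }

    // Returns the offset of the allocated range, or std::nullopt on failure.
    std::optional<uint32_t> Allocate(uint32_t numDescriptors)
    {
        if (numDescriptors > m_NumFreeHandles)
            return std::nullopt;

        // Smallest block that is not smaller than the request (best fit).
        auto sizeIt = m_BySize.lower_bound(numDescriptors);
        if (sizeIt == m_BySize.end())
            return std::nullopt;

        const uint32_t blockSize = sizeIt->first;
        const uint32_t offset = sizeIt->second;

        // Remove the chosen block from both lists.
        m_BySize.erase(sizeIt);
        m_ByOffset.erase(offset);

        // Return the left-over part of the block to the free list.
        if (blockSize > numDescriptors)
            AddBlock(offset + numDescriptors, blockSize - numDescriptors);

        m_NumFreeHandles -= numDescriptors;
        assert(m_ByOffset.size() == m_BySize.size());
        return offset;
    }

    uint32_t NumFreeHandles() const { return m_NumFreeHandles; }
    std::size_t NumBlocks() const { return m_ByOffset.size(); }

private:
    void AddBlock(uint32_t offset, uint32_t size)
    {
        m_ByOffset.emplace(offset, size);
        m_BySize.emplace(size, offset);
    }

    std::map<uint32_t, uint32_t> m_ByOffset;    // offset -> size
    std::multimap<uint32_t, uint32_t> m_BySize; // size -> offset
    uint32_t m_NumFreeHandles;
};
```

With a 100-descriptor list, allocating 1 descriptor returns offset 0 and leaves a single 99-descriptor block at offset 1, matching the example at the start of this section.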
DescriptorAllocatorPage::ComputeOffset
The ComputeOffset method is used to compute the offset (in descriptor handles) from the base descriptor (first descriptor in the descriptor heap) to a given descriptor.

```cpp
uint32_t DescriptorAllocatorPage::ComputeOffset( D3D12_CPU_DESCRIPTOR_HANDLE handle )
{
    return static_cast<uint32_t>( handle.ptr - m_BaseDescriptor.ptr ) / m_DescriptorHandleIncrementSize;
}
```
The ComputeOffset method is used by the Free method (shown next) to compute the offset of a descriptor in the descriptor heap. Since a D3D12_CPU_DESCRIPTOR_HANDLE is just a structure that contains a single SIZE_T member variable (ptr), computing the offset of a descriptor in a descriptor heap is a matter of simple arithmetic: the distance from the base descriptor divided by the descriptor increment size.
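The arithmetic is easy to check on its own. In the sketch below, CpuDescriptorHandle is a hypothetical stand-in for D3D12_CPU_DESCRIPTOR_HANDLE (so the code compiles without d3d12.h), and OffsetHandle performs the inverse step, advancing a base handle by a number of descriptors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for D3D12_CPU_DESCRIPTOR_HANDLE, which wraps a
// single SIZE_T member named ptr (std::size_t is used here instead).
struct CpuDescriptorHandle
{
    std::size_t ptr;
};

// Number of descriptors between base and handle (same idea as ComputeOffset).
uint32_t ComputeOffset(CpuDescriptorHandle base, CpuDescriptorHandle handle,
                       uint32_t incrementSize)
{
    assert(handle.ptr >= base.ptr && incrementSize > 0);
    return static_cast<uint32_t>((handle.ptr - base.ptr) / incrementSize);
}

// Inverse step: the handle that lies `offset` descriptors past base.
CpuDescriptorHandle OffsetHandle(CpuDescriptorHandle base, uint32_t offset,
                                 uint32_t incrementSize)
{
    return { base.ptr + static_cast<std::size_t>(offset) * incrementSize };
}
```

For example, with a base ptr of 0x1000 and an increment size of 32 bytes (an arbitrary value; real sizes must be queried from the device), OffsetHandle(base, 5, 32) yields ptr 0x10A0 and ComputeOffset maps it back to 5. This sketch divides before narrowing to uint32_t; for distances within a single heap both orders give the same result.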
DescriptorAllocatorPage::Free
The Free method returns a block of descriptors back to the free list. Descriptors are not immediately returned to the free list but instead are added to a queue of stale descriptors. Descriptors are only returned to the free list once the frame they were freed in is finished executing on the GPU. This ensures that descriptors are not reused until they are no longer being referenced by a GPU command.

```cpp
void DescriptorAllocatorPage::Free( DescriptorAllocation&& descriptor, uint64_t frameNumber )
{
    // Compute the offset of the descriptor within the descriptor heap.
    auto offset = ComputeOffset( descriptor.GetDescriptorHandle() );

    std::lock_guard<std::mutex> lock( m_AllocationMutex );

    // Don't add the block directly to the free list until the frame has completed.
    m_StaleDescriptors.emplace( offset, descriptor.GetNumHandles(), frameNumber );
}
```
The DescriptorAllocation doesn’t store the offset of the descriptor within the descriptor heap, but the offset can be computed using the ComputeOffset method.
In order to guarantee that the m_StaleDescriptors queue is only modified on a single thread at a time, the m_AllocationMutex mutex is locked on line 109 and the StaleDescriptorInfo is added to the m_StaleDescriptors queue on line 112.
DescriptorAllocatorPage::FreeBlock
The FreeBlock method is executed when the stale descriptors are added back to the free list. When a block is added back to the free list, it should be merged with any neighboring free blocks to minimize fragmentation of the free list. Four cases can occur when adding a block back to the free list:
- Case 1: There is a block in the free list that is immediately preceding the block being freed.
- Case 2: There is a block in the free list that is immediately following the block being freed.
- Case 3: There is both a block in the free list immediately preceding and immediately following the block being freed.
- Case 4: There is neither a block in the free list immediately preceding nor immediately following the block being freed.
If Case 1 is true, then the previous block in the free list needs to be merged with the block being freed. If Case 2 is true, then the next block in the free list needs to be merged with the block being freed.
The above image shows the two cases that require merging. Case 3 and Case 4 do not need to be handled in any special way since those cases are already handled implicitly.
```cpp
void DescriptorAllocatorPage::FreeBlock( uint32_t offset, uint32_t numDescriptors )
{
    // Find the first element whose offset is greater than the specified offset.
    // This is the block that should appear after the block that is being freed.
    auto nextBlockIt = m_FreeListByOffset.upper_bound( offset );

    // Find the block that appears before the block being freed.
    auto prevBlockIt = nextBlockIt;
    // If it's not the first block in the list.
    if ( prevBlockIt != m_FreeListByOffset.begin() )
    {
        // Go to the previous block in the list.
        --prevBlockIt;
    }
    else
    {
        // Otherwise, just set it to the end of the list to indicate that no
        // block comes before the one being freed.
        prevBlockIt = m_FreeListByOffset.end();
    }

    // Add the number of free handles back to the heap.
    // This needs to be done before merging any blocks since merging
    // blocks modifies the numDescriptors variable.
    m_NumFreeHandles += numDescriptors;
```
On line 119, the block that comes after the block being freed is queried from the FreeListByOffset map using the std::map::upper_bound method. The upper_bound method returns the first element whose key is strictly greater than the specified key. If no such element exists, this method returns the past-the-end (end) iterator.
The previous block in the free list (prevBlockIt) is the one that appears just before the block being freed. The previous block is initialized on line 122 to be the same as the next block (nextBlockIt), and if it is not the first element in the free list, it is decremented on line 127 to point to the previous element. If the free list is completely empty (a special instance of Case 4), then the nextBlockIt, prevBlockIt, and begin iterators will all be equal to the past-the-end (end) iterator.
If there is only a single item in the free list, then it comes either before or after the block being freed. If it comes after the block being freed, then nextBlockIt will point to that element and prevBlockIt will be set to the end iterator on line 133. If it comes before the block being freed, then nextBlockIt will point to the end iterator and prevBlockIt will point to that element after being decremented on line 127.
The number of free handles is incremented by the number of handles being freed on line 139.
First, Case 1 is checked (the previous block in the free list immediately precedes the block being freed).

```cpp
    if ( prevBlockIt != m_FreeListByOffset.end() &&
         offset == prevBlockIt->first + prevBlockIt->second.Size )
    {
        // The previous block is exactly behind the block that is to be freed.
        //
        // PrevBlock.Offset           Offset
        // |                          |
        // |<-----PrevBlock.Size----->|<------Size-------->|
        //

        // Increase the block size by the size of merging with the previous block.
        offset = prevBlockIt->first;
        numDescriptors += prevBlockIt->second.Size;

        // Remove the previous block from the free list.
        m_FreeListBySize.erase( prevBlockIt->second.FreeListBySizeIt );
        m_FreeListByOffset.erase( prevBlockIt );
    }
```
If there is a block immediately preceding the block being freed then that block is merged with the block being freed.
Case 2 is checked next (the next block in the free list immediately follows the block being freed).

```cpp
    if ( nextBlockIt != m_FreeListByOffset.end() &&
         offset + numDescriptors == nextBlockIt->first )
    {
        // The next block is exactly in front of the block that is to be freed.
        //
        // Offset               NextBlock.Offset
        // |                    |
        // |<------Size-------->|<-----NextBlock.Size----->|

        // Increase the block size by the size of merging with the next block.
        numDescriptors += nextBlockIt->second.Size;

        // Remove the next block from the free list.
        m_FreeListBySize.erase( nextBlockIt->second.FreeListBySizeIt );
        m_FreeListByOffset.erase( nextBlockIt );
    }
```
Again, the block immediately following the block being freed is merged with the block being freed.
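Case 3 and Case 4 do not need to be handled explicitly. In Case 3, both of the checks above succeed one after the other, so the block being freed is merged with both of its neighbors. In Case 4, neither check succeeds, so the block is added to the free list unchanged.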
The final step is to add the new (merged) block back into the free list.
```cpp
    // Add the freed block to the free list.
    AddNewBlock( offset, numDescriptors );
}
```
On line 178, the new block is added back into the free list using the AddNewBlock method.
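A compact way to test the four cases is a stripped-down FreeBlock that works on a single std::map from offset to size. This is a simplified sketch rather than the DX12Lib implementation: it omits the size-ordered multimap and the free-handle counter, and it uses std::prev instead of the begin/end bookkeeping shown above.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Offset -> size free list. Simplification: no size-ordered multimap here.
using FreeList = std::map<uint32_t, uint32_t>;

// Returns [offset, offset + numDescriptors) to the free list and merges it
// with adjacent free blocks.
void FreeBlock(FreeList& freeList, uint32_t offset, uint32_t numDescriptors)
{
    assert(numDescriptors > 0);

    // First free block that starts after the block being freed.
    auto nextIt = freeList.upper_bound(offset);

    // Case 1: merge with the block immediately preceding the freed block.
    if (nextIt != freeList.begin())
    {
        auto prevIt = std::prev(nextIt);
        if (prevIt->first + prevIt->second == offset)
        {
            offset = prevIt->first;
            numDescriptors += prevIt->second;
            freeList.erase(prevIt); // nextIt stays valid
        }
    }

    // Case 2: merge with the block immediately following the freed block.
    if (nextIt != freeList.end() && offset + numDescriptors == nextIt->first)
    {
        numDescriptors += nextIt->second;
        freeList.erase(nextIt);
    }

    // Case 3 (both merges) and Case 4 (no merge) need no extra code.
    freeList.emplace(offset, numDescriptors);
}
```

For example, with free blocks at offsets 0 (size 4) and 10 (size 6), freeing offset 4 with size 6 is Case 3 and leaves a single free block of 16 descriptors at offset 0.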
DescriptorAllocatorPage::ReleaseStaleDescriptors
Stale descriptors are returned to the free list using the ReleaseStaleDescriptors method when the frame that they were freed in is finished executing on the GPU.

```cpp
void DescriptorAllocatorPage::ReleaseStaleDescriptors( uint64_t frameNumber )
{
    std::lock_guard<std::mutex> lock( m_AllocationMutex );

    while ( !m_StaleDescriptors.empty() && m_StaleDescriptors.front().FrameNumber <= frameNumber )
    {
        auto& staleDescriptor = m_StaleDescriptors.front();

        // The offset of the descriptor in the heap.
        auto offset = staleDescriptor.Offset;
        // The number of descriptors that were allocated.
        auto numDescriptors = staleDescriptor.Size;

        FreeBlock( offset, numDescriptors );

        m_StaleDescriptors.pop();
    }
}
```
To ensure the m_StaleDescriptors queue is not being modified on any other thread, the m_AllocationMutex mutex is locked on line 181.
On lines 183-195, entries are processed from the front of the m_StaleDescriptors queue. As long as the queue is not empty and the frame number of the front entry is less than or equal to the completed frame number passed to the method, that entry is popped off the queue and its block is returned to the free list using the FreeBlock method described in the previous section. Since entries are queued in the order they are freed (and frame numbers only increase), the loop can stop at the first entry whose frame has not yet completed.
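The deferred-release pattern can be sketched independently of Direct3D 12. In this hypothetical version, StaleDescriptorInfo mirrors the fields used above (Offset, Size, FrameNumber), the free-list update is passed in as a callable, and the mutex is omitted, so the caller is assumed to hold whatever lock protects the queue.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>

// Mirrors the fields used by the DescriptorAllocatorPage code above.
struct StaleDescriptorInfo
{
    uint32_t Offset;
    uint32_t Size;
    uint64_t FrameNumber;
};

// Hypothetical helper: hands every stale range whose frame has completed
// (FrameNumber <= completedFrame) to freeBlock and removes it from the queue.
// Ranges are queued in frame order, so the loop stops at the first range
// whose frame may still be in flight on the GPU.
template <typename FreeBlockFn>
void ReleaseStaleDescriptors(std::queue<StaleDescriptorInfo>& stale,
                             uint64_t completedFrame, FreeBlockFn&& freeBlock)
{
    while (!stale.empty() && stale.front().FrameNumber <= completedFrame)
    {
        const StaleDescriptorInfo& info = stale.front();
        assert(info.Size > 0);
        freeBlock(info.Offset, info.Size);
        stale.pop();
    }
}
```

With stale ranges queued for frames 1, 2, and 3, a call with a completed frame number of 2 releases the first two ranges and leaves the frame 3 range in the queue.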
The final class in the triad of classes that constitute the descriptor allocation scheme used by the DX12Lib project is the DescriptorAllocation class, which is the subject of the next section.
DescriptorAllocation Class
The DescriptorAllocation class is used by the DescriptorAllocator to represent a single allocation of contiguous descriptors in a descriptor heap. The DescriptorAllocation class is a move-only, self-freeing type that is used as a wrapper for a D3D12_CPU_DESCRIPTOR_HANDLE. The DescriptorAllocation must be a move-only class to ensure that there is only a single instance of a particular allocation. This guarantees that if the descriptor is destroyed or replaced, the original descriptor will be returned to the descriptor heap it came from.
The DescriptorAllocation class provides the following public methods:
- IsNull: Check whether the DescriptorAllocation is null (does not contain a valid descriptor handle).
- GetDescriptorHandle: Get the underlying D3D12_CPU_DESCRIPTOR_HANDLE (optionally at an offset within the allocation).
- GetNumHandles: Get the number of consecutive descriptors in the DescriptorAllocation.
DescriptorAllocation Header
The header file is used to declare the DescriptorAllocation class. The additional header files that are necessary to compile the DescriptorAllocation class are shown first.

```cpp
#include <d3d12.h>

#include <cstdint>
#include <memory>

class DescriptorAllocatorPage;
```
The d3d12.h header is necessary for the D3D12_CPU_DESCRIPTOR_HANDLE type.
The cstdint header file is included to provide the uint32_t type.
The memory header file is included to provide access to the std::shared_ptr type.
The DescriptorAllocatorPage class is forward declared on line 42 to avoid including the header file for that class. The DescriptorAllocatorPage is used as a template argument for a std::shared_ptr, which doesn’t require a complete type.
```cpp
class DescriptorAllocation
{
public:
    // Creates a NULL descriptor.
    DescriptorAllocation();

    DescriptorAllocation( D3D12_CPU_DESCRIPTOR_HANDLE descriptor, uint32_t numHandles, uint32_t descriptorSize, std::shared_ptr<DescriptorAllocatorPage> page );

    // The destructor will automatically free the allocation.
    ~DescriptorAllocation();
```
The DescriptorAllocation class provides a default constructor which initializes the descriptor as a null descriptor.
The parameterized constructor declared on line 50 is used by the DescriptorAllocatorPage::Allocate method to construct a valid DescriptorAllocation.
The destructor declared on line 53 is necessary to ensure the allocation is returned to the DescriptorAllocatorPage that it came from.
```cpp
    // Copies are not allowed.
    DescriptorAllocation( const DescriptorAllocation& ) = delete;
    DescriptorAllocation& operator=( const DescriptorAllocation& ) = delete;

    // Move is allowed.
    DescriptorAllocation( DescriptorAllocation&& allocation );
    DescriptorAllocation& operator=( DescriptorAllocation&& other );
```
Copies of a DescriptorAllocation are not allowed. To prevent accidental copies, the copy constructor and copy assignment operator are explicitly deleted so that the compiler does not generate them automatically.
Moving a DescriptorAllocation to another DescriptorAllocation is allowed (and in fact, required). The move constructor and the move assignment operator are declared on lines 60 and 61.
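The effect of these declarations can be checked at compile time with type traits. The MoveOnly class below is a hypothetical stand-in that only reproduces DescriptorAllocation's copy and move declarations (with defaulted move operations for brevity).

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in that reproduces only the copy/move declarations
// of DescriptorAllocation.
class MoveOnly
{
public:
    MoveOnly() = default;

    // Copies are not allowed.
    MoveOnly(const MoveOnly&) = delete;
    MoveOnly& operator=(const MoveOnly&) = delete;

    // Move is allowed.
    MoveOnly(MoveOnly&&) = default;
    MoveOnly& operator=(MoveOnly&&) = default;
};

// Copies are rejected at compile time; moves are accepted.
static_assert(!std::is_copy_constructible_v<MoveOnly>, "copy construction is disabled");
static_assert(!std::is_copy_assignable_v<MoveOnly>, "copy assignment is disabled");
static_assert(std::is_move_constructible_v<MoveOnly>, "move construction is allowed");
static_assert(std::is_move_assignable_v<MoveOnly>, "move assignment is allowed");
```

With these declarations, MoveOnly b = a; fails to compile, while MoveOnly b = std::move(a); compiles.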
```cpp
    // Check if this is a null (invalid) descriptor.
    bool IsNull() const;
```
The IsNull method is used to check whether the DescriptorAllocation is null (does not contain a valid descriptor).
```cpp
    // Get a descriptor at a particular offset in the allocation.
    D3D12_CPU_DESCRIPTOR_HANDLE GetDescriptorHandle( uint32_t offset = 0 ) const;
```
The DescriptorAllocation can contain a block of consecutive descriptors in a descriptor heap. The GetDescriptorHandle method is used to get the underlying D3D12_CPU_DESCRIPTOR_HANDLE at a particular offset within the contiguous block of descriptors.
```cpp
    // Get the number of (consecutive) handles for this allocation.
    uint32_t GetNumHandles() const;
```
The GetNumHandles method is used to get the number of consecutive descriptor handles that are contained in the DescriptorAllocation.
```cpp
    // Get the heap that this allocation came from.
    // (For internal use only).
    std::shared_ptr<DescriptorAllocatorPage> GetDescriptorAllocatorPage() const;
```
The GetDescriptorAllocatorPage method is used to query the DescriptorAllocatorPage that the DescriptorAllocation came from.
```cpp
private:
    // Free the descriptor back to the heap it came from.
    void Free();

    // The base descriptor.
    D3D12_CPU_DESCRIPTOR_HANDLE m_Descriptor;
    // The number of descriptors in this allocation.
    uint32_t m_NumHandles;
    // The offset to the next descriptor.
    uint32_t m_DescriptorSize;

    // A pointer back to the original page where this allocation came from.
    std::shared_ptr<DescriptorAllocatorPage> m_Page;
};
```
The Free method is used by the DescriptorAllocation class to return itself to the DescriptorAllocatorPage it came from. This method is used when the DescriptorAllocation is destroyed or when another DescriptorAllocation is being (move) assigned to it.
The m_Descriptor member variable is the first D3D12_CPU_DESCRIPTOR_HANDLE in the allocation.
The m_NumHandles member variable stores the total number of descriptors in the DescriptorAllocation.
The m_DescriptorSize member variable stores the increment size for each descriptor. This is used to compute the offset of a particular descriptor within the allocation.
The m_Page member variable stores a std::shared_ptr back to the DescriptorAllocatorPage that the DescriptorAllocation came from.
DescriptorAllocation Preamble
The implementation of the DescriptorAllocation class is fairly simple, as it acts as a wrapper class for the underlying D3D12_CPU_DESCRIPTOR_HANDLE and provides a few accessor methods that describe the allocation.

```cpp
#include <DX12LibPCH.h>

#include <DescriptorAllocation.h>

#include <Application.h>
#include <DescriptorAllocatorPage.h>
```
The DX12LibPCH.h header file provides the precompiled header file for the DX12Lib project and must be the first include that appears in the implementation file.
The DescriptorAllocation.h header is included next and provides the declaration of the DescriptorAllocation class that was shown in the previous section.
The Application.h header provides the declaration of the Application class. When freeing a DescriptorAllocation, it is necessary to provide the current frame number, which is provided by the Application class.
The DescriptorAllocatorPage.h header file is necessary to be able to call the DescriptorAllocatorPage::Free method when freeing the DescriptorAllocation.
DescriptorAllocation Default Constructor
The default constructor for the DescriptorAllocation class simply initializes it as a null descriptor.

```cpp
DescriptorAllocation::DescriptorAllocation()
    : m_Descriptor{ 0 }
    , m_NumHandles( 0 )
    , m_DescriptorSize( 0 )
    , m_Page( nullptr )
{}
```
DescriptorAllocation Parameterized Constructor
The parameterized constructor for the DescriptorAllocation class initializes it as a valid descriptor (assuming the parameters are valid).

```cpp
DescriptorAllocation::DescriptorAllocation( D3D12_CPU_DESCRIPTOR_HANDLE descriptor, uint32_t numHandles, uint32_t descriptorSize, std::shared_ptr<DescriptorAllocatorPage> page )
    : m_Descriptor( descriptor )
    , m_NumHandles( numHandles )
    , m_DescriptorSize( descriptorSize )
    , m_Page( page )
{}
```
The member variables being initialized here are described in the DescriptorAllocation Header section and shouldn’t require additional explanation.
DescriptorAllocation Destructor
The destructor for the DescriptorAllocation class must ensure that the descriptor is freed back to the DescriptorAllocatorPage it came from by calling the Free method.

```cpp
DescriptorAllocation::~DescriptorAllocation()
{
    Free();
}
```
DescriptorAllocation Move Constructor
The move constructor allows the DescriptorAllocation to be moved. The original DescriptorAllocation must be made invalid, but the allocation should not be freed.

```cpp
DescriptorAllocation::DescriptorAllocation( DescriptorAllocation&& allocation )
    : m_Descriptor( allocation.m_Descriptor )
    , m_NumHandles( allocation.m_NumHandles )
    , m_DescriptorSize( allocation.m_DescriptorSize )
    , m_Page( std::move( allocation.m_Page ) )
{
    allocation.m_Descriptor.ptr = 0;
    allocation.m_NumHandles = 0;
    allocation.m_DescriptorSize = 0;
}
```
DescriptorAllocation Move Assignment
The move assignment operator behaves similarly to the move constructor, except that the current descriptor must first be freed using the Free method before another descriptor is moved into it.

```cpp
DescriptorAllocation& DescriptorAllocation::operator=( DescriptorAllocation&& other )
{
    // Free this descriptor if it points to anything.
    Free();

    m_Descriptor = other.m_Descriptor;
    m_NumHandles = other.m_NumHandles;
    m_DescriptorSize = other.m_DescriptorSize;
    m_Page = std::move( other.m_Page );

    other.m_Descriptor.ptr = 0;
    other.m_NumHandles = 0;
    other.m_DescriptorSize = 0;

    return *this;
}
```
DescriptorAllocation::Free
If the DescriptorAllocation either goes out of scope or is replaced by another descriptor, it must be freed. The Free method is used to return the DescriptorAllocation to the DescriptorAllocatorPage it came from.

```cpp
void DescriptorAllocation::Free()
{
    if ( !IsNull() && m_Page )
    {
        m_Page->Free( std::move( *this ), Application::GetFrameCount() );

        m_Descriptor.ptr = 0;
        m_NumHandles = 0;
        m_DescriptorSize = 0;
        m_Page.reset();
    }
}
```
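If the DescriptorAllocation is valid (not null), then it is returned to the DescriptorAllocatorPage it came from using the DescriptorAllocatorPage::Free method, together with the current frame number from Application::GetFrameCount. The member variables are then reset so that the DescriptorAllocation becomes null, which makes any further call to Free a no-op.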
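The following sketch models the whole move-only, self-freeing pattern without Direct3D 12. FakePage and Allocation are hypothetical names; unlike the real DescriptorAllocatorPage::Free, FakePage releases descriptors immediately instead of queuing them by frame number, and the move assignment operator adds a self-assignment check that the code above does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-in for DescriptorAllocatorPage: it only counts how many
// descriptors have been handed back (no frame-based deferral).
struct FakePage
{
    uint32_t freedHandles = 0;
    void Free(uint32_t numHandles) { freedHandles += numHandles; }
};

// Move-only, self-freeing allocation modelled on DescriptorAllocation.
class Allocation
{
public:
    Allocation() = default;
    Allocation(std::size_t ptr, uint32_t numHandles, std::shared_ptr<FakePage> page)
        : m_Ptr(ptr), m_NumHandles(numHandles), m_Page(std::move(page))
    {
        assert(ptr != 0 && numHandles > 0);
    }

    ~Allocation() { Free(); }

    Allocation(const Allocation&) = delete;
    Allocation& operator=(const Allocation&) = delete;

    // Take over the other allocation and leave it null, so exactly one
    // object is responsible for freeing the descriptors.
    Allocation(Allocation&& other) noexcept
        : m_Ptr(std::exchange(other.m_Ptr, 0))
        , m_NumHandles(std::exchange(other.m_NumHandles, 0))
        , m_Page(std::move(other.m_Page))
    {}

    Allocation& operator=(Allocation&& other) noexcept
    {
        if (this != &other)
        {
            Free(); // release whatever this object currently owns
            m_Ptr = std::exchange(other.m_Ptr, 0);
            m_NumHandles = std::exchange(other.m_NumHandles, 0);
            m_Page = std::move(other.m_Page);
        }
        return *this;
    }

    bool IsNull() const { return m_Ptr == 0; }

private:
    void Free()
    {
        if (!IsNull() && m_Page)
        {
            m_Page->Free(m_NumHandles);
            m_Ptr = 0;
            m_NumHandles = 0;
            m_Page.reset();
        }
    }

    std::size_t m_Ptr = 0;
    uint32_t m_NumHandles = 0;
    std::shared_ptr<FakePage> m_Page;
};
```

Moving an allocation transfers ownership without freeing anything, assigning a new allocation over an existing one frees the old descriptors first, and destruction frees whatever is still owned.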
DescriptorAllocation::IsNull
The IsNull method checks whether the underlying D3D12_CPU_DESCRIPTOR_HANDLE is null (its ptr member is 0).

```cpp
// Check if this is a null (invalid) descriptor.
bool DescriptorAllocation::IsNull() const
{
    return m_Descriptor.ptr == 0;
}
```
DescriptorAllocation::GetDescriptorHandle
The GetDescriptorHandle method returns a D3D12_CPU_DESCRIPTOR_HANDLE for the descriptor at a particular offset within the DescriptorAllocation.

```cpp
// Get a descriptor at a particular offset in the allocation.
D3D12_CPU_DESCRIPTOR_HANDLE DescriptorAllocation::GetDescriptorHandle( uint32_t offset ) const
{
    assert( offset < m_NumHandles );
    return { m_Descriptor.ptr + ( m_DescriptorSize * offset ) };
}
```
DescriptorAllocation::GetNumHandles
The GetNumHandles method returns the number of descriptor handles in the DescriptorAllocation.

```cpp
uint32_t DescriptorAllocation::GetNumHandles() const
{
    return m_NumHandles;
}
```
DescriptorAllocation::GetDescriptorAllocatorPage
The GetDescriptorAllocatorPage method returns the std::shared_ptr to the DescriptorAllocatorPage from which the DescriptorAllocation originated.

```cpp
std::shared_ptr<DescriptorAllocatorPage> DescriptorAllocation::GetDescriptorAllocatorPage() const
{
    return m_Page;
}
```
This concludes the description of the classes that are used to implement the descriptor allocation strategy used by the DX12Lib project. The DescriptorAllocator class provides a simple interface for allocating and freeing descriptors using a free list memory management scheme. The DescriptorAllocatorPage class is used internally to manage allocations, and the DescriptorAllocation class is used to represent a single allocation from the descriptor heap.
The DynamicDescriptorHeap class provides a flexible solution for ensuring the CPU visible descriptors are copied to the correct location in a GPU visible descriptor heap for rendering on the GPU. The DynamicDescriptorHeap class is the subject of the next section.
Dynamic Descriptor Heap
The purpose of the DynamicDescriptorHeap class is to allocate GPU visible descriptors that are used for binding CBV, SRV, UAV, and Samplers to the GPU pipeline for rendering or compute invocations. This is necessary since the descriptors provided by the DescriptorAllocator class shown in the previous section are CPU visible and cannot be used to bind resources to the GPU rendering pipeline. The DynamicDescriptorHeap class provides a staging area for CPU visible descriptors that are committed to GPU visible descriptor heaps when a Draw or Dispatch method is invoked on the command list.
Since only a single CBV_SRV_UAV descriptor heap and a single SAMPLER descriptor heap can be bound to the command list at the same time, the DynamicDescriptorHeap class also ensures that the currently bound descriptor heap has a sufficient number of descriptors to commit all of the staged descriptors before a Draw or Dispatch command is executed. If the currently bound descriptor heap runs out of descriptors, then a new descriptor heap is bound to the command list.
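The capacity check can be illustrated with a small sketch. Everything here is hypothetical (GpuHeapState, ReserveForCommit, and the idea of summing per-table descriptor counts); it only models the bookkeeping, not the actual descriptor copies or the ID3D12GraphicsCommandList::SetDescriptorHeaps call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping for the GPU-visible heap currently bound to a
// command list.
struct GpuHeapState
{
    uint32_t capacity;           // descriptors in each GPU-visible heap
    uint32_t numFreeHandles;     // descriptors still unused in the bound heap
    uint32_t heapsRequested = 0; // how often a fresh heap had to be bound
};

// Reserves room for all staged descriptors before a Draw or Dispatch.
// If the bound heap is too full, a fresh heap is (notionally) bound first.
// Returns the offset in the bound heap where the staged descriptors go.
uint32_t ReserveForCommit(GpuHeapState& heap, const std::vector<uint32_t>& stagedTableSizes)
{
    uint32_t numStaged = 0;
    for (uint32_t n : stagedTableSizes)
        numStaged += n;
    assert(numStaged <= heap.capacity && "more staged descriptors than one heap can hold");

    if (numStaged > heap.numFreeHandles)
    {
        // Stand-in for requesting a new heap and binding it to the command list.
        heap.numFreeHandles = heap.capacity;
        ++heap.heapsRequested;
    }

    const uint32_t offset = heap.capacity - heap.numFreeHandles;
    heap.numFreeHandles -= numStaged;
    return offset;
}
```

A full implementation typically has to re-stage every descriptor table when it switches heaps, since tables that were already set on the command list refer to the previously bound heap; this sketch ignores that detail.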
The DynamicDescriptorHeap class shown in this article is designed to provide functionality similar to that of DirectX 11, where dynamic descriptor indexing wasn’t supported.
The DynamicDescriptorHeap class caches staged descriptors in a descriptor cache that is configured to match the layout of the root signature. For example, if the root signature has the following layout:
Index | Type | Range Type | Num Descriptors |
---|---|---|---|
0 | CBV | – | – |
1 | DESCRIPTOR_TABLE | SRV | 6 |
2 | DESCRIPTOR_TABLE | CBV | 3 |
3 | DESCRIPTOR_TABLE | UAV | 3 |
4 | DESCRIPTOR_TABLE | SAMPLER | 4 |
Then the descriptor table cache for the CBV_SRV_UAV dynamic descriptor heap would look like this:
There are a few interesting things to note in the image above. The first entry (root index 0) in the descriptor table cache is empty because the root signature contains an inline Constant Buffer View (CBV). Since an inline CBV does not require a descriptor, there is no reason to allocate any space for it in the descriptor handle cache.
The second entry in the descriptor table cache has six SRV descriptors and a pointer to the first entry in the descriptor handle cache. Similarly, the third and fourth entries in the descriptor table cache each have three descriptors and a pointer to their corresponding entry in the descriptor handle cache.
The fifth entry (root index 4) in the descriptor table cache is empty despite the fact that the root signature layout has a descriptor table that contains four SAMPLERs. Since CBV_SRV_UAV descriptors and SAMPLER descriptors cannot be stored in the same descriptor heap, there is a separate DynamicDescriptorHeap for the CBV_SRV_UAV and SAMPLER descriptor types.
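The table cache layout described above can be reproduced with a small calculation. In this hypothetical sketch, each root parameter is reduced to a type and a descriptor count (real descriptor tables can contain several ranges), and BuildTableCache assigns each matching table a contiguous region of the handle cache.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified root parameter: a type plus a descriptor count.
enum class RootParamType { InlineCBV, TableCbvSrvUav, TableSampler };

struct RootParam
{
    RootParamType type;
    uint32_t numDescriptors; // 0 for inline (root) descriptors
};

// Per root index: where the table's handles start in the descriptor
// handle cache and how many there are (0 means "no entry").
struct TableCacheEntry
{
    uint32_t cacheOffset = 0;
    uint32_t numDescriptors = 0;
};

// Builds the table cache for one descriptor heap type. Only descriptor tables
// of that type consume handle-cache space; inline descriptors never do.
std::vector<TableCacheEntry> BuildTableCache(const std::vector<RootParam>& params,
                                             RootParamType heapTableType)
{
    assert(heapTableType != RootParamType::InlineCBV && "inline descriptors are not stored in a heap");

    std::vector<TableCacheEntry> cache(params.size());
    uint32_t offset = 0;
    for (std::size_t i = 0; i < params.size(); ++i)
    {
        if (params[i].type == heapTableType)
        {
            cache[i] = TableCacheEntry{ offset, params[i].numDescriptors };
            offset += params[i].numDescriptors;
        }
    }
    return cache;
}
```

For the example root signature, the CBV_SRV_UAV cache gives root index 1 offset 0 (6 descriptors), root index 2 offset 6, and root index 3 offset 9, for 12 handles in total, while the SAMPLER table at root index 4 gets its own cache starting at offset 0.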
DynamicDescriptorHeap Class
The design of the DynamicDescriptorHeap class is heavily based on the DynamicDescriptorHeap implementation from Microsoft’s DirectX Samples on GitHub [1].
The DynamicDescriptorHeap class provides the following functionality:
- Stage Descriptors: Stage CPU visible descriptors to the descriptor table cache.
- Commit Staged Descriptors: Commit the staged descriptors to a GPU visible descriptor heap.
- Copy a Descriptor: Directly copy a CPU visible descriptor to a GPU visible descriptor heap. This is useful for the ID3D12GraphicsCommandList::ClearUnorderedAccessViewFloat and the ID3D12GraphicsCommandList::ClearUnorderedAccessViewUint methods.
DynamicDescriptorHeap Header
In this section, the declaration of the DynamicDescriptorHeap class is described. The DynamicDescriptorHeap class provides methods for staging CPU visible descriptors and committing those descriptors to a GPU visible descriptor heap before a Draw or Dispatch command is executed. The DynamicDescriptorHeap class also provides a method to copy a single CPU visible descriptor to a GPU visible descriptor heap. Copying single descriptors is required for the ID3D12GraphicsCommandList::ClearUnorderedAccessViewFloat and ID3D12GraphicsCommandList::ClearUnorderedAccessViewUint methods, which require both a CPU and a GPU visible descriptor for the resource to be cleared.
A method to parse the root signature and configure the descriptor table cache is also provided. The DX12Lib project provides a RootSignature class for determining the layout of the root signature, but this class is not described here. The RootSignature class is a wrapper for an ID3D12RootSignature. For more information on the RootSignature class, refer to the GitHub repository (RootSignature.h and RootSignature.cpp).
```cpp
#pragma once

#include "d3dx12.h"

#include <wrl.h>

#include <cstdint>
#include <functional> // std::function (used by CommitStagedDescriptors)
#include <memory>
#include <queue>

class CommandList;
class RootSignature;
```
The d3dx12.h header file provides some helper types for working with DirectX 12. It also includes the d3d12.h header file, so d3d12.h does not need to be included directly.
The wrl.h header file provides the ComPtr template class.
The cstdint header provides access to the standard fixed-width integer types (such as uint32_t). The memory header file is required for std::unique_ptr and the queue header file is required for the std::queue container class.
The CommandList and RootSignature classes are forward declared at the end of the listing. Their header files are only required in the implementation file of the DynamicDescriptorHeap class.
```cpp
class DynamicDescriptorHeap
{
public:
    DynamicDescriptorHeap(
        D3D12_DESCRIPTOR_HEAP_TYPE heapType,
        uint32_t numDescriptorsPerHeap = 1024);

    virtual ~DynamicDescriptorHeap();
```
The DynamicDescriptorHeap class has a single constructor, which takes the type of descriptor heap (a D3D12_DESCRIPTOR_HEAP_TYPE) and the number of descriptors to allocate per heap (1024 by default). The class also declares a virtual destructor.
```cpp
    /**
     * Stages a contiguous range of CPU visible descriptors.
     * Descriptors are not copied to the GPU visible descriptor heap until
     * the CommitStagedDescriptors function is called.
     */
    void StageDescriptors(uint32_t rootParameterIndex, uint32_t offset, uint32_t numDescriptors, const D3D12_CPU_DESCRIPTOR_HANDLE srcDescriptors);
```
CPU visible descriptors are staged to the DynamicDescriptorHeap using the StageDescriptors method. This method has the following arguments:
- uint32_t rootParameterIndex: The index of the root parameter to copy the descriptors to. This must be configured as a DESCRIPTOR_TABLE in the currently bound root signature.
- uint32_t offset: The offset within the descriptor table to copy the descriptors to. This value can span descriptor ranges within the table, but offset + numDescriptors must not exceed the total number of descriptors in the descriptor table.
- uint32_t numDescriptors: The number of contiguous descriptors to copy, starting from srcDescriptors.
- const D3D12_CPU_DESCRIPTOR_HANDLE srcDescriptors: The base descriptor to start copying descriptors from.
The StageDescriptors method is used to copy any number of contiguous CPU visible descriptors to the DynamicDescriptorHeap. Using this method, only the descriptor handles are copied to the DynamicDescriptorHeap, not the contents of the descriptors. For this reason, the CPU visible descriptors must not be reused or overwritten (using ID3D12Device::CreateShaderResourceView, for example) until the CommitStagedDescriptors method is invoked.
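The consequence of staging handles rather than descriptor contents can be illustrated with a small simulation. Here a `std::vector<std::string>` stands in for a CPU visible descriptor heap and an index stands in for a descriptor handle; both are illustrative assumptions, not D3D12 types. Because the copy only happens at commit time, overwriting the source slot between staging and committing changes what ends up in the GPU visible heap.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simulated CPU visible heap: each slot holds a descriptor's "contents".
using CpuHeap = std::vector<std::string>;
// A staged descriptor is only a handle (an index here), not a copy of the contents.
using Handle = std::size_t;

// Commit copies whatever the staged handles point at *now*.
std::vector<std::string> Commit(const CpuHeap& heap, const std::vector<Handle>& staged) {
    std::vector<std::string> gpuHeap;
    for (Handle h : staged) {
        gpuHeap.push_back(heap[h]);
    }
    return gpuHeap;
}
```

Staging slot 0, overwriting that slot with a different view, and then committing yields the new view rather than the one that existed at staging time.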
```cpp
    /**
     * Copy all of the staged descriptors to the GPU visible descriptor heap and
     * bind the descriptor heap and the descriptor tables to the command list.
     * The passed-in function object is used to set the GPU visible descriptors
     * on the command list. Two possible functions are:
     *   * Before a draw    : ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable
     *   * Before a dispatch: ID3D12GraphicsCommandList::SetComputeRootDescriptorTable
     *
     * Since the DynamicDescriptorHeap can't know which function will be used, it must
     * be passed as an argument to the function.
     */
    void CommitStagedDescriptors( CommandList& commandList, std::function<void(ID3D12GraphicsCommandList*, UINT, D3D12_GPU_DESCRIPTOR_HANDLE)> setFunc );
    void CommitStagedDescriptorsForDraw(CommandList& commandList);
    void CommitStagedDescriptorsForDispatch(CommandList& commandList);
```
The CommitStagedDescriptors family of methods is used to commit any staged descriptors to the GPU visible descriptor heaps. The CommitStagedDescriptorsForDraw method uses the ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable method to bind the descriptors to the graphics pipeline, while the CommitStagedDescriptorsForDispatch method uses the ID3D12GraphicsCommandList::SetComputeRootDescriptorTable method to bind the descriptors to the compute pipeline.
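Because the setter parameter is a std::function whose first argument is the command list pointer, one way to write the ForDraw and ForDispatch variants is to pass a pointer to a member function (such as &ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable); std::function then invokes it on the object passed as the first argument. The sketch below illustrates that mechanism under stated assumptions: `MockCommandList` is a hypothetical stand-in for ID3D12GraphicsCommandList, and a plain uint64_t replaces D3D12_GPU_DESCRIPTOR_HANDLE so the code compiles without the DirectX headers.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical stand-in for ID3D12GraphicsCommandList (not the real interface).
struct MockCommandList {
    std::string lastCall;
    unsigned lastRootIndex = 0;
    uint64_t lastGpuHandle = 0;

    void SetGraphicsRootDescriptorTable(unsigned rootIndex, uint64_t gpuHandle) {
        lastCall = "graphics"; lastRootIndex = rootIndex; lastGpuHandle = gpuHandle;
    }
    void SetComputeRootDescriptorTable(unsigned rootIndex, uint64_t gpuHandle) {
        lastCall = "compute"; lastRootIndex = rootIndex; lastGpuHandle = gpuHandle;
    }
};

// Same shape as the setFunc parameter of CommitStagedDescriptors.
using SetterFunc = std::function<void(MockCommandList*, unsigned, uint64_t)>;

// Simplified "commit" that binds one table with whichever setter it was given.
void CommitOneTable(MockCommandList& commandList, unsigned rootIndex, uint64_t gpuHandle,
                    const SetterFunc& setFunc) {
    setFunc(&commandList, rootIndex, gpuHandle);
}

// The two wrappers differ only in the member function pointer they pass.
void CommitOneTableForDraw(MockCommandList& cl, unsigned rootIndex, uint64_t gpuHandle) {
    CommitOneTable(cl, rootIndex, gpuHandle, &MockCommandList::SetGraphicsRootDescriptorTable);
}
void CommitOneTableForDispatch(MockCommandList& cl, unsigned rootIndex, uint64_t gpuHandle) {
    CommitOneTable(cl, rootIndex, gpuHandle, &MockCommandList::SetComputeRootDescriptorTable);
}
```

Keeping the setter as a parameter means the copy-and-bind logic is written once and shared by the draw and dispatch paths.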
```cpp
    /**
     * Copies a single CPU visible descriptor to a GPU visible descriptor heap.
     * This is useful for the
     *   * ID3D12GraphicsCommandList::ClearUnorderedAccessViewFloat
     *   * ID3D12GraphicsCommandList::ClearUnorderedAccessViewUint
     * methods which require both a CPU and a GPU visible descriptor for a UAV
     * resource.
     *
     * @param commandList The command list is required in case the GPU visible
     * descriptor heap needs to be updated on the command list.
     * @param cpuDescriptor The CPU descriptor to copy into a GPU visible
     * descriptor heap.
     *
     * @return The GPU visible descriptor.
     */
    D3D12_GPU_DESCRIPTOR_HANDLE CopyDescriptor( CommandList& commandList, D3D12_CPU_DESCRIPTOR_HANDLE cpuDescriptor);
```
When clearing a UAV resource using either the ID3D12GraphicsCommandList::ClearUnorderedAccessViewFloat or the ID3D12GraphicsCommandList::ClearUnorderedAccessViewUint method, both a CPU and a GPU visible descriptor are required. The CopyDescriptor method copies a single CPU visible descriptor into a GPU visible descriptor heap and returns the corresponding GPU visible descriptor. In addition to the CPU descriptor, this method takes a CommandList in case the currently bound descriptor heap needs to be updated on the command list as a result of copying the descriptor.
```cpp
    /**
     * Parse the root signature to determine which root parameters contain
     * descriptor tables and determine the number of descriptors needed for
     * each table.
     */
    void ParseRootSignature( const RootSignature& rootSignature);
```
Using the ParseRootSignature method, the DynamicDescriptorHeap is informed of any changes to the currently bound root signature on the command list. This method updates the layout of the descriptors in the descriptor cache to match the descriptor layout in the root signature (as described in the introduction to this section).
```cpp
    /**
     * Reset used descriptors. This should only be done if any descriptors
     * that are being referenced by a command list have finished executing on the
     * command queue.
     */
    void Reset();
```
The Reset method is used to reset the allocated descriptor heaps and the descriptor cache. This should only be done when the command queue has finished processing all commands that reference descriptors in the DynamicDescriptorHeap.
```cpp
private:
    // Request a descriptor heap if one is available.
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> RequestDescriptorHeap();
    // Create a new descriptor heap if no descriptor heap is available.
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> CreateDescriptorHeap();
```
The RequestDescriptorHeap method is used to get an available descriptor heap. If there are no available descriptor heaps, a new descriptor heap is created using the CreateDescriptorHeap method.
```cpp
    // Compute the number of stale descriptors that need to be copied
    // to the GPU visible descriptor heap.
    uint32_t ComputeStaleDescriptorCount() const;
```
The ComputeStaleDescriptorCount method returns the number of CPU visible descriptors that need to be copied to the GPU visible descriptor heap.
```cpp
    /**
     * The maximum number of descriptor tables per root signature.
     * A 32-bit mask is used to keep track of the root parameter indices that
     * are descriptor tables.
     */
    static const uint32_t MaxDescriptorTables = 32;
```
The MaxDescriptorTables constant represents the maximum number of descriptor tables that can exist in the root signature. The limit of 32 descriptor tables was chosen because a 32-bit bitmask is used to indicate which root parameters of the root signature use a descriptor table.
```cpp
    /**
     * A structure that represents a descriptor table entry in the root signature.
     */
    struct DescriptorTableCache
    {
        DescriptorTableCache()
            : NumDescriptors(0)
            , BaseDescriptor(nullptr)
        {}

        // Reset the table cache.
        void Reset()
        {
            NumDescriptors = 0;
            BaseDescriptor = nullptr;
        }

        // The number of descriptors in this descriptor table.
        uint32_t NumDescriptors;
        // The pointer to the descriptor in the descriptor handle cache.
        D3D12_CPU_DESCRIPTOR_HANDLE* BaseDescriptor;
    };
```
The DescriptorTableCache struct represents a single entry in the m_DescriptorTableCache array. Each entry stores the number of descriptors in the descriptor table and a pointer to the table's first descriptor handle in the descriptor handle cache. By default, each entry is empty (0 descriptors and a null pointer), which indicates that the corresponding root parameter of the currently bound root signature is not a descriptor table of this heap type.
```cpp
    // Describes the type of descriptors that can be staged using this
    // dynamic descriptor heap.
    // Valid values are:
    //   * D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV
    //   * D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER
    // This parameter also determines the type of GPU visible descriptor heap to
    // create.
    D3D12_DESCRIPTOR_HEAP_TYPE m_DescriptorHeapType;
    // The number of descriptors to allocate in new GPU visible descriptor heaps.
    uint32_t m_NumDescriptorsPerHeap;
    // The increment size of a descriptor.
    uint32_t m_DescriptorHandleIncrementSize;
```
The m_DescriptorHeapType member variable stores the type of descriptor heap the DynamicDescriptorHeap uses. This can be either D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV or D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER.
The m_NumDescriptorsPerHeap variable indicates how many descriptors to allocate for each descriptor heap.
The m_DescriptorHandleIncrementSize variable indicates the offset between descriptors in the descriptor heap. Since the increment size of a descriptor within a descriptor heap is vendor specific, it must be queried at runtime.
```cpp
    // The descriptor handle cache.
    std::unique_ptr<D3D12_CPU_DESCRIPTOR_HANDLE[]> m_DescriptorHandleCache;

    // Descriptor handle cache per descriptor table.
    DescriptorTableCache m_DescriptorTableCache[MaxDescriptorTables];
```
The m_DescriptorHandleCache variable is an array of D3D12_CPU_DESCRIPTOR_HANDLEs. The number of descriptors that can be cached is determined by the numDescriptorsPerHeap argument passed to the constructor of the DynamicDescriptorHeap class.
The m_DescriptorTableCache variable is an array of DescriptorTableCache structs. This array is statically sized to the maximum number of descriptor tables that can appear in a root signature (MaxDescriptorTables). The layout of the m_DescriptorTableCache array is configured in the ParseRootSignature method shown later.
```cpp
    // Each bit in the bit mask represents the index in the root signature
    // that contains a descriptor table.
    uint32_t m_DescriptorTableBitMask;
    // Each bit set in the bit mask represents a descriptor table
    // in the root signature that has changed since the last time the
    // descriptors were copied.
    uint32_t m_StaleDescriptorTableBitMask;
```
The m_DescriptorTableBitMask variable indicates which root parameters of the currently bound root signature contain a descriptor table. The m_StaleDescriptorTableBitMask variable indicates which descriptor tables have been modified since the previous commit. If a root signature has multiple descriptor tables (as in the example in the introduction to this section) but only one of them is modified between draw (or dispatch) commands, then only the modified descriptor table needs to be copied to the GPU visible descriptor heap. Any unmodified descriptor tables can be left as-is.
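The stale-table bookkeeping comes down to a few bit operations. This sketch uses the example root signature (CBV_SRV_UAV descriptor tables at root indices 1, 2, and 3). `StaleTracker` is a hypothetical helper, and it assumes the stale bits are cleared once the tables have been committed.

```cpp
#include <cassert>
#include <cstdint>

// Bit i set => root parameter i is a CBV_SRV_UAV descriptor table (indices 1, 2, 3).
constexpr uint32_t kDescriptorTableBitMask = (1u << 1) | (1u << 2) | (1u << 3);

// Hypothetical helper modeling m_StaleDescriptorTableBitMask.
struct StaleTracker {
    uint32_t staleMask = 0;

    // Called whenever descriptors are staged for a root parameter.
    void MarkStale(uint32_t rootIndex) { staleMask |= (1u << rootIndex); }

    // Returns the tables that must be copied and clears the mask, so the next
    // draw only copies tables modified after this point.
    uint32_t TakeStaleTables() {
        uint32_t tables = staleMask & kDescriptorTableBitMask;
        staleMask = 0;
        return tables;
    }
};
```

After the first draw commits all three tables, changing only the CBV table at root index 2 leaves just bit 2 set, so only that table's three descriptors are copied before the next draw.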
```cpp
    using DescriptorHeapPool = std::queue< Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> >;

    DescriptorHeapPool m_DescriptorHeapPool;
    DescriptorHeapPool m_AvailableDescriptorHeaps;
```
The DescriptorHeapPool type is an alias for a std::queue of ComPtr<ID3D12DescriptorHeap>s.
The m_DescriptorHeapPool variable stores all of the descriptor heaps created by the DynamicDescriptorHeap class, and the m_AvailableDescriptorHeaps variable stores only the descriptor heaps that are available for use. A descriptor heap is removed from the m_AvailableDescriptorHeaps queue when it is requested; once it no longer has enough free descriptors to commit the staged descriptors, it is replaced by another heap and is not made available again until the DynamicDescriptorHeap is reset.
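The relationship between the two queues can be modeled without any D3D12 objects. In this sketch (an assumption-based model, not the DX12Lib code), an `int` id stands in for a ComPtr<ID3D12DescriptorHeap>: every heap ever created stays in the pool, a heap is taken from the available queue when it is needed, and resetting (assumed here to copy the pool into the available queue) makes every heap reusable again.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>

// Hypothetical model: heap ids stand in for ComPtr<ID3D12DescriptorHeap>.
class HeapPoolModel {
public:
    // Mirrors RequestDescriptorHeap: reuse an available heap or create a new one.
    int Request() {
        if (!m_Available.empty()) {
            int heap = m_Available.front();
            m_Available.pop();
            return heap;
        }
        int heap = m_NextId++; // stands in for CreateDescriptorHeap
        m_Pool.push(heap);     // the pool keeps every heap ever created
        return heap;
    }

    // Assumed Reset behaviour: once the GPU is done, every heap is available again.
    void Reset() { m_Available = m_Pool; }

    std::size_t NumCreated() const { return m_Pool.size(); }

private:
    std::queue<int> m_Pool;
    std::queue<int> m_Available;
    int m_NextId = 0;
};
```

If two heaps were needed in one frame, the next frame (after a reset) reuses both before a third heap is ever created.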
```cpp
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> m_CurrentDescriptorHeap;
    CD3DX12_GPU_DESCRIPTOR_HANDLE m_CurrentGPUDescriptorHandle;
    CD3DX12_CPU_DESCRIPTOR_HANDLE m_CurrentCPUDescriptorHandle;

    uint32_t m_NumFreeHandles;
};
```
The m_CurrentDescriptorHeap variable points to the descriptor heap that is currently bound to the command list.
The m_CurrentGPUDescriptorHandle and m_CurrentCPUDescriptorHandle variables store the current GPU and CPU descriptor handles within the m_CurrentDescriptorHeap descriptor heap.
The m_NumFreeHandles variable stores the number of descriptor handles that are still available in the currently bound descriptor heap.
DynamicDescriptorHeap Preamble
The preamble for the DynamicDescriptorHeap implementation file contains the additional headers that are required to compile the class.
```cpp
#include <DX12LibPCH.h>

#include <DynamicDescriptorHeap.h>

#include <Application.h>
#include <CommandList.h>
#include <RootSignature.h>
```
The DX12LibPCH.h header file is the precompiled header file for the DX12Lib project.
The DynamicDescriptorHeap.h header file contains the declaration of the DynamicDescriptorHeap class. This header file is described in the previous section.
The Application.h header file is required to get access to the ID3D12Device, which is owned by the Application class.
The CommandList.h header file contains the declaration of the CommandList class, and the RootSignature.h header file contains the declaration of the RootSignature class. These classes are part of the DX12Lib project but are not described in detail in this lesson.
DynamicDescriptorHeap::DynamicDescriptorHeap
The constructor for the DynamicDescriptorHeap initializes the member variables of the DynamicDescriptorHeap and allocates storage for the descriptor handle cache based on the maximum number of descriptors per descriptor heap.
```cpp
DynamicDescriptorHeap::DynamicDescriptorHeap(D3D12_DESCRIPTOR_HEAP_TYPE heapType, uint32_t numDescriptorsPerHeap)
    : m_DescriptorHeapType(heapType)
    , m_NumDescriptorsPerHeap(numDescriptorsPerHeap)
    , m_DescriptorTableBitMask(0)
    , m_StaleDescriptorTableBitMask(0)
    // Listed in declaration order (GPU handle before CPU handle) to avoid reorder warnings.
    , m_CurrentGPUDescriptorHandle(D3D12_DEFAULT)
    , m_CurrentCPUDescriptorHandle(D3D12_DEFAULT)
    , m_NumFreeHandles(0)
{
    // The descriptor increment size is vendor specific and must be queried at runtime.
    m_DescriptorHandleIncrementSize = Application::Get().GetDescriptorHandleIncrementSize(heapType);

    // Allocate space for staging CPU visible descriptors.
    m_DescriptorHandleCache = std::make_unique<D3D12_CPU_DESCRIPTOR_HANDLE[]>(m_NumDescriptorsPerHeap);
}
```
Since the increment size of a descriptor in a descriptor heap is vendor specific, it must be queried at runtime; this is done in the body of the constructor.
The descriptor handle cache is then allocated with room for the maximum number of descriptors that can be copied to the GPU visible descriptor heap.
DynamicDescriptorHeap::ParseRootSignature
Before any descriptors can be staged to the DynamicDescriptorHeap, the layout of the descriptor tables in the root signature must be known. The ParseRootSignature method is used to configure the layout of the descriptor cache whenever the root signature is changed on the command list.
```cpp
void DynamicDescriptorHeap::ParseRootSignature(const RootSignature& rootSignature)
{
    // If the root signature changes, all descriptors must be (re)bound to the
    // command list.
    m_StaleDescriptorTableBitMask = 0;

    const auto& rootSignatureDesc = rootSignature.GetRootSignatureDesc();
```
The only argument to the ParseRootSignature method is a reference to a RootSignature. The RootSignature class is part of the DX12Lib project but is not described in detail in this lesson. It provides a wrapper for an ID3D12RootSignature with some additional methods to query the layout of the root signature.
Whenever the root signature changes on the command list, the descriptor tables must be bound again to the graphics or compute pipeline. The m_StaleDescriptorTableBitMask variable is cleared so that nothing is copied to a GPU visible descriptor heap until new descriptors are staged to the DynamicDescriptorHeap for the new root signature.
The root signature description used to create the root signature is cached in the RootSignature class. It is queried at the start of the method so that the layout of the root signature can be determined.
```cpp
    // Get a bit mask that represents the root parameter indices that match the
    // descriptor heap type for this dynamic descriptor heap.
    m_DescriptorTableBitMask = rootSignature.GetDescriptorTableBitMask(m_DescriptorHeapType);
    uint32_t descriptorTableBitMask = m_DescriptorTableBitMask;
```
Next, the root signature is queried for a bitmask that represents the root parameter indices that contain a descriptor table of this descriptor heap type. For the example root signature in the introduction to this section, the CBV_SRV_UAV bitmask has bits 1, 2, and 3 set (0b01110), because the parameters at root indices 1, 2, and 3 contain descriptor tables matching the heap type. The SAMPLER bitmask used by the other DynamicDescriptorHeap has only bit 4 set (0b10000).
A copy of the descriptor table bitmask is stored in a local variable so it can be scanned and cleared without modifying the class member variable.
```cpp
    uint32_t currentOffset = 0;
    DWORD rootIndex;
    while (_BitScanForward(&rootIndex, descriptorTableBitMask) && rootIndex < rootSignatureDesc.NumParameters)
    {
        uint32_t numDescriptors = rootSignature.GetNumDescriptors(rootIndex);

        DescriptorTableCache& descriptorTableCache = m_DescriptorTableCache[rootIndex];
        descriptorTableCache.NumDescriptors = numDescriptors;
        descriptorTableCache.BaseDescriptor = m_DescriptorHandleCache.get() + currentOffset;

        currentOffset += numDescriptors;

        // Flip the descriptor table bit so it's not scanned again for the current index.
        descriptorTableBitMask ^= (1 << rootIndex);
    }
```
While there are bits set in the descriptorTableBitMask variable, the root signature is queried for the number of descriptors in the descriptor table at the current root index. The corresponding entry of the descriptor table cache is then updated with that number of descriptors and a pointer to the table's first slot in the descriptor handle cache.
The _BitScanForward function is a Microsoft-specific compiler intrinsic (declared in intrin.h) that scans a bitmask from the least-significant bit (LSB) to the most-significant bit (MSB) and stores the position of the first set bit in the index argument. It returns a nonzero value if a set bit was found and zero if the mask is 0, which is what terminates the while loop. Compiler intrinsics are usually faster than calling an equivalent function because they typically compile down to a single CPU instruction.
The current offset in the descriptor handle cache is then advanced by the number of descriptors in the descriptor table.
Finally, the bit for the current index is flipped to 0 in descriptorTableBitMask so that the same index is not scanned again in the while loop.
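Where a compiler other than MSVC is used, the same scan-and-clear loop can be written with a small portable helper. `BitScanForwardPortable` below is a hypothetical stand-in for _BitScanForward (C++20 code could use std::countr_zero from the bit header instead), and `SetBitIndices` mirrors the way ParseRootSignature visits root indices 1, 2, and 3 for the example bitmask 0b01110.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable stand-in for _BitScanForward: writes the index of the lowest set bit
// to *index and returns true, or returns false (leaving *index untouched) if mask is 0.
bool BitScanForwardPortable(uint32_t* index, uint32_t mask) {
    if (mask == 0) {
        return false;
    }
    uint32_t i = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        ++i;
    }
    *index = i;
    return true;
}

// Visits the set bits from LSB to MSB, clearing each one after it is visited,
// the same pattern ParseRootSignature applies to its local copy of the bitmask.
std::vector<uint32_t> SetBitIndices(uint32_t mask) {
    std::vector<uint32_t> indices;
    uint32_t index;
    while (BitScanForwardPortable(&index, mask)) {
        indices.push_back(index);
        mask ^= (1u << index);
    }
    return indices;
}
```

Clearing each visited bit guarantees the loop terminates after one iteration per descriptor table, regardless of where the tables appear in the root signature.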
```cpp
    // Make sure the maximum number of descriptors per descriptor heap has not been exceeded.
    assert(currentOffset <= m_NumDescriptorsPerHeap && "The root signature requires more than the maximum number of descriptors per descriptor heap. Consider increasing the maximum number of descriptors per descriptor heap.");
}
```
Before leaving the ParseRootSignature method, the postcondition is checked: the total number of descriptors required by the root signature's descriptor tables must not exceed the maximum number of descriptors that can be copied to the GPU visible descriptor heap.
DynamicDescriptorHeap::StageDescriptors
The StageDescriptors method copies CPU descriptor handles into the descriptor handle cache so they can be committed to the GPU visible descriptor heap later.
```cpp
void DynamicDescriptorHeap::StageDescriptors(uint32_t rootParameterIndex, uint32_t offset, uint32_t numDescriptors, const D3D12_CPU_DESCRIPTOR_HANDLE srcDescriptor)
{
    // Cannot stage more than the maximum number of descriptors per heap.
    // Cannot stage more than MaxDescriptorTables root parameters.
    if (numDescriptors > m_NumDescriptorsPerHeap || rootParameterIndex >= MaxDescriptorTables)
    {
        throw std::bad_alloc();
    }
```
Before any descriptors are copied, the arguments are checked to make sure the caller neither stages more descriptors than fit in a descriptor heap nor uses a root parameter index beyond the end of the descriptor table cache (MaxDescriptorTables). If either check fails, a std::bad_alloc exception is thrown.
```cpp
    DescriptorTableCache& descriptorTableCache = m_DescriptorTableCache[rootParameterIndex];

    // Check that the number of descriptors to copy does not exceed the number
    // of descriptors expected in the descriptor table.
    if ( (offset + numDescriptors) > descriptorTableCache.NumDescriptors)
    {
        throw std::length_error("Number of descriptors exceeds the number of descriptors in the descriptor table.");
    }
```
A reference to the corresponding entry in the descriptor table cache is retrieved, and an additional check ensures that the caller isn't copying more descriptors than the descriptor table is configured for. If offset + numDescriptors exceeds the number of descriptors in the descriptor table, a std::length_error exception is thrown.
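The two precondition checks can be exercised in isolation. This sketch reproduces them in a hypothetical free function `ValidateStage`, assuming the default of 1024 descriptors per heap and passing the table size in directly (the example root signature's SRV table has six descriptors).

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <new>
#include <stdexcept>

constexpr uint32_t kMaxDescriptorTables = 32;

// Hypothetical helper reproducing the precondition checks of StageDescriptors.
void ValidateStage(uint32_t rootParameterIndex, uint32_t offset, uint32_t numDescriptors,
                   uint32_t numDescriptorsPerHeap, uint32_t tableSize) {
    // Too many descriptors for one heap, or a root index the 32-bit mask cannot represent.
    if (numDescriptors > numDescriptorsPerHeap || rootParameterIndex >= kMaxDescriptorTables) {
        throw std::bad_alloc();
    }
    // The staged range must fit inside the descriptor table.
    if (offset + numDescriptors > tableSize) {
        throw std::length_error("Number of descriptors exceeds the number of descriptors in the descriptor table.");
    }
}

// Convenience wrapper for experimenting: true if the arguments are accepted.
bool IsValidStage(uint32_t rootParameterIndex, uint32_t offset, uint32_t numDescriptors,
                  uint32_t tableSize) {
    try {
        ValidateStage(rootParameterIndex, offset, numDescriptors, 1024, tableSize);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

For the six-descriptor SRV table, staging two descriptors at offset 4 is accepted, while staging two at offset 5 runs one past the end of the table and is rejected.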
```cpp
    D3D12_CPU_DESCRIPTOR_HANDLE* dstDescriptor = (descriptorTableCache.BaseDescriptor + offset);
    for (uint32_t i = 0; i < numDescriptors; ++i)
    {
        dstDescriptor[i] = CD3DX12_CPU_DESCRIPTOR_HANDLE(srcDescriptor, i, m_DescriptorHandleIncrementSize);
    }
```
A pointer to the destination slot in the descriptor handle cache is computed by adding the offset to the table's base descriptor.
The descriptor handles are then copied into the descriptor handle cache one at a time, each offset from srcDescriptor by the descriptor handle increment size.
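CD3DX12_CPU_DESCRIPTOR_HANDLE(srcDescriptor, i, m_DescriptorHandleIncrementSize) produces the handle i descriptors past srcDescriptor, that is srcDescriptor.ptr + i * incrementSize. The sketch below reproduces that arithmetic and the staging loop with a hypothetical `CpuHandle` struct (D3D12_CPU_DESCRIPTOR_HANDLE itself is a struct holding a SIZE_T ptr) so it runs without the DirectX headers. The 32-byte increment in the usage note is an arbitrary example value, since real increment sizes must be queried from the device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for D3D12_CPU_DESCRIPTOR_HANDLE, which wraps a SIZE_T ptr.
struct CpuHandle { std::size_t ptr; };

// Same arithmetic as CD3DX12_CPU_DESCRIPTOR_HANDLE(base, offsetInDescriptors, incrementSize).
CpuHandle OffsetHandle(CpuHandle base, int offsetInDescriptors, uint32_t incrementSize) {
    return CpuHandle{ base.ptr + static_cast<std::size_t>(offsetInDescriptors) * incrementSize };
}

// Mirrors the staging loop: writes numDescriptors consecutive handles into dst.
void StageHandles(CpuHandle* dst, CpuHandle src, uint32_t numDescriptors, uint32_t incrementSize) {
    for (uint32_t i = 0; i < numDescriptors; ++i) {
        dst[i] = OffsetHandle(src, static_cast<int>(i), incrementSize);
    }
}
```

Staging three descriptors from a base address of 0x1000 with a 32-byte increment writes the handles 0x1000, 0x1020, and 0x1040 into consecutive cache slots.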
```cpp
    // Set the root parameter index bit to make sure the descriptor table
    // at that index is bound to the command list.
    m_StaleDescriptorTableBitMask |= (1 << rootParameterIndex);
}
```
To ensure the staged descriptors are committed to the GPU visible descriptor heap when the CommitStagedDescriptors method is invoked, the bit corresponding to the root parameter index is set in the m_StaleDescriptorTableBitMask variable.
DynamicDescriptorHeap::ComputeStaleDescriptorCount
The ComputeStaleDescriptorCount method is used to determine the number of descriptors that need to be committed to the GPU visible descriptor heap.
```cpp
uint32_t DynamicDescriptorHeap::ComputeStaleDescriptorCount() const
{
    uint32_t numStaleDescriptors = 0;
    DWORD i;
    DWORD staleDescriptorsBitMask = m_StaleDescriptorTableBitMask;

    while ( _BitScanForward( &i, staleDescriptorsBitMask ) )
    {
        numStaleDescriptors += m_DescriptorTableCache[i].NumDescriptors;
        staleDescriptorsBitMask ^= ( 1 << i );
    }

    return numStaleDescriptors;
}
```
The ComputeStaleDescriptorCount method is fairly simple. It sums the number of descriptors of every descriptor table whose corresponding bit is set in m_StaleDescriptorTableBitMask.
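Without _BitScanForward, the same sum can be computed by shifting the mask one bit per iteration. This portable variant is an alternative technique rather than the article's code, and its signature is hypothetical: it takes the table sizes as a plain array instead of reading m_DescriptorTableCache.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMaxDescriptorTables = 32;

// Sums the sizes of all descriptor tables whose stale bit is set.
// The loop stops as soon as no stale bits remain, so i never exceeds 31.
uint32_t ComputeStaleDescriptorCount(const uint32_t (&tableSizes)[kMaxDescriptorTables],
                                     uint32_t staleMask) {
    uint32_t numStaleDescriptors = 0;
    for (uint32_t i = 0; staleMask != 0; ++i, staleMask >>= 1) {
        if (staleMask & 1u) {
            numStaleDescriptors += tableSizes[i];
        }
    }
    return numStaleDescriptors;
}
```

With the example table sizes {0, 6, 3, 3, 0}, a stale mask of 0b00100 yields 3 and a stale mask of 0b01110 yields 12.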
DynamicDescriptorHeap::RequestDescriptorHeap
The RequestDescriptorHeap method retrieves a descriptor heap from the queue of available descriptor heaps. If there are no descriptor heaps available, a new one is created.
```cpp
Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> DynamicDescriptorHeap::RequestDescriptorHeap()
{
    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> descriptorHeap;
    if (!m_AvailableDescriptorHeaps.empty())
    {
        descriptorHeap = m_AvailableDescriptorHeaps.front();
        m_AvailableDescriptorHeaps.pop();
    }
    else
    {
        descriptorHeap = CreateDescriptorHeap();
        m_DescriptorHeapPool.push(descriptorHeap);
    }

    return descriptorHeap;
}
```
If the m_AvailableDescriptorHeaps queue is not empty, the heap at the front of the queue is removed and returned. If the queue is empty, a new descriptor heap is created using the CreateDescriptorHeap method and added to the m_DescriptorHeapPool.
DynamicDescriptorHeap::CreateDescriptorHeap
If the m_AvailableDescriptorHeaps queue is empty, a new descriptor heap is created using the CreateDescriptorHeap method.
```cpp
Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> DynamicDescriptorHeap::CreateDescriptorHeap()
{
    auto device = Application::Get().GetDevice();

    D3D12_DESCRIPTOR_HEAP_DESC descriptorHeapDesc = {};
    descriptorHeapDesc.Type = m_DescriptorHeapType;
    descriptorHeapDesc.NumDescriptors = m_NumDescriptorsPerHeap;
    descriptorHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

    Microsoft::WRL::ComPtr<ID3D12DescriptorHeap> descriptorHeap;
    ThrowIfFailed(device->CreateDescriptorHeap(&descriptorHeapDesc, IID_PPV_ARGS(&descriptorHeap)));

    return descriptorHeap;
}
```
Descriptor heap creation is described in detail in the first lesson in this series. What is interesting to note here is that the descriptor heap is created with the D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE flag, which allows the heap to be bound to the command list so that its descriptors can be used to access resources in HLSL shaders.
DynamicDescriptorHeap::CommitStagedDescriptors
Arguably the most interesting (and most complex) method of the DynamicDescriptorHeap class is the CommitStagedDescriptors method. This method copies the staged descriptors in the descriptor table cache to the GPU visible descriptor heap and binds the descriptors to the command list using the appropriate method.
```cpp
void DynamicDescriptorHeap::CommitStagedDescriptors(CommandList& commandList, std::function<void(ID3D12GraphicsCommandList*, UINT, D3D12_GPU_DESCRIPTOR_HANDLE)> setFunc)
{
    // Compute the number of descriptors that need to be copied
    uint32_t numDescriptorsToCommit = ComputeStaleDescriptorCount();
```
The CommitStagedDescriptors method takes two parameters: the command list used to bind the descriptors, and a setter function that is either ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable or ID3D12GraphicsCommandList::SetComputeRootDescriptorTable, depending on the command being executed on the command list.
The DynamicDescriptorHeap::CommitStagedDescriptors method should not be called directly. The DynamicDescriptorHeap::CommitStagedDescriptorsForDraw and DynamicDescriptorHeap::CommitStagedDescriptorsForDispatch methods should be used instead.
The number of descriptors that need to be committed is computed using the ComputeStaleDescriptorCount method described earlier.
```cpp
    if ( numDescriptorsToCommit > 0 )
    {
        auto device = Application::Get().GetDevice();
        auto d3d12GraphicsCommandList = commandList.GetGraphicsCommandList().Get();
        assert(d3d12GraphicsCommandList != nullptr);
```
If there are no descriptors to commit, the CommitStagedDescriptors method does nothing. Otherwise, the ID3D12Device is retrieved from the Application class, a pointer to the ID3D12GraphicsCommandList is retrieved from the CommandList, and that pointer is checked to make sure it is not null.
```cpp
        if ( !m_CurrentDescriptorHeap || m_NumFreeHandles < numDescriptorsToCommit )
        {
            m_CurrentDescriptorHeap = RequestDescriptorHeap();
            m_CurrentCPUDescriptorHandle = m_CurrentDescriptorHeap->GetCPUDescriptorHandleForHeapStart();
            m_CurrentGPUDescriptorHandle = m_CurrentDescriptorHeap->GetGPUDescriptorHandleForHeapStart();
            m_NumFreeHandles = m_NumDescriptorsPerHeap;

            commandList.SetDescriptorHeap(m_DescriptorHeapType, m_CurrentDescriptorHeap.Get());

            // When updating the descriptor heap on the command list, all descriptor
            // tables must be (re)copied to the new descriptor heap (not just
            // the stale descriptor tables).
            m_StaleDescriptorTableBitMask = m_DescriptorTableBitMask;
        }
```
If the m_CurrentDescriptorHeap is null (which is the case when the DynamicDescriptorHeap is first created or after it has been reset) or there are not enough free handles to commit the staged descriptors, a new heap is retrieved using the RequestDescriptorHeap method.
The current CPU and GPU descriptor handles are then set to the first descriptors in the new heap, and the number of free handles is reset to the total number of descriptors in the descriptor heap.
The CommandList::SetDescriptorHeap method is used to ensure the command list has the new descriptor heap bound.
When changing descriptor heaps, it is necessary to copy all of the staged descriptors to the new descriptor heap (not just the ones that have been updated since the last time the descriptors were committed). Resetting the m_StaleDescriptorTableBitMask variable to the value of m_DescriptorTableBitMask ensures that all of the staged descriptors are copied to the new descriptor heap.
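The decision made at the start of the commit can be isolated in a small model. In this sketch, `CommitState` and `PrepareHeap` are hypothetical, and a boolean stands in for m_CurrentDescriptorHeap. It shows the two cases: with enough free handles, the current heap and its stale mask are kept; with too few free handles, a new heap is bound and every descriptor table is marked stale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the state touched at the start of CommitStagedDescriptors.
struct CommitState {
    bool hasHeap = false;        // stands in for m_CurrentDescriptorHeap != nullptr
    uint32_t numFreeHandles = 0; // m_NumFreeHandles
    uint32_t staleMask = 0;      // m_StaleDescriptorTableBitMask
    int heapsBound = 0;          // how many times a new heap was bound
};

// Mirrors the "request a new heap if needed" check.
void PrepareHeap(CommitState& s, uint32_t numDescriptorsToCommit,
                 uint32_t numDescriptorsPerHeap, uint32_t descriptorTableMask) {
    if (!s.hasHeap || s.numFreeHandles < numDescriptorsToCommit) {
        s.hasHeap = true;                         // RequestDescriptorHeap + SetDescriptorHeap
        s.numFreeHandles = numDescriptorsPerHeap; // a fresh heap is completely free
        s.staleMask = descriptorTableMask;        // every table must be copied again
        ++s.heapsBound;
    }
}
```

For the example root signature (table mask 0b01110), a nearly full heap forces a new heap and a full re-copy even if only one table changed, whereas a heap with room left copies only the stale table.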
```cpp
        DWORD rootIndex;
        // Scan from LSB to MSB for a bit set in m_StaleDescriptorTableBitMask
        while ( _BitScanForward( &rootIndex, m_StaleDescriptorTableBitMask ) )
        {
            UINT numSrcDescriptors = m_DescriptorTableCache[rootIndex].NumDescriptors;
            D3D12_CPU_DESCRIPTOR_HANDLE* pSrcDescriptorHandles = m_DescriptorTableCache[rootIndex].BaseDescriptor;
```
The _BitScanForward intrinsic is used to iterate over the stale descriptor tables that need to be committed to the GPU visible descriptor heap.
For each stale table, the number of descriptors and the pointer to the table's CPU visible descriptor handles are retrieved from the descriptor table cache.
```cpp
            D3D12_CPU_DESCRIPTOR_HANDLE pDestDescriptorRangeStarts[] =
            {
                m_CurrentCPUDescriptorHandle
            };
            UINT pDestDescriptorRangeSizes[] =
            {
                numSrcDescriptors
            };
```
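Before the descriptors are copied to the GPU visible descriptor heap, it is necessary to configure an array that contains the destination descriptor range starts and an array that contains the destination descriptor range sizes. Here, both arrays contain a single element: one destination range that starts at the current CPU descriptor handle of the GPU visible heap and spans numSrcDescriptors descriptors.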
```cpp
            // Copy the staged CPU visible descriptors to the GPU visible descriptor heap.
            device->CopyDescriptors(1, pDestDescriptorRangeStarts, pDestDescriptorRangeSizes,
                numSrcDescriptors, pSrcDescriptorHandles, nullptr, m_DescriptorHeapType);
```
The staged CPU visible descriptors are copied to the GPU visible descriptor heap using the ID3D12Device::CopyDescriptors method. This method has the following signature [4]:
void CopyDescriptors(
  UINT                              NumDestDescriptorRanges,
  const D3D12_CPU_DESCRIPTOR_HANDLE *pDestDescriptorRangeStarts,
  const UINT                        *pDestDescriptorRangeSizes,
  UINT                              NumSrcDescriptorRanges,
  const D3D12_CPU_DESCRIPTOR_HANDLE *pSrcDescriptorRangeStarts,
  const UINT                        *pSrcDescriptorRangeSizes,
  D3D12_DESCRIPTOR_HEAP_TYPE        DescriptorHeapsType
);
It takes the following arguments:
- UINT NumDestDescriptorRanges: The number of destination descriptor ranges to copy to. In this case, there is only 1 destination descriptor range.
- const D3D12_CPU_DESCRIPTOR_HANDLE *pDestDescriptorRangeStarts: An array of D3D12_CPU_DESCRIPTOR_HANDLEs to copy to.
- const UINT *pDestDescriptorRangeSizes: An array of destination descriptor range sizes to copy to.
- UINT NumSrcDescriptorRanges: The number of source descriptor ranges to copy from. Since there is no requirement that the source descriptors appear contiguously in the same CPU visible descriptor heap (or even that they come from the same descriptor heap), the number of source ranges is equal to the number of descriptors to copy. That is, the size of each source descriptor range is 1.
- const D3D12_CPU_DESCRIPTOR_HANDLE *pSrcDescriptorRangeStarts: An array of D3D12_CPU_DESCRIPTOR_HANDLEs to copy from.
- const UINT *pSrcDescriptorRangeSizes: An array of source descriptor range sizes to copy from. This parameter is optional; if it is null, then the size of each source descriptor range is considered to be 1 and the descriptors are copied one at a time. Since the source descriptors do not appear in a consecutive range in the source descriptor heaps, this behaviour is exactly what is required.
- D3D12_DESCRIPTOR_HEAP_TYPE DescriptorHeapsType: Specifies the type of descriptor heap to copy with. Both the source and the destination descriptors must belong to descriptor heaps of this type.
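To make the range semantics concrete, here is a toy model in standard C++. It is not the D3D12 API: a "descriptor" is an int, a "handle" is an index into one flat array, and CopyDescriptorsModel is a hypothetical function. As described above, a null source-size array means that every source range has size 1, which is what lets scattered source descriptors be gathered into one contiguous destination range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy "handle": an index into one flat array of descriptors (an assumption of this model).
using ModelHandle = std::size_t;

// Mirrors the range semantics of ID3D12Device::CopyDescriptors on a flat array.
// Destination range sizes are required; a null srcSizes means each source range has size 1.
void CopyDescriptorsModel(std::vector<int>& descriptors,
                          unsigned numDestRanges, const ModelHandle* destStarts, const unsigned* destSizes,
                          unsigned numSrcRanges, const ModelHandle* srcStarts, const unsigned* srcSizes)
{
    // Expand the source ranges into individual source slots.
    std::vector<ModelHandle> src;
    for (unsigned r = 0; r < numSrcRanges; ++r)
    {
        const unsigned size = srcSizes ? srcSizes[r] : 1u;
        for (unsigned i = 0; i < size; ++i)
            src.push_back(srcStarts[r] + i);
    }

    // Expand the destination ranges the same way.
    std::vector<ModelHandle> dest;
    for (unsigned r = 0; r < numDestRanges; ++r)
    {
        for (unsigned i = 0; i < destSizes[r]; ++i)
            dest.push_back(destStarts[r] + i);
    }

    // Both sides must describe the same total number of descriptors.
    assert(src.size() == dest.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        descriptors[dest[i]] = descriptors[src[i]];
}
```

This mirrors the call in CommitStagedDescriptors: one destination range of numSrcDescriptors descriptors, numSrcDescriptors source ranges, and nullptr for the source range sizes.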
// Set the descriptors on the command list using the passed-in setter function.
setFunc(d3d12GraphicsCommandList, rootIndex, m_CurrentGPUDescriptorHandle);
Using the setter function passed to the CommitStagedDescriptors method, the GPU visible descriptors are set on the command list.
// Offset current CPU and GPU descriptor handles.
m_CurrentCPUDescriptorHandle.Offset(numSrcDescriptors, m_DescriptorHandleIncrementSize);
m_CurrentGPUDescriptorHandle.Offset(numSrcDescriptors, m_DescriptorHandleIncrementSize);
m_NumFreeHandles -= numSrcDescriptors;
The current CPU and GPU descriptor handles are then advanced by the number of descriptors that were copied, and the number of free handles in the current descriptor heap is decremented by the same amount.
        // Flip the stale bit so the descriptor table is not recopied again unless it is updated with a new descriptor.
        m_StaleDescriptorTableBitMask ^= (1 << rootIndex);
    }
  }
}
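To ensure the current descriptor table is not copied again (unless its descriptors are updated), the corresponding bit in the m_StaleDescriptorTableBitMask bitmask is flipped. Since _BitScanForward only reports indices of set bits, the XOR always clears the bit, which also guarantees that the while loop terminates.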
DynamicDescriptorHeap::CommitStagedDescriptorsForDraw
The CommitStagedDescriptorsForDraw method is a helper method that passes a pointer to the ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable member function to the CommitStagedDescriptors method, which uses it as the setter function.
void DynamicDescriptorHeap::CommitStagedDescriptorsForDraw(CommandList& commandList)
{
    CommitStagedDescriptors(commandList, &ID3D12GraphicsCommandList::SetGraphicsRootDescriptorTable);
}
DynamicDescriptorHeap::CommitStagedDescriptorsForDispatch
The CommitStagedDescriptorsForDispatch method is a helper method that passes a pointer to the ID3D12GraphicsCommandList::SetComputeRootDescriptorTable member function to the CommitStagedDescriptors method, which uses it as the setter function.
void DynamicDescriptorHeap::CommitStagedDescriptorsForDispatch(CommandList& commandList)
{
    CommitStagedDescriptors(commandList, &ID3D12GraphicsCommandList::SetComputeRootDescriptorTable);
}
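The setter function works because a pointer to a member function can be stored in a std::function whose first parameter is the object pointer. The sketch below assumes a setter type along the lines of std::function<void(ID3D12GraphicsCommandList*, UINT, D3D12_GPU_DESCRIPTOR_HANDLE)>, which is consistent with the setFunc(d3d12GraphicsCommandList, rootIndex, ...) call above but is not shown in this section. MockCommandList, CommitStaged, the integer handle type, and the increment size are hypothetical stand-ins so that the example compiles without D3D12.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for ID3D12GraphicsCommandList that records each call.
struct MockCommandList
{
    std::vector<std::string> log;

    void SetGraphicsRootDescriptorTable(uint32_t rootIndex, uint64_t gpuHandle)
    {
        log.push_back("graphics:" + std::to_string(rootIndex) + ":" + std::to_string(gpuHandle));
    }
    void SetComputeRootDescriptorTable(uint32_t rootIndex, uint64_t gpuHandle)
    {
        log.push_back("compute:" + std::to_string(rootIndex) + ":" + std::to_string(gpuHandle));
    }
};

// Assumed setter type: the command list pointer comes first, like setFunc(d3d12GraphicsCommandList, ...).
using SetterFunc = std::function<void(MockCommandList*, uint32_t, uint64_t)>;

// Simplified CommitStagedDescriptors: each listed root index gets one table,
// and the (integer) GPU handle advances by an assumed increment size per table.
void CommitStaged(MockCommandList& commandList, const std::vector<uint32_t>& rootIndices,
                  uint64_t gpuHandle, const SetterFunc& setFunc)
{
    const uint64_t incrementSize = 32; // assumption for the example
    for (uint32_t rootIndex : rootIndices)
    {
        setFunc(&commandList, rootIndex, gpuHandle);
        gpuHandle += incrementSize;
    }
}

// The two helpers only differ in which member function they pass along.
void CommitForDraw(MockCommandList& commandList, const std::vector<uint32_t>& rootIndices, uint64_t gpuHandle)
{
    CommitStaged(commandList, rootIndices, gpuHandle, &MockCommandList::SetGraphicsRootDescriptorTable);
}

void CommitForDispatch(MockCommandList& commandList, const std::vector<uint32_t>& rootIndices, uint64_t gpuHandle)
{
    CommitStaged(commandList, rootIndices, gpuHandle, &MockCommandList::SetComputeRootDescriptorTable);
}
```

With this design, the copy logic lives in one place, and the graphics and compute variants cost one line each.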
DynamicDescriptorHeap::CopyDescriptor
The CopyDescriptor method is used to copy a single CPU visible descriptor to a GPU visible descriptor heap.
D3D12_GPU_DESCRIPTOR_HANDLE DynamicDescriptorHeap::CopyDescriptor(CommandList& commandList, D3D12_CPU_DESCRIPTOR_HANDLE cpuDescriptor)
{
    if (!m_CurrentDescriptorHeap || m_NumFreeHandles < 1)
    {
        m_CurrentDescriptorHeap = RequestDescriptorHeap();
        m_CurrentCPUDescriptorHandle = m_CurrentDescriptorHeap->GetCPUDescriptorHandleForHeapStart();
        m_CurrentGPUDescriptorHandle = m_CurrentDescriptorHeap->GetGPUDescriptorHandleForHeapStart();
        m_NumFreeHandles = m_NumDescriptorsPerHeap;

        commandList.SetDescriptorHeap(m_DescriptorHeapType, m_CurrentDescriptorHeap.Get());

        // When updating the descriptor heap on the command list, all descriptor
        // tables must be (re)copied to the new descriptor heap (not just
        // the stale descriptor tables).
        m_StaleDescriptorTableBitMask = m_DescriptorTableBitMask;
    }
Similar to the CommitStagedDescriptors method, there must be at least one descriptor available in the currently bound descriptor heap. If the current descriptor heap is not valid or there are no free descriptors in the descriptor heap, a new descriptor heap is requested. If the current descriptor heap changes, then the new descriptor heap must be bound to the command list. It is also important to reset the m_StaleDescriptorTableBitMask to ensure that all descriptors are copied to the new GPU visible descriptor heap before a draw or dispatch command is executed on the command list.
    auto device = Application::Get().GetDevice();

    D3D12_GPU_DESCRIPTOR_HANDLE hGPU = m_CurrentGPUDescriptorHandle;
    device->CopyDescriptorsSimple(1, m_CurrentCPUDescriptorHandle, cpuDescriptor, m_DescriptorHeapType);

    m_CurrentCPUDescriptorHandle.Offset(1, m_DescriptorHandleIncrementSize);
    m_CurrentGPUDescriptorHandle.Offset(1, m_DescriptorHandleIncrementSize);
    m_NumFreeHandles -= 1;

    return hGPU;
}
Since only a single descriptor is being copied from the source descriptor to the destination descriptor, the ID3D12Device::CopyDescriptorsSimple method is used. This method has the following signature [5]:
void CopyDescriptorsSimple(
  UINT                        NumDescriptors,
  D3D12_CPU_DESCRIPTOR_HANDLE DestDescriptorRangeStart,
  D3D12_CPU_DESCRIPTOR_HANDLE SrcDescriptorRangeStart,
  D3D12_DESCRIPTOR_HEAP_TYPE  DescriptorHeapsType
);
It takes the following parameters:
- UINT NumDescriptors: The number of descriptors to copy. Both the source and the destination descriptors are considered to be consecutively ordered in their descriptor heaps.
- D3D12_CPU_DESCRIPTOR_HANDLE DestDescriptorRangeStart: A D3D12_CPU_DESCRIPTOR_HANDLE that describes the destination descriptors to start to copy to.
- D3D12_CPU_DESCRIPTOR_HANDLE SrcDescriptorRangeStart: A D3D12_CPU_DESCRIPTOR_HANDLE that describes the source descriptors to start to copy from.
- D3D12_DESCRIPTOR_HEAP_TYPE DescriptorHeapsType: Specifies the type of descriptor heap to copy with.
After copying the descriptor to the GPU visible descriptor heap, the current CPU and GPU handles are incremented, the number of free handles is decremented, and the GPU descriptor handle of the copied descriptor is returned.
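The handle bookkeeping in CopyDescriptor amounts to a linear (bump) allocator over the current heap. The sketch below models only that bookkeeping in standard C++. GPU handles are plain integers, each newly requested heap is assumed to start at a distinct base address, and the class name is hypothetical. Rebinding the heap and resetting the stale mask are left as comments, and the CPU handle (advanced the same way in the real code) is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the handle bookkeeping in DynamicDescriptorHeap::CopyDescriptor.
// GPU handles are plain integers; each requested heap gets an assumed base address.
class LinearDescriptorHeapModel
{
public:
    LinearDescriptorHeapModel(uint32_t descriptorsPerHeap, uint32_t incrementSize)
        : m_DescriptorsPerHeap(descriptorsPerHeap), m_IncrementSize(incrementSize)
    {
        assert(descriptorsPerHeap > 0);
    }

    // "Copies" one descriptor and returns the GPU handle of the slot it landed in.
    uint64_t CopyDescriptor()
    {
        if (!m_HasHeap || m_NumFreeHandles < 1)
        {
            // Stand-in for RequestDescriptorHeap(). The real code also binds the heap
            // on the command list and marks every descriptor table as stale.
            m_CurrentGPUHandle = uint64_t(m_NumHeapsRequested) * 0x10000; // assumed base
            ++m_NumHeapsRequested;
            m_NumFreeHandles = m_DescriptorsPerHeap;
            m_HasHeap = true;
        }

        const uint64_t hGPU = m_CurrentGPUHandle; // slot used by this copy
        m_CurrentGPUHandle += m_IncrementSize;    // like Offset(1, m_DescriptorHandleIncrementSize)
        m_NumFreeHandles -= 1;
        return hGPU;
    }

    uint32_t NumHeapsRequested() const { return m_NumHeapsRequested; }

private:
    uint32_t m_DescriptorsPerHeap;
    uint32_t m_IncrementSize;
    bool     m_HasHeap = false;
    uint64_t m_CurrentGPUHandle = 0;
    uint32_t m_NumFreeHandles = 0;
    uint32_t m_NumHeapsRequested = 0;
};
```

With two descriptors per heap and an increment size of 32, the first two copies land at 0 and 32 in the first heap, and the third copy triggers a new heap.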
DynamicDescriptorHeap::Reset
The Reset method is called on the DynamicDescriptorHeap class when the commands that reference any descriptor in the DynamicDescriptorHeap have finished executing on the GPU. When the DynamicDescriptorHeap is reset, all of the descriptor heaps are made available again and the descriptor table cache is reset.
void DynamicDescriptorHeap::Reset()
{
    m_AvailableDescriptorHeaps = m_DescriptorHeapPool;
    m_CurrentDescriptorHeap.Reset();
    m_CurrentCPUDescriptorHandle = CD3DX12_CPU_DESCRIPTOR_HANDLE(D3D12_DEFAULT);
    m_CurrentGPUDescriptorHandle = CD3DX12_GPU_DESCRIPTOR_HANDLE(D3D12_DEFAULT);
    m_NumFreeHandles = 0;
    m_DescriptorTableBitMask = 0;
    m_StaleDescriptorTableBitMask = 0;

    // Reset the table cache
    for (int i = 0; i < MaxDescriptorTables; ++i)
    {
        m_DescriptorTableCache[i].Reset();
    }
}
First, the m_DescriptorHeapPool queue (which contains all of the descriptor heaps created by the DynamicDescriptorHeap class) is copied to the m_AvailableDescriptorHeaps queue, effectively making all of the descriptor heaps available again and ready for new allocations.
Next, the ComPtr for the current descriptor heap is reset. This ensures that a descriptor heap is requested from the available descriptor heaps the next time descriptors are copied (using either the CommitStagedDescriptors method or the CopyDescriptor method).
The descriptor handles, the number of free descriptors, and the descriptor table bit masks are then all reset.
Finally, the descriptor table cache is reset (removing all descriptor table entries from the descriptor table cache). Before any new descriptors can be staged to the DynamicDescriptorHeap, a root signature must be parsed using the ParseRootSignature method.
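RequestDescriptorHeap is not shown in this section. The sketch below assumes that it takes a heap from m_AvailableDescriptorHeaps when one is available and otherwise creates a new heap and records it in m_DescriptorHeapPool. Under that assumption, copying the pool queue in Reset is all that is needed to recycle every heap. FakeHeap and HeapPoolModel are hypothetical stand-ins, with std::shared_ptr playing the role of ComPtr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>

// Hypothetical stand-in for ID3D12DescriptorHeap; std::shared_ptr plays the role of ComPtr.
struct FakeHeap { int id; };
using HeapPtr = std::shared_ptr<FakeHeap>;

class HeapPoolModel
{
public:
    // Assumed behaviour of RequestDescriptorHeap: reuse an available heap if there is
    // one, otherwise create a new heap and remember it in the pool.
    HeapPtr RequestDescriptorHeap()
    {
        HeapPtr heap;
        if (!m_AvailableDescriptorHeaps.empty())
        {
            heap = m_AvailableDescriptorHeaps.front();
            m_AvailableDescriptorHeaps.pop();
        }
        else
        {
            heap = std::make_shared<FakeHeap>(FakeHeap{ m_NextId++ });
            m_DescriptorHeapPool.push(heap);
        }
        return heap;
    }

    // Call only after the GPU has finished with every command that used these heaps.
    void Reset()
    {
        m_AvailableDescriptorHeaps = m_DescriptorHeapPool; // copy: every heap is available again
    }

    std::size_t PoolSize() const { return m_DescriptorHeapPool.size(); }

private:
    std::queue<HeapPtr> m_DescriptorHeapPool;       // every heap ever created
    std::queue<HeapPtr> m_AvailableDescriptorHeaps; // heaps free for reuse
    int m_NextId = 0;
};
```

After a reset, heaps are handed out again in creation order, and a new heap is created only once all pooled heaps are in use.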
This concludes the description of the DynamicDescriptorHeap class. In the next section, the ResourceStateTracker class is described. The ResourceStateTracker class is used to track state transitions for (sub)resources.
Resource State Tracking
In previous versions of DirectX, the state of a resource was automatically tracked by the graphics driver. Starting with DirectX 12, it is the responsibility of the graphics programmer to transition resources to the correct state before using them on the command list.
Certain operations can only be performed on a resource if the resource is in the correct state. For example, before a resource can be read from in a pixel shader, it must be in the D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE state, and before a resource can be written to in a compute shader, it must be in the D3D12_RESOURCE_STATE_UNORDERED_ACCESS state. In a single-threaded environment, keeping track of the state of a resource is trivial since all operations on a resource generally occur in linear order. In a multi-threaded environment, however, a resource may need to be in the D3D12_RESOURCE_STATE_DEPTH_WRITE state on a command list that is being built on one thread while it needs to be in the D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE state on a command list that is being built on another thread at the same time (this is common when performing shadow mapping, for example).
When transitioning a resource to another state, both the before and after states must be known. This becomes even more complicated because each subresource of a resource can be in a different state. For example, when generating mipmaps in a compute shader, the first subresource of a texture should be in the D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE state so that it can be read from in a compute shader, and the other subresources should be in the D3D12_RESOURCE_STATE_UNORDERED_ACCESS state so that they can be written to in a compute shader.
Tracking the state of a resource (and all of its subresources) across multiple command lists and multiple threads can be tedious and error-prone. The purpose of the ResourceStateTracker class is to track the state of a resource within a command list and to ensure correct resource state transitions even when the resource is being used in different states on different threads.
The ResourceStateTracker class tracks the state of a resource within a command list. The class shown here is intended to be used by the custom CommandList class that is described later in this lesson. It is not intended to be used outside of the CommandList class, and certain assumptions have been made in the design of the ResourceStateTracker class. For example, while it is possible to build different command lists on different threads, a single command list will not be shared across multiple threads (command list building is a single-threaded operation). This allows for some simplifying assumptions, such as that the state of a resource will not be modified by multiple threads within the same command list.
The design of the ResourceStateTracker class described here is influenced by “the inimitable” Sebastian Merry in his YouTube video about resource barriers and resource state tracking, which can be seen here: https://youtu.be/nmB2XMasz2o.
When a resource state transition barrier is submitted to the ResourceStateTracker, it first checks whether the resource has been used on the current command list before. If the resource has not been used on the command list yet, the transition barrier is added to a list of pending barriers (which are not directly added to the command list) and the after state of the resource is recorded as the “known state” of that resource. The next time a transition barrier is submitted to the ResourceStateTracker for the same resource, the known state of the resource is used as the before state of the transition, and the barrier is added to the command list.
When the command list is submitted to the command queue for execution, the pending barriers are compared against the global state of each resource. If the global state of a resource differs from the state required by its pending barrier, then the pending barrier is added to another command list that is inserted into the command queue before the command list that is being executed.
The image above depicts two command lists (A and B) being built on separate threads. Both command lists access the same resource, but each command list requires the resource to be in a different state. In this case, Command List B does not know which state Command List A left the resource in. To ensure the resource is transitioned to the state required by Command List B, an intermediate Command List (C) is injected into the command queue between A and B.
The singular purpose of the intermediate Command List (C) is to ensure that any resources used by Command List B are in the correct state before executing the command list on the command queue.
To implement this strategy, several data structures are required:
- Pending Resource Transition Barriers Array: If, during command list building, a resource is being bound on a command list for the first time, its previous state is unknown and a transition barrier to transition the resource into the expected state is added to the Pending Resource Transition Barriers Array. Pending resource transition barriers should not be confused with split barriers. Split barriers are not used by the ResourceStateTracker class.
- Final Resource State Map: After a resource has been used on the command list at least once, its final known state is added to the Final Resource State Map, indexed by a pointer to the resource.
- Resource Transition Barriers Array: If the state of a resource is known (it has an entry in the Final Resource State Map), then any resource state transition is added to the Resource Transition Barriers Array. These barriers are added directly to the command list before a draw or dispatch command is executed (or any other command that requires transition barriers to be committed to the command list).
- Global Resource State Map: A command list may contain any number of state transitions for a resource. When a command list is executed on the command queue, the last known state of each resource is committed to the Global Resource State Map. The Global Resource State Map is used to determine whether or not a Pending Resource Transition Barrier is added to the intermediate command list.
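The interplay of these four data structures can be sketched in a few dozen lines of standard C++. This is a simplified model, not the ResourceStateTracker implementation: it tracks whole resources only (no subresources), identifies resources by int IDs, uses a hypothetical State enum instead of D3D12_RESOURCE_STATES, and returns the resolved pending barriers instead of recording them on an intermediate command list. A std::lock_guard on the static mutex stands in for the Lock and Unlock calls.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for D3D12_RESOURCE_STATES and a transition barrier.
enum class State { Common, DepthWrite, PixelShaderResource, RenderTarget };
struct Barrier { int resource; State before; State after; };

class TrackerModel
{
public:
    // Record a transition; the before state is resolved from the final state map if known.
    void TransitionResource(int resource, State after)
    {
        const auto iter = m_FinalState.find(resource);
        if (iter != m_FinalState.end())
        {
            if (iter->second != after)
                m_Barriers.push_back({ resource, iter->second, after });
        }
        else
        {
            // First use on this command list: only the after state is known.
            m_PendingAfter.push_back({ resource, after });
        }
        m_FinalState[resource] = after;
    }

    // Resolve pending barriers against the global state (caller holds ms_GlobalMutex).
    // The returned barriers are what the intermediate command list would record.
    std::vector<Barrier> FlushPendingResourceBarriers()
    {
        std::vector<Barrier> resolved;
        for (const auto& pending : m_PendingAfter)
        {
            const auto iter = ms_GlobalState.find(pending.first);
            if (iter != ms_GlobalState.end() && iter->second != pending.second)
                resolved.push_back({ pending.first, iter->second, pending.second });
        }
        m_PendingAfter.clear();
        return resolved;
    }

    // Publish the last known states to the global map (caller holds ms_GlobalMutex).
    void CommitFinalResourceStates()
    {
        for (const auto& entry : m_FinalState)
            ms_GlobalState[entry.first] = entry.second;
        m_FinalState.clear();
    }

    const std::vector<Barrier>& Barriers() const { return m_Barriers; }

    static std::mutex ms_GlobalMutex;
    static std::unordered_map<int, State> ms_GlobalState;

private:
    std::vector<std::pair<int, State>> m_PendingAfter; // resource, required state
    std::vector<Barrier> m_Barriers;                   // barriers with a known before state
    std::unordered_map<int, State> m_FinalState;       // last known state per resource
};

std::mutex TrackerModel::ms_GlobalMutex;
std::unordered_map<int, State> TrackerModel::ms_GlobalState;
```

In the scenario from the image, if A leaves resource 7 in DepthWrite and B first needs it as a pixel shader resource, flushing B's pending barriers yields a DepthWrite to PixelShaderResource barrier. That barrier is exactly what intermediate Command List C has to record.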
ResourceStateTracker Class
The interface of the ResourceStateTracker class is fairly simple. It provides a method to add a resource barrier to the state tracker and several helper methods that allow specific barrier types (transition, UAV, or alias) to be added to the resource state tracker. The ResourceStateTracker class also provides a method to flush any pending resource barriers. Flushing pending resource barriers only needs to be done when the command list is executed on the command queue, and it is handled automatically by the CommandList class, which is described later in this lesson.
The ResourceStateTracker class also provides a method to flush non-pending resource barriers to the command list. A non-pending resource barrier is any UAV, aliasing, or transition barrier where the before state of the resource is already known.
Finally, the ResourceStateTracker class exposes a few methods to commit the final resource states to the global resource state map and to register and unregister resources with the global resource state map (when a resource is created, it is added to the global resource state map, and when it is destroyed, it is removed from the global resource state map).
ResourceStateTracker Header
The ResourceStateTracker class depends on a few external headers, which are included first.
#include <d3d12.h>

#include <mutex>
#include <map>
#include <unordered_map>
#include <vector>

class CommandList;
class Resource;
The d3d12.h header file is included for the D3D12_RESOURCE_STATES, D3D12_RESOURCE_BARRIER, and ID3D12Resource types.
The mutex header file is required for the std::mutex type, which is used to synchronize access to the global resource state map.
The map and unordered_map header files are required for the std::map type, which is used to store the states of individual subresources, and the std::unordered_map type, which is used for the final and global resource state maps.
The vector header file is required for the std::vector type that is used for the pending and non-pending resource barriers.
Since the CommandList and Resource types are only used as pointer or reference types in the header, it is sufficient to forward-declare these types in the header file for the ResourceStateTracker class.
class ResourceStateTracker
{
public:
    ResourceStateTracker();
    virtual ~ResourceStateTracker();
The declaration of the ResourceStateTracker class starts with the constructor and destructor.
    /**
     * Push a resource barrier to the resource state tracker.
     *
     * @param barrier The resource barrier to push to the resource state tracker.
     */
    void ResourceBarrier( const D3D12_RESOURCE_BARRIER& barrier );
The ResourceBarrier method is used to push any type of resource barrier (transition, UAV, or alias) to the ResourceStateTracker. The ResourceStateTracker also has a few methods to push specific barrier types.
    /**
     * Push a transition resource barrier to the resource state tracker.
     *
     * @param resource The resource to transition.
     * @param stateAfter The state to transition the resource to.
     * @param subResource The subresource to transition. By default, this is D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES
     * which indicates that all subresources should be transitioned to the same state.
     */
    void TransitionResource( ID3D12Resource* resource, D3D12_RESOURCE_STATES stateAfter, UINT subResource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES );
    void TransitionResource( const Resource& resource, D3D12_RESOURCE_STATES stateAfter, UINT subResource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES );
The TransitionResource methods are used to forward a transition resource barrier to the ResourceBarrier method. By default, all subresources of a resource are transitioned to the same state. Notice that the TransitionResource method does not need to know the before state of the transition barrier. This is because the ResourceStateTracker is able to resolve the before state of the (sub)resource before the resource barrier is added to the command list.
    /**
     * Push a UAV resource barrier for the given resource.
     *
     * @param resource The resource to add a UAV barrier for. Can be NULL which
     * indicates that any UAV access could require the barrier.
     */
    void UAVBarrier( const Resource* resource = nullptr );
The UAVBarrier method is used to add a UAV barrier to the resource state tracker. UAV barriers are briefly discussed in Lesson 1.
    /**
     * Push an aliasing barrier for the given resource.
     *
     * @param resourceBefore The resource currently occupying the space in the heap.
     * @param resourceAfter The resource that will be occupying the space in the heap.
     *
     * Either the resourceBefore or the resourceAfter parameters can be NULL which
     * indicates that any placed or reserved resource could cause aliasing.
     */
    void AliasBarrier( const Resource* resourceBefore = nullptr, const Resource* resourceAfter = nullptr );
The AliasBarrier method is used to add an aliasing barrier to the resource state tracker. Neither UAV nor aliasing barriers need to track the state of a resource, so it is not strictly necessary to use the ResourceStateTracker class for those barrier types. The UAVBarrier and AliasBarrier methods are provided by the ResourceStateTracker class so that resource barriers are managed in a consistent way and so that resource barriers are batched to minimize the number of calls to the ID3D12GraphicsCommandList::ResourceBarrier method (which is recommended).
    /**
     * Flush any pending resource barriers to the command list.
     *
     * @return The number of resource barriers that were flushed to the command list.
     */
    uint32_t FlushPendingResourceBarriers( CommandList& commandList );
The FlushPendingResourceBarriers method is used to flush any pending resource barriers to the specified command list (the intermediate command list described earlier). This method is called just after the command list is closed and just before it is executed on the command queue.
    /**
     * Flush any (non-pending) resource barriers that have been pushed to the resource state
     * tracker.
     */
    void FlushResourceBarriers( CommandList& commandList );
The FlushResourceBarriers method is used to flush any non-pending resource barriers to the specified command list.
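The body of FlushResourceBarriers is not shown in this section. A plausible minimal sketch follows, assuming it submits all collected barriers with a single ResourceBarrier call and then clears the array, which is what batching implies. BarrierDesc, CountingCommandList, and this free function are hypothetical stand-ins for D3D12_RESOURCE_BARRIER, the command list, and the member method.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical barrier description; the real code stores D3D12_RESOURCE_BARRIERs.
struct BarrierDesc { int resource; };

// Hypothetical command list that only counts ResourceBarrier calls.
struct CountingCommandList
{
    std::size_t calls = 0;
    std::size_t barriersRecorded = 0;

    void ResourceBarrier(std::size_t numBarriers, const BarrierDesc* barriers)
    {
        assert(numBarriers > 0 && barriers != nullptr);
        ++calls;
        barriersRecorded += numBarriers;
    }
};

// Sketch of FlushResourceBarriers: submit the whole batch in one call, then clear it.
// An empty batch results in no call at all.
void FlushResourceBarriers(std::vector<BarrierDesc>& resourceBarriers, CountingCommandList& commandList)
{
    if (!resourceBarriers.empty())
    {
        commandList.ResourceBarrier(resourceBarriers.size(), resourceBarriers.data());
        resourceBarriers.clear();
    }
}
```

Three queued barriers reach the command list in one call, and flushing again right away does nothing.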
    /**
     * Commit final resource states to the global resource state map.
     * This must be called when the command list is closed.
     */
    void CommitFinalResourceStates();
The CommitFinalResourceStates method is used to update the global resource state map with the final known states of the resources.
    /**
     * Reset state tracking. This must be done when the command list is reset.
     */
    void Reset();
The Reset method is called on the ResourceStateTracker whenever the command list that owns the ResourceStateTracker is reset. This ensures that the pending barriers, the non-pending barriers, and the final resource states are all cleared.
    /**
     * The global state must be locked before flushing pending resource barriers
     * and committing the final resource state to the global resource state.
     * This ensures consistency of the global resource state between command list
     * executions.
     */
    static void Lock();

    /**
     * Unlocks the global resource state after the final states have been committed
     * to the global resource state array.
     */
    static void Unlock();
In order to ensure a consistent resource state across multiple threads, the global resource state map must be locked before pending resource barriers are flushed (since this requires reading from the global resource state map) and before the final resource states are committed to the global resource state map. The Lock method ensures that the current thread has exclusive ownership of the global resource state map, and the Unlock method releases ownership, allowing other threads to access the global resource state map. A std::mutex is used to ensure exclusive ownership of the global resource state map.
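The implementations of Lock and Unlock are not shown in this section. One way to write them is sketched below, assuming ms_IsLocked serves only as a debug check that the global map is accessed while the mutex is held. The class and method names are hypothetical, and an int-to-int map stands in for ResourceStateMap. The flag is set only after the mutex is acquired and cleared before it is released, so only the owning thread ever modifies it.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>

// Hypothetical sketch of static Lock/Unlock around a global state map.
// Resources and states are plain ints instead of ID3D12Resource* and D3D12_RESOURCE_STATES.
class GlobalStateModel
{
public:
    static void Lock()
    {
        ms_GlobalMutex.lock();
        ms_IsLocked = true;      // set only after ownership is acquired
    }

    static void Unlock()
    {
        ms_IsLocked = false;     // cleared while the mutex is still held
        ms_GlobalMutex.unlock();
    }

    // Both accessors expect the caller to be between Lock() and Unlock().
    static void Commit(int resource, int state)
    {
        assert(ms_IsLocked);
        ms_GlobalState[resource] = state;
    }

    static int Query(int resource)
    {
        assert(ms_IsLocked);
        return ms_GlobalState.at(resource);
    }

    // Debug helper; only meaningful in single-threaded checks like the one below.
    static bool IsLockedForDebug() { return ms_IsLocked; }

private:
    static std::mutex ms_GlobalMutex;
    static bool ms_IsLocked;
    static std::unordered_map<int, int> ms_GlobalState;
};

std::mutex GlobalStateModel::ms_GlobalMutex;
bool GlobalStateModel::ms_IsLocked = false;
std::unordered_map<int, int> GlobalStateModel::ms_GlobalState;
```

A caller brackets both the flush of pending barriers and the commit of final states with a single Lock/Unlock pair, so no other thread can observe the global map between the two steps.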
    /**
     * Add a resource with a given state to the global resource state array (map).
     * This should be done when the resource is created for the first time.
     */
    static void AddGlobalResourceState( ID3D12Resource* resource, D3D12_RESOURCE_STATES state );

    /**
     * Remove a resource from the global resource state array (map).
     * This should only be done when the resource is destroyed.
     */
    static void RemoveGlobalResourceState( ID3D12Resource* resource );
The AddGlobalResourceState method is used to register a resource and its initial state with the global resource state map. This is done whenever a new resource is created. Just before a resource is destroyed, the RemoveGlobalResourceState method can be used to remove the resource from the global resource state map.
When resources are managed through ComPtrs, it is often difficult (or impossible) to know when a resource is about to be released. For this reason, resources are stored as raw pointers in the global resource state map; their reference count is not increased, so the map won’t prevent a resource from being released. Although the Windows Runtime Library provides a WeakRef class, its usage seems unintuitive and did not add much value for this use case (since having dangling pointers in the global resource state map does not cause any problems).
private:
    // An array (vector) of resource barriers.
    using ResourceBarriers = std::vector<D3D12_RESOURCE_BARRIER>;
ResourceBarriers is a type alias for a std::vector of D3D12_RESOURCE_BARRIERs.
    // Pending resource transitions are committed before a command list
    // is executed on the command queue. This guarantees that resources will
    // be in the expected state at the beginning of a command list.
    ResourceBarriers m_PendingResourceBarriers;

    // Resource barriers that need to be committed to the command list.
    ResourceBarriers m_ResourceBarriers;
The m_PendingResourceBarriers member variable is used to store pending resource barriers, and the m_ResourceBarriers member variable is used to store non-pending resource barriers.
The ResourceState struct is an internal struct that is used to track the state of the subresources of a resource.
    // Tracks the state of a particular resource and all of its subresources.
    struct ResourceState
    {
        // Initialize all of the subresources within a resource to the given state.
        explicit ResourceState(D3D12_RESOURCE_STATES state = D3D12_RESOURCE_STATE_COMMON)
            : State(state)
        {}
The ResourceState struct is used by the ResourceStateTracker to track the state of a resource and all of its subresources. By default, a resource (and all of its subresources) is initialized to the D3D12_RESOURCE_STATE_COMMON state.
        // Set a subresource to a particular state.
        void SetSubresourceState(UINT subresource, D3D12_RESOURCE_STATES state)
        {
            if (subresource == D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES)
            {
                State = state;
                SubresourceState.clear();
            }
            else
            {
                SubresourceState[subresource] = state;
            }
        }
The ResourceState::SetSubresourceState method is used to set the state of a (sub)resource. If D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES is specified as the subresource index, then the state of the entire resource is updated and the SubresourceState map (defined later) is cleared. Otherwise, the subresource index is used to update the state of that subresource within the SubresourceState map.
        // Get the state of a (sub)resource within the resource.
        // If the specified subresource is not found in the SubresourceState array (map)
        // then the state of the resource (D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES) is
        // returned.
        D3D12_RESOURCE_STATES GetSubresourceState(UINT subresource) const
        {
            D3D12_RESOURCE_STATES state = State;
            const auto iter = SubresourceState.find(subresource);
            if (iter != SubresourceState.end())
            {
                state = iter->second;
            }
            return state;
        }
The ResourceState::GetSubresourceState method is used to query the state of a (sub)resource. If the subresource index is not found in the SubresourceState map, the state of the entire resource (State) is returned.
        // If the SubresourceState array (map) is empty, then the State variable defines
        // the state of all of the subresources.
        D3D12_RESOURCE_STATES State;
        std::map<UINT, D3D12_RESOURCE_STATES> SubresourceState;
    };
The SubresourceState map is used to store the current state of each subresource. If the SubresourceState map is empty, then the resource and all of its subresources are in the state defined by the State variable.
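To see how ResourceState behaves in the mipmap example from earlier (source mip readable, other mips writable), here is a standalone copy of the struct with hypothetical stand-ins for the D3D12 types. ResState replaces D3D12_RESOURCE_STATES, and kAllSubresources is assumed to have the same value as D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES (0xffffffff).

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for D3D12_RESOURCE_STATES and D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES.
enum class ResState { Common, CopyDest, NonPixelShaderResource, UnorderedAccess };
constexpr uint32_t kAllSubresources = 0xffffffffu;

// Standalone copy of the ResourceState logic shown above.
struct ResourceState
{
    explicit ResourceState(ResState state = ResState::Common)
        : State(state)
    {}

    void SetSubresourceState(uint32_t subresource, ResState state)
    {
        if (subresource == kAllSubresources)
        {
            State = state;            // one state now describes every subresource
            SubresourceState.clear(); // so per-subresource overrides are dropped
        }
        else
        {
            SubresourceState[subresource] = state;
        }
    }

    ResState GetSubresourceState(uint32_t subresource) const
    {
        const auto iter = SubresourceState.find(subresource);
        return iter != SubresourceState.end() ? iter->second : State;
    }

    ResState State;
    std::map<uint32_t, ResState> SubresourceState;
};
```

Only subresources whose state differs from the shared state need an entry, and setting the whole resource to a new state discards the overrides again. This keeps the map empty in the common case where every subresource shares one state.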
    using ResourceStateMap = std::unordered_map<ID3D12Resource*, ResourceState>;
ResourceStateMap is a type alias for a std::unordered_map that maps a resource (pointer) to its ResourceState.
    // The final (last known state) of the resources within a command list.
    // The final resource state is committed to the global resource state when the
    // command list is closed but before it is executed on the command queue.
    ResourceStateMap m_FinalResourceState;
The m_FinalResourceState member variable stores the final known state of each resource in the ResourceStateTracker.
    // The global resource state array (map) stores the state of a resource
    // between command list execution.
    static ResourceStateMap ms_GlobalResourceState;
The ms_GlobalResourceState static member variable is used to store the global state of each resource. The global resource state map is updated whenever a command list is closed (just before it is executed on the command queue).
    // The mutex protects shared access to the GlobalResourceState map.
    static std::mutex ms_GlobalMutex;
    static bool ms_IsLocked;
};
The ms_GlobalMutex and ms_IsLocked static member variables are used to synchronize access to the global resource state map across multiple threads.
ResourceStateTracker Preamble
Besides the header file for the ResourceStateTracker class, several other header files are included in the implementation file.
#include <DX12LibPCH.h>

#include <ResourceStateTracker.h>

#include <CommandList.h>
#include <Resource.h>
The DX12LibPCH.h header file is the precompiled header file for the DX12Lib project.
The ResourceStateTracker.h header file contains the declaration of the ResourceStateTracker class. This header file is described in the previous section.
Since the CommandList and Resource classes were only forward-declared in the ResourceStateTracker.h header file, their headers are included here so that the ResourceStateTracker can use those classes.
// Static definitions.
std::mutex ResourceStateTracker::ms_GlobalMutex;
bool ResourceStateTracker::ms_IsLocked = false;
ResourceStateTracker::ResourceStateMap ResourceStateTracker::ms_GlobalResourceState;
Static member variables that are declared in the header file need to be defined to allocate space for them in static memory.
ResourceStateTracker::ResourceBarrier
The ResourceBarrier method is used to add a resource barrier to the ResourceStateTracker.
void ResourceStateTracker::ResourceBarrier(const D3D12_RESOURCE_BARRIER& barrier)
{
The ResourceBarrier method takes a D3D12_RESOURCE_BARRIER as its only argument.
    if (barrier.Type == D3D12_RESOURCE_BARRIER_TYPE_TRANSITION)
    {
        const D3D12_RESOURCE_TRANSITION_BARRIER& transitionBarrier = barrier.Transition;
Since only transition barriers need to resolve the before state of the resource, transition barriers are handled separately from UAV and alias barriers.
The D3D12_RESOURCE_TRANSITION_BARRIER is retrieved from the D3D12_RESOURCE_BARRIER structure.
        // First check if there is already a known "final" state for the given resource.
        // If there is, the resource has been used on the command list before and
        // already has a known state within the command list execution.
        const auto iter = m_FinalResourceState.find(transitionBarrier.pResource);
        if (iter != m_FinalResourceState.end())
        {
If the resource has been used on the command list before, then its previous state is known by the ResourceStateTracker and that state is stored in the m_FinalResourceState map.
            auto& resourceState = iter->second;
            // If the known final state of the resource is different...
            if ( transitionBarrier.Subresource == D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES &&
                 !resourceState.SubresourceState.empty() )
            {
                // First transition all of the subresources if they are different than the StateAfter.
                for ( auto subresourceState : resourceState.SubresourceState )
                {
                    if ( transitionBarrier.StateAfter != subresourceState.second )
                    {
                        D3D12_RESOURCE_BARRIER newBarrier = barrier;
                        newBarrier.Transition.Subresource = subresourceState.first;
                        newBarrier.Transition.StateBefore = subresourceState.second;
                        m_ResourceBarriers.push_back( newBarrier );
                    }
                }
            }
The iterator returned by the find call refers to the entry in the m_FinalResourceState map whose value is the ResourceState struct of the resource. The ResourceState struct contains a map of subresource states. If the transition barrier is transitioning all subresources (D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES) and some subresources are in a different state, then a transition barrier is added to the m_ResourceBarriers array for each subresource that is not already in the requested (after) state.
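The following is a simplified sketch of such a per-resource state record, consistent with how GetSubresourceState and SetSubresourceState are used in the code above. The names ResourceStateSketch and kAllSubresources are hypothetical, and plain integers stand in for D3D12_RESOURCE_STATES. It shows why an empty SubresourceState map means that every subresource shares the single State value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Stand-in for D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES (0xffffffff).
constexpr std::uint32_t kAllSubresources = 0xffffffffu;

// Hypothetical, simplified per-resource state record.
struct ResourceStateSketch
{
    explicit ResourceStateSketch(std::uint32_t state = 0u) : State(state) {}

    void SetSubresourceState(std::uint32_t subresource, std::uint32_t state)
    {
        if (subresource == kAllSubresources)
        {
            // Every subresource is now in the same state: drop the overrides.
            State = state;
            SubresourceState.clear();
        }
        else
        {
            // Record a per-subresource override.
            SubresourceState[subresource] = state;
        }
    }

    std::uint32_t GetSubresourceState(std::uint32_t subresource) const
    {
        // Use the override if one exists, otherwise the whole-resource state.
        auto iter = SubresourceState.find(subresource);
        return iter != SubresourceState.end() ? iter->second : State;
    }

    std::uint32_t State;                                      // state shared by all subresources
    std::map<std::uint32_t, std::uint32_t> SubresourceState;  // per-subresource overrides
};
```

With this representation, a subresource that was never transitioned individually has no entry in SubresourceState, so a loop over that map only visits subresources that have an override.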
```cpp
            else
            {
                auto finalState = resourceState.GetSubresourceState( transitionBarrier.Subresource );
                if ( transitionBarrier.StateAfter != finalState )
                {
                    // Push a new transition barrier with the correct before state.
                    D3D12_RESOURCE_BARRIER newBarrier = barrier;
                    newBarrier.Transition.StateBefore = finalState;
                    m_ResourceBarriers.push_back( newBarrier );
                }
            }
        }
```
Otherwise, if the transition barrier is transitioning only a single subresource, or all subresources are in the same state (the SubresourceState map is empty), and the current (final) state of the (sub)resource is different from the requested state, then a single transition barrier with the corrected before state is added to the m_ResourceBarriers array.
```cpp
        else // In this case, the resource is being used on the command list for the first time.
        {
            // Add a pending barrier. The pending barriers will be resolved
            // before the command list is executed on the command queue.
            m_PendingResourceBarriers.push_back(barrier);
        }
```
If the resource is being used on the command list for the first time (the resource was not found in the m_FinalResourceState map), then the resource barrier is added to the m_PendingResourceBarriers array and its before state will be resolved later (before the command list is executed on the command queue).
```cpp
        // Push the final known state (possibly replacing the previously known state for the subresource).
        m_FinalResourceState[transitionBarrier.pResource].SetSubresourceState(transitionBarrier.Subresource, transitionBarrier.StateAfter);
    }
```
Whether or not the resource has been seen on the command list before, its final state is added to the m_FinalResourceState map, possibly replacing the previously known state of the (sub)resource.
```cpp
    else
    {
        // Just push non-transition barriers to the resource barriers array.
        m_ResourceBarriers.push_back(barrier);
    }
}
```
UAV and aliasing barriers (D3D12_RESOURCE_BARRIER_TYPE_UAV and D3D12_RESOURCE_BARRIER_TYPE_ALIASING) do not require any special treatment and are simply pushed to the back of the m_ResourceBarriers array.
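The core idea of ResourceBarrier can be sketched without Direct3D: resolve the before state locally when it is known, and defer it otherwise. The model below is hypothetical. Resources are integer IDs, states are integers (0 plays the role of D3D12_RESOURCE_STATE_COMMON as a placeholder before state), and only whole-resource transitions are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simplified, hypothetical model of a transition barrier (whole resource only).
struct TransitionSketch
{
    std::uint32_t resource;
    std::uint32_t stateBefore;
    std::uint32_t stateAfter;
};

class LocalTrackerSketch
{
public:
    void Transition(std::uint32_t resource, std::uint32_t stateAfter)
    {
        // The caller does not supply a before state; it is resolved here.
        TransitionSketch barrier{resource, 0u, stateAfter};
        auto iter = m_FinalState.find(resource);
        if (iter != m_FinalState.end())
        {
            // Seen before on this command list: the before state is known.
            if (iter->second != stateAfter)
            {
                barrier.stateBefore = iter->second;
                m_Barriers.push_back(barrier);
            }
        }
        else
        {
            // First use: the before state depends on the global state at
            // execution time, so the barrier is deferred.
            m_Pending.push_back(barrier);
        }
        // Record the state the resource will be in after this barrier.
        m_FinalState[resource] = stateAfter;
    }

    std::vector<TransitionSketch> m_Barriers;
    std::vector<TransitionSketch> m_Pending;
    std::unordered_map<std::uint32_t, std::uint32_t> m_FinalState;
};
```

Transitioning resource 7 to state 1, then twice to state 2, yields one pending barrier, one resolved 1 to 2 barrier, and no barrier for the redundant third call.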
ResourceStateTracker::TransitionResource
The TransitionResource method is simply a helper method that forwards a transition barrier to the ResourceBarrier method described above.
```cpp
void ResourceStateTracker::TransitionResource( ID3D12Resource* resource, D3D12_RESOURCE_STATES stateAfter, UINT subResource )
{
    if ( resource )
    {
        ResourceBarrier( CD3DX12_RESOURCE_BARRIER::Transition( resource, D3D12_RESOURCE_STATE_COMMON, stateAfter, subResource ) );
    }
}

void ResourceStateTracker::TransitionResource( const Resource& resource, D3D12_RESOURCE_STATES stateAfter, UINT subResource )
{
    TransitionResource( resource.GetD3D12Resource().Get(), stateAfter, subResource );
}
```
There are two versions of the TransitionResource method shown here: one takes a raw pointer to an ID3D12Resource and the other takes a const reference to a Resource object (a type provided by the DX12Lib framework project). The Resource class is a simple wrapper for an ID3D12Resource and also serves as the base class for the other resource types in the DX12Lib framework. The D3D12_RESOURCE_STATE_COMMON before state passed to CD3DX12_RESOURCE_BARRIER::Transition is only a placeholder; the actual before state is resolved by the ResourceBarrier method.
ResourceStateTracker::UAVBarrier
The UAVBarrier method is used to add a D3D12_RESOURCE_BARRIER of type D3D12_RESOURCE_BARRIER_TYPE_UAV to the command list.
```cpp
void ResourceStateTracker::UAVBarrier(const Resource* resource )
{
    ID3D12Resource* pResource = resource != nullptr ? resource->GetD3D12Resource().Get() : nullptr;

    ResourceBarrier(CD3DX12_RESOURCE_BARRIER::UAV(pResource));
}
```
When submitting a UAV barrier to a command list, the specified resource can be null. If the specified resource is null, then all UAV operations must complete before any UAV operation can be performed. This can cause pipeline stalls and should be avoided. UAV barriers should only be used to synchronize read/write operations on the same UAV resource.
ResourceStateTracker::AliasBarrier
The AliasBarrier method is used to add a D3D12_RESOURCE_BARRIER of type D3D12_RESOURCE_BARRIER_TYPE_ALIASING to the command list.
```cpp
void ResourceStateTracker::AliasBarrier(const Resource* resourceBefore, const Resource* resourceAfter)
{
    ID3D12Resource* pResourceBefore = resourceBefore != nullptr ? resourceBefore->GetD3D12Resource().Get() : nullptr;
    ID3D12Resource* pResourceAfter = resourceAfter != nullptr ? resourceAfter->GetD3D12Resource().Get() : nullptr;

    ResourceBarrier(CD3DX12_RESOURCE_BARRIER::Aliasing(pResourceBefore, pResourceAfter));
}
```
Aliasing barriers are used to describe the transition between usages of two resources that have mappings into the same heap. This is commonly used with placed or reserved resources that have overlapping mappings into the same heap. One or both resources can be null, which indicates that accessing any placed or reserved resource could cause aliasing.
ResourceStateTracker::FlushResourceBarriers
The FlushResourceBarriers method is used to push the non-pending resource barriers to the specified command list.
```cpp
void ResourceStateTracker::FlushResourceBarriers(CommandList& commandList)
{
    UINT numBarriers = static_cast<UINT>(m_ResourceBarriers.size());
    if (numBarriers > 0 )
    {
        auto d3d12CommandList = commandList.GetGraphicsCommandList();
        d3d12CommandList->ResourceBarrier(numBarriers, m_ResourceBarriers.data());
        m_ResourceBarriers.clear();
    }
}
```
The FlushResourceBarriers method is straightforward. It simply checks whether the m_ResourceBarriers array contains any barriers. If so, the resource barriers are added to the command list using the ID3D12GraphicsCommandList::ResourceBarrier method.
After adding the resource barriers to the command list, the m_ResourceBarriers (vector) array is cleared.
ResourceStateTracker::FlushPendingResourceBarriers
The FlushPendingResourceBarriers method is not as straightforward as the FlushResourceBarriers method. This method adds only the resource barriers that are required to transition the resources into the state required by the command list. The FlushPendingResourceBarriers method adds the transition barriers to the intermediate command list (C) depicted in the image above. To determine whether a pending transition needs to be added to the intermediate command list, the global state of the (sub)resource is compared against the pending (after) state. If the pending state is different from the global state, then a transition barrier is added to the intermediate command list.
```cpp
uint32_t ResourceStateTracker::FlushPendingResourceBarriers(CommandList& commandList)
{
    assert(ms_IsLocked);

    // Resolve the pending resource barriers by checking the global state of the
    // (sub)resources. Add barriers if the pending state and the global state do
    // not match.
    ResourceBarriers resourceBarriers;
    // Reserve enough space (worst-case, all pending barriers).
    resourceBarriers.reserve(m_PendingResourceBarriers.size());
```
To guarantee the consistency of the global resource state map, access to the global map must be exclusive to the current thread. If the global resource state map has not been locked, the assert at the beginning of the method fails.
The resourceBarriers local variable is a vector that collects the resolved barriers that will be added to the command list. Its size will be at most the size of the m_PendingResourceBarriers (vector) array, so, as an optimization, the worst-case size is reserved up front.
```cpp
    for (auto pendingBarrier : m_PendingResourceBarriers)
    {
        if (pendingBarrier.Type == D3D12_RESOURCE_BARRIER_TYPE_TRANSITION)  // Only transition barriers should be pending...
        {
            auto pendingTransition = pendingBarrier.Transition;
```
Only pending resource barriers of type D3D12_RESOURCE_BARRIER_TYPE_TRANSITION need to be checked (in fact, the m_PendingResourceBarriers array should only contain transition barriers); UAV and aliasing barriers are ignored.
The D3D12_RESOURCE_TRANSITION_BARRIER structure is then retrieved from the pending resource barrier.
```cpp
            const auto& iter = ms_GlobalResourceState.find(pendingTransition.pResource);
            if (iter != ms_GlobalResourceState.end())
            {
```
The globally known state of the resource is queried from the ms_GlobalResourceState map. If the resource is not found in the global map, no barrier is generated for it.
```cpp
                // If all subresources are being transitioned, and there are multiple
                // subresources of the resource that are in a different state...
                auto& resourceState = iter->second;
                if ( pendingTransition.Subresource == D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES &&
                     !resourceState.SubresourceState.empty() )
                {
                    // Transition all subresources
                    for ( auto subresourceState : resourceState.SubresourceState )
                    {
                        if ( pendingTransition.StateAfter != subresourceState.second )
                        {
                            D3D12_RESOURCE_BARRIER newBarrier = pendingBarrier;
                            newBarrier.Transition.Subresource = subresourceState.first;
                            newBarrier.Transition.StateBefore = subresourceState.second;
                            resourceBarriers.push_back( newBarrier );
                        }
                    }
                }
```
If the pending resource barrier is transitioning all subresources (D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES) and some subresources are in a different state (the SubresourceState map is not empty), then every subresource that is in a different state needs an explicit transition to the correct state.
```cpp
                else
                {
                    // No (sub)resources need to be transitioned. Just add a single transition barrier (if needed).
                    auto globalState = ( iter->second ).GetSubresourceState( pendingTransition.Subresource );
                    if ( pendingTransition.StateAfter != globalState )
                    {
                        // Fix-up the before state based on current global state of the resource.
                        pendingBarrier.Transition.StateBefore = globalState;
                        resourceBarriers.push_back( pendingBarrier );
                    }
                }
```
Otherwise, either only a single subresource is being transitioned or no subresources are in a different state, so only a single transition barrier is required. If the (sub)resource is not already in the requested after state (according to its global state), then a transition barrier is added to the resourceBarriers array. The current global state of the resource is used to fix up the before state of the transition barrier.
```cpp
            }
        }
    }

    UINT numBarriers = static_cast<UINT>(resourceBarriers.size());
    if (numBarriers > 0 )
    {
        auto d3d12CommandList = commandList.GetGraphicsCommandList();
        d3d12CommandList->ResourceBarrier(numBarriers, resourceBarriers.data());
    }

    m_PendingResourceBarriers.clear();

    return numBarriers;
}
```
Finally, the resource barriers (if any) are added to the (intermediate) command list using the ID3D12GraphicsCommandList::ResourceBarrier method.
After committing all of the pending resource barriers to the command list, the m_PendingResourceBarriers (vector) array is cleared.
The number of barriers (numBarriers) that were generated is returned to the calling function. If the number of resource barriers is 0, then there is no need to execute the intermediate command list on the command queue.
ResourceStateTracker::CommitFinalResourceStates
The purpose of the CommitFinalResourceStates method is to ensure that the final state of each resource is committed to the static global state map.
```cpp
void ResourceStateTracker::CommitFinalResourceStates()
{
    assert(ms_IsLocked);

    // Commit final resource states to the global resource state array (map).
    for (const auto& resourceState : m_FinalResourceState)
    {
        ms_GlobalResourceState[resourceState.first] = resourceState.second;
    }

    m_FinalResourceState.clear();
}
```
The CommitFinalResourceStates method copies each entry of the m_FinalResourceState map into the ms_GlobalResourceState map, replacing the global entry of any resource that is already present. After merging the final state of the resources into the global state map, the final state map is cleared.
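One detail worth noting is the explicit loop with operator[]. std::map::merge (C++17) only moves elements whose keys are not already present in the destination, so using it here would keep stale global states for resources that are already tracked. The following sketch (hypothetical names, integer keys and states) reproduces the overwrite-then-clear commit step:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using StateMapSketch = std::map<std::uint32_t, std::uint32_t>;

// Copies every final state into the global map, overwriting existing entries,
// then clears the local map.
void CommitSketch(StateMapSketch& finalStates, StateMapSketch& globalStates)
{
    for (const auto& entry : finalStates)
    {
        globalStates[entry.first] = entry.second; // insert or overwrite
    }
    finalStates.clear();
}
```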
ResourceStateTracker::Reset
The ResourceStateTracker is reset whenever the command list that owns the ResourceStateTracker is reset.
```cpp
void ResourceStateTracker::Reset()
{
    // Reset the pending, current, and final resource states.
    m_PendingResourceBarriers.clear();
    m_ResourceBarriers.clear();
    m_FinalResourceState.clear();
}
```
The Reset method simply clears the m_PendingResourceBarriers and m_ResourceBarriers (vector) arrays and the m_FinalResourceState map.
If the ResourceStateTracker is used correctly, then calling the Reset method should be superfluous.
ResourceStateTracker::Lock
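The (static) Lock method locks the global mutex, which ensures that only the current thread has access to the ms_GlobalResourceState map. It also sets the ms_IsLocked flag that is checked by the asserts in the FlushPendingResourceBarriers and CommitFinalResourceStates methods.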
```cpp
void ResourceStateTracker::Lock()
{
    ms_GlobalMutex.lock();
    ms_IsLocked = true;
}
```
ResourceStateTracker::Unlock
Similar to the Lock method, the (static) Unlock method clears the ms_IsLocked flag and unlocks the global mutex.
```cpp
void ResourceStateTracker::Unlock()
{
    ms_IsLocked = false;
    ms_GlobalMutex.unlock();
}
```
ResourceStateTracker::AddGlobalResourceState
The AddGlobalResourceState method is used whenever a new resource is created using ID3D12Device::CreateCommittedResource, ID3D12Device::CreatePlacedResource, or ID3D12Device::CreateReservedResource.
```cpp
void ResourceStateTracker::AddGlobalResourceState(ID3D12Resource* resource, D3D12_RESOURCE_STATES state)
{
    if ( resource != nullptr )
    {
        std::lock_guard<std::mutex> lock(ms_GlobalMutex);
        ms_GlobalResourceState[resource].SetSubresourceState(D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES, state);
    }
}
```
The resource and its initial state (applied to all subresources) are added to the ms_GlobalResourceState map while the global mutex is held.
ResourceStateTracker::RemoveGlobalResourceState
Similar to the AddGlobalResourceState method, the RemoveGlobalResourceState method removes the resource from the ms_GlobalResourceState map.
```cpp
void ResourceStateTracker::RemoveGlobalResourceState(ID3D12Resource* resource)
{
    if ( resource != nullptr )
    {
        std::lock_guard<std::mutex> lock(ms_GlobalMutex);
        ms_GlobalResourceState.erase(resource);
    }
}
```
View the full source code for ResourceStateTracker.cpp
Custom Command List
To provide a simple interface to the end user for the UploadBuffer, DescriptorAllocator, DynamicDescriptorHeap, and ResourceStateTracker classes described in this article, the DX12Lib project provides a custom CommandList class. The CommandList class is one of the largest classes in the DX12Lib project and won’t be described in complete detail in this article. Only the methods of the CommandList class that utilize one of the classes described here are shown.
CommandList Class
The CommandList class is a wrapper for the ID3D12GraphicsCommandList type. Almost all of the command list functionality that is required for the DirectX 12 tutorials in this series is implemented in the custom CommandList class, and there should be little to no reason to access the underlying ID3D12GraphicsCommandList. The custom CommandList class handles resource barriers, copying (CPU and GPU) resources, texture loading, mipmap generation (explained in a later lesson), binding resources to the pipeline, descriptor heaps, and draw and dispatch commands.
CommandList::TransitionBarrier
The TransitionBarrier method is used to forward a transition barrier (a D3D12_RESOURCE_BARRIER of type D3D12_RESOURCE_BARRIER_TYPE_TRANSITION) to the ResourceStateTracker::ResourceBarrier method.
```cpp
void CommandList::TransitionBarrier( const Resource& resource, D3D12_RESOURCE_STATES stateAfter, UINT subResource, bool flushBarriers )
{
    auto d3d12Resource = resource.GetD3D12Resource();
    if ( d3d12Resource )
    {
        // The "before" state is not important. It will be resolved by the resource state tracker.
        auto barrier = CD3DX12_RESOURCE_BARRIER::Transition( d3d12Resource.Get(), D3D12_RESOURCE_STATE_COMMON, stateAfter, subResource );
        m_ResourceStateTracker->ResourceBarrier( barrier );
    }

    if ( flushBarriers )
    {
        FlushResourceBarriers();
    }
}
```
The TransitionBarrier method takes as arguments a const reference to a Resource, the requested D3D12_RESOURCE_STATES, the subresource to transition (which defaults to D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES), and a boolean indicating whether the resource barriers should be flushed to the command list.
Note that the TransitionBarrier method does not need the before state of the resource, since the ResourceStateTracker resolves the before state when committing the resource barriers; D3D12_RESOURCE_STATE_COMMON is passed only as a placeholder.
If the resource barriers should be flushed to the command list, then the FlushResourceBarriers method is used, which in turn calls the ResourceStateTracker::FlushResourceBarriers method.
There are also CommandList::UAVBarrier and CommandList::AliasingBarrier methods that add resource barriers of those types to the ResourceStateTracker. Those methods are not shown here for brevity.
CommandList::CopyResource
The CopyResource method is used to copy one GPU resource to another (copying resources is a common operation in rendering pipelines). This method is shown here because it is a good example of transitioning resources to the correct state before performing an operation that requires the resources to be in a specific state.
```cpp
void CommandList::CopyResource( Resource& dstRes, const Resource& srcRes )
{
    TransitionBarrier( dstRes, D3D12_RESOURCE_STATE_COPY_DEST );
    TransitionBarrier( srcRes, D3D12_RESOURCE_STATE_COPY_SOURCE );

    FlushResourceBarriers();

    m_d3d12CommandList->CopyResource( dstRes.GetD3D12Resource().Get(), srcRes.GetD3D12Resource().Get() );

    TrackResource(dstRes);
    TrackResource(srcRes);
}
```
Before copying the contents of one resource to another, the destination resource needs to be in the D3D12_RESOURCE_STATE_COPY_DEST state and the source resource needs to be in the D3D12_RESOURCE_STATE_COPY_SOURCE state. The two TransitionBarrier calls at the top of the method request these states. Before calling the ID3D12GraphicsCommandList::CopyResource method, the resource barriers need to be committed to the command list. This is accomplished using the FlushResourceBarriers method.
The source resource is then copied to the destination resource using the ID3D12GraphicsCommandList::CopyResource method.
To ensure that neither the source nor the destination resource is released while it is still referenced by the command list, both resources are added to a list of tracked objects using the TrackResource method. Using the TrackResource method, short-lived (temporary) resources can be used on a command list without having to track their lifetime outside of the command list. This is useful when generating mipmaps: if the original texture resource doesn’t support UAV writes, a temporary resource that does support UAV writes is created. The temporary resource stays alive until the command list is reset (when it has finished executing on the command queue).
CommandList::SetGraphicsDynamicConstantBuffer
The SetGraphicsDynamicConstantBuffer method uses the UploadBuffer class to update a constant buffer that changes often (for example, the world matrix of a model, which changes for every model drawn). For smaller constant buffers (less than 16 32-bit constants), the ID3D12GraphicsCommandList::SetGraphicsRoot32BitConstants method can also be used to update dynamic constant buffer data, but for larger constant buffers, it is probably better to use an upload heap.
```cpp
void CommandList::SetGraphicsDynamicConstantBuffer( uint32_t rootParameterIndex, size_t sizeInBytes, const void* bufferData )
{
    // Constant buffers must be 256-byte aligned.
    auto heapAllocation = m_UploadBuffer->Allocate( sizeInBytes, D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT );
    memcpy( heapAllocation.CPU, bufferData, sizeInBytes );

    m_d3d12CommandList->SetGraphicsRootConstantBufferView( rootParameterIndex, heapAllocation.GPU );
}
```
The SetGraphicsDynamicConstantBuffer method takes as arguments the root parameter index, the size of the constant buffer, and a pointer to the constant buffer data in host (CPU) memory.
First, an allocation of the requested size and required alignment (constant buffers must be aligned to 256 bytes, the value of D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT) is made using the UploadBuffer::Allocate method. This method returns an UploadBuffer::Allocation struct, which contains the CPU and GPU addresses of the memory in the upload heap.
The buffer data is copied to the upload heap using memcpy.
The GPU address of the data is then bound to the root parameter using the ID3D12GraphicsCommandList::SetGraphicsRootConstantBufferView method.
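The 256-byte alignment rule means that even a small constant buffer consumes a full 256-byte slot in the upload heap. Below is a minimal sketch of the arithmetic. It assumes a hypothetical linear allocator over a single fixed-size page. AlignUpSketch and LinearPageSketch are illustrative names, not the UploadBuffer API.

```cpp
#include <cassert>
#include <cstddef>

// Rounds value up to the next multiple of alignment. The alignment must be a
// power of two, as D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT (256) is.
constexpr std::size_t AlignUpSketch(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical linear allocator over one upload page.
struct LinearPageSketch
{
    explicit LinearPageSketch(std::size_t size) : pageSize(size) {}

    // Returns false when the aligned allocation does not fit; a real
    // allocator would request a new page at that point.
    bool Allocate(std::size_t sizeInBytes, std::size_t alignment, std::size_t& outOffset)
    {
        std::size_t alignedSize = AlignUpSketch(sizeInBytes, alignment);
        std::size_t alignedOffset = AlignUpSketch(offset, alignment);
        if (alignedOffset + alignedSize > pageSize)
        {
            return false;
        }
        outOffset = alignedOffset;
        offset = alignedOffset + alignedSize;
        return true;
    }

    std::size_t pageSize;
    std::size_t offset = 0;
};
```

For example, in a 1024-byte page, allocations of 100, 64, and 500 bytes land at offsets 0, 256, and 512, and the page is then full.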
The root parameter at rootParameterIndex must be of type D3D12_ROOT_PARAMETER_TYPE_CBV to use the SetGraphicsDynamicConstantBuffer method described here.
CommandList::SetShaderResourceView
The SetShaderResourceView method uses the DynamicDescriptorHeap class to stage an SRV for copying to a GPU-visible descriptor heap. This method also transitions the resource to the correct state for use as an SRV on the graphics or compute pipelines.
```cpp
void CommandList::SetShaderResourceView( uint32_t rootParameterIndex,
                                         uint32_t descriptorOffset,
                                         const Resource& resource,
                                         D3D12_RESOURCE_STATES stateAfter,
                                         UINT firstSubresource,
                                         UINT numSubresources,
                                         const D3D12_SHADER_RESOURCE_VIEW_DESC* srv)
{
    if (numSubresources < D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES)
    {
        for (uint32_t i = 0; i < numSubresources; ++i)
        {
            TransitionBarrier(resource, stateAfter, firstSubresource + i);
        }
    }
    else
    {
        TransitionBarrier(resource, stateAfter);
    }

    m_DynamicDescriptorHeap[D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV]->StageDescriptors( rootParameterIndex, descriptorOffset, 1, resource.GetShaderResourceView( srv ) );

    TrackResource(resource);
}
```
As can be seen from the code snippet above, the SetShaderResourceView method takes a lot of arguments:
- uint32_t rootParameterIndex: The root parameter index to assign the SRV to. The root parameter must be of type D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE, and the descriptorOffset must refer to a descriptor in a D3D12_DESCRIPTOR_RANGE of type D3D12_DESCRIPTOR_RANGE_TYPE_SRV.
- uint32_t descriptorOffset: The offset of the descriptor, counted from the first descriptor in the D3D12_ROOT_DESCRIPTOR_TABLE.
- const Resource& resource: The Resource to bind.
- D3D12_RESOURCE_STATES stateAfter: The required state of the resource. If the resource is read in a pixel shader, it should be transitioned to the D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE state; otherwise, it should be transitioned to the D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE state. The default value for this argument is D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE | D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE, which allows the resource to be used in either a pixel shader or a non-pixel shader (but relying on the combined state may not be optimal).
- UINT firstSubresource: The first subresource to transition to the requested state. The default value for this argument is 0.
- UINT numSubresources: The number of subresources to transition. The default value for this argument is D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES.
- const D3D12_SHADER_RESOURCE_VIEW_DESC* srv: The SRV description to use for the resource in the shader. By default, this value is nullptr, which uses the default SRV for the resource. It is necessary to specify the SRV description when a depth buffer resource needs to be read in a pixel shader (for example, when performing shadow mapping).
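The SetShaderResourceView method first transitions the requested subresources to the requested state using the TransitionBarrier method described earlier. If numSubresources is D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES, firstSubresource is ignored and a single barrier transitions the entire resource.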
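Subresource indices follow the layout used by the D3D12CalcSubresource helper in d3dx12.h: mip levels vary fastest, then array slices, then planes. Because of this layout, all mips of one array slice are contiguous, which makes a (firstSubresource, numSubresources) range useful. The sketch below reimplements the formula under a hypothetical name and enumerates the indices that the loop above would transition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same formula as D3D12CalcSubresource in d3dx12.h:
// mips vary fastest, then array slices, then planes.
constexpr std::uint32_t CalcSubresourceSketch(std::uint32_t mipSlice, std::uint32_t arraySlice,
                                              std::uint32_t planeSlice, std::uint32_t mipLevels,
                                              std::uint32_t arraySize)
{
    return mipSlice + arraySlice * mipLevels + planeSlice * mipLevels * arraySize;
}

// Lists the subresource indices transitioned for a (first, count) range,
// mirroring the loop in SetShaderResourceView.
std::vector<std::uint32_t> SubresourceRangeSketch(std::uint32_t firstSubresource, std::uint32_t numSubresources)
{
    std::vector<std::uint32_t> indices;
    for (std::uint32_t i = 0; i < numSubresources; ++i)
    {
        indices.push_back(firstSubresource + i);
    }
    return indices;
}
```

For a texture array with 4 mip levels and 3 slices, the mips of slice 2 are subresources 8 through 11.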
The SRV matching the SRV description is retrieved from the resource using the Resource::GetShaderResourceView method (not shown here) and staged to the DynamicDescriptorHeap using the DynamicDescriptorHeap::StageDescriptors method described earlier.
To ensure that the resource stays alive while the command list references it, the resource is added to the list of tracked resources using the TrackResource method.
There is also a similar SetUnorderedAccessView method, which performs the same function for UAVs as the SetShaderResourceView method does for SRVs. The SetUnorderedAccessView method is not described here for brevity.
CommandList::Draw
The Draw method is used to render geometry to the currently bound render target. Before executing a draw command on the command list, all resource barriers must be flushed to the command list using the FlushResourceBarriers method, and any resource descriptors that were staged to the DynamicDescriptorHeap need to be committed.
```cpp
void CommandList::Draw( uint32_t vertexCount, uint32_t instanceCount, uint32_t startVertex, uint32_t startInstance )
{
    FlushResourceBarriers();

    for ( int i = 0; i < D3D12_DESCRIPTOR_HEAP_TYPE_NUM_TYPES; ++i )
    {
        m_DynamicDescriptorHeap[i]->CommitStagedDescriptorsForDraw( *this );
    }

    m_d3d12CommandList->DrawInstanced( vertexCount, instanceCount, startVertex, startInstance );
}
```
The Draw method takes the same parameters as the ID3D12GraphicsCommandList::DrawInstanced method. Since that method was already described in the previous article, it isn’t described again here.
The FlushResourceBarriers method is called first to ensure that any non-pending resource barriers are flushed to the command list before the draw command is recorded.
Next, the DynamicDescriptorHeap::CommitStagedDescriptorsForDraw method is called for each descriptor heap type to ensure that all of the staged resource descriptors are bound to the graphics pipeline.
Finally, the actual draw command is recorded using the ID3D12GraphicsCommandList::DrawInstanced method.
There are also DrawIndexed and Dispatch (for compute shaders) methods in the CommandList class that look similar to the Draw method shown here.
View the full source code for CommandList.cpp
Conclusion
In this lesson, I described several classes that you can use in your own projects to create a DirectX 12 framework. The UploadBuffer class provides a simple method to upload dynamic buffer data (vertex, index, constant, and structured buffers) to the graphics or compute pipeline. The DescriptorAllocator class is used to allocate (and deallocate) CPU-visible descriptors for use with render target views (RTV), depth-stencil views (DSV), shader resource views (SRV), constant buffer views (CBV), and unordered access views (UAV). The DynamicDescriptorHeap class provides a mechanism to copy CPU-visible descriptors to a GPU-visible descriptor heap for use in shaders. The CommandList class brings these classes together to make DirectX 12 graphics programming as easy as possible.
In the next lesson I’ll show you how to add textures to the scene. I’ll also show you how to perform mipmapping in DirectX 12 using a compute shader.
Download the Source
The source code for this project is available on GitHub:
References
[1] Microsoft, “Microsoft/DirectX-Graphics-Samples”, GitHub, 2018. [Online]. Available: https://github.com/Microsoft/DirectX-Graphics-Samples. [Accessed: 22- May- 2018].
[2] D. Graphics, “Variable Size Memory Allocations Manager”, Diligent Graphics, 2018. [Online]. Available: http://diligentgraphics.com/diligent-engine/architecture/d3d12/variable-size-memory-allocations-manager/. [Accessed: 06- Jun- 2018].
[3] “D3D12_DESCRIPTOR_HEAP_DESC”, Windows Dev Center, 2018. [Online]. Available: https://docs.microsoft.com/en-us/windows/desktop/api/d3d12/ns-d3d12-d3d12_descriptor_heap_desc. [Accessed: 15- Nov- 2018].
[4] Microsoft, “ID3D12Device::CopyDescriptors method”, Windows Dev Center, 2018. [Online]. Available: https://docs.microsoft.com/en-us/windows/desktop/api/d3d12/nf-d3d12-id3d12device-copydescriptors. [Accessed: 23- Nov- 2018].
[5] Microsoft, “ID3D12Device::CopyDescriptorsSimple method”, Windows Dev Center, 2018. [Online]. Available: https://docs.microsoft.com/en-us/windows/desktop/api/d3d12/nf-d3d12-id3d12device-copydescriptorssimple. [Accessed: 26- Nov- 2018].
Hi, Jeremiah.
Thank you for all the effort you put into these!
I’ve been following along with your tutorials to make my own engine.
Lesson 3 is a big jump from lesson 2 so I’m referencing the code in the download. Unfortunately, the solution won’t build. I keep getting this error:
C:\WINDOWS\system32\ninja : error : build.ninja:272: bad $-escape (literal $ must be written as $$)
Do you know what the issue is?
Thank you
Michael,
It sounds like you are trying to use the built-in CMake that comes with Visual Studio. If that’s the case, then please try changing the generator from “Ninja” to “Visual Studio 15 2017 Win64” according to the instructions here: CMake in Visual Studio 2017.
If you cloned (or downloaded) the project from GitHub, then please use the GenerateProjectFiles.bat batch file to generate the Visual Studio project and solution files. The project on GitHub also provides the necessary CMake files needed to generate the project and solution files correctly.
Perfect!
Thank you, once again. I’m learning a lot.
Hey, Jeremiah.
Just wanted to say: THANK YOU!
Good job, waiting for upcoming parts 🙂
Best regards,
Artem
I’m building my engine from what you’ve shown here and everything is going well, so far.
While looking through the code I see that the TextureUsage enum is not used anywhere. Is this an oversight or was it something that you thought you’d need later?
Thank you,
Michael
Hey, Jeremiah
Great tutorials, it’s really easy to go through.
One concern I have is with the ResourceStateTracker::ResourceBarrier. When we process barrier with D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES for a resource already seen in this command list and there is at least one subresource that was transitioned individually, we have a for loop through the subresources.
My concern is that we loop only through the subresources that were actually transitioned individually and hence are contained in “SubresourceState” map. So those subresources, which have they state in “State” field and not in “SubresourceState” map will not be transitioned.
Am I missing something?
Cheers!
Maciej,
Wow, I’m really surprised that you picked up on this. This is still an issue that I need to fix. I have to think about how to fix it correctly. I tried appending a transition barrier for ALL subresources (to the correct state) but I think I got a warning when the debug layer is enabled stating that there are multiple barriers transitioning the same resource to the same state. I think I’ll need to do the transitions using a 2-phase approach (first transition all of the subresources that are not in the correct state, flush the transition barriers to the command list, then transition the entire resource using a separate barrier). Since there were no side effects (in my case) when not appending a resource barrier for ALL subresources, I left it as-is. I still need to go back to this some day…
But thanks for pointing this out! It means that you’re really reading (and understanding) the code examples!
Thanks, I wanted to implement something like this myself so I really dived into it 🙂
One more thing, why don’t you store resource states directly in the Resource class? Having it as a global map costs some performance. The subresource count could be stored too.
Is there a reason why not to do it?
Hi Jeremiah! This time I dived into your implementation of descriptor management, and I have a few questions:
1. Why are 32-bit masks used for descriptor tables in the RootSignature and DynamicDescriptorHeap classes? Root indices may be in the range 0 to 63, since 64 DWORDs are allowed, right?
2. Why are descriptors copied to the GPU-visible heap one at a time (all range sizes are equal to 1)? Wouldn’t it be more efficient to store staged descriptors as ranges and copy larger chunks?
3. Why is the descriptor copy operation placed in a loop? Shouldn’t we minimize API calls, or doesn’t it matter in this case? All ranges could be stored in a vector and committed in a single call to ID3D12Device::CopyDescriptors.
4. Is allocating one descriptor heap per command list OK? Changing descriptor heaps is a costly operation that could be avoided if the heap were reused, although thread safety could become an issue…
Cheers and thanks for the tutorials!
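A minimal sketch of what questions 2 and 3 suggest, assuming staged descriptors are recorded in the order they will be written into the GPU-visible heap: adjacent source handles are merged into ranges so the whole batch could be submitted with one ID3D12Device::CopyDescriptors call. `StagedDescriptor`, `Range`, `CoalesceRanges`, and the `heapId` field are hypothetical; the handle values are plain integers standing in for `D3D12_CPU_DESCRIPTOR_HANDLE::ptr`, and `increment` would come from ID3D12Device::GetDescriptorHandleIncrementSize. The `heapId` check keeps handles from two different heaps from being merged just because their addresses happen to be adjacent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one staged CPU descriptor. `ptr` stands in for
// D3D12_CPU_DESCRIPTOR_HANDLE::ptr; `heapId` identifies the CPU heap it is in.
struct StagedDescriptor {
    uint32_t heapId;
    std::size_t ptr;
};

// One source range for CopyDescriptors: `count` descriptors starting at `start`.
struct Range {
    uint32_t heapId;
    std::size_t start;
    uint32_t count;
};

// Merges staged descriptors (in destination order) into as few source ranges
// as possible. A descriptor extends the previous range only if it is in the
// same heap and sits exactly `increment` bytes after the range's last entry.
std::vector<Range> CoalesceRanges(const std::vector<StagedDescriptor>& staged,
                                  std::size_t increment)
{
    std::vector<Range> ranges;
    for (const StagedDescriptor& d : staged) {
        if (!ranges.empty()) {
            Range& last = ranges.back();
            std::size_t next = last.start + last.count * increment;
            if (d.heapId == last.heapId && d.ptr == next) {
                ++last.count;
                continue;
            }
        }
        ranges.push_back({d.heapId, d.ptr, 1});
    }
    return ranges;
}

// With real D3D12 objects, the destination is one contiguous block in the
// GPU-visible heap, so the whole batch could go out in a single call, e.g.:
//   UINT total = sum of all range counts;
//   device->CopyDescriptors(1, &destStart, &total,
//                           numRanges, srcStarts, srcSizes, heapType);
```

How much this saves depends on how often staged descriptors are actually adjacent in their CPU heaps; descriptors allocated one by one from scattered pages will still produce many single-descriptor ranges.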
5. Why do we wait for frames instead of fences in DescriptorAllocatorPage::ReleaseStaleDescriptors?
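As a rough sketch of the fence-based alternative raised in question 5 (not the tutorial’s implementation, which tracks frame numbers): each freed range is tagged with the fence value signaled after the last command list that could reference it, and ranges are released once the fence’s completed value, as returned by ID3D12Fence::GetCompletedValue, has reached that value. `StaleRange` and `StaleDescriptorQueue` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical freed descriptor range, tagged with the fence value that must
// complete before the range can be reused.
struct StaleRange {
    uint32_t offset;
    uint32_t count;
    uint64_t fenceValue;
};

class StaleDescriptorQueue {
public:
    // Assumes fence values are pushed in non-decreasing order (one queue/fence).
    void Free(uint32_t offset, uint32_t count, uint64_t fenceValue) {
        m_Stale.push({offset, count, fenceValue});
    }

    // `completedFenceValue` would come from ID3D12Fence::GetCompletedValue().
    // Returns the ranges that are now safe to return to the free list.
    std::vector<StaleRange> ReleaseCompleted(uint64_t completedFenceValue) {
        std::vector<StaleRange> released;
        while (!m_Stale.empty() && m_Stale.front().fenceValue <= completedFenceValue) {
            released.push_back(m_Stale.front());
            m_Stale.pop();
        }
        return released;
    }

    std::size_t Pending() const { return m_Stale.size(); }

private:
    std::queue<StaleRange> m_Stale;
};
```

The queue assumes a single fence whose values only increase; with several command queues, each stale range would also need to record which queue’s fence to check.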
I am also very interested in point 4.
It’s easier to maintain thread safety and parallelism with one descriptor heap per command list than with one per frame.
Oddly enough, while the MiniEngine (see https://github.com/microsoft/DirectX-Graphics-Samples/tree/master/MiniEngine) is doing something similar, the MS documentation says something different: https://docs.microsoft.com/en-us/windows/win32/direct3d12/descriptor-heaps-overview#the-purpose-of-descriptor-heaps.
Apparently there should be at most one descriptor heap per frame.
Bogdan,
Indeed, I was using MiniEngine as a starting point for my own DX12 framework. The method they use (and I use) works well for backwards compatibility with older shader models (before 5.1); however, this technique is not well suited for modern APIs. It’s better to have a single descriptor heap (per heap type) for your entire application (as opposed to copying descriptors each frame). This is referred to as “bindless textures” (or more accurately “bindless resources”) and will be the subject of a future tutorial 😉
However, I wonder if there are still some problems. If you use one descriptor heap for the entire application, how do you choose its size when creating it? Does that mean the descriptor heap has to be recreated when it runs out of space?
The size of the (GPU visible) descriptor heap should be large enough to handle the next draw or dispatch call. If you run out of descriptors, the framework will create a new one (or take one from the pool). If you don’t have enough descriptors for a single draw call, you should increase the number of descriptors that are allocated per dynamic descriptor heap. The default is 1024, but you can allocate much more per descriptor heap without issues.
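A minimal model of the behavior described above, with hypothetical names (`GpuHeap`, `DynamicHeapPool`) and no real D3D12 objects: `Reserve` keeps using the current shader-visible heap until the next draw or dispatch no longer fits, then switches to a recycled heap or creates a new one; `Reset` returns all heaps to the pool once the GPU has finished with the command list. In a real implementation, switching heaps would likely also mean calling SetDescriptorHeaps again and re-copying the descriptors for every bound table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <queue>
#include <vector>

// Hypothetical stand-in for one shader-visible ID3D12DescriptorHeap.
struct GpuHeap {
    uint32_t capacity;
    uint32_t used = 0;
};

class DynamicHeapPool {
public:
    explicit DynamicHeapPool(uint32_t descriptorsPerHeap) : m_PerHeap(descriptorsPerHeap) {}

    // Returns the heap that the next draw/dispatch should use, switching to a
    // fresh heap (recycled or new) when `needed` descriptors do not fit.
    GpuHeap* Reserve(uint32_t needed) {
        // A single draw larger than one heap means descriptorsPerHeap is too small.
        assert(needed <= m_PerHeap && "increase descriptorsPerHeap");
        if (!m_Current || m_Current->used + needed > m_Current->capacity)
            m_Current = RequestHeap();
        m_Current->used += needed;
        return m_Current;
    }

    // Call once the GPU has finished executing the command list.
    void Reset() {
        m_Available = std::queue<GpuHeap*>();
        for (auto& h : m_Heaps) {
            h->used = 0;
            m_Available.push(h.get());
        }
        m_Current = nullptr;
    }

    std::size_t HeapCount() const { return m_Heaps.size(); }

private:
    GpuHeap* RequestHeap() {
        if (!m_Available.empty()) {
            GpuHeap* h = m_Available.front();
            m_Available.pop();
            return h;
        }
        m_Heaps.push_back(std::make_unique<GpuHeap>(GpuHeap{m_PerHeap}));
        return m_Heaps.back().get();
    }

    uint32_t m_PerHeap;
    std::vector<std::unique_ptr<GpuHeap>> m_Heaps; // owns every heap ever created
    std::queue<GpuHeap*> m_Available;              // heaps ready for reuse
    GpuHeap* m_Current = nullptr;
};
```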
Hi, thanks for the tutorial, but there is one thing I don’t get going from Lesson 2 to Lesson 3. Why is there so much additional code that isn’t reviewed? Am I supposed to just copy the Application header and source files, along with the entire DX12Lib folder, into my project, or what is the intention here?
Xardas,
Most of the code that I refactored into the Application class was already shown in Lessons 1 and 2 (creating a device, creating a window, starting the game loop, etc.), so I didn’t feel that repeating it for the Application class would be very useful. For the 3rd lesson, I wanted to focus on a few complex classes that help to simplify working with DirectX 12 and minimize repetition.
Hello, thank you for this amazing series of articles.
It really helps getting around the API and it’s still relevant to this day 😀
Regarding the descriptor allocator, is there a particular reason that it supports allocating n contiguous descriptors?
In the sources, there are two different call sites where more than one descriptor is allocated at once. But, at least from the CPU side, it seems (to me) that this would be identical to making n separate allocations.
(I’m just wondering whether I can get away without implementing a full variable-size allocator: if we don’t have to support n contiguous descriptors, we can then have a simpler fixed-block / slab allocator, right? (with O(1) allocations))
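To make the slab idea concrete, here is a sketch of an O(1) single-descriptor allocator over one heap page, assuming (as the question does) that callers never need n contiguous descriptors. `SlabDescriptorAllocator` and `HandleFor` are hypothetical; `HandleFor` mirrors the usual `heapStart.ptr + index * incrementSize` offset calculation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical single-descriptor allocator over one CPU descriptor heap page
// with `capacity` slots. Allocate and Free are O(1) because they only pop/push
// a free list of slot indices.
class SlabDescriptorAllocator {
public:
    explicit SlabDescriptorAllocator(uint32_t capacity) {
        m_FreeList.reserve(capacity);
        // Push in reverse so that slot 0 is handed out first.
        for (uint32_t i = capacity; i > 0; --i)
            m_FreeList.push_back(i - 1);
    }

    // Returns a free slot, or nothing if the page is full (the caller would
    // then create a new page).
    std::optional<uint32_t> Allocate() {
        if (m_FreeList.empty())
            return std::nullopt;
        uint32_t slot = m_FreeList.back();
        m_FreeList.pop_back();
        return slot;
    }

    void Free(uint32_t slot) { m_FreeList.push_back(slot); }

    // Slot index -> CPU handle value (heapStart.ptr + slot * increment).
    static std::size_t HandleFor(std::size_t heapStart, uint32_t slot, uint32_t increment) {
        return heapStart + static_cast<std::size_t>(slot) * increment;
    }

    std::size_t NumFree() const { return m_FreeList.size(); }

private:
    std::vector<uint32_t> m_FreeList;
};
```

The cost of this design is the restriction itself: n separately allocated descriptors are not guaranteed to be adjacent, so they could not be copied as one block with a single ID3D12Device::CopyDescriptorsSimple call.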
I have one question… Why introduce additional overhead to simplify DirectX 12 if you want easy graphics programming? There are a lot of ready-made graphics packages that make programming super easy. You can’t have both fast and easy.