The GPU API #7067
(The shader compiler is a separate tool, which operates as a command line app and can be embedded in an external tool or used at runtime, that's at https://github.com/libsdl-org/SDL_shader_tools ... the thing that has to happen here is that compiler work needs to be completed, and then its output needs to be processed by the GPU API.)
Will SDL_shader_tools continue to live in a separate repo or will it finally be merged into the SDL repo?
My intention has been to keep it separate, but it's not much code at the moment (~20k lines across five C files, not counting unit tests and other pieces). I'm flexible on the matter.
Not sure if this is an ideal venue for API feedback, but I'm not a huge fan of manually exposing Fences as a client-side resource. Since their intended purpose is to query command buffer status and wait on command buffer completion, I think these should be command buffer APIs. Making fence submission optional on the client side also has the side effect of making a Vulkan implementation significantly more complicated, since in most Vulkan renderers, command buffers always need a fence to manage and clean up internal state. This means that the backend would need to manage fences internally anyway, and since Vulkan only allows you to submit one fence per command buffer submission, internal cleanup and user-side notifications would have to be conflated. This seems unnecessarily complicated.
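To make the suggestion above concrete, here is a minimal sketch of what "status lives on the command buffer" could look like. Every name in it (CommandBuffer, CommandBuffer_Submit, CommandBuffer_IsComplete, SimulateGpuFinished) is hypothetical and not part of the proposed API, and the fence is simulated with a plain flag rather than a real backend object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command buffer that owns its one backend fence. */
typedef struct CommandBuffer {
    int submitted;          /* set once the buffer has been handed to the queue */
    int fence_signaled;     /* stand-in for the backend fence (assumption: one per submission) */
    int resources_released; /* internal cleanup state the backend tracks */
} CommandBuffer;

/* Submitting always arms the internal fence; the app never sees it. */
static void CommandBuffer_Submit(CommandBuffer *cb)
{
    assert(cb != NULL);
    cb->submitted = 1;
    cb->fence_signaled = 0;
}

/* Test helper: pretends the GPU finished the work and signaled the fence. */
static void SimulateGpuFinished(CommandBuffer *cb)
{
    assert(cb != NULL);
    cb->fence_signaled = 1;
}

/* The app asks the command buffer, not a fence object, whether it is done.
   The same fence drives internal cleanup, so nothing has to be shared
   between a user-visible fence and a backend-private one. */
static int CommandBuffer_IsComplete(CommandBuffer *cb)
{
    assert(cb != NULL);
    if (cb->submitted && cb->fence_signaled) {
        cb->resources_released = 1;
        return 1;
    }
    return 0;
}
```

In this shape the backend is free to create, recycle, or pool fences however it likes, because no fence handle ever crosses the API boundary.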
This is good feedback, but let me see if I can transfer the original megathreads over from my repo to libsdl-org's real fast.
(Apparently I can't, so I'm going to paste your comment in there so I don't lose it, since we might merge this before all feedback is incorporated.)
Since this is quite advanced, compiles, and has a back-end, maybe this should be merged shortly ... Just to make sure: for Metal, for instance, do we expect the current SDL_Renderer back-end to be removed and the GPU Metal back-end to be used instead, with minor implementation work so that SDL_Render/Metal can work using the GPU interface underneath? I don't have any comments on the interface because I'm not familiar enough with it. Should we check whether there are any future roadmap items for Vulkan/Metal/D3D12 that would make the interface change? Could we add or think of a simpler 3D interface (in addition to the current one)? (Maybe this doesn't make sense because of complexity.)
BTW the Objective-C version of CFBridgingRetain and CFBridgingRelease is (__bridge_retained void*)thing and (void)(__bridge_transfer CocoaType*)thing, though both work equally well. The __bridge_retained would also let you (__bridge_retained SDL_MetalView)thing instead of having to bridge then cast.
// !!! FIXME: does this need bridging, or can we just assign and let ARC handle it?
#ifdef __MACOSX__
layer = (CAMetalLayer *)[(__bridge NSView *)windata.mtlview layer];
#else
layer = (CAMetalLayer *)[(__bridge UIView *)windata.mtlview layer];
#endif
If you mean the (CAMetalLayer *), no bridging is needed since the return type of layer is already an ARC type.
texturedata.mtltexture = nil;

// !!! FIXME: does ARC know what to do with these, since it doesn't start with "alloc" or "new"?
Yes, ARC knows what methods transfer ownership and what methods don't
mtlpipedesc.alphaToCoverageEnabled = NO;
mtlpipedesc.alphaToOneEnabled = NO;
mtlpipedesc.rasterizationEnabled = YES;
mtlpipedesc.rasterSampleCount = 1; // !!! FIXME: multisampling (also, how is this different from sampleCount?)
sampleCount is API_DEPRECATED_WITH_REPLACEMENT("rasterSampleCount"), so I'd guess they're the same (it's just a rename).
Yes. There's a need for a cross-platform graphics abstraction layer written in simple C/C++ with a C interface. There's not really a better option at the moment than this proposed SDL GPU API.
WebGPU attempts to do something similar, but its current implementations are written in Rust (wgpu) or complex C++ with very heavy dependencies (dawn). These runtimes also include a runtime-only textual shader transpiler for WGSL, which adds complexity and binary size that SDL can avoid by using precompiled bytecode shaders instead.
BGFX is another alternative, but it is primarily intended for C++. It technically has a C interface, but it is very clear when using it that it is a C++-first library. It also has its own bespoke build system instead of using something standard like CMake.
sokol_gfx is a popular single-header graphics library written in C99, but it only supports OpenGL, Metal, and D3D11. No Vulkan, D3D12, or console API support. It also requires making separate builds of your game for each supported backend/shader language.
For applications that already use SDL, having a full cross-platform rendering API right out of the box streamlines the decisions the application developer needs to make. Having one unifying API automatically answers the question of "is it worth it to target {Platform X} if I have to add a whole new rendering backend?"
Once SDL's implementation is far enough along, we're actually planning to replace most of the existing backends in FNA3D (a graphics abstraction library to emulate XNA graphics behavior on top of various APIs) with a single backend targeting SDL_GPU. So that's at least one concrete use case for this API. 😄
All three APIs are generally good about retaining backwards compatibility, and none of them have fundamentally changed since their introduction. They map fairly close to the workings of modern GPUs, so unless GPU architecture changes dramatically, I expect a wrapper interface for them to be future-proof for at least the next decade or so.
My wish is to tell people you can't get at the lowlevel specifics of the system, because this caused an enormous amount of grief in SDL2, and led to things like SDL_GL_BindTexture and stuff exposing all sorts of internals in risky ways. That one can choose Metal as a backend at all is because there will be cases where it's desirable to require Metal vs OpenGL or whatever because different drivers work better or worse, or we're debugging a specific backend, etc. That being said, I'll probably lose this fight eventually and let people bypass the GPU API and hook in to the lowlevel API directly. That being said as well: unlike OpenGL, it's not awful to slot one's own Metal/Vulkan/D3D12 code in next to this, since you can do so without stepping on internal state. One can build out a command buffer and slot it into the queue without affecting any command buffers the GPU API wants to slot before or after it. At least I hope so. But also, the hope is that one can get a lot more done without resorting to using the lowlevel API to fill in the gaps. So we'll see how it goes.
There are places we are intentionally limited (for example, there aren't compute shaders at all right now), but things I've intentionally avoided can be added later without hacks. Mostly I've avoided them in the name of reducing complexity, as I think there's probably a sweet spot between what most devs need and what Unreal Engine 5 needs, and I'm looking to service the former at the start.
I think we're (currently) at the point that the major additions to these APIs are ways to make the current stuff faster or more pleasant to use, and less so about new features, and even less so about big paradigm shifts...Most of Vulkan 1.1, 1.2, and 1.3 are about pulling existing extensions into the core spec, which is to say, one can survive without any of them.
We can always do this (or an external library could layer one on top of this API), but it doesn't take long to find oneself overwhelmed with a gigantic API that always demands one more thing. Part of what's nice about these next-gen APIs is that their surface area is pretty small. What isn't nice about them is sometimes they chose potentially unnecessary flexibility over simplicity, which is part of what we're solving here, or at least I hope so. (The word "hope" gets thrown around a lot in here because so much of this is up in the air still and we won't know some things until we get a little further and discover our mistakes. So if something looks wrong to you, please do fight me on it.)
I like the API except for uniforms:
I don't know how simple or how low/high level you want to go. For a command buffer, you allocate and submit it, and SDL frees it once it's done with it. I guess something similar could be done with CPU/GPU buffers.
Do you think it would be sensible to require ARB_direct_state_access for an eventual OpenGL backend? It's part of Core 4.5 and might somewhat help with the internal state problem. I'm assuming at least GL 4.0 will be required; most up to date drivers supporting that will probably have the extension as well.
For uniform buffers, there are common strategies that don't involve a ton of setup (e.g. having one big ring buffer you write to with an increasing offset for each object, instead of having per-object buffers.) Also with OpenGL local uniforms, you need an API call per uniform x shader combination which ends up being a lot compared to uniform buffers.
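The ring-buffer strategy mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: UniformRing, UniformRing_Init, and UniformRing_Push are hypothetical names, the "GPU buffer" is ordinary CPU memory, and the sketch does not wait for the GPU before overwriting old data when it wraps (a real backend would need a fence or per-frame regions for that).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ring allocator over one large uniform buffer. */
typedef struct UniformRing {
    uint8_t *base;    /* CPU-visible mapping of the big buffer */
    size_t capacity;  /* total size in bytes */
    size_t head;      /* next free byte offset */
    size_t alignment; /* required offset alignment, must be a power of two */
} UniformRing;

static void UniformRing_Init(UniformRing *ring, void *memory, size_t capacity, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    ring->base = (uint8_t *)memory;
    ring->capacity = capacity;
    ring->head = 0;
    ring->alignment = alignment;
}

/* Copies len bytes into the ring and returns the offset to bind for this
   draw, or (size_t)-1 if the data can never fit. */
static size_t UniformRing_Push(UniformRing *ring, const void *data, size_t len)
{
    /* Round the write position up to the next aligned offset. */
    size_t offset = (ring->head + ring->alignment - 1) & ~(ring->alignment - 1);

    if (len > ring->capacity) {
        return (size_t)-1;
    }
    if (offset + len > ring->capacity) {
        offset = 0; /* wrap; assumes the GPU is done with the oldest data */
    }
    memcpy(ring->base + offset, data, len);
    ring->head = offset + len;
    return offset;
}
```

Each draw then binds the same buffer at the returned offset, so no per-object buffer objects are created.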
I think most graphics drivers that support OpenGL DSA also support Vulkan, and OpenGL drivers tend to be less maintained, especially around newer features like that.
typedef enum SDL_GpuPixelFormat
{
SDL_GPUPIXELFMT_INVALID,
SDL_GPUPIXELFMT_B5G6R5,
SDL_GPUPIXELFMT_BGR5A1,
SDL_GPUPIXELFMT_RGBA8,
SDL_GPUPIXELFMT_RGBA8_sRGB,
SDL_GPUPIXELFMT_BGRA8,
SDL_GPUPIXELFMT_BGRA8_sRGB,
SDL_GPUPIXELFMT_Depth24_Stencil8
/* !!! FIXME: some sort of YUV format to let movies stream efficiently? */
/* !!! FIXME: s3tc? pvrtc? other compressed formats? We'll need a query for what's supported, and/or guarantee it with a software fallback...? */
} SDL_GpuPixelFormat;
No floating point formats here? It's required for HDR rendering.
I think this is leftover from when the whole thing was a small bump over the 2D rendering API, and I was trying to keep it limited. These can be added.
SDL_GPUPIXELFMT_BGRA8_sRGB,
SDL_GPUPIXELFMT_Depth24_Stencil8
/* !!! FIXME: some sort of YUV format to let movies stream efficiently? */
/* !!! FIXME: s3tc? pvrtc? other compressed formats? We'll need a query for what's supported, and/or guarantee it with a software fallback...? */
We'll definitely need a query. Software fallback is probably not a good idea. There are so many compressed formats out there, including a decoder for each one of them is a lot of bloat that seems to contradict the 'lightweight' nature of this API. Even if we do include fallbacks, a query is needed anyway, because applications that support multiple compressed formats (via Basis Universal for example) will want to know which ones are actually compressed in VRAM.
I was hoping to avoid this, but there isn't a lot of agreement about what compressed texture types are universally available, so yeah, I think we'll be adding this.
Unfortunately there's no such thing as "universally available" when it comes to compressed textures. On the desktop that would probably be S3TC, and probably BPTC/BC7 at the feature level you're targeting. But those are rarely, if ever, supported by mobile GPUs. Those often have ETC/ETC2 (OpenGL ES 3 requires those) and probably ASTC. ETC2 support is also required for desktop GL 4.3 as part of ARB_ES3_compatibility, but that's usually implemented as a software fallback. This sad state of affairs is why Basis Universal exists in the first place.
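As a rough illustration of how an application might use such a query, the sketch below walks a preference list and picks the first compressed format the device reports, falling back to uncompressed RGBA8. The TexFormat enum, the FormatSupportedFn callback, and PickTextureFormat are all hypothetical; no query function exists in the proposed API yet, and MobileLikeQuery is a made-up device used only for demonstration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format list; not the SDL_GpuPixelFormat enum. */
typedef enum TexFormat {
    TEXFMT_NONE,
    TEXFMT_BC7,
    TEXFMT_S3TC,
    TEXFMT_ASTC,
    TEXFMT_ETC2,
    TEXFMT_RGBA8
} TexFormat;

/* Stand-in for whatever support query the API ends up exposing. */
typedef int (*FormatSupportedFn)(TexFormat fmt, void *userdata);

/* Returns the first supported entry of `preferred`, or TEXFMT_RGBA8 if
   none of them is supported (the app then uploads uncompressed data). */
static TexFormat PickTextureFormat(const TexFormat *preferred, size_t count,
                                   FormatSupportedFn supported, void *userdata)
{
    size_t i;
    assert(supported != NULL);
    for (i = 0; i < count; i++) {
        if (supported(preferred[i], userdata)) {
            return preferred[i];
        }
    }
    return TEXFMT_RGBA8;
}

/* Made-up device that only reports ETC2 and ASTC, as a mobile GPU might. */
static int MobileLikeQuery(TexFormat fmt, void *userdata)
{
    (void)userdata;
    return fmt == TEXFMT_ETC2 || fmt == TEXFMT_ASTC;
}
```

An application that transcodes at load time (via Basis Universal, for example) would feed the chosen format to its transcoder, which is why the answer needs to reflect what actually stays compressed in VRAM.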
Possibly, but there are other problems with OpenGL that are going to be more difficult to overcome, so this is still an unknown in general.
We might put in some sort of helper function here: SDL_ReplaceGpuBuffer(void *data, size_t len, SDL_GpuBuffer *buf); ...to say "this entire block of things is changing, so take this pointer and update this GPU buffer, and queue that blit operation now, so it'll be ready when the draw commands I'm about to queue run." Behind the scenes, SDL keeps a pool of SDL_CpuBuffers around that copy the data immediately, queues a blit operation and returns, so the app can continue on immediately and not worry about keeping their copy of the data around. I assume, more or less, this is what glUniform is doing behind the scenes, too.
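The copy-now, blit-later behavior described above can be mocked up without a GPU. This is a sketch under loud assumptions: FakeGpuBuffer stands in for SDL_GpuBuffer, ReplaceGpuBuffer and FlushBlits are hypothetical names, staging memory comes from malloc instead of a pool of SDL_CpuBuffers, and the "blit" is a memcpy performed when the queue is flushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PENDING 8

/* Stand-in for a GPU-side buffer. */
typedef struct FakeGpuBuffer {
    unsigned char data[64];
} FakeGpuBuffer;

typedef struct PendingBlit {
    void *staging;      /* private copy of the caller's data */
    size_t len;
    FakeGpuBuffer *dst;
} PendingBlit;

typedef struct BlitQueue {
    PendingBlit items[MAX_PENDING];
    int count;
} BlitQueue;

/* Copies `data` immediately and queues the upload. The caller may reuse or
   free `data` as soon as this returns. Returns 0 on success, -1 on failure. */
static int ReplaceGpuBuffer(BlitQueue *q, const void *data, size_t len, FakeGpuBuffer *buf)
{
    void *staging;
    assert(q != NULL && data != NULL && buf != NULL);
    if (len == 0 || len > sizeof(buf->data) || q->count >= MAX_PENDING) {
        return -1;
    }
    staging = malloc(len);
    if (staging == NULL) {
        return -1;
    }
    memcpy(staging, data, len);
    q->items[q->count].staging = staging;
    q->items[q->count].len = len;
    q->items[q->count].dst = buf;
    q->count++;
    return 0;
}

/* Stand-in for the GPU executing the queued blits, in order, at submit time. */
static void FlushBlits(BlitQueue *q)
{
    int i;
    for (i = 0; i < q->count; i++) {
        memcpy(q->items[i].dst->data, q->items[i].staging, q->items[i].len);
        free(q->items[i].staging);
    }
    q->count = 0;
}
```

Note that the destination buffer does not change until the queued blit runs, which is the ordering question that matters for any draws queued before or after the call.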
It's a little more complicated with GL uniforms, because the drivers put frequently updated (or just sufficiently small) uniforms into push constants instead of buffers. Or whatever is the non-Vulkan term for those. That's much more efficient for small stuff you update basically every draw, and so SDL should probably expose a similar concept.
Push constants are a level of complexity I'm trying to avoid, honestly. Also notable that WebGPU gave up on them, at least for the time being. (EDIT: Removing that link in hopes that I won't have a million WebGPU people wandering in here. Probably too late.)
I am used to OpenGL, but it makes sense that modern graphics APIs do things differently. It is better to keep the API simple and put a "Porting an OpenGL application" section in the documentation that explains how to translate OpenGL concepts to the new API.
Wouldn't that need a blit command encoder? If not, what would this be expected to do?
As a human reading this, my assumption would be (2) uses old data, while (4) uses new data. (AFAIK this is what would happen in OpenGL.) If you made SDL_ReplaceGpuBuffer take a blit command encoder, it would be obvious when it happens, and wouldn't lead to people making wrong assumptions about ordering Edit: Thinking about it, Metal's |
typedef struct SDL_GpuDepthStecilDescription
{
Uint32 stencil_read_mask;
Uint32 stencil_write_mask;
Uint32 stencil_reference;
SDL_GpuCompareFunction stencil_function;
SDL_GpuStencilOperation stencil_fail;
SDL_GpuStencilOperation depth_fail;
SDL_GpuStencilOperation depth_and_stencil_pass;
} SDL_GpuDepthStecilDescription;
Minor nitpick time: This struct's name is typo'd 😛
What's the status of this mr/branch?
It's still in progress. I've been buried under other SDL3 tasks the last several months.
This was meant to be a nod to Vulkan, which can take an array of command buffers to submit in a single call (presumably with some chance of atomicity), but this turned out to be a hassle for the implementation, and honestly who actually needs this?
This hasn't even been compiled yet--I didn't write it on a Mac!--so this likely has obvious syntax problems and copy/paste mistakes, etc. This implements everything but the elephant in the room--shaders--since I'm still deciding what I want to do there, so even once the thing compiles you can't use it yet. But this is a big chunk of progress!
Still doesn't _do_ anything, as we don't have shader support figured out at any level yet, or have the higher level API symbols exported from SDL itself yet.
SDL started requiring ARC since I wrote the Metal code, so I've rebased this branch past that point and removed the check.
This actually works enough to run test/testgpu_simple_clear.c correctly on a Mac, if you statically link SDL (since none of the GPU API entry points are exported from the shared library or in the dynapi table yet), which is surprisingly motivating for something that does so little. Obviously we need to figure out the shader plan before anything else is going to work, though.
…age. This won't actually work, because we're actually going to need bytecode here, not something we compile, but it's better than having the GLSL code there as a placeholder.
It's an external tool that can be used offline or embedded into an app for use at runtime, but SDL itself isn't going to embed a compiler.
This is still in flux, but then again, so is SDL3, and the warnings make the buildbots (with -Werror) fail.
We're not wired into the dynapi yet, so they'll fail to link.
I'm closing this PR; we're all working out of #9312, which turned out to be quite similar to this API but is getting a lot more effort applied to it. I might mine this PR for tiny pieces, and this thread for its various advice (which I appreciated greatly, everyone!). I encourage everyone to move over to the new PR and give feedback, as the API isn't locked down yet.
This is the SDL3 GPU API.
There's still work to be done (some texture format additions, etc, but notably: we need to load shaders!), but this is the bulk of the initial work.
This has a Metal implementation, and a "dummy" implementation, at the moment. I've adjusted it for SDL3's naming conventions, but there are likely small things to tweak still, and documentation to fill in, and backends to implement, etc.
But I'd like this to go into revision control sooner than later, so we're all looking at the same thing and it won't need to keep merging from the main repository.