Move add_to_alignment logic to BufferVec #9928

nicopap · 2023-09-26T07:41:50Z

Objective

The "Uninitialized buffer uniform tail" trick was used by both skinning and morphing.

We should abstract this and merge them to have a consistent and explicit implementation. We may also take the opportunity to optimize it.

Solution

Move the add_to_alignment logic to a BufferVec impl block
Make the const part of the calculation const, and panic at compile time when alignment is impossible.
Devise a new way to extend the BufferVec that is as efficient as possible.
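As an illustration of the "const part" of the calculation, here is a minimal sketch of a `const fn` that panics during compilation when it is evaluated in a const context with an impossible alignment. The name `elems_per_align`, the byte-based parameter, and the 256-byte figure are assumptions made for this example, not the PR's actual code.

```rust
use std::mem::size_of;

/// Hypothetical helper: how many `T` elements fit exactly in `align_bytes`.
/// When evaluated in a const context, a failed assertion is a compile error.
const fn elems_per_align<T>(align_bytes: usize) -> usize {
    let size = size_of::<T>();
    assert!(size != 0, "cannot align a buffer of zero-sized elements");
    assert!(
        align_bytes % size == 0,
        "alignment must be a multiple of the element size"
    );
    align_bytes / size
}

// Evaluated at compile time; an impossible combination would fail the build.
const U32_PER_256_BYTES: usize = elems_per_align::<u32>(256);
```

Because the result is a `const`, every value derived from it is also known to the compiler.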

The goal is to avoid the overhead of push, which involves checking for available capacity on each iteration.

We set a new capacity and set the newly allocated memory positions to zero, because going through any other Vec method leaves the Rust compiler unable to optimize this.
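The approach described above can be sketched roughly as follows. This is not the PR's code: `Zeroable` here is a local stand-in for `bytemuck::Zeroable`, and `pad_to_multiple` and its `align` parameter (counted in elements) are hypothetical names.

```rust
use std::ptr;

/// Local stand-in for `bytemuck::Zeroable` (assumption): implementors promise
/// that the all-zero bit pattern is a valid value of the type.
unsafe trait Zeroable: Copy {}
unsafe impl Zeroable for u32 {}
unsafe impl Zeroable for [f32; 4] {}

/// Pads `values` with zeroed elements until its length is a multiple of `align`.
fn pad_to_multiple<T: Zeroable>(values: &mut Vec<T>, align: usize) {
    assert!(align > 0, "align must be non-zero");
    let len = values.len();
    let to_add = (align - len % align) % align;
    // A single capacity check for the whole operation.
    values.reserve(to_add);
    unsafe {
        // SAFETY: `reserve` guarantees room for `to_add` more elements, and
        // all-zero bytes are a valid `T` per the `Zeroable` contract.
        ptr::write_bytes(values.as_mut_ptr().add(len), 0, to_add);
        // SAFETY: the elements in `len..len + to_add` were just initialized.
        values.set_len(len + to_add);
    }
}
```

The zero fill is a single `write_bytes` call rather than a loop of pushes, so there is no per-element capacity check or length increment.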

Alternatives

I've tried a lot of different approaches to improve performance.

Using `buffer.values.extend((0..to_add).map(|_| T::zeroed()))`

While in isolated code this inlines the whole operation, in the context of the extract systems it still calls SpecializedExtend as an external function, and is slower than the current `while` solution.

I can confirm that in isolation this is the best option: Range has a TrustedLen impl, which allows the compiler to remove a lot of bounds checks and makes the optimizer more capable. This is in contrast to iter::repeat(T::zeroed()).take(to_add).
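For reference, a minimal sketch of the two iterator forms mentioned above, with `u32` and a literal `0` standing in for `T` and `T::zeroed()`. The function names are hypothetical. Both produce the same contents; any difference is only in the generated code.

```rust
use std::iter;

/// Appends `to_add` zeros using a mapped range.
fn extend_with_range(values: &mut Vec<u32>, to_add: usize) {
    values.extend((0..to_add).map(|_| 0u32));
}

/// Appends `to_add` zeros using `repeat(..).take(..)`.
fn extend_with_repeat(values: &mut Vec<u32>, to_add: usize) {
    values.extend(iter::repeat(0u32).take(to_add));
}
```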

Using a zeroed vector

This does an allocation, and Rust is not capable of using calloc on bytemuck::Zeroable types¹, so it allocates the vec and pushes zeros to it, then calls ptr::copy_nonoverlapping to copy them to the end of buffer.values. I'm not sure it gains anything over the other solutions, especially when we expect the number of additional zeros to be between 4 and 64.
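A minimal sketch of the zeroed-vector variant, assuming `u32` elements and a hypothetical function name. Note that with `u32` the `vec!` call can take the specialized zeroed path mentioned in the footnote, which a generic `bytemuck::Zeroable` type would not get.

```rust
/// Pads `values` by building a temporary zeroed vector and copying it over.
fn pad_with_temp_vec(values: &mut Vec<u32>, to_add: usize) {
    // Temporary buffer holding the padding.
    let zeros = vec![0u32; to_add];
    // Copies the zeros to the end of `values`.
    values.extend_from_slice(&zeros);
}
```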

Using `set_len` without initialization

This is very unsafe, as it breaks an important invariant of Vec (no uninitialized memory within len). In Rust it is unsound for any value to be uninitialized, even for types like i32 where all bit patterns are accepted, because "uninitialized" in C terms means the value is not fixed, which breaks a lot of Rust's assumptions. According to my research, though, it should be sound here, because the contents of the values field of BufferVec are never read (so fixedness is irrelevant). In fact, wgpu handles it like FFI data, using ptr::copy_nonoverlapping and passing it directly to the driver.

For our specific use-case of add_to_alignment, it's fine, because even in the shader we do not read the uninitialized values. I didn't test performance on the current iteration, but with this approach we get a 3% speedup on extract_skinned_meshes.

However, this requires disabling a forbid clippy lint. I'm comfortable enough to say "this is fine" but I suspect this would be rejected by most of the community.
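For comparison, the same "grow without writing" idea can be expressed without breaking Vec's invariant by storing `MaybeUninit<T>` elements. This is a different technique from the one described above, swapped in purely for illustration; the function name and the `u32` element type are assumptions.

```rust
use std::mem::MaybeUninit;

/// Grows `values` by `to_add` elements without writing to the new slots.
fn grow_uninit(values: &mut Vec<MaybeUninit<u32>>, to_add: usize) {
    values.reserve(to_add);
    let new_len = values.len() + to_add;
    // SAFETY: `reserve` guarantees `capacity >= new_len`, and
    // `MaybeUninit<u32>` is allowed to hold uninitialized memory.
    unsafe { values.set_len(new_len) };
}
```

The cost is that every consumer of the vector then has to deal with `MaybeUninit` values, which is a much more invasive change than padding with zeros.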

`push` with explicit alloc elision

let mut my_vec = Vec::new();
// Reserve room for all 12 elements up front.
my_vec.reserve(12);
for _ in 0..12 {
  my_vec.push(0);
}

Would you believe that this generates a capacity check for each loop iteration? We know we will never overflow capacity, though! Here is the way to remove them:

let mut my_vec = Vec::new();
my_vec.reserve(12);
for _ in 0..12 {
  if my_vec.len() == my_vec.capacity() {
    // SAFETY: `reserve(12)` guarantees room for 12 pushes, so this branch is never taken.
    unsafe { std::hint::unreachable_unchecked() };
  }
  my_vec.push(0);
}

When applying this to the add_to_alignment method, we get something pretty nice. But, for some reason, we still have individual increments of the len field, and each 0 is added individually.
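Applied to padding, the hint could look like the sketch below. The function name and the `u32` element type are assumptions; the real method operates on a BufferVec.

```rust
/// Pushes `to_add` zeros, telling the compiler the capacity is never exceeded.
fn pad_with_hinted_push(values: &mut Vec<u32>, to_add: usize) {
    values.reserve(to_add);
    for _ in 0..to_add {
        if values.len() == values.capacity() {
            // SAFETY: `reserve(to_add)` guarantees room for `to_add` more
            // elements, and this loop pushes at most `to_add` of them.
            unsafe { std::hint::unreachable_unchecked() };
        }
        values.push(0);
    }
}
```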

prefer consts

One important insight is that the compiler handles values derived from consts much better.

So instead of:

let len = buffer.values.len();
let t_aligned_len = div_ceil(len, t_align) * t_align;
let to_add = t_aligned_len - len;
buffer.values.extend((0..to_add).map(|_| T::zeroed()));

We could do:

// `t_aligned_len` is computed as before.
buffer.values.extend((0..t_align).map(|_| T::zeroed()));
buffer.values.truncate(t_aligned_len);

From my testing, this helps a lot, because t_align will be known at compile time, since it is directly derived from a constant value, and the compiler is more capable of optimizing around that.
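To make the extend-then-truncate idea concrete, here is a minimal sketch with the alignment passed as a const generic parameter. `pad_to_const_multiple`, `T_ALIGN`, and the `u32` element type are assumptions for this example.

```rust
/// Pads `values` to a multiple of `T_ALIGN` elements by always extending with
/// a compile-time-known number of zeros, then trimming the excess.
fn pad_to_const_multiple<const T_ALIGN: usize>(values: &mut Vec<u32>) {
    assert!(T_ALIGN > 0, "T_ALIGN must be non-zero");
    let len = values.len();
    let t_aligned_len = len.div_ceil(T_ALIGN) * T_ALIGN;
    // The number of elements added here is a constant.
    values.extend((0..T_ALIGN).map(|_| 0u32));
    // `len + T_ALIGN >= t_aligned_len`, so truncating never loses real data.
    values.truncate(t_aligned_len);
}
```

The extend count no longer depends on the runtime length; only the final `truncate` does.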

¹ Note that Vec has a specialized implementation that supports calloc when initializing zeroed vectors, but only on std types such as integer types.

janhohenheim · 2025-05-17T17:01:34Z

Triage: has merge conflicts and is a draft
@lkolbly do you want to update this PR or should I tag it S-Needs-Adoption? :)

Move add_to_alignment logic to BufferVec

b85bb12

nicopap added the A-Rendering (Drawing game state to the screen) and C-Code-Quality (A section of code that is hard to understand or change) labels on Sep 26, 2023

nicopap added this to the 0.13 milestone Oct 25, 2023

alice-i-cecile removed this from the 0.13 milestone Jan 24, 2024

janhohenheim added the S-Waiting-on-Author (The author needs to make changes or address concerns before this can be merged) label on May 17, 2025
