Deep Dive: Our workers' based synchronous stack #2317
WebReflection started this conversation in Show and tell.
Have you ever wondered how it is possible that we don't block the main thread (the visited page) while executing remote code that can synchronously read or write any window reference? ... well, you are about to find out, at least at a high level 👍
Worker, Proxy, SharedArrayBuffer & Atomics
These are the ingredients that make it all happen, and the reason why, for UI-intensive tasks, things might not feel as fast as when everything just runs on the main thread. The issue with the main-thread approach, though, is that if your code has a `while True:` loop, takes a long time to bootstrap, or asks for an input, there's not much we can do beyond indirectly crashing the browser or its tab, blocking interactivity entirely, or "stopping the world" for a window prompt that must be answered or dismissed.

The solution to all of this is to use workers, which operate in a separate thread, be that a hyper-thread or a whole different CPU core.
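As a minimal sketch of why workers help (the file names are just illustrative), a blocking loop that would freeze the page on the main thread only keeps its own thread busy inside a worker:

```js
// main.js — spawn the worker: the page stays interactive no matter
// what the worker does.
const worker = new Worker('./worker.js');
worker.postMessage('start');

// worker.js — the equivalent of a `while True:` loop; it blocks this
// worker thread only, not the visited page.
addEventListener('message', () => {
  for (;;) { /* busy forever, yet the tab remains responsive */ }
});
```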
The roundtrip's Tango
In a worker we type, as an example, `window.location.href` to retrieve the current page:

- the `window` Proxy intercepts the need to access its `location` field
- it uses `postMessage` to reach the main thread, attaching what the proxy needs to know as details, such as the SharedArrayBuffer, then it waits synchronously for a notify operation (via `Atomics.notify`)
- the main thread resolves `window` and creates a unique identifier of that `location` object as result
- it stores the `length` of the serialized result at index `1` of the int32 array in charge of viewing the SharedArrayBuffer
- it notifies at index `0` with a positive integer that there is a `length` known to grab content
- the worker wakes up, reads the `length` of the binary result and, this time, it `postMessage`s a new SharedArrayBuffer with enough room to store that binary data + 4 bytes to wait for the next notification
- the main thread writes the binary data starting at byte `4`
- it notifies at index `0` that the operation has been completed
- the worker reads, from byte `4` to `4 + length` as previously communicated, the binary content that was returned

The same dance then repeats to resolve `href` out of that `location` reference: at the end of this convoluted orchestration we'll have the string value representing the current location `href`.
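To make that dance more concrete, here is a minimal sketch of the first roundtrip (the one communicating the `length`), assuming a hypothetical `serialize` helper; this is not the actual PyScript code:

```js
// worker side — ask for `window.location` and block until notified
const sab = new SharedArrayBuffer(8);     // [0] = wake-up flag, [1] = length
const i32 = new Int32Array(sab);
postMessage({ type: 'get', path: ['location'], sab });
Atomics.wait(i32, 0, 0);                  // sleep while index 0 is still 0
const length = i32[1];                    // byte length of the serialized result

// main thread side — resolve the request, store the length, wake the worker
onmessage = ({ data: { path, sab } }) => {
  const i32 = new Int32Array(sab);
  const result = serialize(path.reduce((obj, key) => obj[key], window));
  i32[1] = result.byteLength;             // tell the worker how big the payload is
  Atomics.store(i32, 0, 1);               // flip the flag ...
  Atomics.notify(i32, 0);                 // ... and wake the waiting worker
};
```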
... is this madness? Well, somehow ... yes, but this dance is basically the same one Pyodide or MicroPython or any FFI usually performs: things are mapped bi-directionally, hooks in the Garbage Collector are created to avoid caching all possible references too heavily, and nothing is usually strongly referenced, because these programming languages are strongly dynamic.
Why not map or cache everything ahead of time, then? That's a lovely question, and the simple answer is that we don't really know ahead of time what users are asking for; we can only guarantee that whatever they have asked for produces a meaningful result, as fast as it can be.
Our polyfills' role
The SharedArrayBuffer primitive has, unfortunately, historic reasons to not always be available unless special headers are in place, provided by the server (namely the cross-origin isolation headers `Cross-Origin-Opener-Policy` and `Cross-Origin-Embedder-Policy`) or by a special Service Worker. We orchestrated 2 variants that solve the issue, in a way or another, but both variants add some low to high overhead.
When neither variant is fully available, you can see an error from PyScript in devtools stating as much. This does not mean that PyScript won't work, though: it means that the whole magic provided by SharedArrayBuffer and Atomics.wait cannot practically be used, so it's not possible to simulate synchronous code at that point.
There is still room for improvement!
Nowadays, both ArrayBuffer and SharedArrayBuffer instances can grow in size over time; this wasn't true 2.5 years ago, when we first sketched this whole worker/main dance.
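For reference, a growable SharedArrayBuffer looks like this (the sizes are just illustrative):

```js
// Start tiny, grow in place on demand up to a fixed maximum: no need to
// allocate and postMessage a brand new buffer when more room is needed.
const sab = new SharedArrayBuffer(64 * 1024, {
  maxByteLength: 4 * 1024 * 1024, // e.g. up to 4 MiB
});

console.log(sab.growable);   // true
sab.grow(128 * 1024);        // same buffer, now twice as big
```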
On top of that, there are better primitives to deal with binary data, such as `DataView`, which helps avoid duplicating the amount of RAM needed to serialize or deserialize, while TypedArrays used as views do a wonderful job at being thin abstraction layers to deal with, dropping any unnecessary bloat.
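As a quick sketch of what "views, not copies" means here:

```js
// Both a DataView and a TypedArray read and write the underlying shared
// memory directly: no slice, no copy, no extra RAM.
const sab = new SharedArrayBuffer(64 * 1024);
const header = new DataView(sab, 0, 4);    // first 4 bytes: the header
const payload = new Uint8Array(sab, 4);    // the rest: the payload

header.setInt32(0, payload.length);        // write the length in place
console.log(header.getInt32(0));           // read it back, still zero copies
```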
So let's see how our initial plan/dance can be improved now, keeping the `window.location.href` example as reference, from a worker:

- a single `SharedArrayBuffer` is created, one that can grow up to a few megabytes but starts as tiny as possible (64K or something similar; the upper bound should rarely be reached, or needed at all)
- the `window` Proxy intercepts the need to access its `location` field
- it uses `postMessage` to reach the main thread, attaching what the proxy needs to know as details, always the same SharedArrayBuffer, then it waits synchronously for a notify operation (via `Atomics.notify`)
- the main thread resolves `window` and:
  - writes the result into the SharedArrayBuffer, from byte `4`, keeping the ability to grow it on demand behind the scenes
  - notifies at index `0` with a positive integer that everything is ready
- the worker reads the result back from byte `4`
Done 🥳 ... or better, there is no need anymore to:

- `postMessage` a brand new SharedArrayBuffer per operation
- dance twice to have a length and a content

In a few words, what required 7 steps (ask / serialize / binary / length → retrieve / binary / deserialize) and 2X the RAM could instead use 3 steps (ask / binary serialization → binary deserialization), reducing complexity and code bloat too, while improving performance by quite some margin. A sketch of this improved handshake follows.
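Here is a minimal sketch of that single-buffer roundtrip, again assuming hypothetical `serialize`/`deserialize` helpers rather than the actual PyScript internals:

```js
// worker side — `sab` is the single growable buffer, shared once at startup
const i32 = new Int32Array(sab, 0, 1);
postMessage({ type: 'get', path: ['location', 'href'] });
Atomics.wait(i32, 0, 0);                            // one single synchronous wait
const length = Atomics.load(i32, 0);                // the "ready" int doubles as length
const value = deserialize(new Uint8Array(sab, 4, length));

// main thread side — serialize once, write in place, notify once
onmessage = ({ data: { path } }) => {
  const bytes = serialize(path.reduce((obj, key) => obj[key], window));
  if (sab.byteLength < 4 + bytes.length) sab.grow(4 + bytes.length);
  new Uint8Array(sab, 4).set(bytes);                // payload starts at byte 4
  const i32 = new Int32Array(sab, 0, 1);
  Atomics.store(i32, 0, bytes.length);              // positive integer: ready + length
  Atomics.notify(i32, 0);
};
```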
An extra detail ...
I need to figure out if using a SharedWorker to create a unique SharedArrayBuffer, usable across all main pages and their workers, would work. The goal is a predictable amount of RAM needed to orchestrate anywhere from one to dozens of tabs running PyScript, each with one to many workers. That also requires the Web Locks API, but if it can be done for SQLite I believe it can be done for our project too ... still surfing the edge of modern APIs, but hey, we just want the best by all means 😇
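For context, the Web Locks API part would look roughly like this (the lock name is hypothetical):

```js
// Only one tab/worker at a time enters the callback for this lock name,
// which is what would make coordinating a single shared buffer safe.
navigator.locks.request('pyscript-shared-buffer', async () => {
  // exclusive section: read/write the shared state without races
});
```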
We are close but not there yet
If you have followed recent community calls, you are probably bored by me demoing and benchmarking all the things, but here is an update on our current state.
Are there unknowns?
At the logical level, I expect performance to improve: that's a natural consequence of removing intermediate steps and halving the time it takes to ask for and retrieve data. On the other hand, I cannot concretely measure improvements until this has been done ... I mean, I could hack something together, but that still wouldn't provide real-world numbers, so maybe I should not focus on that.
Last, but not least, it's unclear if/how I can polyfill this new stack in sabayon too, but I also expect that dance to be even faster, because it won't need more than a single synchronous XMLHttpRequest to retrieve data, as opposed to `2` of them plus the whole roundtrip currently done to figure out who asked for what (multiple tabs using PyScript, as an example).

Conclusion
I am very glad I've managed to track down all the performance issues that were hidden either in implementations (the serializer we currently use) or in logic (usage that hadn't caught up with the most modern APIs), and that I'm now able to "draw" on the board what the current goal is. I hope everything will work as planned, and that it won't take too long to have a release showcasing performance that is finally reasonable enough to stop using the main thread. Please follow this space to hear about further progress, and don't be afraid to ask anything you'd like about these topics 👋