Very slow JSON serialization and deserialization and blocking event loop #489

Open
Luksalos opened this issue Nov 25, 2024 · 1 comment
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Luksalos
What is the current behavior?

PrerecordedResponse.from_json(result) (link to code) is very slow, especially for larger inputs. The cause is the Dataclasses JSON library (dataclasses-json), whose maintainers have been aware of the performance issue since 2020 but have not addressed it. Besides .from_json(), the .to_dict() operation is also very slow; that is the call one would use to parse the output of the Deepgram SDK into their own Pydantic model.

In our case, for recordings lasting around 1 hour:

from deepgram import PrerecordedOptions

source = {"url": signed_url}
options = PrerecordedOptions(
    model="nova-2-general",
    diarize=True,
    utterances=True,
    paragraphs=True,
)
deepgram.listen.rest.v("1").transcribe_url(source, options=options)

For that ~1-hour recording, .from_json() takes over 10 seconds, while Pydantic parsing takes ~30 ms.
For a 7-minute recording, .from_json() took ~1.7 seconds, while Pydantic parsing took ~5 ms.
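
Timings like the ones above can be reproduced with a small harness. The following is only a sketch: `time_call` is a hypothetical helper (not part of the SDK), and `json.loads` with a made-up payload stands in for the real parsing call and response body.

```python
import json
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Stand-in payload (assumption); in practice this would be the raw JSON
# body of a transcription response.
raw = json.dumps(
    {"words": [{"word": "hi", "start": i * 0.5, "end": i * 0.5 + 0.4} for i in range(1000)]}
)

# json.loads is a placeholder for the call being measured.
parsed, elapsed = time_call(json.loads, raw)
print(f"json.loads: {elapsed * 1000:.2f} ms for {len(parsed['words'])} words")
```

Passing the slow call and a Pydantic-based parser through the same helper, with the same response body, would give directly comparable numbers.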

This issue also affects the asynchronous version, where the problem is even more significant as it blocks the event loop for a long time.
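
As a caller-side workaround until the parsing itself is faster, the blocking call can be moved off the event loop. This is a minimal sketch, assuming the raw JSON string is available to the caller; `parse_off_loop` is a hypothetical helper, and `json.loads` is a placeholder for the slow parser. It requires Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
import json
from typing import Any, Callable


async def parse_off_loop(parser: Callable[[str], Any], raw: str) -> Any:
    """Run a blocking parser in a worker thread so the event loop can keep scheduling."""
    return await asyncio.to_thread(parser, raw)


async def main() -> Any:
    raw = json.dumps({"results": {"utterances": []}})  # stand-in response body
    # json.loads is a placeholder for the slow call, e.g. a from_json classmethod.
    return await parse_off_loop(json.loads, raw)


result = asyncio.run(main())
```

Because the parsing is CPU-bound pure Python, the worker thread still competes for the GIL, so this keeps the loop responsive rather than making the parse faster. A process pool would be needed to take the work off the interpreter entirely.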

Expected behavior

JSON serialization and deserialization shouldn't take that long, and CPU-heavy operations should definitely not block the event loop. Please consider using Pydantic or raw dataclasses.
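
To illustrate the "raw dataclasses" option, here is a minimal sketch that builds nested objects with hand-written constructors instead of a generic decoding layer. The `Word` and `Utterance` classes and their fields are simplified assumptions for illustration, not the SDK's actual response types.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Word:
    word: str
    start: float
    end: float


@dataclass
class Utterance:
    transcript: str
    words: List[Word]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Utterance":
        # Construct nested objects explicitly, field by field.
        words = [Word(w["word"], w["start"], w["end"]) for w in data["words"]]
        return cls(transcript=data["transcript"], words=words)


raw = (
    '{"transcript": "hello world", "words": ['
    '{"word": "hello", "start": 0.0, "end": 0.4}, '
    '{"word": "world", "start": 0.5, "end": 0.9}]}'
)
utterance = Utterance.from_dict(json.loads(raw))
```

The trade-off is that each class needs its own `from_dict`, and optional or missing fields have to be handled by hand.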

@Luksalos Luksalos changed the title Blocking Very slow JSON serialization and deserialization and blocking event loop Nov 25, 2024
@jjmaldonis
Contributor

Adding __slots__ to the dataclasses may help -- this is worth a quick try. I have not tested, and I don't know if dataclasses actually support __slots__, but adding the class variable can result in dramatic speed improvements.
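
For reference, a dataclass can declare `__slots__` by hand as long as its fields have no default values, and on Python 3.10+ `@dataclass(slots=True)` does the same thing automatically. The sketch below uses a hypothetical `Word` class, not one of the SDK's types, and whether this speeds up `.from_json()` is untested.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # No default values here: a class-level default would clash with __slots__.
    __slots__ = ("word", "start", "end")
    word: str
    start: float
    end: float


w = Word("hello", 0.0, 0.4)
has_dict = hasattr(w, "__dict__")  # slotted instances carry no per-instance __dict__
```

Slots mainly reduce per-instance memory and attribute-access overhead, so any gain would depend on how much of the parsing time is spent constructing objects.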

Overall, my opinion is that dataclasses begin to break down once their usage extends past their immediate value proposition, at which point a different implementation tends to work better. Pydantic tends to be used for input validation, which isn't a critically important feature within this SDK because responses do not need to be validated. That said, I'm a big fan of pydantic in general, and choosing a different class implementation may give us the speed and flexibility wins we're looking for. However, moving away from dataclasses would be a major breaking change.

@jpvajda jpvajda added enhancement New feature or request help wanted Extra attention is needed labels Feb 14, 2025
3 participants