
k/generator: do not use standard vectors for offset commit protocol #26280

Open
wants to merge 2 commits into
base: dev
from

mmaslankaprv
Member

mmaslankaprv commented on May 29, 2025

In transactional and standard offset commit requests and replies, topic partition offsets are stored as vectors. The number of partitions/topics in a single request may grow large, so we should use a fragmented vector when handling these requests.

Changed the Kafka protocol generator overrides for offset commit requests and responses to use the default chunked_vector.
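A minimal sketch of the fragmented-vector idea, assuming fixed-size chunks. `simple_chunked_vector` and `partition_offset` are hypothetical illustrations, not Redpanda's `chunked_vector` or the generated request types. Each chunk is allocated on its own, so the size of any single element buffer stays bounded however many partitions a request carries, whereas a `std::vector` needs one contiguous buffer for all of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fragmented vector (NOT Redpanda's chunked_vector).
// Elements live in independently allocated chunks of at most ChunkElems
// elements; only the small table of chunk handles grows with the element count.
template <typename T, std::size_t ChunkElems = 128>
class simple_chunked_vector {
public:
    void push_back(T value) {
        if (_size % ChunkElems == 0) {
            // The last chunk is full (or none exists yet): start a new one.
            _chunks.emplace_back();
            _chunks.back().reserve(ChunkElems);
        }
        _chunks.back().push_back(std::move(value));
        ++_size;
    }

    T& operator[](std::size_t i) {
        assert(i < _size);
        // Locate the chunk, then the slot inside it.
        return _chunks[i / ChunkElems][i % ChunkElems];
    }

    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }

    // Bytes needed for one full chunk of elements.
    static constexpr std::size_t max_chunk_bytes() { return ChunkElems * sizeof(T); }

private:
    std::vector<std::vector<T>> _chunks;
    std::size_t _size = 0;
};

// Hypothetical per-partition entry of an offset commit request.
struct partition_offset {
    std::int32_t partition;
    std::int64_t offset;
};
```

With 10,000 entries and 128-element chunks, this keeps 79 separate buffers of at most 128 elements each instead of one 10,000-element buffer. The trade-off is an extra indirection on every indexed access.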

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.1.x
  • v24.3.x
  • v24.2.x

Release Notes

Improvements

  • Fixed large allocation issues when handling OffsetCommits

The implementation of `then()` uses `std::forward<Func>` when creating the
continuation, so passing an lvalue reference to `fut.then(...)` forces a
copy. Replaced it with a lambda that captures `func` by reference, so the
whole `func` does not have to be copied.

Signed-off-by: Michał Maślanka <michal@redpanda.com>
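The copy can be reproduced in isolation. `then_like` is a hypothetical stand-in that, as the commit message describes for `then()`, constructs its stored continuation from `std::forward<Func>(func)`; `big_callable` is a hypothetical callable that counts its own copies. Neither is Seastar or Redpanda code.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical callable with large state that counts how often it is copied.
struct big_callable {
    static inline int copies = 0;
    std::vector<int> payload = std::vector<int>(1024, 1);  // stands in for big captures

    big_callable() = default;
    big_callable(const big_callable& other) : payload(other.payload) { ++copies; }
    big_callable(big_callable&&) noexcept = default;

    int operator()() const { return static_cast<int>(payload.size()); }
};

// Hypothetical model of a continuation-taking API: it stores a decayed copy of
// the callable built with std::forward, so an lvalue argument is copied.
template <typename Func>
auto then_like(Func&& func) {
    std::decay_t<Func> stored(std::forward<Func>(func));
    return stored();
}

// Passing the lvalue directly: the whole big_callable is copied.
int call_directly(const big_callable& f) { return then_like(f); }

// Wrapping it in a lambda that captures by reference: only the reference is copied.
int call_via_ref(const big_callable& f) {
    return then_like([&f] { return f(); });
}
```

Passing `f` directly copies the whole callable on each call, while the reference-capturing lambda copies only a reference. The catch is that the referenced `func` must outlive the continuation, and every invocation then shares one object, which is the freshness question raised in review.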
mmaslankaprv force-pushed the fix-large-allocation-in-offset-commits branch from ca99c81 to dc94b03 on May 29, 2025 10:22
Collaborator

vbotbuildovich commented May 29, 2025

Retry command for Build#66564

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/transactions/tx_atomic_produce_consume_test.py::TxAtomicProduceConsumeTest.test_basic_tx_consumer_transform_produce@{"with_failures":false}
tests/rptest/transactions/transactions_test.py::TransactionsTest.change_static_member_test
tests/rptest/transactions/transactions_test.py::TransactionsTest.rejoin_member_test
tests/rptest/transactions/producers_api_test.py::ProducersAdminAPITest.test_producers_state_api_during_load
tests/rptest/transactions/tx_verifier_test.py::TxVerifierTest.test_all_tx_tests

Collaborator

CI test results

test results on build#66564
| test_class | test_method | test_arguments | test_kind | job_url | test_status | passed | reason |
|---|---|---|---|---|---|---|---|
| CloudRetentionTest | test_cloud_retention | {"cloud_storage_type": 2, "max_consume_rate_mb": null} | ducktape | https://buildkite.com/redpanda/redpanda/builds/66564#01971bc9-e123-4446-bc35-ac695c8db16d | FLAKY | 20/21 | Upstream reliability is '97.51552795031056'. Current run reliability is '95.23809523809523'. Drift is 2.27743 and the allowed drift is set to 50. The test should PASS |
| ProducersAdminAPITest | test_producers_state_api_during_load | | ducktape | https://buildkite.com/redpanda/redpanda/builds/66564#01971bc9-e122-468d-87b6-d7b060cae925 | FAIL | 0/21 | The test has failed across all retries |
| TransactionsTest | change_static_member_test | | ducktape | https://buildkite.com/redpanda/redpanda/builds/66564#01971bc9-e122-468d-87b6-d7b060cae925 | FAIL | 0/21 | The test has failed across all retries |
| TransactionsTest | rejoin_member_test | | ducktape | https://buildkite.com/redpanda/redpanda/builds/66564#01971bc9-e122-468d-87b6-d7b060cae925 | FAIL | 0/21 | The test has failed across all retries |
| TxAtomicProduceConsumeTest | test_basic_tx_consumer_transform_produce | {"with_failures": false} | ducktape | https://buildkite.com/redpanda/redpanda/builds/66564#01971bc9-e123-4446-bc35-ac695c8db16d | FAIL | 0/21 | The test has failed across all retries |
| TxVerifierTest | test_all_tx_tests | | ducktape | https://buildkite.com/redpanda/redpanda/builds/66564#01971bc9-e122-468d-87b6-d7b060cae925 | FAIL | 0/21 | The test has failed across all retries |

In transactional and standard offset commit requests and replies, topic
partition offsets are stored as vectors. The number of
partitions/topics in a single request may grow large, so we
should use a fragmented vector when handling these requests.

Changed the Kafka protocol generator overrides for offset commit
requests and responses to use the default `chunked_vector`.

Signed-off-by: Michał Maślanka <michal@redpanda.com>
mmaslankaprv force-pushed the fix-large-allocation-in-offset-commits branch from dc94b03 to 50cf97b on May 29, 2025 15:47
Comment on lines -71 to +72 (removed lines prefixed with -, added lines with +)
-    return fut.then(func).handle_exception(
-      [&eptr](std::exception_ptr ex) mutable {
+    return fut.then([&func] { return func(); })
+      .handle_exception([&eptr](std::exception_ptr ex) mutable {
Member

dotnwat commented on May 29, 2025

this looks ok, but i'm having trouble determining if this is safe w.r.t. guaranteeing that the retry function will always execute with a fresh version of its original captures. i think that is true if func is the user-provided function, but i'm less sure that is true for gated_retry_with_mitigation_impl, which wraps the user function, and how the const-ness of operator()() propagates to the captured function.

maybe @BenPope knows. i've seen this exact function have similar proposals made recently and I'm not sure where things stand with those changes and if there are footguns here.

--

more broadly, though, and not necessarily blocking this PR: do we want to optimize for large captures? it seems like the caller could always arrange for a large object to be captured as a reference, shared pointer, etc...?
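One way a caller could keep captures cheap to copy, as the comment suggests, is to hold the large state behind a `std::shared_ptr`. A minimal sketch; `large_state` and `make_cheap_callable` are hypothetical names. Copying the returned lambda copies only the pointer and bumps the reference count, so a helper that copies its callable by value pays almost nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical large state that a retried operation needs to read.
struct large_state {
    std::vector<int> data = std::vector<int>(1 << 16, 7);
};

// Returns a callable whose only capture is a shared_ptr: copying the callable
// copies the pointer (a reference-count increment), never the vector itself.
auto make_cheap_callable(std::shared_ptr<const large_state> shared) {
    return [state = std::move(shared)] { return state->data.size(); };
}
```

Because the state is shared rather than copied, every copy of the callable sees the same object; holding it through a pointer to `const` keeps one attempt from modifying what a later retry reads, at least through that pointer.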

Member

relevant? #26215 (comment)

Member Author

the static assert passes, i.e. it should not be possible to do std::move() in the body of func's operator()()
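A sketch of the kind of check being discussed, assuming a hypothetical `retry_by_ref` helper that invokes one callable object on every attempt instead of a fresh copy. The `static_assert` requires `func` to be invocable through a `const` reference, so a `mutable` lambda is rejected at compile time. Inside a `const` `operator()`, `std::move` on a by-value capture yields a `const` rvalue, which binds to the copy constructor, so the capture is copied rather than moved from and stays intact for the next attempt.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical retry helper (not the Redpanda implementation): calls the same
// func object on each attempt and rethrows after max_attempts failures.
template <typename Func>
auto retry_by_ref(int max_attempts, const Func& func) {
    // Rejects callables whose operator() is non-const (e.g. mutable lambdas),
    // so by-value captures cannot be modified or moved from between attempts.
    static_assert(std::is_invocable_v<const Func&>,
                  "func must be invocable as const");
    assert(max_attempts >= 1);
    for (int attempt = 1;; ++attempt) {
        try {
            return func();
        } catch (const std::exception&) {
            if (attempt == max_attempts) {
                throw;  // out of attempts: propagate the last error
            }
        }
    }
}
```

The check only covers by-value captures. A `const` call operator can still mutate state it captures by reference or pointer (for example `[&counter] { ++counter; }`), so sharing one object across retries is equivalent to using fresh copies only when the callable keeps no such external state.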

mmaslankaprv requested a review from dotnwat on May 30, 2025 13:17
3 participants