Skip to content

Make map operations deterministic in quorum queues (backport #13971) #14025

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 4, 2025

Conversation

mergify[bot]
Copy link

@mergify mergify bot commented Jun 4, 2025

Prior to this commit map iteration order was undefined in quorum queues and could therefore be different on different versions of Erlang/OTP.

Example:

OTP 26.2.5.3

Erlang/OTP 26 [erts-14.2.5.3] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V14.2.5.3 (press Ctrl+G to abort, type help(). for help)
1> maps:foreach(fun(K, _) -> io:format("~b,", [K]) end, maps:from_keys(lists:seq(1, 33), ok)).
4,25,8,1,23,10,7,9,11,12,28,24,13,3,18,29,26,22,19,2,33,21,32,20,17,30,14,5,6,27,16,31,15,ok

OTP 27.3.3

Erlang/OTP 27 [erts-15.2.6] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V15.2.6 (press Ctrl+G to abort, type help(). for help)
1> maps:foreach(fun(K, _) -> io:format("~b,", [K]) end, maps:from_keys(lists:seq(1, 33), ok)).
18,4,12,19,29,13,2,7,31,8,10,23,9,15,32,1,25,28,20,6,11,17,24,14,33,3,16,30,21,5,27,26,22,ok

This can lead to non-determinism on different members. For example, different members could potentially return messages in a different order.

This commit introduces a new machine version fixing this bug.

The property test added by this PR shows that non-determinism could have occurred (prior to this PR) even on different nodes running the same OTP version (tested on macOS and OTP 27.3.3). Such a test failure can be reproduced by changing the rabbit_fifo machine version from 6 to 5 in the new test case.

I also tested manually that the new test succeeds if the CT node runs OTP 27.3.3 and another locally started Erlang node runs OTP 26.2.5.3.


This is an automatic backport of pull request #13971 done by Mergify.

ansd added 4 commits June 4, 2025 09:17
(cherry picked from commit f293c11)

# Conflicts:
#	deps/rabbit/test/queue_utils.erl
Prior to this commit map iteration order was undefined in quorum queues
and could therefore be different on different versions of Erlang/OTP.

Example:

OTP 26.2.5.3
```
Erlang/OTP 26 [erts-14.2.5.3] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V14.2.5.3 (press Ctrl+G to abort, type help(). for help)
1> maps:foreach(fun(K, _) -> io:format("~b,", [K]) end, maps:from_keys(lists:seq(1, 33), ok)).
4,25,8,1,23,10,7,9,11,12,28,24,13,3,18,29,26,22,19,2,33,21,32,20,17,30,14,5,6,27,16,31,15,ok
```

OTP 27.3.3
```
Erlang/OTP 27 [erts-15.2.6] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit]

Eshell V15.2.6 (press Ctrl+G to abort, type help(). for help)
1> maps:foreach(fun(K, _) -> io:format("~b,", [K]) end, maps:from_keys(lists:seq(1, 33), ok)).
18,4,12,19,29,13,2,7,31,8,10,23,9,15,32,1,25,28,20,6,11,17,24,14,33,3,16,30,21,5,27,26,22,ok
```

This can lead to non-determinism on different members. For example, different
members could potentially return messages in a different order.

This commit introduces a new machine version fixing this bug.

(cherry picked from commit 2db4843)
This commit adds a property test that applies the same Ra commands in
the same order on two different Erlang nodes. The state in which both nodes end
up should be exactly the same.

Ideally, the two nodes should run different OTP versions because this
way we could test for any non-determinism across OTP versions.

However, for now, having a test with both nodes having the same OTP
verison is good enough because running this test with rabbit_fifo
machine version 5 fails while machine version 6 succeeds.

This reveales another interesting: The default "undefined" map order can
even be different using different Erlang nodes with the **same** OTP
version.

(cherry picked from commit 2f78318)
For test case leader_locator_balanced the actual leaders elected were
nodes 1, 3, 1 because they know about machine version 6 while node 2
only knows about machine version 5.

(cherry picked from commit 21b6088)
@mergify mergify bot added the conflicts label Jun 4, 2025
@mergify mergify bot assigned ansd Jun 4, 2025
Copy link
Author

mergify bot commented Jun 4, 2025

Cherry-pick of f293c11 has failed:

On branch mergify/bp/v4.1.x/pr-13971
Your branch is up to date with 'origin/v4.1.x'.

You are currently cherry-picking commit f293c11a0.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   deps/rabbit/test/queue_utils.erl

no changes added to commit (use "git add" and/or "git commit -a")

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@michaelklishin michaelklishin added this to the 4.1.1 milestone Jun 4, 2025
@ansd ansd merged commit aeece38 into v4.1.x Jun 4, 2025
537 of 539 checks passed
@ansd ansd deleted the mergify/bp/v4.1.x/pr-13971 branch June 4, 2025 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy