-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
semaphore.h: handle spurious wakeups in TimedWait() on Linux #1021
Conversation
❌ Integration test FAILEDRequested by @dconeybe on commit 26e918b
Add flaky tests to go/fpl-cpp-flake-tracker |
// Return failure, since the timeout expired. | ||
return false; | ||
case EINVAL: | ||
assert("sem_timedwait() failed with EINVAL" == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making sure, you want this to NOT actually assert in release builds, yes? (the default assert behavior?)
@@ -172,7 +172,30 @@ class Semaphore { | |||
return WaitForSingleObject(semaphore_, milliseconds) == 0; | |||
#else // not windows and not mac - should be Linux. | |||
timespec t = internal::MsToAbsoluteTimespec(milliseconds); | |||
return sem_timedwait(semaphore_, &t) == 0; | |||
while (true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add a test exercising this failure/fix to semaphore_test.cc? (Or, even better, does re-enabling the disabled MultithreadedStressTest in that file now work?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vindication! One of the nightly test runs failed with this assertion failure: semaphore.h:189: bool firebase::Semaphore::TimedWait(int): Assertion `"sem_timedwait() failed with EINVAL" == 0' failed. https://github.com/firebase/firebase-cpp-sdk/runs/7338303645 According to https://linux.die.net/man/3/sem_timedwait, EINVAL occurs in one of two cases:
|
…s with the TimedWait() fix in #1021
This fixes a latent bug where
Future::Wait(int timeout_milliseconds)
would occasionally return prematurely, when neither the timeout had expired nor the Future been completed. This was due to the implementation ofSemaphore::TimedWait(int milliseconds)
which callssem_timedwait()
on Linux and Android and neglected to check if theerrno
wasEINTR
, in which case the wait should be restarted.This bug surfaced as the integration tests for Firestore's
TransactionTest.TestMaxAttempts
flakily failing due to a call toFuture.Await(int timeout_milliseconds)
returning as if it had timed out when, in fact, no timeout had occurred.Note that this fix only affects Linux and Android (which runs Linux under the hood).