Fix APNs P8 HTTP2 error recovery #734
Merged
Original pull request: #723
Rebased off master and added a couple tests.
I'm not sure how best to interact with the original PR, so I just cherry-picked the original commits and made a new branch + PR.
Let me know if there would be a better way to do it.
Original PR Description:
Currently RPush uses faulty logic when determining whether a notification should be retried. It only recognizes certain responses from APNs as needing a retry, which means that any failure to actually deliver the notification results in the notification being marked as failed.
Clearly, we should retry the notification unless - and until - we receive a response from APNs with a permanent failure code, or the notification reaches its maximum number of retries.
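As a rough illustration of the rule described above, here is a minimal sketch of the retry decision. It is not the code in this PR: the method name `next_action`, its keyword arguments, and the `PERMANENT_FAILURE_STATUSES` list are all hypothetical, and the statuses in that list are an illustrative assumption rather than the exact set the PR treats as permanent.

```ruby
# Hypothetical sketch of the retry rule; names and status list are assumptions.
# A nil status stands for "no response received" (e.g. the connection was reset).
PERMANENT_FAILURE_STATUSES = [400, 403, 410].freeze

def next_action(status:, retries:, max_retries:)
  # A 200 from APNs means the notification was accepted.
  return :delivered if status == 200
  # A permanent failure code from APNs is never retried.
  return :failed if PERMANENT_FAILURE_STATUSES.include?(status)
  # Everything else is retried until the retry budget is exhausted.
  return :failed if retries >= max_retries

  :retry
end
```

Under these assumptions, a dropped connection (`status: nil`) with retries remaining yields `:retry`, while a permanent failure code yields `:failed` regardless of how many retries are left.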
It is very common to receive Errno::ECONNRESET: Connection reset by peer errors from the APNs connection, and this issue has not been resolved in RPush. See #607. This error occurs because Apple closes the connection on their end after some period of inactivity. It's also common to receive OpenSSL::SSL::SSLError errors from the HTTP2 socket.
When an error occurs in the NetHttp2 client, it calls the error callback registered in create_http2_client and swallows the error. This means that the Delivery class never sees it and cannot handle it, so the notification remains in the processing state indefinitely. The threads that NetHttp2 creates have abort_on_exception set to true, so we cannot raise the error there; doing so would immediately terminate the whole process. Instead, we store it on the client and check for it after the join call. If we see an error, we raise it in the perform method so that it can be handled appropriately by either retrying the notification or marking it as failed.
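The store-then-raise pattern described above can be sketched roughly as follows. This is a simplified illustration, not the PR's actual diff: `FakeClient`, `stored_error`, `call_async`, and this standalone `perform` are hypothetical stand-ins, and `FakeClient` does not reproduce the NetHttp2 API.

```ruby
# Hypothetical stand-in for an HTTP/2 client that runs work on its own thread
# and reports errors through a callback instead of raising them.
class FakeClient
  attr_accessor :stored_error

  def on_error(&block)
    @error_callback = block
  end

  def call_async(&work)
    @thread = Thread.new do
      begin
        work.call
      rescue StandardError => e
        # Never raise inside the client thread; hand the error to the callback.
        @error_callback&.call(e)
      end
    end
  end

  def join
    @thread&.join
  end
end

# Hypothetical delivery step: remember the error on the client, then re-raise
# it on the calling thread once the client thread has finished.
def perform(client, &work)
  client.stored_error = nil
  client.on_error { |e| client.stored_error = e }
  client.call_async(&work)
  client.join

  error = client.stored_error
  raise error if error

  :ok
end
```

Because the error is re-raised on the caller's thread after `join`, the caller can rescue it and decide whether to retry the notification or mark it as failed, and nothing is ever raised inside the client's own thread.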