raycluster_controller: generate events for failed pod creation #2286
Generate events when the raycluster_controller fails to create:
- Head pods
- Worker pods

The generated event has EventTypeWarning as its type. This commit also introduces the following event reasons as constants for ease of use and testing:
- FailedToCreateResource ("Failed")
- CreatedResource ("Created")
- DeletedResource ("Deleted")

This commit additionally adds tests to verify this behaviour.

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
@rueian I've added failure events for all failed requests made to the Kubernetes API as part of this PR itself. Can you please take an initial look? I'll need a day or two to add the tests as well; we can probably refactor them into a single high-level test for all events.
Also, I'm currently using the same event reason constant for every purpose. Do we want to go a similar route?
Yes, we indeed need different event reasons for different categories. The current reason list (Failed, Created, and Deleted) is not enough, because the EventCorrelator aggregates events that match on every field except the message: https://github.com/kubernetes/client-go/blob/master/tools/record/events_cache.go#L424-L438. With a single Failed reason, unrelated failures on the same RayCluster would be merged together.
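To illustrate why a shared reason is a problem, here is a simplified, stdlib-only model of the aggregation key. It assumes, following client-go's `EventAggregatorByReasonFunc`, that events are keyed on source, involved object, type, and reason, but not on message. The `Event` struct, `aggregateKey`, the component name, and the per-resource reason names are illustrative simplifications, not client-go or KubeRay definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a simplified model holding only the fields relevant here.
type Event struct {
	Component, Kind, Namespace, Name, Type, Reason, Message string
}

// aggregateKey joins every field except Message. This mirrors the idea behind
// client-go's EventAggregatorByReasonFunc; the real key contains more fields.
func aggregateKey(e Event) string {
	return strings.Join([]string{e.Component, e.Kind, e.Namespace, e.Name, e.Type, e.Reason}, "/")
}

func main() {
	base := Event{Component: "raycluster-controller", Kind: "RayCluster", Namespace: "default", Name: "demo", Type: "Warning"}

	headFail, workerFail := base, base
	headFail.Reason, headFail.Message = "Failed", "failed to create head pod"
	workerFail.Reason, workerFail.Message = "Failed", "failed to delete worker pod"
	// Same reason: the keys collide even though the failures are unrelated.
	fmt.Println(aggregateKey(headFail) == aggregateKey(workerFail)) // true

	// Hypothetical per-category reasons keep the keys apart.
	headFail.Reason = "FailedToCreateHeadPod"
	workerFail.Reason = "FailedToDeleteWorkerPod"
	fmt.Println(aggregateKey(headFail) == aggregateKey(workerFail)) // false
}
```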
@rueian thanks, that's super helpful! Can you please take a look at the latest commit? It adds event reason types for the following buckets:
Please let me know if this makes sense. Thank you!
… add failure events

This commit introduces event reasons for different resources so that the EventCorrelator can collapse similar events. This commit also adds failure events for all failed requests to the Kubernetes API.

Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
I will open a follow-up PR to address my comments.
// Worker pod event list
const (
	CreatedWorkerPod = "CreatedWorkerPod"
This is not used.
return err
}
logger.Info("Created pod", "Pod ", pod.GenerateName)
r.Recorder.Eventf(&instance, corev1.EventTypeNormal, "Created", "Created worker pod %s", pod.Name)
r.Recorder.Eventf(&instance, corev1.EventTypeNormal, CreatedHeadPod, "Created worker pod %s/%s", pod.Namespace, pod.Name)
This should be CreatedWorkerPod, since this event is about a worker pod.
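One way to avoid this kind of copy-paste mismatch is to derive the reason from the pod's node type instead of writing it at each call site. This is a stdlib-only sketch: `CreatedHeadPod` and `CreatedWorkerPod` come from the diff above, while `NodeType`, `HeadNode`, `WorkerNode`, and `createdPodReason` are hypothetical stand-ins for whatever node-type value the controller already has.

```go
package main

import "fmt"

// Reason constants named in the PR diff.
const (
	CreatedHeadPod   = "CreatedHeadPod"
	CreatedWorkerPod = "CreatedWorkerPod"
)

// NodeType is a hypothetical stand-in for the Ray node type of a pod.
type NodeType string

const (
	HeadNode   NodeType = "head"
	WorkerNode NodeType = "worker"
)

// createdPodReason maps a node type to its "created" event reason, so a
// worker pod cannot be reported with CreatedHeadPod by accident.
func createdPodReason(t NodeType) (string, error) {
	switch t {
	case HeadNode:
		return CreatedHeadPod, nil
	case WorkerNode:
		return CreatedWorkerPod, nil
	default:
		return "", fmt.Errorf("unknown node type %q", t)
	}
}

func main() {
	reason, _ := createdPodReason(WorkerNode)
	fmt.Println(reason) // CreatedWorkerPod
}
```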
@@ -944,10 +1015,11 @@ func (r *RayClusterReconciler) createService(ctx context.Context, raySvc *corev1
logger.Info("Pod service already exist, no need to create")
TODO: rename raySvc
Why are these changes needed?
Generate events when the raycluster_controller fails to create:
- Head pods
- Worker pods

The generated event has EventTypeWarning as its type. This commit also introduces the following event reasons as constants for ease of use and testing:
- FailedToCreateResource ("Failed")
- CreatedResource ("Created")
- DeletedResource ("Deleted")

This commit additionally adds tests to verify this behaviour.
Related issue number
Towards #2250
Checks
cc @rueian @kevin85421