
Failing E2E test [sig-network] LoadBalancers ExternalTrafficPolicy: Local [Feature:LoadBalancer] [Slow] should only target nodes with endpoints #131692


Open
kef002 opened this issue May 9, 2025 · 8 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. sig/network Categorizes an issue or PR as relevant to SIG Network. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@kef002

kef002 commented May 9, 2025

Which jobs are failing?

[sig-network] LoadBalancers ExternalTrafficPolicy: Local [Feature:LoadBalancer] [Slow] should only target nodes with endpoints

Which tests are failing?

[PANICKED] in [It] - runtime/panic.go:115 @ 04/22/25 09:12:26.918
I0422 09:12:26.918509 23 util.go:81]
Output of kubectl describe svc:

I0422 09:12:26.918605 23 builder.go:121] Running '/usr/local/bin/kubectl --kubeconfig=/tmp/kubeconfig-310162115 --namespace=esipp-9715 describe svc --namespace=esipp-9715'
I0422 09:12:27.044900 23 builder.go:146] stderr: ""
I0422 09:12:27.044964 23 builder.go:147] stdout: "Name: external-local-nodes\nNamespace: esipp-9715\nLabels: testid=external-local-nodes-73c595e0-4cce-4980-8804-44b8c33c8528\nAnnotations: \nSelector: testid=external-local-nodes-73c595e0-4cce-4980-8804-44b8c33c8528\nType: LoadBalancer\nIP Family Policy: SingleStack\nIP Families: IPv4\nIP: 10.107.95.39\nIPs: 10.107.95.39\nLoadBalancer Ingress: 10.158.34.156 (VIP)\nPort: 8081/TCP\nTargetPort: 80/TCP\nNodePort: 30837/TCP\nEndpoints: 192.168.130.169:80\nSession Affinity: None\nExternal Traffic Policy: Local\nInternal Traffic Policy: Cluster\nHealthCheck NodePort: 30490\nEvents:\n Type Reason Age From Message\n ---- ------ ---- ---- -------\n Normal EnsuringLoadBalancer 34s service-controller Ensuring load balancer\n Normal EnsuredLoadBalancer 34s service-controller Ensured load balancer\n\n\nName: node-port-service\nNamespace: esipp-9715\nLabels: \nAnnotations: \nSelector: selector-abe458e2-85d6-44bb-a950-52754b3adf53=true\nType: NodePort\nIP Family Policy: SingleStack\nIP Families: IPv4\nIP: 10.101.131.252\nIPs: 10.101.131.252\nPort: http 80/TCP\nTargetPort: 8083/TCP\nNodePort: http 32453/TCP\nEndpoints: 192.168.130.168:8083,192.168.196.139:8083,192.168.136.190:8083 + 1 more...\nPort: udp 90/UDP\nTargetPort: 8081/UDP\nNodePort: udp 31931/UDP\nEndpoints: 192.168.130.168:8081,192.168.196.139:8081,192.168.136.190:8081 + 1 more...\nSession Affinity: None\nExternal Traffic Policy: Cluster\nInternal Traffic Policy: Cluster\nEvents: \n\n\nName: session-affinity-service\nNamespace: esipp-9715\nLabels: \nAnnotations: \nSelector: selector-abe458e2-85d6-44bb-a950-52754b3adf53=true\nType: NodePort\nIP Family Policy: SingleStack\nIP Families: IPv4\nIP: 10.97.15.200\nIPs: 10.97.15.200\nPort: http 80/TCP\nTargetPort: 8083/TCP\nNodePort: http 31513/TCP\nEndpoints: 192.168.130.168:8083,192.168.196.139:8083,192.168.136.190:8083 + 1 more...\nPort: udp 90/UDP\nTargetPort: 8081/UDP\nNodePort: udp 31478/UDP\nEndpoints: 192.168.130.168:8081,192.168.196.139:8081,192.168.136.190:8081 + 1 more...\nSession Affinity: ClientIP\nExternal Traffic Policy: Cluster\nInternal Traffic Policy: Cluster\nEvents: \n"
I0422 09:12:27.044976 23 util.go:84] Name: external-local-nodes
Namespace: esipp-9715
Labels: testid=external-local-nodes-73c595e0-4cce-4980-8804-44b8c33c8528
Annotations:
Selector: testid=external-local-nodes-73c595e0-4cce-4980-8804-44b8c33c8528
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.107.95.39
IPs: 10.107.95.39
LoadBalancer Ingress: 10.158.34.156 (VIP)
Port: 8081/TCP
TargetPort: 80/TCP
NodePort: 30837/TCP
Endpoints: 192.168.130.169:80
Session Affinity: None
External Traffic Policy: Local
Internal Traffic Policy: Cluster
HealthCheck NodePort: 30490
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 34s service-controller Ensuring load balancer
Normal EnsuredLoadBalancer 34s service-controller Ensured load balancer

Name: node-port-service
Namespace: esipp-9715
Labels:
Annotations:
Selector: selector-abe458e2-85d6-44bb-a950-52754b3adf53=true
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.101.131.252
IPs: 10.101.131.252
Port: http 80/TCP
TargetPort: 8083/TCP
NodePort: http 32453/TCP
Endpoints: 192.168.130.168:8083,192.168.196.139:8083,192.168.136.190:8083 + 1 more...
Port: udp 90/UDP
TargetPort: 8081/UDP
NodePort: udp 31931/UDP
Endpoints: 192.168.130.168:8081,192.168.196.139:8081,192.168.136.190:8081 + 1 more...
Session Affinity: None
External Traffic Policy: Cluster
Internal Traffic Policy: Cluster
Events:

Name: session-affinity-service
Namespace: esipp-9715
Labels:
Annotations:
Selector: selector-abe458e2-85d6-44bb-a950-52754b3adf53=true
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.97.15.200
IPs: 10.97.15.200
Port: http 80/TCP
TargetPort: 8083/TCP
NodePort: http 31513/TCP
Endpoints: 192.168.130.168:8083,192.168.196.139:8083,192.168.136.190:8083 + 1 more...
Port: udp 90/UDP
TargetPort: 8081/UDP
NodePort: udp 31478/UDP
Endpoints: 192.168.130.168:8081,192.168.196.139:8081,192.168.136.190:8081 + 1 more...
Session Affinity: ClientIP
External Traffic Policy: Cluster
Internal Traffic Policy: Cluster
Events:

I0422 09:12:27.066184 23 jig.go:604] Waiting up to 15m0s for service "external-local-nodes" to have no LoadBalancer
I0422 09:12:27.092657 23 helper.go:125] Waiting up to 7m0s for all (but 0) nodes to be ready
STEP: dump namespace information after failure @ 04/22/25 09:12:27.097

................................................................................................................................................................

[PANICKED] Test Panicked
In [It] at: runtime/panic.go:115 @ 04/22/25 09:12:26.918

runtime error: index out of range [3] with length 3

Full Stack Trace
k8s.io/kubernetes/test/e2e/network.init.func16.4({0x7f9d2c2f99e0, 0xc00527e510})
k8s.io/kubernetes/test/e2e/network/loadbalancer.go:1136 +0xb25

......................................................
Summarizing 1 Failure:
[PANICKED!] [sig-network] LoadBalancers ExternalTrafficPolicy: Local [Feature:LoadBalancer] [Slow] [It] should only target nodes with endpoints [sig-network, Feature:LoadBalancer, Slow]
runtime/panic.go:115

Ran 1 of 6622 Specs in 34.664 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 6621 Skipped
--- FAIL: TestE2E (34.95s)
FAIL

Since when has it been failing?

The test failed 04/2025

Testgrid link

No response

Reason for failure (if possible)

No response

Anything else we need to know?

No response

Relevant SIG(s)

/sig-network

@kef002 kef002 added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label May 9, 2025
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 9, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@aojea
Member

aojea commented May 9, 2025

@kef002 What Kubernetes version are you using? You can obtain it by running e2e.test -version

@chandan-epic

/assign

@calvinxu1526

$ e2e.test --version
v1.32.4

@aojea
Member

aojea commented May 14, 2025

I see. Checking the panic line at that version, the problem seems to be a mismatch between the number of returned IPs

ips := e2enode.CollectAddresses(nodes, v1.NodeInternalIP)

and the number of nodes

for n, internalIP := range ips {
	// Make sure the loadbalancer picked up the health check change.
	// Confirm traffic can reach backend through LB before checking healthcheck nodeport.
	e2eservice.TestReachableHTTP(ctx, ingressIP, svcTCPPort, e2eservice.KubeProxyLagTimeout)
	expectedSuccess := nodes.Items[n].Name == endpointNodeName

It seems that this can happen, because CollectAddresses appends every address of the requested type for each node:

func CollectAddresses(nodes *v1.NodeList, addressType v1.NodeAddressType) []string {
	ips := []string{}
	for i := range nodes.Items {
		ips = append(ips, GetAddresses(&nodes.Items[i], addressType)...)
	}
	return ips
}
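
To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use the real e2e framework: the Node and NodeAddress types, the sample addresses, and the lower-case collectAddresses helper are hypothetical stand-ins that only mirror the shape of the function quoted above. With three nodes, one of which reports two InternalIP entries, the helper returns four IPs, so indexing the node list by the position in the IP slice would go out of range at index 3, which matches the panic message in the report.

```go
package main

import "fmt"

// NodeAddress and Node are simplified stand-ins for the v1 API types.
// They are assumptions made for this sketch, not the real definitions.
type NodeAddress struct {
	Type    string
	Address string
}

type Node struct {
	Name      string
	Addresses []NodeAddress
}

// collectAddresses mirrors the quoted CollectAddresses: it appends every
// address of the requested type, so the result can be longer than nodes.
func collectAddresses(nodes []Node, addressType string) []string {
	ips := []string{}
	for _, n := range nodes {
		for _, a := range n.Addresses {
			if a.Type == addressType {
				ips = append(ips, a.Address)
			}
		}
	}
	return ips
}

// sampleNodes returns made-up data: node-a has two InternalIP entries.
func sampleNodes() []Node {
	return []Node{
		{Name: "node-a", Addresses: []NodeAddress{
			{Type: "InternalIP", Address: "10.0.0.1"},
			{Type: "InternalIP", Address: "10.0.1.1"},
		}},
		{Name: "node-b", Addresses: []NodeAddress{{Type: "InternalIP", Address: "10.0.0.2"}}},
		{Name: "node-c", Addresses: []NodeAddress{{Type: "InternalIP", Address: "10.0.0.3"}}},
	}
}

func main() {
	nodes := sampleNodes()
	ips := collectAddresses(nodes, "InternalIP")
	fmt.Printf("%d nodes, %d IPs\n", len(nodes), len(ips))

	for n, ip := range ips {
		if n >= len(nodes) {
			// Evaluating nodes[n] here is what would panic in the test.
			fmt.Printf("ips[%d]=%s has no node at index %d\n", n, ip, n)
			continue
		}
		fmt.Printf("ips[%d]=%s is paired with %s\n", n, ip, nodes[n].Name)
	}
}
```

Note that in this sketch the pairing is already off before the index runs out: the second IP belongs to node-a but is matched against node-b.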

Since a node can have multiple addresses of the same type, we need to make the test more reliable by using only one IP per node, for example by building a map[nodeName] = ip instead of calling CollectAddresses, and iterating over it
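
One possible shape for that suggestion, as a rough sketch rather than the actual fix: the Node and NodeAddress types and the firstAddressPerNode helper below are hypothetical and stand in for the real v1 types and framework helpers. The idea is to keep at most one address of the requested type per node, keyed by node name, so the node name and its IP always stay paired.

```go
package main

import "fmt"

// Simplified stand-ins for the v1 API types (assumptions for this sketch).
type NodeAddress struct {
	Type    string
	Address string
}

type Node struct {
	Name      string
	Addresses []NodeAddress
}

// firstAddressPerNode returns at most one address of the given type per
// node, keyed by node name. Nodes without such an address are left out.
func firstAddressPerNode(nodes []Node, addressType string) map[string]string {
	ipByNode := make(map[string]string, len(nodes))
	for _, n := range nodes {
		for _, a := range n.Addresses {
			if a.Type == addressType {
				ipByNode[n.Name] = a.Address
				break // keep only the first match for this node
			}
		}
	}
	return ipByNode
}

func main() {
	// Made-up data: node-a reports two InternalIP entries.
	nodes := []Node{
		{Name: "node-a", Addresses: []NodeAddress{
			{Type: "InternalIP", Address: "10.0.0.1"},
			{Type: "InternalIP", Address: "10.0.1.1"},
		}},
		{Name: "node-b", Addresses: []NodeAddress{{Type: "InternalIP", Address: "10.0.0.2"}}},
	}
	endpointNodeName := "node-b" // hypothetical node hosting the endpoint

	ipByNode := firstAddressPerNode(nodes, "InternalIP")

	// Walk the node list rather than the map so the order is stable.
	for _, n := range nodes {
		ip, ok := ipByNode[n.Name]
		if !ok {
			continue
		}
		expectedSuccess := n.Name == endpointNodeName
		fmt.Printf("node=%s ip=%s expectedSuccess=%t\n", n.Name, ip, expectedSuccess)
	}
}
```

Looking up by node name removes the need to index the node list with a position taken from the IP slice, so an extra address on one node can no longer shift or overrun the comparison.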

/sig network
/help

@k8s-ci-robot
Contributor

@aojea:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.


@k8s-ci-robot k8s-ci-robot added sig/network Categorizes an issue or PR as relevant to SIG Network. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 14, 2025
@thockin thockin added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 22, 2025
@adrianmoisey
Member

/assign

@adrianmoisey
Member

I'll take a look at this one; it seems like a good opportunity for me to dig into the tests.

Projects
None yet
Development

No branches or pull requests

7 participants