[Failing Test][sig-node] InodeEviction [Slow] [Serial] when we run containers that should cause DiskPressure should eventually evict all of the correct pods #131923

Open
pacoxu opened this issue May 23, 2025 · 5 comments
Assignees
Labels
kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@pacoxu
Member

pacoxu commented May 23, 2025

Which jobs are failing?

ci-crio-cgroupv2-node-e2e-eviction

Which tests are failing?

E2eNode Suite: [It] [sig-node] InodeEviction [Slow] [Serial] [Disruptive] [Feature:Eviction] when we run containers that should cause DiskPressure should eventually evict all of the correct pods

Since when has it been failing?

The test has failed consistently since 05-16.
Before that, it was flaky.

Testgrid link

https://testgrid.k8s.io/sig-node-cri-o#ci-crio-cgroupv2-node-e2e-eviction

Reason for failure (if possible)

{ failed [FAILED] Timed out after 900.027s.
Expected
    <*errors.errorString | 0xc0008a0d40>: 
    NodeCondition: DiskPressure not encountered
    {
        s: "NodeCondition: DiskPressure not encountered",
    }
to be nil
In [It] at: k8s.io/kubernetes/test/e2e_node/eviction_test.go:608 @ 05/23/25 01:30:55.069
}
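
The output above has the shape of a poll-until-nil check that never passed: a condition check kept returning the "DiskPressure not encountered" error until the 900s timeout expired. A minimal Go sketch of that pattern follows. It is not the code in eviction_test.go; `pollUntilNil`, `hasDiskPressure`, and `errNoDiskPressure` are hypothetical names, and the short timeout is only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNoDiskPressure mirrors the error text from the failure output.
// The variable itself is hypothetical.
var errNoDiskPressure = errors.New("NodeCondition: DiskPressure not encountered")

// pollUntilNil calls check every interval until it returns nil or the
// timeout elapses. On timeout it returns the last error, wrapped.
func pollUntilNil(timeout, interval time.Duration, check func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := check()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out after %s: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

func main() {
	// hasDiskPressure is a hypothetical stand-in for reading the node's
	// conditions; here the node never reports DiskPressure.
	hasDiskPressure := func() bool { return false }

	err := pollUntilNil(50*time.Millisecond, 10*time.Millisecond, func() error {
		if hasDiskPressure() {
			return nil
		}
		return errNoDiskPressure
	})
	fmt.Println(err)
}
```

If the condition never becomes true, the caller sees only the last error from the check, which is why the failure message alone does not say how close the node came to the threshold.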

Anything else we need to know?

#130663 covers a containerd failure in ImageGCNoEviction (a fix is in containerd v2.0.5), but this failure is on CRI-O.

Relevant SIG(s)

/sig

@pacoxu pacoxu added the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label May 23, 2025
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 23, 2025
@pacoxu pacoxu changed the title from "[Failing Test][It] [sig-node] InodeEviction [Slow] [Serial] [Disruptive] [Feature:Eviction] when we run containers that should cause DiskPressure should eventually evict all of the correct pods" to "[Failing Test][It] [sig-node] InodeEviction [Slow] [Serial] when we run containers that should cause DiskPressure should eventually evict all of the correct pods" May 23, 2025
@pacoxu pacoxu changed the title from "[Failing Test][It] [sig-node] InodeEviction [Slow] [Serial] when we run containers that should cause DiskPressure should eventually evict all of the correct pods" to "[Failing Test][sig-node] InodeEviction [Slow] [Serial] when we run containers that should cause DiskPressure should eventually evict all of the correct pods" May 23, 2025
@kannon92
Contributor

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 23, 2025
@haircommander
Contributor

/cc

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 28, 2025
@haircommander haircommander moved this from Triage to Issues - To do in SIG Node CI/Test Board May 28, 2025
@hshiina
Contributor

hshiina commented May 31, 2025

The pod that consumes inodes seems to be running more slowly in recent runs.

In the last successful run, inode usage exceeded the threshold (200000) and the node hit DiskPressure:

  STEP: checking eviction ordering and ensuring important pods don't fail @ 05/16/25 07:08:18.258
  I0516 07:08:20.262973 2719 eviction_test.go:616] Node has DiskPressure
  I0516 07:08:20.306348 2719 util.go:313] Kubelet Metrics: []
  I0516 07:08:20.328398 2719 eviction_test.go:888] imageFsInfo.Inodes: 10223040, imageFsInfo.InodesFree: 9892515
  I0516 07:08:20.328453 2719 eviction_test.go:891] rootFsInfo.Inodes: 10223040, rootFsInfo.InodesFree: 9892515
  I0516 07:08:20.328472 2719 eviction_test.go:894] Pod: volume-inode-hog-pod
  I0516 07:08:20.328487 2719 eviction_test.go:897] --- summary Container: volume-inode-hog-container inodeUsage: 12
  I0516 07:08:20.328506 2719 eviction_test.go:902] --- summary Volume: test-volume inodeUsage: 282140
  I0516 07:08:20.328656 2719 eviction_test.go:894] Pod: innocent-pod
  I0516 07:08:20.328758 2719 eviction_test.go:897] --- summary Container: innocent-container inodeUsage: 11
  I0516 07:08:20.333886 2719 eviction_test.go:718] fetching pod innocent-pod; phase= Running
  I0516 07:08:20.333932 2719 eviction_test.go:718] fetching pod volume-inode-hog-pod; phase= Running

In a recent failure, inode usage had not reached the threshold by the time the test timed out:

  I0523 01:30:52.471167 2675 eviction_test.go:894] Pod: innocent-pod
  I0523 01:30:52.471184 2675 eviction_test.go:897] --- summary Container: innocent-container inodeUsage: 11
  I0523 01:30:54.590009 2675 eviction_test.go:888] imageFsInfo.Inodes: 10223040, imageFsInfo.InodesFree: 9944277
  I0523 01:30:54.590062 2675 eviction_test.go:891] rootFsInfo.Inodes: 10223040, rootFsInfo.InodesFree: 9944277
  I0523 01:30:54.590078 2675 eviction_test.go:894] Pod: innocent-pod
  I0523 01:30:54.590098 2675 eviction_test.go:897] --- summary Container: innocent-container inodeUsage: 11
  I0523 01:30:54.590112 2675 eviction_test.go:894] Pod: volume-inode-hog-pod
  I0523 01:30:54.590125 2675 eviction_test.go:897] --- summary Container: volume-inode-hog-container inodeUsage: 12
  I0523 01:30:54.590140 2675 eviction_test.go:902] --- summary Volume: test-volume inodeUsage: 149671
  [FAILED] in [It] - k8s.io/kubernetes/test/e2e_node/eviction_test.go:608 @ 05/23/25 01:30:55.069
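
To make the comparison concrete, here is a small Go sketch that recomputes the numbers from the two log excerpts above. All figures are copied from those logs. Comparing the test-volume inodeUsage directly against 200000 is an assumption based on the wording of this comment, and the `run` type and helper names are hypothetical.

```go
package main

import "fmt"

// inodeThreshold is the threshold cited in the comment above (200000).
// Comparing it against the hog volume's inodeUsage is an assumption.
const inodeThreshold = 200000

// run holds figures copied from the quoted log lines.
type run struct {
	name        string
	inodes      int64 // rootFsInfo.Inodes
	inodesFree  int64 // rootFsInfo.InodesFree
	volumeUsage int64 // "summary Volume: test-volume inodeUsage"
}

// usedInodes returns the number of inodes in use on the root filesystem.
func usedInodes(r run) int64 { return r.inodes - r.inodesFree }

// exceedsThreshold reports whether the hog volume passed the threshold.
func exceedsThreshold(r run) bool { return r.volumeUsage > inodeThreshold }

func main() {
	runs := []run{
		{"05-16 success", 10223040, 9892515, 282140},
		{"05-23 failure", 10223040, 9944277, 149671},
	}
	for _, r := range runs {
		fmt.Printf("%s: used=%d volume=%d exceeds=%t\n",
			r.name, usedInodes(r), r.volumeUsage, exceedsThreshold(r))
	}
}
```

With these inputs, the successful run shows 330525 inodes in use with the volume past the threshold, while the failed run shows 278763 in use with the volume still about 50000 short at the moment of the timeout.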

@hshiina
Contributor

hshiina commented May 31, 2025

There is a similar slowdown in the ImageGCNoEviction test of the same job, which seems to be flaky.

@Manish4044

/assign

Projects
Status: Issues - To do
Development

No branches or pull requests

6 participants