Setup TorchBench in Docker #158613

Status: Open. huydhn wants to merge 26 commits into base: main

Conversation

huydhn
Contributor

@huydhn huydhn commented Jul 18, 2025

This reduces the time spent setting up TorchBench on A100/H100 by another half an hour.

Testing

Signed-off-by: Huy Do <huydhn@gmail.com>

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela

huydhn added 8 commits July 17, 2025 15:51
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
@huydhn
Contributor Author

huydhn commented Jul 18, 2025

Stack from ghstack (oldest at bottom):

@huydhn huydhn requested review from a team and jeffdaily as code owners July 18, 2025 00:21
pytorch-bot bot commented Jul 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158613

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 8 New Failures, 1 Unrelated Failure

As of commit 9867f10 with merge base d36afac (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

huydhn added a commit that referenced this pull request Jul 18, 2025
This reduces the time spent setting up TorchBench on A100/H100 by half
an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: 7425aa2
ghstack-comment-id: 3086043340
Pull-Request: #158613
[ghstack-poisoned]
@huydhn huydhn added the no-runner-experiments Bypass Meta/LF runner determinator label Jul 18, 2025
huydhn added a commit that referenced this pull request Jul 18, 2025
This reduces the time spent setting up TorchBench on A100/H100 by half
an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: 9eaa733
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
huydhn added 3 commits July 17, 2025 18:14
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 18, 2025
This reduces the time spent setting up TorchBench on A100/H100 by half
an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: 536ab13
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 18, 2025
This reduces the time spent setting up TorchBench on A100/H100 by
another half an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: c3c6d5b
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
[ghstack-poisoned]
Contributor

@janeyx99 janeyx99 left a comment

This only affects the Docker images used in inductor benchmarks, right? Or will this make pulling Docker images across all of CI slightly slower?

Contributor

@janeyx99 janeyx99 left a comment

Approving because this seems OK; my only question is whether it affects anything else negatively.

@huydhn
Contributor Author

huydhn commented Jul 21, 2025

This only affects the Docker images used in inductor benchmarks, right? Or will this make pulling Docker images across all of CI slightly slower?

Yup, the size of that benchmark image increases, but it's cached on the runner. I'm trying to confirm at the moment that the cache works correctly by re-running the benchmark to warm it up. Let me confirm that before landing this change.

@huydhn
Contributor Author

huydhn commented Jul 21, 2025

Confirmed:

[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 21, 2025
This reduces the time spent setting up TorchBench on A100/H100 by
another half an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: ff22872
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
@huydhn
Contributor Author

huydhn commented Jul 21, 2025

@pytorchbot merge

pytorch-bot bot commented Jul 21, 2025

This PR has pending changes requested. Please address the comments and update the PR before merging.

@huydhn
Contributor Author

huydhn commented Jul 21, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 21, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@ZainRizvi
Contributor

@pytorchbot revert -c nosignal -m "Seems to have broken trunk. See GH job link HUD commit link"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@huydhn your PR has been successfully reverted.

Labels
ci-no-td (Do not run TD on this PR)
ciflow/inductor
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
no-runner-experiments (Bypass Meta/LF runner determinator)
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
release notes: releng (release notes category)
Reverted
5 participants