Setup TorchBench in Docker #158613

Open
huydhn wants to merge 10 commits into base: gh/huydhn/5/head

huydhn (Contributor) commented Jul 18, 2025

This reduces the time spent setting up TorchBench on A100/H100 by another half an hour.

Testing

Signed-off-by: Huy Do <huydhn@gmail.com>

[ghstack-poisoned]
huydhn (Contributor Author) commented Jul 18, 2025

huydhn requested review from a team and jeffdaily as code owners July 18, 2025 00:21

pytorch-bot bot commented Jul 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158613

Note: Links to docs will display an error until the docs builds have been completed.

❌ 7 New Failures, 27 Pending, 1 Unrelated Failure

As of commit e6bd515 with merge base 90b082e:

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

huydhn added a commit that referenced this pull request Jul 18, 2025
This reduces the time spent setting up TorchBench on A100/H100 by half
an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: 7425aa2
ghstack-comment-id: 3086043340
Pull-Request: #158613
[ghstack-poisoned]
huydhn added the no-runner-experiments label (Bypass Meta/LF runner determinator) Jul 18, 2025
huydhn added a commit that referenced this pull request Jul 18, 2025
This reduces the time spent setting up TorchBench on A100/H100 by half
an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: 9eaa733
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 18, 2025
This reduces the time spent setting up TorchBench on A100/H100 by half
an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: 536ab13
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 18, 2025
This reduces the time spent setting up TorchBench on A100/H100 by
another half an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: c3c6d5b
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 18, 2025
This reduces the time spent setting up TorchBench on A100/H100 by
another half an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: 73a1715
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
yangw-dev self-requested a review July 18, 2025 19:00
install_huggingface
install_timm

Contributor

Curious: should this consistently use the pinned versions from build.sh, or nightly, here?

Contributor Author

Umm, which line are you referring to here? TorchBench, HF, and TIMM use pinned commits, so my understanding is that if a dependency has been set up here, we won't need to deal with it later in build.sh. If you're thinking about vLLM's pinned commit, this isn't a good place for it because vLLM needs PyTorch to be built first.
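The reasoning above (a dependency pinned to a commit and installed once in the Docker image does not need to be handled again in build.sh) can be illustrated with a small bash sketch. It assumes each pin file, such as ci_commit_pins/timm.txt, holds a single commit hash, and it introduces a hypothetical marker file that records which commit the image already installed. The helper names read_pin, pinned_git_spec, and needs_install are illustrative only; they are not PyTorch's actual CI functions.

```shell
#!/usr/bin/env bash
# Sketch only: read_pin, pinned_git_spec, needs_install and the marker-file
# convention are hypothetical, not PyTorch's real CI helpers.

# Read a commit hash from a pin file such as ci_commit_pins/timm.txt,
# dropping surrounding whitespace and the trailing newline.
read_pin() {
  tr -d '[:space:]' < "$1"
}

# Build a pip VCS requirement that installs a repo at its pinned commit,
# e.g. git+https://github.com/huggingface/pytorch-image-models@<commit>.
pinned_git_spec() {
  local repo_url="$1" pin_file="$2"
  printf 'git+%s@%s\n' "$repo_url" "$(read_pin "$pin_file")"
}

# Return 0 (install needed) when no marker exists or the marker records a
# different commit than the pin; return 1 when the image already has it.
needs_install() {
  local pin_file="$1" marker="$2"
  [ -f "$marker" ] || return 0
  [ "$(read_pin "$pin_file")" != "$(read_pin "$marker")" ]
}
```

A later step could then run needs_install ci_commit_pins/timm.txt /opt/timm.pin && pip install "$(pinned_git_spec https://github.com/huggingface/pytorch-image-models ci_commit_pins/timm.txt)", so the reinstall only happens when the pin moves. The marker path /opt/timm.pin is hypothetical, and a real script would also write the marker after a successful install.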

yangw-dev (Contributor)

FYI the error:

System.UnauthorizedAccessException: Access to the path '/home/grace/_work/_tool' is denied. ---> System.IO.IOException: Permission denied --- End of inner exception stack trace --- at System.IO.FileSystem.CreateDirectory(String fullPath, UnixFileMode unixCreateMode) at System.IO.Directory.CreateDirectory(String path) at GitHub.Runner.Worker.JobRunner.RunAsync(AgentJobRequestMessage message, CancellationToken jobRequestCancellationToken) at GitHub.Runner.Worker.JobRunner.RunAsync(AgentJobRequestMessage message, CancellationToken jobRequestCancellationToken) at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut) at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)

huydhn (Contributor Author) commented Jul 18, 2025

Yeah, this is an unrelated infra issue. I was trying to fix these runners yesterday. If I got it right, a rerun should work now, so I will do that after the current jobs finish.

ZainRizvi (Contributor) left a comment

Seems like this is breaking inductor benchmarks.

Should be good once they're fixed though:

2025-07-18T17:22:25.1897653Z Dockerfile:101
2025-07-18T17:22:25.1897867Z --------------------
2025-07-18T17:22:25.1898280Z   99 |     COPY ci_commit_pins/huggingface.txt huggingface.txt
2025-07-18T17:22:25.1898891Z  100 |     COPY ci_commit_pins/timm.txt timm.txt
2025-07-18T17:22:25.1899385Z  101 | >>> RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
2025-07-18T17:22:25.1900040Z  102 |     RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
2025-07-18T17:22:25.1900495Z  103 |     
2025-07-18T17:22:25.1900707Z --------------------
2025-07-18T17:22:25.1901406Z ERROR: failed to solve: process "/bin/sh -c if [ -n \"${INDUCTOR_BENCHMARKS}\" ]; then bash ./install_inductor_benchmark_deps.sh; fi" did not complete successfully: exit code: 1
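The failing step is gated by a shell test: [ -n "${INDUCTOR_BENCHMARKS}" ] is true only when the variable is non-empty, and because the installer's exit status becomes the status of the RUN step, a failure inside install_inductor_benchmark_deps.sh aborts the whole image build with the exit code 1 shown above. A minimal bash sketch of the same guard follows; run_if_benchmarks is a hypothetical helper name used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the Dockerfile guard
#   RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
# run_if_benchmarks is a hypothetical name, not part of PyTorch's CI.
run_if_benchmarks() {
  # ${VAR:-} treats an unset variable as empty (safe under `set -u`).
  if [ -n "${INDUCTOR_BENCHMARKS:-}" ]; then
    # The wrapped command's exit status becomes this function's status,
    # just as the installer's status becomes the RUN step's status.
    "$@"
  fi
  # If the test is false, the if-statement exits 0, so the step is a no-op.
}
```

With INDUCTOR_BENCHMARKS unset or empty, run_if_benchmarks bash ./install_inductor_benchmark_deps.sh succeeds without running anything, so only images built with that variable set can hit a broken installer.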

huydhn (Contributor Author) commented Jul 18, 2025

Umm, where did you get this error? The benchmark jobs are still running, but they look OK so far: https://github.com/pytorch/pytorch/actions/runs/16375091052

[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 19, 2025
This reduces the time spent setting up TorchBench on A100/H100 by
another half an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: efa6b15
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 19, 2025
This reduces the time spent setting up TorchBench on A100/H100 by
another half an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: 22a7b2f
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 19, 2025
This reduces the time spent setting up TorchBench on A100/H100 by
another half an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: a62f08f
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
[ghstack-poisoned]
huydhn added a commit that referenced this pull request Jul 19, 2025
This reduces the time spent setting up TorchBench on A100/H100 by
another half an hour

Signed-off-by: Huy Do <huydhn@gmail.com>
ghstack-source-id: 7374439
ghstack-comment-id: 3086043340
Pull-Request: #158613
Signed-off-by: Huy Do <huydhn@gmail.com>
Labels
ciflow/inductor, no-runner-experiments (Bypass Meta/LF runner determinator), release notes: releng (release notes category)
3 participants