Setup TorchBench in Docker #158613
base: gh/huydhn/5/head
Stack from ghstack (oldest at bottom):
install_huggingface
install_timm
Curious: should this consistently use the pinned version from build.sh, or the nightly here?
Umm, which line are you referring to here? TorchBench, HF, and TIMM use pinned commits, so my understanding is that if a dependency has been set up here, we won't need to deal with it later in build.sh. If you're thinking about vLLM's pinned commit, it can't be installed here because it needs PyTorch to be built first.
FYI the error:
System.UnauthorizedAccessException: Access to the path '/home/grace/_work/_tool' is denied.
 ---> System.IO.IOException: Permission denied
   --- End of inner exception stack trace ---
   at System.IO.FileSystem.CreateDirectory(String fullPath, UnixFileMode unixCreateMode)
   at System.IO.Directory.CreateDirectory(String path)
   at GitHub.Runner.Worker.JobRunner.RunAsync(AgentJobRequestMessage message, CancellationToken jobRequestCancellationToken)
   at GitHub.Runner.Worker.JobRunner.RunAsync(AgentJobRequestMessage message, CancellationToken jobRequestCancellationToken)
   at GitHub.Runner.Worker.Worker.RunAsync(String pipeIn, String pipeOut)
   at GitHub.Runner.Worker.Program.MainAsync(IHostContext context, String[] args)
Yeah, this is an unrelated infra issue. I was trying to fix these runners yesterday. If I got it right, a rerun should work now, so I will do that after the current jobs finish.
Seems like this is breaking inductor benchmarks
Should be good once they're fixed though:
2025-07-18T17:22:25.1897653Z Dockerfile:101
2025-07-18T17:22:25.1897867Z --------------------
2025-07-18T17:22:25.1898280Z 99 | COPY ci_commit_pins/huggingface.txt huggingface.txt
2025-07-18T17:22:25.1898891Z 100 | COPY ci_commit_pins/timm.txt timm.txt
2025-07-18T17:22:25.1899385Z 101 | >>> RUN if [ -n "${INDUCTOR_BENCHMARKS}" ]; then bash ./install_inductor_benchmark_deps.sh; fi
2025-07-18T17:22:25.1900040Z 102 | RUN rm install_inductor_benchmark_deps.sh common_utils.sh timm.txt huggingface.txt
2025-07-18T17:22:25.1900495Z 103 |
2025-07-18T17:22:25.1900707Z --------------------
2025-07-18T17:22:25.1901406Z ERROR: failed to solve: process "/bin/sh -c if [ -n \"${INDUCTOR_BENCHMARKS}\" ]; then bash ./install_inductor_benchmark_deps.sh; fi" did not complete successfully: exit code: 1
Umm, where did you get this error? The benchmark jobs are still running, but they look ok so far: https://github.com/pytorch/pytorch/actions/runs/16375091052
This reduces the time spent setting up TorchBench on A100/H100 by another half an hour.
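For readers unfamiliar with the pattern discussed in the thread, here is a minimal shell sketch of reading a pinned commit from a file such as huggingface.txt or timm.txt and gating the install on INDUCTOR_BENCHMARKS. It is an illustration only, not the contents of install_inductor_benchmark_deps.sh; the helper names `read_pin` and `should_install` are hypothetical.

```shell
# Hypothetical helpers for illustration; these names do not come from the PR.

# Print the pinned commit stored in a pin file (for example timm.txt),
# with all whitespace, including the trailing newline, removed.
read_pin() {
  tr -d '[:space:]' < "$1"
}

# Succeed only when INDUCTOR_BENCHMARKS is set and non-empty, mirroring the
# `[ -n "${INDUCTOR_BENCHMARKS}" ]` guard shown in the Dockerfile excerpt.
should_install() {
  [ -n "${INDUCTOR_BENCHMARKS:-}" ]
}

# Example usage (assumes the pin file exists in the current directory):
#   if should_install; then
#     echo "pinned commit: $(read_pin huggingface.txt)"
#   fi
```

Resolving the pin while the image is built means the dependency is already present when a benchmark job starts, which is where the setup time saving described above would come from.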
Testing
Signed-off-by: Huy Do huydhn@gmail.com