Description
🐛 Describe the bug
When running torch.xpu.mem_get_info() on BMG (Battlemage) under Windows, it fails with the following message:
>>> torch.xpu.mem_get_info()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:\miniforge\envs\nightly0708\lib\site-packages\torch\xpu\memory.py", line 194, in mem_get_info
    return torch._C._xpu_getMemoryInfo(device)
RuntimeError: The device (Intel(R) Arc(TM) B580 Graphics) doesn't support querying the available free memory. You can file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize its implementation.
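Code that only uses free memory as a hint may prefer to degrade gracefully instead of failing on this error. The following is a minimal sketch, not part of PyTorch: `try_mem_get_info` is a hypothetical helper, and matching on the error text is an assumption that may break if the message is reworded in a later release. The query is passed in as a callable so the helper can be exercised without an XPU device.

```python
from typing import Callable, Optional, Tuple

# Substring of the RuntimeError text shown in the traceback above.
# Matching on it is an assumption; the wording may change between releases.
_UNSUPPORTED_MSG = "doesn't support querying the available free memory"


def try_mem_get_info(
    query: Callable[[], Tuple[int, int]],
) -> Optional[Tuple[int, int]]:
    """Return (free_bytes, total_bytes), or None if free memory can't be queried.

    `query` is expected to behave like torch.xpu.mem_get_info.
    """
    try:
        return query()
    except RuntimeError as exc:
        if _UNSUPPORTED_MSG in str(exc):
            return None  # free memory is unknown on this runtime
        raise  # any other failure is still surfaced to the caller
```

Usage would be `info = try_mem_get_info(torch.xpu.mem_get_info)`, with `None` meaning the caller has to fall back to some other budget (for example, the total memory reported elsewhere).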
Failure reason
This happens because oneAPI 2025.1 does not support the free memory query on this device by default. This will be fixed in oneAPI 2025.2.
From the Intel Compiler team:
For 2025.1, setting UR_L0_ENABLE_SYSMAN_ENV_DEFAULT=0 is the only way to enable correct free memory support; with it set, the reported free memory is correct. This forces zesInit to be called, which is required for BMG/LNL onwards.
Back in 2025.0, there was a bug where we reported an "estimated" free memory when sysman support was not available. In 2025.1+, if sysman support is not available, the adapter returns unsupported instead, to avoid confusing users into believing there is more free memory than there actually is.
For 2025.2, zesInit is called by default on BMG/LNL onwards; no environment variable is required.
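The release-by-release behavior described by the compiler team can be summarized in a small lookup. This is a sketch: the function names are hypothetical, only the 2025.0, 2025.1 and 2025.2 releases are described here, and treating 2025.2 as a lower bound for later releases is an assumption. Versions before 2025.0 are reported as "unknown" because nothing is said about them.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Return (year, minor) from a version string such as '2025.1.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def free_memory_query_behavior(oneapi_version: str) -> str:
    """Summarize the BMG/LNL free-memory query behavior for a oneAPI release."""
    v = parse_version(oneapi_version)
    if v == (2025, 0):
        return "estimated"  # may overstate free memory without sysman
    if v == (2025, 1):
        return "needs UR_L0_ENABLE_SYSMAN_ENV_DEFAULT=0"  # otherwise unsupported
    if v >= (2025, 2):
        return "supported by default"  # zesInit is called automatically
    return "unknown"  # releases before 2025.0 are not covered
```

Assuming the runtime was installed from the intel-sycl-rt pip package and that its version tracks the oneAPI release, the installed version could be checked with `free_memory_query_behavior(importlib.metadata.version("intel-sycl-rt"))`.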
Workaround
For 2025.1, the only way to enable it is to set the environment variable:
# cmd
set UR_L0_ENABLE_SYSMAN_ENV_DEFAULT=0
With this variable set, torch.xpu.mem_get_info() should report the correct free memory.
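The same workaround can also be applied from inside a Python script rather than the shell. This is a minimal sketch under the assumption that the variable is read when the Level Zero adapter initializes, so it has to be set in `os.environ` before torch is imported; `enable_sysman_workaround` is a hypothetical helper, and whether an in-process assignment is always early enough has not been verified here.

```python
import os
import sys
import warnings

_VAR = "UR_L0_ENABLE_SYSMAN_ENV_DEFAULT"


def enable_sysman_workaround() -> None:
    """Set the oneAPI 2025.1 workaround variable for this process and its children."""
    if "torch" in sys.modules:
        # Assumption: once torch is loaded, the runtime may already have read
        # the environment, so setting the variable now might have no effect.
        warnings.warn(
            f"torch is already imported; {_VAR} may be set too late",
            RuntimeWarning,
        )
    os.environ[_VAR] = "0"


enable_sysman_workaround()
# import torch  # import torch only after the variable is set
```

Setting the variable in the shell, as above, remains the simpler option because it is guaranteed to be in place before the interpreter starts.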
Fix plan
We will fix this by upgrading to oneAPI 2025.2.
Versions
torch: 2.9.0.dev20250706+xpu
intel-sycl-rt : 2025.1.1
cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @gujinghui @EikanWang @fengyuan14 @guangyey