Add AOTI shim for _weight_int4pack_mm_cpu_tensor #149031
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/149031. Note: links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 8f28345 with merge base a8b1767.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Attention! One of the PyTorch C-stable API files was changed. You MUST NOT change existing function declarations in this file, as this header defines a stable C ABI. If you need to change the signature of a function, introduce a new v2 version of the function and modify code generation to target the new version.
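To illustrate the policy the bot describes, here is a minimal hypothetical sketch; the function name `aoti_torch_foo` and its parameters are invented for illustration and are not part of this PR:

```cpp
#include <cstdint>

// Opaque handle and error-code types, as used throughout the AOTI C shim.
using AtenTensorHandle = struct AtenTensorOpaque*;
using AOTITorchError = int32_t;

extern "C" {
// Existing stable-ABI declaration: its signature must never change.
AOTITorchError aoti_torch_foo(AtenTensorHandle self, AtenTensorHandle* ret0);

// To add a parameter, introduce a new v2 symbol instead; old callers keep
// linking against the original function, so the C ABI stays stable.
AOTITorchError aoti_torch_foo_v2(
    AtenTensorHandle self,
    int64_t new_option,
    AtenTensorHandle* ret0);
}
```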
Hi @EikanWang, could you please take a look? And since all these files are named after MKLDNN, is it still OK to rename them to cpu? Thanks.
```diff
@@ -523,3 +523,19 @@ AOTITorchError aoti_torch_cpu__mkl_linear(
 #endif // AT_MKL_ENABLED

 #endif // AT_MKLDNN_ENABLED()

+AOTITorchError aoti_torch_cpu__weight_int4pack_mm_cpu_tensor(
```
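For context, AOTI C-shim functions follow a common pattern: C-ABI entry points that unwrap `AtenTensorHandle` arguments, call the underlying ATen kernel, and return new handles through out-parameters. A plausible sketch of this shim's body is below; the exact parameter list and the `at::` callee are assumptions based on that convention, not the PR's verbatim code:

```cpp
#include <ATen/ATen.h>
#include <torch/csrc/inductor/aoti_torch/utils.h>

// Hypothetical sketch (argument names, order, and the at:: callee are
// assumptions following the usual AOTI shim convention).
AOTITorchError aoti_torch_cpu__weight_int4pack_mm_cpu_tensor(
    AtenTensorHandle self,            // activation, e.g. [M, K]
    AtenTensorHandle mat2,            // int4-packed weight
    int64_t qGroupSize,               // quantization group size
    AtenTensorHandle qScaleAndZeros,  // per-group scales and zero points
    AtenTensorHandle* ret0) {         // result returned via out-parameter
  AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE({
    at::Tensor* self_t = tensor_handle_to_tensor_pointer(self);
    at::Tensor* mat2_t = tensor_handle_to_tensor_pointer(mat2);
    at::Tensor* qsz_t = tensor_handle_to_tensor_pointer(qScaleAndZeros);
    // The actual op dispatched in the merged PR may differ.
    *ret0 = new_tensor_handle(at::_weight_int4pack_mm_for_cpu(
        *self_t, *mat2_t, qGroupSize, *qsz_t));
  });
}
```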
But are we using mkldnn here?
No. It does not require MKLDNN. We are considering renaming these files from mkldnn to cpu. Thanks.
@jgong5, currently the name `shim_mkldnn` indicates that it should only serve oneDNN. However, Weiwen and I checked the motivation: this op is CPU-dedicated and cannot be reused across different hardware backends. For XPU we have provided `shim_xpu` regardless of its implementation being on top of oneDNN. That's why we want to rename the file to CPU. Does it make sense to you?
```diff
@@ -3835,7 +3835,7 @@ def matcher_check_fn():
     include_ops = [
         "aoti_torch_cpu__weight_int4pack_mm_cpu_tensor"
         if torch._inductor.config.cpp_wrapper
-        else "extern_kernels.int4mm_packed_weight_cpu"
+        else "torch.ops.quantized.int4mm_packed_weight_cpu.default"
```
@Xia-Weiwen, as we synced offline, the test cases do not cover `aoti_torch_cpu__weight_int4pack_mm_cpu_tensor`. Could you help elaborate on the changes? How do the changes test `aoti_torch_cpu__weight_int4pack_mm_cpu_tensor`?
It seems that the UT runs both the fallback kernel and the template-based kernel for max-autotune. So the fallback kernel should still be used for codegen, compiled with gcc, and benchmarked, which exercises the shim.
@pytorchbot merge
Hi @desertfire, since the missing shim breaks the lowering of this op (a compile error with the cpp wrapper), can we cherry-pick this patch to the 2.7 branch? Thanks.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
You can add it to #149044
Merge failed. Reason: 1 job has failed: linux-binary-manywheel / manywheel-py3_9-cuda12_8-test / test. Details for Dev Infra team: raised by workflow job.
Thanks
@pytorchbot merge -f "CI failure is unrelated"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
**Summary**
The previous implementation of the shim did not align with the design and was removed by #148907. This PR adds it back in the files of the MKLDNN backend and re-enables the CPP wrapper UT.

**Test plan**
```
pytest -s test/inductor/test_cpu_cpp_wrapper.py -k test_woq_int4
```

Pull Request resolved: #149031
Approved by: https://github.com/leslie-fang-intel, https://github.com/EikanWang, https://github.com/desertfire
Stack from ghstack (oldest at bottom):
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov