
Add AOTI shim for _weight_int4pack_mm_cpu_tensor #149031


Closed
wants to merge 3 commits

Conversation

@Xia-Weiwen (Collaborator) commented Mar 12, 2025

Stack from ghstack (oldest at bottom):

Summary
The previous implementation of this shim did not align with the design, so it was removed by #148907. This PR adds the shim back in the MKLDNN backend files and re-enables the CPP wrapper UT.
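
For context, the shims in these files follow a common pattern. Below is a hedged sketch of that pattern, not the exact code in this PR: the parameter list and the target ATen op (assumed here to be at::_weight_int4pack_mm_for_cpu) are assumptions.

```cpp
// Hedged sketch of the AOTI shim pattern (illustrative, not copied from the
// PR): a C-ABI entry point unwraps opaque tensor handles, forwards to the
// ATen CPU kernel, and converts any exception into an error code.
#include <ATen/Functions.h>
#include <torch/csrc/inductor/aoti_torch/utils.h>

using torch::aot_inductor::new_tensor_handle;
using torch::aot_inductor::tensor_handle_to_tensor_pointer;

AOTITorchError aoti_torch_cpu__weight_int4pack_mm_cpu_tensor(
    AtenTensorHandle self,
    AtenTensorHandle mat2,
    int64_t qGroupSize, // assumed to be a scalar; may be a tensor handle
    AtenTensorHandle qScaleAndZeros,
    AtenTensorHandle* ret0) {
  AOTI_TORCH_CONVERT_EXCEPTION_TO_ERROR_CODE({
    // Assumed target op; the real shim may dispatch to a different kernel.
    *ret0 = new_tensor_handle(at::_weight_int4pack_mm_for_cpu(
        *tensor_handle_to_tensor_pointer(self),
        *tensor_handle_to_tensor_pointer(mat2),
        qGroupSize,
        *tensor_handle_to_tensor_pointer(qScaleAndZeros)));
  });
}
```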

Test plan

pytest -s test/inductor/test_cpu_cpp_wrapper.py -k test_woq_int4

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]

pytorch-bot bot commented Mar 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/149031

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 8f28345 with merge base a8b1767 (image):

NEW FAILURE - The following job has failed:

  • linux-binary-manywheel / manywheel-py3_9-cuda12_8-test / test (gh)
    RuntimeError: cuDNN version incompatibility: PyTorch was compiled against (9, 8, 0) but found runtime version (9, 7, 1). PyTorch already comes bundled with cuDNN. One option to resolving this error is to ensure PyTorch can find the bundled cuDNN. one possibility is that there is a conflicting cuDNN in LD_LIBRARY_PATH.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Xia-Weiwen added a commit that referenced this pull request Mar 12, 2025
ghstack-source-id: c72bbbb
Pull Request resolved: #149031
@Xia-Weiwen Xia-Weiwen marked this pull request as draft March 12, 2025 07:29
Contributor

Attention! One of PyTorch's C-stable API files was changed

You MUST NOT change existing function declarations in this file, as the header defines a stable C ABI. If you need to change a function's signature, introduce a new v2 version of the function and modify code generation to target the new version.
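
As a hypothetical illustration of this versioning rule (the function below is invented; only the pattern matters), a header change might look like:

```cpp
// Hypothetical example of the stable-ABI versioning rule; this function does
// not exist in the codebase. The original declaration stays frozen, and a
// signature change ships as a new _v2 symbol that codegen targets instead.
#include <torch/csrc/inductor/aoti_torch/c/shim.h>

AOTITorchError aoti_torch_cpu__example_op(
    AtenTensorHandle self,
    AtenTensorHandle* ret0); // frozen: part of the stable C ABI

AOTITorchError aoti_torch_cpu__example_op_v2(
    AtenTensorHandle self,
    int64_t new_flag, // the added parameter that motivates a v2 entry point
    AtenTensorHandle* ret0);
```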



@Xia-Weiwen added the "topic: not user facing" (topic category) label Mar 12, 2025
[ghstack-poisoned]
Xia-Weiwen added a commit that referenced this pull request Mar 13, 2025
ghstack-source-id: 38166c3
Pull Request resolved: #149031
[ghstack-poisoned]
Xia-Weiwen added a commit that referenced this pull request Mar 13, 2025
@Xia-Weiwen (Collaborator, Author) commented Mar 14, 2025

Hi @EikanWang, could you please take a look? Also, since all these files are named after MKLDNN, is it OK to rename them to cpu? Thanks.

@@ -523,3 +523,19 @@ AOTITorchError aoti_torch_cpu__mkl_linear(
 #endif // AT_MKL_ENABLED

 #endif // AT_MKLDNN_ENABLED()

+AOTITorchError aoti_torch_cpu__weight_int4pack_mm_cpu_tensor(
Collaborator

But are we using mkldnn here?

Collaborator Author

No. It does not require MKLDNN. We are considering renaming these files from mkldnn to cpu. Thanks.

Collaborator

@jgong5, currently the name shim_mkldnn indicates that it should only serve oneDNN. However, Weiwen and I checked, and the motivation is that this shim is CPU-dedicated and cannot be reused across different hardware backends. For XPU, we provide shim_xpu regardless of its implementation being on top of oneDNN. That's why we want to rename the file to CPU. Does that make sense to you?


@@ -3835,7 +3835,7 @@ def matcher_check_fn():
     include_ops = [
         "aoti_torch_cpu__weight_int4pack_mm_cpu_tensor"
         if torch._inductor.config.cpp_wrapper
-        else "extern_kernels.int4mm_packed_weight_cpu"
+        else "torch.ops.quantized.int4mm_packed_weight_cpu.default"
Collaborator

@Xia-Weiwen , as we synced offline, the test cases do not cover aoti_torch_cpu__weight_int4pack_mm_cpu_tensor. Could you help elaborate on the changes? How do the changes test aoti_torch_cpu__weight_int4pack_mm_cpu_tensor?

Collaborator Author

It seems that the UT runs both the fallback kernel and the template-based kernel for max-autotune, so the fallback kernel is still used for codegen, compiled with gcc, and benchmarked.
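
In other words, when the fallback kernel is code-generated under cpp_wrapper, the emitted C++ plausibly calls the shim along these lines (an illustrative fragment; the handle names and the argument list are assumptions, not this PR's actual generated code):

```cpp
// Illustrative fragment of a cpp_wrapper-style call into the C-ABI shim
// (assumed shape, not real generated output). The generated wrapper passes
// opaque tensor handles and checks the returned error code.
AtenTensorHandle buf0 = nullptr; // output of the fallback int4 matmul
AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_cpu__weight_int4pack_mm_cpu_tensor(
    arg0_1,     // activation (hypothetical handle name)
    arg1_1,     // packed int4 weight (hypothetical handle name)
    qGroupSize, // quantization group size (assumed parameter)
    arg2_1,     // scales and zero points (hypothetical handle name)
    &buf0));
```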

@Xia-Weiwen Xia-Weiwen marked this pull request as ready for review March 17, 2025 07:57
@Xia-Weiwen Xia-Weiwen requested a review from desertfire March 17, 2025 07:57
@Xia-Weiwen (Collaborator, Author)

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 17, 2025
@Xia-Weiwen (Collaborator, Author)

Hi @desertfire, since it breaks the lowering of this op (compilation error with cpp wrapper), can we cherry-pick this patch to the 2.7 branch? Thanks.

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@desertfire (Contributor)

Hi @desertfire, since it breaks the lowering of this op (compilation error with cpp wrapper), can we cherry-pick this patch to the 2.7 branch? Thanks.

You can add it to #149044

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 jobs have failed, first few of them are: linux-binary-manywheel / manywheel-py3_9-cuda12_8-test / test

Details for Dev Infra team. Raised by workflow job.

@Xia-Weiwen (Collaborator, Author)

Hi @desertfire, since it breaks the lowering of this op (compilation error with cpp wrapper), can we cherry-pick this patch to the 2.7 branch? Thanks.

You can add it to #149044

Thanks

@Xia-Weiwen (Collaborator, Author)

@pytorchbot merge -f "CI failure is unrelated"

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@Xia-Weiwen added the "intel" (This tag is for PR from Intel) label Mar 18, 2025
Xia-Weiwen added a commit to Xia-Weiwen/pytorch that referenced this pull request Mar 18, 2025
**Summary**
The previous implementation of this shim did not align with the design, so it was removed by pytorch#148907. This PR adds the shim back in the MKLDNN backend files and re-enables the CPP wrapper UT.

**Test plan**
```
pytest -s test/inductor/test_cpu_cpp_wrapper.py -k test_woq_int4
```

Pull Request resolved: pytorch#149031
Approved by: https://github.com/leslie-fang-intel, https://github.com/EikanWang, https://github.com/desertfire
atalman pushed a commit that referenced this pull request Mar 20, 2025
**Summary**
The previous implementation of this shim did not align with the design, so it was removed by #148907. This PR adds the shim back in the MKLDNN backend files and re-enables the CPP wrapper UT.

**Test plan**
```
pytest -s test/inductor/test_cpu_cpp_wrapper.py -k test_woq_int4
```

Pull Request resolved: #149031
Approved by: https://github.com/leslie-fang-intel, https://github.com/EikanWang, https://github.com/desertfire
@github-actions github-actions bot deleted the gh/Xia-Weiwen/31/head branch April 20, 2025 02:21
Labels
ciflow/inductor, ciflow/trunk, intel, Merged, module: inductor, open source, topic: not user facing