Also support non-contiguous activation for torch._weight_int8pack_mm on CPU #147588
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147588
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 91b6790 with merge base 8a5265c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
The activation could be a slice of another tensor, but it should still be in row-major order.
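For illustration (a sketch, not code from this PR), here is the kind of activation being described: a slice that is non-contiguous overall yet still row-major, with unit stride on the last dimension:

```python
import torch

# A bf16 "activation" and a column slice of it: the slice shares storage
# with x, so its rows are spaced 16 elements apart (non-contiguous),
# but within each row the elements are adjacent (stride(1) == 1).
x = torch.randn(4, 16, dtype=torch.bfloat16)
a = x[:, :8]

print(a.is_contiguous())  # False: rows are not densely packed
print(a.stride())         # (16, 1): row-major, unit stride on last dim
```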
We don't need to benchmark this; it won't affect performance. Just change the error message a little bit.
```cpp
TORCH_CHECK(A.dim() == 2,
    __func__, " : expect A to be 2D tensor.");

TORCH_CHECK(A.stride(1) == 1,
    __func__, " : A must be row-major even if it's non-contiguous");
```
Change the error message to "A must be contiguous on last dimension."
`A must be contiguous on last dimension.` seems more sensible.
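As a sketch (mine, not from the PR) of what the final check `A.stride(1) == 1` accepts and rejects:

```python
import torch

x = torch.randn(8, 16, dtype=torch.bfloat16)

# Accepted: fully contiguous, stride (16, 1), so stride(1) == 1.
a_contig = x

# Accepted after this PR: a column slice, stride (16, 1), so stride(1) == 1,
# even though the tensor as a whole is not contiguous.
a_slice = x[:, :8]

# Rejected: a transposed view, stride (1, 16), so stride(1) != 1,
# i.e. not contiguous on the last dimension.
a_t = x.t()

for a in (a_contig, a_slice, a_t):
    print(a.stride(), a.stride(1) == 1)
```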
@pytorchbot merge
Merge failed. Reason: Approvers from one of the following sets are needed:
Hi @malfet, can you please help review this PR? It requires approval from a core reviewer/maintainer for landing. Thanks!
Hmm, this change looks BC-breaking to me (at quick glance, afk atm). Any reason for making it incompatible with contiguous A?
Hi @malfet, thanks for promptly following up! This change makes non-contiguous activations (so long as they're contiguous in the last dimension) compatible with `torch._weight_int8pack_mm`; contiguous activations remain supported as before, so it isn't BC-breaking. Thanks!
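A minimal sketch of the compatibility claim (not from the PR), assuming the CPU signature `torch._weight_int8pack_mm(A, B, scales)` with `A` an (M, K) bf16 activation, `B` a plain row-major int8 (N, K) weight, and `scales` of shape (N,) in A's dtype:

```python
import torch

M, K, N = 4, 64, 32

# Illustrative symmetric per-channel int8 quantization of a weight.
w = torch.randn(N, K)
scales = w.abs().amax(dim=1) / 127.0
b_int8 = torch.round(w / scales[:, None]).to(torch.int8)
scales_bf16 = scales.to(torch.bfloat16)

# A wide buffer whose left half serves as the activation.
a_full = torch.randn(M, 2 * K, dtype=torch.bfloat16)

# Contiguous activation: worked before this PR and still works (no BC break).
y_contig = torch._weight_int8pack_mm(a_full[:, :K].contiguous(), b_int8, scales_bf16)

# Non-contiguous slice with stride (2*K, 1): newly accepted by this PR.
y_slice = torch._weight_int8pack_mm(a_full[:, :K], b_int8, scales_bf16)

print(torch.allclose(y_contig, y_slice))
```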
LGTM, thank you for the explanation
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Also support non-contiguous activation for torch._weight_int8pack_mm on CPU (pytorch#147588)

### Problem
Non-contiguous activation for `torch._weight_int8pack_mm` is unsupported on CPU. So, with int8 WoQ with BF16 activation with torchao, for batch size 2 and above, an assertion is hit regarding non-contiguous A being unsupported. Such an issue was encountered with LLaMA models.

### Solution
Also support non-contiguous activation for `torch._weight_int8pack_mm`, so long as it's contiguous on the last dimension, and remove the assertion that requires contiguous activation.

### Alternative solutions considered
We could modify the LLaMA model in the transformers library to call `contiguous` after obtaining the final hidden state, just before computing logits with the LM head. However, it (huggingface/transformers#36078) might cause some regression for other users of that code.

Another aspect to this issue: is latency always lower if we make an activation tensor contiguous before `linear` or `torch._weight_int8pack_mm` is called on CPU? We would need some data points to analyze this, although performance should be good enough with this patch, since the first cache lines of rows of A are explicitly prefetched in the existing code (and it also avoids the copy that a `contiguous` call would do).

Pull Request resolved: pytorch#147588
Approved by: https://github.com/mingfeima, https://github.com/leslie-fang-intel, https://github.com/malfet

cc @jgong5 @mingfeima @XiaobingSuper @ashokei @jingxu10
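To make the motivating case concrete, here is a hypothetical sketch of the LLaMA-style pattern (names illustrative, not taken from transformers): with batch size ≥ 2, slicing the final hidden state before the LM head yields a non-contiguous but last-dim-contiguous activation, which this patch lets the op consume without a `contiguous()` copy:

```python
import torch

batch, seq, hidden = 2, 5, 64

# Decoder output: (batch, seq, hidden), contiguous.
hidden_states = torch.randn(batch, seq, hidden, dtype=torch.bfloat16)

# Keep only the last position before computing logits, as decoding loops do.
# Strides become (seq * hidden, 1): row-major, but non-contiguous for batch >= 2.
last = hidden_states[:, -1, :]
print(last.is_contiguous(), last.stride())  # False, (320, 1)

# With this PR, `last` can feed torch._weight_int8pack_mm directly;
# previously it first required an extra copy:
#   last = last.contiguous()
```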