
Also support non-contiguous activation for torch._weight_int8pack_mm on CPU #147588


Conversation

sanchitintel
Collaborator

@sanchitintel sanchitintel commented Feb 21, 2025

Problem

Non-contiguous activation for torch._weight_int8pack_mm is unsupported on CPU.
As a result, with int8 weight-only quantization (WoQ) and BF16 activations in torchao, an assertion about non-contiguous A being unsupported is hit for batch sizes of 2 and above. This issue was encountered with LLaMA models.
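For illustration, here is a minimal sketch (with made-up shapes) of how such an activation shows up in the LLaMA LM-head path: slicing the last token out of the final hidden state yields a 2D tensor that is no longer contiguous, although each of its rows is still laid out contiguously.

```python
import torch

batch, seq_len, hidden = 2, 8, 16  # hypothetical sizes

# Final hidden state from the decoder stack (contiguous, row-major).
hidden_states = torch.randn(batch, seq_len, hidden, dtype=torch.bfloat16)

# Typical LM-head input: only the last token of every sequence.
last_token = hidden_states[:, -1, :]   # shape [batch, hidden]

print(last_token.is_contiguous())      # False once batch >= 2
print(last_token.stride())             # (seq_len * hidden, 1): unit stride on the last dim
```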

Solution

Also support non-contiguous activation for torch._weight_int8pack_mm, so long as it is contiguous in the last dimension, and remove the assertion that requires a fully contiguous activation.
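In other words, the precondition on A is relaxed from "contiguous" to "2D with unit stride on the last dimension". Below is a rough Python-level sketch of the new acceptance rule (the actual check lives in the C++ kernel as a TORCH_CHECK); the helper name and tensors are only for illustration.

```python
import torch

def activation_ok(A: torch.Tensor) -> bool:
    # Relaxed precondition: A must be 2D and row-major (unit stride on the
    # last dimension); full contiguity is no longer required.
    return A.dim() == 2 and A.stride(1) == 1

A_contig = torch.randn(4, 16, dtype=torch.bfloat16)               # contiguous
A_slice  = torch.randn(4, 3, 16, dtype=torch.bfloat16)[:, -1, :]  # non-contiguous, row-major
A_t      = torch.randn(16, 4, dtype=torch.bfloat16).t()           # stride(1) != 1

print(activation_ok(A_contig))  # True  (unchanged behaviour)
print(activation_ok(A_slice))   # True  (newly accepted by this patch)
print(activation_ok(A_t))       # False (still rejected)
```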

Alternative solutions considered

We could modify the LLaMA model in the transformers library to call contiguous after obtaining the final hidden state, just before computing logits with the LM head (huggingface/transformers#36078). However, that might cause a regression for other users of that code.

Another aspect of this issue: is latency always lower if we make the activation tensor contiguous before calling linear or torch._weight_int8pack_mm on CPU? We would need some data points to analyze that, although performance should be good enough with this patch, since the existing code already explicitly prefetches the first cache lines of the rows of A (and this approach also avoids the copy that a contiguous call would incur). A rough micro-benchmark sketch is given below.
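A possible micro-benchmark sketch for gathering such data points, with arbitrary shapes; it assumes the CPU convention of a plain [N, K] int8 weight, per-output-channel scales in the activation's dtype, and the signature torch._weight_int8pack_mm(A, B, scales). Adjust to the actual constraints of the kernel.

```python
import timeit
import torch

M, K, N = 32, 4096, 4096  # arbitrary shapes for illustration

# Non-contiguous but row-major activation, as produced by a last-token slice.
A = torch.randn(M, 3, K, dtype=torch.bfloat16)[:, -1, :]
B = torch.randint(-127, 128, (N, K), dtype=torch.int8)   # assumed [N, K] int8 weight
scales = torch.rand(N, dtype=torch.bfloat16)             # assumed per-channel scales

def direct():
    # With this patch: pass the non-contiguous activation straight through.
    return torch._weight_int8pack_mm(A, B, scales)

def copy_first():
    # Alternative: materialize a contiguous copy before the matmul.
    return torch._weight_int8pack_mm(A.contiguous(), B, scales)

print("non-contiguous A :", timeit.timeit(direct, number=50))
print("A.contiguous()   :", timeit.timeit(copy_first, number=50))
```

The second variant pays for an extra [M, K] copy on every call, which is exactly what this patch lets us avoid.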

cc @jgong5 @mingfeima @XiaobingSuper @ashokei @jingxu10


pytorch-bot bot commented Feb 21, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147588

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 91b6790 with merge base 8a5265c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added module: cpu CPU specific problem (e.g., perf, algorithm) release notes: linalg_frontend release notes category labels Feb 21, 2025
@sanchitintel sanchitintel added topic: not user facing topic category and removed release notes: linalg_frontend release notes category labels Feb 21, 2025

@sanchitintel sanchitintel added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 21, 2025
the activation could be a slice of another tensor, but it should still be in row-major order.
@sanchitintel sanchitintel marked this pull request as ready for review February 21, 2025 06:17
Collaborator

@mingfeima mingfeima left a comment


Don't need to benchmark this; it won't affect the performance. Just change the error message a little bit.

TORCH_CHECK(A.dim() == 2,
    __func__, " : expect A to be 2D tensor.");

TORCH_CHECK(A.stride(1) == 1,
    __func__, " : A must be row-major even if it's non-contiguous");
Collaborator


change the error message to "A must be contiguous on last dimension."

`A must be contiguous on last dimension.` seems more sensible.
@sanchitintel
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@sanchitintel
Collaborator Author

Hi @malfet, can you please help review this PR? It requires approval from a core reviewer/maintainer for landing.

Thanks!

@sanchitintel sanchitintel requested a review from malfet February 21, 2025 20:07
@sanchitintel sanchitintel added the intel This tag is for PR from Intel label Feb 21, 2025
@malfet
Contributor

malfet commented Feb 21, 2025

Hi @malfet, can you please help review this PR? It requires approval from a core reviewer/maintainer for landing.

Thanks!

Hmm, this change looks BC-breaking to me (at a quick glance, afk atm). Any reason for making it incompatible with contiguous A?

@sanchitintel
Collaborator Author

sanchitintel commented Feb 21, 2025

Any reason making it incompatible with contiguous A?

Hi @malfet, thanks for promptly following up!

This change makes non-contiguous activations (so long as they're contiguous in the last dimension) compatible with torch._weight_int8pack_mm. Contiguous activations are still supported, so the change is not BC-breaking.

Thanks!

@sanchitintel sanchitintel changed the title Support non-contiguous activation for torch._weight_int8pack_mm on CPU Also support non-contiguous activation for torch._weight_int8pack_mm on CPU Feb 21, 2025
Contributor

@malfet malfet left a comment


LGTM, thank you for the explanation

@malfet
Contributor

malfet commented Feb 22, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

majing921201 pushed a commit to majing921201/pytorch that referenced this pull request Mar 4, 2025
…on CPU (pytorch#147588)

Pull Request resolved: pytorch#147588
Approved by: https://github.com/mingfeima, https://github.com/leslie-fang-intel, https://github.com/malfet
Labels
ciflow/trunk (Trigger trunk jobs on your pull request), intel (This tag is for PR from Intel), Merged, module: cpu (CPU specific problem, e.g., perf, algorithm), open source, topic: not user facing (topic category)