the difference of quantization implementation between quantize.py and convert_checkpoint.py #2681

Open
XA23i opened this issue Jan 12, 2025 · 2 comments
Labels: triaged (Issue has been triaged by maintainers)

Comments

@XA23i commented Jan 12, 2025

I've noticed that I can apply SmoothQuant to models using the command:

python quantize.py --model_dir $MODEL_PATH --qformat int8_sq --kv_cache_dtype int8 --output_dir $OUTPUT_PATH

Additionally, I can achieve the same with convert_checkpoint.py by running:

python3 convert_checkpoint.py --model_dir ./tmp/Qwen/7B/ \
    --output_dir ./tllm_checkpoint_1gpu_sq \
    --dtype float16 \
    --smoothquant 0.5 \
    --per_token \
    --per_channel

It seems that the latter approach is more flexible, since I can adjust parameters such as the SmoothQuant ratio, per_token, per_channel, and other options.

Does the first command offer broader compatibility, while the latter is restricted to models that have a dedicated convert_checkpoint.py? In other words, when a model has a corresponding convert_checkpoint.py, should I prefer that route?

Furthermore, I noticed that both commands generate safetensors files and a config.json. Is it possible to use quantize.py to generate the config.json and then manually modify the quantization-related fields afterward?
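
For reference, the checkpoint's config.json carries a quantization section that records what was applied. A minimal Python sketch to inspect it (the path matches the command above; the field values shown are illustrative for an int8 SmoothQuant checkpoint, not guaranteed, and depend on the flags used):

import json

# Minimal sketch: print the quantization section of a generated checkpoint
# config. Field values are illustrative and depend on the flags passed to
# quantize.py / convert_checkpoint.py.
with open("./tllm_checkpoint_1gpu_sq/config.json") as f:
    config = json.load(f)

print(json.dumps(config.get("quantization", {}), indent=2))
# Possible output:
# {
#   "quant_algo": "W8A8_SQ_PER_CHANNEL_PER_TOKEN_PLUGIN",
#   "kv_cache_quant_algo": "INT8"
# }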

@nv-guomingz (Collaborator) commented Jan 13, 2025

Hi @XA23i , the first command relies on the kernels selected/generated by TensorRT, while the latter depends on the SmoothQuant plugin.
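
One way to see which path a checkpoint will take is the quant_algo recorded in its config.json: algorithm names with a _PLUGIN suffix map to the plugin path. A short sketch, assuming the enum names below match the installed TensorRT-LLM version:

from tensorrt_llm.quantization import QuantAlgo

# Sketch: the "_PLUGIN" suffix distinguishes the SmoothQuant-plugin path
# from the TensorRT-selected-kernel path (enum names assumed; verify
# against your installed version).
print(QuantAlgo.W8A8_SQ_PER_CHANNEL)                   # TensorRT kernels
print(QuantAlgo.W8A8_SQ_PER_CHANNEL_PER_TOKEN_PLUGIN)  # SmoothQuant plugin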

Is it possible to use quantize.py to generate config.json and manually modify the quantization-related fields afterward?

===============================

I don't think so, since config.json reflects the contents of the safetensors files; we can't modify one while keeping the other unchanged. For example, changing the recorded quant_algo without regenerating the corresponding weight and scale tensors would leave the checkpoint inconsistent.
Maybe you could describe your specific scenario, and we can figure out the best way to solve the problem.

nv-guomingz added the triaged label on Jan 13, 2025
@XA23i (Author) commented Jan 13, 2025

If I use the first command, what are the default settings for SmoothQuant, particularly the SmoothQuant ratio? There seems to be no such description in config.json.
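
As context for this question: quantize.py drives NVIDIA ModelOpt, where --qformat int8_sq maps to a preset quantization config. A hedged sketch to print that preset (assumes the nvidia-modelopt package is installed; the preset name may differ across versions):

import modelopt.torch.quantization as mtq

# Print the preset that "--qformat int8_sq" maps to. The SmoothQuant
# calibration algorithm (and any pinned ratio) is part of this preset,
# not of the exported checkpoint's config.json.
print(mtq.INT8_SMOOTHQUANT_CFG)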
