Difference in quantization implementation between quantize.py and convert_checkpoint.py #2681
Labels: triaged (Issue has been triaged by maintainers)
I've noticed that I can apply SmoothQuant to models with quantize.py using the command:
python quantize.py --model_dir $MODEL_PATH --qformat int8_sq --kv_cache_dtype int8 --output_dir $OUTPUT_PATH
Additionally, I can achieve this with convert_checkpoint.py by running:
python3 convert_checkpoint.py --model_dir ./tmp/Qwen/7B/ \
    --output_dir ./tllm_checkpoint_1gpu_sq \
    --dtype float16 \
    --smoothquant 0.5 \
    --per_token \
    --per_channel
The latter approach seems more flexible, since I can adjust parameters such as the SmoothQuant ratio, --per_token, and --per_channel.
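For context, my understanding of what the SmoothQuant ratio (the --smoothquant value, alpha) controls is roughly the per-channel scaling sketched below. This is only a rough illustration of the idea from the SmoothQuant paper, not the actual TensorRT-LLM implementation, and the array values are made up:

# Rough sketch of SmoothQuant per-channel smoothing (not TensorRT-LLM's code).
# alpha is the value passed via --smoothquant; 0.5 splits the quantization
# difficulty evenly between activations and weights.
import numpy as np

def smooth_scales(act_abs_max, weight_abs_max, alpha=0.5, eps=1e-5):
    # s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), computed per input channel.
    a = np.clip(act_abs_max, eps, None)
    w = np.clip(weight_abs_max, eps, None)
    return np.clip(a**alpha / w**(1.0 - alpha), eps, None)

# Activations are divided by s and weights multiplied by s, so X @ W is
# unchanged, but activation outliers are "smoothed" into the weights before
# INT8 quantization.
act_abs_max = np.array([10.0, 0.5, 3.0])    # made-up per-channel activation maxima
weight_abs_max = np.array([0.2, 0.4, 0.1])  # made-up per-channel weight maxima
print(smooth_scales(act_abs_max, weight_abs_max, alpha=0.5))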
Does the first command offer broader model compatibility, while the latter is restricted to models that have a dedicated convert_checkpoint.py? In other words, when a model has a corresponding convert_checkpoint.py, should I prioritize using it?
Furthermore, I noticed that both commands generate safetensors files and a config.json. Is it possible to use quantize.py to generate the config.json and then manually modify the quantization-related fields afterward?
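To make the question concrete, the kind of manual edit I have in mind is sketched below. It assumes the generated config.json contains a "quantization" block with fields such as quant_algo and kv_cache_quant_algo (that is what my generated checkpoints show, but the exact field names may differ between TensorRT-LLM versions, so please treat them as assumptions):

import json

ckpt_dir = "./tllm_checkpoint_1gpu_sq"  # output_dir from the convert_checkpoint.py run above

with open(f"{ckpt_dir}/config.json") as f:
    cfg = json.load(f)

# Inspect the quantization-related fields written by the conversion tool.
print(cfg.get("quantization", {}))

# Hypothetical manual edit; note that the per-channel/per-token scaling tensors
# live in the safetensors shards, so changing only these JSON fields would
# probably not be enough to actually switch quantization modes.
cfg.setdefault("quantization", {})["kv_cache_quant_algo"] = "INT8"

with open(f"{ckpt_dir}/config.json", "w") as f:
    json.dump(cfg, f, indent=2)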