[0.6.1] InternLM SmoothQuant does not work #705
Labels: Low Precision (issue about lower-bit quantization, including int8, int4, fp8), triaged (issue has been triaged by maintainers)
I am running 0.6.1 with the InternLM model, with the following configurations:
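Roughly, the conversion and build follow the standard InternLM SmoothQuant example; the sketch below uses placeholder paths, and the exact flags in my setup may differ:

```bash
# SmoothQuant conversion from the HF checkpoint (alpha value is illustrative)
python examples/internlm/hf_internlm_convert.py \
    -i ./internlm-chat-7b -o ./internlm-sq \
    --smoothquant 0.5 --tensor-parallelism 1

# Engine build from the converted weights
python examples/internlm/build.py \
    --model_dir ./internlm-sq/1-gpu \
    --use_smooth_quant --per_token --per_channel \
    --output_dir ./internlm-engine
```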
The conversion finishes successfully. However, when I start building the engine, the following error appears:
I have printed out the layer. It seems that the `bias` property is not passed correctly when converting to SmoothQuant, which causes the bias object to be initialized as None.
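My reading of the failure mode, sketched with placeholder names (not TensorRT-LLM's actual classes): the SmoothQuant conversion path does not forward the model's `bias` setting, so the layer's bias stays None and the engine build fails when it later dereferences it.

```python
# Hypothetical sketch of the suspected bug; class and function names are
# placeholders, not TensorRT-LLM's actual API.

class SmoothQuantLinear:
    def __init__(self, in_features, out_features, bias=False):
        self.weight = ("int8", out_features, in_features)
        # With the default bias=False, no bias tensor is ever created.
        self.bias = ("fp16", out_features) if bias else None


def convert_attention(config):
    # Bug pattern: InternLM attention uses bias=True, but the SmoothQuant
    # conversion does not forward config["bias"] to the quantized layer.
    return SmoothQuantLinear(config["hidden_size"], config["hidden_size"])
    # expected: SmoothQuantLinear(..., bias=config["bias"])


def build_engine(layer):
    # The engine build later uses layer.bias and fails when it is None.
    if layer.bias is None:
        raise ValueError("bias is None for a layer that should have one")


layer = convert_attention({"hidden_size": 4096, "bias": True})
build_engine(layer)  # fails, mirroring the error seen at engine-build time
```

Environment information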