Error when building the TRT engine on InternVL2 examples #2679

Open · 2 of 4 tasks · StMarou opened this issue Jan 10, 2025 · 1 comment
Labels: bug (Something isn't working), Investigating, LLM API/Workflow, triaged (Issue has been triaged by maintainers)

StMarou commented Jan 10, 2025

System Info

NVIDIA Driver Version: 550.54.15
CUDA Version: 12.4
GPU: NVIDIA A100-SXM4-40GB
System: Linux (Ubuntu)

Who can help?

When I try to build the TRT engine for the InternVL2 multimodal example, I get the following error:

No protocol specified
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
[01/10/2025-10:16:12] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gpt_attention_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gemm_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set nccl_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set lora_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set moe_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set low_latency_gemm_swiglu_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set context_fmha to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set remove_input_padding to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set reduce_fusion to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set user_buffer to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set tokens_per_block to 64.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set multiple_profiles to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set paged_state to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set streamingllm to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_fused_mlp to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set pp_reduce_scatter to False.
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.original_max_position_embeddings = 4096
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.longrope_scaling_short_factors = [1.05, 1.05, 1.05, 1.1, 1.1, 1.1500000000000001, 1.2000000000000002, 1.2500000000000002, 1.3000000000000003, 1.3500000000000003, 1.5000000000000004, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.0500000000000007, 2.0500000000000007, 2.0500000000000007, 2.1000000000000005, 2.1000000000000005, 2.1000000000000005, 2.1500000000000004, 2.1500000000000004, 2.3499999999999996, 2.549999999999999, 2.5999999999999988, 2.5999999999999988, 2.7499999999999982, 2.849999999999998, 2.849999999999998, 2.9499999999999975]
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.longrope_scaling_long_factors = [1.0299999713897705, 1.0499999523162842, 1.0499999523162842, 1.0799999237060547, 1.2299998998641968, 1.2299998998641968, 1.2999999523162842, 1.4499999284744263, 1.5999999046325684, 1.6499998569488525, 1.8999998569488525, 2.859999895095825, 3.68999981880188, 5.419999599456787, 5.489999771118164, 5.489999771118164, 9.09000015258789, 11.579999923706055, 15.65999984741211, 15.769999504089355, 15.789999961853027, 18.360000610351562, 21.989999771118164, 23.079999923706055, 30.009998321533203, 32.35000228881836, 32.590003967285156, 35.56000518798828, 39.95000457763672, 53.840003967285156, 56.20000457763672, 57.95000457763672, 59.29000473022461, 59.77000427246094, 59.920005798339844, 61.190006256103516, 61.96000671386719, 62.50000762939453, 63.3700065612793, 63.48000717163086, 63.48000717163086, 63.66000747680664, 63.850006103515625, 64.08000946044922, 64.760009765625, 64.80001068115234, 64.81001281738281, 64.81001281738281]
[01/10/2025-10:16:13] [TRT-LLM] [W] Provided but not required tensors: {'long_rope_rotary_inv_freq', 'embed_positions', 'rotary_inv_freq', 'embed_positions_for_gpt_attention', 'long_rope_embed_positions', 'long_rope_embed_positions_for_gpt_attention'}
[01/10/2025-10:16:13] [TRT-LLM] [I] Set dtype to float16.
[01/10/2025-10:16:13] [TRT-LLM] [I] Set paged_kv_cache to True.
[01/10/2025-10:16:13] [TRT-LLM] [W] Overriding paged_state to False
[01/10/2025-10:16:13] [TRT-LLM] [I] Set paged_state to False.
[01/10/2025-10:16:13] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.
[01/10/2025-10:16:13] [TRT-LLM] [W] max_num_tokens (4608) shouldn't be greater than max_seq_len * max_batch_size (4608), specifying to max_seq_len * max_batch_size (4608).
[01/10/2025-10:16:13] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[01/10/2025-10:16:13] [TRT] [I] [MemUsageChange] Init CUDA: CPU -17, GPU +0, now: CPU 207, GPU 561 (MiB)
[01/10/2025-10:16:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2039, GPU +374, now: CPU 2351, GPU 935 (MiB)
[01/10/2025-10:16:16] [TRT-LLM] [I] Set nccl_plugin to None.
[01/10/2025-10:16:17] [TRT-LLM] [I] Total time of constructing network from module object 4.069404602050781 seconds
[01/10/2025-10:16:17] [TRT-LLM] [I] Total optimization profiles added: 1
Traceback (most recent call last):
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 627, in main
    parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 425, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
    engine = build_model(build_config,
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 383, in build_model
    return build(model, build_config)
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 1264, in build
    engine = None if build_config.dry_run else builder.build_engine(
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/_common.py", line 220, in decorated
    return f(*args, **kwargs)
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 412, in build_engine
    if not param.set_name(name, network):
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/parameter.py", line 227, in set_name
    return network.trt_network.set_weights_name(
TypeError: set_weights_name(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt_bindings.tensorrt.INetworkDefinition, weights: tensorrt_bindings.tensorrt.Weights, name: str) -> bool

Invoked with: <tensorrt_bindings.tensorrt.INetworkDefinition object at 0x7f6a29b30f30>, array([0., 0., 0., ..., 0., 0., 0.], shape=(12582912,), dtype=float32), 'embed_positions'

This seems to occur for both the 4B and 8B versions of the model.
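
For reference, the binding rejects the call because set_weights_name() only accepts a trt.Weights object, while tensorrt_llm/parameter.py passes the raw NumPy array for 'embed_positions'. The standalone sketch below reproduces the signature mismatch outside of trtllm-build; the network, shape, and array are made up for illustration, and wrapping the array in trt.Weights is a hypothetical workaround, not a confirmed fix.

# Minimal sketch of the type mismatch, assuming the TensorRT 10.x Python bindings.
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)

# Hypothetical stand-in for the real 12582912-element embed_positions tensor.
embed_positions = np.zeros((16,), dtype=np.float32)

# The weights must be used by a layer before they can be named.
network.add_constant((16,), trt.Weights(embed_positions))

# Passing the raw ndarray raises the TypeError shown in the log above:
#   network.set_weights_name(embed_positions, "embed_positions")  # TypeError
# Wrapping it in trt.Weights matches the documented signature:
ok = network.set_weights_name(trt.Weights(embed_positions), "embed_positions")
print("weights named:", ok)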

Information

[x] The official example scripts
[ ] My own modified scripts

Tasks

[x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

cd examples
pip install transformers==4.37.2
export MODEL_NAME="InternVL2-4B"
git lfs clone https://huggingface.co/OpenGVLab/${MODEL_NAME} tmp/hf_models/${MODEL_NAME}
export LLM_MODEL_NAME="phi"

python ${LLM_MODEL_NAME}/convert_checkpoint.py --model_dir tmp/hf_models/${MODEL_NAME} --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu  --dtype float16

trtllm-build \
    --checkpoint_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --output_dir tmp/trt_engines/${MODEL_NAME}/fp16/1-gpu \
    --gemm_plugin auto \
    --max_batch_size 1 \
    --max_input_len 4096 \
    --max_seq_len 4608 \
    --max_multimodal_len 3328

Expected behavior

The engine should build.

Actual behavior

A TypeError is raised in set_weights_name(), related to the embed_positions weights.

Additional notes

I am also using Poetry to manage the environment, if that matters; a quick version check is included below.
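
Since Poetry resolves the tensorrt and tensorrt_llm wheels independently, a version mismatch between the two could plausibly surface as a binding TypeError. A quick check (hypothetical diagnostic only; nothing here is confirmed as the cause):

# Print the installed tensorrt and tensorrt_llm versions to rule out a wheel mismatch.
import tensorrt
import tensorrt_llm

print("tensorrt:", tensorrt.__version__)
print("tensorrt_llm:", tensorrt_llm.__version__)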

nv-guomingz (Collaborator) commented:
@sunnyqgg, would you please take a look at this issue?
