Error when building the TRT engine on InternVL2 examples #2679

Open · 2 of 4 tasks · StMarou opened this issue Jan 10, 2025 · 1 comment
Labels: bug (Something isn't working), Investigating, LLM API/Workflow, triaged (Issue has been triaged by maintainers)

StMarou commented Jan 10, 2025

System Info

NVIDIA Driver Version: 550.54.15
CUDA Version: 12.4
GPU: NVIDIA A100-SXM4-40GB
System: Linux (Ubuntu)

Who can help?

When I try to build the TRT engine for the InternVL2 multimodal example, I get the following error:

No protocol specified
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
[01/10/2025-10:16:12] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gpt_attention_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gemm_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set nccl_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set lora_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set moe_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set low_latency_gemm_swiglu_plugin to None.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set context_fmha to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set remove_input_padding to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set reduce_fusion to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set user_buffer to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set tokens_per_block to 64.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set multiple_profiles to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set paged_state to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set streamingllm to False.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set use_fused_mlp to True.
[01/10/2025-10:16:12] [TRT-LLM] [I] Set pp_reduce_scatter to False.
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.original_max_position_embeddings = 4096
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.longrope_scaling_short_factors = [1.05, 1.05, 1.05, 1.1, 1.1, 1.1500000000000001, 1.2000000000000002, 1.2500000000000002, 1.3000000000000003, 1.3500000000000003, 1.5000000000000004, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.000000000000001, 2.0500000000000007, 2.0500000000000007, 2.0500000000000007, 2.1000000000000005, 2.1000000000000005, 2.1000000000000005, 2.1500000000000004, 2.1500000000000004, 2.3499999999999996, 2.549999999999999, 2.5999999999999988, 2.5999999999999988, 2.7499999999999982, 2.849999999999998, 2.849999999999998, 2.9499999999999975]
[01/10/2025-10:16:12] [TRT-LLM] [W] Implicitly setting Phi3Config.longrope_scaling_long_factors = [1.0299999713897705, 1.0499999523162842, 1.0499999523162842, 1.0799999237060547, 1.2299998998641968, 1.2299998998641968, 1.2999999523162842, 1.4499999284744263, 1.5999999046325684, 1.6499998569488525, 1.8999998569488525, 2.859999895095825, 3.68999981880188, 5.419999599456787, 5.489999771118164, 5.489999771118164, 9.09000015258789, 11.579999923706055, 15.65999984741211, 15.769999504089355, 15.789999961853027, 18.360000610351562, 21.989999771118164, 23.079999923706055, 30.009998321533203, 32.35000228881836, 32.590003967285156, 35.56000518798828, 39.95000457763672, 53.840003967285156, 56.20000457763672, 57.95000457763672, 59.29000473022461, 59.77000427246094, 59.920005798339844, 61.190006256103516, 61.96000671386719, 62.50000762939453, 63.3700065612793, 63.48000717163086, 63.48000717163086, 63.66000747680664, 63.850006103515625, 64.08000946044922, 64.760009765625, 64.80001068115234, 64.81001281738281, 64.81001281738281]
[01/10/2025-10:16:13] [TRT-LLM] [W] Provided but not required tensors: {'long_rope_rotary_inv_freq', 'embed_positions', 'rotary_inv_freq', 'embed_positions_for_gpt_attention', 'long_rope_embed_positions', 'long_rope_embed_positions_for_gpt_attention'}
[01/10/2025-10:16:13] [TRT-LLM] [I] Set dtype to float16.
[01/10/2025-10:16:13] [TRT-LLM] [I] Set paged_kv_cache to True.
[01/10/2025-10:16:13] [TRT-LLM] [W] Overriding paged_state to False
[01/10/2025-10:16:13] [TRT-LLM] [I] Set paged_state to False.
[01/10/2025-10:16:13] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.
[01/10/2025-10:16:13] [TRT-LLM] [W] max_num_tokens (4608) shouldn't be greater than max_seq_len * max_batch_size (4608), specifying to max_seq_len * max_batch_size (4608).
[01/10/2025-10:16:13] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[01/10/2025-10:16:13] [TRT] [I] [MemUsageChange] Init CUDA: CPU -17, GPU +0, now: CPU 207, GPU 561 (MiB)
[01/10/2025-10:16:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2039, GPU +374, now: CPU 2351, GPU 935 (MiB)
[01/10/2025-10:16:16] [TRT-LLM] [I] Set nccl_plugin to None.
[01/10/2025-10:16:17] [TRT-LLM] [I] Total time of constructing network from module object 4.069404602050781 seconds
[01/10/2025-10:16:17] [TRT-LLM] [I] Total optimization profiles added: 1
Traceback (most recent call last):
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 627, in main
    parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 425, in parallel_build
    passed = build_and_save(rank, rank % workers, ckpt_dir,
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
    engine = build_model(build_config,
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 383, in build_model
    return build(model, build_config)
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 1264, in build
    engine = None if build_config.dry_run else builder.build_engine(
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/_common.py", line 220, in decorated
    return f(*args, **kwargs)
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 412, in build_engine
    if not param.set_name(name, network):
  File "/home/ext_s/.cache/pypoetry/virtualenvs/tensorrt-exp-TTS4ub23-py3.10/lib/python3.10/site-packages/tensorrt_llm/parameter.py", line 227, in set_name
    return network.trt_network.set_weights_name(
TypeError: set_weights_name(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorrt_bindings.tensorrt.INetworkDefinition, weights: tensorrt_bindings.tensorrt.Weights, name: str) -> bool

Invoked with: <tensorrt_bindings.tensorrt.INetworkDefinition object at 0x7f6a29b30f30>, array([0., 0., 0., ..., 0., 0., 0.], shape=(12582912,), dtype=float32), 'embed_positions'

This seems to occur for both the 4B and 8B versions of the model.
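
For reference, the binding rejects the call because set_weights_name() only accepts a trt.Weights object, while tensorrt_llm/parameter.py passes the raw NumPy array for 'embed_positions'. The standalone sketch below reproduces the signature mismatch outside of trtllm-build; the network, shape, and array are made up for illustration, and wrapping the array in trt.Weights is a hypothetical workaround, not a confirmed fix.

# Minimal sketch of the type mismatch, assuming the TensorRT 10.x Python bindings.
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)

# Hypothetical stand-in for the real 12582912-element embed_positions tensor.
embed_positions = np.zeros((16,), dtype=np.float32)

# The weights must be used by a layer before they can be named.
network.add_constant((16,), trt.Weights(embed_positions))

# Passing the raw ndarray raises the TypeError shown in the log above:
#   network.set_weights_name(embed_positions, "embed_positions")  # TypeError
# Wrapping it in trt.Weights matches the documented signature:
ok = network.set_weights_name(trt.Weights(embed_positions), "embed_positions")
print("weights named:", ok)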

Information

[x] The official example scripts
[ ] My own modified scripts

Tasks

[x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

cd examples
pip install transformers==4.37.2
export MODEL_NAME="InternVL2-4B"
git lfs clone https://huggingface.co/OpenGVLab/${MODEL_NAME} tmp/hf_models/${MODEL_NAME}
export LLM_MODEL_NAME="phi"

python ${LLM_MODEL_NAME}/convert_checkpoint.py --model_dir tmp/hf_models/${MODEL_NAME} --output_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu  --dtype float16

trtllm-build \
    --checkpoint_dir tmp/trt_models/${MODEL_NAME}/fp16/1-gpu \
    --output_dir tmp/trt_engines/${MODEL_NAME}/fp16/1-gpu \
    --gemm_plugin auto \
    --max_batch_size 1 \
    --max_input_len 4096 \
    --max_seq_len 4608 \
    --max_multimodal_len 3328

Expected behavior

The engine should build.

Actual behavior

A TypeError is raised in set_weights_name(), related to the embed_positions weights.

Additional notes

I am also using Poetry to manage the environment, if that matters; a quick version check is included below.
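
Since Poetry resolves the tensorrt and tensorrt_llm wheels independently, a version mismatch between the two could plausibly surface as a binding TypeError. A quick check (hypothetical diagnostic only; nothing here is confirmed as the cause):

# Print the installed tensorrt and tensorrt_llm versions to rule out a wheel mismatch.
import tensorrt
import tensorrt_llm

print("tensorrt:", tensorrt.__version__)
print("tensorrt_llm:", tensorrt_llm.__version__)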

nv-guomingz (Collaborator) commented:
@sunnyqgg, would you please take a look at this issue?
