Issues: NVIDIA/TensorRT-LLM
#783 (pinned) [Issue Template] Short one-line summary of the issue #270 (opened Jan 1, 2024 by juney-nvidia)
Issues list
#2694 Failed TensorRT-LLM Benchmark [bug] (opened Jan 15, 2025 by maulikmadhavi)
#2693 0.16.0 Qwen2-72B-Struct SQ error [bug] (opened Jan 15, 2025 by gy0514020329)
#2692 NotImplementedError: Cannot copy out of meta tensor; no data! [bug] (opened Jan 15, 2025 by chilljudaoren)
#2690 (Memory leak) trtllm-build gets OOM without GPTAttentionPlugin [bug] (opened Jan 14, 2025 by idantene)
#2688 trtllm-build llama3.1-8b failed [Investigating, LLM API/Workflow, triaged] (opened Jan 14, 2025 by 765500005)
#2687 Multi-LoRA cpp inference error: Assertion failed: lora_weights has to few values for attn_k [Investigating, Lora/P-tuning, triaged] (opened Jan 13, 2025 by lodm94)
#2686 internvl-2.5 [triaged] (opened Jan 13, 2025 by ChenJian7578)
#2684 Inference error encountered while using the draft target model [bug] (opened Jan 13, 2025 by pimang62)
#2683 Deepseek-v3 int4 weight-only inference outputs garbage words with TP 8 on NVIDIA H20 GPU [Investigating, Low Precision, triaged] (opened Jan 13, 2025 by handoku)
#2681 Difference in quantization implementation between quantize.py and convert_checkpoint.py [triaged] (opened Jan 12, 2025 by XA23i)
#2680 How to use multiple GPUs to infer Qwen? [triaged] (opened Jan 10, 2025 by aaIce)
#2679 Error when building the TRT engine on InternVL2 examples [bug, Investigating, LLM API/Workflow, triaged] (opened Jan 10, 2025 by StMarou)
#2678 Inference Qwen2-0.5b + Medusa failed [bug, Investigating, Speculative Decoding, triaged] (opened Jan 10, 2025 by shangshng)
#2677 Llama-3.2 SmoothQuant convert checkpoint error [Investigating, LLM API/Workflow, triaged] (opened Jan 10, 2025 by lyffly)
#2675 Difference in attention output compared to the HF engine attention output [bug] (opened Jan 9, 2025 by krishnanpooja)
#2674 An error occurs (module 'torch.distributed' has no attribute 'ReduceOp') (opened Jan 9, 2025 by fangbaolei)
#2673 EAGLE model seems to be deployed but raises an error on inference [bug] (opened Jan 9, 2025 by nuxlear)
#2672 Prompt formatting for different versions of InternVL2 [bug] (opened Jan 8, 2025 by nzarif)
#2671 Help needed: No clear documentation/examples for implementing speculative decoding with backend serve (opened Jan 8, 2025 by e1ijah1)
#2667 trtllm-serve produces no output with Qwen2.5-7b [bug, OpenAI API] (opened Jan 8, 2025 by Justin-12138)
#2666 fp8 quantization for CohereForCausalLM [Investigating, Low Precision, triaged] (opened Jan 7, 2025 by Alireza3242)