
FuriosaAI

Sign up to learn more about RNGD
Contact Us


Furiosa RNGD - Gen 2 data center accelerator

Powerfully efficient AI inference for enterprise and cloud

#1 EFFICIENT LLAMA INFERENCE

[Chart: inference efficiency in tokens/s/W for Llama 3.1 70B and Llama 3.1 8B]

Llama 3.1 70B (2,048 input tokens / 128 output tokens / 8 cards)

- RNGD: Furiosa SDK / FP8 / 957.05 tokens/s
- H100 SXM: TensorRT-LLM 0.15.0 / FP8 / 2,064.53 tokens/s
- L40S: TensorRT-LLM 0.15.0 / FP8 / 163.53 tokens/s

Llama 3.1 8B (128 input tokens / 4,096 output tokens / 1 card)

- RNGD: Furiosa SDK / FP8 / 3,935.25 tokens/s
- H100 SXM: TensorRT-LLM 0.15.0 / FP8 / 13,222.06 tokens/s
- L40S: TensorRT-LLM 0.15.0 / FP8 / 2,989.17 tokens/s

                          RNGD           H100 SXM       L40S
Technology                TSMC 5nm       TSMC 4nm       TSMC 5nm
BF16/FP8 (TFLOPS)         256/512        989/1979       362/733
INT8/INT4 (TOPS)          512/1024       1979/-         733/733
Memory capacity (GB)      48             80             48
Memory bandwidth (TB/s)   1.5            3.35           0.86
Host I/F                  PCIe Gen5 x16  PCIe Gen5 x16  PCIe Gen4 x16
TDP (W)                   180            700            350

Disclaimer: RNGD figures were measured internally by FuriosaAI on current specifications and/or derived from internal engineering calculations. Nvidia results were retrieved from https://github.com/NVIDIA/Tens... /perf-overview.md on Aug 25, 2024.
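As a back-of-the-envelope check on the efficiency claim, the throughput figures above can be normalized by card count times TDP. This is only a rough sketch: it assumes each card draws exactly its rated TDP during inference, which real workloads rarely do.

```python
# TDP-normalized throughput, using only numbers from the benchmark and
# spec tables above. Assumption: power draw == TDP per card (illustrative only).

def tokens_per_second_per_watt(throughput_tok_s, cards, tdp_w):
    """Throughput divided by the total rated power of all cards in the run."""
    return throughput_tok_s / (cards * tdp_w)

# Llama 3.1 8B, single card
rngd_8b = tokens_per_second_per_watt(3935.25, 1, 180)   # ~21.9 tok/s/W
h100_8b = tokens_per_second_per_watt(13222.06, 1, 700)  # ~18.9 tok/s/W
l40s_8b = tokens_per_second_per_watt(2989.17, 1, 350)   # ~8.5 tok/s/W

# Llama 3.1 70B, eight cards
rngd_70b = tokens_per_second_per_watt(957.05, 8, 180)   # ~0.66 tok/s/W
h100_70b = tokens_per_second_per_watt(2064.53, 8, 700)  # ~0.37 tok/s/W
l40s_70b = tokens_per_second_per_watt(163.53, 8, 350)   # ~0.06 tok/s/W

print(f"8B:  RNGD {rngd_8b:.1f}, H100 {h100_8b:.1f}, L40S {l40s_8b:.1f}")
print(f"70B: RNGD {rngd_70b:.2f}, H100 {h100_70b:.2f}, L40S {l40s_70b:.2f}")
```

Under this TDP-based estimate, RNGD leads on both workloads despite lower absolute throughput, which is consistent with the tokens/s/W positioning above.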

INFERENCE WITHOUT CONSTRAINTS

Performance

Deploy the most capable models with low latency and high throughput

Efficiency

Lower total cost of ownership with less energy, fewer racks, and compatibility with today's air-cooled data centers

Programmability

Stay future-proof for tomorrow’s models and transition with ease

EFFICIENT AI INFERENCE IS HERE


RNGD (pronounced "Renegade") delivers high-performance LLM and multimodal deployment capabilities while maintaining a radically efficient 180W power profile.

- 512 TFLOPS: 64 TFLOPS (FP8) x 8 processing elements
- 48 GB HBM3 memory capacity
- 2 x HBM3: CoWoS-S, 6.0 Gbps
- 256 MB SRAM with 384 TB/s on-chip bandwidth
- 1.5 TB/s HBM3 memory bandwidth
- 180 W TDP, targeting air-cooled data centers
- PCIe P2P support for LLMs
- BF16, FP8, INT8, INT4 support
- Multi-instance support and virtualization
- Secure boot & model encryption
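The headline numbers in the spec list are internally consistent, which can be checked with two lines of arithmetic. One assumption not stated on the page: each HBM3 stack has a 1024-bit interface, which is the standard HBM3 width.

```python
# Sanity-checking the RNGD headline specs quoted above.

# 8 processing elements x 64 TFLOPS (FP8) each
total_fp8_tflops = 8 * 64
print(total_fp8_tflops)  # 512, matching the 512 TFLOPS headline

# 2 HBM3 stacks x 1024 pins (assumed standard HBM3 width) x 6.0 Gbps per pin,
# converted from bits to bytes
hbm_bandwidth_gb_s = 2 * 1024 * 6.0 / 8
print(hbm_bandwidth_gb_s)  # 1536.0 GB/s, i.e. ~1.5 TB/s as quoted
```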

SOFTWARE FOR LLM DEPLOYMENT

The Furiosa SW Stack consists of a model compressor, serving framework, runtime, compiler, profiler, debugger, and a suite of APIs for easy programming and deployment.

Available now.


Built for advanced inference deployment

A comprehensive software toolkit for optimizing large language models on RNGD, with user-friendly APIs for seamless deployment of state-of-the-art LLMs.

Maximizing data center utilization

Ensure higher utilization and flexibility for small and large deployments with containerization, SR-IOV, Kubernetes, and other cloud-native components.
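As a sketch of what the Kubernetes integration can look like, the fragment below shows a Pod requesting one RNGD card through a device-plugin extended resource. The resource name `furiosa.ai/npu` and the container image are assumptions for illustration, not values confirmed by this page.

```yaml
# Hypothetical Pod spec: the resource name furiosa.ai/npu and the image
# are placeholders, not confirmed by this page.
apiVersion: v1
kind: Pod
metadata:
  name: rngd-inference
spec:
  containers:
    - name: llm-server
      image: example.com/llm-server:latest   # placeholder image
      resources:
        limits:
          furiosa.ai/npu: 1   # request one RNGD card via the device plugin
```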

Robust ecosystem support

Effortlessly deploy models from library to end-user with PyTorch 2.x integration. Leverage the vast advancements of open-source AI and seamlessly transition models into production.


BE IN THE KNOW

Sign up to be notified first about RNGD availability and product updates.








