A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
-
Updated
Jan 24, 2025 - Python
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
PyTorch implementation of "Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss" (ICASSP 2020)
A curated list of awesome papers on contextualizing E2E ASR outputs
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.
An implementation of RNN-Transducer loss in TF-2.0.
I'm building an end-to-end Vietnamese Speech Recognition System. I'll deploy it into production with the help of Flask, Uwsgi, Nginx, and AWS ...
Pure PyTorch implementation of the loss described in "Online Segment to Segment Neural Transduction" https://arxiv.org/abs/1609.08194
Deep learning-based subtitle generation model that processes audio datasets to generate accurate text transcriptions. Includes audio feature extraction, encoder-decoder architecture, training pipelines, and evaluation metrics for subtitle alignment.
Add a description, image, and links to the rnnt topic page so that developers can more easily learn about it.
To associate your repository with the rnnt topic, visit your repo's landing page and select "manage topics."
Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.
Alternative Proxies: