Presentation 2
No current implementation specifically for end-to-end ASR
Approach
CNN for frame-level classification
RNN with CTC loss for decoding
Traditional Hidden Markov Model not used
Log Mel filter-bank features used as input
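The log Mel filter-bank input mentioned above can be illustrated with a minimal sketch. It assumes the common mel formula 2595·log10(1 + f/700) and triangular filters; the function names (`hz_to_mel`, `mel_filterbank`, `log_mel_features`) and the settings in the usage note (40 filters, 512-point FFT, 16 kHz) are illustrative assumptions, not values taken from the slides.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Return n_filters triangular filters over n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge points, equally spaced on the mel scale up to Nyquist.
    mel_max = hz_to_mel(sample_rate / 2.0)
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Edge points expressed as (fractional) FFT bin positions.
    bin_pos = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bin_pos[m - 1], bin_pos[m], bin_pos[m + 1]
        row = []
        for k in range(n_bins):
            rising = (k - left) / (centre - left)
            falling = (right - k) / (right - centre)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def log_mel_features(power_spectrum, bank, eps=1e-10):
    """Apply the filterbank to one frame's power spectrum and take the log."""
    return [
        math.log(sum(w * p for w, p in zip(row, power_spectrum)) + eps)
        for row in bank
    ]
```

For example, `log_mel_features(frame_power, mel_filterbank(40, 512, 16000))` would give one 40-dimensional feature vector per frame, where `frame_power` is a hypothetical 257-bin power spectrum.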
Results
Frame-level classification satisfactory
Decoding scheme needs improvement
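As a point of reference for the decoding step, the simplest CTC decoder is best-path (greedy) decoding: take the most likely label per frame, collapse repeats, then drop blanks. The sketch below is a generic baseline, not the decoding scheme used in this work; the function name, the label set and the blank index are assumptions.

```python
def ctc_greedy_decode(frame_posteriors, labels, blank=0):
    """Best-path CTC decoding.

    frame_posteriors: list of per-frame probability lists (one entry per label).
    labels: list mapping label index to symbol; labels[blank] is the CTC blank.
    """
    # Most likely label index for every frame.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_posteriors]
    decoded = []
    prev = None
    for idx in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if idx != prev and idx != blank:
            decoded.append(labels[idx])
        prev = idx
    return decoded


# Hypothetical 6-frame example over the labels blank, "a", "b".
example_labels = ["_", "a", "b"]
example_frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
print(ctc_greedy_decode(example_frames, example_labels))  # ['a', 'a', 'b']
```

Beam search over the same posteriors, optionally with a language model, is the usual next step when best-path decoding is not accurate enough.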
Literature Review
• Towards End-To-End Speech Recognition with Deep Convolutional Neural Networks
Zhang et al., Interspeech 2016
Approach
CNN for frame-level classification
No RNN used at all
CTC loss used for decoding
Traditional Hidden Markov Model not used
Log Mel filter-bank features used as input
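To make the role of the CTC loss concrete, the sketch below computes the probability that a CTC model assigns to a target label sequence by summing over all valid frame alignments with the standard forward recursion. It works in plain probability space for readability (practical implementations use log space to avoid underflow); the function names and the blank index are assumptions, and this is not the paper's code.

```python
import math


def ctc_sequence_probability(probs, target, blank=0):
    """Total probability of `target` under per-frame label probabilities.

    probs: list of T lists, probs[t][k] = probability of label k at frame t.
    target: list of label indices without blanks.
    """
    # Extended sequence: blank, l1, blank, l2, ..., blank.
    ext = [blank]
    for label in target:
        ext += [label, blank]
    n_states = len(ext)

    # Initialisation: a path may start in the first blank or the first label.
    alpha = [0.0] * n_states
    alpha[0] = probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * n_states
        for s in range(n_states):
            total = alpha[s]                      # stay in the same state
            if s >= 1:
                total += alpha[s - 1]             # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]             # skip the blank in between
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # Valid paths end in the last label or the trailing blank.
    return alpha[-1] + (alpha[-2] if n_states > 1 else 0.0)


def ctc_loss(probs, target, blank=0):
    """Negative log-likelihood of the target (requires a non-zero probability)."""
    return -math.log(ctc_sequence_probability(probs, target, blank))
```

With two frames, uniform probabilities over {blank, a} and the target "a", the valid alignments are "a_", "_a" and "aa", so the probability is 3 × 0.25 = 0.75.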
Results
CNN able to capture temporal relations
Training faster than with RNN models
Literature Review
• End-To-End Speech Recognition from the Raw Waveform (2018)
Zeghidour et al., Facebook A.I.
Approach
End-to-End system trained directly from Raw Waveform
Uses trainable filterbanks in place of log mel-filterbanks
Uses CNN architecture
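The idea of replacing fixed log mel-filterbanks with a trainable front end can be sketched as follows: band-pass filters applied directly to the waveform, squared modulus, pooling as a low-pass step, then log compression. This is a generic illustration under stated assumptions, not the architecture from the paper. The Gabor parameterisation, function names, filter length and pooling sizes are all chosen here for illustration, and in a real system `centre_freq` and `width` would be parameters updated by gradient descent.

```python
import cmath
import math


def gabor_filter(centre_freq, width, length):
    """Complex Gabor impulse response.

    centre_freq: cycles per sample (0 to 0.5).
    width: standard deviation of the Gaussian envelope, in samples.
    length: approximate number of taps (rounded to an odd count).
    """
    half = length // 2
    return [
        cmath.exp(2j * math.pi * centre_freq * t) * math.exp(-t * t / (2.0 * width ** 2))
        for t in range(-half, half + 1)
    ]


def filterbank_frontend(waveform, filters, pool_size, pool_stride, eps=1e-6):
    """Return one log-energy trajectory per filter."""
    features = []
    for h in filters:
        n_out = len(waveform) - len(h) + 1
        # Sliding inner product (cross-correlation, as in most deep-learning
        # "convolution" layers), followed by the squared modulus.
        energy = [
            abs(sum(waveform[i + j] * h[j] for j in range(len(h)))) ** 2
            for i in range(n_out)
        ]
        # Average pooling acts as a crude low-pass filter plus decimation.
        pooled = [
            sum(energy[i:i + pool_size]) / pool_size
            for i in range(0, n_out - pool_size + 1, pool_stride)
        ]
        features.append([math.log(p + eps) for p in pooled])
    return features
```

Fed a pure tone, the filter centred on the tone's frequency should produce much larger log-energies than a filter centred elsewhere, which is the behaviour a mel filterbank provides with fixed weights.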
Results
Improved performance over log mel-filterbanks
Thank you!