Project phase 4 ibm[1] (1)
Group Members:
▪ Name: N Dishanth Naik
▪ CAN ID Number: CAN 33401527
▪ Name: Shashank MS
▪ CAN ID Number: CAN_33497484
▪ Name: Punith V
▪ CAN ID Number: CAN_33467315
Name: N Dishanth Naik CAN ID Number: CAN 33401527 (Responsible for Data Collection & Preprocessing)
• Tasks:
o Collect multilingual parallel datasets (e.g., from OpenSubtitles, Europarl, WMT datasets).
o Clean and preprocess the text data (tokenization, normalization, removing noise).
o Align sentences and ensure correct formatting for training.
o Split data into training, validation, and test sets.
• Tools:
o Python, NLTK, spaCy, Moses toolkit, TensorFlow Datasets
o Libraries: sentencepiece, re, pandas
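The cleaning and splitting tasks above can be illustrated with a minimal sketch that uses only the Python standard library. The function names (`normalize`, `clean_pairs`, `split_pairs`), the length and ratio thresholds, and the 80/10/10 split are assumptions chosen for illustration, not the project's actual pipeline; tokenization with sentencepiece or the Moses toolkit would follow this step.

```python
import random
import re
import unicodedata


def normalize(text):
    """Unicode-normalize, drop tag-like noise, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"<[^>]+>", " ", text)  # remove HTML/subtitle tags
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return text.strip()


def clean_pairs(pairs, max_len=100, max_ratio=3.0):
    """Keep aligned pairs that are non-empty, short enough, similar in
    length, and not exact duplicates. Thresholds are illustrative."""
    kept, seen = [], set()
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # one side is empty, so the pair is misaligned
        if n_src > max_len or n_tgt > max_len:
            continue  # overly long sentences
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # very different lengths suggest a bad alignment
        if (src, tgt) in seen:
            continue  # exact duplicate
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


def split_pairs(pairs, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle deterministically and split into train/validation/test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_val = int(len(pairs) * val_frac)
    n_test = int(len(pairs) * test_frac)
    val = pairs[:n_val]
    test = pairs[n_val:n_val + n_test]
    train = pairs[n_val + n_test:]
    return train, val, test
```

Shuffling with a fixed seed keeps the split reproducible across runs, which matters when comparing models trained on the same data.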
Name: Shashank MS CAN ID Number: CAN_33497484 (Responsible for Neural Machine Translation Model)
• Tasks:
o Select the model architecture (Transformer-based, seq2seq with attention, or pretrained
models like mBART, T5).
o Train and fine-tune the model using TensorFlow/Keras or PyTorch.
o Optimize hyperparameters for better translation accuracy.
o Evaluate the model using BLEU, METEOR, and other NLP metrics.
• Tools:
o TensorFlow, PyTorch, Hugging Face transformers
o Libraries: torch, transformers, tensorflow, fairseq
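To make the BLEU evaluation step concrete, here is a simplified sketch of corpus-level BLEU with clipped n-gram precision and a brevity penalty. It assumes one reference per hypothesis, whitespace tokenization, and no smoothing, so its scores will not match standard tools exactly; the function names are illustrative, and a maintained library would be the safer choice for reported results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU: single reference, whitespace tokens, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize hypotheses that are shorter than their references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

The score is computed over the whole test set at once rather than averaged per sentence, because sentence-level BLEU without smoothing is often zero for short outputs.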
Name: Punith V CAN ID Number: CAN_33467315 (Responsible for Real-Time Processing & Integration)
• Tasks:
o Develop an API (Flask/FastAPI) to handle input text and return translated output.
o Integrate the trained NMT model into the backend.
• Tools:
Name: Insia Sarwath CAN ID Number: CAN_33442031 (Responsible for User Interface &
Deployment)
• Tasks:
o Design an intuitive UI for users to input text/audio for translation.
o Implement WebSocket for real-time translation responses.
o Integrate the backend API with the frontend.
o Deploy the application (using cloud platforms like AWS/GCP/Azure).
• Tools:
o React.js, Vue.js, or Angular
o WebSockets, Bootstrap, Tailwind CSS
o Deployment: AWS Lambda, Google Cloud Functions, Docker
Real-time language translation requires a well-integrated system that includes a trained Neural
Machine Translation (NMT) model, an interactive web interface, and a cloud-based deployment
environment for scalability and efficiency.
• Model Deployment: Hosting the NMT model in a cloud or edge environment for real-time
inference.
• Interface Development: Creating a user-friendly front-end for users to input and receive
translated text/audio.
• Collaboration & Integration: Combining AI models with external APIs, databases, and third-
party services.
• Convert the trained model into a lightweight, optimized format (e.g., ONNX, TensorRT) for faster
inference.
• Use quantization and pruning to reduce computational overhead.
• On-premise: Running the model locally for low-latency translation (e.g., edge AI devices).
• Cloud-based: Deploying on cloud platforms such as AWS, Google Cloud, or Azure for scalability.
• Hybrid approach: Using both local processing and cloud fallback when required.
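The hybrid option above could be sketched as a small wrapper that tries the local model within a latency budget and falls back to a cloud call on timeout or error. The function `translate_hybrid`, the `local_fn` and `cloud_fn` callables, and the 0.5 second budget are all hypothetical placeholders for whatever on-device model and cloud endpoint the deployment uses.

```python
import concurrent.futures


def translate_hybrid(text, local_fn, cloud_fn, timeout_s=0.5):
    """Return (translation, backend). Tries local_fn within timeout_s seconds,
    then falls back to cloud_fn if the local call is too slow or raises."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(local_fn, text)
    try:
        return future.result(timeout=timeout_s), "local"
    except Exception:
        # Covers both a timeout and any error raised by the local model.
        return cloud_fn(text), "cloud"
    finally:
        # Do not block on a slow local call; its result is simply discarded.
        executor.shutdown(wait=False)
```

Returning the backend name alongside the translation makes it easy to log how often the fallback is used, which helps when tuning the latency budget.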
3. Deployment Pipelines
## Project Overview
This project is a real-time language translation system using **Neural Machine Translation (NMT)**
models from the **Hugging Face Transformers library**. It utilizes **Google T5 (Text-to-Text Transfer
Transformer)** for multilingual translation and **Helsinki-NLP's OPUS-MT** models for specific
language pairs.
## Features
- Translates text from English to multiple languages, including **French, German, and Spanish**.
- Uses **Google T5 (t5-base)** for general multilingual translation.
- Uses **Helsinki-NLP OPUS-MT** for specific language pairs (e.g., English to Spanish).
- Streamlit-based UI for user-friendly interaction (coming soon!).
## Technologies Used
- Python
- Hugging Face **Transformers**
- **T5-base** for multilingual translation
- **Helsinki-NLP OPUS-MT** for specific translations
- **Streamlit** for UI (upcoming feature)
## Installation
### **1. Clone the Repository**
```sh
git clone https://github.com/yourusername/Real-Time-Language-Translation-NMT.git
cd Real-Time-Language-Translation-NMT
```
## Usage
Run the Python script to translate text:
```sh
python translation.py
```
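The contents of `translation.py` are not shown in this document. As a rough sketch of how the two model families listed under Features might be combined, the example below routes French and German to `t5-base` (which expects a task prefix in the input) and Spanish to the OPUS-MT English-to-Spanish model. The `ROUTES` table and the `build_request` and `translate` functions are assumptions for illustration, not the actual script.

```python
# Hypothetical routing table: checkpoint and prompt prefix per target language.
ROUTES = {
    "French": ("t5-base", "translate English to French: "),
    "German": ("t5-base", "translate English to German: "),
    "Spanish": ("Helsinki-NLP/opus-mt-en-es", ""),
}


def build_request(text, target_language):
    """Pick the checkpoint and build the model input for a target language."""
    if target_language not in ROUTES:
        raise ValueError(f"Unsupported target language: {target_language}")
    model_name, prefix = ROUTES[target_language]
    return model_name, prefix + text


def translate(text, target_language, max_new_tokens=64):
    """Translate English text; downloads the checkpoint on first use."""
    # Imported lazily so the routing logic can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_name, prompt = build_request(text, target_language)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

In a long-running service the tokenizer and model would be loaded once and cached rather than reloaded on every call as in this sketch.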
## Code Structure
```
Real-Time-Language-Translation-NMT/
├── translation.py # Main script for translation
├── app.py # Streamlit UI (coming soon)
├── requirements.txt # Required Python packages
├── README.md # Project documentation
└── venv/ # Virtual environment (not included in repo)
```
## Example Output
```sh
Translated to French: Comment vous êtes-vous ?
Translated to German: Wie läuft es Ihnen?
Translated to Spanish: ¿Cómo estás?
```
## Troubleshooting
### **1. Model Loading Issues**
If you get an error related to **model loading**, ensure you have an internet connection and that the
model name is correctly spelled. Try manually downloading the model:
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-es")
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-es")
```
2. API Integration
• Connect the web interface to the NMT model API for seamless communication.
• Implement error handling for unsupported languages or failed translations.
• Use WebSockets for real-time translation feedback.
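The error-handling point above can be sketched independently of the web framework. The handler below validates a JSON-like payload and returns a status code with a response body; `handle_translation_request`, the field names, and the chosen status codes are hypothetical, and `translate_fn` stands in for whatever function wraps the NMT model. A Flask or FastAPI route would call it and serialize the result.

```python
SUPPORTED_LANGUAGES = {"English", "French", "German", "Spanish"}


def handle_translation_request(payload, translate_fn):
    """Validate the payload and return (status_code, response_body)."""
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        return 400, {"error": "Field 'text' must be a non-empty string."}
    source = payload.get("source_lang")
    target = payload.get("target_lang")
    for field, lang in (("source_lang", source), ("target_lang", target)):
        if lang not in SUPPORTED_LANGUAGES:
            return 422, {"error": f"Unsupported {field}: {lang!r}"}
    try:
        translation = translate_fn(text.strip(), source, target)
    except Exception as exc:
        # Report model failures to the client without crashing the server.
        return 500, {"error": f"Translation failed: {exc}"}
    return 200, {"translation": translation}
```

Keeping validation in a plain function makes it straightforward to unit-test without starting a server or loading a model.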
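import streamlit as st
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

# Assumption: the original listing does not show which M2M100 checkpoint is loaded.
MODEL_NAME = "facebook/m2m100_418M"
# Map the language names shown in the UI to M2M100 language codes.
lang_codes = {"English": "en", "French": "fr", "German": "de", "Spanish": "es"}

@st.cache_resource
def load_model():
    # Load the tokenizer and model once and reuse them across Streamlit reruns.
    tokenizer = M2M100Tokenizer.from_pretrained(MODEL_NAME)
    model = M2M100ForConditionalGeneration.from_pretrained(MODEL_NAME)
    return tokenizer, model

def translate_text(text, src_lang, tgt_lang):
    tokenizer, model = load_model()
    tokenizer.src_lang = lang_codes[src_lang]
    encoded = tokenizer(text, return_tensors="pt")
    # Force the first generated token to be the target-language tag.
    generated_tokens = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(lang_codes[tgt_lang])
    )
    return tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0]

# Streamlit UI
st.title("Multi-Language Translator")
source_text = st.text_area("Enter text to translate:")
source_lang = st.selectbox("Source Language", ["English", "French", "German", "Spanish"])
target_lang = st.selectbox("Target Language", ["English", "French", "German", "Spanish"])
if st.button("Translate"):
    if source_text:
        translation = translate_text(source_text, source_lang, target_lang)
        st.success(f"Translated Text: {translation}")
    else:
        st.error("Please enter text to translate!")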
• Add translation capabilities to platforms like Slack, Microsoft Teams, Zoom, and WhatsApp.
• Connect with customer service chatbots for multilingual support.
Future Scope
Final Thoughts
Deploying a real-time language translation system requires a well-structured approach involving model
optimization, cloud hosting, UI/UX development, and collaboration with experts. Future advancements in
AI, deep learning, and cloud computing will continue to improve translation accuracy, speed, and
accessibility, making global communication more seamless than ever.
▪ Name: Shashank MS
▪ CAN ID Number: CAN_33497484
▪ GitHub: https://github.com/ms-shashank/Real-Time-Language-Translation-NMT
▪ Name: Punith V
▪ CAN ID Number: CAN_33467315
▪ GitHub: https://github.com/Punithv2003