
Phase 2 File

The document outlines the Phase-2 submission for a project focused on developing an NLP-powered chatbot for customer support, specifically targeting intent classification. It details the problem statement, project objectives, data description, preprocessing steps, exploratory data analysis, model building, and performance metrics. The project aims to enhance customer service efficiency through accurate intent recognition, utilizing various models including Logistic Regression, Random Forest, and BERT.


Phase-2 Submission Template

Student Name: Harish Ragavendra S

Register Number: 510623243010

Institution: C.Abdul Hakeem College of Engineering & Technology

Department: B.Tech AI & DS

Date of Submission: 09.05.2025

GitHub Repository Link: https://github.com/madhuhansa/NLP-Intent-classification-chatbot

1. Problem Statement
In the digital age, customers expect instant, accurate, and 24/7 support across
various platforms. Traditional customer service models rely heavily on human
agents, resulting in increased operational costs, inconsistent responses, and delays
during high-demand periods. To overcome these limitations, businesses are
increasingly turning to intelligent automation solutions like chatbots.

In Phase-1, we proposed a basic NLP-powered chatbot for customer support. Upon
further exploration of the dataset and chatbot interactions, we have refined the
problem to focus specifically on intent classification: accurately identifying the
user's intent from their message and generating a relevant, automated response.

This is fundamentally a multi-class classification problem, in which the chatbot must
classify input queries into predefined categories such as “Order Status,” “Returns,”
and “Technical Support.” The quality of this classification directly determines the
effectiveness of the automated response.

Solving this problem has practical and wide-reaching implications. An accurate
intent-classifying chatbot reduces human workload, improves customer satisfaction,
enables round-the-clock support, and allows businesses to scale their support
infrastructure efficiently. It is relevant across e-commerce, healthcare, banking,
and other sectors, making it a vital solution for modern customer service operations.

2. Project Objectives
- Refine the chatbot to improve intent recognition and response generation
accuracy.
- Enhance user experience through improved NLP pipelines.
- Achieve high model performance in terms of accuracy and F1-score.
- Adapt and evolve objectives based on insights from EDA and initial trials.

3. Flowchart of the Project Workflow

4. Data Description
- Dataset: Custom and open-source chatbot datasets (e.g., Cornell Movie Dialogues,
Kaggle FAQs).
- Type: Text (Unstructured)
- Number of Records: ~10,000 conversation pairs (questions and responses)
- Number of Features: 2 main columns – user_input and intent
- Dataset Type: Static
- Target Variable: intent (used for classification)

5. Data Preprocessing
- Removed missing and duplicate entries.
- Normalized text (lowercasing, punctuation removal).
- Tokenized sentences and applied lemmatization.
- Applied label encoding on target variable.
- Vectorized inputs using TF-IDF and BERT embeddings.
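The preprocessing steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the project's actual code: the toy rows are hypothetical, and `crude_lemmatize` is a hypothetical stand-in that only strips a plural "s" (real lemmatization would use a library such as spaCy or NLTK's WordNet lemmatizer). BERT embeddings are omitted because they require downloading a pretrained model.

```python
import string

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows; column names follow Section 4 (user_input, intent).
df = pd.DataFrame({
    "user_input": [
        "Where is my ORDER?", "Where is my ORDER?", "I want to return shoes!",
        None, "My app keeps crashing.", "Track my orders please",
    ],
    "intent": [
        "order_status", "order_status", "returns",
        "returns", "technical_support", "order_status",
    ],
})

# Step 1: drop rows with missing values, then exact duplicate rows.
df = df.dropna().drop_duplicates().reset_index(drop=True)

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def normalize(text: str) -> str:
    """Lowercase and remove ASCII punctuation."""
    return text.lower().translate(_PUNCT_TABLE)

def crude_lemmatize(token: str) -> str:
    """Hypothetical stand-in for a real lemmatizer: strips a trailing plural 's'.
    This is NOT true lemmatization."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def preprocess(text: str) -> str:
    # Steps 2-3: normalize, whitespace-tokenize, lemmatize, re-join for the vectorizer.
    return " ".join(crude_lemmatize(t) for t in normalize(text).split())

df["clean"] = df["user_input"].map(preprocess)

# Step 4: label-encode the target (classes are sorted alphabetically -> 0..K-1).
encoder = LabelEncoder()
y = encoder.fit_transform(df["intent"])

# Step 5: TF-IDF features over unigrams and bigrams.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(df["clean"])
print(df[["clean", "intent"]], X.shape)
```

Note that `TfidfVectorizer`'s default token pattern ignores one-character tokens such as "i", which is usually acceptable for intent classification.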

6. Exploratory Data Analysis (EDA)
Univariate Analysis:
- Countplots and pie charts to visualize intent distribution.
- Word clouds for most common words.
- Boxplots of message lengths.
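The univariate plots above are built from a few simple aggregates. The sketch below, on a hypothetical five-row sample, computes the intent counts behind a countplot or pie chart, the word-count distribution behind the boxplots, and the word frequencies a word cloud is drawn from.

```python
import pandas as pd

# Hypothetical sample; in the project this would be the cleaned dataset.
df = pd.DataFrame({
    "user_input": [
        "where is my order", "track my package", "has my order shipped yet",
        "i want a refund", "app crashes on login",
    ],
    "intent": ["order_status", "order_status", "order_status", "returns", "technical_support"],
})

# Intent distribution: the numbers behind a countplot or pie chart.
intent_counts = df["intent"].value_counts()
intent_share = df["intent"].value_counts(normalize=True)

# Message length in words: the variable summarized by the boxplots.
df["n_words"] = df["user_input"].str.split().str.len()
length_summary = df["n_words"].describe()

# Word frequencies: the input a word cloud is drawn from.
word_freq = df["user_input"].str.split().explode().value_counts()

print(intent_counts)
print(length_summary[["min", "50%", "max"]])
```

Passing `df` to `seaborn.countplot(data=df, x="intent")` or `seaborn.boxplot(data=df, x="intent", y="n_words")` would then draw the corresponding plots.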

Bivariate / Multivariate Analysis:
- Bar plots: Intent vs. average message length.
- Cosine similarity heatmaps for intent overlap.
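One way to build the intent-overlap heatmap is to average the TF-IDF vectors of each intent into a centroid and compare centroids by cosine similarity. The report does not say exactly how its heatmap was computed, so the centroid approach and the sample messages below are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample messages.
df = pd.DataFrame({
    "user_input": [
        "where is my order", "track my order status",
        "return my order for a refund", "i want a refund",
        "the app crashes on login",
    ],
    "intent": ["order_status", "order_status", "returns", "returns", "technical_support"],
})

X = TfidfVectorizer().fit_transform(df["user_input"])

# One centroid per intent: the mean TF-IDF vector of that intent's messages.
intents = sorted(df["intent"].unique())
labels = df["intent"].to_numpy()
centroids = np.vstack([
    np.asarray(X[np.flatnonzero(labels == k)].mean(axis=0)) for k in intents
])

# Pairwise cosine similarity between centroids; high off-diagonal values flag overlap.
sim = pd.DataFrame(cosine_similarity(centroids), index=intents, columns=intents)
print(sim.round(2))
```

The resulting `sim` table can be plotted with `seaborn.heatmap(sim, annot=True)`. Here "order_status" and "returns" overlap through shared words such as "my" and "order", while "technical_support" shares no vocabulary with "order_status".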

Insights Summary:
- A few common intents dominate the dataset, so the classes are imbalanced.
- Keyword-based patterns suggest that the intents are largely separable.

7. Feature Engineering
- Created features: message length, keyword flags.
- TF-IDF vectorization and BERT embeddings used.
- PCA used on TF-IDF (optional dimensionality reduction).
- Features helped improve classification accuracy.
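The feature set described above could be assembled as in the following sketch. The keyword lists are hypothetical, since the report does not name the flagged keywords. The sketch also swaps in scikit-learn's `TruncatedSVD` for PCA: `TruncatedSVD` works directly on a sparse TF-IDF matrix, whereas classic PCA mean-centers the data, which would densify it.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

texts = pd.Series([
    "where is my order", "track my order status", "i want a refund",
    "return these shoes for a refund", "the app crashes on login",
    "login page shows an error",
])

# Hypothetical keyword flags; the report does not list the actual keywords.
KEYWORDS = {"order": "kw_order", "refund": "kw_refund", "login": "kw_login"}

handcrafted = pd.DataFrame({
    # Message length in words (consider scaling, e.g. MaxAbsScaler, before training).
    "msg_len": texts.str.split().str.len(),
    **{name: texts.str.contains(rf"\b{kw}\b", regex=True).astype(int)
       for kw, name in KEYWORDS.items()},
})

X_tfidf = TfidfVectorizer().fit_transform(texts)

# Optional dimensionality reduction (TruncatedSVD used in place of PCA).
X_svd = TruncatedSVD(n_components=3, random_state=0).fit_transform(X_tfidf)

# Option A: full sparse TF-IDF plus hand-crafted columns.
X_full = hstack([X_tfidf, csr_matrix(handcrafted.to_numpy(dtype=float))]).tocsr()
# Option B: reduced dense representation plus hand-crafted columns.
X_reduced = np.hstack([X_svd, handcrafted.to_numpy(dtype=float)])

print(X_tfidf.shape, X_full.shape, X_reduced.shape)
```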

8. Model Building
Models Used:
1. Logistic Regression – baseline with TF-IDF.
2. Random Forest – handles sparse data, interpretable.
3. BERT – transformer model with high accuracy.

Train-Test Split: 80-20, stratified.

Metrics: Accuracy, Precision, Recall, F1-score.
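The two classical models and the stratified 80-20 split can be sketched with scikit-learn pipelines. The templated data is hypothetical and trivially separable, so this only checks that the pipeline runs; it does not reproduce the scores in the performance table. Macro averaging of precision, recall, and F1 is also an assumption, as the report does not state which average it used. BERT is left out because fine-tuning it requires a pretrained checkpoint and considerably more code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 3 intents x 10 templated messages each.
templates = {
    "order_status": "where is my order number {}",
    "returns": "i want to return item {} for a refund",
    "technical_support": "the app crashes with error code {}",
}
texts, labels = [], []
for intent, tpl in templates.items():
    for i in range(10):
        texts.append(tpl.format(i))
        labels.append(intent)

# 80-20 split, stratified so every intent keeps its share in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

models = {
    "Logistic Regression": make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
    "Random Forest": make_pipeline(
        TfidfVectorizer(), RandomForestClassifier(n_estimators=200, random_state=42)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Macro average: every intent counts equally, regardless of its frequency.
    p, r, f1, _ = precision_recall_fscore_support(
        y_test, pred, average="macro", zero_division=0
    )
    results[name] = {"accuracy": accuracy_score(y_test, pred), "precision": p, "recall": r, "f1": f1}

for name, m in results.items():
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Wrapping the vectorizer and classifier in one pipeline ensures the TF-IDF vocabulary is learned from the training split only, which avoids leaking test data into the features.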


Performance:
| Model               | Accuracy | Precision | Recall | F1-Score |
|---------------------|----------|-----------|--------|----------|
| Logistic Regression | 84.5%    | 0.83      | 0.84   | 0.835    |
| Random Forest       | 88.2%    | 0.87      | 0.88   | 0.875    |
| BERT Transformer    | 94.1%    | 0.94      | 0.94   | 0.94     |
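As a consistency check on the performance table, F1 is the harmonic mean of precision P and recall R, and the reported values agree with it:

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P + R} \\
\text{Logistic Regression:}\quad F_1 &= \frac{2(0.83)(0.84)}{0.83 + 0.84} = \frac{1.3944}{1.67} \approx 0.835 \\
\text{Random Forest:}\quad F_1 &= \frac{2(0.87)(0.88)}{0.87 + 0.88} = \frac{1.5312}{1.75} \approx 0.875 \\
\text{BERT:}\quad F_1 &= \frac{2(0.94)(0.94)}{0.94 + 0.94} = 0.94
\end{aligned}
```

If the scores are macro-averaged over intents, F1 is the mean of per-class F1 values and need not exactly equal the harmonic mean of macro precision and recall, so this agreement is a sanity check rather than a derivation.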

9. Visualization of Results & Model Insights
- Confusion matrices for all models showed class-wise performance.
- One-vs-rest ROC curves indicated that BERT separated the intents better than the
other models.
- Feature importance plots from Random Forest identified the most influential keywords.
- Visual comparisons showed that BERT outperformed the other models in both precision and
recall.

10. Tools and Technologies Used
- Programming Language: Python
- IDE: Google Colab, Jupyter Notebook, VS Code
- Libraries: pandas, numpy, seaborn, matplotlib, scikit-learn, transformers,
TensorFlow
- Visualization Tools: Plotly, seaborn

11. Team Members and Contributions
- Harish Ragavendra S – Data Cleaning, Model Development, Report Writing
- Justin Rishi S B – EDA, Feature Engineering
- Mohammed Raquess – Model Evaluation, Deployment
- Mohammed Naveed – GitHub Management, Visualizations
