INTERNSHIP REPORT

On

SENTIMENTAL ANALYSIS
By

Jaya Surya BU21CSEN0500155

K. Naga Nithin BU21CSEN0500189

P. Yeshwanth BU21CSEN0500255

T. Priyanka BU21CSEN0500310

Course code INTN3444

(Duration: 02-07-2024 to 31-10-2024)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Gandhi Institute of Technology and Management

(DEEMED TO BE A UNIVERSITY)

BENGALURU, KARNATAKA, INDIA

SESSION:2021-20

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

GITAM SCHOOL OF TECHNOLOGY

GITAM

DECLARATION

We hereby declare that the internship report entitled “Sentimental Analysis using
machine learning” is an original work done in the Department of Computer Science
and Engineering, GITAM School of Technology, GITAM (Deemed to be University),
Bengaluru, submitted in partial fulfilment of the requirements for the award of the degree
of B.Tech. in Computer Science and Engineering. The work has not been submitted to any
other college or University for the award of any degree.
Date:
Registration No(s) Name Signature
BU21CSEN0500255 P.YESHWANTH
BU21CSEN0500155 C.JAYA SURYA
BU21CSEN0500189 K.NAGA NITHIN
BU21CSEN0500310 T.PRIYANKA


CERTIFICATE

This is to certify that the mini project report entitled “SENTIMENTAL ANALYSIS
USING MACHINE LEARNING” is a bonafide record of work carried out by
P.YESHWANTH (BU21CSEN0500255), C.JAYA SURYA (BU21CSEN0500155),
K.NAGA NITHIN (BU21CSEN0500189), and T.PRIYANKA (BU21CSEN0500310),
submitted in partial fulfillment of the requirement for the award of the degree of
Bachelor of Technology in Computer Science and Engineering.

Dr.Arun Prasath. G

Assistant Professor

Department of CSE,

GST, Bengaluru

ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of any task would
be incomplete without the mention of the people who made it possible, whose consistent
guidance and encouragement crowned our efforts with success.

We consider it our privilege to express our gratitude to all those who guided us in
completing the project.

We express our gratitude to Director Prof. Basavaraj Gundappa Katageri for having
provided us with the golden opportunity to undertake this project work in this esteemed
organization.

We sincerely thank Dr. Y. Vamshidhar, HOD, Department of Computer Science and
Engineering, Gandhi Institute of Technology and Management, Bengaluru, for the
immense support given to us.

We express our gratitude to our project guide (GUIDE NAME), (DESIGNATION),
Department of Computer Science and Engineering, Gandhi Institute of Technology and
Management, Bengaluru, for their support, guidance, and suggestions throughout the
project work.

Student Name Registration No.

P.YESHWANTH BU21CSEN0500255

C JAYA SURYA BU21CSEN0500155

K NAGA NITHIN BU21CSEN0500189

T PRIYANKA BU21CSEN0500310

Contents

Internship
Declaration
Certificate
Acknowledgment
1. Introduction
2. Abstract
3. Literature Review
4. System Specification
5. Methodology
6. Design Phase
7. Module Description
8. Evaluation Steps
9. Implementation
10. Results and Discussions
11. Conclusion

INTRODUCTION

Sentiment analysis, a branch of natural language processing (NLP) and machine learning,
plays a pivotal role in deciphering the sentiments expressed in textual data. In the context
of Amazon reviews, sentiment analysis involves leveraging advanced algorithms to discern
whether customers’ opinions are positive, negative, or neutral. As an indispensable tool for
businesses, sentiment analysis provides valuable insights into consumer perceptions,
allowing companies to gauge the success of their products and services. Amazon, as one of
the world’s largest online marketplaces, generates an immense volume of customer
reviews daily. Analyzing this vast reservoir of textual data can uncover trends, sentiments,
and critical feedback, providing businesses with actionable information to enhance their
offerings and customer experiences. The integration of machine learning in sentiment
analysis empowers models to learn and adapt to evolving language patterns, ensuring
accurate and relevant results. The effectiveness of sentiment analysis in a dynamic and
diverse marketplace like Amazon is further enhanced by the development of sophisticated
natural language processing techniques. Models are trained on vast datasets that not only
include plain text but also contextual nuances and varying linguistic styles from across the
globe. As a result, businesses can achieve a more nuanced understanding of customer
sentiment, capturing not just the binary of positive or negative, but the full spectrum of
emotions and intensities that underpin consumer feedback.
Furthermore, sentiment analysis tools can be integrated into automated systems to provide
real-time insights into customer reviews and feedback. This real-time capability allows
companies to swiftly respond to customer concerns and improve their products and
services proactively. For instance, a sudden surge in negative reviews on a newly launched
product can trigger an immediate investigation and necessary adjustments to address the
issues raised by customers. Such responsiveness not only helps in maintaining high
customer satisfaction but also builds brand loyalty and trust, which are critical in the
competitive e-commerce landscape.

ABSTRACT

In the realm of customer feedback analysis, this project presents a sentiment analysis tool
specifically designed for product reviews. Leveraging the power of natural language
processing (NLP) and machine learning, this tool empowers businesses to gain actionable
insights from a vast well of customer voices. The core objective is to equip businesses with
a robust solution for sentiment analysis. The methodology behind this project adheres to a
comprehensive approach. Data collection forms the initial step, encompassing a diverse set
of sources to ensure a well-rounded perspective. This data then undergoes meticulous
preprocessing to refine its quality. Here, techniques like text cleaning are employed to
eliminate irrelevant information such as typos or promotional content. The percentage
reduction in irrelevant text during this stage serves as a valuable metric, quantifying the
improvement in data quality. Once the data is prepped, it’s fed into the machine learning
engine. A specific algorithm, such as Naive Bayes with a demonstrated accuracy of 85%,
is chosen for the task. To optimize training and ensure generalizability, the data is split into
two sets: a training set (typically 80%) used to teach the model, and a validation set (20%)
used to evaluate its performance. The culmination of these efforts is a functional sentiment
analysis tool capable of accurately classifying product reviews into distinct sentiment
categories (positive, negative, or neutral). A key metric here is the tool’s classification
accuracy. Imagine the tool achieving a remarkable 90% accuracy in differentiating
positive, negative, and neutral sentiments within reviews. This speaks volumes about its
effectiveness in capturing the true voice of the customer. However, accuracy is just one
piece of the puzzle. For businesses to make real-time decisions, processing speed is
paramount. The ability of the tool to analyze a high volume of reviews quickly is
crucial. Businesses of all technical backgrounds should be able to leverage the tool’s
capabilities and gain actionable insights effortlessly. In essence, this sentiment analysis
tool empowers businesses to truly listen to their customers, understand their sentiment, and
ultimately make data-driven decisions that enhance customer satisfaction and drive
business growth.
Keywords: Feature Extraction, Lexicon-based methods, Machine Learning Algorithms,
Naive Bayes, N-grams, Natural Language Processing, Opinion Mining, Part-of-Speech
Tagging, Supervised Learning, Text Classification.

LITERATURE REVIEW
[1] S. E. Saad et al. (2019) proposed a complete tweet sentiment analysis approach based on
ordinal regression with machine learning algorithms. The suggested model first
pre-processed the tweets and then applied a feature extraction step to generate effective
features. Support Vector Regression (SVR), Random Forest (RF), Multinomial Logistic
Regression (SoftMax), and Decision Trees (DTs) were employed to classify sentiment, and
a Twitter dataset was used to evaluate the model. The test results showed that the
suggested model attained the best accuracy and that DTs performed well compared with
the other methods.
[2] Afzaal et al. (2019) developed a novel approach to aspect-based sentiment
classification. The approach focused on recognizing features precisely, achieving the best
classification accuracy. The developed scheme was implemented as a mobile application to
assist tourists in identifying the best hotels. Real-world datasets were used to analyze the
proposed model, demonstrating its effectiveness in both recognition and classification.
[3] Feizollah et al. (2019) concentrated on tweets related to halal products, specifically
halal cosmetics and halal tourism. The Twitter search function was used to extract the
tweets, and a new model was employed for data filtering. Deep learning models, including
RNN, CNN, and LSTM, were used to analyze the tweets, with the combination of LSTM and
CNN achieving the best accuracy.
[4] Mukhtar et al. (2018) performed sentiment analysis on Urdu blogs from various
domains using supervised machine learning and Lexicon-based models. The Lexicon-based
models employed a well-performing Urdu sentiment analyzer and an Urdu Sentiment
Lexicon. Results showed that the Lexicon-based model outperformed the supervised
machine learning algorithm.
[5] Kumar et al. (2020) proposed a hybrid deep learning approach named
ConVNet-SVMBoVW for predicting fine-grained sentiment from real-time data. An
aggregation model was developed to measure hybrid polarity, using an SVM trained on
Bag of Visual Words (BoVW) features to predict sentiment in visual content. The suggested
ConVNet-SVMBoVW was found to outperform conventional models.
[6] Abdi et al. (2018) reviewed a machine learning technique for summarizing user
opinions from reviews. The method merged multiple features into a unique set for
building an accurate classification model. A performance investigation was conducted
for feature selection models and classifiers, revealing that the combination of
Information Gain (IG) and SVM-based classification improved performance.
[7] Ray and Chakrabarti (2019) designed a deep learning algorithm for extracting
features from text for sentiment analysis. A seven-layer deep CNN was employed for
tagging features in opinionated sentences. The authors combined deep learning methods
with rule-based models, achieving the best accuracy.
[8] Zhao et al. (2019) offered a novel image-text consistency-driven multi-modal
sentiment evaluation model. The model explored the correlation between text and image,
implementing a multi-modal adaptive sentiment analysis model. The suggested model,
integrating visual features using the SentiBank model, outperformed traditional models.
[9] Park et al. (2019) addressed a semi-supervised sentiment-discriminative objective
for handling partial sentiment data in documents. The suggested model not only reflected
partial data but also preserved local structures obtained from real data, performing well
on real-time data.

System Specification

Hardware Specification

Operating System: Windows / Linux / Mac
RAM: 4 GB
Storage: 128 GB SSD

Software Specification
Anaconda Prompt

Anaconda Prompt is a command-line interface installed with the Anaconda distribution. It
is used to manage environments and install packages, including machine learning (ML)
modules. Anaconda Navigator, the graphical counterpart, is available on Windows, Linux,
and macOS and can launch several IDEs that make coding easier. The UI can also be
implemented in Python.

Jupyter

Jupyter is an open-source web application that allows us to create and share documents
that contain live code, equations, visualizations, and narrative text. It can be used for
data cleaning and transformation, numerical simulation, statistical modeling, data
visualization, and machine learning.

METHODOLOGY

The figure above delineates the key components involved in the sentiment analysis of
Amazon reviews using machine learning with Python. It includes crucial elements such as
Data Retrieval, Preprocessing, Feature Extraction, Machine Learning Model Training,
Sentiment Classification, and Results Presentation. The diagram serves as an illustrative
guide, elucidating the orchestrated flow of operations that collectively enable the
extraction and understanding of sentiments expressed in Amazon reviews, facilitating
informed decision-making and strategic enhancements for businesses. This visual
representation underscores the intricacy and synergy of each component in the robust
architecture supporting the sentiment analysis process.

Design Phase
Data Flow Diagram

The data flow diagram (DFD) above for sentiment analysis of Amazon reviews through
machine learning using Python delineates the flow of data and processes essential to the
analytical journey. The diagram encapsulates stages such as Data Retrieval, Text
Preprocessing, and Feature Extraction.

Use Case Diagram

The Use Case Diagram 4.3 above showcases the essential interactions, including actors such
as the User and the Sentiment Analysis System, along with functionalities such as Data
Retrieval, Preprocessing, Machine Learning Model Training, and Results Presentation.

Class Diagram

This visual representation provides a concise overview of the comprehensive methodology
employed in analyzing sentiments within the context of Amazon reviews using Python-
based machine learning.

Sequence Diagram

This Sequence Diagram 4.5 provides a comprehensive overview of the interactions and
processes involved in the sentiment analysis of Amazon reviews using machine learning
with Python. It visually communicates the flow of information, emphasizing the
integration of user input, machine learning algorithms, and the Amazon platform in the
sentiment analysis workflow.

Module Description
Data Collection and Preprocessing
Objective: To gather a balanced dataset of product reviews for model training, we employ
web scraping techniques across various online platforms to ensure diversity in product
categories and opinions. Web scraping involves extracting data from websites hosting
reviews, such as e-commerce platforms, review aggregators, and social media forums.
Once collected, natural language processing (NLP) libraries are utilized for data
preprocessing. This involves several key steps: text cleaning, which removes noise like
HTML tags and punctuation; removing stopwords, common words that don’t carry
significant meaning for sentiment analysis; standardizing text data by converting it to a
uniform format, such as lowercase; tokenization, breaking down text into smaller units for
analysis; and lemmatization or stemming, reducing words to their base forms to improve
model generalization. The goal of these preprocessing steps is to prepare the raw text data
for model training by ensuring it is clean, consistent, and representative of diverse
sentiments. This balanced dataset serves as the foundation for developing a robust
sentiment analysis model capable of accurately classifying product reviews.
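
The preprocessing steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual pipeline: the stopword list is a small hypothetical sample, and `crude_stem` is a simplified stand-in for the stemming or lemmatization that an NLP library would normally provide.

```python
import html
import re

# Hypothetical, deliberately tiny stopword list used only for illustration;
# a real project would rely on a fuller list from an NLP library.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "to", "of", "was", "i"}


def crude_stem(token: str) -> str:
    """Strip a few common suffixes; a stand-in for real stemming or lemmatization."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review: str) -> list[str]:
    text = html.unescape(review)            # decode entities such as &amp;
    text = re.sub(r"<[^>]+>", " ", text)    # text cleaning: drop HTML tags
    text = text.lower()                     # standardize to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)   # remove punctuation and digits
    tokens = text.split()                   # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [crude_stem(t) for t in tokens]  # reduce words to a base form


print(preprocess("<p>This charger STOPPED working &amp; it was overpriced!</p>"))
# ['charger', 'stopp', 'work', 'overpric']
```

The truncated stems in the output show why a proper lemmatizer is usually preferred when readable base forms matter.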

Model Selection and Training
Objective: To identify the most suitable machine learning model for
sentiment classification. This involves evaluating various models such as Support Vector
Machines (SVM), Decision Trees, and Neural Networks. Each model is assessed based on
performance metrics like accuracy, precision, recall, and F1-score. SVMs excel in creating
a clear margin of separation between classes, Decision Trees offer transparent decision-
making processes, and Neural Networks leverage complex architectures to capture intricate
patterns.
After rigorous evaluation, the model demonstrating the highest performance across metrics
is selected. This selection considers not only accuracy but also factors like computational
efficiency and interpretability. Once chosen, the selected model undergoes training using
preprocessed and feature-extracted data. This training process ensures that the model learns
to effectively distinguish between different sentiment classes in product reviews and can
generalize well to unseen data. By following this methodology, we ensure the adoption of
an optimal model for sentiment analysis, capable of accurately discerning sentiments
expressed in diverse product reviews while meeting requirements for efficiency and
interpretability.
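
As a sketch of how candidate models could be compared on the metrics named above, the following computes accuracy and macro-averaged precision, recall, and F1-score from scratch. The labels and predictions are made-up toy values, and `model_a` and `model_b` are hypothetical placeholders rather than results from this project; selecting by macro F1 is one reasonable choice among several.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Made-up labels and predictions, for illustration only (not project results).
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
candidate_predictions = {
    "model_a": ["pos", "pos", "neg", "neu", "neu", "neu"],
    "model_b": ["pos", "neg", "neg", "neg", "neu", "pos"],
}

scores = {name: macro_scores(y_true, preds)
          for name, preds in candidate_predictions.items()}
best_model = max(scores, key=lambda name: scores[name]["f1"])
print(best_model, scores[best_model])
```

In practice the same comparison would be run on held-out validation data, with training time and interpretability weighed alongside these scores.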

Evaluation Steps
Objective: Amazon employs sophisticated sentiment analysis tools as part of its insights
and optimization engine to enhance customer experience and business operations. This
technology harnesses machine learning algorithms to interpret and categorize the emotions
expressed in customer reviews, feedback, and social media mentions. By analyzing these
sentiments, Amazon can glean valuable insights into customer satisfaction, product
reception, and market trends. This data-driven approach allows Amazon to optimize its
product recommendations, tailor marketing strategies, and address customer service issues
more effectively. Additionally, sentiment analysis helps in improving product features and
design by identifying specific aspects that customers care about. By continuously refining
this engine, Amazon not only enhances its operational efficiency but also strengthens its
customer relationships by proactively responding to their emotional feedback and adjusting
its offerings accordingly.


Steps to execute/run/implement the project


Collection of Data

Collecting data for sentiment analysis on Amazon reviews using machine learning involves
gathering a dataset that comprises textual reviews.

Processing the data

Processing begins with data collection from diverse sources, such as customer reviews on
platforms like Amazon. The gathered textual data undergoes meticulous preprocessing,
including tasks like text cleaning and tokenization. Following preprocessing, machine
learning algorithms are trained on labeled datasets to recognize patterns and sentiments
within the text. The trained models are then applied to analyze new data, categorizing
sentiments into positive, negative, or neutral, providing valuable insights for decision-
making.
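
Before training, the labeled data is divided into the training and validation sets mentioned in the abstract (80% and 20%). A minimal sketch of such a split is given below; the function name, the fixed seed, and the toy reviews are assumptions made for illustration.

```python
import random


def train_validation_split(samples, validation_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and hold out a fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Toy (review, label) pairs invented for this example.
labeled_reviews = [(f"review {i}", "positive" if i % 2 else "negative")
                   for i in range(10)]
train_set, validation_set = train_validation_split(labeled_reviews)
print(len(train_set), len(validation_set))  # 8 2
```

Shuffling before splitting avoids a validation set made up only of the most recently collected reviews.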

Using Name algorithm

Common algorithms include Support Vector Machines (SVM), Naive Bayes, and
Recurrent Neural Networks (RNNs). SVM excels in binary classification tasks, Naive
Bayes is computationally efficient and effective for text classification, while RNNs, with
their sequential learning, are adept at capturing contextual nuances in sentiment. The
selection depends on the dataset characteristics and the desired balance between
interpretability and complexity in sentiment analysis models.
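
To make the Naive Bayes option concrete, the sketch below implements a multinomial Naive Bayes classifier with add-one (Laplace) smoothing in plain Python. It is an illustrative assumption about how such a classifier works in general, not the project's code: the class name and the six toy training reviews are invented, and a library implementation would normally be used instead.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {word for counts in self.word_counts.values()
                           for word in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)   # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue   # skip words never seen during training
                count = self.word_counts[label][token]
                # Smoothed log likelihood of the token given the class.
                score += math.log((count + 1) /
                                  (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy tokenized reviews invented for this example.
train_docs = [
    ["great", "battery", "love"], ["excellent", "quality", "love"],
    ["terrible", "battery", "broke"], ["awful", "quality", "refund"],
    ["okay", "average", "works"], ["average", "nothing", "special"],
]
train_labels = ["positive", "positive", "negative", "negative",
                "neutral", "neutral"]

model = NaiveBayesSentiment().fit(train_docs, train_labels)
print(model.predict(["love", "great", "quality"]))  # positive
```

Working in log space avoids numerical underflow when many small probabilities are multiplied for long reviews.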

IMPLEMENTATION
Input and Output
Input Design

In designing inputs for sentiment analysis, the focus is on creating a feature-rich
representation of textual data. Techniques such as word embeddings (e.g., Word2Vec or
GloVe) or contextual embeddings (e.g., BERT) capture semantic relationships and
contextual nuances. Additionally, preprocessing steps like text normalization, removal of
stop words, and handling of special characters enhance the quality of input data. The
careful design of input features significantly influences the model’s performance.

[Figure: Review Analysis]

Output Design
The output design revolves around presenting meaningful and interpretable results. The
model’s predictions are typically categorized into sentiment classes (positive, negative, or
neutral). Visual representations, such as confusion matrices or precision-recall curves, aid
in evaluating model performance. Additionally, providing a clear summary or visualization
of sentiment distribution in the analyzed data enhances the interpretability of the sentiment
analysis results. The output design aims to convey insights derived from the model’s
predictions in a comprehensible and actionable format. Balancing the trade-off between
model complexity and interpretability is a critical consideration, ensuring the sentiment
analysis model aligns with specific business requirements and user needs. Additionally,
ongoing model monitoring and adaptation to evolving language trends are essential for
maintaining the effectiveness of sentiment analysis systems over time.
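
The two output views mentioned above, a confusion matrix and a summary of the sentiment distribution, can be sketched as follows. The labels and predictions are made-up toy values and the function names are assumptions; the output here is plain text rather than a plot.

```python
from collections import Counter

LABELS = ["positive", "negative", "neutral"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, predicted in zip(y_true, y_pred):
        matrix[index[truth]][index[predicted]] += 1
    return matrix


def sentiment_distribution(predictions, labels=LABELS):
    """Share of predictions that fall into each sentiment class."""
    counts = Counter(predictions)
    return {label: counts[label] / len(predictions) for label in labels}


# Made-up values for illustration only (not project results).
y_true = ["positive", "positive", "negative", "neutral", "negative"]
y_pred = ["positive", "neutral", "negative", "neutral", "negative"]

matrix = confusion_matrix(y_true, y_pred)
print("true \\ predicted".ljust(18) + "".join(l.ljust(10) for l in LABELS))
for label, row in zip(LABELS, matrix):
    print(label.ljust(18) + "".join(str(v).ljust(10) for v in row))

distribution = sentiment_distribution(y_pred)
print(distribution)
```

Off-diagonal cells show which classes the model confuses, which is often more actionable than a single accuracy figure.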

RESULTS AND DISCUSSIONS

Efficiency of the Proposed System

In evaluating the efficiency of sentiment analysis, the quality of the dataset plays a pivotal
role. A larger and more diverse dataset enhances the model’s ability to generalize and
comprehend diverse language patterns, while high-quality labeled data is essential for
training a reliable sentiment analysis model. Additionally, the efficacy of preprocessing
techniques, such as effective text cleaning and proper tokenization, contributes
significantly to model performance.
Feature extraction methods, including the representation of text through word embeddings
or TF-IDF, further impact the model’s capability to capture semantic meaning. The
choice of a machine learning algorithm, training time considerations, and real-time
processing factors, such as latency, are also critical elements.
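
As an illustration of TF-IDF feature extraction, the sketch below weights each term by its frequency within a review multiplied by the logarithm of the inverse document frequency. It uses the plain textbook formula; library implementations often apply smoothing or normalization, so their values may differ. The toy documents are invented for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))   # count each term once per document
    vectors = []
    for tokens in documents:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy tokenized reviews invented for this example.
docs = [["good", "battery"], ["bad", "battery"], ["good", "screen", "good"]]
vectors = tf_idf(docs)
for tokens, vector in zip(docs, vectors):
    print(tokens, {term: round(weight, 3) for term, weight in vector.items()})
```

Terms that appear in few reviews, such as "bad" or "screen" here, receive higher weights than terms shared across many reviews.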
Evaluation metrics, encompassing accuracy, precision, recall, and F1 score, provide
insights into the model’s performance, while adaptability to dynamic language patterns and
domain adaptation is crucial for long-term efficiency. Incorporating feedback mechanisms
for model updates and a human-in-the-loop approach, along with vigilant monitoring of
resource consumption, robustness, and generalization, contribute to a comprehensive
assessment of sentiment analysis efficiency. Striking a balance between accuracy, speed,
adaptability, and resource utilization is vital, necessitating regular monitoring, evaluation,
and potential retraining of the model to enhance and maintain its efficiency over time.

Comparison of Existing and Proposed System

| Existing System | Proposed System |
| --- | --- |
| Basic sentiment analysis on customer reviews, focusing on overall satisfaction. | Advanced aspect-based sentiment analysis to pinpoint specific product features and their satisfaction levels. |
| Utilizes sentiment analysis primarily for product recommendations and review summaries. | Extends usage to real-time customer support and dynamic product customization based on sentiment trends. |
| Limited proactive capabilities in reputation management, mainly reactive responses to negative reviews. | Enhanced proactive reputation management using predictive sentiment modeling to prevent issues before they escalate. |
| Data used for broad marketing strategies and periodic adjustments. | Integration of sentiment analysis data with machine learning for adaptive and personalized marketing strategies. |
| Customer interactions based largely on historical data and predefined response patterns. | Utilizes real-time sentiment analysis to guide live customer interaction and support, improving immediacy and relevance. |
| Operational improvements are periodically reviewed and implemented based on customer feedback analysis. | Continuous operational adjustment using ongoing sentiment analysis, aligning more closely with evolving customer preferences and feedback. |
| Limited use of sentiment trends in inventory management. | Application of sentiment analysis to optimize inventory levels and product availability. |
| Reliance on manual review of sentiment analysis outputs. | Automation of data processing with AI to reduce human error and speed up insight generation. |
| Feedback utilized mainly in post-purchase scenarios. | Feedback integrated into the product design phase to anticipate user needs better. |
| Sentiment analysis results not directly linked to customer loyalty programs. | Sentiment data used to tailor loyalty programs, enhancing customer retention. |
| Generic sentiment analysis models used across various product categories. | Custom sentiment models developed for different product categories to improve accuracy. |

Conclusion

Sentiment analysis of Amazon reviews using machine learning is a powerful application
that involves training models to understand and classify the sentiment expressed in
customer reviews. This process typically includes data preprocessing, feature extraction,
model training, and evaluation. Machine learning models, such as natural language
processing algorithms, are trained on labeled datasets to predict sentiments like positive,
negative, or neutral. The success of sentiment analysis in Amazon reviews relies on the
quality and diversity of the training data, the effectiveness of the chosen machine learning
algorithm, and the continuous adaptation of the model to changing language patterns.
Incorporating real-time testing at various levels, including unit testing, integration testing,
and system testing, ensures that the sentiment analysis system performs accurately and
reliably across different scenarios. Functional testing focuses on individual functions
within the sentiment analysis module, such as preprocessing and sentiment prediction, to
validate their correctness and robustness. Unit tests check specific components in isolation,
integration tests ensure smooth interactions between different modules, and system tests
verify the end-to-end functionality of the entire sentiment analysis system. In conclusion,
sentiment analysis of Amazon reviews using machine learning provides valuable insights
into customer opinions, helping businesses understand product reception and make
informed decisions. Rigorous testing at different levels is essential to ensure the accuracy,
reliability, and adaptability of the sentiment analysis system, ultimately leading to more
meaningful and actionable results for businesses and consumers alike.
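
The testing levels described above can be illustrated with Python's built-in `unittest` module. This is only a sketch: `normalize` and `predict_sentiment` are hypothetical stand-ins for the preprocessing and prediction components (here a tiny word-list rule, not a trained model), and the word sets are invented for the example.

```python
import re
import unittest

# Hypothetical word lists standing in for a trained model.
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "poor", "broken"}


def normalize(text):
    """Stand-in preprocessing step: lowercase and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def predict_sentiment(tokens):
    """Stand-in prediction step based on counting listed words."""
    score = (sum(t in POSITIVE_WORDS for t in tokens)
             - sum(t in NEGATIVE_WORDS for t in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


class SentimentPipelineTests(unittest.TestCase):
    def test_normalize_unit(self):
        # Unit test: preprocessing checked in isolation.
        self.assertEqual(normalize("Great, GREAT phone!"),
                         ["great", "great", "phone"])

    def test_predict_unit(self):
        # Unit test: prediction checked in isolation.
        self.assertEqual(predict_sentiment(["broken", "screen"]), "negative")

    def test_pipeline_integration(self):
        # Integration test: preprocessing output feeds prediction.
        self.assertEqual(predict_sentiment(normalize("I love it")), "positive")
        self.assertEqual(predict_sentiment(normalize("It arrived on Tuesday")),
                         "neutral")


def run_suite():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        SentimentPipelineTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print("all tests passed:", result.wasSuccessful())
```

A system test would extend the same idea to the full workflow, from raw review retrieval through to the presented results.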
