Internship Report
On
SENTIMENTAL ANALYSIS
By
P. Yeshwanth BU21CSEN0500255
T. Priyanka BU21CSEN0500310
(DEEMED TO BE A UNIVERSITY)
SESSION:2021-20
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
GITAM
DECLARATION
We hereby declare that the Internship report entitled “Sentimental Analysis using
machine learning” is an original work done in the Department of Computer Science
and Engineering, GITAM School of Technology, GITAM (Deemed to be University)
Bengaluru submitted in partial fulfilment of the requirements for the award of the degree
of B.Tech. in Computer Science and Engineering. The work has not been submitted to any
other college or University for the award of any degree.
Date:
Registration No(s) Name Signature
BU21CSEN0500255 P.YESHWANTH
BU21CSEN0500155 C.JAYA SURYA
BU21CSEN0500189 K.NAGA NITHIN
BU21CSEN0500310 T.PRIYANKA
CERTIFICATE
This is to certify that the mini project report entitled “SENTIMENTAL ANALYSIS
USING MACHINE LEARNING” is a bonafide record of work carried out by
P.YESHWANTH (BU21CSEN0500255), C.JAYASURYA (BU21CSEN0500155),
K.NAGA NITHIN (BU21CSEN0500189), and T.PRIYANKA (BU21CSEN0500310),
submitted in partial fulfillment of the requirement for the award of the degree of
Bachelor of Technology in Computer Science and Engineering.
Dr.Arun Prasath. G
Assistant Professor
Department of CSE,
GST, Bengaluru
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task would
be incomplete without the mention of the people who made it possible, whose consistent
guidance and encouragement crowned our efforts with success.
We consider it our privilege to express our gratitude to all those who guided us in
completing the project.
We express our gratitude to Director Prof. Basavaraj Gundappa Katageri for having
provided us with the golden opportunity to undertake this project work in this esteemed
organization.
P.YESHWANTH BU21CSEN0500255
T PRIYANKA BU21CSEN0500310
Contents
Declaration
Certificate
Acknowledgement
1. Introduction
2. Abstract
4. System Specification
5. Methodology
8. Evaluation Steps
9. Implementation
11. Conclusion
INTRODUCTION
Sentiment analysis, a branch of natural language processing (NLP) and machine learning,
plays a pivotal role in deciphering the sentiments expressed in textual data. In the context
of Amazon reviews, sentiment analysis involves leveraging advanced algorithms to discern
whether customers’ opinions are positive, negative, or neutral. As an indispensable tool for
businesses, sentiment analysis provides valuable insights into consumer perceptions,
allowing companies to gauge the success of their products and services. Amazon, as one of
the world’s largest online marketplaces, generates an immense volume of customer
reviews daily. Analyzing this vast reservoir of textual data can uncover trends, sentiments,
and critical feedback, providing businesses with actionable information to enhance their
offerings and customer experiences. The integration of machine learning in sentiment
analysis empowers models to learn and adapt to evolving language patterns, ensuring
accurate and relevant results. The effectiveness of sentiment analysis in a dynamic and
diverse marketplace like Amazon is further enhanced by the development of sophisticated
natural language processing techniques. Models are trained on vast datasets that not only
include plain text but also contextual nuances and varying linguistic styles from across the
globe. As a result, businesses can achieve a more nuanced understanding of customer
sentiment, capturing not just the binary of positive or negative, but the full spectrum of
emotions and intensities that underpin consumer feedback.
Furthermore, sentiment analysis tools can be integrated into automated systems to provide
real-time insights into customer reviews and feedback. This real-time capability allows
companies to swiftly respond to customer concerns and improve their products and
services proactively. For instance, a sudden surge in negative reviews on a newly launched
product can trigger an immediate investigation and necessary adjustments to address the
issues raised by customers. Such responsiveness not only helps in maintaining high
customer satisfaction but also builds brand loyalty and trust, which are critical in the
competitive e-commerce landscape.
ABSTRACT
In the realm of customer feedback analysis, this project presents a sentiment analysis tool
specifically designed for product reviews. Leveraging the power of natural language
processing (NLP) and machine learning, this tool empowers businesses to gain actionable
insights from a vast well of customer voices. The core objective is to equip businesses with
a robust solution for sentiment analysis. The methodology behind this project adheres to a
comprehensive approach. Data collection forms the initial step, encompassing a diverse set
of sources to ensure a well-rounded perspective. This data then undergoes meticulous
preprocessing to refine its quality. Here, techniques like text cleaning are employed to
eliminate irrelevant information such as typos or promotional content. The percentage
reduction in irrelevant text during this stage serves as a valuable metric, quantifying the
improvement in data quality. Once the data is prepared, it is fed into the machine learning
engine. A specific algorithm, such as Naive Bayes with a demonstrated accuracy of 85%,
is chosen for the task. To optimize training and ensure generalizability, the data is split into
two sets: a training set (typically 80%) used to teach the model, and a validation set (20%)
used to evaluate its performance. The culmination of these efforts is a functional sentiment
analysis tool capable of classifying product reviews into distinct sentiment
categories (positive, negative, or neutral). A key metric here is the tool’s classification
accuracy. For example, a 90% accuracy in differentiating positive, negative, and neutral
sentiments within reviews would indicate that the tool captures the true voice of the
customer effectively. However, accuracy is just one piece of the puzzle. For businesses to
make real-time decisions, processing speed is paramount: the tool must analyze a high volume
of reviews quickly. Ease of use matters as well, since businesses of all technical
backgrounds should be able to leverage the tool’s
capabilities and gain actionable insights effortlessly. In essence, this sentiment analysis
tool empowers businesses to truly listen to their customers, understand their sentiment, and
ultimately make data-driven decisions that enhance customer satisfaction and drive
business growth.
Keywords: Feature Extraction, Lexicon-based methods, Machine Learning Algorithms,
Naive Bayes, N-grams, Natural Language Processing, Opinion Mining, Part-of-Speech
Tagging, Supervised Learning, Text Classification.
LITERATURE REVIEW
[1] S. E. Saad et al., (2019) proposed a complete tweet sentiment analysis approach based on
ordinal regression with machine learning algorithms. The suggested model pre-processed the
tweets as a first step and then generated effective features with a feature extraction
model. SVR, RF, multinomial logistic regression (SoftMax), and DTs were employed to
classify sentiment, and a Twitter dataset was used to evaluate the model. The test results
showed that the suggested model attained the best accuracy and that DTs performed well
compared with the other methods.
[2] Afzaal et al., (2019) developed a novel approach to aspect-based sentiment
classification. The approach focused on recognizing features precisely, achieving the best
classification accuracy. The developed scheme was implemented as a mobile application to
assist tourists in identifying the best hotels. Real-world datasets were used to analyze the
proposed model, demonstrating its effectiveness in both recognition and classification.
[3] Feizollah et al., (2019) concentrated on tweets related to halal products,
specifically halal cosmetics and halal tourism. The Twitter search function was
used to extract the data, and a new model was employed for data filtering. Deep
learning models, including RNN, CNN, and LSTM, were used to compute and evaluate tweets,
with the combination of LSTM and CNN achieving the best accuracy.
[4] Mukhtar et al., (2018) performed sentiment analysis on Urdu blogs from various
domains using Supervised Machine Learning and Lexicon-based models. The Lexicon-based
models employed a well-performing Urdu sentiment analyzer and an Urdu Sentiment
Lexicon. Results showed that the Lexicon-based model outperformed the supervised
machine learning algorithm.
[5] Kumar et al., (2020) presented a hybrid deep learning approach named
ConVNet-SVMBoVW for predicting fine-grained sentiment from real-time data. An
aggregation model was developed to measure hybrid polarity, utilizing SVM for training
the Bag of Visual Words (BoVW) to forecast sentiment in visual content. The suggested
ConVNet-SVMBoVW was found to outperform conventional models.
[6] Abdi et al., (2018) reviewed a machine learning technique for summarizing user
opinions from reviews. The method merged multiple features into a unique set for
building an accurate classification model. A performance investigation was conducted
for feature selection models and classifiers, revealing that the combination of
Information Gain (IG) and SVM-based classification improved performance.
[7] Ray and Chakrabarti, (2019) designed a deep learning algorithm for extracting
features from text for sentiment analysis. A seven-layer Deep CNN was employed for
tagging features in opinionated sentences. The authors combined deep learning methods
with rule-based models, achieving the best accuracy.
[8] Zhao et al., (2019) offered a novel image-text consistency-driven multi-modal
sentiment evaluation model. The model explored the correlation between text and
image, implementing a multi-modal adaptive sentiment analysis model. The suggested
model, integrating visual features using the SentiBank model, outperformed traditional
models.
[9] Park et al., (2019) addressed a semi-supervised sentiment-discriminative objective
for handling partial sentiment data in documents. The suggested model not only reflected
partial data but also secured local structures obtained from real data, performing well on
real-time data.
System Specification
Hardware Specification
Software Specification
Anaconda Prompt
Anaconda Prompt is a command-line interface installed with the Anaconda distribution. It is
used to manage the Python environments and packages needed for machine learning (ML) work.
Anaconda Navigator, the graphical counterpart, is available on Windows, Linux, and macOS.
The Anaconda distribution includes several IDEs that make coding easier. The UI can also be
implemented in Python.
Jupyter
Jupyter Notebook is an open-source web application that allows us to create and share
documents that contain live code, equations, visualizations, and narrative text. It can be
used for data cleaning and transformation, numerical simulation, statistical modeling, data
visualization, and machine learning.
METHODOLOGY
The figure above delineates the key components involved in the sentiment analysis of
Amazon reviews using machine learning with Python. It includes crucial elements such as
Data Retrieval, Preprocessing, Feature Extraction, Machine Learning Model Training,
Sentiment Classification, and Results Presentation. The diagram serves as an illustrative
guide, elucidating the orchestrated flow of operations that collectively enable the
extraction and understanding of sentiments expressed in Amazon reviews, facilitating
informed decision-making and strategic enhancements for businesses. This visual
representation underscores the intricacy and synergy of each component in the robust
architecture supporting the sentiment analysis process.
Design Phase
Data Flow Diagram
The above Data Flow Diagram (DFD) for sentiment analysis of Amazon reviews through machine
learning using Python delineates the flow of data and the processes essential to the analysis.
The diagram encapsulates stages such as Data Retrieval, Text Preprocessing, and Feature
Extraction.
Use Case Diagram
The above Use Case Diagram 4.3 showcases the essential interactions, including actors such as
the User and the Sentiment Analysis System, along with functionalities such as Data Retrieval,
Preprocessing, Machine Learning Model Training, and Results Presentation.
Class Diagram
Sequence Diagram
This Sequence Diagram 4.5 provides a comprehensive overview of the interactions and
processes involved in the sentiment analysis of Amazon reviews using machine learning
with Python. It visually communicates the flow of information, emphasizing the
integration of user input, machine learning algorithms, and the Amazon platform in the
sentiment analysis workflow.
Module Description
Data Collection and Preprocessing
Objective: To gather a balanced dataset of product reviews for model training, we employ
web scraping techniques across various online platforms to ensure diversity in product
categories and opinions. Web scraping involves extracting data from websites hosting
reviews, such as e-commerce platforms, review aggregators, and social media forums.
Once collected, natural language processing (NLP) libraries are utilized for data
preprocessing. This involves several key steps: text cleaning, which removes noise like
HTML tags and punctuation; removing stopwords, common words that don’t carry
significant meaning for sentiment analysis; standardizing text data by converting it to a
uniform format, such as lowercase; tokenization, breaking down text into smaller units for
analysis; and lemmatization or stemming, reducing words to their base forms to improve
model generalization. The goal of these preprocessing steps is to prepare the raw text data
for model training by ensuring it is clean, consistent, and representative of diverse
sentiments. This balanced dataset serves as the foundation for developing a robust
sentiment analysis model capable of accurately classifying product reviews.
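The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library; the stopword list and the suffix-stripping stemmer are simplified stand-ins (assumptions) for what an NLP library would provide, and the function names are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real project would use a
# fuller list from an NLP library.
STOPWORDS = {"a", "an", "the", "is", "it", "this", "and", "i", "to", "of", "was"}


def strip_suffix(token):
    """Crude stemmer (assumption): drop a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review):
    """Clean, lowercase, tokenize, remove stopwords, and stem one review."""
    text = re.sub(r"<[^>]+>", " ", review)   # text cleaning: remove HTML tags
    text = text.lower()                       # standardize to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)     # drop punctuation and digits
    tokens = text.split()                     # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [strip_suffix(t) for t in tokens]


print(preprocess("<p>This phone is AMAZING, I loved the cameras!</p>"))
# ['phone', 'amaz', 'lov', 'camera']
```

The stems are not always dictionary words ("amaz", "lov"); that is typical of stemming, whereas lemmatization would return "amazing" and "love" at the cost of needing a vocabulary resource.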
Figure: Data Collection and Preprocessing
Model Selection and Training
Objective: The objective is to identify the most suitable machine learning model for
sentiment classification. This involves evaluating various models such as Support Vector
Machines (SVM), Decision Trees, and Neural Networks. Each model is assessed based on
performance metrics like accuracy, precision, recall, and F1-score. SVMs excel in creating
a clear margin of separation between classes, Decision Trees offer transparent decision-
making processes, and Neural Networks leverage complex architectures to capture intricate
patterns.
After rigorous evaluation, the model demonstrating the highest performance across metrics
is selected. This selection considers not only accuracy but also factors like computational
efficiency and interpretability. Once chosen, the selected model undergoes training using
preprocessed and feature-extracted data. This training process ensures that the model learns
to effectively distinguish between different sentiment classes in product reviews and can
generalize well to unseen data. By following this methodology, we ensure the adoption of
an optimal model for sentiment analysis, capable of accurately discerning sentiments
expressed in diverse product reviews while meeting requirements for efficiency and
interpretability.
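As an illustration of the comparison described above, the following sketch computes accuracy and macro-averaged F1 for several candidate models and picks the one with the highest macro F1. The validation labels and the predictions of the three candidates are hypothetical values made up for the example, not results from this project, and ranking by macro F1 alone is an assumption; computational efficiency and interpretability would have to be weighed separately.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def evaluate(y_true, y_pred):
    """Accuracy and macro-averaged F1 over all classes present in y_true."""
    labels = sorted(set(y_true))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(per_class_scores(y_true, y_pred, l)[2] for l in labels) / len(labels)
    return {"accuracy": accuracy, "macro_f1": macro_f1}


# Hypothetical validation labels and predictions from three candidate models.
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
candidates = {
    "svm":           ["pos", "pos", "neg", "neg", "neu", "pos"],
    "decision_tree": ["pos", "neg", "neg", "neg", "pos", "neu"],
    "neural_net":    ["pos", "neg", "neg", "neu", "neu", "neu"],
}

scores = {name: evaluate(y_true, pred) for name, pred in candidates.items()}
best = max(scores, key=lambda name: scores[name]["macro_f1"])
for name, result in scores.items():
    print(name, result)
print("selected:", best)
```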
Figure: Model Selection and Training
Evaluation Steps
Objective: Amazon uses sophisticated sentiment analysis tools as part of its insights and optimization
engine to enhance customer experience and business operations. This technology harnesses
machine learning algorithms to interpret and categorize the emotions expressed in
customer reviews, feedback, and social media mentions. By analyzing these sentiments,
Amazon can glean valuable insights into customer satisfaction, product reception, and
market trends. This data-driven approach allows Amazon to optimize its product
recommendations, tailor marketing strategies, and address customer service issues more
effectively. Additionally, sentiment analysis helps in improving product features and
design by identifying specific aspects that customers care about. By continuously refining
this engine, Amazon not only enhances its operational efficiency but also strengthens its
customer relationships by proactively responding to their emotional feedback and adjusting
its offerings accordingly.
Figure: Evaluation Steps
Collecting data for sentiment analysis on Amazon reviews using machine learning involves
gathering a dataset of textual reviews. The process begins with data collection from
diverse sources, such as customer reviews on platforms like Amazon. The gathered textual
data undergoes meticulous preprocessing, including tasks like text cleaning and
tokenization. Following preprocessing, machine learning algorithms are trained on labeled
datasets to recognize patterns and sentiments within the text. The trained models are then
applied to analyze new data, categorizing sentiments into positive, negative, or neutral
and providing valuable insights for decision-making.
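Before training, the labeled reviews are divided into a training set and a validation set; the abstract mentions a typical 80/20 split. A minimal sketch of such a split follows. The function name, the fixed seed, and the sample data are assumptions made for illustration.

```python
import random


def train_validation_split(samples, train_fraction=0.8, seed=42):
    """Shuffle the samples reproducibly and split them into two lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled reviews: (text, sentiment) pairs.
reviews = [(f"review {i}", "positive" if i % 2 else "negative") for i in range(10)]
train_set, validation_set = train_validation_split(reviews)
print(len(train_set), len(validation_set))   # 8 2
```

Shuffling before the cut matters when the collected reviews are ordered, for example by product or by date, because an unshuffled split would then validate on a different kind of review than it trained on.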
Common algorithms include Support Vector Machines (SVM), Naive Bayes, and
Recurrent Neural Networks (RNNs). SVM excels in binary classification tasks, Naive
Bayes is computationally efficient and effective for text classification, while RNNs, with
their sequential learning, are adept at capturing contextual nuances in sentiment. The
selection depends on the dataset characteristics and the desired balance between
interpretability and complexity in sentiment analysis models.
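To make the Naive Bayes option concrete, here is a small from-scratch sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. It is an illustration under simplifying assumptions, not the implementation used in this project: the class name and the four training reviews are hypothetical, and words never seen in training are simply ignored at prediction time.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)            # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue                                     # skip unseen words
                count = self.word_counts[label][token] + 1       # Laplace smoothing
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, already-tokenized training reviews.
train_docs = [
    ["great", "product", "love"],
    ["excellent", "quality", "love"],
    ["terrible", "waste", "money"],
    ["poor", "quality", "broke"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["love", "great", "quality"]))   # positive
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, which is why the scores are sums of logarithms.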
IMPLEMENTATION
Input and Output
Input Design
Output Design
The output design revolves around presenting meaningful and interpretable results. The
model’s predictions are typically categorized into sentiment classes (positive, negative, or
neutral). Visual representations, such as confusion matrices or precision-recall curves, aid
in evaluating model performance. Additionally, providing a clear summary or visualization
of sentiment distribution in the analyzed data enhances the interpretability of the sentiment
analysis results. The output design aims to convey insights derived from the model’s
predictions in a comprehensible and actionable format. Balancing the trade-off between
model complexity and interpretability is a critical consideration, ensuring the sentiment
analysis model aligns with specific business requirements and user needs. Additionally,
ongoing model monitoring and adaptation to evolving language trends are essential for
maintaining the effectiveness of sentiment analysis systems over time.
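Two of the outputs mentioned above, the confusion matrix and the sentiment distribution, can be computed as in the following sketch. The label names and the example predictions are hypothetical and serve only to show the shape of the output.

```python
from collections import Counter

LABELS = ["positive", "negative", "neutral"]


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def sentiment_distribution(predictions):
    """Share of each predicted sentiment class, in percent."""
    counts = Counter(predictions)
    total = len(predictions)
    return {label: 100.0 * counts[label] / total for label in LABELS}


# Hypothetical true labels and model predictions for five reviews.
y_true = ["positive", "positive", "negative", "neutral", "negative"]
y_pred = ["positive", "neutral", "negative", "neutral", "positive"]

for label, row in zip(LABELS, confusion_matrix(y_true, y_pred)):
    print(f"{label:>8}: {row}")
print(sentiment_distribution(y_pred))
```

Off-diagonal entries show which classes the model confuses, for instance a positive review predicted as neutral, which a single accuracy figure would hide.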
RESULTS AND DISCUSSIONS
In evaluating the efficiency of sentiment analysis, the quality of the dataset plays a pivotal
role. A larger and more diverse dataset enhances the model’s ability to generalize and
comprehend diverse language patterns, while high-quality labeled data is essential for
training a reliable sentiment analysis model. Additionally, the efficacy of preprocessing
techniques, such as effective text cleaning and proper tokenization, contributes
significantly to model performance.
Feature extraction methods, including the representation of text through word embeddings
or TF-IDF, further impact the model’s capability to capture semantic meaning. The
choice of a machine learning algorithm, training time considerations, and real-time
processing factors, such as latency, are also critical elements.
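As a brief illustration of TF-IDF, the sketch below weights each term by its frequency within a review and by how rare it is across all reviews. It uses tf = count / review length and idf = log(N / df), which is one common variant among several; library implementations typically add smoothing and normalization. The function name and sample reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical tokenized reviews.
docs = [
    ["good", "battery", "good", "screen"],
    ["bad", "battery"],
    ["good", "price"],
]
vectors = tf_idf(docs)
for vector in vectors:
    print({term: round(weight, 3) for term, weight in vector.items()})
```

In the first review, "screen" gets the highest weight per occurrence because it appears in no other review, while "battery" is down-weighted because it is shared with the second review.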
Evaluation metrics, encompassing accuracy, precision, recall, and F1 score, provide
insights into the model’s performance, while adaptability to dynamic language patterns and
domain adaptation is crucial for long-term efficiency. Incorporating feedback mechanisms
for model updates and a human-in-the-loop approach, along with vigilant monitoring of
resource consumption, robustness, and generalization, contribute to a comprehensive
assessment of sentiment analysis efficiency. Striking a balance between accuracy, speed,
adaptability, and resource utilization is vital, necessitating regular monitoring, evaluation,
and potential retraining of the model to enhance and maintain its efficiency over time.
Comparison of Existing and Proposed System
Existing System | Proposed System
Utilizes sentiment analysis primarily for product recommendations and review summaries. | Extends usage to real-time customer support and dynamic product customization based on sentiment trends.
Limited proactive capabilities in reputation management, mainly reactive responses to negative reviews. | Enhanced proactive reputation management using predictive sentiment modeling to prevent issues before they escalate.
Feedback utilized mainly in post-purchase scenarios. | Feedback integrated into the product design phase to anticipate user needs better.
Sentiment analysis results not directly linked to customer loyalty programs. | Sentiment data used to tailor loyalty programs, enhancing customer retention.
Generic sentiment analysis models used across various product categories. | Custom sentiment models developed for different product categories to improve accuracy.
Conclusion