Divi Report
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
BELAGAVI – 590018, Karnataka
INTERNSHIP REPORT
ON
“Social Media Sentiment Analysis”
Submitted in partial fulfilment for the award of degree(18CSI85)
BACHELOR OF ENGINEERING
IN COMPUTER SCIENCE
Submitted by:
DIVAKAR M
1KS20CS024
Conducted at
Varcons Technologies Pvt Ltd
K.S. INSTITUTE OF TECHNOLOGY
Department of Computer Science and Engineering
Accredited by NBA, New Delhi
#14, Raghuvanahalli, Kanakapura Road, Bengaluru-560109
Internship report 2022-2022
CERTIFICATE
This is to certify that the Internship titled “Social Media Sentiment Analysis” carried out
by Mr. RKH, a bonafide student of K S Institute of Technology, in partial fulfillment for
the award of Bachelor of Engineering, in Computer Science under Visvesvaraya
Technological University, Belagavi, during the year 2022-2023. It is certified that all
corrections/suggestions indicated have been incorporated in the report.
The project report has been approved as it satisfies the academic requirements in respect
of Internship prescribed for the course Internship / Professional Practice (18CSI85)
External Viva:
1)
2)
I, Divakar M, final year student of Branch, College Name - 560 082, declare
that the Internship has been successfully completed, in VARCONS
TECHNOLOGIES PVT LTD. This report is submitted in partial fulfillment of
the requirements for award of Bachelor Degree in Branch name, during the
academic year 2022-2023.
Date : 21-09-2023
Place : Bangalore
USN : 1KS20CS024
NAME : DIVAKAR M
We express our sincere thanks to our Principal, for providing us adequate facilities to
undertake this Internship.
We would like to thank our Head of Dept – branch code, for providing us an opportunity to
carry out Internship and for his valuable guidance and support.
We would like to thank our (Lab assistant name) Software Services for guiding us during the
period of internship.
We express our deep and profound gratitude to our guide, Guide name, Assistant/Associate
Prof, for her keen interest and encouragement at every step in completing the Internship.
We would like to thank all the faculty members of our department for the support extended
during the course of Internship.
We would like to thank the non-teaching members of our department for helping us during the
Internship.
Last but not least, we would like to thank our parents and friends, without whose constant
help the completion of the Internship would not have been possible.
NAME: DIVAKAR M
USN: 1KS20CS024
Social media platforms have become integral parts of modern communication, enabling
individuals to express their thoughts, emotions, and opinions on a vast array of topics. As a result,
social media has evolved into a treasure trove of valuable data for businesses, researchers, and
policymakers. Sentiment analysis, also known as opinion mining, is a critical task in harnessing
the power of this data. This abstract provides an overview of the key aspects of social media
sentiment analysis.
This paper begins by defining sentiment analysis and its significance in today's digital age. It
explores the challenges posed by the unique characteristics of social media content, including
brevity, slang, and the use of emojis and hashtags. The methodologies commonly employed for
sentiment analysis, ranging from rule-based approaches to machine learning algorithms, are
discussed in detail, highlighting their strengths and limitations.
Moreover, the paper delves into the diverse applications of sentiment analysis across various
domains, such as marketing, politics, healthcare, and customer feedback management. It
illustrates how sentiment analysis aids in brand reputation management, stock market prediction,
political sentiment tracking, and public health monitoring.
Furthermore, the ethical considerations associated with sentiment analysis on social media are
addressed. The potential for biases and privacy concerns is explored, along with the need for
responsible data handling and transparent methodologies.
Finally, the paper outlines emerging trends and future directions in social media sentiment
analysis, including the incorporation of multimodal data (text, images, and videos), the
development of context-aware models, and the pursuit of real-time sentiment analysis.
Sl no Description
1 Company Profile
3 Introduction
5 Requirement Analysis
7 Implementation
8 Snapshots
9 Conclusion
10 References
The company is a technology organization providing solutions for web design and
development, MySQL, Python programming, HTML, CSS, ASP.NET and LINQ.
To meet ever-increasing automation requirements, Sarvamoola Software Services
specializes in ERP, connectivity, SEO services, conference management, effective web
promotion and tailor-made software products, designing solutions that best suit clients'
requirements.
We strive to be the front runner in creativity and innovation in software development through
well-researched expertise, and to establish the company as an out-of-the-box software
development company in Bangalore, India. As a software development company, we translate
this expertise into value for our customers through professional solutions.
We understand that the desired output can be achieved only by understanding the client's
demands better. We work with clients and help them define their exact solution
requirements. Clients sometimes find that they have completely redefined their solution or
new application requirement during the brainstorming session, and here we position
ourselves as an IT solutions consulting group comprising high-caliber consultants.
We believe that technology, when used properly, can help any business scale and achieve
new heights of success. It helps improve efficiency, profitability and reliability; to put it in
one sentence, “Technology helps you to delight your customers”, and that is what we want to
achieve.
We are a technology organization providing solutions for web design and development,
researching and publishing papers to ensure the quality of the most used ML models, MySQL,
Python programming, HTML, CSS, ASP.NET and LINQ. To meet ever-increasing
automation requirements, Compsoft Technologies specializes in ERP, connectivity, SEO
services, conference management, effective web promotion and tailor-made software
products, designing solutions that best suit clients' requirements. The organization has
the right mix of professionals as stakeholders to help us serve our clients to the best of
our capability and on par with industry standards. We have young, enthusiastic, passionate
and creative professionals who develop technological innovations in the fields of mobile
technologies, web applications, and business and enterprise solutions. The motto of our
organization is to “Collaborate with our clients to provide them with the best technological
solution, creating a good present and a better future for our clients, which will bring a
cascading positive effect on their business as well”. Providing a complete suite of
technical solutions is not just our tag line; it is our vision for our clients and for us, and we
strive hard to achieve it.
• Python
• Selenium Testing
• Software Training
Introduction to ML:
Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on the
development of algorithms and statistical models that enable computers to learn and make
predictions or decisions without being explicitly programmed. It is a branch of computer science
that has gained significant attention and popularity in recent years due to its ability to solve
complex problems and make data-driven decisions across various domains.
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Semi-Supervised Learning
Deep Learning
Problem Statement :
Machine Learning algorithm for social media sentiment analysis. Sentiment analysis in the realm
of social media is a multidisciplinary field with profound implications for various sectors.
Understanding the sentiments expressed by users in online conversations can inform decision-
making processes, enhance user experiences, and contribute to a more nuanced understanding of
public opinions in the digital age. However, it also necessitates a balanced approach, one that
carefully considers ethical concerns while embracing cutting-edge technologies and
methodologies.
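As an illustration of the problem statement, the following is a minimal rule-based sketch of sentiment classification, one of the simplest approaches mentioned in the abstract. It is not the model used in this internship; the word lists, the negation handling and the function names `tokenize`, `sentiment_score` and `classify` are all assumptions chosen for illustration.

```python
import re

# Hypothetical mini-lexicon; a real system would use a much larger word list.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "sad", "awful"}
NEGATIONS = {"not", "no", "never"}


def tokenize(post):
    """Lowercase a post and extract word tokens (the '#' of a hashtag is dropped)."""
    return re.findall(r"[a-z']+", post.lower())


def sentiment_score(post):
    """Count positive minus negative words, flipping the word after a negation."""
    score = 0
    negate = False
    for token in tokenize(post):
        if token in NEGATIONS:
            negate = True
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        score += -polarity if negate else polarity
        negate = False
    return score


def classify(post):
    """Map the numeric score to a sentiment label."""
    score = sentiment_score(post)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("I love this phone, great camera! #happy"))
print(classify("This is not good"))
```

A lexicon approach like this is easy to inspect but misses slang, sarcasm and emojis, which is why machine learning models are usually preferred for social media text.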
SYSTEM ANALYSIS
1. Existing System:
2. Proposed System:
Intel 4 is the successor to Intel 7 and will be incorporated into the next generation, codenamed
Meteor Lake. The new 14th-generation core processors using the latest Intel 4 process technology
are expected to be available in the second half of this year.
Operating System: Windows
Web Browser: Google Chrome, Opera, etc.
Models: Linear Regression model, CNN model, SVM model
DESIGN:
# Import Dependencies
import yaml
from joblib import dump, load
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Naive Bayes Approach
from sklearn.naive_bayes import MultinomialNB
# Trees Approach
from sklearn.tree import DecisionTreeClassifier
# Ensemble Approach
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
import seaborn as sn
import matplotlib.pyplot as plt
class DiseasePrediction:
    # Initialize and Load the Config File
    def __init__(self, model_name=None):
        # Load Config File
        try:
            with open('./config.yaml', 'r') as f:
                self.config = yaml.safe_load(f)
        except Exception as e:
            print("Error reading Config file...", e)
            # Re-raise: the lines below cannot run without self.config
            raise
        # Verbose
        self.verbose = self.config['verbose']
        # Load Training Data
        self.train_features, self.train_labels, self.train_df = self._load_train_dataset()
        # Load Test Data
        self.test_features, self.test_labels, self.test_df = self._load_test_dataset()
        # Feature Correlation in Training Data
        self._feature_correlation(data_frame=self.train_df, show_fig=False)
        # Model Definition
        self.model_name = model_name
        # Model Save Path
        self.model_save_path = self.config['model_save_path']
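
    # Load Training Data
    def _load_train_dataset(self):
        # NOTE: the lines that read the training file and build df_train,
        # train_features and train_labels are missing from this listing.
        if self.verbose:
            print("Length of Training Data: ", df_train.shape)
            print("Training Features: ", train_features.shape)
            print("Training Labels: ", train_labels.shape)
        return train_features, train_labels, df_train

    # Load Test Data
    def _load_test_dataset(self):
        # NOTE: the lines that read the test file and build df_test,
        # test_features and test_labels are missing from this listing.
        if self.verbose:
            print("Length of Test Data: ", df_test.shape)
            print("Test Features: ", test_features.shape)
            print("Test Labels: ", test_labels.shape)
        return test_features, test_labels, df_test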

    # Features Correlation
    def _feature_correlation(self, data_frame=None, show_fig=False):
        # Get Feature Correlation
        corr = data_frame.corr()
        sn.heatmap(corr, square=True, annot=False, cmap="YlGnBu")
        plt.title("Feature Correlation")
        plt.tight_layout()
        # Save before showing: plt.show() can leave an empty figure behind
        plt.savefig('feature_correlation.png')
        if show_fig:
            plt.show()
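
    # Train/Validation Split
    def _train_val_split(self):
        # NOTE: the lines that perform the split and return its result
        # are missing from this listing.
        if self.verbose:
            print("Number of Training Features: {0}\tNumber of Training Labels: {1}".format(len(X_train), len(y_train)))

    # Model Selection
    # NOTE: select_model(), which train_model() calls, is missing from this listing.

    # ML Model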
    def train_model(self):
        # Get the Data
        X_train, y_train, X_val, y_val = self._train_val_split()
        classifier = self.select_model()
        # Training the Model
        classifier = classifier.fit(X_train, y_train)
        # Trained Model Evaluation on Validation Dataset
        # (score() is computed on the validation split, not the training split)
        confidence = classifier.score(X_val, y_val)
        # Validation Data Prediction
        y_pred = classifier.predict(X_val)
        # Model Validation Accuracy
        accuracy = accuracy_score(y_val, y_pred)
        # Model Confusion Matrix
        conf_mat = confusion_matrix(y_val, y_pred)
        # Model Classification Report
        clf_report = classification_report(y_val, y_pred)
        # Model Cross Validation Score
        score = cross_val_score(classifier, X_val, y_val, cv=3)
        if self.verbose:
            print('\nValidation Score: ', confidence)
            print('\nValidation Prediction: ', y_pred)
            print('\nValidation Accuracy: ', accuracy)
            print('\nValidation Confusion Matrix: \n', conf_mat)
            print('\nCross Validation Score: \n', score)
            print('\nClassification Report: \n', clf_report)

if __name__ == "__main__":
    # Model Currently Training
    current_model_name = 'decision_tree'
    # Instantiate the Class
    dp = DiseasePrediction(model_name=current_model_name)
    # Train the Model
    dp.train_model()
    # Get Model Performance on Test Data
    # (make_prediction() is not included in this listing)
    # test_report avoids shadowing the imported classification_report function
    test_accuracy, test_report = dp.make_prediction(saved_model_name=current_model_name)
    print("Model Test Accuracy: ", test_accuracy)
    print("Test Data Classification Report: \n", test_report)
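The evaluation step in train_model reports an accuracy and a confusion matrix. To make clear what those two numbers mean, here is a minimal plain-Python sketch that computes them by hand. The helper names `accuracy` and `confusion_counts` and the sample labels are hypothetical; the listing above uses the scikit-learn functions instead.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred, labels):
    """Count matrix: rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical validation labels and predictions, for illustration only.
y_val = ["pos", "neg", "pos", "neu", "neg", "pos"]
y_hat = ["pos", "neg", "neg", "neu", "neg", "pos"]
labels = ["neg", "neu", "pos"]

print("Accuracy:", accuracy(y_val, y_hat))
for label, row in zip(labels, confusion_counts(y_val, y_hat, labels)):
    print(label, row)
```

Each off-diagonal entry counts one kind of mistake, so the matrix shows which classes are confused with each other, something a single accuracy figure hides.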
ANALYSIS:
1. Data Collection:
Gathered a comprehensive dataset consisting of social media account
profiles, encompassing both genuine and fake accounts.
Ensured the dataset's diversity to encompass various social media platforms
and types of fake accounts.
2. Data Preprocessing:
Conducted data cleaning to remove duplicates, irrelevant information, and
any inconsistencies in the dataset.
Extracted relevant features from the social media account profiles,
including profile information, activity metrics, and content characteristics.
3. Feature Engineering:
Implemented feature engineering techniques to extract meaningful
information from the raw data.
Utilized domain knowledge to select and engineer features relevant to fake
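The data cleaning and feature extraction steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline used in the internship: the record fields (`username`, `bio`, `followers`, `following`, `posts`), the helper names and the sample records are all hypothetical.

```python
def clean_profiles(profiles):
    """Drop records without a username and remove duplicate usernames."""
    seen = set()
    cleaned = []
    for profile in profiles:
        username = (profile.get("username") or "").strip().lower()
        if not username or username in seen:
            continue
        seen.add(username)
        cleaned.append({**profile, "username": username})
    return cleaned


def extract_features(profile):
    """Turn one profile into numeric features (all field names are hypothetical)."""
    followers = profile.get("followers", 0)
    following = profile.get("following", 0)
    return {
        "has_bio": int(bool(profile.get("bio"))),
        "post_count": profile.get("posts", 0),
        # +1 avoids division by zero for accounts that follow nobody
        "follower_ratio": followers / (following + 1),
    }


# Made-up records: one duplicate, one with a missing username.
raw = [
    {"username": "Alice", "bio": "Photographer", "followers": 200, "following": 99, "posts": 40},
    {"username": "alice ", "bio": "Photographer", "followers": 200, "following": 99, "posts": 40},
    {"username": "", "followers": 5, "following": 900, "posts": 0},
    {"username": "bot_123", "bio": "", "followers": 3, "following": 999, "posts": 1},
]
features = [extract_features(p) for p in clean_profiles(raw)]
print(features)
```

Normalizing the username before checking for duplicates means that records differing only in case or surrounding whitespace are treated as the same account.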
The system can be implemented only after thorough testing is done and it is found to work
according to the specification. Implementation involves careful planning, investigation of the
current system and its constraints on implementation, design of methods to achieve the
changeover and, apart from planning, an evaluation of changeover methods.
Two major tasks of preparing the implementation are education and training of the users and
testing of the system. The more complex the system being implemented, the more involved
will be the system analysis and design effort required just for implementation.
The implementation phase comprises several activities. The required hardware and
software acquisition is carried out. The system may require some software to be developed.
For this, programs are written and tested. The user then changes over to the new, fully tested
system and the old system is discontinued.
TESTING
The testing phase is an important part of software development. Testing the computerized
system is the process of finding errors and missing operations, and also a complete
verification to determine whether the objectives are met and the user requirements
are satisfied. Software testing is carried out in three steps:
1. The first step is unit testing, wherein each module is tested to establish its
correctness and validity, to identify any missing operations and to verify whether
the objectives have been met. Errors are noted down and corrected immediately.
2. Unit testing is an important and major part of the project, because errors are rectified
easily within a particular module and program clarity is increased. In this project the
entire system is divided into several modules that are developed individually, so unit
testing is conducted on the individual modules.
3. The second step is integration testing. Software whose modules show perfect results
when run individually will not necessarily show perfect results when run as a
whole.
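The difference between unit testing and integration testing described above can be sketched with Python's built-in unittest module. The two small modules (`normalize`, `count_hashtags`) and the combined `summarize` pipeline are hypothetical stand-ins, not the project's actual modules.

```python
import unittest


def normalize(text):
    """Module 1 (hypothetical): lowercase text and collapse whitespace."""
    return " ".join(text.lower().split())


def count_hashtags(text):
    """Module 2 (hypothetical): count tokens that start with '#'."""
    return sum(1 for token in text.split() if token.startswith("#") and len(token) > 1)


def summarize(text):
    """Combined pipeline that the integration test exercises."""
    cleaned = normalize(text)
    return {"text": cleaned, "hashtags": count_hashtags(cleaned)}


class UnitTests(unittest.TestCase):
    """Each module is checked on its own."""

    def test_normalize(self):
        self.assertEqual(normalize("  Hello   WORLD "), "hello world")

    def test_count_hashtags(self):
        self.assertEqual(count_hashtags("#a b #c #"), 2)


class IntegrationTest(unittest.TestCase):
    """The modules are checked when run together."""

    def test_modules_work_together(self):
        result = summarize("  Great  DAY #Sunny ")
        self.assertEqual(result, {"text": "great day #sunny", "hashtags": 1})


def run_tests():
    """Run both suites and return True when every test passes."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(UnitTests),
        loader.loadTestsFromTestCase(IntegrationTest),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```

Calling `run_tests()` executes the unit tests first and then the integration test, mirroring the order of the steps in the list.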
Snapshots (figures not included): Fig. 8.1, Fig. 8.2, Fig. 8.4, Fig. 8.6, Fig. 8.8
System security, data security and reliability are the striking features.