term paper of nlp

This is the term paper for the final-term NLP project.

WAllytics - WhatsApp Chat Analysis

Soni Singh
Department of Lovely Professional University, Jalandhar-Delhi G.T. Road, Phagwara 144411, Punjab, India

Deeptimaan Krishna Jadaun (12213381)
Lovely Professional University, Phagwara

Savtanter Yadav (12205713)
Lovely Professional University, Phagwara

Abstract: With the growing importance of digital
communication, understanding and analyzing chat data has
become pivotal. This paper introduces WAllytics, an
innovative app designed to analyze WhatsApp chat data,
offering insights through various analytical dimensions. By
combining exploratory data analysis (EDA), sentiment
analysis, topic modeling, emoji usage, and forecasting
techniques, WAllytics provides a multifaceted approach to
understanding user interactions. This research explores the
methodology, features, implementation, and potential
applications of the app, underscoring its significance in
modern communication analysis.

Keywords: Machine Learning, WAllytics - WhatsApp Chat Analysis App

Introduction
In today’s interconnected world, instant messaging platforms play a pivotal role in how we communicate. Among these, WhatsApp stands out as one of the most widely used communication tools, catering to both personal and professional interactions. With billions of users and vast volumes of messages exchanged daily, WhatsApp generates an immense amount of data that, if harnessed effectively, can yield valuable insights.

Recognizing the potential within these vast troves of data, WAllytics was developed as a solution tailored to meet the needs of individuals and businesses seeking deeper understanding and actionable intelligence from their chat histories. WAllytics provides an advanced toolkit designed for comprehensive chat data analysis, allowing users to move beyond mere text exchanges and delve into patterns, trends, and metrics that drive better decision-making and foster meaningful engagements.

Whether for personal interest, customer service enhancement, or strategic business initiatives, WAllytics equips users with the tools needed to transform chat data into an insightful asset. Through a combination of user-friendly design and powerful analytical capabilities, WAllytics opens up new possibilities for understanding and leveraging WhatsApp communication to its fullest potential.

WAllytics is designed to be versatile and adaptable, meeting the varying needs of different user groups. For individual users, the platform can provide a unique perspective on personal communication habits, helping to identify trends such as peak times for conversation or commonly discussed topics with friends and family. For businesses, it offers a strategic edge by revealing customer preferences.

Fig. 1 About the app

Moreover, the platform places a strong emphasis on data security and user privacy. WAllytics ensures that all data processing is conducted with robust encryption standards and user consent, making it a reliable tool in an era where data privacy is paramount. This commitment to security builds trust and allows users to explore analytics with confidence, knowing their information is safeguarded.

I. LITERATURE REVIEW

WhatsApp, with over 2 billion active users globally, has become a major platform for personal and professional communication, generating vast amounts of chat data daily. This data presents both opportunities and challenges for analysis due to its unstructured nature and privacy considerations. The analysis of WhatsApp chat data has been the subject of growing research, focusing on extracting meaningful insights regarding communication patterns, sentiment, and trends. However, few studies have developed comprehensive tools for in-depth analysis of WhatsApp chat data, particularly in areas such as sentiment analysis, topic modeling, and forecasting trends.

Exploratory Data Analysis (EDA) is a crucial step in understanding large datasets, and it has been widely applied to social media data to examine user activity and communication behavior. Researchers have used tools like Pandas, Matplotlib, and Seaborn for visualizing data, analyzing message frequency, and understanding patterns in communication. In WhatsApp, EDA can help uncover trends such as peak message periods and the distribution of messages among users. Studies on platforms like Facebook Messenger have shown that user engagement varies with time, and similar trends can be observed in WhatsApp chats, providing insights into when users are most active.

Topic modeling techniques, particularly Latent Dirichlet Allocation (LDA), have been used extensively to uncover the underlying themes in large collections of text data. LDA assumes that each message in a conversation is a mixture of several topics and can help identify the most discussed themes. This technique has been applied to social media conversations to uncover trends and topics of interest. In the context of WhatsApp, topic modeling can reveal the central themes of group chats or individual conversations, such as "work," "family," or "social events." By using libraries like Gensim, topic modeling can be applied to WhatsApp data to provide insights into the key subjects of communication over time.

Another area of growing interest is emoji analysis, which plays a significant role in conveying sentiment and emotional tone in text-based communication. Emojis have become a central element of digital communication, especially on platforms like WhatsApp. Research on emoji usage has demonstrated that they help to express emotions and intentions that may not be easily conveyed through text alone. Tools like the emoji library and WordCloud have been employed to visualize patterns in emoji usage and word frequency, allowing users to gain insights into the emotional content of their chats. Similar analyses have been applied to other messaging platforms, such as Facebook and Twitter, to understand how emojis are used to enhance communication.

Alerts and notifications based on keywords, sentiment shifts, or specific topics are emerging features in messaging app analysis. Although much of the research has focused on automatic alert systems for emails or social media, integrating real-time alerts into WhatsApp chat analysis can offer users actionable insights. For example, setting up alerts for particular keywords or sentiment shifts can help users stay informed about important discussions or monitor emotional well-being. The ability to track specific keywords or sentiments in real-time can significantly enhance the user experience and provide valuable information about ongoing conversations.

II. MATERIALS AND METHODS

The WAllytics app is designed to analyze and extract valuable insights from WhatsApp chat data, leveraging various data science and machine learning techniques. This section outlines the materials used to develop the app and the methods employed for analyzing WhatsApp data. The approach combines the power of data manipulation, natural language processing (NLP), and machine learning algorithms to provide users with actionable insights from their chat data.

Fig. 2 Upload WhatsApp Export Chat

A. Data Collection

For the purpose of this study, WhatsApp chat data is collected through exported chat files, which can be obtained directly from the WhatsApp application. WhatsApp allows users to export their individual or group chat data in a .txt format, including all messages, timestamps, sender names, and emojis. This exported file serves as the primary data source for analysis. The chat data is imported into the app using Python’s file handling capabilities.

Given the structure of WhatsApp export files, the data is cleaned and pre-processed to extract relevant information, such as timestamps, message contents, sender names, and emojis. These elements are structured in a DataFrame for further analysis.

B. Software and Libraries

The WAllytics app was developed using the following software tools and libraries:
• Streamlit: A Python framework for building interactive web applications, used to provide a user-friendly interface for users to upload their WhatsApp chat data and explore the results of the analysis.
• Pandas: A data manipulation library used for cleaning and structuring the raw chat data into a usable format for analysis.
• Matplotlib and Seaborn: Libraries used for data visualization, enabling the generation of graphs and charts to explore the message frequency, sentiment trends, and emoji usage in the chat data.
• NLTK and TextBlob: Natural language processing libraries employed for text preprocessing and sentiment analysis. NLTK is used for tokenization and stop-word removal, while TextBlob is utilized for sentiment classification of messages.
• Gensim: A library for topic modeling and unsupervised machine learning, specifically Latent Dirichlet Allocation (LDA), which identifies the main topics discussed in the chat.
• WordCloud: A tool to create word clouds, allowing for the visualization of the most frequently used words in the chat messages.
• Scikit-learn: A library used for building machine learning models for forecasting and clustering tasks, including message frequency prediction.
• Transformers: A library for advanced machine learning models such as BERT and GPT for more complex tasks, including sentiment analysis and keyword extraction.

Fig. 3 All the DataFrame

Fig. 4 EDA

C. Data Preprocessing

The raw WhatsApp chat data in .txt format is first loaded into the application using Pandas. A series of data cleaning steps is then applied to prepare the data for analysis. The steps are defined as follows:
• Text Extraction: The chat messages are parsed to extract the message content, sender names, and timestamps. Regular expressions (regex) are used to identify message structures such as sender names, dates, and times.
• Handling Missing Data: Messages with missing information or formatting errors are removed or corrected during the cleaning phase.
• Time Conversion: Timestamps are converted into datetime objects for easier manipulation and analysis, enabling the examination of message patterns over time.
• Emojis: Emojis used in the chat are extracted using the emoji library, which decodes the emoji characters and counts their usage.
• Tokenization: The text data is tokenized using NLTK to split the messages into individual words or tokens, removing stopwords, punctuation, and non-relevant characters.

IV EXPERIMENTAL RESULTS

The WAllytics app was evaluated using a diverse set of WhatsApp chat datasets to assess its ability to extract meaningful insights through various analytical techniques. The datasets consisted of both individual and group chats, ranging from personal to work-related conversations, with a message count ranging from 500 to 10,000 messages. The chats were exported in .txt format and preprocessed to clean and structure the data for analysis.

Fig. 5 Message per hours

The initial phase of the analysis involved Exploratory Data Analysis (EDA), which provided an overview of the messaging patterns within the chat data. One of the key findings from the message frequency analysis was the identification of time-based patterns in user activity. A time series plot revealed that message activity peaked during specific hours, such as between 9 AM and 11 AM and between 8 PM and 10 PM, indicating typical active periods of communication. Additionally, the heatmap analysis of message frequency by the day of the week demonstrated increased messaging activity during weekends, which was particularly evident in group chats. The distribution of messages among participants also highlighted the central figures in the conversations, with the most active participant contributing significantly more messages compared to others, especially in group chats.

Sentiment analysis conducted on the chat data showed varying emotional tones across different datasets. By applying sentiment analysis tools such as TextBlob and VADER, the messages were classified into positive, neutral, or negative categories. The results revealed that a significant portion of messages (around 60%) were neutral, while positive and negative sentiments accounted for 30% and 10%, respectively. The analysis also indicated fluctuations in sentiment over time, with certain time periods exhibiting spikes in negative or positive emotions. For example, negative sentiment increased during late evening hours, possibly due to frustrations expressed in work-related group chats, whereas positive sentiment surged during discussions of social events or celebrations. Furthermore, keywords associated with specific sentiments, such as "happy" for positive sentiment and "work" for negative sentiment, were identified, providing further insights into emotional shifts linked to the topics being discussed.
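
The three-way classification described for the sentiment analysis can be illustrated with a minimal sketch. It assumes each message already has a polarity score in the range -1 to 1 (the kind of value TextBlob or VADER returns); the cut-off of 0.05, the helper names, and the sample scores are assumptions made for illustration, not values or code taken from the app.

```python
from collections import Counter


def label_sentiment(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def sentiment_shares(polarities, threshold=0.05):
    """Return the percentage of messages that fall under each label."""
    labels = [label_sentiment(p, threshold) for p in polarities]
    counts = Counter(labels)
    total = len(labels)
    return {
        label: (100.0 * counts[label] / total if total else 0.0)
        for label in ("positive", "neutral", "negative")
    }


# Hypothetical polarity scores for eight messages (illustration only).
scores = [0.0, 0.6, 0.0, -0.4, 0.02, 0.3, 0.0, 0.8]
shares = sentiment_shares(scores)
print(shares)
```

A wider threshold moves borderline messages into the neutral class, so the reported shares depend on this choice as much as on the scoring tool.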
Topic modeling using Latent Dirichlet Allocation
(LDA) revealed the dominant themes in the chat data. In
work-related conversations, the most common topics
revolved around project deadlines, team meetings, and work-
related collaboration. In contrast, family-oriented
conversations focused on topics such as vacations, health,
and daily family activities. The analysis showed that certain
topics gained prominence over time, with work-related
discussions peaking on weekdays, while social or family
topics were more frequent on weekends. Visualizations of
the topic distributions over time provided a clear
understanding of the evolving themes within the
conversations.

Fig. 6 Emoji and Word Analysis

Additionally, the emoji analysis revealed interesting patterns in emotional expression. Emojis such as "😂" (face with tears of joy), "😊" (smiling face), and "❤️" (red heart) were frequently used across various datasets, with certain emojis being strongly correlated with positive or negative sentiments. For instance, heart emojis were often linked to positive emotions, while sad or crying emojis were associated with negative sentiments. The word cloud analysis also provided valuable insights into the most frequently used words in the conversations. In work-related chats, words like "project," "meeting," and "deadline" were dominant, while in personal conversations, words related to social events and family activities appeared more often. Such patterns provide meaningful context for understanding emotional shifts within chat conversations.

Fig. 7 Deaths Cases Prediction using ARIMA Model

V RESULTS ANALYSIS

The analysis conducted using the WAllytics app provided a comprehensive understanding of WhatsApp chat data, showcasing the app's ability to extract valuable insights through a range of analytical tools. The Exploratory Data Analysis (EDA) results revealed significant patterns in user activity and engagement. Message frequency analysis showed distinct peaks during specific hours, particularly between 9 AM and 11 AM and between 8 PM and 10 PM, which are active periods tied to work and social interactions. Group chat data indicated that certain participants were more dominant, contributing a larger share of messages and acting as key communicators. This insight can be useful for team management and identifying influential members in collaborative groups. Additionally, the time-based heatmap highlighted that weekends generally had higher message activity, aligning with expectations as people often have more time to communicate during these days.
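
The message-frequency findings (peak hours, weekday distribution, per-participant share) rest on parsing the exported .txt file and counting by time and sender. The sketch below shows one way this could be done with the standard library alone; the paper's app uses Pandas, so this is a stand-in rather than the app's code. The line layout in the comment is an assumption (export formats differ by phone locale and WhatsApp version), and `parse_chat`, `activity_summary`, and the sample lines are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line layout (varies by phone locale and WhatsApp version):
#   "12/31/21, 9:15 PM - Alice: Happy new year!"
STAMP_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2}), (\d{1,2}:\d{2})\s?([AP]M) - (.*)$"
)
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def parse_chat(lines):
    """Turn raw export lines into a list of message records."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = STAMP_RE.match(line)
        if match:
            date_part, clock, meridiem, rest = match.groups()
            sender, sep, text = rest.partition(": ")
            if not sep:
                continue  # system notice without a sender, e.g. "Alice added Carol"
            stamp = datetime.strptime(
                f"{date_part} {clock} {meridiem}", "%m/%d/%y %I:%M %p"
            )
            records.append({"time": stamp, "sender": sender, "text": text})
        elif records:
            # No timestamp: treat the line as a continuation of the last message.
            records[-1]["text"] += "\n" + line
    return records


def activity_summary(records):
    """Count messages per hour of day, per weekday, and per sender."""
    by_hour = Counter(r["time"].hour for r in records)
    by_weekday = Counter(WEEKDAYS[r["time"].weekday()] for r in records)
    by_sender = Counter(r["sender"] for r in records)
    return by_hour, by_weekday, by_sender


# Hypothetical export lines (illustration only).
sample = [
    "12/31/21, 9:15 PM - Alice: Happy new year!",
    "12/31/21, 9:16 PM - Bob: Same to you",
    "and to the family",
    "1/1/22, 10:05 AM - Alice: Lunch today?",
    "1/1/22, 10:06 AM - Alice added Carol",
]
records = parse_chat(sample)
by_hour, by_weekday, by_sender = activity_summary(records)
print(by_hour.most_common(1), by_weekday.most_common(1), by_sender.most_common(1))
```

The hour and weekday counters are the raw material for a time series plot or a day-of-week heatmap, and the sender counter gives each participant's share of the conversation.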
Fig. 8 Topic Modeling

Topic analysis using Latent Dirichlet Allocation
(LDA) revealed the app's capability to uncover common
themes in the data. Work-related topics included project
deadlines and meetings, while personal and family
discussions focused on daily activities, health, and social
plans. The distribution of these topics varied by day, with
work discussions peaking on weekdays and personal topics
becoming more prevalent during weekends. This shift
demonstrated how conversation themes change based on the
day and context, providing deeper behavioral insights.
Sentiment analysis offered further insight into the emotional tone of conversations. The data showed that approximately 60% of messages were neutral, indicating that most communication was factual or without strong emotional cues. Positive sentiment made up about 30% of messages, reflecting moments of happiness and supportive interaction, while negative sentiment accounted for 10%. Notably, spikes in negative sentiment during late evenings or around specific dates pointed to moments of stress or disagreement, which could be linked to work or interpersonal conflicts. Such patterns provide meaningful context for understanding emotional shifts within chat conversations.

Fig. 10 Message Frequency

The analysis of emojis added another layer of understanding to communication patterns. Frequently used emojis, such as “😂” (face with tears of joy), “☺️” (smiling face), and “❤️” (red heart), were associated with positive emotions. Conversely, negative emotions were linked to emojis like “😢” (crying face). This usage pattern reflected how emojis contribute to expressing emotions that may not be explicitly stated in the text. Overall, the results confirmed that the WAllytics app is an effective tool for comprehensively analyzing WhatsApp chats, helping users uncover trends, emotional nuances, and behavioral patterns in their communication.
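
Counting emoji usage, as in the emoji analysis above, can be sketched as follows. The paper names the emoji library for this step; the version here is a stand-in that uses only the standard library and a rough code-point range, so it will miss some emojis and counts skin-tone modifiers and joined sequences as separate symbols. `EMOJI_RE`, `count_emojis`, and the sample messages are hypothetical.

```python
import re
from collections import Counter

# Rough matcher (an approximation, not full emoji coverage): one symbol from
# two common Unicode ranges, optionally followed by the variation selector
# U+FE0F that some emojis such as the red heart carry.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]\uFE0F?")


def count_emojis(messages):
    """Count how often each emoji symbol appears across all messages."""
    counts = Counter()
    for text in messages:
        for match in EMOJI_RE.findall(text):
            # Drop the variation selector so both spellings share one key.
            counts[match.rstrip("\uFE0F")] += 1
    return counts


# Hypothetical messages (illustration only), written with escapes:
# U+1F602 face with tears of joy, U+2764 red heart, U+1F622 crying face.
messages = [
    "Great news \U0001F602\U0001F602",
    "Love it \u2764\uFE0F",
    "Miss you \U0001F622",
    "ok",
]
counts = count_emojis(messages)
print(counts.most_common(3))
```

Pairing these counts with a per-message sentiment label would show which emojis tend to appear in positive or negative messages.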

Fig. 9 Sentiment Analysis

Fig. 11 Mood meter

The mood meter feature within the WAllytics app provided a nuanced view of emotional trends in WhatsApp conversations by combining sentiment analysis results with visual representations. This tool allowed users to observe fluctuations in mood over time, mapping out how conversations shifted between positive, neutral, and negative sentiments throughout the day or across specific timeframes. By visualizing emotional peaks and troughs, the mood meter identified periods of high positive interactions, such as celebrations or supportive exchanges, as well as negative sentiment spikes that may indicate conflicts, stress, or frustration. Such insights are invaluable for understanding the overall emotional climate of a chat and can aid in managing group dynamics, enhancing personal communication strategies, and fostering more positive interactions.
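
One simple way to obtain the kind of mood curve described for the mood meter is to average sentiment scores per time bucket. The sketch below groups by hour of day and assumes each message already carries a polarity score between -1 and 1; the paper does not describe the app's actual aggregation, so `mood_by_hour` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def mood_by_hour(scored_messages):
    """Average polarity per hour of day from (datetime, polarity) pairs."""
    buckets = defaultdict(list)
    for stamp, polarity in scored_messages:
        buckets[stamp.hour].append(polarity)
    return {
        hour: sum(values) / len(values)
        for hour, values in sorted(buckets.items())
    }


# Hypothetical scored messages (illustration only).
scored = [
    (datetime(2024, 1, 5, 9, 10), 0.5),
    (datetime(2024, 1, 5, 9, 40), 0.3),
    (datetime(2024, 1, 5, 22, 5), -0.6),
    (datetime(2024, 1, 5, 22, 30), -0.2),
    (datetime(2024, 1, 6, 22, 15), 0.2),
]
mood = mood_by_hour(scored)
peak = max(mood, key=mood.get)      # hour with the most positive average
trough = min(mood, key=mood.get)    # hour with the most negative average
print(mood, peak, trough)
```

Replacing the hour with the calendar date as the bucket key would give the same view across days instead of within a day.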

VI CONCLUSION

The results of this research were derived from training data
up to and including Jan 2022, to Jul 2021. Additionally,
based on the current trend, there will undoubtedly be an
increase in the number of instances. According to
established medical standards, health professionals, and
others included in contributing critical services must be
guarded. The number of cases may rise exponentially as a
result of future community spreading brought on by
negligence on the part of both individuals and groups. Since
the peak has not yet arrived, the Indian government must
exercise increased caution and strictly enforce its
regulations. Additionally, there must be a vigorous increase
in the availability of medical facilities throughout the
nation. For data that is collected on a weekly or biweekly
basis, an instinctive system can be created in the future to
retrieve data often and forecast the cases. Government
agencies and medical facilities may keep an eye on demand
and the level of care and isolation needed for new patients
in this way. Data scientists from other regions can use this
study to compare the performance of different ML models
on the Indian dataset. Administrators and healthcare
professionals can use this study to evaluate the condition in
the coming future.