SMS Text Classification
Course project
Inspiration for this project comes from a case study on U-Report [1], an SMS interface
developed by UNICEF Uganda to collect information and complaints from locals and
direct them quickly and efficiently to the relevant departments. At the time the service
was developed, internet infrastructure was very poor and most of the population used
basic cellphones rather than smartphones. So essentially, this was a way to give the
public a forum to share information and submit their grievances.
The problem we are trying to solve is SMS text classification. Around 200,000
messages are received on the portal every day. The objective of the program in Uganda
is to understand the data in real time and have issues addressed by the appropriate
departments in UNICEF in a timely manner. Given the high volume and velocity of the
data streams, manual inspection of all messages is no longer sustainable [1].
The problem and solution were floated in the early 2010s (2012-2014), and much
development in text analysis has happened since then. By today's standards, I believe
the method I illustrate below would have had a good chance of solving this problem.
Solving this problem can not only provide immediate relief in relevant areas, but also
help identify the major problems in each geographic location, which can help UNICEF
run campaigns to address those issues.
Data collection
Since this is a text classification problem, we will be collecting the following data:
1. Text data
2. Text location (geo-tags or cell tower information)
Apart from this data, since we will be using supervised learning algorithms in
conjunction with other models, we need a label for each data point we collect. One way
to obtain labels is to use the data that accumulated before the solution was implemented
and label each text with the department it was forwarded to. We could also employ
human labelling, but that depends on the time constraints and budget of the project. The
end goal of the data collection process is a labelled data set, and the most economical
approach (in terms of time and labor) is the first one.
Data preprocessing
Stop word removal
We will also need to preprocess this data before we feed it to our algorithm. A few
things about the text data should be kept in mind. It will largely be unorganized and
will contain words that carry little meaning of their own but make the text
grammatically readable, words like ‘the’, ‘on’, ‘a’, etc. Such words are called stop
words. A statement like ‘The pizza place on the Times-Square is simply the best.’ is
readable, but the context is given by just three of its ten words – ‘pizza’,
‘Times-Square’, and ‘best’. So, we need to remove such stop words from the actual text.
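As a minimal sketch of this step, assuming NLTK's English stop-word list is acceptable for our messages (the tokenizing regex and the example sentence are illustrative choices):

```python
# Sketch: stop-word removal with NLTK's English stop-word list.
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)        # needed once per machine
STOP_WORDS = set(stopwords.words("english"))

def remove_stop_words(text):
    """Lower-case, tokenize on word characters (keeping hyphenated words) and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+(?:-[a-z0-9']+)*", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("The pizza place on the Times-Square is simply the best."))
# Roughly ['pizza', 'place', 'times-square', 'simply', 'best'];
# the exact output depends on the tokenizer and the stop-word list in use.
```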
Stemming and lemmatization
The second step in our preprocessing is called stemming. In this process we attempt
to reduce the words in the text to their root form. For example, in our pizza example,
‘simply’ is reduced to ‘simple’ (or ‘simpl’, depending on how the algorithm is
implemented). Next, we perform lemmatization, where we reduce words, including
superlatives, to their dictionary base form, e.g., ‘best’ becomes ‘good’.
As an additional step we can perform spell checking and abbreviation expansion as well.
These steps are beneficial when we are dealing with a vocabulary that is highly prone to
being linguistically altered, such as when there is a character limit; a common example
is using ‘b4’ instead of ‘before’. SMS messages have character limits, so this would be
a good place to apply these methods as well. There are libraries in Python which help
in performing the preprocessing steps (NLTK and sklearn).
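A sketch of these steps with NLTK is below; the abbreviation dictionary is a hand-made illustration rather than a library feature, and the exact outputs depend on the stemmer and lemmatizer chosen:

```python
# Sketch: stemming, lemmatization and simple abbreviation expansion with NLTK.
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)     # lemmatizer dictionary, needed once
nltk.download("omw-1.4", quiet=True)

ABBREVIATIONS = {"b4": "before", "u": "you", "pls": "please"}   # illustrative, hand-made map

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def normalise(tokens):
    """Expand known abbreviations, then reduce each token to a root form."""
    expanded = [ABBREVIATIONS.get(t, t) for t in tokens]
    stems = [stemmer.stem(t) for t in expanded]        # e.g. 'complaints' -> 'complaint'
    lemmas = [lemmatizer.lemmatize(t) for t in expanded]  # e.g. 'children' -> 'child'
    return stems, lemmas

print(normalise(["b4", "simply", "complaints", "children"]))
```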
Classification
Clustering
Given:
• Text after preprocessing
Use:
• TF-IDF
• K-Means Clustering
To:
• Calculate a score for each word, converting the linguistic data into a numerical
form
• Cluster similar words together (a sketch of both steps follows)
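One possible sketch of this step, assuming scikit-learn and representing each term by its TF-IDF weights across all messages (the transposed document-term matrix); the sample messages and the choice of three clusters are illustrative assumptions:

```python
# Sketch: score terms with TF-IDF and cluster them with K-Means.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

messages = [
    "no clean water in our village",
    "water pump broken for two weeks",
    "children made to work in the fields",
    "child labour at the local market",
]

vectorizer = TfidfVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(messages)            # documents x terms
terms = vectorizer.get_feature_names_out()

kmeans = KMeans(n_clusters=3, random_state=0, n_init=10)  # k is an assumption
word_clusters = kmeans.fit_predict(doc_term.T.toarray())   # terms x documents

for term, cluster in zip(terms, word_clusters):
    print(cluster, term)
```

Clustering word vectors this way is deliberately simple; as the reality check section notes, grouping words by semantics usually needs more careful tuning than plain K-Means.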
Random Forest Classifier
Once our corpus (vocabulary) is clustered, for each text we count the words it contains
from each cluster. We thus have new extracted features that represent how strongly each
cluster is represented in the received text. We use this data to train a random forest
classifier. The labels are provided with our data, so this becomes a supervised learning
approach in which each text is assigned to a category.
Given:
• Word count from each cluster for each text
• Label for each text
Use:
• Random forest Classifier
To:
• Classify each text into the relevant categories required for administration (see the sketch below)
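A sketch of this step, reusing the `terms`, `word_clusters`, `kmeans` and `messages` objects from the clustering sketch above; the labels shown are assumed to come from the departments historical messages were routed to:

```python
# Sketch: turn each message into per-cluster word counts and train a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

term_to_cluster = dict(zip(terms, word_clusters))
n_clusters = kmeans.n_clusters

def cluster_counts(tokens):
    """Count how many tokens of one message fall in each word cluster."""
    counts = np.zeros(n_clusters)
    for t in tokens:
        if t in term_to_cluster:
            counts[term_to_cluster[t]] += 1
    return counts

X = np.array([cluster_counts(m.split()) for m in messages])
y = ["water", "water", "child_labour", "child_labour"]     # assumed department labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X, y)
print(clf.predict(cluster_counts("water is dirty in the village".split()).reshape(1, -1)))
```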
Statistical frequency calculation
With the data classified into categories for administration, we can go one step further:
by grouping the texts by their geographic origin and using the predicted labels, we can
identify which issues are prevalent in each region and help UNICEF launch targeted
campaigns.
Given:
• Labelled text messages from Random Forest Classifier
• Location of origin of each text
Use:
• Statistical frequency calculation
To:
• Identify which issues are impacting any given region the most (see the sketch below)
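A minimal sketch with pandas, where the region names and labels are purely illustrative:

```python
# Sketch: count predicted issue labels per region to find the most pressing issue.
import pandas as pd

df = pd.DataFrame({
    "region": ["Gulu", "Gulu", "Kampala", "Kampala", "Kampala"],
    "label":  ["water", "water", "child_labour", "water", "child_labour"],
})

issue_frequency = df.groupby(["region", "label"]).size().unstack(fill_value=0)
print(issue_frequency)
# The largest count in each row points to the most pressing issue for that region.
print(issue_frequency.idxmax(axis=1))
```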
Clustering is a heuristic approach to the problem: it is quick and efficient but not
guaranteed to be optimal. The use case for the solution is to analyze data in real time and
forward it to the relevant departments, so the solution leans more towards speed than
accuracy. The alternative, as discussed in a later section, is to use transformer models or
other neural networks. These often provide better results but require a lot of
hyperparameter optimization and build time (depending on the number of hidden layers
and neurons).
Why random forest?
The TF-IDF of a term tells us about its relative significance in a document. The intuition is
that a term is more relevant to a document if it occurs often in that document (term frequency)
and rarely in others (inverse document frequency). For example, the term ‘water’ occurs often
in a text containing complaints about water and relatively rarely across all the documents put
together. The term ‘the’, however, will occur at least once in almost every text. Its TF will be
high (but that does not mean the text is about ‘the’; it is about water), while its IDF will be
very low, bringing its overall score down. Words with high TF-IDF scores are essentially the
cluster centers, or topics, we want to form clusters around.
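A hand-worked illustration of that intuition, using the textbook formula tf-idf = tf × log(N/df); the counts are assumed, and scikit-learn's TfidfVectorizer uses a smoothed variant, so its numbers differ slightly:

```python
# Sketch: the 'water' vs 'the' intuition with assumed counts.
import math

N = 1000                        # assumed: total number of messages
tf_water, df_water = 5, 50      # 'water' appears often here, rarely elsewhere
tf_the,   df_the   = 8, 1000    # 'the' appears in essentially every message

tfidf_water = tf_water * math.log(N / df_water)   # 5 * log(20) ~ 15.0
tfidf_the   = tf_the   * math.log(N / df_the)     # 8 * log(1)  =  0.0
print(tfidf_water, tfidf_the)
```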
Reality check
Clustering words with similar semantics might require manual intervention and tuning; i.e.,
using K-Means as a linguistic clustering mechanism is not going to be the most straightforward
approach. There are several preprocessing steps which require careful tweaking of algorithms
to suit our use case. A more feasible approach would be to use off-the-shelf algorithms from
some of the latest libraries and linguistic research. Some of these algorithms employ a
modified K-Means under the hood to meet the same goal.
Even with a proper clustering algorithm in place, we might only reach a certain level of
performance unless we deploy something more sophisticated. Since most clustering methods
are heuristics, they are not guaranteed to give the best results. Anything in the 70-80% range
in such cases can be considered peak performance (and that is highly optimistic). There are,
however, techniques that have proven much more effective (discussed below).
There may also be other factors involved, such as non-English linguistic patterns and words.
We are not using a language-based model, and spelling mistakes might throw model training off.
Alternate approaches
Neural networks have proven very effective in recent research on linguistic patterns and
natural language processing, because they learn to identify patterns in text that are hard to
define with explicit logic. Architectures like Long Short-Term Memory (LSTM) networks have
been the industry standard for quite some time, and the addition of attention mechanisms has
widened their application in NLP.
Built on attention mechanisms, transformer models have in recent years (2017 onwards)
emerged as the de facto models for solving linguistic problems. These models, however, are
trained on a huge corpus and thus have large memory and computation requirements. A smaller
variant fine-tuned for a specific task can be deployed instead, but that requires a deeper
understanding of the original model, and the results are also harder to explain to
stakeholders, as is the case for the majority of neural network models.
Support Vector Classifiers (SVCs) are a good alternative to the random forest classifier
proposed earlier, and a soft-margin classifier would be a sensible choice given the possible
overlap between certain categories of text. The reason to prefer random forest over SVC is
simply the ease of optimization. A random forest is not just one tree but a collection of trees
whose results are aggregated, which helps avoid overfitting. An SVC, on the other hand, gives a
single classifier whose results are easier to explain, but optimization and prevention of
overfitting are an added step. There is no clear winner between the two for the current
scenario; it comes down to preference (there may be performance differences as well, but we
will not know unless both are tested), so it would be better to test both approaches before
deployment, as in the sketch below.
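A sketch of such a comparison, assuming `X` and `y` are the full labelled cluster-count feature matrix and labels built earlier; the linear kernel and cross-validation setting are arbitrary choices:

```python
# Sketch: compare the random forest and an SVC on the same features via cross-validation.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

candidates = [
    ("random forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svc", SVC(kernel="linear", C=1.0)),
]

for name, model in candidates:
    scores = cross_val_score(model, X, y, cv=3)   # cv=3 is arbitrary
    print(name, scores.mean())
```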
Summary
Data Preprocessing
SMS classification
1. We cluster the words which seem similar. Each cluster corresponds to a different
topic such as water, child labor, crime, food, etc.
2. For each text we count how many words from each cluster it contains.
3. We use the extracted features to train a random forest classifier. The features
are the per-cluster word counts identified in step 2.
4. We then group the outcomes by geographic location to identify which
campaigns to run.
Works Cited