
Case Study On Text Mining

This document discusses text mining techniques and their applications. It begins with an introduction to text mining, explaining that it is used to extract meaningful patterns and knowledge from large amounts of unstructured text data. It then describes several common text mining techniques like information extraction, categorization, clustering, and summarization. The document also discusses some of the key differences between traditional data mining and text mining. Finally, it provides more detailed explanations of several text mining techniques like information extraction, information retrieval, categorization, clustering, and summarization.

Uploaded by

Shanthi Ganesan


CASE STUDY ON TEXT MINING: APPLICATION, ISSUES AND CHALLENGES

ABSTRACT:
Data mining is the process of retrieving meaningful information from large volumes of
data. The data may take the form of text, audio, video or images, and obtaining information
from them is not an easy task. Different techniques are required to extract the information.
Text mining is one of them.

Text mining is the process of extracting information, patterns or knowledge from text
documents available from different sources. Every day, millions of bytes of data are added
to existing data. Most of this data is stored in text documents, which are unstructured and
cannot be processed directly to extract useful information. Techniques such as
classification, clustering and information extraction are therefore applied for this purpose.
A number of text categorization techniques have been developed. Some of them arrange
documents in a supervised manner and others in an unsupervised manner. This paper focuses
on text mining, different text mining techniques and their applications.

INTRODUCTION

The volume of data in the computer world grows at an exponential rate. Every day,
millions of megabytes of data are added to existing data. Almost all institutions,
organizations and commercial industries store their data in electronic form. A substantial
amount of text circulates on the Internet in the form of digital libraries, repositories
and other textual information such as blogs, social networks and e-mails. Determining the
trends and patterns needed to extract valuable knowledge from this large volume of data is
a very difficult task.

Traditional data mining tools are poorly suited to text data, because extracting
information from it takes considerable time and effort. Text mining is a process for
extracting meaningful and interesting patterns from textual data sources in order to explore
knowledge. It is a multidisciplinary field that draws on data mining, information retrieval,
machine learning, statistics and computational linguistics. Different text mining techniques
such as summarization, classification and clustering can be applied to extract knowledge.
Text mining processes natural language text that is stored in semi-structured and
unstructured formats. Text mining techniques are continuously applied in industry, academia,
web applications, the Internet and other fields. Application areas such as search engines,
e-mail filtering, product suggestion analysis, fraud detection and social media analysis use
text mining for opinion mining, feature extraction, sentiment analysis, predictive analysis
and trend analysis.
The generic text mining process performs the following steps:
❖ Unstructured data is collected from different sources and is available in different file
formats such as plain text, web pages and PDF files.
❖ Pre-processing and cleaning operations are performed to detect and remove anomalies.
Cleaning makes sure that the real essence of the available text is captured. It removes
stop words, applies stemming (the process of identifying the root of certain words) and
indexes the data.
❖ Processing and controlling operations are applied to audit and further clean the data set
by automatic processing.
❖ Pattern analysis is implemented by a Management Information System (MIS).
❖ The information processed in the above steps is used to extract valuable and relevant
information for effective and timely decision making and trend analysis.
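
The cleaning and indexing steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used in any particular tool: the stop-word list, the suffix rules in `stem`, and the function names `preprocess` and `build_index` are all assumptions introduced here.

```python
import re
from collections import defaultdict

# Illustrative stop-word list and suffix rules (assumptions, not from the paper).
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in", "are", "for"}
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Very crude stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(documents):
    """Map each stemmed term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    "d1": "Text mining extracts patterns from documents.",
    "d2": "Clustering groups the documents for mining.",
}
index = build_index(docs)
print(index["min"])       # ids of documents containing "mining"
print(index["document"])  # ids of documents containing "documents"
```

The crude stemmer maps "mining" to "min", which shows why production systems usually rely on an established stemming or lemmatization algorithm instead of hand-written suffix rules.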

Extracting information from different documents is a tedious and difficult task. Selecting a
suitable text mining technique reduces the time needed to find the relevant patterns for analysis and
decision making. The main goal of this paper is to analyze different text mining techniques that help
to perform text analytics effectively and efficiently on large amounts of data. The issues that arise
during the text mining process are also identified.

DIFFERENCE BETWEEN DATA MINING AND TEXT MINING:

Overview
  Data mining: A range of functions to search for patterns and relationships in
  structured data.
  Text mining: A range of functions to turn unstructured textual data into structured
  information to enable data analysis.

Data type
  Data mining: Structured data from large datasets found in systems such as databases,
  spreadsheets, ERP, CRM and accounting applications.
  Text mining: Unstructured textual data found in emails, documents, presentations,
  videos, file shares, social media and the Internet.

Data retrieval
  Data mining: Structured data is homogeneous and organized, making it easy to retrieve.
  Text mining: Unstructured textual data comes in many different formats and content
  types, located in a more diverse range of applications and systems.

Data preparation
  Data mining: Structured data is formal and formatted, facilitating the process of
  ingesting data into analytical models.
  Text mining: Linguistic and statistical techniques, including NLP keywording and
  metatagging, must be applied to turn unstructured data into usable structured data.

Need for taxonomy
  Data mining: There is no need to create an overriding taxonomy for data mining.
  Text mining: As unstructured text comes in many different forms and formats, there
  needs to be an overriding taxonomy for the data so that it can be organized into a
  common framework.

TEXT MINING TECHNIQUES:

Natural language processing provides the technologies that teach machines how to
analyze, understand and generate text. Technologies such as information extraction,
summarization, categorization, clustering and information visualization are used in the text
mining process.
1. Information Extraction

Information extraction refers to the process of extracting meaningful information
from vast chunks of textual data.
This text mining technique focuses on identifying and extracting entities,
attributes and their relationships from semi-structured or unstructured texts.
Whatever information is extracted is then stored in a database for future
access and retrieval.
The efficacy and relevancy of the outcomes are checked and evaluated using
the precision and recall measures.
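
As a rough illustration of entity extraction and its evaluation, the sketch below pulls e-mail addresses and ISO-style dates out of a sentence with regular expressions and scores the result against a hand-labelled set. The patterns, the sample sentence, the gold labels and the function names are assumptions made for this example. Real extractors use far richer rules or trained models.

```python
import re

# Illustrative patterns only (assumptions for this sketch).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_entities(text):
    """Return a set of (entity_type, value) pairs found in the text."""
    found = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.add((label, match.group()))
    return found


def precision_recall(predicted, gold):
    """Precision = correct / predicted, recall = correct / gold."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


text = "Contact asha@example.com before 2021-03-15 or call the office."
# Hypothetical gold annotations, including one entity the patterns cannot find.
gold = {("EMAIL", "asha@example.com"), ("DATE", "2021-03-15"), ("ORG", "the office")}
predicted = extract_entities(text)
print(precision_recall(predicted, gold))
```

Here every extracted entity is correct, so precision is perfect, while recall is lower because the organization mention is missed. The two measures capture different kinds of error.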

2. Information Retrieval
Information Retrieval (IR) refers to the process of extracting relevant and associated
patterns based on a specific set of words or phrases. IR systems make use of different
algorithms to track and monitor user behavior and to discover relevant data accordingly.
The Google and Yahoo search engines are two well-known IR systems.
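
A common way to rank documents against a set of query words is TF-IDF weighting. The sketch below is a simplified illustration of that idea and makes no claim about how any real search engine works. The function names, the `+ 1.0` smoothing term and the sample documents are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, documents):
    """Rank documents by the summed TF-IDF weight of the query terms."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Add 1 so that terms found in every document still contribute.
                idf = math.log(n_docs / df[term]) + 1.0
                score += (tf[term] / len(tokens)) * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


docs = {
    "d1": "text mining finds patterns in text",
    "d2": "data mining works on structured data",
    "d3": "search engines retrieve relevant pages",
}
print(rank("text mining", docs))
```

Documents that share no term with the query are left out of the ranking entirely.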

3. Categorization
Categorization is a form of “supervised” learning in which natural language texts are assigned
to a predefined set of topics depending on their content. Categorization relies on NLP (natural
language processing) to gather text documents, then process and analyze them to uncover the right
indexes for each document. Co-reference resolution is commonly used as a part of NLP to extract
relevant synonyms and abbreviations from textual data. Today, categorization has become an
automated process used in a host of contexts, ranging from personalized advertisement delivery
to spam filtering and categorizing web pages under hierarchical definitions.
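
To make the supervised setting concrete, here is a minimal multinomial Naive Bayes categorizer. Naive Bayes is chosen for this sketch because it is short. The text above does not prescribe a particular classifier. The class name, the tiny training set and the labels are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocabulary:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labelled examples (hypothetical data).
train_texts = [
    "win a free prize now",
    "free offer click now",
    "meeting agenda for monday",
    "project report attached for review",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesCategorizer().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))
print(model.predict("agenda for the project meeting"))
```

With only four training texts the model is a toy, but the structure is the same at scale: predefined categories, labelled examples and a scoring rule learned from them.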

4. Clustering
Clustering is one of the most critical text mining techniques. It seeks to identify intrinsic
structures in textual information and organize them into relevant subgroups or ‘clusters’ for
further analysis. A significant challenge in the clustering process is to form meaningful
clusters from the unlabeled textual data without having any prior information on them.
Cluster analysis is a standard text mining tool that assists in data distribution or acts as a pre-
processing step for other text mining algorithms running on detected clusters.
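
The sketch below groups unlabeled texts with a simple single-pass "leader" clustering over bag-of-words vectors and cosine similarity. This method is swapped in here because it is short and deterministic. It is not a technique named in the text, and the threshold value and sample headlines are assumptions.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(texts, threshold=0.3):
    """Assign each text to the first cluster whose leader is similar enough,
    otherwise start a new cluster with this text as its leader."""
    leaders, clusters = [], []
    for text in texts:
        vec = vectorize(text)
        for i, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[i].append(text)
                break
        else:
            leaders.append(vec)
            clusters.append([text])
    return clusters


headlines = [
    "stock market prices fall",
    "football team wins match",
    "market prices rise again",
    "team loses football final",
]
for group in leader_cluster(headlines):
    print(group)
```

No labels are supplied anywhere: the two groups emerge only from shared vocabulary, which is the defining property of clustering as opposed to categorization.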

5. Summarisation
Text summarisation refers to the process of automatically generating a compressed
version of a specific text that holds valuable information for the end-user. The aim of this
technique is to browse through multiple text sources and craft summaries that contain a
considerable proportion of the information in a concise format, keeping the overall meaning
and intent of the original documents essentially the same. Text summarisation integrates and
combines the various methods used in text categorization, such as decision trees, neural
networks, regression models and swarm intelligence.
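
One of the simplest approaches is extractive summarisation by word frequency: sentences whose words occur often in the whole text are assumed to be the most informative. The sketch below illustrates only that heuristic, not the model-based methods listed above. The stop-word list, the function name `summarize` and the sample text are assumptions.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "from", "that", "are"}


def content_words(text):
    """Lowercased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


article = (
    "Text mining extracts knowledge from text. "
    "The weather was pleasant yesterday. "
    "Mining text needs clean text data."
)
print(summarize(article))
```

The off-topic sentence about the weather scores lowest because none of its words recur elsewhere in the text.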

6. Natural Language Processing

Natural Language Processing (NLP) addresses a challenging problem in the field of text
mining and uses concepts from artificial intelligence to tackle it. NLP is the study of human
language with the aim of letting computers understand natural languages in a way similar to
humans. With NLP, a computer system can resolve text mining ambiguities such as homonymy,
polysemy, synonymy and hyponymy. NLP also recognizes similar concepts, even if they have been
expressed in very different ways. For example, the same word may be spelt differently
(hemophilia/haemophilia, tumor/tumour), the same word may be realized differently in
different contexts (tumor/tumors, suffers/suffered), and the same concept may be expressed by
entirely different words (Tylenol / Acetaminophen, heart attack / myocardial infarction).
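
The three kinds of variation in the example above can be illustrated with simple lookup tables that map each variant to one canonical form. The tables below contain only the pairs mentioned in the text and are hand-written for illustration. Real NLP systems rely on lexicons, lemmatizers or trained models, and the function name `normalize` is an assumption.

```python
import re

# Hand-written lookup tables for illustration only (entries taken from the text).
SPELLING_VARIANTS = {"haemophilia": "hemophilia", "tumour": "tumor"}
INFLECTIONS = {"tumors": "tumor", "suffers": "suffer", "suffered": "suffer"}
CONCEPTS = {"tylenol": "acetaminophen", "heart attack": "myocardial infarction"}


def normalize(text):
    """Map spelling variants, inflected forms and synonyms to one canonical form."""
    text = text.lower()
    # Replace concept synonyms first, because some of them span several words.
    for phrase, canonical in CONCEPTS.items():
        text = re.sub(rf"\b{re.escape(phrase)}\b", canonical, text)
    tokens = re.findall(r"[a-z]+", text)
    tokens = [SPELLING_VARIANTS.get(t, t) for t in tokens]
    tokens = [INFLECTIONS.get(t, t) for t in tokens]
    return " ".join(tokens)


print(normalize("Patient suffered a heart attack"))
print(normalize("patient suffers a myocardial infarction"))
```

Both sentences normalize to the same string, so a downstream search or count would treat them as the same statement.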

HOW DOES TEXT MINING WORK:

Text mining helps to analyze large amounts of raw data and find relevant insights.
Combined with machine learning, it can create text analysis models that learn to extract or
classify specific information based on previously stored information.

Even though it may seem like a complicated matter, it can actually be quite simple to get
started with.

❖ The first step is gathering your data. Let’s say you want to analyze conversations
with users through your company’s Intercom live chat. The first thing you’ll need to do
is generate a document containing this data.

❖ Data can be internal (interactions through chats, emails, surveys, spreadsheets,
databases, etc.) or external (information from social media, review sites, news outlets
and any other websites).

❖ The second step is preparing your data. Text mining systems use several NLP
techniques, such as tokenization, parsing, lemmatization, stemming and stop-word
removal, to build the inputs of your machine learning model.
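
As an illustration of what "the inputs of your machine learning model" can look like, the sketch below turns a few chat messages into a bag-of-words count matrix after tokenization and stop-word removal. The sample messages, the stop-word list and the function names are assumptions. Parsing, lemmatization and stemming are left out to keep the example short.

```python
import re

STOP_WORDS = {"the", "is", "a", "my", "i", "to", "it", "and"}  # illustrative list


def tokenize(text):
    """Tokenization plus stop-word removal."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def bag_of_words(texts):
    """Return (vocabulary, matrix) where matrix[i][j] counts term j in text i."""
    tokenized = [tokenize(t) for t in texts]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    position = {term: j for j, term in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for term in tokens:
            row[position[term]] += 1
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical live-chat messages.
chats = ["My order is late", "I want to cancel my order", "The app is great"]
vocabulary, matrix = bag_of_words(chats)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a fixed-length numeric vector, which is the form most machine learning models expect.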

EVALUATION:
It is possible to evaluate text extractors by using the same performance metrics as text
classification: accuracy, precision, recall and F1 score. However, these metrics only consider
exact matches as true positives, leaving partial matches aside.
Let’s look at an example:

❖ Suppose you create an address extractor. This could be an example of an exact match
(true positive for the tag Address): ‘6818 Eget St., Tacoma’. However, the output
could also be ‘6818 Eget St.’. In this case, even though it is only a partial match, it
should not be counted as a false positive for the tag Address.

❖ To include these partial matches, you should use a performance metric known as
ROUGE (Recall-Oriented Understudy for Gisting Evaluation). ROUGE is a family of
metrics that can evaluate the performance of text extractors better than traditional
metrics such as accuracy or F1. How do they work? They calculate the lengths and
number of sequences that overlap between the original text and the extraction
(extracted text).

❖ The ROUGE metrics (the parameters you would use to compare the overlap between
the two texts mentioned above) need to be defined manually. That way, you can
define ROUGE-n metrics (where n is the length of the units, that is, n-grams), or a
ROUGE-L metric if your intent is to compare the longest common subsequence.
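
Applied to the address example above, recall-oriented ROUGE-n and ROUGE-L can be sketched as follows. Only the recall variant is computed, although ROUGE also has precision and F-measure variants. The tokenizer (lowercased word characters, punctuation dropped) and the function names are assumptions made for this sketch.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased word tokens with punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def rouge_n(reference, candidate, n=1):
    """ROUGE-N recall: overlapping n-grams divided by n-grams in the reference."""
    def ngrams(seq):
        return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

    ref, cand = ngrams(tokens(reference)), ngrams(tokens(candidate))
    overlap = sum((ref & cand).values())
    total = sum(ref.values())
    return overlap / total if total else 0.0


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


def rouge_l(reference, candidate):
    """ROUGE-L recall: LCS length divided by the reference length."""
    ref, cand = tokens(reference), tokens(candidate)
    return lcs_length(ref, cand) / len(ref) if ref else 0.0


reference = "6818 Eget St., Tacoma"
candidate = "6818 Eget St."
print(rouge_n(reference, candidate, n=1))
print(rouge_n(reference, candidate, n=2))
print(rouge_l(reference, candidate))
```

An exact-match metric would score this extraction as a plain miss, whereas the ROUGE scores credit the three of four tokens that were recovered.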

CHALLENGES IN TEXT MINING

❖ Information is in unstructured textual form and in natural language (NL)
❖ It is not readily accessible for use by computers
❖ Huge collections of documents must be handled
❖ A skilled person is required to choose which documents to process and to analyze
the output
❖ More time is required
❖ Cost: $50,000 for the software alone
❖ Large textual databases
❖ Almost all publications are also in electronic form
❖ Very high number of possible “dimensions” (but sparse): all possible word and phrase
types in the language
❖ Complex and subtle relationships between concepts in text
❖ Noisy data, for example spelling mistakes
❖ Word ambiguity and context sensitivity

APPLICATIONS:

❖ Text categorization into specific domains, for example spam versus non-spam emails, or
detecting sexually explicit content.

❖ Text clustering to automatically organize a set of documents. Imagine having a folder
of 200,000 PDF documents and having to organize them by hand.

❖ Sentiment analysis to identify and extract subjective information in documents, for
example to detect what your customers are saying about your company on social media.

❖ Concept/entity extraction that is capable of identifying people, places, organizations
and other entities in documents. The only limit here is the imagination.

❖ Document summarization to automatically provide the most important points of the
original document. This is particularly good for news summaries.

❖ Learning relations between named entities. An interesting paper on this is “Identifying
Semantic Relations Between Named Entities from Chinese Texts”.

❖ Part-of-speech (POS) tagging is another NLP subtask. In this task you try to associate
a part of speech, such as noun, adjective or verb, with each word in a text, based on
context and relationship to adjacent words.

❖ Another important task in NLP is coreference resolution. It is about understanding
references to multiple entities existing in the text and disambiguating those references.
For example, the sentence “Obama told Joe Biden that he should consider running for
president” contains coreferences (Joe Biden and he). This task is considered a stepping
stone towards more complex tasks such as question answering and summarization.

❖ Detection of junk e-mails: unwanted or unsolicited materials that are sent as e-mail
by an organization for advertising or promotional purposes are called junk e-mails.
Text mining techniques are applied to detect unwanted junk e-mails automatically using
classification techniques.
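
Of the applications listed above, sentiment analysis is easy to sketch with a word lexicon. The example below counts positive and negative words and flips the polarity of a word that directly follows a negation. The lexicon, the negation rule and the function name `sentiment` are illustrative assumptions. Practical systems use much larger lexicons or trained classifiers.

```python
import re

# Tiny hand-made lexicon for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "hate", "terrible", "broken"}
NEGATIONS = {"not", "never", "no"}


def sentiment(text):
    """Return 'positive', 'negative' or 'neutral' from lexicon hits,
    flipping the polarity of a word that directly follows a negation."""
    score = 0
    tokens = re.findall(r"[a-z]+", text.lower())
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for post in ["I love the fast delivery", "The app is not good", "The parcel arrived on Tuesday"]:
    print(post, "->", sentiment(post))
```

The negation rule handles only the simplest case. Sarcasm, longer-range negation and domain-specific vocabulary are among the ambiguities that make the task hard in practice.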

CONCLUSION:
Today a huge volume of digital data is available in the computer world, and most of it
is in textual form. Text mining techniques are applied to extract information from these
unstructured documents. This paper presents a brief overview of text mining, data mining
and related terms. Text mining and data mining are both applied for information extraction
using a number of techniques. The major difference between the two processes lies in the
source of data to which the mining techniques are applied. Data mining is applied to
structured data, while text mining is performed on unstructured or semi-structured data.
Text mining techniques include information extraction, information retrieval, document
classification and clustering. Natural language processing and machine learning algorithms
are also applied to mine text documents. Information plays a vital role in an organization’s
success and is generated by extracting it from data. The major text mining application areas
are fraud detection, spam detection, customer relationship management, and research and
development. Text mining still faces issues such as the ambiguities present in text.
A considerable amount of research has been carried out, but text mining is still immature.
Processing natural language text is very difficult, and many research opportunities are
available in this area.
