Data-Mining-OVERVIEW (1)

Data mining is the process of extracting valuable information from large data sets, enabling organizations to make informed decisions and identify trends. The document outlines the data mining process, including understanding business objectives, data preparation, modeling, evaluation, and deployment, while highlighting its applications in market analysis, fraud detection, and customer understanding. It also discusses the importance of data mining in enhancing efficiency, risk management, and innovation across various industries.

Uploaded by

linardtipagad

Data Mining is defined as the procedure of extracting information from huge sets of
data. In other words, we can say that data mining is mining knowledge from data. The
tutorial starts off with a basic overview and the terminologies involved in data mining
and then gradually moves on to cover topics such as knowledge discovery, query
language, classification and prediction, decision tree induction, cluster analysis, and
how to mine the Web.

Data mining, also known as Knowledge Discovery in Data (KDD), is the process of
uncovering patterns and other valuable information from large data sets. Over the last
few decades, the development of data warehousing technology and the growth of big
data have rapidly accelerated the adoption of data mining techniques, helping
companies transform their raw data into useful information. However, even though that
technology continuously evolves to handle data at a large scale, leaders still face
challenges with scalability and automation.

Data mining enables organizations to make better decisions through intelligent data
analyses. The techniques that underlie these analyses serve two main purposes: they
can describe the target data set, or they can predict outcomes using machine learning
algorithms. These methods are used to organize and filter data, surfacing the most
interesting information, such as fraud, user behavior, bottlenecks, or even security
failures.

When combined with data analytics and visualization tools, like Apache Spark, delving
into the world of data mining has never been easier, and extracting relevant insights has
never been faster. Advances in artificial intelligence continue to expedite adoption
across industries. This tutorial explains the basics of data mining and then moves on
to its more advanced concepts.

Data Mining Process

The data mining process explains different phases to be executed step by step.

Understand Business

• Identify the company's and the project's objectives first
• Problems that need to be addressed
• Project constraints or limitations
• The business impact of potential solutions

Understand the Data

• Identify what type of data is needed to solve the issue, i.e. begin a preliminary
analysis of the data
• Collect it from authentic sources, obtain access rights, and prepare a data
description report

Prepare the Data

• Clean the data: handle missing data, data errors, default values, and data
corrections.
• Integrate the data: combine two disparate data sets to get the final target data
set.
• Format the data: convert data types or configure data for the specific mining
technology being used.
• Prepare the data in a format suitable for modeling.
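
The cleaning, integration, and formatting steps can be sketched in a few lines of plain
Python. This is only an illustration under stated assumptions: the record layout, the
field names (customer_id, age, city, total_spent), the sample values, and the choice to
fill a missing age with the median are all hypothetical and not taken from this document.

```python
from statistics import median

# Hypothetical raw records from two separate sources (made-up values).
customers = [
    {"customer_id": 1, "age": "34", "city": " Manila "},
    {"customer_id": 2, "age": None, "city": "cebu"},
    {"customer_id": 3, "age": "51", "city": "Davao"},
]
purchases = [
    {"customer_id": 1, "total_spent": "120.50"},
    {"customer_id": 2, "total_spent": "80.00"},
    {"customer_id": 3, "total_spent": "15.25"},
]

def clean(rows):
    """Clean: fill missing ages with the median and tidy up city names."""
    known_ages = [int(r["age"]) for r in rows if r["age"] is not None]
    fallback_age = median(known_ages)
    cleaned = []
    for r in rows:
        age = int(r["age"]) if r["age"] is not None else fallback_age
        cleaned.append({"customer_id": r["customer_id"],
                        "age": age,
                        "city": r["city"].strip().title()})
    return cleaned

def integrate(left, right, key):
    """Integrate: join two data sets on a shared key (inner join)."""
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]

def format_types(rows):
    """Format: convert text fields to the numeric types a model expects."""
    return [{**r, "total_spent": float(r["total_spent"])} for r in rows]

target = format_types(integrate(clean(customers), purchases, "customer_id"))
for row in target:
    print(row)
```

Each function maps to one bullet, so the final target data set is simply the composition
of the three steps.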

Model the Data

• Employ algorithms to ascertain data patterns
• Create the model, test it, and validate it
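
The create, test, and validate cycle can be illustrated with a deliberately tiny sketch.
The data, the hold-out split, and the use of a 1-nearest-neighbour classifier are
assumptions chosen for brevity; the document does not prescribe a particular algorithm.

```python
# Hypothetical labelled data: (monthly_spend, label). Values are made up.
data = [(5, "low"), (8, "low"), (12, "low"), (15, "low"),
        (40, "high"), (45, "high"), (52, "high"), (60, "high")]

# Hold out every fourth record for testing; train on the rest.
test = data[::4]
train = [row for i, row in enumerate(data) if i % 4 != 0]

def predict(x, training_rows):
    """1-nearest-neighbour: return the label of the closest training value."""
    nearest = min(training_rows, key=lambda row: abs(row[0] - x))
    return nearest[1]

# Validate: compare predictions with the known labels of the held-out records.
correct = sum(predict(x, train) == label for x, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping the test records out of training is what makes the accuracy figure a fair check
of the model rather than a measure of memorization.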

Evaluation

• Validate models against business goals
• Change the model, adjust the business goal, or revisit the data, if needed

Deployment

• Generate business intelligence
• Continually monitor and maintain the data mining application

Why Data Mining?

Data mining is important to learn for several reasons:

• Extracting Insights: Data mining techniques allow users to extract useful
information and patterns from vast amounts of data. By analyzing these patterns,
businesses can make sound decisions, identify trends, and compete with their
peers.
• Decision Making: Data mining contributes to the decision-making process.
By analyzing historical data, businesses can predict future trends and outcomes
with a higher degree of confidence.
• Customer Understanding: By analyzing the behavior, preferences, and
purchasing patterns of customers, data mining enables enterprises to gain a
more accurate understanding of their clients. This information can be used for
personalized marketing strategies, improving customer satisfaction, and
enhancing customer loyalty.
• Risk Management: Using data mining techniques to analyze patterns and
anomalies in the data, businesses can identify possible risks or fraud. This is
particularly important in sectors such as finance, insurance, and healthcare,
where risk management is paramount.
• Improved Efficiency: Data mining automatically discovers patterns and insights
in data, which can greatly improve the efficiency of operations. By automating
repetitive tasks, businesses can save time and resources and focus on more
strategic initiatives.
• Innovation: Analyzing data may reveal hidden patterns and relationships that
lead to new product ideas, innovations, or business opportunities. Businesses
can remain ahead of the competition and drive innovation through creative data
exploration and analysis.
• Personal Development: Knowledge of data mining strengthens analytical and
problem-solving skills. It provides you with valuable tools and techniques for
handling and analyzing large datasets, which are essential skills in today's
data-driven world.

In general, data mining is worth learning because it enables businesses to extract
useful information from their data so that they can make educated decisions, mitigate
risks, increase efficiency, understand customers more effectively, innovate, and grow.

There is a huge amount of data available in the information industry. This data is of no
use until it is converted into useful information. It is necessary to analyze this huge
amount of data and extract useful information from it.

Extraction of information is not the only process we need to perform; data mining also
involves other processes such as Data Cleaning, Data Integration, Data Transformation,
Data Mining, Pattern Evaluation and Data Presentation. Once all these processes are
over, we would be able to use this information in many applications such as Fraud
Detection, Market Analysis, Production Control, Science Exploration, etc.

What is Data Mining?

Data Mining is defined as extracting information from huge sets of data. In other words,
we can say that data mining is the procedure of mining knowledge from data. The
information or knowledge so extracted can be used for any of the following applications −

• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration

Data Mining Applications

Data mining is highly useful in the following domains −

• Market Analysis and Management
• Corporate Analysis & Risk Management
• Fraud Detection

Apart from these, data mining can also be used in the areas of production control,
customer retention, science exploration, sports, astrology, and Internet Web Surf-Aid.

Market Analysis and Management

Listed below are the various areas of marketing where data mining is used −

• Customer Profiling − Data mining helps determine what kind of people buy
what kind of products.
• Identifying Customer Requirements − Data mining helps in identifying the best
products for different customers. It uses prediction to find the factors that may
attract new customers.
• Cross Market Analysis − Data mining finds associations and correlations
between product sales.
• Target Marketing − Data mining helps to find clusters of model customers who
share the same characteristics such as interests, spending habits, income, etc.
• Determining Customer Purchasing Patterns − Data mining helps in determining
customer purchasing patterns.
• Providing Summary Information − Data mining provides various
multidimensional summary reports.

Corporate Analysis and Risk Management

Data mining is used in the following fields of the Corporate Sector −

• Finance Planning and Asset Evaluation − It involves cash flow analysis and
prediction, and contingent claim analysis to evaluate assets.
• Resource Planning − It involves summarizing and comparing resources and
spending.
• Competition − It involves monitoring competitors and market directions.

Fraud Detection

Data mining is also used in credit card services and telecommunications to detect
fraud. For fraudulent telephone calls, it helps to find the destination of the call, the
duration of the call, the time of the day or week, etc. It also analyzes patterns that
deviate from expected norms.
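
One simple way to look for values that deviate from expected norms is a z-score check.
The sketch below is an assumption-laden illustration, not a production fraud detector:
the call durations are made up, and the threshold of 2.0 standard deviations is an
arbitrary choice for the example.

```python
from statistics import mean, stdev

# Hypothetical call durations in minutes for one subscriber (made-up values).
durations = [3, 4, 2, 5, 3, 4, 3, 2, 4, 95]

def flag_deviations(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

print(flag_deviations(durations))
```

On such a small sample a single extreme value inflates the standard deviation, so real
systems typically rely on more robust statistics and on more than one attribute.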

Data mining deals with the kind of patterns that can be mined. On the basis of the kind
of data to be mined, there are two categories of functions involved in Data Mining −

• Descriptive
• Classification and Prediction

Descriptive Function

The descriptive function deals with the general properties of data in the database. Here
is the list of descriptive functions −

• Class/Concept Description
• Mining of Frequent Patterns
• Mining of Associations
• Mining of Correlations
• Mining of Clusters

Class/Concept Description

Class/Concept refers to the data to be associated with classes or concepts. For
example, in a company, the classes of items for sale include computers and printers,
and the concepts of customers include big spenders and budget spenders. Such
descriptions of a class or a concept are called class/concept descriptions. These
descriptions can be derived in the following two ways −

• Data Characterization − This refers to summarizing the data of the class under
study. This class under study is called the Target Class.
• Data Discrimination − This refers to comparing the target class with one or more
predefined contrasting groups or classes.
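
A minimal sketch of both ideas follows, reading discrimination as a comparison between
the target class and a contrasting class. The customer records, the field names, and the
choice of averages as the summary are hypothetical and serve only to illustrate the two
terms.

```python
from statistics import mean

# Hypothetical customer records (made-up values).
customers = [
    {"segment": "big spender", "age": 45, "monthly_spend": 900},
    {"segment": "big spender", "age": 52, "monthly_spend": 1100},
    {"segment": "budget spender", "age": 23, "monthly_spend": 150},
    {"segment": "budget spender", "age": 31, "monthly_spend": 250},
]

def characterize(rows, segment):
    """Characterization: summarize the general features of one class."""
    members = [r for r in rows if r["segment"] == segment]
    return {"count": len(members),
            "avg_age": mean(r["age"] for r in members),
            "avg_spend": mean(r["monthly_spend"] for r in members)}

def discriminate(rows, target, contrast):
    """Discrimination: difference between target and contrasting class summaries."""
    t = characterize(rows, target)
    c = characterize(rows, contrast)
    return {key: t[key] - c[key] for key in ("avg_age", "avg_spend")}

print(characterize(customers, "big spender"))
print(discriminate(customers, "big spender", "budget spender"))
```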

Mining of Frequent Patterns

Frequent patterns are those patterns that occur frequently in transactional data. Here is
the list of kinds of frequent patterns −

• Frequent Item Set − It refers to a set of items that frequently appear together, for
example, milk and bread.
• Frequent Subsequence − A sequence of events that occurs frequently, such as the
purchase of a camera being followed by the purchase of a memory card.
• Frequent Sub Structure − Substructure refers to different structural forms, such
as graphs, trees, or lattices, which may be combined with item sets or
subsequences.
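
Frequent item sets of size two can be found by brute-force counting, as in the sketch
below. The baskets and the minimum support of 0.4 are made-up assumptions; algorithms
designed for large data sets avoid enumerating every pair like this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (made-up data).
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "biscuits"},
    {"milk", "bread", "biscuits"},
    {"milk", "eggs"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (share of baskets) meets min_support."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(transactions, min_support=0.4))
```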

Mining of Associations

Associations are used in retail sales to identify items that are frequently purchased
together. Association mining is the process of uncovering relationships among data
and determining association rules.

For example, a retailer generates an association rule showing that milk is sold with
bread 70% of the time, while biscuits are sold with bread only 30% of the time.
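
The percentages in this example correspond to what is usually called the confidence of a
rule. The sketch below computes it on ten made-up baskets constructed to reproduce the
70% and 30% figures; the data and the function name are illustrative assumptions.

```python
def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    both = [b for b in with_antecedent if consequent <= b]
    return len(both) / len(with_antecedent)

# Ten hypothetical baskets that all contain bread.
baskets = [{"bread", "milk"}] * 7 + [{"bread", "biscuits"}] * 3

print(confidence(baskets, {"bread"}, {"milk"}))
print(confidence(baskets, {"bread"}, {"biscuits"}))
```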

Mining of Correlations

It is a kind of additional analysis performed to uncover interesting statistical correlations
between associated attribute-value pairs or between two item sets, to determine whether
they have a positive, negative, or no effect on each other.
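
One widely used measure for this kind of analysis is lift, the ratio of how often two
items occur together to how often they would if they were independent. The sketch below
assumes lift as the measure (the document does not name one) and uses made-up baskets.

```python
def lift(baskets, a, b):
    """Lift of items a and b: P(a and b) / (P(a) * P(b))."""
    n = len(baskets)
    p_a = sum(a in basket for basket in baskets) / n
    p_b = sum(b in basket for basket in baskets) / n
    p_ab = sum(a in basket and b in basket for basket in baskets) / n
    return p_ab / (p_a * p_b)

def effect(value, tolerance=1e-9):
    """Read a lift value as a positive, negative, or absent correlation."""
    if value > 1 + tolerance:
        return "positive"
    if value < 1 - tolerance:
        return "negative"
    return "none"

# Hypothetical baskets (made-up data).
baskets = [{"milk", "bread"}, {"milk", "bread", "jam"}, {"tea", "jam"}, {"tea"}]

for pair in [("milk", "bread"), ("milk", "tea"), ("milk", "jam")]:
    value = lift(baskets, *pair)
    print(pair, value, effect(value))
```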

Mining of Clusters

A cluster is a group of similar objects. Cluster analysis refers to forming groups of
objects that are very similar to each other but highly different from the objects in
other clusters.
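
As a rough illustration, the sketch below groups one-dimensional values with a basic
k-means loop. The choice of k-means, the fixed starting centers, and the values are all
assumptions made for the example; the document does not name a clustering algorithm.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic k-means on numbers: assign to nearest center, then recompute centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # New center = mean of the members; keep the old one if a cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical purchase amounts (made-up values) and starting centers.
centers, clusters = kmeans_1d([1, 2, 3, 20, 21, 22], centers=[1, 20])
print(centers)
print(clusters)
```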

Classification and Prediction

Classification is the process of finding a model that describes the data classes or
concepts. The purpose is to be able to use this model to predict the class of objects
whose class label is unknown. This derived model is based on the analysis of sets of
training data. The derived model can be presented in the following forms −

• Classification (IF-THEN) Rules
• Decision Trees
• Mathematical Formulae
• Neural Networks
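
To make the idea of a derived model concrete, the sketch below learns a single threshold
from labelled training data and then reads it as an IF-THEN rule. It is a one-level
decision tree (a decision stump) on one attribute; the data, the attribute, and the
class labels "big" and "budget" are hypothetical.

```python
# Hypothetical training data: (income_in_thousands, class label). Values are made up.
training = [(18, "budget"), (22, "budget"), (30, "budget"),
            (55, "big"), (61, "big"), (75, "big")]

def learn_stump(rows):
    """Pick the threshold on the single attribute that misclassifies the fewest rows."""
    values = sorted(x for x, _ in rows)
    # Candidate thresholds: midpoints between consecutive attribute values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(t):
        return sum(("big" if x > t else "budget") != label for x, label in rows)

    return min(candidates, key=errors)

threshold = learn_stump(training)

def classify(income):
    """The learned model as a rule: IF income > threshold THEN big ELSE budget."""
    return "big" if income > threshold else "budget"

print(threshold, classify(40), classify(50))
```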

The functions involved in these processes are as follows −

• Classification − It predicts the class of objects whose class label is unknown. Its
objective is to find a derived model that describes and distinguishes data classes
or concepts. The derived model is based on the analysis of a set of training data,
i.e. data objects whose class labels are known.
• Prediction − It is used to predict missing or unavailable numerical data values
rather than class labels. Regression Analysis is generally used for prediction.
Prediction can also be used for identification of distribution trends based on
available data.
• Outlier Analysis − Outliers may be defined as the data objects that do not
comply with the general behavior or model of the data available.
• Evolution Analysis − Evolution analysis refers to describing and modeling
regularities or trends for objects whose behavior changes over time.
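
Since regression is the usual tool for predicting numerical values, a short sketch of
ordinary least squares on one variable may help. The monthly sales figures are made up,
and a straight-line fit is an assumption chosen for simplicity.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly sales (made-up values).
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, sales)
predicted = slope * 5 + intercept  # predict the unavailable value for month 5
print(slope, intercept, predicted)
```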

Data Mining Task Primitives

• We can specify a data mining task in the form of a data mining query.
• This query is input to the system.
• A data mining query is defined in terms of data mining task primitives.

Note − These primitives allow us to communicate in an interactive manner with the data
mining system. Here is the list of Data Mining Task Primitives −

• Set of task relevant data to be mined.
• Kind of knowledge to be mined.
• Background knowledge to be used in discovery process.
• Interestingness measures and thresholds for pattern evaluation.
• Representation for visualizing the discovered patterns.
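
One way to picture a query built from these five primitives is as a plain record with
one field per primitive. The class below is a hypothetical container invented for this
illustration; it is not the syntax of any actual data mining query language, and every
field name and value is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class MiningQuery:
    """Hypothetical container with one field per task primitive."""
    relevant_data: list[str]                      # attributes or dimensions of interest
    knowledge_kind: str                           # e.g. "association", "classification"
    background_knowledge: dict[str, str] = field(default_factory=dict)
    interestingness: dict[str, float] = field(default_factory=dict)
    presentation: str = "rules"                   # how discovered patterns are displayed

query = MiningQuery(
    relevant_data=["customer_id", "item", "purchase_date"],
    knowledge_kind="association",
    background_knowledge={"Manila": "Philippines"},
    interestingness={"min_support": 0.1, "min_confidence": 0.5},
    presentation="tables",
)
print(query)
```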

Set of task relevant data to be mined

This is the portion of the database in which the user is interested. This portion includes
the following −

• Database Attributes
• Data Warehouse dimensions of interest

Kind of knowledge to be mined

It refers to the kind of functions to be performed. These functions are −

• Characterization
• Discrimination
• Association and Correlation Analysis
• Classification
• Prediction
• Clustering
• Outlier Analysis
• Evolution Analysis

Background knowledge

Background knowledge allows data to be mined at multiple levels of abstraction.
Concept hierarchies, for example, are one form of background knowledge that makes
this possible.
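
A concept hierarchy can be as simple as a mapping from a lower-level value to a
higher-level one. The sketch below rolls city-level sales up to country level; the
cities, the mapping, and the amounts are made-up assumptions.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> country (one level of abstraction up).
city_to_country = {"Manila": "Philippines", "Cebu": "Philippines", "Osaka": "Japan"}

# Hypothetical sales records at the city level: (city, amount).
sales = [("Manila", 120), ("Cebu", 80), ("Osaka", 200), ("Manila", 50)]

def roll_up(rows, hierarchy):
    """Aggregate amounts at the higher level defined by the hierarchy."""
    totals = Counter()
    for low_level, amount in rows:
        totals[hierarchy[low_level]] += amount
    return dict(totals)

print(roll_up(sales, city_to_country))
```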

Interestingness measures and thresholds for pattern evaluation

These are used to evaluate the patterns discovered by the process of knowledge
discovery. There are different interestingness measures for different kinds of knowledge.
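
For association rules, support and confidence are typical interestingness measures, and
thresholds on them decide which patterns are kept. The rules, their scores, and the
threshold values below are made up for illustration.

```python
# Hypothetical mined rules with precomputed measures (made-up numbers).
rules = [
    {"rule": "bread -> milk", "support": 0.35, "confidence": 0.70},
    {"rule": "bread -> biscuits", "support": 0.15, "confidence": 0.30},
    {"rule": "eggs -> flour", "support": 0.02, "confidence": 0.90},
]

def interesting(candidates, min_support, min_confidence):
    """Keep only the rules that meet both thresholds."""
    return [r["rule"] for r in candidates
            if r["support"] >= min_support and r["confidence"] >= min_confidence]

print(interesting(rules, min_support=0.10, min_confidence=0.50))
```

The third rule shows why both thresholds matter: its confidence is high, but it applies
to too few transactions to be worth reporting.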

Representation for visualizing the discovered patterns

This refers to the form in which discovered patterns are to be displayed. These
representations may include the following −

• Rules
• Tables
• Charts
• Graphs
• Decision Trees
• Cubes
