Unit 3

The document provides an introduction to data mining. It discusses the motivation for data mining as extracting knowledge from large amounts of data. It defines data mining as the process of discovering patterns in large data sets using techniques from artificial intelligence, machine learning, statistics, and database systems. The document outlines the key steps in the knowledge discovery process (KDD) used for data mining. It also discusses different types of data that can be mined and some common issues in data mining.

Uploaded by

Manan

2170715 – Data Mining & Business Intelligence

Unit-3: Introduction to Data Mining (DM)
Outline
▪ Motivation: Why data mining?
▪ What is data mining?
▪ Data mining functionalities
▪ Classification of data mining systems
▪ Data mining: On what kind of data?
▪ Data mining architecture
▪ KDD process
▪ Data mining issues
Motivation: Why Data Mining?
[Figure: Twitter Trends and Google Trends, shown as examples of data mining]
“Necessity is the Mother of Invention”

Problem: Data explosion
Solution: “Data Mining”, the extraction of interesting knowledge from data in large databases

“It has been estimated that the amount of information in the world doubles every 10 months.”
▪ There is a tremendous increase in the amount of data recorded and stored on digital media as well as in individual sources.
Why Data Mining? (Cont..)
“We are drowning in data, but starving for knowledge!”
“Data rich but Information poor”

▪ Since the 1960s, database and information technology has evolved systematically from primitive file-processing systems to powerful database systems.
▪ Research and development in database systems since the 1970s has led to the development of relational database systems.
Why Data Mining? (Cont..)
Years         Evolution
Since 1960s   Data collection, database creation, IMS (hierarchical database system by IBM), and network DBMS
1970s         Relational data model, relational DBMS implementation
1980s         RDBMS, advanced data models, application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s         Data mining, data warehousing, multimedia databases, and web databases
2000s         Stream data management and mining, social networks (Facebook, etc.), web technology (XML), and global information systems
At present    Heterogeneous database systems, big data

Data grows exponentially every day, but is all of this data really important to us?
What is Data Mining?
Data mining refers to extracting or “mining” knowledge from
large amounts of data.

“Knowledge mining from data” or “Knowledge mining”

“Extract knowledge from large data or databases”

“Knowledge discovery from databases (KDD)”

What is Data Mining? (Cont..)
▪ It is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.

The overall goal of the data mining process is to extract information from a large data set and transform it into an understandable structure for further use.
What is Data Mining? (Cont..)
Data → Knowledge → Action → Goal

Netflix collects user ratings of movies (data) => What types of movies you will
like (knowledge) => Recommend new movies to you (action) => Users stay
with Netflix (goal)

Gene sequences of cancer patients (data) => Which genes lead to cancer?
(knowledge) => Appropriate treatment (action) => Save life (goal)

Road traffic (data) => Which road is likely to be congested? (knowledge) =>
Suggest better routes to drivers (action) => Save time and energy (goal)
KDD Process
▪ The knowledge discovery process is an iterative sequence of the following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the
database)
4. Data transformation (where data are transformed and consolidated into forms
appropriate for mining by performing summary or aggregation operations)
5. Data mining (an essential process where intelligent methods are applied to
extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing
knowledge based on interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation
techniques are used to present mined knowledge to users)
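The seven steps above can be sketched as a small pipeline. This is only an illustrative sketch, not a definitive implementation: the records, the function names, the list of relevant items, and the choice of pair counting as the mining method are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources (illustrative data only).
store_a = [
    {"tid": 1, "items": ["Bread", "milk ", None]},
    {"tid": 2, "items": ["bread", "butter", "pen"]},
]
store_b = [
    {"tid": 3, "items": ["Milk", "bread"]},
    {"tid": 4, "items": []},  # empty basket, treated as noise
]

def clean(records):
    """Step 1: drop missing items, normalise spelling, discard empty baskets."""
    cleaned = []
    for rec in records:
        items = sorted({i.strip().lower() for i in rec["items"] if i})
        if items:
            cleaned.append({"tid": rec["tid"], "items": items})
    return cleaned

def integrate(*sources):
    """Step 2: combine several data sources into one collection."""
    return [rec for src in sources for rec in src]

def select(records, relevant):
    """Step 3: keep only the items relevant to the analysis task."""
    selected = []
    for rec in records:
        items = [i for i in rec["items"] if i in relevant]
        if items:
            selected.append({"tid": rec["tid"], "items": items})
    return selected

def transform(records):
    """Step 4: consolidate each record into a set of items ready for mining."""
    return [frozenset(rec["items"]) for rec in records]

def mine(baskets):
    """Step 5: count how often each pair of items is bought together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support):
    """Step 6: keep pairs whose support meets the interestingness threshold."""
    return {pair: c / n_baskets for pair, c in pair_counts.items()
            if c / n_baskets >= min_support}

def present(patterns):
    """Step 7: render the surviving patterns as readable text."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
records = select(records, relevant={"bread", "milk", "butter"})
baskets = transform(records)
report = present(evaluate(mine(baskets), len(baskets), min_support=0.5))
print(report)
```

In practice the process is iterative: a weak result at step 6 usually sends the analyst back to an earlier step, for example to select different data or lower the threshold.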
KDD Process (Cont..)
[Figure: KDD process flow: Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Patterns → Pattern Evaluation → Knowledge]
• Selection: data relevant to the analysis task are retrieved from the database.
• Preprocessing: to remove noise and inconsistent data.
• Transformation: data are put into forms appropriate for mining by performing summary or aggregation operations, for instance.
• Data Mining: intelligent methods are applied in order to extract data patterns.
• Pattern Evaluation: to identify the truly interesting patterns representing knowledge.
• Knowledge presentation: visualization and knowledge representation techniques are used to present the mined knowledge to the user.

Unit: 3 – Introduction to Data Mining (DM) 11 Darshan Institute of Engineering & Technology
Domains of Data Mining Systems
▪ Data mining is an interdisciplinary field that draws on a set of disciplines, including database systems, statistics, machine learning, visualization, and information science.

[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines]
Data Mining—On what kind of data?
▪ Relational Databases:
• A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data.
• E.g.: SQL Server, Oracle, etc.
▪ Data Warehouses:
• A data warehouse is a repository of information collected from multiple sources.
• Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
• E.g.: Stock Market, D-Mart, Big Bazaar, etc.
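As a minimal sketch of interrelated data managed by a DBMS, the example below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory relational database with two interrelated tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE purchase (cust_id INTEGER, item TEXT, amount REAL)")

conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)",
                 [(1, "pen", 10.0), (1, "book", 50.0), (2, "pen", 10.0)])

# The DBMS answers a query by joining the related tables and aggregating.
rows = conn.execute(
    "SELECT c.name, SUM(p.amount) "
    "FROM customer c JOIN purchase p ON p.cust_id = c.cust_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
conn.close()
print(rows)
```

A summary query of this kind is also the sort of aggregation that a data warehouse would precompute and refresh periodically.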
Data Mining—On what kind of data? (Cont..)
▪ Transactional Databases:
• A transactional database consists of a file in which each record represents a transaction.
• A transaction typically includes a unique transaction identity number (TID) and a list of the items making up the transaction (such as items purchased in a store).
• E.g.: Online shopping sites such as Flipkart and Amazon.
▪ Other Data
• Spatial data (maps or locations)
• Engineering design data (designs of buildings and office structures)
• Hypertext and multimedia data (including text, image, video, and audio data)
• The World Wide Web (a huge, widely distributed information repository made available on the Internet)
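The transactional database described above (a TID plus a list of items per record) can be sketched as follows. The TIDs, items, and the item_support helper are hypothetical, assumed only to show how such records are typically counted.

```python
from collections import Counter

# Hypothetical transactions: each record has a unique TID and a list of items.
transactions = {
    "T100": ["laptop", "mouse"],
    "T200": ["mouse", "keyboard"],
    "T300": ["laptop", "mouse", "keyboard"],
}

def item_support(db):
    """Return the fraction of transactions that contain each item."""
    counts = Counter(item for items in db.values() for item in set(items))
    return {item: count / len(db) for item, count in counts.items()}

support = item_support(transactions)
print(support)
```

Counting how often items appear across transactions is a common first step before looking for items that are frequently bought together.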
Data Mining Issues
▪ Data mining issues can be classified into five categories:
1. Mining Methodology
2. User Interaction
3. Efficiency and Scalability
4. Diversity of Database Types
5. Data Mining and Society
1) Mining Methodology
▪ Mining various and new kinds of knowledge
• Data mining covers a wide spectrum of data analysis and knowledge discovery tasks, so these tasks may use the same database in different ways and require the development of numerous data mining techniques.
▪ Mining knowledge in multidimensional space
• When searching for knowledge in large data sets, we can explore the data in multidimensional space.
• That is, we can search for interesting patterns among combinations of dimensions (attributes) at varying levels of abstraction. Such mining is known as (exploratory) multidimensional data mining.
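To illustrate combinations of dimensions at varying levels of abstraction, the sketch below aggregates the same facts first by (city, month) and then by (country, quarter). The sales figures, the two concept hierarchies, and the aggregate helper are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sales facts with two dimensions: location and time.
sales = [
    {"city": "Rajkot", "month": "Jan", "amount": 100},
    {"city": "Rajkot", "month": "Apr", "amount": 150},
    {"city": "Mumbai", "month": "Feb", "amount": 200},
    {"city": "Chicago", "month": "Jan", "amount": 300},
]

# Assumed concept hierarchies: city -> country and month -> quarter.
COUNTRY = {"Rajkot": "India", "Mumbai": "India", "Chicago": "USA"}
QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}

# Each abstraction level maps a fact to its value at that level.
LEVELS = {
    "city": lambda row: row["city"],
    "country": lambda row: COUNTRY[row["city"]],
    "month": lambda row: row["month"],
    "quarter": lambda row: QUARTER[row["month"]],
}

def aggregate(facts, *levels):
    """Sum the amounts for every combination of the chosen abstraction levels."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(LEVELS[level](row) for level in levels)
        totals[key] += row["amount"]
    return dict(totals)

by_city_month = aggregate(sales, "city", "month")           # detailed view
by_country_quarter = aggregate(sales, "country", "quarter")  # rolled-up view
print(by_country_quarter)
```

A pattern that is invisible at the detailed level, such as most first-quarter revenue coming from one country, can become visible after rolling up.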
1) Mining Methodology (Cont..)
▪ Data mining: an interdisciplinary effort
• The power of data mining can be substantially enhanced by integrating new methods from multiple disciplines.
• For example, to mine data with natural language text, it makes sense to fuse data mining methods with methods of information retrieval and natural language processing.
▪ Handling uncertainty, noise, or incompleteness of data
• Data often contain noise, errors, exceptions, or uncertainty, or are incomplete.
• Errors and noise may confuse the data mining process, leading to the derivation of erroneous patterns.
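One simple way to handle missing and noisy values before mining is sketched below. It is an assumption-laden example rather than a recommended method: missing entries are filled with the median, and values far from the median (by a modified z-score with the commonly used 0.6745 factor and 3.5 cutoff) are treated as noise and also replaced by the median. The clean_column name and the sample ages are hypothetical.

```python
from statistics import median

def clean_column(values, threshold=3.5):
    """Replace missing (None) and noisy values in a numeric column by the median."""
    present = [v for v in values if v is not None]
    med = median(present)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(v - med) for v in present)

    def is_noise(v):
        # Modified z-score; the 0.6745 factor and 3.5 cutoff are rules of thumb.
        return mad > 0 and 0.6745 * abs(v - med) / mad > threshold

    return [med if v is None or is_noise(v) else v for v in values]

ages = [23, 25, None, 24, 250, 26]  # one missing value and one obvious error
cleaned_ages = clean_column(ages)
print(cleaned_ages)
```

Whether to replace, drop, or keep such values depends on the task. In fraud detection, for example, the exceptions may be exactly the patterns of interest.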
2) User Interaction
▪ Interactive mining
• The data mining process should be highly interactive. Thus, it is important to build flexible user interfaces and an exploratory mining environment, facilitating the user’s interaction with the system.
▪ Incorporation of background knowledge
• Background knowledge, constraints, rules, and other information regarding the domain under study should be incorporated into the knowledge discovery process.
▪ Presentation and visualization of data mining results
• How can a system present data mining results vividly (giving a clear mental picture) and flexibly, so that the discovered knowledge can be easily understood and directly used by humans?
3) Efficiency and Scalability
▪ Efficiency and scalability of data mining algorithms
• Data mining algorithms must be efficient and scalable in order to effectively extract information from the huge amounts of data that lie in many data repositories or in dynamic data streams.
• In other words, the running time of a data mining algorithm must be predictable, short, and acceptable to applications.
• Efficiency, scalability, performance, optimization, and the ability to execute in real time are key criteria for new mining algorithms.
▪ Parallel, distributed, and incremental mining algorithms
• The giant size of many data sets, the wide distribution of data, and the computational complexity of some data mining methods are factors that motivate the development of parallel and distributed data-intensive mining algorithms.
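The idea behind incremental mining can be sketched with a counter that is updated batch by batch instead of rescanning all the data each time. The IncrementalItemCounter class, its method names, and the sample batches are hypothetical and serve only as an illustration.

```python
from collections import Counter

class IncrementalItemCounter:
    """Keeps item counts up to date as new batches of transactions arrive."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, batch):
        """Fold a new batch into the running counts without rescanning old data."""
        for items in batch:
            self.counts.update(set(items))
            self.n_transactions += 1

    def frequent(self, min_support):
        """Return the items whose support currently meets the threshold."""
        return sorted(
            item for item, count in self.counts.items()
            if count / self.n_transactions >= min_support
        )

counter = IncrementalItemCounter()
counter.update([["bread", "milk"], ["bread"]])
first = counter.frequent(0.6)   # based on the first batch only
counter.update([["milk"], ["milk", "eggs"]])
second = counter.frequent(0.6)  # reflects both batches
print(first, second)
```

Because each update touches only the new batch, the cost per batch stays proportional to the batch size, and counters built on separate partitions could be merged by adding their counts.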
4) Diversity of Database Types
▪ Handling complex types of data
• One challenge for data mining is how to uncover knowledge from stream, time-series, sequence, graph, social network, and multirelational data.
• A database or data set may contain many different types of attributes and many different types of data.
▪ Mining dynamic, networked, and global data repositories
• Data from multiple sources are connected by the Internet and various kinds of networks, forming distributed and heterogeneous global information systems.
• Discovering knowledge from different sources of structured, semi-structured, or unstructured data is challenging.
• Web mining, multisource data mining, and information network mining have become challenging and fast-evolving data mining fields.
5) Data Mining and Society
▪ Social impacts of data mining
• With data mining penetrating our everyday lives, it is important to study the impact of data mining on society. How can we use data mining technology to benefit our society? How can we guard against its misuse?
▪ Privacy-preserving data mining
• Data mining will help in scientific discovery, business management, economic recovery, and security protection (e.g., the real-time discovery of intruders and cyber attacks).
• However, it poses the risk of disclosing an individual’s personal information.
▪ Invisible data mining
• We cannot expect everyone in society to learn and master data mining techniques.
• For example, when purchasing items online, users may be unaware that the store is likely collecting data on the buying patterns of its customers, which may be used to recommend other items for purchase in the future.
