Lecture Notes 1.1 & 1.2

The document provides an overview of data mining, defining it as the process of extracting knowledge from large data sets for applications such as market analysis and fraud detection. It outlines the knowledge discovery process, types of data, functionalities of data mining, and various techniques used in classification, prediction, and clustering. Additionally, it discusses the classification of data mining systems and major issues in data mining, emphasizing the need for efficient algorithms and effective presentation of results.

Uploaded by

Sajal Jain

Lecture Notes

Course Name: Data Mining and Warehousing

Course Code: 22CSH-380

Introduction
DEFINITION OF DATA MINING

Data mining is the process of extracting useful information from very large sets of data. In other
words, data mining is the procedure of mining knowledge from data. The information or knowledge
extracted in this way can be used for any of the following applications:

• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration
Major sources of data:
• Business: Web, e-commerce, transactions, stocks
• Science: remote sensing, bioinformatics, scientific simulation
• Society and everyone: news, digital cameras, YouTube
Need for turning data into knowledge: we are drowning in data, but starving for knowledge.
Data mining therefore means extracting, or "mining", knowledge from large amounts of data. Just as
gold mining extracts gold from rock or sand, knowledge mining extracts knowledge from data.
Other terms for data mining:

• Knowledge mining
• Knowledge extraction
• Pattern analysis

KNOWLEDGE DISCOVERY (KDD) PROCESS:

Several Key Steps:

▶ Data preprocessing, which covers the next four steps:
▶ Data cleaning (remove noise and inconsistent data)
▶ Data integration (multiple data sources may be combined)
▶ Data selection (data relevant to the analysis task are retrieved from the database)
▶ Data transformation (data are transformed or consolidated into forms appropriate for mining)
(End of data preprocessing)
▶ Data mining (an essential process where intelligent methods are applied to extract
data patterns)
▶ Pattern evaluation (identify the truly interesting patterns)
▶ Knowledge presentation (mined knowledge is presented to the user with visualization or
representation techniques)
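
The steps above can be sketched as a small pipeline. This is only an illustrative sketch: the two data sources, the field names, the spending threshold of 200, and every function name are assumptions made for the example, and the "mining" step is a simple frequency count rather than a real mining algorithm.

```python
# Hypothetical toy data: two sources of sales records (all names are illustrative).
store_a = [
    {"customer": "c1", "country": "Canada", "amount": 120.0},
    {"customer": "c2", "country": "Canada", "amount": None},   # missing value
    {"customer": "c3", "country": "USA", "amount": 80.0},
]
store_b = [
    {"customer": "c4", "country": "Canada", "amount": 300.0},
    {"customer": "c1", "country": "Canada", "amount": 120.0},  # duplicate of a store_a record
]


def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Data integration: combine the sources and drop exact duplicates."""
    seen, merged = set(), []
    for source in sources:
        for r in source:
            key = (r["customer"], r["country"], r["amount"])
            if key not in seen:
                seen.add(key)
                merged.append(r)
    return merged


def select(records, country):
    """Data selection: keep only the records relevant to the analysis task."""
    return [r for r in records if r["country"] == country]


def transform(records):
    """Data transformation: discretize amounts into spending levels."""
    return [{**r, "level": "high" if r["amount"] >= 200 else "low"} for r in records]


def mine(records):
    """Data mining (toy version): count how often each spending level occurs."""
    counts = {}
    for r in records:
        counts[r["level"]] = counts.get(r["level"], 0) + 1
    return counts


def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {k: v for k, v in patterns.items() if v >= min_count}


prepared = transform(select(integrate(clean(store_a), clean(store_b)), "Canada"))
patterns = evaluate(mine(prepared), min_count=1)

# Knowledge presentation: print the surviving patterns as a small table.
for level, count in sorted(patterns.items()):
    print(f"{level}: {count}")
```

Each function corresponds to one step in the list, so the order of the nested calls mirrors the order of the KDD process.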

DATA MINING ON WHAT KIND OF DATA? (TYPES OF DATA):

RELATIONAL DATABASES:
• A database system, also called a database management system (DBMS),
consists of a collection of interrelated data, known as a database, and a set of software
programs to manage and access the data.
• A relational database is a collection of tables, each of which is assigned a unique name.
• Each table consists of a set of attributes (columns or fields) and usually stores a large set of
tuples (records or rows).
• Each tuple in a relational table represents an object identified by a unique key and
described by a set of attribute values.
• A semantic data model, such as an entity-relationship (ER) data model, is often
constructed for relational databases.
• An ER data model represents the database as a set of entities and their relationships.
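
As a small illustration of these terms, the sketch below builds one relational table with Python's standard sqlite3 module. The table name, column names, and sample rows are hypothetical and chosen only for the example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table has a unique name (customer) and a set of attributes (columns).
# cust_id is the unique key that identifies each tuple.
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL)"
)

# Each inserted row is one tuple (record) described by its attribute values.
conn.executemany(
    "INSERT INTO customer (cust_id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Chandigarh"), (2, "Ben", "Toronto"), (3, "Chen", "Toronto")],
)

# A database query retrieves the tuples relevant to a task.
rows = conn.execute(
    "SELECT cust_id, name FROM customer WHERE city = ? ORDER BY cust_id",
    ("Toronto",),
).fetchall()
print(rows)
conn.close()
```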

Apex Institute of Technology, Chandigarh University, India

DATA MINING FUNCTIONALITIES: WHAT KINDS OF PATTERNS CAN BE MINED?
Data mining functionalities are used to specify the kinds of patterns to be found in data mining
tasks. Data mining tasks can be classified into two categories: descriptive and predictive.
Descriptive mining tasks characterize the general properties of the data in the database.
Predictive mining tasks perform inference on the current data in order to make predictions.
CONCEPT/CLASS DESCRIPTION: CHARACTERIZATION AND DISCRIMINATION:
• Data can be associated with classes or concepts.
• Example: in the AllElectronics store, classes of items for sale include
computers and printers, and concepts of customers include bigSpenders and
budgetSpenders.
• It can be useful to describe individual classes and concepts in summarized, concise,
and yet precise terms. Such descriptions of a class or a concept are called class/concept
descriptions. These descriptions can be derived via
• data characterization, by summarizing the data of the class under study (often called
the target class) in general terms,
• data discrimination, by comparison of the target class with one or a set of
comparative classes (often called the contrasting classes), or
• both data characterization and discrimination.
Data characterization:

• It is a summarization of the general characteristics or features of a target class of
data.
• The data corresponding to the user-specified class are typically collected by a
database query.
• The output of data characterization can be presented in various forms.
Examples include pie charts, bar charts, curves, multidimensional data cubes, and
multidimensional tables, including crosstabs.
Data discrimination:

• It is a comparison of the general features of target class data objects with the
general features of objects from one or a set of contrasting classes.
• The target and contrasting classes can be specified by the user, and the
corresponding data objects retrieved through database queries.
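
A minimal sketch of both ideas follows, using the bigSpenders and budgetSpenders concepts from the example above. The customer records, the attribute names, and the choice of mean values as the summary are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical customer records labelled with a concept (class).
customers = [
    {"cls": "bigSpenders", "age": 45, "yearly_spend": 5200},
    {"cls": "bigSpenders", "age": 38, "yearly_spend": 4800},
    {"cls": "budgetSpenders", "age": 24, "yearly_spend": 600},
    {"cls": "budgetSpenders", "age": 30, "yearly_spend": 900},
]


def characterize(records, target):
    """Characterization: summarize the general features of the target class."""
    rows = [r for r in records if r["cls"] == target]
    return {
        "count": len(rows),
        "mean_age": mean(r["age"] for r in rows),
        "mean_spend": mean(r["yearly_spend"] for r in rows),
    }


def discriminate(records, target, contrasting):
    """Discrimination: compare the target summary with a contrasting class."""
    t = characterize(records, target)
    c = characterize(records, contrasting)
    return {k: t[k] - c[k] for k in ("mean_age", "mean_spend")}


summary = characterize(customers, "bigSpenders")
difference = discriminate(customers, "bigSpenders", "budgetSpenders")
print(summary)
print(difference)
```

In practice the summary could be presented as a chart or crosstab; a dictionary is used here only to keep the sketch short.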
MINING FREQUENT PATTERNS, ASSOCIATIONS, AND CORRELATIONS:
Frequent patterns, as the name suggests, are patterns that occur frequently in data. There are
many kinds of frequent patterns, including itemsets, subsequences, and substructures.
A frequent itemset typically refers to a set of items that frequently appear together in a
transactional data set, such as computer and software.
Example: Association analysis. Suppose, as a marketing manager of
AllElectronics, you would like to determine which items are frequently purchased together
within the same transactions.
An example of such a rule, mined from the AllElectronics transactional database, is
buys(X, "computer") => buys(X, "software") [support = 1%, confidence = 50%]
where X is a variable representing a customer. A confidence, or certainty, of 50% means that
if a customer buys a computer, there is a 50% chance that she will buy software as well. A
1% support means that 1% of all of the transactions under analysis showed that computer and
software were purchased together.
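
Support and confidence can be computed directly from their definitions. The sketch below uses a hypothetical list of five transactions, so its numbers differ from the 1% and 50% quoted above; the function names are illustrative.

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"printer"},
    {"computer", "software", "printer"},
    {"computer", "printer"},
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"computer", "software"}, transactions)
rule_confidence = confidence({"computer"}, {"software"}, transactions)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Here computer and software appear together in 2 of 5 transactions, and computer appears in 4 of 5, so the rule has 40% support and 50% confidence.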

CLASSIFICATION AND PREDICTION:

Classification is the process of finding a model (or function) that describes and distinguishes
data classes or concepts, for the purpose of being able to use the model to predict the class of
objects whose class label is unknown.
“How is the derived model presented?”:
The derived model may be represented in various forms, such as classification (IF-
THEN) rules, decision trees, mathematical formulae, or neural networks.
A decision tree is a flow-chart-like tree structure, where each node denotes a test on an
attribute value, each branch represents an outcome of the test, and tree leaves represent
classes or class distributions. Decision trees can easily be converted to classification rules.
A neural network, when used for classification, is typically a collection of neuron-like
processing units with weighted connections between the units. There are many other methods
for constructing classification models, such as naïve Bayesian classification, support vector
machines, and k-nearest neighbour classification.
Whereas classification predicts categorical (discrete, unordered) labels, prediction models
continuous-valued functions. That is, it is used to predict missing or unavailable numerical
data values rather than class labels. Although the term prediction may refer to both numeric
prediction and class label prediction, these notes use it primarily for numeric prediction.
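
The relationship between a decision tree and classification rules can be sketched as follows. The tree, its attributes (age, student, credit), and the class labels are hypothetical; the tree is written by hand, not learned from data.

```python
# Hypothetical tree: an internal node is (attribute, {outcome: subtree});
# a leaf is a class label given as a string.
tree = ("age", {
    "youth": ("student", {"yes": "buys", "no": "does_not_buy"}),
    "middle_aged": "buys",
    "senior": ("credit", {"fair": "does_not_buy", "excellent": "buys"}),
})


def classify(node, record):
    """Follow the branch matching each attribute test until a leaf is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[record[attribute]]
    return node


def to_rules(node, conditions=()):
    """Convert every root-to-leaf path into one IF-THEN rule."""
    if isinstance(node, str):
        body = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {body} THEN class = {node}"]
    attribute, branches = node
    rules = []
    for outcome, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attribute, outcome),)))
    return rules


label = classify(tree, {"age": "youth", "student": "yes"})
rules = to_rules(tree)
print(label)
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, which is why a decision tree can always be rewritten as a set of classification rules.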
Cluster Analysis
Classification and prediction analyze class-labelled data objects, whereas clustering
analyzes data objects without consulting a known class label.
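
To illustrate grouping without class labels, here is a minimal one-dimensional k-means sketch. K-means is only one of many clustering methods and is used here as an assumption; the data values and initial centers are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means for numbers: assign to the nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Recompute each center as the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical yearly spend values with no class labels attached.
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
print(centers)
print(clusters)
```

No labels are supplied; the two groups emerge only from the distances between the values.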
Outlier Analysis
A database may contain data objects that do not comply with the general behavior or model of
the data. These data objects are outliers. Most data mining methods discard outliers as noise
or exceptions.
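
One simple way to flag such objects is a z-score test, sketched below. The choice of the z-score method, the threshold of 2.0, and the sample amounts are assumptions for illustration; other outlier definitions exist.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts with one unusually large value.
amounts = [20, 22, 19, 21, 23, 20, 500]
outliers = zscore_outliers(amounts)
print(outliers)
```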
Evolution Analysis
Data evolution analysis describes and models regularities or trends for objects whose
behavior changes over time. This may include characterization, discrimination,
association and correlation analysis, classification, prediction, or clustering of time-related
data.
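
A trend in time-related data can be exposed with a moving average, as in this minimal sketch. The moving average is just one simple trend technique chosen for illustration, and the monthly sales figures are hypothetical.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values to smooth short-term noise."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical monthly sales figures.
monthly_sales = [100, 120, 90, 130, 150, 140]
trend = moving_average(monthly_sales, window=3)
rising = all(a < b for a, b in zip(trend, trend[1:]))
print(trend, rising)
```

The raw series goes up and down, but the smoothed series rises steadily, which is the kind of regularity evolution analysis looks for.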


CLASSIFICATION OF DATA MINING SYSTEMS:

Data mining is an interdisciplinary field, the confluence of a set of disciplines, including
database systems, statistics, machine learning, visualization, and information science.
Moreover, depending on the data mining approach used, techniques from other disciplines
may be applied, such as neural networks, fuzzy and/or rough set theory, knowledge
representation, inductive logic programming, or high-performance computing.

Data mining systems can be categorized according to various criteria, as follows:

Classification according to the kinds of databases mined:
A data mining system can be classified according to the kinds of databases mined. Database
systems can be classified according to different criteria (such as data models, or the types of
data or applications involved), each of which may require its own data mining technique.
Classification according to the kinds of knowledge mined:
Data mining systems can be categorized according to the kinds of knowledge they mine, that
is, based on data mining functionalities, such as characterization, discrimination, association
and correlation analysis, classification, prediction, clustering, outlier analysis, and evolution
analysis.
Classification according to the kinds of techniques utilized:
Data mining systems can be categorized according to the underlying data mining techniques
employed. These techniques can be described according to the degree of user interaction
involved (e.g., autonomous systems, interactive exploratory systems, query-driven systems)
or the methods of data analysis employed (e.g., database-oriented or data warehouse-oriented
techniques, machine learning, statistics, visualization, pattern recognition, neural
networks, and so on).
Classification according to the applications adapted:
Data mining systems can also be categorized according to the applications they are adapted to. For
example, data mining systems may be tailored specifically for finance, telecommunications,
DNA, stock markets, e-mail, and so on. Different applications often require the integration of
application-specific methods.

DATA MINING TASK PRIMITIVES:

A data mining query is defined in terms of the following primitives:
Task-relevant data:
This is the portion of the database to be investigated. For example, suppose that you are a manager
of AllElectronics in charge of sales in the United States and Canada. In particular, you would
like to study the buying trends of customers in Canada. Rather than mining the entire
database, you can specify only the data about customers in Canada, together with the attributes
of interest. These are referred to as relevant attributes.
The kinds of knowledge to be mined:
This specifies the data mining functions to be performed, such as characterization,
discrimination, association, classification, clustering, or evolution analysis. For instance, when
studying the buying habits of customers in Canada, you might choose association analysis.
Background knowledge:
Users can specify background knowledge, or knowledge about the domain to be mined. This
knowledge is useful for guiding the knowledge discovery process, and for evaluating the
patterns found. There are several kinds of background knowledge.
Interestingness measures:
These functions are used to separate uninteresting patterns from knowledge. They may be
used to guide the mining process, or after discovery, to evaluate the discovered patterns.
Different kinds of knowledge may have different interestingness measures.
Presentation and visualization of discovered patterns:
This refers to the form in which discovered patterns are to be displayed. Users can choose
from different forms for knowledge presentation, such as rules, tables, charts, graphs,
decision trees, and cubes.
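
The five primitives can be pictured as the fields of a single query object. The sketch below is purely illustrative: the MiningQuery class, its field names, and the sample values are hypothetical and do not correspond to any real data mining query language.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container with one field per task primitive."""
    task_relevant_data: str                  # which portion of the database
    relevant_attributes: list                # attributes of interest
    knowledge_kind: str                      # e.g. "association", "classification"
    background_knowledge: dict = field(default_factory=dict)
    interestingness: dict = field(default_factory=dict)   # thresholds
    presentation: str = "rules"              # rules, tables, charts, ...


query = MiningQuery(
    task_relevant_data="customers in Canada",
    relevant_attributes=["age", "income", "item"],
    knowledge_kind="association",
    background_knowledge={"concept_hierarchy": ["city", "province", "country"]},
    interestingness={"min_support": 0.01, "min_confidence": 0.5},
    presentation="table",
)


def passes_thresholds(query, support, confidence):
    """Apply the interestingness measures to one discovered pattern."""
    limits = query.interestingness
    return (support >= limits.get("min_support", 0.0)
            and confidence >= limits.get("min_confidence", 0.0))


print(passes_thresholds(query, support=0.02, confidence=0.6))
```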

MAJOR ISSUES IN DATA MINING:

Mining different kinds of knowledge in databases. - Different users have different needs and
may be interested in different kinds of knowledge. It is therefore necessary for data mining
to cover a broad range of knowledge discovery tasks.
Interactive mining of knowledge at multiple levels of abstraction. - The data mining
process needs to be interactive so that users can focus the search for patterns,
providing and refining data mining requests based on the returned results.
Incorporation of background knowledge. - Background knowledge can be used to guide the
discovery process and to express the discovered patterns, not only in concise
terms but also at multiple levels of abstraction.
Data mining query languages and ad hoc data mining. - A data mining query language that
allows the user to describe ad hoc mining tasks should be integrated with a data warehouse
query language and optimized for efficient and flexible data mining.
Presentation and visualization of data mining results. - Once patterns are discovered, they
need to be expressed in high-level languages and visual representations. These representations
should be easily understandable by users.
Handling noisy or incomplete data. - Data cleaning methods are required that can
handle noise and incomplete objects while mining data regularities. Without such
methods, the accuracy of the discovered patterns will be poor.
Pattern evaluation. - This refers to the interestingness of the discovered patterns. Many
discovered patterns are uninteresting because they either represent common knowledge or lack
novelty, so measures are needed to identify the patterns that are truly interesting.
Efficiency and scalability of data mining algorithms. - In order to effectively extract
information from the huge amounts of data in databases, data mining algorithms must be efficient
and scalable.
Parallel, distributed, and incremental mining algorithms. -
Factors such as the huge size of databases, the wide distribution of data, and the complexity of data
mining methods motivate the development of parallel and distributed data mining algorithms.
These algorithms divide the data into partitions, which are processed in parallel. The
results from the partitions are then merged. Incremental algorithms incorporate database updates
without having to mine the entire data again from scratch.
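
The partition, merge, and incremental-update ideas can be sketched with item counting as the mining task. This is a simplified illustration under stated assumptions: the transactions are hypothetical, threads stand in for truly distributed workers, and counting single items is far simpler than a real mining algorithm.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Mine one partition: count in how many transactions each item appears."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts


def parallel_count(transactions, n_partitions=2):
    """Split the data into partitions, process them in parallel, then merge."""
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_results = list(pool.map(count_items, partitions))
    total = Counter()
    for partial in partial_results:
        total.update(partial)   # merging adds the per-partition counts
    return total


def incremental_update(total, new_transactions):
    """Fold new transactions into existing counts without rescanning old data."""
    updated = total.copy()
    updated.update(count_items(new_transactions))
    return updated


# Hypothetical transactions.
transactions = [
    ["computer", "software"],
    ["computer"],
    ["printer"],
    ["computer", "printer"],
]
counts = parallel_count(transactions)
counts_after = incremental_update(counts, [["software", "printer"]])
print(counts)
print(counts_after)
```

The merge step works here because per-partition counts simply add up; mining tasks whose results do not combine this easily need more careful merging.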

Suggested Reading Material

• TEXT BOOKS
Introduction to Data Mining, Tan, Steinbach and Vipin Kumar, Pearson Education, 2016
• REFERENCE BOOKS
Data Mining: Concepts and Techniques, Han, Kamber and Pei, Elsevier
• JOURNALS
http://www.ijsmsjournal.org/ijsms-v4i4p137.html
https://www.springer.com/journal/41060
