Data Mining Tutorial: Gregory Piatetsky-Shapiro Kdnuggets

Introduction Data Mining Tasks Classification & Evaluation Clustering Application Examples

Data Mining

Tutorial
Gregory Piatetsky-Shapiro
KDnuggets

© 2006 KDnuggets
Trends leading to Data Flood
• More data is generated:
  • Web, text, images …
  • Business transactions, calls, ...
  • Scientific data: astronomy, biology, etc.
• More data is captured:
  • Storage technology faster and cheaper
  • DBMS can handle bigger DB

Largest Databases in 2005
Winter Corp. 2005 Commercial Database Survey:
1. Max Planck Inst. for Meteorology, 222 TB
2. Yahoo ~ 100 TB (Largest Data Warehouse)
3. AT&T ~ 94 TB
www.wintercorp.com/VLDB/2005_TopTen_Survey/TopTenWinners_2005.asp

Data Growth

In 2 years (2003 to 2005), the size of the largest database TRIPLED!

Data Growth Rate

• Twice as much information was created in 2002 as in 1999 (~30% growth rate)
• Other growth rate estimates even higher
• Very little data will ever be looked at by a human

Knowledge Discovery is NEEDED to make sense and use of data.
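
As a rough check on the growth figures quoted in these slides, the yearly rate implied by "N-fold over K years" can be worked out from compound growth. This is only a sketch: the helper name `annual_growth_rate` is ours, not from the tutorial, and it assumes growth compounds steadily from year to year.

```python
def annual_growth_rate(multiple: float, years: float) -> float:
    """Yearly rate implied by growing `multiple`-fold over `years` (steady compounding assumed)."""
    return multiple ** (1.0 / years) - 1.0

# Information created doubled between 1999 and 2002 (3 years).
doubling = annual_growth_rate(2.0, 3)
# The largest database tripled between 2003 and 2005 (2 years).
tripling = annual_growth_rate(3.0, 2)

print(f"doubling in 3 years: {doubling:.1%} per year")
print(f"tripling in 2 years: {tripling:.1%} per year")
```

Under that assumption, doubling in three years works out to roughly 26% per year, in the same range as the slide's "~30%", while tripling in two years corresponds to roughly 73% per year.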

Knowledge Discovery Definition
Knowledge Discovery in Data is the non-trivial process of identifying
• valid
• novel
• potentially useful
• and ultimately understandable
patterns in data.
from Advances in Knowledge Discovery and Data Mining, Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy (Chapter 1), AAAI/MIT Press 1996

Related Fields
[Diagram: Data Mining and Knowledge Discovery shown alongside the related fields Machine Learning, Visualization, Statistics, and Databases]

Statistics, Machine Learning and Data Mining
• Statistics:
  • more theory-based
  • more focused on testing hypotheses
• Machine learning:
  • more heuristic
  • focused on improving performance of a learning agent
  • also looks at real-time learning and robotics, areas not part of data mining
• Data Mining and Knowledge Discovery:
  • integrates theory and heuristics
  • focuses on the entire process of knowledge discovery, including data cleaning, learning, and integration and visualization of results
• Distinctions are fuzzy

Knowledge Discovery Process flow, according to CRISP-DM
[Diagram: CRISP-DM process flow with an added Monitoring step]
See www.crisp-dm.org for more information.
Continuous monitoring and improvement is an addition to CRISP.

Historical Note: Many Names of Data Mining
• Data Fishing, Data Dredging: 1960-
  • used by statisticians (as bad name)
• Data Mining: 1990-
  • used in DB community, business
• Knowledge Discovery in Databases (1989-)
  • used by AI, Machine Learning Community
• also Data Archaeology, Information Harvesting, Information Discovery, Knowledge Extraction, ...
Currently: Data Mining and Knowledge Discovery are used interchangeably
Data Mining Tasks

• Instance (also Item or Record):
  • an example, described by a number of attributes,
  • e.g. a day can be described by temperature, humidity and cloud status
• Attribute or Field
  • measuring aspects of the Instance, e.g. temperature
• Class (Label)
  • grouping of instances, e.g. days good for playing
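
One way to picture these three terms is as a small record type. The sketch below is illustrative only: the `Day` type, its field names, the units, and the sample values are our assumptions, not data from the tutorial.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class Day:
    """One instance (item, record): a single day."""
    # Attributes (fields): measured aspects of the instance.
    temperature: float   # assumed unit: degrees Fahrenheit
    humidity: float      # assumed unit: percent
    cloud_status: str    # e.g. "sunny", "overcast", "rainy"
    # Class (label): the group the instance belongs to.
    good_for_playing: bool

# Hypothetical sample values, for illustration only.
instances = [
    Day(85.0, 85.0, "sunny", False),
    Day(83.0, 78.0, "overcast", True),
    Day(68.0, 80.0, "rainy", True),
]

# Separate the descriptive attributes from the class label.
attribute_names = [f.name for f in fields(Day) if f.name != "good_for_playing"]
labels = [day.good_for_playing for day in instances]

print(attribute_names)
print(labels)
```

Keeping the label as a separate field makes the split explicit: the attributes describe a day, while the class says which group it falls into.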

Major Data Mining Tasks
• Classification: predicting an item class
• Clustering: finding clusters in data
• Associations: e.g. A & B & C occur frequently
• Visualization: to facilitate human discovery
• Summarization: describing a group
• Deviation Detection: finding changes
• Estimation: predicting a continuous value
• Link Analysis: finding relationships
• …
Classification
Learn a method for predicting the instance class from pre-labeled (classified) instances

Many approaches: Statistics, Decision Trees, Neural Networks, ...
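
To make "learning from pre-labeled instances" concrete, here is a minimal sketch of a one-level decision tree (a "decision stump") on a single numeric attribute. It is not the tutorial's method, only a toy member of the decision-tree family; the function names and the humidity values are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, bool]  # (threshold, label predicted when value <= threshold)

def train_stump(values: List[float], labels: List[bool]) -> Stump:
    """Pick the threshold and orientation with the fewest training errors."""
    if not values or len(values) != len(labels):
        raise ValueError("need equally many values and labels, at least one")
    best = None  # (errors, threshold, label_if_below)
    for threshold in sorted(set(values)):
        for label_if_below in (True, False):
            errors = sum(
                (label_if_below if v <= threshold else not label_if_below) != y
                for v, y in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_if_below)
    return best[1], best[2]

def predict(stump: Stump, value: float) -> bool:
    """Classify a new instance with the learned threshold."""
    threshold, label_if_below = stump
    return label_if_below if value <= threshold else not label_if_below

# Hypothetical pre-labeled instances: humidity (%) and "good day for playing".
humidity = [65.0, 70.0, 75.0, 80.0, 90.0, 95.0]
play = [True, True, True, True, False, False]

stump = train_stump(humidity, play)
print(stump)                                         # (80.0, True)
print(predict(stump, 72.0), predict(stump, 92.0))    # True False
```

The learned rule here is "humidity at or below 80 means a good day for playing", which the stump then applies to days it has not seen. Real decision-tree learners repeat this kind of split recursively over many attributes.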

Association Rules & Frequent Itemsets

Transactions:
TID  Produce
1    MILK, BREAD, EGGS
2    BREAD, SUGAR
3    BREAD, CEREAL
4    MILK, BREAD, SUGAR
5    MILK, CEREAL
6    BREAD, CEREAL
7    MILK, CEREAL
8    MILK, BREAD, CEREAL, EGGS
9    MILK, BREAD, CEREAL

Frequent Itemsets:
Milk, Bread (4)
Bread, Cereal (4)
Milk, Bread, Cereal (2)
…

Rules:
Milk => Bread (66%)
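
The count for an itemset and the percentage on a rule can be recomputed directly from the nine transactions. The sketch below assumes the usual definitions: support count is the number of transactions containing every item in the set, and the confidence of A => B is support(A and B) divided by support(A). The function names are ours.

```python
from typing import FrozenSet, Iterable, List

# The nine transactions from the slide, keyed implicitly by TID order.
transactions: List[FrozenSet[str]] = [frozenset(t) for t in [
    {"MILK", "BREAD", "EGGS"},
    {"BREAD", "SUGAR"},
    {"BREAD", "CEREAL"},
    {"MILK", "BREAD", "SUGAR"},
    {"MILK", "CEREAL"},
    {"BREAD", "CEREAL"},
    {"MILK", "CEREAL"},
    {"MILK", "BREAD", "CEREAL", "EGGS"},
    {"MILK", "BREAD", "CEREAL"},
]]

def support_count(itemset: Iterable[str]) -> int:
    """Number of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions)

def confidence(antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Fraction of transactions with the antecedent that also contain the consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support_count(both) / support_count(antecedent)

print(support_count({"MILK", "BREAD"}))              # 4
print(support_count({"MILK", "BREAD", "CEREAL"}))    # 2
print(round(confidence({"MILK"}, {"BREAD"}), 3))     # 0.667
```

Milk appears in six transactions and four of those also contain bread, so the rule Milk => Bread holds in 4/6 of the cases, matching the 66% on the slide. Note that `confidence` divides by zero if the antecedent never occurs; a fuller version would guard against that.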

• Presenting the discovered results in a visually "nice" way

• Describe features of the selected group
• Use natural language and graphics
• Usually in combination with Deviation detection or other methods

Average length of stay in this study area rose 45.7 percent, from 4.3 days to 6.2 days, because ...

Find true patterns and avoid overfitting (finding seemingly significant
