Unit - I What Is Data Mining?: Mit Cidco Data Mining BSC (It) Sem - Vi

The document provides an overview of data mining. It defines data mining as extracting usable information from large raw datasets using software to analyze patterns. Data mining is used in business to gain insights into customers and optimize resource usage. It involves collecting, storing, and processing data using algorithms to segment data and predict future events. The document also discusses data mining applications in market analysis, corporate risk management, and fraud detection.

Uploaded by Sonal Bachhao


MIT CIDCO DATA MINING BSc(IT) SEM - VI

UNIT -I
What is Data Mining?
• In simple words, data mining is a process used to extract usable information from a larger set of raw data.
• It involves analyzing patterns in large batches of data using one or more software tools. Data mining has applications in multiple fields, such as science and research.
• As an application of data mining, businesses can learn more about their customers, develop more effective strategies for various business functions, and in turn use their resources in a more optimal and insightful manner.
• This helps businesses move closer to their objectives and make better decisions.
• Data mining involves effective data collection and warehousing as well as computer processing.
• To segment the data and evaluate the probability of future events, data mining uses sophisticated mathematical algorithms.
• Data mining is also known as Knowledge Discovery in Databases (KDD), although strictly speaking it is the core pattern-discovery step within the wider KDD process.
• Key features of data mining:
1. Automatic pattern prediction based on trend and behavior analysis.
2. Prediction based on likely outcomes.
3. Creation of decision-oriented information.
4. Focus on large data sets and databases for analysis.
5. Clustering based on finding and visually documenting groups of facts not previously known.



Definition:
• Data mining is defined as extracting information from huge sets of data. In other words, data mining is the procedure of mining knowledge from data.

Prepared by : Asst.Prof. Rutuja Sontakke Page 1


DBMS VS DATA MINING :

• A DBMS (Database Management System) is a complete system for managing digital databases. It supports storage of database content, creation and maintenance of data, search, and other functionalities.
• Data mining, on the other hand, is a field of computer science that deals with the extraction of previously unknown and interesting information from raw data.
• Usually, the data used as input to the data mining process is stored in databases.
• Users who are inclined toward statistics use data mining. They apply statistical models to look for hidden patterns in data.
• Data miners are interested in finding useful relationships between different data elements, which can ultimately be profitable for businesses.
• A DBMS is a full-fledged system for housing and managing a set of digital databases.
• Data mining, however, is a technique or concept in computer science that deals with extracting useful and previously unknown information from raw data.
• Most of the time, this raw data is stored in very large databases.
• Therefore, data miners use the existing functionality of a DBMS to handle, manage, and even preprocess raw data before and during the data mining process.
• However, a DBMS on its own is not designed to discover hidden patterns in data, although some present-day DBMSs include built-in data analysis tools or capabilities.
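A minimal sketch of this division of labour follows. It assumes Python's built-in `sqlite3` module as a stand-in for the DBMS and uses a tiny hypothetical `sales` table. The DBMS stores the rows and answers an explicit SQL query. A few lines of mining code then look for items that are bought together, which is a pattern nobody asked the database for directly.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# DBMS side: store hypothetical raw transactions in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (txn_id INTEGER, item TEXT)")
rows = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "butter"),
    (3, "bread"), (3, "biscuits"),
    (4, "milk"), (4, "butter"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# The DBMS answers an explicit question: how often was each item sold?
item_counts = dict(
    conn.execute("SELECT item, COUNT(*) FROM sales GROUP BY item").fetchall()
)

# Data mining side: use the DBMS only to fetch and group the raw data,
# then search for a pattern that was not asked for explicitly:
# which pairs of items appear in the same transaction?
baskets = {}
for txn_id, item in conn.execute("SELECT txn_id, item FROM sales ORDER BY txn_id"):
    baskets.setdefault(txn_id, set()).add(item)
conn.close()

pair_counts = Counter(
    pair
    for items in baskets.values()
    for pair in combinations(sorted(items), 2)
)

print(item_counts)
print(pair_counts.most_common(2))
```

The database work (storage, `GROUP BY`) and the pattern search are deliberately kept separate here. This mirrors the point above that data miners rely on the DBMS for data handling and preprocessing.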

Issues and Challenges in DM:

Data mining is not an easy task. The algorithms used can get very complex, and data is not always available in one place; it often needs to be integrated from various heterogeneous data sources. These factors also create some issues. Here we discuss the major issues regarding −

• Mining Methodology and User Interaction
• Performance Issues
• Diverse Data Types Issues


The following diagram describes the major issues.


Data Mining Applications

Data mining is highly useful in the following domains −

• Market Analysis and Management
• Corporate Analysis & Risk Management
• Fraud Detection

Apart from these, data mining can also be used in areas such as production control, customer retention, science exploration, sports, astrology, and Internet Web Surf-Aid.

Market Analysis and Management

Listed below are the various fields of marketing where data mining is used −

• Customer Profiling − Data mining helps determine what kind of people buy what kind of products.
• Identifying Customer Requirements − Data mining helps identify the best products for different customers. It uses prediction to find the factors that may attract new customers.
• Cross Market Analysis − Data mining finds associations and correlations between product sales.
• Target Marketing − Data mining helps find clusters of model customers who share the same characteristics, such as interests, spending habits, and income.
• Determining Customer Purchasing Patterns − Data mining helps determine customers' purchasing patterns.
• Providing Summary Information − Data mining provides various multidimensional summary reports.
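The following is a rough illustration of a multidimensional summary report. The sketch totals hypothetical sales records along any chosen combination of dimensions (region, product, quarter). This is similar to viewing a data cube from different angles. The records, the `DIMENSIONS` mapping, and the `summarize` helper are all invented for this example.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, quarter, amount).
SALES = [
    ("North", "Laptop", "Q1", 1200),
    ("North", "Printer", "Q1", 300),
    ("South", "Laptop", "Q1", 900),
    ("North", "Laptop", "Q2", 1500),
    ("South", "Printer", "Q2", 450),
    ("South", "Laptop", "Q2", 1100),
]

# Position of each dimension inside a record.
DIMENSIONS = {"region": 0, "product": 1, "quarter": 2}

def summarize(records, *dims):
    """Total the sales amount for every combination of the requested dimensions."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[DIMENSIONS[d]] for d in dims)
        totals[key] += record[3]
    return dict(totals)

by_region = summarize(SALES, "region")
by_region_product = summarize(SALES, "region", "product")
print(by_region)
print(by_region_product)
```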

Corporate Analysis and Risk Management

Data mining is used in the following fields of the Corporate Sector −

• Finance Planning and Asset Evaluation − It involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets.
• Resource Planning − It involves summarizing and comparing resources and spending.
• Competition − It involves monitoring competitors and market directions.


Fraud Detection
Data mining is also used in credit card services and telecommunications to detect fraud. For fraudulent telephone calls, it helps identify the destination of the call, the duration of the call, the time of day or week, and so on. It also analyzes patterns that deviate from expected norms.
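The idea of flagging patterns that "deviate from expected norms" can be sketched as a per-subscriber profile check. This minimal sketch makes two simplifying assumptions: it uses a single feature (call duration) and a z-score threshold of 3. Both choices are for illustration only. A real fraud system would combine several features, such as destination and time of day, as the text notes.

```python
from statistics import mean, stdev

# Hypothetical call history for one subscriber: call durations in minutes.
HISTORY = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 3.0]

def is_suspicious(duration, past, threshold=3.0):
    """Flag a new call whose duration lies more than `threshold` standard
    deviations from the subscriber's historical mean (a z-score rule)."""
    mu = mean(past)
    sigma = stdev(past)
    if sigma == 0:
        # No variation in the history: any different value is a deviation.
        return duration != mu
    return abs(duration - mu) / sigma > threshold

print(is_suspicious(4.0, HISTORY))   # an ordinary call
print(is_suspicious(45.0, HISTORY))  # an unusually long call
```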

DM Applications-Case Studies


Current Trends Affecting DM:


Basic Data Mining Tasks:

Data mining deals with the kinds of patterns that can be mined. On the basis of the kind of data to be mined, there are two categories of functions involved in data mining −

• Descriptive
• Classification and Prediction
Descriptive Function
The descriptive function deals with the general properties of data in the database. Here is the list of
descriptive functions −

• Class/Concept Description
• Mining of Frequent Patterns
• Mining of Associations
• Mining of Correlations
• Mining of Clusters


Class/Concept Description
Class/Concept refers to the data to be associated with classes or concepts. For example, in a company, the classes of items for sale include computers and printers, and the concepts of customers include big spenders and budget spenders. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions can be derived in the following two ways −

• Data Characterization − This refers to summarizing the data of the class under study. The class under study is called the target class.
• Data Discrimination − This refers to comparing the target class with one or more predefined contrasting groups or classes.
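The sketch below shows both derivations on hypothetical customer records, using the text's "big spender" and "budget spender" concepts. Characterization summarizes the target class. Discrimination compares that summary with a contrasting class, here by taking differences. The record layout and the chosen summary features (average spend, average visits) are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical customer records: (customer_id, concept, yearly_spend, visits_per_month).
CUSTOMERS = [
    ("c1", "big spender", 52000, 6),
    ("c2", "big spender", 61000, 8),
    ("c3", "budget spender", 8000, 2),
    ("c4", "budget spender", 12000, 3),
    ("c5", "budget spender", 10000, 4),
]

def characterize(records, target):
    """Data characterization: summarize the general features of the target class."""
    rows = [r for r in records if r[1] == target]
    return {
        "count": len(rows),
        "avg_spend": mean(r[2] for r in rows),
        "avg_visits": mean(r[3] for r in rows),
    }

def discriminate(records, target, contrasting):
    """Data discrimination: compare the target class with a contrasting class
    by taking the difference of their summary features."""
    t = characterize(records, target)
    c = characterize(records, contrasting)
    return {key: t[key] - c[key] for key in ("avg_spend", "avg_visits")}

print(characterize(CUSTOMERS, "big spender"))
print(discriminate(CUSTOMERS, "big spender", "budget spender"))
```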

Mining of Frequent Patterns

Frequent patterns are those patterns that occur frequently in transactional data. Here is the list of kinds of frequent patterns −

• Frequent Item Set − A set of items that frequently appear together, for example, milk and bread.
• Frequent Subsequence − A sequence of patterns that occurs frequently, such as purchasing a camera followed by purchasing a memory card.
• Frequent Sub Structure − Substructure refers to different structural forms, such as graphs, trees, or lattices, which may be combined with item-sets or subsequences.
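Frequent item sets can be found by counting how often each combination of items appears and keeping those above a minimum support. Support is the fraction of transactions that contain the item set. The following minimal brute-force sketch uses hypothetical transactions:

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "biscuits"},
    {"milk", "bread", "biscuits"},
    {"milk", "butter"},
]

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force frequent itemset mining.

    Counts every itemset of up to `max_size` items and keeps those whose
    support (fraction of transactions containing the itemset) is at least
    `min_support`. Itemsets are returned as sorted tuples.
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

print(frequent_itemsets(TRANSACTIONS, min_support=0.6))
```

Enumerating every combination only works for tiny examples like this one. Practical systems use algorithms such as Apriori or FP-growth, which avoid counting item sets that cannot be frequent.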

Mining of Association
Associations are used in retail sales to identify items that are frequently purchased together. This process refers to uncovering the relationships among data and determining association rules.

For example, a retailer may derive an association rule showing that 70% of the time milk is sold with bread, while only 30% of the time biscuits are sold with bread.
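One natural reading of the 70% and 30% figures is as the *confidence* of the rules bread → milk and bread → biscuits. Confidence is the fraction of bread-containing transactions that also contain the other item. The sketch below reproduces those numbers on hypothetical data built to match them. Support and confidence are standard association-rule measures, but the helper names here are only illustrative.

```python
# Hypothetical transactions, built so that 10 baskets contain bread:
# 7 of them also contain milk and 3 also contain biscuits.
TRANSACTIONS = (
    [frozenset({"bread", "milk"})] * 7
    + [frozenset({"bread", "biscuits"})] * 3
    + [frozenset({"milk"})] * 2
)

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(A -> B) = count(A and B together) / count(A)."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

print(confidence({"bread"}, {"milk"}, TRANSACTIONS))
print(confidence({"bread"}, {"biscuits"}, TRANSACTIONS))
print(support({"bread", "milk"}, TRANSACTIONS))
```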

Mining of Correlations
This is an additional analysis performed to uncover interesting statistical correlations between associated attribute-value pairs or between two item sets. It determines whether they have a positive, negative, or no effect on each other.
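The text does not name a specific correlation measure. This sketch assumes *lift*, which is a common choice for item sets:

lift(A, B) = P(A and B) / (P(A) · P(B))

A lift above 1 suggests a positive correlation, a lift below 1 suggests a negative correlation, and a lift of exactly 1 suggests independence. The transactions below are hypothetical.

```python
# Hypothetical transactions.
TRANSACTIONS = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
    {"milk", "tea"},
    {"tea"},
    {"bread", "tea"},
]

def lift(a, b, transactions):
    """lift(A, B) = P(A and B) / (P(A) * P(B)), computed from counts."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    return (n_ab * n) / (n_a * n_b)

def correlation_type(value, tol=1e-9):
    """Interpret a lift value as positive, negative, or no correlation."""
    if value > 1 + tol:
        return "positive"
    if value < 1 - tol:
        return "negative"
    return "independent"

for a, b in [("bread", "butter"), ("bread", "tea")]:
    value = lift(a, b, TRANSACTIONS)
    print(a, b, round(value, 3), correlation_type(value))
```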

Mining of Clusters
A cluster refers to a group of similar kinds of objects. Cluster analysis refers to forming groups of objects that are very similar to each other but highly different from the objects in other clusters.
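The sketch below is a minimal implementation of k-means, one common clustering algorithm. The text itself does not prescribe a specific algorithm. The sketch assumes numeric 2-D points and Euclidean distance. It also uses a naive initialization that takes the first k points as starting centroids. The points are hypothetical and could stand for, say, two customer attributes.

```python
import math

# Hypothetical 2-D data points forming two visibly separate groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

def kmeans(points, k, iterations=10):
    """Minimal k-means: use the first k points as initial centroids, then
    alternate between assigning each point to its nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans(POINTS, k=2)
print(centroids)
print(clusters)
```

Objects inside each resulting cluster are close to one another and far from the other cluster, which matches the definition above. `math.dist` requires Python 3.8 or later.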


Classification and Prediction

Classification is the process of finding a model that describes the data classes or concepts. The purpose is to use this model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data. It can be presented in the following forms −

• Classification (IF-THEN) Rules
• Decision Trees
• Mathematical Formulae
• Neural Networks
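To make "deriving a model from training data" concrete, this sketch learns the simplest possible decision tree, a one-level decision stump. It then reports the result as an IF-THEN rule. The attributes, training records, and `yes`/`no` class labels are all hypothetical. Real classifiers, such as full decision trees or neural networks, are considerably more elaborate.

```python
# Hypothetical training data: (age, income in thousands) -> buys_computer label.
ATTRIBUTES = ("age", "income")
TRAINING = [
    ((25, 30), "no"),
    ((35, 60), "yes"),
    ((45, 80), "yes"),
    ((22, 20), "no"),
    ((52, 90), "yes"),
    ((30, 35), "no"),
]

def learn_stump(data):
    """Return (attribute_index, threshold) for the rule
    'IF x[attr] > threshold THEN yes ELSE no' with the fewest training errors."""
    best = None  # (errors, attribute_index, threshold)
    for attr in range(len(data[0][0])):
        for threshold in sorted({x[attr] for x, _ in data}):
            errors = sum(
                1 for x, label in data
                if ("yes" if x[attr] > threshold else "no") != label
            )
            if best is None or errors < best[0]:
                best = (errors, attr, threshold)
    return best[1], best[2]

def classify(x, attr, threshold):
    """Apply the learned IF-THEN rule to an object whose class label is unknown."""
    return "yes" if x[attr] > threshold else "no"

attr, threshold = learn_stump(TRAINING)
print(f"IF {ATTRIBUTES[attr]} > {threshold} THEN buys_computer = yes ELSE buys_computer = no")
print(classify((40, 50), attr, threshold))
```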
The list of functions involved in these processes is as follows −

• Classification − It predicts the class of objects whose class label is unknown. Its objective is to find a derived model that describes and distinguishes data classes or concepts. The derived model is based on the analysis of a set of training data, i.e., data objects whose class labels are known.
• Prediction − It is used to predict missing or unavailable numerical data values rather than class labels. Regression analysis is generally used for prediction. Prediction can also be used to identify distribution trends based on the available data.
• Outlier Analysis − Outliers may be defined as data objects that do not comply with the general behavior or model of the available data.
• Evolution Analysis − Evolution analysis describes and models regularities or trends for objects whose behavior changes over time.
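Two of the functions above can be sketched in a few lines. The first is prediction with simple linear regression, using ordinary least squares, which is one common form of regression analysis. The second is outlier analysis with the interquartile-range (IQR) rule. The IQR rule is only one of several possible outlier tests and is assumed here for illustration. The data and helper names are hypothetical. `statistics.quantiles` requires Python 3.8 or later.

```python
from statistics import mean, quantiles

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3 (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data: advertising spend (x) vs. units sold (y).
a, b = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(a + b * 6)  # predicted units sold for a spend of 6
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```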

Data Mining Task Primitives

• We can specify a data mining task in the form of a data mining query.
• This query is input to the system.
• A data mining query is defined in terms of data mining task primitives.
Note − These primitives allow us to communicate interactively with the data mining system. Here is the list of data mining task primitives −

• Set of task-relevant data to be mined.
• Kind of knowledge to be mined.
• Background knowledge to be used in the discovery process.
• Interestingness measures and thresholds for pattern evaluation.
• Representation for visualizing the discovered patterns.
Set of task relevant data to be mined
This is the portion of the database in which the user is interested. This portion includes the following −

• Database attributes
• Data warehouse dimensions of interest
Kind of knowledge to be mined
It refers to the kind of functions to be performed. These functions are −

• Characterization
• Discrimination
• Association and Correlation Analysis
• Classification
• Prediction
• Clustering
• Outlier Analysis
• Evolution Analysis
Background knowledge
Background knowledge allows data to be mined at multiple levels of abstraction. For example, concept hierarchies are one kind of background knowledge that lets data be mined at multiple levels of abstraction.
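At its simplest, a concept hierarchy maps each lower-level value to its parent. The sketch below rolls hypothetical city-level sales up to the state level and then to the country level. The hierarchy and the figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: city -> state -> country.
CITY_TO_STATE = {
    "Mumbai": "Maharashtra",
    "Pune": "Maharashtra",
    "Aurangabad": "Maharashtra",
    "Bengaluru": "Karnataka",
    "Mysuru": "Karnataka",
}
STATE_TO_COUNTRY = {"Maharashtra": "India", "Karnataka": "India"}

# Hypothetical sales totals at the lowest (city) level.
SALES_BY_CITY = {"Mumbai": 120, "Pune": 80, "Aurangabad": 40, "Bengaluru": 150, "Mysuru": 30}

def roll_up(totals, parent_of):
    """Move one level up the hierarchy by adding each child's value to its parent."""
    result = defaultdict(int)
    for child, value in totals.items():
        result[parent_of[child]] += value
    return dict(result)

sales_by_state = roll_up(SALES_BY_CITY, CITY_TO_STATE)
sales_by_country = roll_up(sales_by_state, STATE_TO_COUNTRY)
print(sales_by_state)
print(sales_by_country)
```

A pattern that is too sparse to detect at city level may become visible at state or country level. This is why such hierarchies count as useful background knowledge.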

Interestingness measures and thresholds for pattern evaluation

This is used to evaluate the patterns discovered by the knowledge discovery process. There are different interestingness measures for different kinds of knowledge; for example, support and confidence are commonly used for association rules.
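The following minimal sketch shows how user-supplied thresholds decide which discovered association rules are reported. It is restricted to single-item rules A → B over hypothetical transactions. The `min_support` and `min_confidence` values are arbitrary choices for the example.

```python
from itertools import permutations

# Hypothetical transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

def mine_rules(transactions, min_support, min_confidence):
    """Generate single-item rules A -> B and keep those meeting both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(1 for t in transactions if a in t)
        n_ab = sum(1 for t in transactions if a in t and b in t)
        if n_a == 0:
            continue
        support = n_ab / n
        confidence = n_ab / n_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

for a, b, s, c in mine_rules(TRANSACTIONS, min_support=0.4, min_confidence=0.7):
    print(f"{a} -> {b}  support={s:.2f}  confidence={c:.2f}")
```

Raising either threshold shrinks the output. This is how the user controls which of the discovered patterns count as "interesting".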

Representation for visualizing the discovered patterns

This refers to the form in which discovered patterns are to be displayed. These representations may include the following −

• Rules
• Tables
• Charts
• Graphs
• Decision Trees
• Cubes

DM TECHNIQUES: Given in the classroom.

Theory questions on unit – I

1. What is data mining? Explain in brief.
2. Discuss different data mining tasks.
3. Give a brief account of data mining techniques.
4. What is clustering of data? Distinguish between supervised and unsupervised classification.
5. What are the different application areas of data mining?
