Unit - I What Is Data Mining?: MIT CIDCO Data Mining BSc (IT) Sem - VI

The document provides an overview of data mining. It defines data mining as extracting usable information from large raw datasets using software to analyze patterns. Data mining is used in business to gain insights into customers and optimize resource usage. It involves collecting, storing, and processing data using algorithms to segment data and predict future events. The document also discusses data mining applications in market analysis, corporate risk management, and fraud detection.

Uploaded by Sonal Bachhao
Copyright © All Rights Reserved

MIT CIDCO DATA MINING BSc(IT) SEM - VI

UNIT -I
What is Data Mining?
• In simple terms, data mining is the process of extracting usable information from a larger set of raw data.
• It involves analyzing patterns in large batches of data using one or more software tools. Data mining has applications in many fields, such as science and research.
• As an application of data mining, businesses can learn more about their customers, develop more effective strategies for various business functions, and in turn use their resources in a more optimal and insightful manner.
• This helps businesses move closer to their objectives and make better decisions.
• Data mining involves effective data collection and warehousing as well as computer processing.
• To segment the data and evaluate the probability of future events, data mining uses sophisticated mathematical algorithms.
• Data mining is also known as Knowledge Discovery in Databases (KDD), although strictly speaking it is one step of the KDD process.
• Key features of data mining:
1. Automatic pattern prediction based on trend and behavior analysis.
2. Prediction based on likely outcomes.
3. Creation of decision-oriented information.
4. Focus on large data sets and databases for analysis.
5. Clustering, i.e. finding and visually documenting groups of facts that were not previously known.



Definition:
• Data mining is defined as extracting information from huge sets of data. In other words, data mining is the procedure of mining knowledge from data.

Prepared by : Asst.Prof. Rutuja Sontakke Page 1



DBMS VS DATA MINING:

• A DBMS (Database Management System) is a complete system for managing digital databases. It supports storing database content, creating and maintaining data, searching, and other functionality.
• Data mining, on the other hand, is a field of computer science that deals with extracting previously unknown and interesting information from raw data.
• Data mining is used by statistically inclined users, who apply statistical models to look for hidden patterns in data.
• Data miners are interested in finding useful relationships between different data elements, which can ultimately be profitable for businesses.
• The input to the data mining process is usually stored in databases, most of the time very large ones.
• Therefore, data miners use the existing functionality of a DBMS to handle, manage, and even preprocess raw data before and during the data mining process.
• However, a DBMS alone cannot be used to analyze data, although some present-day DBMSs have built-in data analysis tools or capabilities.
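As a minimal sketch of how the two fit together, the example below uses Python's built-in `sqlite3` module as the DBMS. The `sales` table, its columns, and its rows are hypothetical. The DBMS stores the raw rows and does simple preprocessing in SQL, while the mining step that finds items bought together runs outside it.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# In-memory SQLite database standing in for the DBMS. The table name,
# columns, and rows are hypothetical; note the messy duplicate in tid 4.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"),
     (3, "bread"), (3, "milk"), (3, "butter"), (4, "milk"), (4, " Milk ")],
)

# DBMS step: storage plus simple preprocessing in SQL
# (trim spaces, lower-case, drop duplicate rows).
rows = conn.execute(
    "SELECT DISTINCT tid, LOWER(TRIM(item)) FROM sales ORDER BY tid"
).fetchall()

# Mining step, outside the DBMS: rebuild each basket and count how often
# every pair of items is bought together.
baskets = {}
for tid, item in rows:
    baskets.setdefault(tid, set()).add(item)

pair_counts = Counter()
for items in baskets.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(2))
```

The split mirrors the point above. SQL handles cleaning and retrieval well. The pattern search itself is not something the DBMS does on its own.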

Issues and Challenges in DM:


Data mining is not an easy task. The algorithms used can get very complex, and data is not always available in one place; it has to be integrated from various heterogeneous data sources. These factors create several issues. The major issues concern:

• Mining Methodology and User Interaction
• Performance Issues
• Diverse Data Types Issues


Figure: The major issues in data mining (diagram not reproduced).


Data Mining Applications


Data mining is highly useful in the following domains:

• Market Analysis and Management
• Corporate Analysis & Risk Management
• Fraud Detection

Apart from these, data mining can also be used in production control, customer retention, science exploration, sports, astrology, and Internet Web Surf-Aid.

Market Analysis and Management

Listed below are the various fields of marketing where data mining is used:

• Customer Profiling: Data mining helps determine what kind of people buy what kind of products.

• Identifying Customer Requirements: Data mining helps identify the best products for different customers. It uses prediction to find the factors that may attract new customers.

• Cross-Market Analysis: Data mining performs association/correlation analysis between product sales.

• Target Marketing: Data mining helps find clusters of model customers who share the same characteristics, such as interests, spending habits, income, etc.

• Determining Customer Purchasing Patterns: Data mining helps determine customer purchasing patterns.

• Providing Summary Information: Data mining provides various multidimensional summary reports.

Corporate Analysis and Risk Management

Data mining is used in the following fields of the corporate sector:

• Finance Planning and Asset Evaluation: This involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets.

• Resource Planning: This involves summarizing and comparing resources and spending.

• Competition: This involves monitoring competitors and market directions.


Fraud Detection
Data mining is also used in credit card services and telecommunications to detect fraud. For fraudulent telephone calls, it helps find the destination of the call, the duration of the call, the time of day or week, etc. It also analyzes patterns that deviate from expected norms.
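One simple way to flag patterns that "deviate from expected norms" is a z-score test against a customer's own history. The sketch below is an illustration, not the method used by any particular card or telecom provider. The call durations and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, pstdev

# Hypothetical call durations (minutes) from one customer's normal history.
history = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 3.0]
new_calls = [3.8, 48.0, 1.5]

mu = mean(history)
sigma = pstdev(history)  # population standard deviation of the history


def is_suspicious(duration, threshold=3.0):
    """Flag a call whose duration lies more than `threshold` standard
    deviations away from this customer's usual behaviour."""
    return abs(duration - mu) / sigma > threshold


flagged = [d for d in new_calls if is_suspicious(d)]
print(flagged)
```

Real fraud systems combine many attributes (destination, time of day, and so on), but the idea of scoring each event against a learned norm is the same.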

DM Applications-Case Studies


Current Trends Affecting DM:


Basic Data Mining Tasks:

Data mining deals with the kinds of patterns that can be mined. On the basis of the kind of data to be mined, there are two categories of functions involved in data mining:

• Descriptive
• Classification and Prediction
Descriptive Function
The descriptive function deals with the general properties of data in the database. Here is the list of descriptive functions:

• Class/Concept Description
• Mining of Frequent Patterns
• Mining of Associations
• Mining of Correlations
• Mining of Clusters


Class/Concept Description
Class/Concept refers to the data to be associated with classes or concepts. For example, in a company, the classes of items for sale include computers and printers, and the concepts of customers include big spenders and budget spenders. Such descriptions of a class or a concept are called class/concept descriptions. These descriptions can be derived in the following two ways:

• Data Characterization: This refers to summarizing the data of the class under study. The class under study is called the Target Class.

• Data Discrimination: This refers to comparing the general features of the target class with those of one or more contrasting (predefined) classes.
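A minimal sketch of the two approaches, using the big spender / budget spender example from the text. The customer records, the attributes (age, income), and the choice of averages as the summary are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical customer records: (age, annual_income_in_thousands, segment)
customers = [
    (45, 120, "big"), (39, 90, "big"), (51, 150, "big"),
    (22, 30, "budget"), (27, 40, "budget"), (32, 35, "budget"),
]


def characterize(records, segment):
    """Data characterization: summarize the general features of one class."""
    rows = [r for r in records if r[2] == segment]
    return {
        "count": len(rows),
        "avg_age": mean(r[0] for r in rows),
        "avg_income_k": mean(r[1] for r in rows),
    }


target = characterize(customers, "big")        # target class: big spenders
contrast = characterize(customers, "budget")   # contrasting class

# Data discrimination: compare the target class with the contrasting class.
discrimination = {
    key: target[key] - contrast[key] for key in ("avg_age", "avg_income_k")
}
print(target, discrimination)
```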

Mining of Frequent Patterns

Frequent patterns are patterns that occur frequently in transactional data. The kinds of frequent patterns are:

• Frequent Itemset: A set of items that frequently appear together, for example, milk and bread.

• Frequent Subsequence: A sequence of patterns that occurs frequently, such as purchasing a camera followed by a memory card.

• Frequent Substructure: Substructure refers to different structural forms, such as graphs, trees, or lattices, which may be combined with itemsets or subsequences.
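The sketch below finds frequent itemsets by brute force. It counts every itemset of up to two items and keeps those whose support (the fraction of transactions containing them) reaches a minimum threshold. The baskets and the 60% threshold are hypothetical. Practical miners use smarter algorithms such as Apriori or FP-growth to avoid enumerating every combination.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread", "biscuits"},
    {"milk"},
]


def frequent_itemsets(transactions, min_support=0.6, max_size=2):
    """Brute force: count every itemset of up to `max_size` items and keep
    those that appear in at least `min_support` of the transactions."""
    counts = Counter()
    for t in transactions:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(t), size):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


result = frequent_itemsets(transactions)
print(result)
```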

Mining of Associations
Associations are used in retail sales to identify items that are frequently purchased together. This refers to the process of uncovering relationships among data and determining association rules.

For example, a retailer may generate an association rule showing that milk is sold with bread 70% of the time, while biscuits are sold with bread only 30% of the time.
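The percentages in this example read most naturally as the confidence of the rules bread ⇒ milk and bread ⇒ biscuits. That interpretation is an assumption, since the text does not name the measure. Confidence is the share of transactions containing the antecedent that also contain the consequent. The transactions below are hypothetical and were built to reproduce the 70% / 30% split.

```python
# Hypothetical transactions: 10 contain bread (7 with milk, 3 with biscuits),
# plus 2 milk-only transactions.
transactions = (
    [{"bread", "milk"}] * 7
    + [{"bread", "biscuits"}] * 3
    + [{"milk"}] * 2
)


def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of antecedent => consequent:
    count(antecedent and consequent) / count(antecedent)."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)


print(confidence({"bread"}, {"milk"}, transactions))
print(confidence({"bread"}, {"biscuits"}, transactions))
```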

Mining of Correlations
This is a kind of additional analysis performed to uncover interesting statistical correlations between associated attribute-value pairs or between two itemsets, to determine whether they have a positive, negative, or no effect on each other.
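The text does not name a correlation measure. One common choice for itemsets is lift, P(A and B) / (P(A) · P(B)). A lift above 1 suggests a positive correlation, below 1 a negative one, and exactly 1 independence. The sketch below assumes lift and uses hypothetical transactions.

```python
# Hypothetical transactions (10 in total).
transactions = (
    [{"milk", "bread"}] * 4
    + [{"milk"}]
    + [{"bread"}]
    + [{"cola"}] * 3
    + [{"cola", "bread"}]
)


def count(items, transactions):
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def lift(a, b, transactions):
    """lift(A, B) = P(A and B) / (P(A) * P(B)), computed from raw counts."""
    n = len(transactions)
    return n * count({a, b}, transactions) / (
        count({a}, transactions) * count({b}, transactions)
    )


def correlation(a, b, transactions):
    """Label the relationship between two items using their lift."""
    value = lift(a, b, transactions)
    if value > 1:
        return "positive"
    if value < 1:
        return "negative"
    return "independent"


print(correlation("milk", "bread", transactions))
print(correlation("cola", "bread", transactions))
```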

Mining of Clusters
A cluster is a group of objects of a similar kind. Cluster analysis refers to forming groups of objects that are very similar to each other but highly different from the objects in other clusters.
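k-means is one widely used clustering algorithm, chosen here only as an example; the notes do not prescribe a specific method. It repeatedly assigns each object to its nearest cluster centre and then moves each centre to the mean of its members. The customer points (income, spending score) are hypothetical. Using the first k points as starting centres is a simplification that keeps the sketch deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then move each
    centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # simple deterministic initialisation
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


# Hypothetical customers: (annual income in thousands, spending score)
points = [(20, 15), (80, 85), (22, 18), (25, 12), (78, 90), (85, 80)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

Points in the same cluster end up close to each other and far from the other cluster's points, which is the property described above.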


Classification and Prediction


Classification is the process of finding a model that describes data classes or concepts. The purpose is to be able to use this model to predict the class of objects whose class label is unknown. The derived model is based on the analysis of a set of training data. It can be presented in the following forms:

• Classification (IF-THEN) Rules
• Decision Trees
• Mathematical Formulae
• Neural Networks
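To show what a model in IF-THEN rule form looks like, here is a small hand-written rule set. In practice the rules would be derived from training data. The attributes (`age`, `student`, `income`), the class labels, and the rules themselves are hypothetical.

```python
# Hypothetical ordered rule set: (IF-condition, THEN-class label).
RULES = [
    (lambda c: c["age"] < 25 and c["student"], "buys_computer"),
    (lambda c: c["income"] == "high", "buys_computer"),
]
DEFAULT = "does_not_buy"  # class used when no rule fires


def classify(record):
    """Apply the IF-THEN rules in order; the first rule whose
    IF-part holds determines the predicted class."""
    for condition, label in RULES:
        if condition(record):
            return label
    return DEFAULT


print(classify({"age": 22, "student": True, "income": "low"}))
print(classify({"age": 40, "student": False, "income": "medium"}))
```

Checking rules in a fixed order with a default class is one design choice among several. Other schemes resolve conflicting rules by size or accuracy instead.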
The functions involved in these processes are as follows:

• Classification: It predicts the class of objects whose class label is unknown. Its objective is to find a derived model that describes and distinguishes data classes or concepts. The derived model is based on the analysis of a set of training data, i.e. data objects whose class labels are well known.

• Prediction: It is used to predict missing or unavailable numerical data values rather than class labels. Regression analysis is generally used for prediction. Prediction can also be used to identify distribution trends based on available data.

• Outlier Analysis: Outliers may be defined as data objects that do not comply with the general behavior or model of the available data.

• Evolution Analysis: Evolution analysis refers to describing and modeling regularities or trends for objects whose behavior changes over time.
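Since regression analysis is named as the usual tool for prediction, here is a minimal sketch of simple linear regression fitted by ordinary least squares. It predicts a numeric value rather than a class label. The advertising-spend and sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx           # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Hypothetical data: advertising spend vs. sales (both in thousands).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

a, b = fit_line(spend, sales)
predicted = a + b * 6  # predict sales for an unseen spend of 6
print(a, b, predicted)
```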

Data Mining Task Primitives

• We can specify a data mining task in the form of a data mining query.
• This query is input to the system.
• A data mining query is defined in terms of data mining task primitives.
Note: These primitives allow us to communicate with the data mining system interactively. Here is the list of data mining task primitives:

• Set of task-relevant data to be mined.
• Kind of knowledge to be mined.
• Background knowledge to be used in the discovery process.
• Interestingness measures and thresholds for pattern evaluation.
• Representation for visualizing the discovered patterns.
Set of task-relevant data to be mined
This is the portion of the database in which the user is interested. This portion includes the following:

• Database attributes
• Data warehouse dimensions of interest
Kind of knowledge to be mined
This refers to the kind of functions to be performed. These functions are:

• Characterization
• Discrimination
• Association and Correlation Analysis
• Classification
• Prediction
• Clustering
• Outlier Analysis
• Evolution Analysis
Background knowledge
Background knowledge allows data to be mined at multiple levels of abstraction. Concept hierarchies are one kind of background knowledge: for example, a hierarchy such as city < state < country lets data be summarized and mined at the city, state, or country level.
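A minimal sketch of how a concept hierarchy lets the same data be viewed at a higher level of abstraction. Sales counts are "rolled up" from cities to states to a country. The hierarchy and the sales figures are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy: city -> state -> country.
CITY_TO_STATE = {"Mumbai": "Maharashtra", "Pune": "Maharashtra",
                 "Aurangabad": "Maharashtra", "Bengaluru": "Karnataka"}
STATE_TO_COUNTRY = {"Maharashtra": "India", "Karnataka": "India"}

# Hypothetical sales counts at the lowest (city) level.
sales_by_city = Counter({"Mumbai": 120, "Pune": 80, "Aurangabad": 40, "Bengaluru": 90})


def roll_up(counts, mapping):
    """Climb one level of the concept hierarchy by summing child counts."""
    rolled = Counter()
    for key, value in counts.items():
        rolled[mapping[key]] += value
    return rolled


by_state = roll_up(sales_by_city, CITY_TO_STATE)
by_country = roll_up(by_state, STATE_TO_COUNTRY)
print(by_state, by_country)
```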

Interestingness measures and thresholds for pattern evaluation


This is used to evaluate the patterns discovered by the knowledge discovery process. There are different interestingness measures for different kinds of knowledge; for association rules, for example, support and confidence are commonly used.
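As an illustration of thresholds for pattern evaluation, the sketch filters a list of already-mined association rules and keeps only those meeting user-specified minimum support and minimum confidence. The rules, their measure values, and the default thresholds are hypothetical.

```python
# Hypothetical rules from an earlier mining step:
# (antecedent, consequent, support, confidence)
rules = [
    ("bread", "milk", 0.35, 0.70),
    ("bread", "biscuits", 0.15, 0.30),
    ("butter", "bread", 0.02, 0.90),
]


def interesting(rules, min_support=0.05, min_confidence=0.5):
    """Keep only the rules that pass both user-specified thresholds."""
    return [r for r in rules if r[2] >= min_support and r[3] >= min_confidence]


kept = interesting(rules)
print(kept)
```

Changing the thresholds changes which patterns count as interesting. This is why the user supplies them as part of the task rather than leaving them fixed in the system.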

Representation for visualizing the discovered patterns


This refers to the form in which discovered patterns are to be displayed. These representations may include the following:

• Rules
• Tables
• Charts
• Graphs
• Decision Trees
• Cubes
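To tie the five task primitives together, here is a hypothetical way a mining tool might represent a data mining query internally: one field per primitive. The class name, field names, and example values are illustrative assumptions, not the interface of any real data mining system.

```python
from dataclasses import dataclass, field

# Kinds of knowledge, taken from the list under "Kind of knowledge to be mined".
KINDS_OF_KNOWLEDGE = {
    "characterization", "discrimination", "association and correlation analysis",
    "classification", "prediction", "clustering", "outlier analysis",
    "evolution analysis",
}


@dataclass
class MiningTask:
    """Hypothetical container holding the five data mining task primitives."""
    task_relevant_data: list[str]                                        # attributes / dimensions
    kind_of_knowledge: str                                               # one of KINDS_OF_KNOWLEDGE
    background_knowledge: dict[str, str] = field(default_factory=dict)  # e.g. concept hierarchy
    thresholds: dict[str, float] = field(default_factory=dict)          # interestingness thresholds
    presentation: str = "rules"                                          # rules, tables, charts, ...

    def __post_init__(self):
        # Reject queries asking for an unsupported kind of knowledge.
        if self.kind_of_knowledge not in KINDS_OF_KNOWLEDGE:
            raise ValueError(f"unknown kind of knowledge: {self.kind_of_knowledge}")


task = MiningTask(
    task_relevant_data=["customer.age", "customer.income", "item.category"],
    kind_of_knowledge="association and correlation analysis",
    background_knowledge={"Pune": "Maharashtra", "Mumbai": "Maharashtra"},
    thresholds={"min_support": 0.05, "min_confidence": 0.7},
    presentation="rules",
)
print(task)
```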

DM TECHNIQUES: Given in the classroom.

Theory questions on unit – I


1. What is data mining? Explain in brief.
2. Discuss different data mining tasks.
3. Give a brief account of data mining techniques.
4. What is clustering of data? Distinguish between supervised and unsupervised classification.
5. What are the different application areas of data mining?
