Unit - I What Is Data Mining?: Mit Cidco Data Mining BSC (It) Sem - Vi
UNIT -I
What is Data Mining?
In simple words, data mining is defined as a process used to extract usable data from a larger set
of any raw data.
It involves analyzing patterns in large batches of data using one or more software tools. Data
mining has applications in multiple fields, such as science and research.
As an application of data mining, businesses can learn more about their customers and develop
more effective strategies related to various business functions and in turn leverage resources in a
more optimal and insightful manner.
This helps businesses move closer to their objectives and make better decisions.
Data mining involves effective data collection and warehousing as well as computer processing.
For segmenting the data and evaluating the probability of future events, data mining uses
sophisticated mathematical algorithms.
Data mining is also known as Knowledge Discovery in Data (KDD).
Key features of data mining:
1. Automatic pattern predictions based on trend and behavior analysis.
2. Prediction based on likely outcomes.
3. Creation of decision-oriented information.
4. Focus on large data sets and databases for analysis.
5. Clustering based on finding and visually documenting groups of facts not previously known.
Definition:
Data Mining is defined as extracting information from huge sets of data. In other words, we can
say that data mining is the procedure of mining knowledge from data.
OR
Fraud Detection
Apart from these, data mining can also be used in the areas of production control, customer retention,
science exploration, sports, astrology, and Internet Web Surf-Aid.
Customer Profiling − Data mining helps determine what kind of people buy what kind of
products.
Identifying Customer Requirements − Data mining helps in identifying the best products for
different customers. It uses prediction to find the factors that may attract new customers.
Target Marketing − Data mining helps to find clusters of model customers who share the same
characteristics such as interests, spending habits, income, etc.
Finance Planning and Asset Evaluation − It involves cash flow analysis and prediction,
contingent claim analysis to evaluate assets.
Resource Planning − It involves summarizing and comparing the resources and spending.
Fraud Detection
Data mining is also used in the fields of credit card services and telecommunication to detect fraud. For
fraudulent telephone calls, it helps to find the destination of the call, the duration of the call, the time
of day or week, etc. It also analyzes patterns that deviate from expected norms.
DM Applications-Case Studies
Descriptive
Classification and Prediction
Descriptive Function
The descriptive function deals with the general properties of data in the database. Here is the list of
descriptive functions −
Class/Concept Description
Mining of Frequent Patterns
Mining of Associations
Mining of Correlations
Mining of Clusters
Class/Concept Description
Class/Concept refers to the data to be associated with classes or concepts. For example, in a
company, the classes of items for sale include computers and printers, and the concepts of customers
include big spenders and budget spenders. Such descriptions of a class or a concept are called
class/concept descriptions. These descriptions can be derived in the following two ways −
Data Characterization − This refers to summarizing the data of the class under study. The class under
study is called the Target Class.
Data Discrimination − This refers to comparing the target class with one or more predefined
contrasting groups or classes.
Mining of Frequent Patterns
Frequent Item Set − It refers to a set of items that frequently appear together, for example, milk
and bread.
Frequent Sub Structure − Substructure refers to different structural forms, such as graphs,
trees, or lattices, which may be combined with item-sets or subsequences.
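As a rough illustration of frequent item sets, the sketch below counts how often each pair of items appears together and keeps the pairs whose support (the share of transactions containing both items) reaches a threshold. The function name frequent_pairs, the min_support parameter, and the baskets are assumptions made up for this example; they do not come from any particular tool or data set.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs with support >= min_support.

    Support is the fraction of transactions that contain both items.
    """
    counts = Counter()
    for basket in transactions:
        # Sort the distinct items so each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {
        pair: n / total
        for pair, n in counts.items()
        if n / total >= min_support
    }


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "biscuits"],
    ["bread", "biscuits"],
    ["milk", "bread", "eggs"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
```

With these baskets, milk and bread appear together in three of the four transactions, so that pair passes the threshold. Counting every pair in this way is only practical for small data; real systems prune the search space.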
Mining of Association
Associations are used in retail sales to identify items that are frequently purchased together. Mining
of associations is the process of uncovering relationships among data and determining association
rules.
For example, a retailer might generate an association rule showing that milk is sold with bread 70% of
the time, while biscuits are sold with bread only 30% of the time.
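One way to make the percentages in such a rule concrete is to compute the support and confidence of the rule from transaction data. The sketch below does this under two assumptions: the 70% and 30% figures are read as the confidence of the rules bread -> milk and bread -> biscuits, and the ten baskets are invented to match those figures. The function name rule_stats is hypothetical.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all transactions containing both items
    confidence = share of antecedent transactions that also contain
                 the consequent
    """
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        raise ValueError("antecedent never occurs in the transactions")
    with_both = [t for t in with_antecedent if consequent in t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_antecedent)
    return support, confidence


# Hypothetical data: 10 baskets with bread, 7 with milk, 3 with biscuits.
baskets = [{"bread", "milk"}] * 7 + [{"bread", "biscuits"}] * 3

milk_support, milk_confidence = rule_stats(baskets, "bread", "milk")
biscuit_support, biscuit_confidence = rule_stats(baskets, "bread", "biscuits")
print(milk_confidence, biscuit_confidence)
```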
Mining of Correlations
It is a kind of additional analysis performed to uncover interesting statistical correlations between
associated attribute-value pairs or between two item sets, to determine whether they have a positive,
a negative, or no effect on each other.
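One common measure for this kind of check is lift, which compares how often two items occur together with how often they would occur together if they were independent. A lift above 1 suggests a positive correlation, below 1 a negative one, and close to 1 no effect. The sketch below is a minimal illustration; choosing lift as the measure, the function name, and the baskets are all assumptions for this example.

```python
def lift(transactions, item_a, item_b):
    """Return P(a and b) / (P(a) * P(b)) over the given transactions."""
    total = len(transactions)
    p_a = sum(item_a in t for t in transactions) / total
    p_b = sum(item_b in t for t in transactions) / total
    p_ab = sum(item_a in t and item_b in t for t in transactions) / total
    if p_a == 0 or p_b == 0:
        raise ValueError("both items must occur at least once")
    return p_ab / (p_a * p_b)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"tea", "sugar"},
    {"tea", "sugar"},
    {"coffee"},
    {"coffee"},
]
tea_sugar = lift(baskets, "tea", "sugar")    # always bought together
tea_coffee = lift(baskets, "tea", "coffee")  # never bought together
print(tea_sugar, tea_coffee)
```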
Mining of Clusters
A cluster is a group of similar objects. Cluster analysis refers to forming groups of objects
that are very similar to each other but highly different from the objects in other clusters.
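The idea of grouping similar objects can be sketched with a basic k-means loop, one of several clustering techniques. The version below is a simplified illustration, not a production implementation: it assumes 2-D points, uses the first k points as starting centroids, and runs a fixed number of iterations. The function name and the sample points are made up for this example.

```python
import math


def k_means(points, k, iterations=10):
    """Cluster points into k groups; return (centroids, clusters)."""
    centroids = list(points[:k])  # naive start: the first k points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its
        # nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster,
        # keeping the old centroid if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical points forming two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

The result depends on the starting centroids; practical implementations usually try several random starts.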
Classification − It predicts the class of objects whose class label is unknown. Its objective is to
find a derived model that describes and distinguishes data classes or concepts. The derived
model is based on the analysis of a set of training data, i.e. data objects whose class labels are
known.
Prediction − It is used to predict missing or unavailable numerical data values rather than class
labels. Regression Analysis is generally used for prediction. Prediction can also be used for
identification of distribution trends based on available data.
Outlier Analysis − Outliers may be defined as the data objects that do not comply with the
general behavior or model of the data available.
Evolution Analysis − Evolution analysis describes and models regularities or
trends for objects whose behavior changes over time.
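The first three functions in the list above can be sketched in a few lines each. The code below is an illustration under stated assumptions, not the only way to perform these tasks: classification is shown as a 1-nearest-neighbour lookup, prediction as a least-squares straight-line fit (a simple form of regression analysis), and outlier analysis as a z-score test. All function names, the threshold of 2.0, and the sample data are invented for this example.

```python
import math
from statistics import mean, pstdev


def classify_1nn(training, query):
    """Classification: label of the training object closest to query.

    training is a list of (features, label) pairs with known labels.
    """
    _, label = min(training, key=lambda pair: math.dist(pair[0], query))
    return label


def fit_line(xs, ys):
    """Prediction: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


def z_score_outliers(values, threshold=2.0):
    """Outlier analysis: values far from the mean in standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical training data: (monthly visits, average spend) -> label.
training = [((1.0, 1.0), "budget spender"), ((9.0, 8.0), "big spender")]
label = classify_1nn(training, (8.0, 9.0))

# Hypothetical numeric data that follows y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = slope * 5 + intercept

# Hypothetical measurements with one value far from the rest.
outliers = z_score_outliers([10, 11, 9, 10, 12, 10, 11, 50])
print(label, predicted, outliers)
```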
We can specify a data mining task in the form of a data mining query.
This query is input to the system.
A data mining query is defined in terms of data mining task primitives.
Note − These primitives allow us to communicate in an interactive manner with the data mining
system. Here is the list of Data Mining Task Primitives −
Database Attributes
Data Warehouse dimensions of interest
Kind of knowledge to be mined
It refers to the kind of functions to be performed. These functions are −
Characterization
Discrimination
Association and Correlation Analysis
Classification
Prediction
Clustering
Outlier Analysis
Evolution Analysis
Background knowledge
Background knowledge allows data to be mined at multiple levels of abstraction. For example,
concept hierarchies are one form of background knowledge that makes this possible.
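A concept hierarchy can be pictured as a mapping from a lower-level concept to a higher-level one. The sketch below assumes a hypothetical city -> country hierarchy and made-up sales figures, and shows how the same data can be viewed at a more general level by rolling it up along that mapping.

```python
from collections import defaultdict

# Hypothetical concept hierarchy: each city maps to its country.
CITY_TO_COUNTRY = {"Mumbai": "India", "Pune": "India", "Chicago": "USA"}


def roll_up(values_by_item, hierarchy):
    """Aggregate values from a lower level to the next level up."""
    totals = defaultdict(float)
    for item, amount in values_by_item.items():
        totals[hierarchy[item]] += amount
    return dict(totals)


# Hypothetical sales figures at the city level.
sales_by_city = {"Mumbai": 120.0, "Pune": 80.0, "Chicago": 200.0}
sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
print(sales_by_country)
```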
Rules
Tables
Charts
Graphs
Decision Trees
Cubes