Lecture Notes 1.1 & 1.2
Introduction
DEFINITION OF DATA MINING
Data mining is defined as extracting information from huge sets of data. In other words, data mining is
the process of mining knowledge from data. The information or knowledge extracted in this way
can be used for applications such as the following:
• Market Analysis
• Fraud Detection
• Customer Retention
• Production Control
• Science Exploration
Major Sources of Data:
• Business: Web, e-commerce, transactions, stocks
• Science: remote sensing, bioinformatics, scientific simulation
• Society and everyone: news, digital cameras, YouTube
Need for turning data into knowledge: we are drowning in data, but starving for knowledge.
Definition of Data Mining
Data mining is the extraction, or ‘mining’, of knowledge from large amounts of data. Just as gold is
mined from rock or sand, knowledge is mined from data.
Other terms for Data Mining:
• Knowledge Mining
• Knowledge Extraction
• Pattern Analysis
Steps in data processing:
▶ Data cleaning (remove noise and inconsistent data)
▶ Data integration (multiple data sources may be combined)
▶ Data selection (data relevant to the analysis task are retrieved from the database)
▶ Data transformation (data are transformed or consolidated into forms appropriate for mining)
(The four steps above make up data preprocessing.)
▶ Data mining (an essential process where intelligent methods are applied to extract
data patterns)
▶ Pattern evaluation (identify the truly interesting patterns)
▶ Knowledge presentation (mined knowledge is presented to the user with visualization or
representation techniques)
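The steps above can be sketched as a chain of small functions, one per step. This is a toy illustration under stated assumptions: the two in-memory "stores", the field names (`tid`, `items`, `region`), the function names, and the minimum count are all hypothetical, and real systems would use databases and far more elaborate methods at each step.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical data sources (names and fields are illustrative only).
store_a = [
    {"tid": 1, "items": ["computer", "software"], "region": "north"},
    {"tid": 2, "items": ["computer", None], "region": "north"},  # noisy entry
]
store_b = [
    {"tid": 3, "items": ["computer", "software", "printer"], "region": "south"},
    {"tid": 4, "items": ["printer"], "region": "north"},
]

def clean(records):
    """Data cleaning: drop missing (None) item values."""
    return [{**r, "items": [i for i in r["items"] if i is not None]} for r in records]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for src in sources for r in src]

def select(records, region):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r["region"] == region]

def transform(records):
    """Data transformation: consolidate each record into a set of items."""
    return [frozenset(r["items"]) for r in records]

def mine_pairs(transactions):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return counts

def evaluate(counts, min_count):
    """Pattern evaluation: keep only pairs that occur often enough."""
    return {pair: c for pair, c in counts.items() if c >= min_count}

def present(patterns):
    """Knowledge presentation: format the surviving patterns for the user."""
    return [f"{a} + {b}: {c} transaction(s)" for (a, b), c in sorted(patterns.items())]

# Run the pipeline for the hypothetical "north" region.
data = integrate(clean(store_a), clean(store_b))
transactions = transform(select(data, "north"))
report = present(evaluate(mine_pairs(transactions), min_count=1))
print(report)  # ['computer + software: 1 transaction(s)']
```

Keeping each step as a separate function mirrors the list: preprocessing (the first four functions) is finished before any mining starts.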
An ER data model represents the database as a set of entities and their relationships.
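As a minimal sketch of the ER idea, assuming a hypothetical shop with `Customer` and `Item` entities linked by a `Purchases` relationship, each entity and each relationship can be modeled as its own record type:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:  # entity (hypothetical)
    cust_id: int
    name: str

@dataclass(frozen=True)
class Item:  # entity (hypothetical)
    item_id: int
    name: str

@dataclass(frozen=True)
class Purchases:  # relationship linking a Customer to an Item
    customer: Customer
    item: Item
    quantity: int

alice = Customer(1, "Alice")
laptop = Item(10, "computer")
sale = Purchases(alice, laptop, 1)
print(sale.customer.name, "bought", sale.item.name)  # Alice bought computer
```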
Data discrimination is a comparison of the general features of target-class data objects with the
general features of objects from one or more contrasting classes.
The target and contrasting classes can be specified by the user, and the
corresponding data objects retrieved through database queries.
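A minimal sketch of this comparison, assuming hypothetical customer records and using mean age and mean yearly spend as the "general features" of each class. The user picks the target class (customers who buy computers regularly) and the contrasting class (everyone else), much as a database query condition would:

```python
from statistics import mean

# Hypothetical records: (age, yearly_spend, buys_computers_regularly)
customers = [
    (25, 1200.0, True),
    (31, 1500.0, True),
    (28, 1200.0, True),
    (52, 300.0, False),
    (47, 450.0, False),
]

def summarize(records):
    """General features of one class: size, mean age, mean yearly spend."""
    return {
        "count": len(records),
        "mean_age": mean(r[0] for r in records),
        "mean_spend": mean(r[1] for r in records),
    }

# User-specified classes, selected like a query's WHERE clause.
target = [c for c in customers if c[2]]
contrast = [c for c in customers if not c[2]]

target_summary = summarize(target)
contrast_summary = summarize(contrast)
for key in ("mean_age", "mean_spend"):
    print(key, target_summary[key], "vs", contrast_summary[key])
```

Here the comparison would show the target class as younger and higher-spending than the contrasting class.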
MINING FREQUENT PATTERNS, ASSOCIATIONS, AND CORRELATIONS:
Frequent patterns, as the name suggests, are patterns that occur frequently in data. There are
many kinds of frequent patterns, including itemsets, subsequences, and substructures.
A frequent itemset typically refers to a set of items that frequently appear together in a
transactional data set, such as computer and software.
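The idea of a frequent itemset can be made concrete by counting support, the fraction of transactions that contain the itemset. This sketch uses hypothetical transactions and plain brute-force enumeration of itemsets up to size 3 (not an efficient algorithm such as Apriori), keeping only those whose support meets a chosen minimum:

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, each a set of purchased items.
transactions = [
    {"computer", "software", "printer"},
    {"computer", "software"},
    {"computer", "antivirus"},
    {"software", "printer"},
]

def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute force: count every itemset of size 1..max_size in every transaction."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(t), k):
                counts[itemset] += 1
    # Support = fraction of transactions containing the itemset.
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

result = frequent_itemsets(transactions, min_support=0.5)
for itemset, sup in sorted(result.items()):
    print(itemset, sup)
```

With a minimum support of 0.5, `("computer", "software")` is frequent (2 of 4 transactions) while `("antivirus",)` is not (1 of 4).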
Example: Association analysis. Suppose, as a marketing manager of
AllElectronics, you would like to determine which items are frequently purchased together
within the same transactions.
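For this scenario, a pattern such as "customers who buy a computer also buy software" is commonly written as an association rule computer => software and scored with two standard measures: support (how often both items appear together) and confidence (how often software appears given that a computer was bought). The transactions and function names below are hypothetical:

```python
# Hypothetical AllElectronics transactions.
transactions = [
    {"computer", "software", "printer"},
    {"computer", "software"},
    {"computer", "antivirus"},
    {"software", "printer"},
]

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A and B together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

s = support(transactions, {"computer", "software"})
c = confidence(transactions, {"computer"}, {"software"})
print(f"computer => software [support = {s:.0%}, confidence = {c:.0%}]")
# computer => software [support = 50%, confidence = 67%]
```

A manager would typically keep only rules whose support and confidence both exceed user-chosen thresholds.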