
UNIT-1

Introduction to Data Mining

B. Swathi
Assistant Professor
SR University
Fundamentals of Data Mining
➢ "Data mining" is the process of extracting information from huge sets of data to identify patterns, trends, and useful data that allow a business to make data-driven decisions.
Fundamentals of Data Mining
➢ Data mining is one of the most useful techniques for helping entrepreneurs, researchers, and individuals extract valuable information from huge sets of data. Data mining is also called Knowledge Discovery in Databases (KDD).

➢ The knowledge discovery process includes Data cleaning, Data integration, Data selection, Data transformation, Data mining, Pattern evaluation, and Knowledge presentation.
Fundamentals of Data Mining
➢ Data Mining is a process used by organizations to extract specific data from huge databases to solve business problems. It primarily turns raw data into useful information.

➢ The biggest challenge is to analyze the data to extract important information that can be used to solve a problem or support company development. Many powerful tools and techniques are available to mine data and derive better insights from it.
Data Mining Process

Types of Data Mining:

➢ Relational Database: A relational database is a collection of multiple data sets formally organized into tables, records, and columns, from which data can be accessed in various ways.

➢ Data Warehouses: A huge amount of data comes from multiple places, such as Marketing and Finance. The extracted data is utilized for analytical purposes and helps in decision-making for a business organization.

➢ Data Repositories: A data repository generally refers to a destination for data storage.

➢ For example, a group of databases where an organization has kept various kinds of information.
Types of Data Mining
➢ Object-Relational Database: A combination of an object-oriented database model and a relational database model is called an object-relational model. It supports Classes, Objects, Inheritance, etc.
➢ For example, C++, Java, C#, and so on.

➢ Transactional Database: A transactional database refers to a database management system (DBMS) that has the ability to undo a database transaction if it is not performed appropriately.
Advantages of Data Mining

➢ Marketing / Retail
• Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns, such as direct mail, online marketing campaigns, etc.
➢ Finance / Banking
• Data mining gives financial institutions information about loans and credit reporting.
➢ Manufacturing
• By applying data mining to operational engineering data, manufacturers can detect faulty equipment and determine optimal control parameters.
➢ Governments
• Data mining helps government agencies by digging into and analyzing records of financial transactions to build patterns that can detect money laundering or criminal activities.
Disadvantages of Data Mining

➢ Privacy Issues
• Concerns about personal privacy have been increasing enormously, especially as the internet booms with social networks, e-commerce, forums, blogs, etc.
➢ Security Issues
• Security is a big issue. Businesses own information about their employees and customers, including social security numbers, birthdays, payroll, etc.
➢ Misuse of Information / Inaccurate Information
• Information collected through data mining for ethical purposes can be misused. This information may be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people.
The KDD Process
Steps in KDD
1. Data Cleaning: Data cleaning is defined as the removal of noisy and irrelevant data from the collection. It handles missing values,
• cleans noisy data, where noise is a random or variance error, and
• uses data discrepancy detection and data transformation tools.
2. Data Integration: Data integration is defined as combining heterogeneous data from multiple sources into a common source (Data Warehouse). Data integration can use:
• Data Migration tools,
• Data Synchronization tools, and
• the ETL (Extract-Transform-Load) process.
Steps in KDD
3. Data Selection: Data selection is defined as the process where data relevant to the analysis is decided upon and retrieved from the data collection. Data selection can use:
• Neural networks,
• Decision Trees,
• Naive Bayes, and
• Clustering, Regression, etc.

4. Data Transformation: Data transformation is defined as the process of transforming data into the appropriate form required by the mining procedure. Data transformation is a two-step process:
• Data Mapping: assigning elements from the source base to the destination to capture transformations.
• Code Generation: creation of the actual transformation program.
Steps in KDD
5. Data Mining: Data mining is defined as the application of clever techniques to extract potentially useful patterns. It transforms task-relevant data into patterns and
• decides the purpose of the model, e.g., classification or characterization.
6. Pattern Evaluation: Pattern evaluation is defined as identifying interesting patterns representing knowledge, based on given measures. It finds an interestingness score for each pattern and
• uses summarization and visualization to make the data understandable to the user.
7. Knowledge Representation: Knowledge representation is defined as a technique that utilizes visualization tools to represent data mining results. It generates reports,
• generates tables, and
• generates discriminant rules, classification rules, characterization rules, etc.
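The steps above can be made concrete with a minimal pandas sketch. The data, column names, and the simple group-by "mining" step are all hypothetical, chosen only to illustrate each KDD stage:

```python
import pandas as pd

# 1-2. Cleaning + integration: load two hypothetical sources and combine them.
sales = pd.DataFrame({"customer": ["a", "b", "b", "c"], "amount": [10.0, None, 25.0, 40.0]})
regions = pd.DataFrame({"customer": ["a", "b", "c"], "region": ["north", "south", "north"]})
data = sales.merge(regions, on="customer")                     # integration
data["amount"] = data["amount"].fillna(data["amount"].mean())  # cleaning: fill missing value

# 3-4. Selection + transformation: keep relevant columns, scale amount to [0, 1].
data = data[["region", "amount"]]
data["amount"] = (data["amount"] - data["amount"].min()) / (data["amount"].max() - data["amount"].min())

# 5-7. Mining + evaluation + presentation: a simple per-region summary "pattern".
pattern = data.groupby("region")["amount"].mean()
print(pattern)  # knowledge presentation as a small report
```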
Data Mining Architecture

• A data warehouse is a heterogeneous collection of different data sources organized under a unified schema.
• There are two approaches for constructing a data warehouse:
• Top-down approach
• Bottom-up approach
Data Mining Architecture
• External Sources –
An external source is a source from which data is collected, irrespective of the type of data. Data can be structured, semi-structured, or unstructured.
• Staging Area –
Since the data extracted from the external sources does not follow a particular format, it needs to be validated before being loaded into the data warehouse. For this purpose, it is recommended to use an ETL tool.
• E (Extract): Data is extracted from the external data source.
• T (Transform): Data is transformed into the standard format.
• L (Load): Data is loaded into the data warehouse after transforming it into the standard format.
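A small illustrative sketch of the E-T-L steps in Python, assuming a hypothetical CSV source file and a local SQLite table standing in for the warehouse:

```python
import csv
import sqlite3

# E: extract rows from a hypothetical source file (path is illustrative).
with open("external_source.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# T: transform into a standard format (uppercase region codes, float amounts).
cleaned = [(r["region"].strip().upper(), float(r["amount"])) for r in rows]

# L: load into a warehouse table.
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
conn.commit()
conn.close()
```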
• Data Marts –
A data mart is also part of the storage component. It stores the information of a particular function of an organization, handled by a single authority. There can be as many data marts in an organization as there are functions. We can also say that a data mart contains a subset of the data stored in the data warehouse.
Data Mining Functionalities
• Association Analysis − It analyses the sets of items that frequently occur together in a transactional dataset.
• Classification
• Prediction
• Clustering
• Outlier Analysis − Outliers are data elements that cannot be grouped into a given class or cluster.
• Evolution Analysis − It describes the trends of objects whose behaviour changes over time.
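As a taste of association analysis, here is a minimal sketch that counts item pairs co-occurring across hypothetical transactions (the support-counting core that algorithms such as Apriori build on):

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactional dataset.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"milk", "butter"},
    {"bread", "butter"},
]

# Count how often each unordered pair of items occurs together.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

# Keep pairs meeting a minimum support of 2 transactions.
frequent = {p: c for p, c in pair_counts.items() if c >= 2}
print(frequent)  # each of the three pairs co-occurs in 2 transactions
```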
Data Mining Task Primitives
• The use of data mining task primitives provides a modular and reusable approach, which can improve the performance, efficiency, and understandability of the data mining process.
Data Mining Task Primitives
• The set of task-relevant data to be mined: It refers to the specific data that is relevant and necessary for a particular task or analysis being conducted using data mining techniques.
• Example: Extracting the database name, database tables, and relevant required attributes from the provided input database.
Data Mining Task Primitives
• Kind of knowledge to be mined: It refers to the type of information or insights that are being sought through the use of data mining techniques.
• Example: It determines the task to be performed on the relevant data in order to mine useful information, such as classification, clustering, prediction, discrimination, outlier detection, or correlation analysis.
Data Mining Task Primitives
• Background knowledge to be used in the discovery process: It refers to any prior information or understanding that is used to guide the data mining process.
• Example: The use of background knowledge such as concept hierarchies and user beliefs about relationships in the data, in order to evaluate patterns and mine more efficiently.
Data Mining Task Primitives
• Interestingness measures and thresholds for pattern evaluation: It refers to the methods and criteria used to evaluate the quality and relevance of the patterns or insights discovered through data mining.
• Example: Evaluating patterns with interestingness measures such as utility, certainty, and novelty, and setting an appropriate threshold value for pattern evaluation.
Data Mining Task Primitives
• Representation for visualizing the discovered patterns: It refers to the methods used to represent the patterns. Visualization techniques such as charts, graphs, and maps are commonly used to represent the data.

• Example: Presentation and visualization of the discovered pattern data using various visualization techniques such as bar plots, charts, graphs, tables, etc.
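Taken together, the five primitives form the specification a user hands to a mining system. A sketch of what such a specification might look like as a plain Python dictionary (all names and values here are hypothetical):

```python
# Hypothetical data mining task specification built from the five primitives.
mining_task = {
    "task_relevant_data": {
        "database": "sales_db",                # hypothetical database name
        "tables": ["customers", "orders"],
        "attributes": ["age", "income", "region"],
    },
    "kind_of_knowledge": "classification",      # e.g., clustering, prediction, ...
    "background_knowledge": {
        "concept_hierarchy": {"city": "state", "state": "country"},
    },
    "interestingness": {"measure": "certainty", "threshold": 0.7},
    "presentation": ["table", "bar_plot"],
}

print(mining_task["kind_of_knowledge"])
```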
Major issues in Data Mining
Data Preprocessing in Data Mining
• Data Preprocessing
• Data preprocessing is the process of transforming raw data into an understandable format.
• It is an important step in data mining, as we cannot work with raw data.
• Why is data preprocessing important?
• Preprocessing is mainly about checking data quality. Quality can be checked along the following dimensions:
• Accuracy: whether the data entered is correct or not.
• Completeness: whether all required data is recorded and available.
• Consistency: whether the same data is kept consistently in all the places where it is stored.
• Timeliness: whether the data is updated on time.
• Believability: whether the data is trustworthy.
• Interpretability: how understandable the data is.
Major Tasks in Data Preprocessing
• Data cleaning
• Data integration
• Data reduction
• Data transformation
Data Cleaning in Data Mining
• Data cleaning is the process of preparing raw data for analysis by removing bad data, organizing the raw data, and filling in the null values. Ultimately, cleaning the data prepares it for the data mining step, where the most valuable information can be pulled from the data set.
1. Missing Data
a) Ignore the tuples:
This approach is suitable only when the dataset we have is quite large and multiple values are missing within a tuple.
Missing Data
b) Fill in the missing values:
There are various ways to do this task. You can choose to fill in the missing values manually, for example by following the evident pattern in each attribute, as in the table below (missing cells shown as ?):

Before filling                         After filling
Age  Experience  Salary  Purchased     Age  Experience  Salary  Purchased
25   ?           50      0             25   1           50      0
27   3           ?       1             27   3           80      1
29   5           110     1             29   5           110     1
31   7           140     0             31   7           140     0
33   9           170     1             33   9           170     1
?    11          200     0             35   11          200     0
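The same fill step as a brief pandas sketch; the hand-picked fill values follow each attribute's evident pattern, and the column mean is noted as a common automatic alternative:

```python
import pandas as pd

df = pd.DataFrame({
    "Age":        [25, 27, 29, 31, 33, None],
    "Experience": [None, 3, 5, 7, 9, 11],
    "Salary":     [50, None, 110, 140, 170, 200],
    "Purchased":  [0, 1, 1, 0, 1, 0],
})

# Fill each missing cell with a hand-picked value following the attribute's
# pattern (as in the table above); a common automatic alternative is the
# column mean: df.fillna(df.mean()).
filled = df.fillna({"Age": 35, "Experience": 1, "Salary": 80})
print(filled)
```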
Noisy Data
• Noisy data is meaningless data that cannot be interpreted by machines. It can be generated by faulty data collection, data entry errors, etc.
• Binning Method: This method works on sorted data in order to smooth it. The whole data is divided into segments of equal size, and then various methods are applied to each segment, such as replacing values with the bin mean or with the bin boundaries.
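A minimal sketch of smoothing by bin means, assuming equal-size bins of three values over a small sorted list:

```python
# Sorted noisy values (hypothetical).
data = [4, 8, 9, 15, 21, 21, 24, 25, 26]

bin_size = 3
smoothed = []
for i in range(0, len(data), bin_size):
    bin_values = data[i:i + bin_size]
    mean = sum(bin_values) / len(bin_values)
    # Smoothing by bin means: every value in the bin becomes the bin mean.
    smoothed.extend([round(mean, 1)] * len(bin_values))

print(smoothed)  # [7.0, 7.0, 7.0, 19.0, 19.0, 19.0, 25.0, 25.0, 25.0]
```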
Regression

Here, data can be smoothed by fitting it to a regression function. The regression used may be linear (having one independent variable) or multiple (having multiple independent variables).
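A short sketch of smoothing by linear regression with NumPy: the noisy y-values are replaced by values predicted from the fitted line (the data are hypothetical):

```python
import numpy as np

# Hypothetical noisy observations.
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8, 12.3])

# Fit a degree-1 polynomial (a line) and replace y with the fitted values.
slope, intercept = np.polyfit(x, y, 1)
y_smooth = slope * x + intercept
print(np.round(y_smooth, 2))
```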
Clustering:

This approach groups similar data into clusters. Outliers either go undetected or fall outside the clusters.
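A hedged sketch (assuming scikit-learn is available): after k-means clustering, points that end up alone in a tiny cluster can be flagged as outliers. The data and cluster count are illustrative:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

# Hypothetical 2-D points; the last one is far from everything else.
X = np.array([[1, 1], [1.2, 0.9], [0.8, 1.1], [8, 8], [8.2, 7.9], [20, 1]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
sizes = Counter(kmeans.labels_)

# Points falling into a singleton cluster are flagged as outliers.
outliers = X[[sizes[lbl] == 1 for lbl in kmeans.labels_]]
print(outliers)  # expected: [[20.  1.]]
```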
Data Transformation in Data Mining

Normalization is used to scale the data of an attribute so that it falls in a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0.
It is generally useful for classification algorithms. Attributes on very different scales may lead to poor data models while performing data mining operations, so they are normalized to bring all the attributes onto the same scale.
Methods of Data Normalization

1) Decimal Scaling
2) Min-Max Normalization
3) Z-Score Normalization (zero-mean normalization)
Decimal Scaling Method
It normalizes by moving the decimal point of the values of the data: each value is divided by a power of ten, 10^j, chosen from the largest absolute value in the data. A data value v_i is normalized to v_i' using the formula:

v_i' = v_i / 10^j

where j is the smallest integer such that max(|v_i'|) < 1.

Example –
Let the input data be: -10, 201, 301, -401, 501, 601, 701
To normalize the above data:
Step 1: The maximum absolute value in the given data is 701.
Step 2: Divide the given data by 1000 (i.e., j = 3, since 701 / 10^3 = 0.701 < 1).
Result: The normalized data is: -0.01, 0.201, 0.301, -0.401, 0.501, 0.601, 0.701
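The same computation as a small Python sketch:

```python
import math

data = [-10, 201, 301, -401, 501, 601, 701]

# Smallest j such that max(|v / 10**j|) < 1: count the digits of the
# largest absolute value.
j = math.ceil(math.log10(max(abs(v) for v in data) + 1))
normalized = [v / 10**j for v in data]
print(j, normalized)  # 3 [-0.01, 0.201, 0.301, -0.401, 0.501, 0.601, 0.701]
```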
Min-Max Normalization

Min-max normalization performs a linear transformation on the original data.
Suppose that minA and maxA are the minimum and maximum values of an attribute A.
Min-max normalization maps a value v of A to v' in the range [new_minA, new_maxA] by computing:

v' = ((v - minA) / (maxA - minA)) * (new_maxA - new_minA) + new_minA
Example: Min-Max Normalization

• Suppose the minimum and maximum values for the attribute income are $12,000 and $98,000, and we want to map income to the range [0.0, 1.0].
• By min-max normalization, a value of $73,600 for income is transformed to
(73,600 - 12,000) / (98,000 - 12,000) * (1.0 - 0.0) + 0.0 = 0.716
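Checking the arithmetic in Python:

```python
def min_max(v, v_min, v_max, new_min=0.0, new_max=1.0):
    # Linear rescaling of v from [v_min, v_max] to [new_min, new_max].
    return (v - v_min) / (v_max - v_min) * (new_max - new_min) + new_min

print(round(min_max(73_600, 12_000, 98_000), 3))  # 0.716
```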
Z-Score Normalization
• In z-score normalization (or zero-mean normalization), the values of an attribute A are normalized based on the mean of A and its standard deviation.
• A value v of attribute A is normalized to v' by computing:

v' = (v - mean_A) / std_A
Example: Z-Score Normalization

Suppose the mean and standard deviation of the values for the attribute income are $54,000 and $16,000, respectively.
With z-score normalization, a value of $73,600 for income is transformed to
(73,600 - 54,000) / 16,000 = 1.225
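And the same check for z-score normalization:

```python
def z_score(v, mean, std):
    # Number of standard deviations v lies from the mean.
    return (v - mean) / std

print(z_score(73_600, 54_000, 16_000))  # 1.225
```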
Attribute Selection:
In this strategy, new attributes are constructed from the given set of attributes to help the mining process.
Discretization:
This is done to replace the raw values of a numeric attribute with interval labels or conceptual labels.
Concept Hierarchy Generation:
Here, attributes are converted from a lower level to a higher level in a hierarchy. For example, the attribute "city" can be generalized to "country".
Data Reduction in Data Mining

• Data reduction techniques aim to increase storage efficiency and reduce data storage and analysis costs.
• Data Cube Aggregation:
Aggregation operations are applied to the data to construct a data cube.

• Attribute Subset Selection: Attribute subset Selection is a
technique which is used for data reduction in data mining
process. Data reduction reduces the size of data so that it can be
used for analysis purposes more efficiently.
• Methods of Attribute Subset Selection-
1. Stepwise Forward Selection.
2. Stepwise Backward Elimination.
3. Combination of Forward Selection and Backward
Elimination.
4. Decision Tree Induction
• Stepwise Forward Selection: This procedure starts with an empty set of attributes as the minimal set; at each step, the best remaining attribute is added.
• Initial attribute set: {X1, X2, X3, X4, X5, X6}; initial reduced attribute set: { }
• Step 1: {X1}
• Step 2: {X1, X2}
• Step 3: {X1, X2, X5}
• Final reduced attribute set: {X1, X2, X5}
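A sketch of the greedy loop behind stepwise forward selection; the scoring function here is a stand-in (real implementations score a model trained on each candidate subset):

```python
def forward_selection(attributes, score, n_select):
    """Greedy stepwise forward selection.

    attributes: list of attribute names, e.g. ["X1", ..., "X6"]
    score: maps a list of attributes to a quality value (higher is better);
           in practice, a model evaluated on that subset.
    """
    selected = []
    while len(selected) < n_select:
        remaining = [a for a in attributes if a not in selected]
        # Add the attribute whose inclusion gives the best score.
        best = max(remaining, key=lambda a: score(selected + [a]))
        selected.append(best)
    return selected

# Illustrative scoring: pretend we already know each attribute's usefulness.
toy_scores = {"X1": 5, "X2": 3, "X5": 2, "X3": 1, "X4": 1, "X6": 0}
result = forward_selection(
    ["X1", "X2", "X3", "X4", "X5", "X6"],
    score=lambda subset: sum(toy_scores[a] for a in subset),
    n_select=3,
)
print(result)  # ['X1', 'X2', 'X5'], matching the example above
```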
• Stepwise Backward Elimination: Here, all the attributes are included in the initial set, and the worst attribute is removed at each step.
• Initial attribute set: {X1, X2, X3, X4, X5, X6}; initial reduced attribute set: {X1, X2, X3, X4, X5, X6}
• Step 1: {X1, X2, X3, X4, X5}
• Step 2: {X1, X2, X3, X5}
• Step 3: {X1, X2, X5}
• Final reduced attribute set: {X1, X2, X5}
• Combination of Forward Selection and Backward Elimination: Stepwise forward selection and backward elimination are combined so as to select the relevant attributes most efficiently. This is the most common technique for attribute selection.
Decision Tree Induction
• Decision Tree Induction: This approach uses a decision tree for attribute selection. It constructs a flowchart-like structure with nodes denoting tests on attributes; attributes that do not appear in the tree are considered irrelevant.
Numerosity Reduction
• Numerosity reduction is a data reduction technique that replaces the original data with a smaller form of data representation, e.g., a fitted model or a sample of the data.
Dimensionality Reduction:
• Dimensionality reduction is the process of reducing the number of random variables or attributes under consideration. It is a very important stage of data preprocessing.
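For example, a hedged sketch of dimensionality reduction with PCA from scikit-learn (assuming it is installed); the 3-attribute data points are hypothetical and are projected onto 2 principal components:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 5 samples with 3 correlated attributes.
X = np.array([
    [2.5, 2.4, 4.9],
    [0.5, 0.7, 1.2],
    [2.2, 2.9, 5.1],
    [1.9, 2.2, 4.1],
    [3.1, 3.0, 6.1],
])

# Project onto the 2 directions of greatest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                      # (5, 2)
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept
```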
Discretization in Data Mining

• Data discretization refers to a method of converting a huge number of data values into smaller ones, so that the evaluation and management of the data become easy.
• There are two forms of data discretization:
• the first is supervised discretization, and
• the second is unsupervised discretization.
• Supervised discretization refers to a method in which the class data is used.
• Unsupervised discretization refers to a method that depends on the way in which the operation proceeds:
• it works either with a top-down splitting strategy or a bottom-up merging strategy.
Example

• Suppose we have an attribute Age with the given values. Discretization can replace the raw ages with interval labels such as "young", "middle-aged", and "senior".
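A brief sketch with pandas; the age values and bin edges are hypothetical:

```python
import pandas as pd

ages = pd.Series([13, 18, 25, 33, 47, 52, 66, 70])

# Discretize raw ages into three conceptual labels.
labels = pd.cut(ages, bins=[0, 21, 55, 120], labels=["young", "middle-aged", "senior"])
print(labels.tolist())
# ['young', 'young', 'middle-aged', 'middle-aged', 'middle-aged',
#  'middle-aged', 'senior', 'senior']
```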
Feature Extraction
The process of feature extraction is useful when you need to reduce the number of resources needed for processing without losing important or relevant information. Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and then discarding the original features). This new, reduced set of features should be able to summarize most of the information contained in the original set of features.

Example:
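A minimal sketch: a hypothetical raw sensor signal of twelve readings is reduced to three summary features, and the raw values are then discarded:

```python
import statistics

# Hypothetical raw signal: 12 readings reduced to 3 summary features.
signal = [0.2, 0.4, 0.35, 0.9, 1.1, 0.95, 0.3, 0.25, 0.5, 0.45, 1.0, 0.8]

features = {
    "mean": statistics.mean(signal),
    "stdev": statistics.stdev(signal),
    "peak": max(signal),
}
print(features)  # the 3 extracted features stand in for the 12 raw readings
```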
Feature Selection
• Feature selection refers to the process of reducing the inputs for processing and analysis, or of finding the most meaningful inputs. A related term, feature engineering (or feature extraction), refers to the process of extracting useful information or features from existing data.
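A hedged sketch of feature selection with scikit-learn (assuming it is installed): keep the k features most associated with the target according to a univariate score. The toy data are illustrative:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Toy data: 6 samples, 4 features; only some features relate to the label y.
X = np.array([
    [1.0, 5.2, 0.1, 9.0],
    [1.1, 4.8, 7.3, 9.1],
    [0.9, 5.1, 0.2, 8.9],
    [3.0, 5.0, 7.1, 9.2],
    [3.2, 4.9, 0.3, 9.0],
    [2.9, 5.3, 7.4, 9.1],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Select the 2 features with the highest ANOVA F-score against y.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.get_support())  # boolean mask of the kept features
print(X_selected.shape)        # (6, 2)
```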
Feature Construction
Feature construction involves transforming a given set of input features to generate a new set of more powerful features, which are then used for prediction. This may be done either to compress the dataset by reducing the number of features or to improve prediction performance.