Adm Unit - 1
B.Swathi
Assistant Professor
SR University
Fundamentals of Data Mining
➢The process of extracting information from huge sets of data to
identify patterns, trends, and useful data that
allow a business to make data-driven decisions is called
“Data Mining”.
➢Data mining is one of the most useful techniques
that helps entrepreneurs, researchers, and individuals
extract valuable information from huge sets of
data. Data mining is also called Knowledge
Discovery in Databases (KDD).
Applications of data mining
➢ Marketing / Retail
• Data mining helps marketing companies build models based on historical
data to predict who will respond to new marketing campaigns, such as direct
mail or online marketing campaigns.
➢ Finance / Banking
• Data mining gives financial institutions information about loans
and credit reporting, which they can use to assess credit risk.
➢ Manufacturing
• By applying data mining in operational engineering data, manufacturers can
detect faulty equipment and determine optimal control parameters.
➢ Governments
• Data mining helps government agencies dig into and analyze records
of financial transactions to build patterns that can detect money laundering or
criminal activities.
Disadvantages of data mining
➢ Privacy Issues
• Concerns about personal privacy have increased enormously in
recent years, especially with the growth of social networks,
e-commerce, forums, and blogs on the internet.
➢ Security issues
• Security is a big issue. Businesses hold information about their
employees and customers, including social security numbers,
birthdays, payroll, etc., and this data must be protected from leaks and theft.
➢ Misuse of information/inaccurate information
• Information collected through data mining for ethical
purposes can be misused. It may be exploited by
unethical people or businesses to take advantage of vulnerable
people or to discriminate against a group of people.
The KDD Process
Steps in KDD
1. Data Cleaning: Data cleaning is the removal of noisy
and irrelevant data from the collection.
• Cleaning in case of missing values.
• Cleaning noisy data, where noise is a random error or variance in a measured variable.
• Cleaning with data discrepancy detection and data
transformation tools.
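A minimal sketch of two common cleaning steps, assuming records are plain Python dicts: exact duplicate records are dropped, and missing values (`None`) in one numeric field are filled with the mean of the observed values (mean imputation). The function name `clean` and the sample data are hypothetical; mean imputation is only one of several ways to handle missing values.

```python
from statistics import mean

def clean(records, field):
    """Remove exact duplicate records, then fill missing (None) values of
    `field` with the mean of the observed values (mean imputation)."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    observed = [r[field] for r in unique if r[field] is not None]
    fill_value = mean(observed)
    return [
        {**r, field: fill_value if r[field] is None else r[field]}
        for r in unique
    ]

# Hypothetical raw data: one missing age and one duplicated record.
RAW = [
    {"id": 1, "age": 25},
    {"id": 2, "age": None},
    {"id": 3, "age": 35},
    {"id": 3, "age": 35},
]
print(clean(RAW, "age"))
```

Here the duplicate record for id 3 is dropped and the missing age is filled with the mean of 25 and 35.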
2. Data Integration: Data integration combines heterogeneous data
from multiple sources into a common source (a data warehouse).
• Data integration using data migration tools.
• Data integration using data synchronization tools.
• Data integration using the ETL (Extract-Transform-Load) process.
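The ETL process can be sketched with Python's standard library alone. This is an illustrative sketch, not a production ETL tool: two hypothetical sources (a CSV export and a JSON feed with different field names and units) are extracted, transformed onto one common schema, and loaded into an in-memory SQLite table standing in for the data warehouse. All source contents, column names, and function names are assumptions.

```python
import csv
import io
import json
import sqlite3

# Two hypothetical heterogeneous sources describing customers.
CSV_SOURCE = "cust_id,name,spend_usd\n1,Asha,120.50\n2,Ravi,80.00\n"
JSON_SOURCE = '[{"customerId": 3, "fullName": "Meena", "spendCents": 4525}]'

def extract():
    """Read raw rows from each source in its native format."""
    rows_a = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows_b = json.loads(JSON_SOURCE)
    return rows_a, rows_b

def transform(rows_a, rows_b):
    """Map both sources onto one common schema: (id, name, spend in USD)."""
    unified = [(int(r["cust_id"]), r["name"], float(r["spend_usd"])) for r in rows_a]
    unified += [(r["customerId"], r["fullName"], r["spendCents"] / 100) for r in rows_b]
    return unified

def load(rows, conn):
    """Write the unified rows into the target table."""
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, spend_usd REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
print(conn.execute("SELECT COUNT(*), SUM(spend_usd) FROM customers").fetchone())
```

The transform step is where the heterogeneity is resolved: field names are renamed and cents are converted to dollars so that all rows share one schema.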
3. Data Selection: Data selection is the process in which the data
relevant to the analysis is decided on and retrieved from the data
collection.
• Data selection using neural networks.
• Data selection using decision trees.
• Data selection using Naive Bayes.
• Data selection using clustering, regression, etc.
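The basic idea of data selection (keep only the rows and attributes relevant to the analysis) can be sketched as a filter plus a projection. Note that this sketch uses plain query-style selection, not the model-based methods listed above; the sales data, the analysis question, and the `select` helper are hypothetical.

```python
# Hypothetical sales collection; the analysis question is assumed to be
# "how much do customers in the South region spend?"
SALES = [
    {"customer": "A", "region": "South", "spend": 120, "phone": "555-0101"},
    {"customer": "B", "region": "North", "spend": 300, "phone": "555-0102"},
    {"customer": "C", "region": "South", "spend": 80, "phone": "555-0103"},
]

def select(records, predicate, attributes):
    """Keep only the rows that satisfy `predicate` (row selection) and
    only the `attributes` relevant to the analysis (projection)."""
    return [{a: r[a] for a in attributes} for r in records if predicate(r)]

relevant = select(SALES, lambda r: r["region"] == "South", ["customer", "spend"])
print(relevant)
```

Irrelevant rows (the North customer) and irrelevant attributes (phone numbers) are left out before any mining is done.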
Noisy Data
• Noisy data is meaningless data that cannot be interpreted by
machines. It can be generated by faulty data collection, data
entry errors, etc.
• Binning Method: This method works on sorted data in order to
smooth it. The sorted data is divided into segments (bins) of equal
size, and each bin is then smoothed, for example by replacing its values
with the bin mean, the bin median, or the nearest bin boundary.
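The binning method can be sketched as follows, assuming equal-frequency (equal-depth) bins of a fixed size. Two smoothing rules are shown: smoothing by bin means and smoothing by bin boundaries (each value is replaced by the closer of the bin's minimum and maximum). The function names and the sample values are illustrative.

```python
def equal_depth_bins(values, depth):
    """Sort the data and split it into consecutive bins of `depth` values."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value by the closer bin boundary (min or max);
    ties go to the lower boundary."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

prices = [21, 4, 28, 15, 34, 8, 24, 21, 25]   # unsorted sample values
bins = equal_depth_bins(prices, 3)
print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```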
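Regression
• Data can be smoothed by fitting it to a regression function. The regression
may be linear (one independent variable) or multiple (several independent variables).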
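Smoothing by regression can be sketched with a simple least-squares line: fit y = intercept + slope·x to the data, then replace each noisy y value with the fitted value. This assumes a single independent variable (linear regression); the sample data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def smooth_by_regression(xs, ys):
    """Replace each observed y with the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [intercept + slope * x for x in xs]

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]   # noisy measurements
slope, intercept = fit_line(xs, ys)
print(round(slope, 2), round(intercept, 2))   # 1.97 0.09
print(smooth_by_regression(xs, ys))
```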
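Discretization in data mining
• Discretization converts the values of a continuous attribute into a finite set of
intervals, each of which can be given a label (for example, age ranges labeled
"youth", "adult", and "senior").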
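One simple form of discretization, equal-width binning, is sketched below: the range between the minimum and maximum value is split into k intervals of the same width, and each value is mapped to the label of its interval. The choice of equal-width intervals, the labels, and the sample ages are assumptions; other schemes (equal-frequency intervals, domain-defined cut points) are equally common.

```python
def equal_width_discretize(values, labels):
    """Map each numeric value to one of len(labels) equal-width intervals
    spanning [min(values), max(values)]."""
    k = len(labels)
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    result = []
    for v in values:
        idx = int((v - lo) / width) if width else 0
        result.append(labels[min(idx, k - 1)])  # the maximum falls in the last interval
    return result

ages = [5, 12, 18, 25, 33, 47, 60, 71]
print(equal_width_discretize(ages, ["low", "medium", "high"]))
```

With a range of 5 to 71 and three labels, each interval is 22 years wide: [5, 27), [27, 49), and [49, 71].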
Feature selection
• Feature selection refers to the process of reducing the inputs
for processing and analysis, or of finding the most meaningful
inputs. A related term, feature engineering (or feature
extraction), refers to the process of extracting useful
information or features from existing data.
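A minimal sketch of a filter-style feature selection method, assuming numeric features and a numeric target: each feature is scored by the absolute value of its Pearson correlation with the target, and the k highest-scoring features are kept. This is only one of many selection criteria; the feature names, data, and helper functions are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information about the target
    return cov / (sx * sy)

def select_top_k(features, target, k):
    """Score each feature by |correlation| with the target and keep
    the names of the k highest-scoring features."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical data set: three candidate inputs and one target column.
FEATURES = {
    "hours_studied": [2, 4, 6, 8, 10],
    "noise": [3, 1, 4, 1, 5],
    "constant": [1, 1, 1, 1, 1],
}
TARGET = [1, 2, 3, 4, 5]

chosen, scores = select_top_k(FEATURES, TARGET, k=2)
print(chosen)
```

The constant column scores zero and is dropped, which reduces the inputs passed on to the mining step.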
Feature construction
Feature construction involves transforming a given set of input features
to generate a new set of more powerful features, which are then used for
prediction. This may be done either to compress the dataset by reducing the
number of features or to improve prediction performance.
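As an illustration of feature construction, the sketch below combines two raw inputs, height and weight, into a single constructed feature, body-mass index (weight divided by height squared). The records and field names are hypothetical; the example shows both goals mentioned above, since two inputs become one and the new feature may be more directly related to a health-related prediction target.

```python
# Hypothetical records with two raw input features.
PATIENTS = [
    {"height_m": 1.80, "weight_kg": 81.0},
    {"height_m": 1.60, "weight_kg": 64.0},
]

def construct_bmi(records):
    """Replace height and weight with one constructed feature:
    body-mass index = weight / height**2."""
    return [{"bmi": r["weight_kg"] / r["height_m"] ** 2} for r in records]

for row in construct_bmi(PATIENTS):
    print(round(row["bmi"], 1))
```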