Data mining
➢ Databases
➢ Text files
➢ Social networks
➢ Computer simulation
• Knowledge prediction
Uses known data to forecast future trends and
events, for example stock market predictions.
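As a rough illustration of forecasting from known data, the sketch below fits a straight line to past values and extends it one step ahead. The function names (`fit_line`, `forecast`) and the price series are hypothetical, and a linear trend is only one simple assumption; real stock market prediction is far harder than this.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(ys, steps_ahead=1):
    """Extend the fitted trend line past the last known observation."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


# Hypothetical closing prices for four consecutive days (known data).
prices = [10.0, 12.0, 14.0, 16.0]
next_price = forecast(prices)  # predicted value for day five
```

The forecast is only as good as the assumption that the past trend continues, which is why predictions of this kind are usually reported with some measure of uncertainty.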
Steps in data mining
1. Data integration
This involves combining data residing in different sources and
providing users with a unified or combined view of these data.
2. Data selection
3. Data cleaning
Data cleaning is the process of detecting and correcting corrupt or
inaccurate records in a record set, table or database. It involves
identifying incomplete, incorrect, inaccurate or irrelevant parts of the
data and then replacing, modifying or deleting the dirty data.
4. Data transformation
Data transformation converts a set of data values
from the data format of a source data system into
the data format of a destination data system.
5. Data mining
Here, techniques are applied to extract patterns of
interest on which decisions will be based.
6. Pattern evaluation
In pattern evaluation, the extracted patterns are
identified and analyzed based on given measures.
7. Knowledge presentation
This is the final phase in which the discovered
knowledge is visually represented to the user.
This phase uses understandable techniques to
help users understand and interpret the data
mining results.
[Figure: Data mining diagram based on the Knowledge Discovery in Databases (KDD) process]
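The seven steps above can be sketched end to end on a toy example. Everything here is an assumption made for illustration: the two data sources, the field names, the choice of counting item pairs as the mining technique, and the support threshold used as the evaluation measure.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records held in two separate sources.
store_sales = [
    {"customer": "a1", "items": "Bread, Milk"},
    {"customer": "a2", "items": "bread,butter"},
    {"customer": "a3", "items": ""},          # dirty: empty basket
]
web_sales = [
    {"customer": "b1", "items": "milk,bread,butter"},
    {"customer": "b2", "items": "Milk, Bread"},
    {"customer": None, "items": "bread"},     # dirty: missing customer
]

# 1. Data integration: combine both sources into one view.
records = store_sales + web_sales

# 2. Data selection: keep only the fields needed for the analysis.
selected = [(r["customer"], r["items"]) for r in records]

# 3. Data cleaning: delete records with a missing customer or no items.
cleaned = [(c, items) for c, items in selected if c and items.strip()]

# 4. Data transformation: turn each comma-separated string into a set
#    of normalized (trimmed, lower-case) item names.
baskets = [{i.strip().lower() for i in items.split(",")} for _, items in cleaned]

# 5. Data mining: count how often each pair of items is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# 6. Pattern evaluation: keep pairs whose support (share of baskets
#    containing the pair) reaches an assumed minimum.
min_support = 0.5
frequent = {
    pair: count / len(baskets)
    for pair, count in pair_counts.items()
    if count / len(baskets) >= min_support
}

# 7. Knowledge presentation: report the surviving patterns to the user.
for pair, support in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(f"{pair[0]} + {pair[1]}: support {support:.2f}")
```

In practice each step is usually far more involved, and the presentation step would typically use charts or reports instead of printed lines.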
Advantages of data mining
• Marketing or Retailing
Data mining helps marketing companies build
models based on historical data to predict who will
respond to a new marketing campaign.
Using the results, marketers can take an
appropriate approach to selling profitable products
to targeted customers.
• Governments
Data mining helps government agencies by
digging into and analyzing records of financial
transactions to build patterns that can detect
money laundering or criminal activities.
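The marketing use case above can be sketched with a very simple model: estimate from historical campaign data how often each customer segment responded, then target new customers in the segments that responded well. The segments, customer names, function names and the 0.5 threshold are all hypothetical, and a real model would use many more attributes than a single segment label.

```python
from collections import defaultdict

# Hypothetical history of past campaigns: (customer segment, responded?)
history = [
    ("student", True), ("student", False), ("student", True), ("student", True),
    ("retiree", False), ("retiree", False), ("retiree", True), ("retiree", False),
]


def response_rates(history):
    """Estimate, per segment, the fraction of past contacts that responded."""
    contacted = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in history:
        contacted[segment] += 1
        if did_respond:
            responded[segment] += 1
    return {segment: responded[segment] / contacted[segment] for segment in contacted}


def likely_responders(customers, rates, threshold=0.5):
    """Select customers whose segment responded at or above the threshold."""
    return [name for name, segment in customers
            if rates.get(segment, 0.0) >= threshold]


rates = response_rates(history)
new_customers = [("Ann", "student"), ("Bob", "retiree"), ("Cy", "student")]
targets = likely_responders(new_customers, rates)  # who to include in the campaign
```

Segments never seen in the history default to a rate of 0.0 here, so they are never targeted; that is a design choice of this sketch, not a general rule.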
Disadvantages of data mining
• Privacy issues
The use of the internet through social networks, e-commerce,
forums, blogs etc. raises a lot of privacy concerns. People are
afraid that their personal information will be collected and used
in an unethical way that could cause them trouble.
• Security issues
Businesses hold information about their employees and
customers, including social security numbers, birthdays,
payroll etc. If hackers access and steal this data, the
exposure of so much personal information may lead to an
unsafe environment, especially if the information obtained
involves finances.
• Misuse of information
Information may be exploited by unethical
people or businesses to take advantage of
vulnerable people or discriminate against a
group of people.