Data Warehouse
Data in a data warehouse is frequently heavily denormalised, summarised and/or stored in a dimension-based model, but this is not always required to achieve acceptable query response times.
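As a rough illustration of what denormalising and summarising can mean in practice, the sketch below copies product attributes onto each sales row and then pre-aggregates the result. The tables, column names and the helper functions `denormalise` and `summarise` are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical normalised source tables (illustrative data only).
products = {
    1: {"name": "pen", "category": "stationery"},
    2: {"name": "ink", "category": "stationery"},
    3: {"name": "mug", "category": "kitchen"},
}
sales = [  # (product_id, month, amount)
    (1, "2024-01", 10.0),
    (2, "2024-01", 5.0),
    (3, "2024-01", 8.0),
    (1, "2024-02", 12.0),
]


def denormalise(sales, products):
    """Copy product attributes onto every sales row, so no join is needed at query time."""
    return [{"month": m, "amount": a, **products[pid]} for pid, m, a in sales]


def summarise(rows):
    """Pre-aggregate amounts by (category, month) into a small summary table."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["category"], r["month"])] += r["amount"]
    return dict(totals)


wide_rows = denormalise(sales, products)
summary = summarise(wide_rows)
```

A query for monthly sales by category can then read `summary` directly instead of joining and scanning the detail rows, which is the trade-off the slide refers to: more storage and redundancy in exchange for faster reads.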
Data Warehouse: Subject-Oriented
Organized around major subjects, such as customer, product, and sales. Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process.
OLAP / Multi-dimensional analysis: Relational databases store data in a two-dimensional format: tables of data represented by rows and columns. Multi-dimensional analysis, commonly referred to as OnLine Analytical Processing (OLAP), offers an extension to the relational model to provide a multi-dimensional view of the data. These tools allow users to drill down from summary data sets into the specific data underlying the summaries.
Data Mining: The process of analyzing business data in the data warehouse to find previously unknown patterns or rules that can be used to tailor business operations. Data mining predicts future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
What is OLAP?
Basic idea: converting data into information that decision makers need
Concept: analyze data along multiple dimensions in a structure called a data cube
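To make the data cube idea concrete, here is a minimal sketch of rolling up a measure along chosen dimensions and drilling down from a summary cell to its detail rows. The fact rows, the dimension names and the functions `roll_up` and `drill_down` are assumptions made for this example; they do not correspond to any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, sales).
facts = [
    ("laptop", "north", "Q1", 100),
    ("laptop", "south", "Q1", 80),
    ("phone", "north", "Q1", 50),
    ("phone", "north", "Q2", 70),
]
DIMENSIONS = ("product", "region", "quarter")


def roll_up(facts, keep):
    """Sum the sales measure over every dimension not named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    cube = defaultdict(int)
    for row in facts:
        cube[tuple(row[i] for i in idx)] += row[-1]
    return dict(cube)


def drill_down(facts, **fixed):
    """Return the detail rows behind a summary cell, e.g. product="phone"."""
    return [
        r for r in facts
        if all(r[DIMENSIONS.index(d)] == v for d, v in fixed.items())
    ]


by_product = roll_up(facts, ["product"])
by_product_region = roll_up(facts, ["product", "region"])
phone_detail = drill_down(facts, product="phone")
```

Here `by_product` is the coarsest view, `by_product_region` adds one dimension, and `phone_detail` returns the underlying rows for one summary cell, which mirrors the drill-down from summaries to detail described earlier.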
Advantages
The data warehouse addresses these factors and provides many advantages, including:
Improved end-user access to a wide variety of data
Increased data consistency
Additional documentation of the data
Potentially lower computing costs and increased productivity
Providing a place to combine related data from separate sources
Creation of a computing infrastructure that can support changes in computer systems and business structures
Empowering end-users to perform any level of ad-hoc queries or reports without impacting the performance of the operational systems
Fraud detection
Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
Data mining
Process of semi-automatically analyzing large databases to find patterns that are:
valid: hold on new data with some certainty
novel: non-obvious to the system
useful: should be possible to act on the item
understandable: humans should be able to interpret the pattern
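One simple way to check the "valid" criterion is to measure a pattern on the data it was found in and again on data held back for testing. The sketch below does this for a single association rule using its confidence; the transactions, the rule and the `confidence` helper are invented for illustration, and real validation would use far more data and a proper statistical test.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0
    return sum(1 for t in matching if consequent <= t) / len(matching)


# Hypothetical market-basket data, split into discovery and holdout sets.
train = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
holdout = [
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter"},
]

rule = ({"bread"}, {"butter"})  # "customers who buy bread also buy butter"
train_conf = confidence(train, *rule)
holdout_conf = confidence(holdout, *rule)
```

If the confidence on the holdout set falls far below the value seen on the discovery set, the pattern is unlikely to hold on new data and should not be acted on.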
Applications
Banking: loan/credit card approval
predict good customers based on old customers
Targeted marketing:
identify likely responders to promotions
Applications (continued)
Medicine: disease outcome, effectiveness of treatments
analyze patient disease history: find relationship between diseases
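A first step toward finding relationships between diseases is to count how often pairs of diagnoses appear in the same patient history. The sketch below does only that; the patient records are made up for illustration, `pair_counts` is a hypothetical helper, and a raw co-occurrence count is a starting point for analysis rather than evidence of a medical relationship.

```python
from collections import Counter
from itertools import combinations

# Hypothetical patient histories: one set of diagnoses per patient.
histories = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "asthma"},
    {"asthma"},
    {"diabetes"},
]


def pair_counts(histories):
    """Count how many patients have each unordered pair of diagnoses."""
    counts = Counter()
    for history in histories:
        # Sorting gives each pair a canonical order, so (a, b) and (b, a) merge.
        counts.update(combinations(sorted(history), 2))
    return counts


counts = pair_counts(histories)
top_pair, top_count = counts.most_common(1)[0]
```

Pairs with high counts relative to how common each disease is on its own are candidates for closer study, for example with association rule measures such as confidence.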