OLTP
Systems that support online transaction and query processing are called Online Transaction Processing (OLTP) systems. Examples include the day-to-day operations of organizations, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting.

An OLTP system deals with operational data, i.e. the data involved in the operation of a particular system. Example: in a banking system, when you withdraw an amount from your account, the account number, withdrawal amount, available amount, balance amount, transaction number, etc. are operational data elements.

In an OLTP system, data are frequently updated and queried, so a quick response to each request is expected. Because OLTP systems process a large number of update queries, the database tables are optimized for write operations: they are normalized to prevent data redundancy and update anomalies, which makes write operations more efficient. Operational data are usually of local relevance, and queries typically access an individual tuple (an individual record). Such queries are termed point queries.

Examples of OLTP queries:
What is the salary of Mr. John?
Withdraw money from a bank account (an update operation is performed when money is withdrawn).
What is the address and email id of the person who is the head of the maths department?

What is OLAP?
Online Analytical Processing (OLAP) system
Basic idea: converting data into the information that decision makers need.
Concept: analyzing data along multiple dimensions in a structure called a data cube.
OLAP designates a category of applications and technologies that allow the collection, storage, manipulation, and reproduction of multidimensional data, with the goal of analysis.

History: In 1993, E. F. Codd coined the term online analytical processing (OLAP) in the paper "Providing OLAP (On-line Analytical Processing) to User-Analysts: An IT Mandate". The term OLAP is well suited to describe databases designed to facilitate decision making (analysis) in an organization.
Purpose of OLAP:
To derive summarized information from large-volume databases.
To generate automated reports for human viewing.
Examples of OLAP queries:
How is the profit changing over the years across different regions?
Is it financially viable to continue the production unit at location X?
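The contrast between OLTP point queries and OLAP aggregate queries can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the tables account, txn, and sales, their columns, and all figures are hypothetical, and SQLite merely stands in for a production OLTP or OLAP engine.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    account_no INTEGER PRIMARY KEY,
    owner      TEXT NOT NULL,
    balance    REAL NOT NULL
);
CREATE TABLE txn (
    txn_no     INTEGER PRIMARY KEY,
    account_no INTEGER NOT NULL REFERENCES account(account_no),
    amount     REAL NOT NULL
);
CREATE TABLE sales (year INTEGER, region TEXT, profit REAL);
INSERT INTO account VALUES (101, 'John', 500.0), (102, 'Mary', 300.0);
INSERT INTO sales VALUES
    (2022, 'North', 120.0), (2022, 'South', 80.0),
    (2023, 'North', 150.0), (2023, 'South', 60.0);
""")

def withdraw(conn, account_no, amount):
    """OLTP: a short write transaction touching a single account row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE account SET balance = balance - ? "
            "WHERE account_no = ? AND balance >= ?",
            (amount, account_no, amount))
        if cur.rowcount != 1:
            raise ValueError("unknown account or insufficient funds")
        conn.execute("INSERT INTO txn (account_no, amount) VALUES (?, ?)",
                     (account_no, -amount))
    # OLTP point query: one record looked up by its primary key.
    return conn.execute("SELECT balance FROM account WHERE account_no = ?",
                        (account_no,)).fetchone()[0]

def profit_by_year_and_region(conn):
    """OLAP-style query: summarizes many rows instead of fetching one."""
    return conn.execute(
        "SELECT year, region, SUM(profit) FROM sales "
        "GROUP BY year, region ORDER BY year, region").fetchall()
```

For example, withdraw(conn, 101, 200.0) updates one row, logs one transaction, and returns the new balance 300.0, while profit_by_year_and_region(conn) reads and summarizes the whole sales table.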
What and why OLAP?
OLAP enables users to gain a deeper understanding and knowledge of various aspects of their corporate data through fast, consistent, interactive access to a variety of possible views of the data. While OLAP systems can easily answer "who?" and "what?" questions, it is their ability to answer "what if?" and "why?" questions that distinguishes them from general-purpose query tools. The types of analysis available from OLAP range from basic navigation and browsing (referred to as slicing and dicing), to calculations, to more complex analysis such as time series and complex modeling.

OLAP applications:
Finance: budgeting, activity-based costing, financial performance analysis, and financial modeling.
Sales: sales analysis and sales forecasting.
Marketing: market research analysis, sales forecasting, promotions analysis, customer analysis, and market/customer segmentation.
Manufacturing: production planning and defect analysis.

OLAP benefits:
Increased productivity of business end-users, IT developers, and consequently the entire organization.
Reduced backlog of applications development for IT staff, by making end-users self-sufficient enough to make their own schema changes and build their own models.
Retention of organizational control over the integrity of corporate data, as OLAP applications depend on data warehouses and OLTP systems to refresh their source-level data.
Improved potential revenue and profitability, by enabling the organization to respond more quickly to market demands.

OLTP vs OLAP
OLTP System: Online Transaction Processing (Operational System)
OLAP System: Online Analytical Processing (Data Warehouse)
Source of data
  OLTP: Operational data; OLTP systems are the original source of the data.
  OLAP: Consolidated data; OLAP data comes from the OLTP databases.
Purpose of data
  OLTP: To control and run fundamental business tasks.
  OLAP: To help with planning, problem solving, and decision support; provides multi-dimensional views of various kinds of business activities.
Inserts and updates
  OLTP: Short and fast inserts and updates initiated by end users.
  OLAP: Periodic long-running batch jobs refresh the data.
Queries
  OLTP: Relatively standardized and simple queries returning relatively few records.
  OLAP: Often complex queries involving aggregations.
Processing speed
  OLTP: Fast; a quick response to each request is expected.
  OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.
Space requirements
  OLTP: Smaller than OLAP.
  OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP.
Database design
  OLTP: Normalized, to prevent data redundancy and update anomalies.
  OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas.
Backup and recovery
  OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
  OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
Schema
A schema (pronounced "skee-ma") is the structure of a database system, described in a formal language supported by the database management system (DBMS). In a relational database, the schema defines the tables, the fields in each table, and the relationships between fields and tables. Schemas are generally stored in a data dictionary. Although a schema is defined in a textual database language, the term is often used to refer to a graphical depiction of the database structure.
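As a concrete illustration of a schema being kept in a data dictionary: SQLite (used here through Python's sqlite3 module) records every CREATE statement in its built-in sqlite_master table and exposes column and foreign-key details through PRAGMA statements. The department and employee tables and the helper function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: tables, fields, and a relationship between them.
conn.execute("""CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL)""")
conn.execute("""CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT,
    dept_id INTEGER REFERENCES department(dept_id))""")

def list_tables(conn):
    """Read table names from SQLite's data dictionary (sqlite_master)."""
    return [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

def list_columns(conn, table):
    # PRAGMA cannot take bound parameters; table must be a trusted name.
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def list_foreign_keys(conn, table):
    # Each row is (id, seq, table, from, to, on_update, on_delete, match).
    return [(row[3], row[2], row[4])
            for row in conn.execute(f"PRAGMA foreign_key_list({table})")]
```

Here list_foreign_keys(conn, "employee") reports that employee.dept_id refers to department.dept_id, which is exactly the kind of relationship the schema defines.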
Types of schemas (by data model):
Hierarchical model
Network model
Relational model (RDBMS)
Star schema
The star schema architecture is the simplest data warehouse schema. It is called a star schema because its diagram resembles a star, with points radiating from a center. The center of the star consists of the fact table, and the points of the star are the dimension tables. Usually the fact tables in a star schema are in third normal form (3NF), whereas the dimension tables are de-normalized. Although it is the simplest architecture, the star schema is the most commonly used today and is recommended by Oracle.

In a relational database, denormalization is an approach to speeding up read performance (data retrieval) in which the administrator selectively adds back specific instances of redundant data after the data structure has been normalized. A denormalized database should not be confused with a database that has never been normalized.
Fact tables: A fact table typically has two types of columns: foreign keys to the dimension tables, and measures, which contain raw numeric items that represent relevant business facts. A fact table can contain facts at a detailed or aggregated level, so it tends to be very large.

Dimension tables: A dimension table is a structure, usually composed of one or more hierarchies, that categorizes data. If a dimension has no hierarchies and levels, it is called a flat dimension or list. Dimension tables are joined to the fact table using foreign key references, and they are generally smaller than the fact table. Typically, fact tables store data about sales, while dimension tables store data about geographic regions (markets, cities), customers, products, and time.
Characteristics of the star schema:
Simple structure -> easy-to-understand schema
Great query effectiveness -> small number of tables to join
Relatively long time to load data into the dimension tables -> the dimension tables are de-normalized
The most commonly used in data warehouse implementations -> widely supported by a large number of business intelligence tools
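A minimal star schema can be sketched in SQLite through Python's sqlite3 module. All table names, columns, and rows are hypothetical; the point is the shape: one fact table holding foreign keys plus numeric measures, surrounded by de-normalized dimension tables, and a query that needs only one join per dimension it uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, descriptive, de-normalized (category stored inline).
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
-- Fact table: a foreign key to every dimension plus numeric measures.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    time_id    INTEGER REFERENCES dim_time(time_id),
    units      INTEGER,
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Laptop', 'Electronics');
INSERT INTO dim_store   VALUES (1, 'Pune', 'India'), (2, 'Lyon', 'France');
INSERT INTO dim_time    VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO fact_sales  VALUES
    (1, 1, 1, 100, 50.0), (2, 1, 1, 2, 1800.0),
    (1, 2, 2, 40, 24.0),  (2, 2, 2, 1, 950.0);
""")

# Star join: the fact table joins directly to each dimension it needs.
STAR_QUERY = """
SELECT p.category, s.country, SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_store   s ON f.store_id   = s.store_id
GROUP BY p.category, s.country
ORDER BY p.category, s.country
"""
rows = conn.execute(STAR_QUERY).fetchall()
```

Revenue by category and country needs exactly two joins here, one per dimension used, which is the "small number of tables to join" property listed above.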
Snowflake schema: A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity-relationship diagram resembles a snowflake shape. It is represented by centralized fact tables connected to multiple dimensions. "Snowflaking", the principle behind this schema, is the normalisation of the dimension tables in a star schema; when all the dimension tables are completely normalised, the resulting structure resembles a snowflake with the fact table in the middle.
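Snowflaking a hypothetical product dimension (moving category out into its own table) can be sketched with sqlite3 as well; table names and rows are made up. Revenue by category now needs an extra join, fact -> product -> category, which is the query cost of normalising the dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category normalised into its own table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE fact_sales   (product_id INTEGER REFERENCES dim_product(product_id),
                           revenue REAL);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Electronics');
INSERT INTO dim_product  VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Laptop', 2);
INSERT INTO fact_sales   VALUES (1, 50.0), (2, 10.0), (3, 1800.0), (1, 24.0);
""")

# Two joins are needed now: fact -> product -> category.
SNOWFLAKE_QUERY = """
SELECT c.category, SUM(f.revenue)
FROM fact_sales f
JOIN dim_product  p ON f.product_id  = p.product_id
JOIN dim_category c ON p.category_id = c.category_id
GROUP BY c.category
ORDER BY c.category
"""
rows = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

The trade-off runs the other way for maintenance: renaming a category is a single-row UPDATE on dim_category, whereas in a de-normalized dimension the name would be repeated on every product row.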
Snowflake schema vs star schema
Maintenance/change
  Snowflake schema: No redundancy, and hence easier to maintain and change.
  Star schema: Has redundant data, and hence harder to maintain/change.
Ease of use
  Snowflake schema: More complex queries, and hence harder to understand.
  Star schema: Less complex queries, and easy to understand.
Query performance
  Snowflake schema: More foreign keys, and hence longer query execution time.
  Star schema: Fewer foreign keys, and hence shorter query execution time.
Type of data warehouse
  Snowflake schema: Good to use for the data warehouse core, to simplify complex relationships (many:many).
  Star schema: Good for data marts with simple relationships (1:1 or 1:many).
Joins
  Snowflake schema: More joins.
  Star schema: Fewer joins.
Dimension table
  Snowflake schema: May have more than one dimension table for each dimension.
  Star schema: Contains only a single dimension table for each dimension.
When to use
  Snowflake schema: When a dimension table is relatively big, snowflaking is better as it reduces space.
  Star schema: When a dimension table contains fewer rows, a star schema is a good choice.
Normalization/de-normalization
  Snowflake schema: Dimension tables are in normalized form, but the fact table is still in de-normalized form.
  Star schema: Both dimension and fact tables are in de-normalized form.
Data model
  Snowflake schema: Bottom-up approach.
  Star schema: Top-down approach.
Cube
A cube is a multidimensional structure that contains information for analytical purposes; its main constituents are dimensions and measures. Dimensions define the structure of the cube that you use to slice and dice over, and measures provide aggregated numerical values of interest to the end user. As a logical structure, a cube allows a client application to retrieve the values of measures as if they were contained in cells in the cube; cells are defined for every possible summarized value. A cell in the cube is defined by the intersection of dimension members and contains the aggregated values of the measures at that specific intersection.

Benefit of using cubes: A cube provides a single place where all related data for analysis is stored.

[Figure: 3-D cube example]
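The idea that a cell is the intersection of dimension members, with cells for every summarized value, can be sketched in plain Python. The marker ALL is a hypothetical stand-in for "aggregated over this dimension" (similar in spirit to the CUBE grouping option offered by several SQL databases), the sample facts are made up, and only non-empty cells are materialized.

```python
from collections import defaultdict
from itertools import product

ALL = "ALL"  # hypothetical marker: "summed over every member of this dimension"

# Hypothetical base facts: (city, product, quarter) -> units sold.
FACTS = {
    ("Pune", "Pen", "Q1"): 100,
    ("Pune", "Laptop", "Q1"): 2,
    ("Lyon", "Pen", "Q2"): 40,
    ("Lyon", "Laptop", "Q2"): 1,
}

def build_cube(facts):
    """Map every non-empty cell coordinate (including ALL levels) to its sum."""
    cube = defaultdict(int)
    for coords, value in facts.items():
        # Each fact feeds 2**d cells: every mix of "keep member" / "ALL".
        for mask in product((False, True), repeat=len(coords)):
            cell = tuple(ALL if agg else member
                         for member, agg in zip(coords, mask))
            cube[cell] += value
    return dict(cube)
```

With cube = build_cube(FACTS), the cell (ALL, ALL, ALL) holds the grand total and ("Pune", ALL, ALL) holds everything sold in Pune; a coordinate such as ("Pune", ALL, "Q2") is simply absent because no fact lands there.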
Representation of multi-dimensional data: OLAP database servers use multi-dimensional structures to store data and relationships between data. Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of a cube is a dimension.
Multi-dimensional databases are a compact and easy-to-understand way of visualizing and manipulating data elements that have many inter-relationships. The cube can be expanded to include another dimension, for example, the number of sales staff in each city.
The response time of a multi-dimensional query depends on how many cells have to be added on the fly. As the number of dimensions increases, the number of cube cells increases exponentially.
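The exponential growth can be made concrete with a little arithmetic: a dense cube whose dimensions have n_1, ..., n_d members has n_1 * ... * n_d detailed cells (n^d if every dimension has n members), and (n_1 + 1) * ... * (n_d + 1) cells once each dimension also gets an "all" summary member. The helper names below are hypothetical.

```python
from math import prod

def base_cells(cardinalities):
    """Detailed cells: one per combination of dimension members."""
    return prod(cardinalities)

def cells_with_aggregates(cardinalities):
    """Also count one 'all' member per dimension, i.e. the summary cells."""
    return prod(n + 1 for n in cardinalities)
```

Three dimensions of 10 members give 1,000 detailed cells (1,331 with summaries); adding a fourth dimension of 10 members multiplies that to 10,000.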
Multi-dimensional OLAP supports common analytical operations, such as:
Consolidation: involves the aggregation of data, such as roll-ups or complex expressions involving interrelated data. For example, branch offices can be rolled up to cities and rolled up to countries.
Drill-down: the reverse of consolidation; involves displaying the detailed data that comprises the consolidated data.
Slicing and dicing: the ability to look at the data from different viewpoints. Slicing and dicing is often performed along a time axis in order to analyze trends and find patterns.
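These operations can be sketched in plain Python over a handful of hypothetical detail records; the field names, sample rows, and function names are illustrative assumptions rather than any OLAP product's API. Roll-up groups by a coarser level (branch -> city -> country), drill-down returns to finer levels, a slice fixes one dimension to a single value, and a dice restricts several dimensions at once.

```python
from collections import defaultdict

# Hypothetical detail facts: (branch, city, country, year, product, sales)
FIELDS = ("branch", "city", "country", "year", "product")
FACTS = [
    ("B1", "Pune",   "India",  2023, "Pen",    100),
    ("B2", "Pune",   "India",  2023, "Laptop",   5),
    ("B3", "Mumbai", "India",  2024, "Pen",     70),
    ("B4", "Lyon",   "France", 2024, "Pen",     40),
]

def aggregate(facts, levels):
    """Consolidation: sum the sales measure grouped by the given levels."""
    idx = [FIELDS.index(level) for level in levels]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]  # last field is the measure
    return dict(totals)

def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = FIELDS.index(dim)
    return [row for row in facts if row[i] == value]

def dice_cube(facts, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    checks = [(FIELDS.index(dim), values) for dim, values in allowed.items()]
    return [row for row in facts if all(row[i] in values for i, values in checks)]
```

For example, aggregate(FACTS, ["city"]) followed by aggregate(FACTS, ["country"]) is a roll-up; aggregate(slice_cube(FACTS, "country", "India"), ["city", "branch"]) drills down into the India total; dice_cube(FACTS, year={2024}, product={"Pen"}) keeps only 2024 pen sales.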
OLAP implementations:
Multidimensional OLAP (MOLAP)
Relational OLAP (ROLAP)
Hybrid OLAP (HOLAP)
Multidimensional OLAP (MOLAP): MOLAP tools use specialized data structures and multidimensional database management systems (MDDBMS) to organize, navigate, and analyze data. To enhance query performance, the data is typically aggregated and stored according to predicted usage. MOLAP data structures use array technology and efficient storage techniques that minimize disk space requirements through sparse data management.
The development issues associated with MOLAP:
Only a limited amount of data can be efficiently stored and analyzed.
Navigation and analysis of data are limited because the data is designed according to previously determined requirements.
MOLAP products require a different set of skills and tools to build and maintain the database.
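Array technology with sparse data management can be sketched as follows: each cell gets the same row-major offset it would have in a dense multidimensional array, but only non-empty cells are actually stored. The class SparseCube is a hypothetical illustration, not the storage format of any particular MOLAP product.

```python
class SparseCube:
    """Hypothetical MOLAP-style store: array addressing, sparse storage."""

    def __init__(self, dimensions):
        # dimensions: one list of members per dimension.
        self.members = [list(ms) for ms in dimensions]
        self.positions = [{m: i for i, m in enumerate(ms)} for ms in self.members]
        self.cells = {}  # array offset -> value; empty cells are never stored

    def offset(self, coords):
        """Row-major offset, exactly as for a dense multidimensional array."""
        off = 0
        for pos, ms, member in zip(self.positions, self.members, coords):
            off = off * len(ms) + pos[member]
        return off

    def add(self, coords, value):
        key = self.offset(coords)
        self.cells[key] = self.cells.get(key, 0) + value

    def get(self, coords):
        return self.cells.get(self.offset(coords), 0)  # empty cell reads as 0

    def dense_size(self):
        """Number of slots a dense array over the same dimensions would need."""
        size = 1
        for ms in self.members:
            size *= len(ms)
        return size
```

With 10 cities, 10 products, and 12 months a dense array needs 1,200 slots, while this sketch stores only the cells that were actually written to.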
Relational OLAP (ROLAP): ROLAP is the fastest-growing type of OLAP tools. ROLAP supports RDBMS products through the use of a metadata layer, thus avoiding the requirement to create a static multi-dimensional data structure. This facilitates the creation of multiple multi-dimensional views of the two-dimensional relation. To improve performance, some ROLAP products have enhanced SQL engines to support the complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
The development issues associated with ROLAP technology:
Performance problems associated with the processing of complex queries that require multiple passes through the relational data.
Development of middleware to facilitate the development of multi-dimensional applications.
Development of an option to create persistent multi-dimensional structures, together with facilities to assist in the administration of these structures.
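The role of the ROLAP metadata layer can be sketched as a translator from a multidimensional request (a list of levels plus a measure) into SQL over a star schema. The METADATA mapping, the to_sql and demo_db functions, and the tables are all hypothetical; real ROLAP engines keep far richer metadata.

```python
import sqlite3

# Hypothetical metadata layer: logical cube names -> relational objects.
METADATA = {
    "fact": "fact_sales",
    "measures": {"revenue": "SUM(fact_sales.revenue)"},
    # level -> (dimension table, column, join key shared with the fact table)
    "levels": {
        "country":  ("dim_store", "country", "store_id"),
        "city":     ("dim_store", "city", "store_id"),
        "category": ("dim_product", "category", "product_id"),
    },
}

def to_sql(levels, measure, meta=METADATA):
    """Translate a multidimensional request into a star-join GROUP BY query."""
    fact = meta["fact"]
    cols, joins = [], []
    for level in levels:
        table, column, key = meta["levels"][level]
        cols.append(f"{table}.{column}")
        join = f"JOIN {table} ON {fact}.{key} = {table}.{key}"
        if join not in joins:  # each dimension table is joined only once
            joins.append(join)
    sql = f"SELECT {', '.join(cols + [meta['measures'][measure]])} FROM {fact}"
    if joins:
        sql += " " + " ".join(joins)
    if cols:
        sql += f" GROUP BY {', '.join(cols)} ORDER BY {', '.join(cols)}"
    return sql

def demo_db():
    """Hypothetical star schema with a few rows, for trying the generated SQL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, store_id INTEGER, revenue REAL);
    INSERT INTO dim_product VALUES (1, 'Stationery'), (2, 'Electronics');
    INSERT INTO dim_store VALUES (1, 'Pune', 'India'), (2, 'Lyon', 'France');
    INSERT INTO fact_sales VALUES (1, 1, 50.0), (2, 1, 1800.0), (1, 2, 24.0), (2, 2, 950.0);
    """)
    return conn
```

Because no multi-dimensional structure is stored, every new view (say, revenue by country and category) is just another generated query against the same relational tables.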
Hybrid OLAP (HOLAP): A hybrid of ROLAP and MOLAP, HOLAP can be thought of as a virtual database whereby the higher levels of the database are implemented as MOLAP and the lower levels as ROLAP. HOLAP tools provide limited analysis capability, either directly against RDBMS products or by using an intermediate MOLAP server. HOLAP tools deliver selected data directly from the DBMS or via a MOLAP server to the desktop (or local server) in the form of a data cube, where it is stored, analyzed, and maintained locally.
The issues associated with HOLAP tools:
The architecture results in significant data redundancy and may cause problems for networks that support many users.
The ability of each user to build a custom data cube may cause a lack of data consistency among users.
Only a limited amount of data can be efficiently maintained.
MOLAP (Multidimensional Online Analytical Processing)
The MOLAP storage mode causes the aggregations of the partition and a copy of its source data to be stored in a multidimensional structure in Analysis Services when the partition is processed. This MOLAP structure is highly optimized to maximize query performance. The storage location can be on the computer where the partition is defined or on another computer running Analysis Services. Because a copy of the source data resides in the multidimensional structure, queries can be resolved without accessing the partition's source data. Query response times can be decreased substantially by using aggregations. The data in the partition's MOLAP structure is only as current as the most recent processing of the partition.

ROLAP (Relational Online Analytical Processing)
The ROLAP storage mode causes the aggregations of the partition to be stored in indexed views in the relational database that was specified in the partition's data source. Unlike the MOLAP storage mode, ROLAP does not cause a copy of the source data to be stored in the Analysis Services data folders. Instead, when results cannot be derived from the query cache, the indexed views in the data source are accessed to answer queries. Query response is generally slower with ROLAP storage than with the MOLAP or HOLAP storage modes, and processing time is also typically slower. However, ROLAP enables users to view data in real time and can save storage space when you are working with large datasets that are infrequently queried, such as purely historical data.

HOLAP (Hybrid Online Analytical Processing)
The HOLAP storage mode combines attributes of both MOLAP and ROLAP. Like MOLAP, HOLAP causes the aggregations of the partition to be stored in a multidimensional structure in an SQL Server Analysis Services instance. HOLAP does not cause a copy of the source data to be stored. For queries that access only summary data in the aggregations of a partition, HOLAP is the equivalent of MOLAP. Queries that access source data (for example, if you want to drill down to an atomic cube cell for which there is no aggregation data) must retrieve data from the relational database and will not be as fast as they would be if the source data were stored in the MOLAP structure. With HOLAP storage mode, users will typically experience substantial differences in query times depending upon whether the query can be resolved from cache or aggregations versus from the source data itself.
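The HOLAP behaviour described above can be sketched in plain Python with sqlite3. This is not the Analysis Services API; HolapPartition, process(), and the sales table are hypothetical. Summary queries are answered from aggregations computed at processing time (the MOLAP-like part), while drill-down to atomic rows always goes to the relational source (the ROLAP-like part), so summaries are only as current as the last process() call.

```python
import sqlite3

class HolapPartition:
    """Hypothetical HOLAP-style partition: aggregations cached, detail relational."""

    def __init__(self, conn):
        self.conn = conn          # relational source data (ROLAP-like side)
        self.aggregations = {}    # precomputed summaries (MOLAP-like side)

    def process(self):
        # Rebuild summary cells; they stay as they are until the next call.
        rows = self.conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region")
        self.aggregations = {("region", region): total for region, total in rows}

    def total_for_region(self, region):
        """Summary query: answered from aggregations when a cell exists."""
        key = ("region", region)
        if key in self.aggregations:
            return self.aggregations[key], "aggregations"
        row = self.conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
            (region,)).fetchone()
        return row[0], "source"

    def detail(self, region):
        """Drill-down to atomic rows: always read from the relational source."""
        return self.conn.execute(
            "SELECT id, amount FROM sales WHERE region = ? ORDER BY id",
            (region,)).fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                 [("North", 10.0), ("North", 5.0), ("South", 7.0)])
part = HolapPartition(conn)
part.process()
```

If a new North sale is inserted after processing, total_for_region("North") keeps returning the old aggregated total until process() is called again, while detail("North") already shows the new row.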