The document discusses the seven main components of a data warehouse architecture: data sourcing tools, a metadata repository, database technology, data marts, applications and tools, administration and management, and an information delivery system. It describes each component in detail and explains their purpose and functions in building and managing the data warehouse environment. The metadata repository plays a central role in integrating, maintaining, and allowing users to understand and access the contents of the data warehouse.
Data warehousing Components
Lecture-3,4,5
Dr. Shweta Sharma
School of Computing Information Technology, Manipal University Jaipur, India

Data warehouse components

Data warehouse architecture and its seven components
1. Data sourcing, clean-up, transformation, and migration tools
2. Metadata repository
3. Warehouse/database technology
4. Data marts
5. Application & tools
6. Data warehouse administration and management
7. Information delivery system

A data warehouse is an environment, not a product. It is based on a relational database management system that functions as the central repository for informational data. This central repository is surrounded by a number of key components designed to make the environment functional, manageable, and accessible.

1. Operational and external data (data sourcing, clean-up, transformation, and migration tools)
The data for the data warehouse comes from operational applications. As data enters the warehouse, it is transformed into an integrated structure and format. The transformation process involves conversion, summarization, filtering, and condensation. The data warehouse must be capable of holding and managing large volumes of data, as well as data whose structures differ over time.

2. Metadata repository
The metadata repository is an integral part of a data warehouse system. It holds the following metadata:
• Definition of the data warehouse: the description of the structure of the data warehouse. The description is defined by schema, views, hierarchies, derived data definitions, and data mart locations and contents.
• Business metadata: data ownership information, business definitions, and changing policies.
• Operational metadata: the currency of data and data lineage. Currency of data means whether the data is active, archived, or purged. Lineage of data means the history of the data's migration and the transformations applied to it.
• Data for mapping from the operational environment to the data warehouse: the source databases and their contents, data extraction, data partitioning and cleaning, transformation rules, and data refresh and purging rules.
• Algorithms for summarization: dimension algorithms and data on granularity, aggregation, summarizing, etc.

3. Data warehouse DBMS
Metadata is used for maintaining, managing, and using the data warehouse. It is classified into two types:

1. Technical metadata: information about warehouse data that warehouse designers and administrators use to carry out development and management tasks. It includes:
• Information about data stores.
• Transformation descriptions, that is, the mapping methods from the operational DB to the warehouse DB.
• Warehouse object and data structure definitions for target data.
• The rules used to perform clean-up and data enhancement.
• Data mapping operations.
• Access authorization, backup history, archive history, information delivery history, data acquisition history, data access, etc.

2. Business metadata: information that describes the data stored in the data warehouse in terms that users can understand. It includes:
• Subject areas and information object types, including queries, reports, images, video, audio clips, etc.
• Internet home pages.
• Information related to the information delivery system.
• Data warehouse operational information such as ownership, audit trails, etc.

Metadata helps users understand the content and find the data. Metadata is stored in a separate data store, known as the information directory or metadata repository, which helps to integrate, maintain, and view the contents of the data warehouse.

Characteristics of the information directory / metadata repository:
• It is the gateway to the data warehouse environment.
• It supports easy distribution and replication of content for high performance and availability.
• It should be searchable by business-oriented keywords.
• It should act as a launch platform for end users to access data and analysis tools.
• It should support the sharing of information.
• It should support scheduling options for requests.
• It should support and provide an interface to other applications.
• It should support end-user monitoring of the status of the data warehouse environment.

4. Data marts
A data mart is focused on a single functional area of an organization and contains a subset of the data stored in a data warehouse. It is a condensed version of a data warehouse, designed for use by a specific department, unit, or set of users in an organization, e.g., Marketing, Sales, HR, or Finance. It is often controlled by a single department in the organization.

Factors that typically lead to building a data mart:
• Extremely urgent user requirements.
• The absence of a budget for a full-scale data warehouse strategy.
• The decentralization of business needs.
• The attraction of easy-to-use tools and mind-sized projects.

Why do we need a data mart?
• A data mart improves response time for users because the volume of data is smaller.
• It provides easy access to frequently requested data.
• Data marts are simpler to implement than a corporate data warehouse, and the cost of implementing a data mart is lower than that of a full data warehouse.
• Compared to a data warehouse, a data mart is agile. If the model changes, a data mart can be rebuilt more quickly because of its smaller size.
• A data mart is defined by a single subject matter expert (SME), whereas a data warehouse is defined by interdisciplinary SMEs from a variety of domains. Hence, a data mart is more open to change than a data warehouse.
• Data is partitioned, which allows very granular access control privileges.
• Data can be segmented and stored on different hardware/software platforms.
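The idea that a data mart holds a department-specific subset of the warehouse can be sketched in a few lines. This is a minimal illustration only, assuming an in-memory SQLite database; the table names `warehouse_facts` and `sales_mart`, the columns, and the sample rows are all hypothetical and do not come from the lecture.

```python
import sqlite3

# Hypothetical central warehouse table with invented sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("Sales", "North", 120.0),
        ("Sales", "South", 80.0),
        ("HR", "North", 15.0),
        ("Finance", "South", 42.5),
    ],
)


def build_sales_mart(connection):
    """Copy only the Sales rows from the warehouse into a smaller mart table."""
    connection.execute("CREATE TABLE sales_mart (region TEXT, amount REAL)")
    connection.execute(
        "INSERT INTO sales_mart "
        "SELECT region, amount FROM warehouse_facts WHERE department = ?",
        ("Sales",),
    )


build_sales_mart(conn)

# The mart answers departmental queries against far fewer rows.
warehouse_count = conn.execute(
    "SELECT COUNT(*) FROM warehouse_facts"
).fetchone()[0]
mart_rows = conn.execute(
    "SELECT region, amount FROM sales_mart ORDER BY region"
).fetchall()
print(warehouse_count, mart_rows)
```

Because the mart contains only the rows one department needs, queries touch less data, and access rights can be granted on the mart alone rather than on the whole warehouse.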
Types of Data Mart
There are three main types of data mart:
1. Dependent: a dependent data mart is created by drawing data from an existing central data warehouse.
2. Independent: an independent data mart is created without the use of a central data warehouse.
3. Hybrid: a hybrid data mart can take data from data warehouses or from operational systems.

1. Dependent Data Mart
• A dependent data mart sources the organization's data from a single data warehouse, which offers the benefit of centralization. If you need to develop one or more physical data marts, you need to configure them as dependent data marts.

2. Independent Data Mart
• An independent data mart is created without the use of a central data warehouse. This kind of data mart is an ideal option for smaller groups within an organization.
• An independent data mart has no relationship with the enterprise data warehouse or with any other data mart. Its data is input separately, and its analyses are also performed autonomously.

3. Hybrid Data Mart
• A hybrid data mart combines input from the data warehouse with input from other sources. This is helpful when you want ad hoc integration, for example after a new group or product is added to the organization.
• It is well suited to multiple-database environments and to a fast implementation turnaround. It also requires the least data cleansing effort. A hybrid data mart also supports large storage.

Steps in Implementing a Datamart

5. Application & Tools
Data mining queries are useful for many purposes. You can:
• Apply the model to new data to make single or multiple predictions. You can provide input values as parameters or in a batch.
• Get a statistical summary of the data used for training.
• Extract patterns and rules, or generate a profile of the typical case representing a pattern in the model.
• Extract regression formulas and other calculations that explain patterns.
• Get the cases that fit a particular pattern.
• Retrieve details about individual cases used in the model, including data not used in the analysis.
• Retrain a model by adding new data, or perform cross-prediction.

The purpose of applications and tools is to provide information to business users for decision making. There are five categories:
• Data query and reporting tools.
• Application development tools.
• Executive information system (EIS) tools.
• OLAP tools.
• Data mining tools.

Query and reporting tools are used to generate queries and reports. There are two types of reporting tools:
• Production reporting tools, used to generate regular operational reports.
• Desktop report writers, which are inexpensive desktop tools designed for end users.

• Managed query tools: used to generate SQL queries. They place a meta-layer of software between users and databases, which offers point-and-click creation of SQL statements. These tools are a preferred choice of users for segment identification, demographic analysis, territory management, preparation of customer mailing lists, etc.
• Application development tools: a graphical data access environment that integrates OLAP tools with the data warehouse and can be used to access all database systems.
• OLAP tools: used to analyze data in multidimensional and complex views. To enable multidimensional properties, they use MDDB and MRDB, where MDDB refers to a multidimensional database and MRDB refers to a multirelational database.
• Data mining tools: used to discover knowledge from the data warehouse data. They can also be used for data visualization and data correction.

6. Data warehouse administration and management
The management of the data warehouse includes:
• Security and priority management.
• Monitoring updates from multiple sources.
• Data quality checks.
• Managing and updating metadata.
• Auditing and reporting data warehouse usage and status.
• Purging data.
• Replicating, subsetting, and distributing data.
• Backup and recovery.
• Data warehouse storage management, which includes capacity planning, hierarchical storage management, purging of aged data, etc.

7. Information delivery system
• It enables the process of subscribing to data warehouse information.
• It delivers that information to one or more destinations according to a specified scheduling algorithm.

Data extraction, clean-up, transformation and migration
Proper attention must be paid to data extraction, which is a success factor for a data warehouse architecture. When implementing a data warehouse, the following selection criteria, which affect the ability to transform, consolidate, integrate, and repair the data, should be considered:
• Timeliness of data delivery to the warehouse.
• The tool must be able to identify the particular data so that it can be read by the conversion tool.
• The tool must support flat files and indexed files, since much corporate data is still stored in these formats.
• The tool must be able to merge data from multiple data stores.
• The tool should have a specification interface to indicate the data to be extracted.
• The tool should be able to read data from the data dictionary.
• The code generated by the tool should be completely maintainable.
• The tool should permit the user to extract the required data.
• The tool must be able to perform data type and character set translation.
• The tool must be able to create summarizations, aggregations, and derivations of records.
• The data warehouse database system must be able to load data directly from these tools.

Benefits of data warehousing
Data warehouse usage includes:
– Locating the right information
– Presentation of information
– Testing of hypotheses
– Discovery of information
– Sharing the analysis

The benefits can be classified into two groups.

Tangible benefits (quantifiable / measurable) include:
– Improvement in product inventory
– Reduction in production cost
– Improvement in selection of target markets
– Enhancement in asset and liability management

Intangible benefits (not easy to quantify) include:
– Improvement in productivity by keeping all data in a single location and eliminating rekeying of data
– Reduced redundant processing
– Enhanced customer relations
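
Several of the extraction and transformation criteria listed earlier (merging data from multiple data stores, data type translation, filtering, and summarization) can be illustrated together. The following is a rough sketch under stated assumptions, not a real ETL tool: the two source extracts, the field names `cust`, `amount`, and `status`, and the function `transform` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extract from a flat file: every value arrives as text.
orders_flat_file = [
    {"cust": "C1", "amount": "100.50", "status": "OK"},
    {"cust": "C2", "amount": "20.00", "status": "CANCELLED"},
]

# Hypothetical extract from an operational database: amounts are numeric.
orders_database = [
    {"cust": "C1", "amount": 49.5, "status": "OK"},
    {"cust": "C3", "amount": 10.0, "status": "OK"},
]


def transform(*sources):
    """Merge the sources, drop unwanted rows, and total amounts per customer."""
    totals = defaultdict(float)
    for source in sources:                    # merge multiple data stores
        for record in source:
            if record["status"] != "OK":      # filtering
                continue
            amount = float(record["amount"])  # data type conversion
            totals[record["cust"]] += amount  # summarization
    return dict(totals)


summary = transform(orders_flat_file, orders_database)
print(summary)
```

A production tool would also handle character set translation, rejected records, and direct loading into the warehouse database, all of which are omitted here for brevity.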