Fall 2013 Assignment Program: Bachelor of Computer Application Semester: 6th Sem Subject Code & Name: BC0058 - Data Warehousing
For example, each regional sales manager in a company may wish to produce a monthly
summary of the sales per region. Because the reporting server contains data at the same level of
detail as the OLTP system, the entire month's data is summarized each time the report is
generated. The result is longer-running queries that lower user satisfaction.
Additionally, many organizations store data in multiple heterogeneous database systems.
Reporting is more difficult because data is not only stored in different places, but in different
formats.
Data warehousing and online analytical processing (OLAP) provide solutions to these problems.
Data warehousing is an approach to storing data in which heterogeneous data sources (typically
from multiple OLTP databases) are migrated to a separate homogenous data store. Data
warehouses provide these benefits to analytical users:
- Data is organized to facilitate analytical queries rather than transaction processing.
- Differences among data structures across multiple heterogeneous databases can be resolved.
- Data transformation rules can be applied to validate and consolidate data when data is moved from the OLTP database into the data warehouse (see the sketch after this list).
- Security and performance issues can be resolved without requiring changes in the production systems.
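As a loose illustration of the transformation-rule point above, here is a minimal Python sketch that validates and consolidates rows on their way from an OLTP source into the warehouse. Every name in it (clean_order, COUNTRY_MAP, the column names) is hypothetical, invented for the example.

```python
# Hedged sketch: one transformation rule applied during OLTP-to-warehouse
# movement. All names are illustrative, not from any specific product.
from datetime import datetime

# Hypothetical lookup that consolidates inconsistent country values
# coming from heterogeneous OLTP systems.
COUNTRY_MAP = {"USA": "US", "U.S.": "US", "United States": "US", "UK": "GB"}

def clean_order(row: dict) -> dict:
    """Validate and standardise one OLTP order row before loading."""
    cleaned = dict(row)
    # Consolidate country codes onto a single standard.
    cleaned["country"] = COUNTRY_MAP.get(row["country"], row["country"])
    # Validate the order date; reject rows the warehouse cannot use.
    try:
        cleaned["order_date"] = datetime.strptime(row["order_date"], "%Y-%m-%d").date()
    except ValueError:
        raise ValueError(f"invalid order_date: {row['order_date']!r}")
    # Validate that the amount is a non-negative number.
    cleaned["amount"] = float(row["amount"])
    if cleaned["amount"] < 0:
        raise ValueError(f"negative amount: {row['amount']}")
    return cleaned

print(clean_order({"country": "USA", "order_date": "2013-09-01", "amount": "19.99"}))
```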
Sometimes organizations maintain smaller, more topic-oriented data stores called data marts.
In contrast to a data warehouse, which typically encapsulates all of an enterprise's analytical
data, a data mart is typically a subset of the enterprise data targeted at a smaller set of users or
business functions.
Whereas a data warehouse or data mart is the data store for analytical data, OLAP is the
technology that enables client applications to efficiently access the data. OLAP provides these
benefits to analytical users:
- Pre-aggregation of frequently queried data, enabling a very fast response time to ad hoc queries (see the sketch after this list).
- An intuitive multidimensional data model that makes it easy to select, navigate, and explore the data.
- A powerful tool for creating new views of data based upon a rich array of ad hoc calculation functions.
- Technology to manage security, client/server query management and data caching, and facilities to optimize system performance based upon user needs.
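To make the pre-aggregation point concrete, here is a minimal Python sketch that aggregates a tiny in-memory fact table once, at load time, so that a later ad hoc query becomes a constant-time lookup. The fact table and the cube dictionary are invented for illustration; a real OLAP server stores its aggregates in optimized multidimensional structures.

```python
# Hedged sketch of OLAP-style pre-aggregation over two dimensions.
from collections import defaultdict

# Tiny illustrative fact table: (region, month, sales_amount).
facts = [
    ("East", "2013-01", 100.0),
    ("East", "2013-01", 250.0),
    ("West", "2013-01", 175.0),
    ("East", "2013-02", 300.0),
]

# Pre-aggregate once over the (region, month) dimensions.
cube = defaultdict(float)
for region, month, amount in facts:
    cube[(region, month)] += amount   # cell at this dimension intersection
    cube[(region, "ALL")] += amount   # rollup across all months
    cube[("ALL", month)] += amount    # rollup across all regions

# Ad hoc queries are now dictionary lookups instead of table scans.
print(cube[("East", "2013-01")])  # 350.0
print(cube[("East", "ALL")])      # 650.0
```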
The terms data warehousing and OLAP are sometimes used interchangeably. However, it is
important to understand their differences because each represents a unique set of technologies,
administrative issues, and user implications.
SQL Server Tools for Data Warehousing and OLAP
Microsoft SQL Server provides several tools for building data warehouses, data marts, and
OLAP systems. Using DTS Designer, you can define the steps, workflow, and transformations
necessary to build a data warehouse from a variety of data sources. After the data warehouse is
built, you can use Microsoft SQL Server OLAP Services, which provides a robust OLAP server
that can be used to analyze data stored in many different formats, including SQL Server and
Oracle databases.
extraction, data transformation, and loading of data from storage. You have to decide whether
you are going to buy all of these functions from a vendor or customize some of them to fit your
own company's business needs.
5. Single Vendor or Best-of-Breed
Choosing a single-vendor solution has a few advantages:
- High level of integration among the tools
- Consistent look and feel
- Seamless cooperation among components
- Centrally managed information exchange
- Overall price negotiable
Advantages of choosing a best-of-breed vendor selection:
- You can build an environment to fit your organization
- No need to compromise between database and support tools
- You can select the products best suited for each function
3. Explain Source Data Component and Data Staging Components of Data Warehouse
Architecture.
ANSWER: Data Warehouse Architecture
Different data warehousing systems have different structures. Some may have an ODS
(operational data store), while some may have multiple data marts. Some may have a small
number of data sources, while some may have dozens of data sources. In view of this, it is far
more reasonable to present the different layers of a data warehouse architecture rather than
discussing the specifics of any one system.
In general, all data warehouse systems have the following layers:
[Figure: relationships among the different components of the data warehouse architecture]
Data Presentation Layer
This refers to the information that reaches the users. This can be in the form of a tabular or
graphical report in a browser, an emailed report that is generated and sent automatically every
day, or an alert that warns users of exceptions, among others. Usually an OLAP tool and/or a
reporting tool is used in this layer.
Metadata Layer
This layer stores information about the data held in the data warehouse system. A logical
data model would be an example of something that's in the metadata layer. A metadata tool is
often used to manage metadata.
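As a rough illustration of what such metadata might look like, the entry below describes one warehouse table. The structure and field names are hypothetical, not those of any particular metadata tool.

```python
# Hedged sketch: a single metadata entry for one warehouse table.
sales_fact_metadata = {
    "table": "sales_fact",                      # physical table name
    "grain": "one row per order line",          # what one row represents
    "columns": {
        "order_date_key": "foreign key to the date dimension",
        "region_key": "foreign key to the region dimension",
        "amount": "sale amount in USD",
    },
    "source": "OLTP orders database, extracted nightly",
    "last_load": "2013-09-01",
}
print(sales_fact_metadata["grain"])
```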
System Operations Layer
This layer includes information on how the data warehouse system operates, such as ETL job
status, system performance, and user access history.
Online Extraction: In online extraction the data is extracted directly from the source system.
The extraction process connects to the source system and extracts the source data.
Offline Extraction: The data from the source system is dumped outside of the source system
into a flat file, and this flat file is used to extract the data. The flat file can be created by a
routine daily process.
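The Python sketch below contrasts the two extraction styles. It assumes sqlite3 as a stand-in for the real source system, and the orders table and orders_dump.csv file are hypothetical.

```python
# Hedged sketch: online vs. offline extraction.
import csv
import sqlite3

def extract_online(db_path):
    """Online: connect to the live source system and query it directly."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT id, amount FROM orders").fetchall()
    finally:
        conn.close()

def extract_offline(dump_path):
    """Offline: read a flat file that the source system dumped earlier."""
    with open(dump_path, newline="") as f:
        return [(int(r["id"]), float(r["amount"])) for r in csv.DictReader(f)]
```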
5. Define the process of Data Profiling, Data Cleansing and Data Enrichment.
ANSWER: Data quality is a critical factor for the success of enterprise intelligence initiatives.
Bad data on one system can easily and rapidly propagate to other systems. If information shared
across the organisation is contradictory, inconsistent or inaccurate, then interactions with
customers, suppliers and others will be based on inaccurate information, resulting in higher
costs, reduced credibility and lost business.
SAS Data Integration provides a single environment that seamlessly integrates data quality
within the data integration process, taking users from profiling and rules creation through
execution and monitoring of results. Organisations can transform and combine disparate data,
remove inaccuracies, standardise on common values, parse values and cleanse dirty data to
create consistent, reliable information.
Rules can be built quickly while profiling data, and then incorporated automatically into the
data transformation process. This speeds the development and implementation of cleansed
data. A workflow design environment facilitates the easy augmentation of existing data with new
information to increase the usefulness and value of all enterprise data.
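A product-agnostic Python sketch of these three steps follows. It is not the SAS Data Integration API; every name in it is invented for illustration.

```python
# Hedged sketch: profiling, cleansing, and enrichment on tiny sample data.
records = [
    {"name": "ACME Corp ", "country": "USA", "revenue": "1000"},
    {"name": "acme corp", "country": "US", "revenue": None},
]

# Profiling: measure completeness and distinct values for one field.
def profile(rows, field):
    values = [r[field] for r in rows]
    return {
        "nulls": sum(v is None for v in values),
        "distinct": len({v for v in values if v is not None}),
    }

print(profile(records, "revenue"))  # {'nulls': 1, 'distinct': 1}

# Cleansing: trim dirty input and standardise on common values.
COUNTRY_STANDARD = {"USA": "US", "US": "US"}
for r in records:
    r["name"] = r["name"].strip().title()
    r["country"] = COUNTRY_STANDARD.get(r["country"], r["country"])

# Enrichment: append information from another (hypothetical) source.
industry_lookup = {"Acme Corp": "Manufacturing"}
for r in records:
    r["industry"] = industry_lookup.get(r["name"], "Unknown")

print(records)  # both rows now consistent, with an added industry field
```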
Key Benefits
- Speeds the delivery of credible information by embedding data quality into batch and real-time processes.
- Reduces costly errors by preventing the propagation of bad data and correcting mistakes at the source.
- Keeps data current and accurate with regular auditing and cleansing.
- Standardises data from multiple sources and reduces redundancy in corporate data to support more accurate reporting, analysis and business decisions.
- Adds value to existing data by generating and/or appending information from other sources.
Key Features
- Database/data warehouse/data mart cleansing through a variety of techniques, including standardization, transformation and rationalization, while maintaining an accurate audit trail.
- Creation of reusable data quality business rules that are callable through custom exits, message queues and Web services.
- Data summarization: compresses large static databases into representative points, making them more amenable to subsequent analysis (see the sketch after this list).
- Support for more than 20 worldwide regions with specific language awareness and localizations.
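As a loose sketch of the data summarization feature above, the function below compresses a list of values into a few representative (value, weight) pairs using simple bin means; real products use richer techniques, and all names here are illustrative.

```python
# Hedged sketch: summarize many values as a few representative points.
def summarize(values, n_bins=4):
    """Replace the values with up to n_bins (bin mean, count) pairs."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0      # guard against all-equal input
    bins = [[] for _ in range(n_bins)]
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # clamp max into last bin
        bins[i].append(v)
    return [(sum(b) / len(b), len(b)) for b in bins if b]

points = [1, 2, 2, 3, 10, 11, 12, 50, 51, 49, 99, 101]
print(summarize(points))  # a handful of (representative value, weight) pairs
```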
- Sharing metadata throughout the suite from a single metadata repository creates accurate, consistent, and efficient processes.
- Changes that you make to source systems can be quickly identified and propagated throughout the flow of information.
- You can identify downstream changes and use them to revise information in the source systems.
- You can track and analyze the data flow across departments and processes.
- Metadata is shared automatically among tools.
- Glossary definitions provide business context for metadata that is used in jobs and reports.
- Data stewards take responsibility for metadata assets, such as schemas and tables, that they have authority over.
- By using data lineage, you can focus on the end-to-end integration path, from the design tool to the business intelligence (BI) report, or drill down to view any element of the lineage.
- You can eliminate duplicate or redundant metadata to create a single, reliable version that can be used by multiple tools.
Managing metadata
The metadata repository of IBM InfoSphere Information Server stores metadata from
suite tools, external tools, and databases, and enables sharing among them. You can
import metadata into the repository from various sources, export metadata by various
methods, and transfer metadata assets between design, test, and production repositories.