Characteristics and Functions of Data Warehouse
Last Updated : 22 Oct, 2018
Prerequisite – Data Warehousing
A data warehouse is most useful when its users share a common way of describing the
trends they analyze, with the data organized around specific subjects. The major
characteristics of a data warehouse are:
1. Subject-oriented –
A data warehouse is subject-oriented: it delivers information about a particular theme
rather than about the organization’s day-to-day operations. The data warehousing
process is designed around a well-defined theme, such as sales, distribution, or
marketing.
A data warehouse does not focus on current operations. Instead, it focuses on
modeling and analyzing data to support decision making. It also gives a concise and
accurate view of a particular theme by excluding data that is not needed to make
those decisions.
2. Integrated –
Integration is closely related to subject orientation: the data about a subject is
stored in a consistent format. Integration means establishing a shared unit of
measure for all similar data from the different databases. The data must also be
stored in the data warehouse in a common and generally accepted manner.
A data warehouse is built by integrating data from various sources, such as a
mainframe and a relational database. It must therefore use consistent naming
conventions, formats, and codes. Integration is what makes effective analysis of the
data possible. Consistency in naming conventions, column scaling, encoding
structures, and so on should be verified. An integrated data warehouse can then
serve several related subject areas.
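The integration step described above can be sketched in Python. This is only an illustration: the two source layouts, the field names, and the gender code mapping are hypothetical and are not taken from any particular system.

```python
from datetime import date

# Hypothetical records from two source systems with different conventions.
mainframe_rows = [
    {"CUST_NO": "0042", "SEX": "M", "BAL_CENTS": 125000, "OPENED": "20180131"},
]
relational_rows = [
    {"customer_id": 77, "gender": "female", "balance": 310.5, "opened": "2017-06-15"},
]

# One shared encoding used inside the warehouse (an assumption for this sketch).
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_mainframe(row):
    """Map a mainframe-style record onto the shared warehouse schema."""
    raw = row["OPENED"]  # YYYYMMDD string
    return {
        "customer_id": int(row["CUST_NO"]),
        "gender": GENDER_CODES[row["SEX"].lower()],
        "balance": row["BAL_CENTS"] / 100,  # cents -> currency units
        "opened": date(int(raw[:4]), int(raw[4:6]), int(raw[6:8])),
    }


def from_relational(row):
    """Map a relational-database record onto the same schema."""
    return {
        "customer_id": int(row["customer_id"]),
        "gender": GENDER_CODES[row["gender"].lower()],
        "balance": float(row["balance"]),
        "opened": date.fromisoformat(row["opened"]),
    }


# After integration, every row has the same names, units, codes, and types.
warehouse_rows = [from_mainframe(r) for r in mainframe_rows] + [
    from_relational(r) for r in relational_rows
]
```

Once both sources are mapped to one schema, an analysis can treat `warehouse_rows` as a single data set without knowing where each row came from.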
3. Time-Variant –
Data in a warehouse is maintained over different intervals of time, such as weekly,
monthly, or annually. The time horizon of a data warehouse is much wider than that of
operational online transaction processing (OLTP) systems, which typically hold only
current data. Data in the warehouse is identified with a specific period of time and
delivers information from a historical perspective. Every record contains an element
of time, either explicitly or implicitly.
A related property is that once data is stored in the data warehouse, it is not
modified, altered, or updated.
4. Non-Volatile –
As the name suggests, the data in a data warehouse is permanent: existing data is
not erased or deleted when new data is inserted. The warehouse accumulates a very
large quantity of data, which is loaded and read but not changed by day-to-day
business transactions. This stability is what allows the data to be analyzed
consistently with warehouse technologies.
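The time-variant and non-volatile properties can be illustrated with an append-only history table. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a warehouse; the `price_history` table, its columns, and the helper functions are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history ("
    " product TEXT, price REAL, snapshot_date TEXT,"
    " PRIMARY KEY (product, snapshot_date))"
)


def load_snapshot(product, price, snapshot_date):
    """Append a new dated row; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO price_history VALUES (?, ?, ?)",
        (product, price, snapshot_date),
    )


def price_as_of(product, as_of_date):
    """Return the most recent price recorded on or before as_of_date."""
    row = conn.execute(
        "SELECT price FROM price_history"
        " WHERE product = ? AND snapshot_date <= ?"
        " ORDER BY snapshot_date DESC LIMIT 1",
        (product, as_of_date),
    ).fetchone()
    return row[0] if row else None


# Each monthly load adds rows (time-variant) and leaves old rows intact (non-volatile).
load_snapshot("laptop", 900.0, "2021-01-31")
load_snapshot("laptop", 950.0, "2021-02-28")
load_snapshot("laptop", 920.0, "2021-03-31")
```

Because every row carries its own date and nothing is overwritten, a question such as "what was the price in mid-March?" can still be answered after later loads. ISO-formatted date strings are used so that plain string comparison orders them correctly.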
Data Warehousing
Last Updated : 28 Jun, 2021
Background
A Database Management System (DBMS) stores data in the form of tables, uses the ER
model, and aims to guarantee the ACID properties. For example, the DBMS of a college
has tables for students, faculty, etc.
A Data Warehouse is separate from the DBMS. It stores a huge amount of data, which is
typically collected from multiple heterogeneous sources such as files, DBMSs, etc. The
goal is to produce statistical results that may help in decision making. For example, a
college might want to see quickly how the placement of CS students has improved over
the last 10 years in terms of salaries, counts, etc.
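The college placement example can be sketched as an aggregate query. The snippet below uses Python's built-in sqlite3 module purely as a small stand-in for a warehouse; the `placements` table, its columns, and the sample figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE placements (year INTEGER, branch TEXT, salary REAL)")

# Hypothetical historical placement records collected over several years.
conn.executemany(
    "INSERT INTO placements VALUES (?, ?, ?)",
    [
        (2019, "CS", 500000.0),
        (2019, "CS", 700000.0),
        (2020, "CS", 800000.0),
        (2020, "EE", 450000.0),
        (2021, "CS", 900000.0),
        (2021, "CS", 1100000.0),
    ],
)

# Statistical summary per year: number of CS placements and average salary.
trend = conn.execute(
    "SELECT year, COUNT(*), AVG(salary) FROM placements"
    " WHERE branch = 'CS' GROUP BY year ORDER BY year"
).fetchall()

for year, count, avg_salary in trend:
    print(year, count, avg_salary)
```

The query summarizes history rather than reading or changing individual records, which is the kind of workload a warehouse is meant for.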
Need of Data Warehouse
An ordinary database typically stores MBs to GBs of data, and does so for a specific
purpose. For data of TB size, storage shifts to a Data Warehouse. Besides this, a
transactional database does not lend itself to analytics. To perform analytics
effectively, an organization keeps a central Data Warehouse so that it can study its
business closely by organizing, understanding, and using its historical data to make
strategic decisions and analyze trends.
Data Warehouse vs DBMS
Example Applications of Data Warehousing
Data Warehousing can be applied anywhere we have a huge amount of data and want to
see statistical results that help in decision making.
Social Media Websites: Social networking websites like Facebook, Twitter,
LinkedIn, etc. are based on analyzing large data sets. These sites gather data related to
members, groups, locations, etc., and store it in a single central repository. Because
the volume of data is large, a Data Warehouse is needed to implement this.
Banking: Most banks these days use warehouses to see the spending patterns
of account holders and cardholders. They use this to offer customers special offers,
deals, etc.
Government: Governments use a data warehouse to store and analyze tax payments,
which helps to detect tax evasion.
There can be many more applications in different sectors like E-Commerce,
Telecommunications, Transportation Services, Marketing and Distribution, Healthcare,
and Retail.
KDD process
1. Data Cleaning: Data cleaning is defined as the removal of noisy and irrelevant data
from the collection.
Cleaning in the case of missing values.
Cleaning noisy data, where noise is a random error or variance in a measured variable.
Cleaning with data discrepancy detection and data transformation tools.
2. Data Integration: Data integration is defined as combining heterogeneous data from
multiple sources into a common source (the Data Warehouse).
Data integration using data migration tools.
Data integration using data synchronization tools.
Data integration using the ETL (Extract-Transform-Load) process.
3. Data Selection: Data selection is defined as the process in which data relevant to the
analysis is identified and retrieved from the data collection.
Data selection using Neural network.
Data selection using Decision Trees.
Data selection using Naive bayes.
Data selection using Clustering, Regression, etc.
4. Data Transformation: Data Transformation is defined as the process of transforming
data into appropriate form required by mining procedure.
Data Transformation is a two step process:
Data Mapping: Assigning elements from source base to destination to capture
transformations.
Code generation: Creation of the actual transformation program.
5. Data Mining: Data mining is defined as the application of intelligent techniques to
extract potentially useful patterns.
Transforms task-relevant data into patterns.
Decides the purpose of the model using classification or characterization.
6. Pattern Evaluation: Pattern evaluation is defined as identifying the truly interesting
patterns that represent knowledge, based on given measures.
Finds the interestingness score of each pattern.
Uses summarization and visualization to make the data understandable to the user.
7. Knowledge representation: Knowledge representation is defined as a technique that
uses visualization tools to represent data mining results.
Generate reports.
Generate tables.
Generate discriminant rules, classification rules, characterization rules, etc.
Note:
KDD is an iterative process where evaluation measures can be enhanced, mining can
be refined, new data can be integrated and transformed in order to get different and
more appropriate results.
Preprocessing of databases consists of Data cleaning and Data Integration.
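The seven KDD steps can be traced end to end on a toy data set. The sketch below is only an illustration in plain Python: the store records are invented, and the mining step is a simple count of item pairs bought together, chosen for brevity rather than taken from the text above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw transactions from two sources; None marks a missing value.
store_a = [
    {"id": 1, "items": ["bread", "milk"], "cashier": "x"},
    {"id": 2, "items": None, "cashier": "y"},
    {"id": 3, "items": ["bread", "milk", "eggs"], "cashier": "x"},
]
store_b = [
    {"id": 4, "items": ["Milk", "Eggs"], "cashier": "z"},
    {"id": 5, "items": ["bread", "milk"], "cashier": "z"},
]


def clean(rows):
    """1. Cleaning: drop records whose item list is missing or empty."""
    return [r for r in rows if r["items"]]


# 2. Integration: combine the cleaned sources into one collection.
integrated = clean(store_a) + clean(store_b)

# 3. Selection: keep only the field relevant to this analysis.
selected = [r["items"] for r in integrated]

# 4. Transformation: normalize case; each basket becomes a sorted tuple of unique items.
baskets = [tuple(sorted({item.lower() for item in items})) for items in selected]

# 5. Mining: count how often each pair of items appears in the same basket.
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(basket, 2)
)

# 6. Pattern evaluation: keep pairs whose support meets a chosen threshold.
MIN_SUPPORT = 0.5  # assumed threshold for this sketch
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}
interesting = {pair: s for pair, s in support.items() if s >= MIN_SUPPORT}

# 7. Knowledge representation: present the result as a small text report.
report = [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(interesting.items())]
print("\n".join(report))
```

Because the process is iterative, changing `MIN_SUPPORT` or adding a third source only requires rerunning the later steps on the new input.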
Benefits Of A Data Warehouse
1. Enables Historical Insight
Businesses depend on a large and accurate store of historical data, from sales
and inventory data to personnel and intellectual property records. If a business
executive suddenly needs to know the sales of a key product 24 months ago, the
rich historical data provided by a data warehouse makes this possible.
Just as important, a data warehouse can add context to this historical data by
listing the key performance trends that surround this retrospective research.
A legacy database cannot match this kind of efficiency.
2. Enhances Conformity And Quality Of Data
Your business generates data in myriad different forms, including structured
and unstructured data, data from social media, and data from sales campaigns.
A data warehouse converts this data into the consistent formats required by
your analytics platforms. Moreover, by ensuring this conformity, a data
warehouse ensures that the data produced by different business divisions is of
the same quality and standard, allowing a more efficient feed for analytics.
3. Boosts Efficiency
It’s very time consuming for a business user or a data scientist to have to gather
data from multiple sources. It’s far more advantageous for this data to be
gathered in one place, hence the benefit of a data warehouse.
Additionally, if your data scientist needs data to run a quick report, they
don’t need assistance from tech support to perform this task. A data
warehouse makes this data readily available, in the correct format,
improving the efficiency of the entire process.
4. Increases The Power And Speed Of Data Analytics
Business intelligence and data analytics are the opposite of instinct and
intuition. BI and analytics require high quality, standardized data – on time and
available for rapid data mining. A data warehouse enables this power and
speed, allowing competitive advantage in key business sectors, ranging from
CRM to HR to sales success to quarterly reporting.
5. Drives Revenue
A tech pundit opined that “data is the new oil,” referring to the high dollar
value of data in today’s world. Creating more standardized and better quality
data is the key strength of a data warehouse, and this key strength translates
clearly to significant revenue gains. The data warehouse formula works like
this: Better business intelligence helps with better decisions, and in turn better
decisions create a higher return on investment across any sector of your
business.
Most important, these revenue gains build on themselves over time, as better
decisions strengthen the business.
In short, a high quality, fully scalable data warehouse can be seen as less of a
cost and more of an investment – one that adds exponential value like few other
investments that businesses make.
6. Scalability
The top key word in the cloud era is “scalable” and a data warehouse is a
critical component in driving this scale. A topflight data warehouse is itself
scalable, and also enables greater scalability in the business overall.
That is, today’s sophisticated data warehouses are built to scale, handling ever
more queries as the business grows (though this will require more supporting
hardware). Additionally, the efficiency in data flow enabled by a data
warehouse greatly boosts a business’s growth, and this growth is the core of
business scalability.
7. Interoperates With On-Premise And Cloud
Unlike the legacy databases of yesteryear, today’s data warehouses are built
with multicloud and hybrid cloud in mind. Many data warehouses are now fully
cloud-based, and even those that are built for on-premise typically will
interoperate well with the cloud-based portion of a company’s infrastructure.
As an additional important side point: this cloud-based focus also means that
mobile users are better able to access the data warehouse – this is beneficial for
sales reps in particular.
8. Data Security
A number of key advances in data warehouses have enhanced their security,
which enhances the overall security of company data. Among these advances
are techniques like a “slave read only” setup, which blocks malicious SQL
code, and encrypted columns, which protect confidential data.
Some businesses set up custom user groups on their data warehouses, which
can include or exclude various data pools, and even give permission on a row
by row basis.
9. Much Higher Query Performance And Insight
The constant business intelligence queries that are part of today’s business can
put a major strain on an analytics infrastructure, from the legacy databases to
the data marts. Having a data warehouse to more effectively handle queries
removes some of the pressure on the system.
Furthermore, since a data warehouse is specifically geared to handle massive
volumes of data and myriad complex queries, it’s the high-functioning core of any
business’s data analytics practice.
10. Provides Major Competitive Advantage
This is absolutely the bottom line benefit of a data warehouse: it allows a
business to more effectively strategize and execute against other vendors in its
sector.
With the quality, speed and historical context provided by a data warehouse,
the greater insight in data mining can drive decisions that create more sales,
more targeted products, and faster response times.
In short, a data warehouse improves business decision making, which in turn gives
any business a key competitive advantage.
Let’s assume that a supermarket chain has not implemented a data warehouse. The
supermarket will eventually find it very difficult to analyze which products are selling,
which are not, when sales go up, what the age group of the customers buying a
particular product is, and several other such queries. This is where the challenges
begin, because a decision has to be made as to whether a particular product is a
hit among the 18-25 age group or not. If the analysis shows that sales have
dropped, steps have to be taken to analyze the issue behind it.
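The age-group question in the supermarket example can be sketched in a few lines of Python. The sales records, the product names, and the age bands other than 18-25 are hypothetical; a real warehouse would answer this with a query over its sales facts.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, customer_age, units_sold).
sales = [
    ("energy drink", 19, 3),
    ("energy drink", 23, 2),
    ("energy drink", 41, 1),
    ("herbal tea", 52, 4),
    ("herbal tea", 24, 1),
]


def age_group(age):
    """Bucket a customer age into the bands used for this analysis."""
    if age < 18:
        return "under 18"
    if age <= 25:
        return "18-25"
    return "over 25"


# Total units sold per (product, age group).
units = defaultdict(int)
for product, age, qty in sales:
    units[(product, age_group(age))] += qty


def share_of_sales(product, group):
    """Fraction of a product's units that were bought by one age group."""
    total = sum(q for (p, _), q in units.items() if p == product)
    return units.get((product, group), 0) / total if total else 0.0


print(share_of_sales("energy drink", "18-25"))
print(share_of_sales("herbal tea", "18-25"))
```

A high share for the 18-25 band suggests the product is a hit with that group; a low share is a prompt to investigate further.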
As for the strategic value to a company, let’s take procurement as an example.
Every company procures certain products, such as laptops and desktops, from a
supplier. Before making a purchase, the company contacts the supplier to
negotiate the price and inquire about the terms. How sure is the company that
the supplier is adhering to the terms of the contract? After the purchase is made, the
supplier gives an invoice. If the invoice shows that the discount hasn’t been
given as agreed, and doesn’t match the terms of the contract, then the two can
discuss the discrepancy.
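The invoice check in the procurement example can be sketched as follows. This assumes contract terms and invoices are available side by side; the supplier name, invoice numbers, field names, and the `discount_mismatches` helper are all hypothetical.

```python
# Hypothetical contract terms: agreed discount per supplier, in percent.
contracts = {"ACME": {"discount_pct": 10.0}}

# Hypothetical invoices received from the supplier.
invoices = [
    {"invoice_no": "INV-1", "supplier": "ACME", "list_price": 1000.0, "billed": 900.0},
    {"invoice_no": "INV-2", "supplier": "ACME", "list_price": 2000.0, "billed": 1900.0},
]


def discount_mismatches(invoices, contracts, tolerance=0.01):
    """Return (invoice_no, expected_amount, billed_amount) for invoices
    whose billed amount differs from the contract price by more than tolerance."""
    mismatches = []
    for inv in invoices:
        agreed = contracts[inv["supplier"]]["discount_pct"]
        expected = inv["list_price"] * (1 - agreed / 100)
        if abs(inv["billed"] - expected) > tolerance:
            mismatches.append((inv["invoice_no"], expected, inv["billed"]))
    return mismatches


for invoice_no, expected, billed in discount_mismatches(invoices, contracts):
    print(f"{invoice_no}: expected {expected:.2f}, billed {billed:.2f}")
```

The small tolerance avoids flagging invoices over floating-point rounding; a production system would more likely use a decimal type for money.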
Hence, the main reason for a company to have a data warehouse is to gain an extra
edge, which comes from making smarter decisions in a smarter manner. This is possible
only if the executives responsible for such decisions have the data at their disposal.
There was a time when experience-based decisions were much more prevalent. We have
since moved into an era in which fact-based decisions have gained importance.
There are certain questions that a manager or executive is asked and must answer
to get an extra edge over competitors. These questions may not be needed to run a
business, but they are needed for the survival and growth of the business.
The second subset of questions is: how many customers have given feedback of
excellent, how many average, and how many bad? Then there is another column for
comments, which is required for the next question: the comments or improvement
areas highlighted by customers. It is easy to see why these questions are asked.
All three questions combined give a picture of the customer service and of what
improvements are needed.