This document provides an overview of best practices for implementing a business intelligence (BI) architecture. It describes the typical layers in a BI architecture including the reporting layer, data warehouse/mart layer, data integration layer, and operational data store layer. For each layer, it discusses relevant deliverables, templates, and frequently asked questions. The chapter aims to provide a roadmap for setting up an effective BI architecture and references additional resources on specific topics.

Best Practice in Data Management

Barry Williams

Contents

Chapter 1. Welcome
Chapter 2. BI Architectures
Chapter 3. Data Architectures for the Future
Chapter 4. Data Management - Getting Started
Chapter 5. Data Warehouses and Data Marts
Chapter 6. Design Patterns for Data Models
Chapter 7. Enterprise Data Models
Chapter 8. Knowledge Management
Chapter 9. MetaData Management
Chapter 10. Quality of a Data Model

First Edition: London, 2011. ISBN: 978-1-4660-0746-8

Barry Williams
Principal Consultant, Database Answers Ltd.
London, England


Chapter 1. Welcome
1.1 What is this?
This is a collection of Essays on Best Practice in Data Management.

Data Management is like a Slowly-Changing Dimension: it changes imperceptibly, and then after about a year (on average) you realise that the landscape has changed. Our intention in this book is to capture and define Best Practice at a particular point in time, and then to keep it up to date with a new version of the Book every quarter.

We have started with the Topics that come up most often in questions to our Database Answers Web Site. I hope you enjoy this Book and would be very pleased to have your comments at barryw@databaseanswers.org. Also, please contact us if there is a topic that you would like to see included.

1.2 Why is it important?


Data Management covers a wide range of Topics, and each of them is the subject of discussion in many organisations trying to clarify the As-Is, the To-Be, and how to get from the first to the second.

Our intention with this book is to provide a reference for the selected Topics in Data Management.

1.3 What Will I Learn?


You will learn:

- An understanding of what each Topic involves
- Some Templates to use to get started
- Which organisations and individuals provide Thought Leadership

1.4 Topics
In this book, we cover some important topics in Data Management, including:

- Business Intelligence (BI)
- Checking the Quality of a Data Model
- Data Warehouses and Data Marts
- Knowledge Management
- Metadata Management


Chapter 2. BI Architectures
Here is the link for the LinkedIn Group on BI:

http://www.linkedin.com/groups/Business-Intelligence-Group-23006

2.1 Purpose of this Chapter

This Chapter provides a Road Map for the implementation of a BI Architecture.


2.2 Layers in the BI Architecture
This diagram shows the four Layers in the Architecture, where each Layer is supported by a set of Best Practice specifications, Data Models and Templates:

- BI and Reporting Layer
- Data Warehouse and Mart Layer
- Data Integration Layer (Single Version of the Truth, Enterprise Data Model, ETL)
- Operational Data Store Layer (CRM, Operations, Billing, etc.)


2.3 Layers and Deliverables


This table shows the deliverables and artefacts at each Stage in the end-to-end BI Architecture.

Layer                            Deliverables - Best Practice and Templates
BI Architecture (end-to-end)     Business Requirements and Design Walkthrough
Reporting Layer                  Report Templates
Data Mart Layer /                Generic Design and Models for Data Marts
Data Warehouse
Data Integration Layer           Data Transformation Mapping to the DWH, in line
                                 with the Enterprise Data Model;
                                 Data Quality (Validation and Cleanup);
                                 Data Profile
Operational Data Store Layer     Data Staging Area

2.4 The Reporting Layer


2.4.1 Best Practice

Best Practice involves identifying the User Requirements and formalising them in a document which is agreed and signed off before development work begins.

BI and Reports take data from Data Marts, and many of the same considerations apply when it comes to determining Best Practice. One difference is that it is necessary to have a clearer understanding of the business operations and of how the right kind of Performance Reports can provide insight to the business users. This leads to the need for a management education process to be in place, so that the evolution of Performance Reports can be planned in a logical manner, from basic summaries to KPIs, Dashboards and so on.

2.4.2 Data Models

This Data Model shows the Data Warehouse for the FBP 3 Project. It is called a Reporting Database for consistency with the previously existing Project documentation.
Attributes in black are Headings or Dimensions, and those in red are Data Items. The model is a star schema with a central Reporting_Database fact table and its Dimension tables:

- Reporting_Database (fact table): Record_ID, with foreign keys to Country, Region, Banner, Department, LP Page, Store Number, Time_Period and Currency, and Data Items including Cash Inventory, Cash Sales, Clearance Unit Sales, Distro Units, Financial Data, Inventory, Inventory Ticket Plans, Inventory Ticket Actuals, Financial Data - Till, MD Data, Merch Need, Unit Production Targets, Unit Sales - Plans, Unit Sales - Actual, TY Data, LY Data, Other Data
- Regions: Region, Region_Name
- Banner: Banner, Banner_Name
- Departments: Department, Department_Name
- LP Pages: LP Page, LP Page Name
- Countries: Country, Country_Name
- Stores: Store Number, Store Name
- Calendar: Time_Period, Week Number, Month Number
- Currency_Codes: Currency, Currency_Name



2.4.3 Templates

This Template is taken from a Sales Audit Report.

Sales and Refunds for April 1st, 2011


Excluding Settled Transactions

TRAN                 Txn Date  Tran Type  Reg  Ref   Tran Desc  User ID
6001760501001907996  27/06/11  50100      2    7996  Issue      hlobo
6001760501006665185  27/06/11  50100      2    5185  Issue      hlobo
6001760501006665193  27/06/11  50100      2    5193  Issue      hlobo
6001760501006665201  27/06/11  50100      2    5201  Issue      hlobo
6001760501006665219  27/06/11  50100      2    5219  Issue      hlobo
6001760501006665227  27/06/11  50100      2    5227  Issue      hlobo
6001760501007852014  27/06/11  50100      8    2014  Issue      masmith
6001760501008109703  27/06/11  50100      8    9703  Issue      masmith
6001760501008109893  27/06/11  50100      8    9893  Issue      masmith
6001760501010717113  27/06/11  50100      2    7113  Issue      hlobo
6001760501011277653  27/06/11  50301      1    2166  AdjustDec  syassir
6001760501011566451  27/06/11  50100      5    6451  Issue      hchiwere
6001760514000081507  27/06/11  50100      2    1507  Issue      hlobo
6001760514000081515  27/06/11  50100      2    1515  Issue      hlobo
6001760514000081523  27/06/11  50100      2    1523  Issue      hlobo
6001760514000081549  27/06/11  50100      2    1549  Issue      hlobo



2.4.4 Frequently Asked Questions

What.1 - What are Performance Reports?

This Stage produces and delivers Performance Reports for management. Report Templates, supported by the appropriate Generic software, are required.

What.2 - What is Business Intelligence?

This Stage produces and delivers BI and Performance Reports that meet the requirements of all levels of management. It must be responsive to requests for change: User Requirements are always evolving, so the approach and the reporting software must be flexible. A sensible approach is to develop Report Templates supported by the appropriate Generic software.

What.3 - How do we assess our User Report Maturity Level?

1) Blank Template

Template Name:
Date:

User Category | Weekly Totals | Traffic Lights | Dashboards | KPIs | Other



2) Completed Template

Template Name: Report Maturity Level
Date: April 1st, 2010

User Category | Weekly Totals | Traffic Lights | Dashboards | KPIs    | Other
Finance       | Common        | In use         | In use     | In use  | Mashups
HR            | Common        | Aware          | Unaware    | Aware   |
Operations    | Common        | Unaware        | Unaware    | Unaware |

Availability of Models and Data Marts

1) Blank Template

This Template is used to keep track of the availability of Master Data Models and Data Marts.

Template Name:
Date:

Category        | Master Data Models | Data Marts
Customers       |                    |
Finance         |                    |
Products        |                    |
Purchase Orders |                    |
Stores          |                    |
Warehouses      |                    |



2) Completed Template

Template Name: Data Model Availability
Date: March 18th, 2010

Category   | Master Data Models | Data Marts
Finance    | N/A                | N/A
HR         | N/A                | N/A
Operations | N/A                | N/A
Movements  | N/A                | N/A
NCTS       | Available          | N/A
Products   | SEED               | N/A
Customer   | DTR but needs work | N/A
Warehouses | DTR but needs work | N/A



Performance Reports

1) Blank Template

Report Name:
Date Produced:

Product Name  | Week 1 Date | Week 2 Date | Week 3 Date | Week 4 Date | Total
<Product>     | <Value>     | <Value>     | <Value>     | <Value>     | <Value>
Weekly Totals | <Value>     | <Value>     | <Value>     | <Value>     | <Value>
Grand Total   |             |             |             |             | <Value>

2) Completed Template

These figures are fictitious.

Report Name: Value of Weekly Product Movements
Date Produced: March 18th, 2010

Product Name        | Dec 6th 2009 | Dec 13th 2009 | Dec 20th 2009 | Dec 27th 2009 | Total
Beer                | 40,000       | 60,000        | 70,000        | 80,000        | 250,000
Cigarettes          | 50,000       | 60,000        | 70,000        | 80,000        | 260,000
Cigars & cigarillos | 25,000       | 30,000        | 31,000        | 32,000        | 118,000
Leaded Petrol       | 90,000       | 91,000        | 92,000        | 93,000        | 366,000
Unleaded Petrol     | 100,000      | 120,000       | 133,000       | 140,000       | 493,000
Weekly Totals       | 305,000      | 361,000       | 396,000       | 425,000       |
Grand Total         |              |               |               |               | 1,487,000



These questions are from this page: http://www.databaseanswers.org/bi_plus_performance_reports.htm

Why.1 - Why is this Stage important?

The value and benefits of Reports are always a major part of the justification of the cost of designing and installing a Database.

How.1 - How do we get started?

Here is a short Tutorial:

Step 1. Determine whether Users are ready for KPIs, Traffic Lights and Dashboards.
Step 2. Check the availability of Master Data Models.
Step 3. Check the availability of Data Marts.
Step 4. Check the availability of Report Specifications and SQL Views for Reports.
Step 5. Perform a Gap Analysis to identify any missing data that must be sourced.
Step 6. Analyse the common aspects of the requirements for Performance Reports.

There are three Templates in this Section:
1. User Report Maturity Level
2. Availability of Master Data Models and Data Marts
3. Templates for Performance Reports

These questions are taken from this page: http://www.databaseanswers.org/bi_plus_performance_reports_questions.htm

Here's another Kick-Start Tutorial:

Step 1. Assess the level of Maturity of the Users concerning KPIs, Dashboards, etc.
Step 2. Check the availability of Master Data Models and Data Marts.
Step 3. Check the availability of Report Specifications and SQL Views for Reports.
Step 4. Tailor the Approach accordingly.
Step 5. Aim for results in 6 months and interim results in 3 months.

If you have a Question that is not addressed here, please feel free to email it to us at barryw@databaseanswers.org.

How.2 - How do we measure progress in Business Intelligence?

Check for:
- a Statement of User Requirements, ideally with specifications of Templates
- Software Design Patterns


How.3 - How do I combine Excel data in my Reports?

Data in Excel Spreadsheets is structured in a tabular format which corresponds exactly to the way in which data is stored in a relational database. Spreadsheets are also commonly used, and their data frequently needs to be integrated with other data within an organisation. Therefore we would expect to find a wide range of solutions available to solve this problem. Here is a small sample:

- An ODBC connection can be established for a spreadsheet.
- Informatica allows Spreadsheets to be defined as a Data Source.
- Microsoft's SQL Server Integration Services also lets Excel be defined as a Source.
- Oracle provides a facility to define an EXTERNAL table which can be a Spreadsheet.
- Salesforce.com provides their Excel Communicator.

How.4 - How do you meet your Chief Executive's Report requirements?

In order to respond to this situation appropriately, it is necessary to have an Information Catalogue, a Data Architecture and Data Lineage. The solution then involves the following Steps:

Step 1. Produce a draft Report for the Chief Exec's approval.
Step 2. Trace the lineage and perform a gap analysis for all new data items.
Step 3. Talk to the Data Owners and establish when and how the data can be made available.
Step 4. Produce a Plan and timescale.
Step 5. Review your Plan with the Chief Exec and obtain agreement and formal sign-off.
Step 6. Deliver!
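As a vendor-neutral sketch of the same idea (treating spreadsheet data as just another relational source), this hypothetical Python example loads a CSV export of a spreadsheet into a SQLite staging table and joins it with database data; all table and column names are illustrative.

```python
import csv
import io
import sqlite3

# A CSV export of a spreadsheet (illustrative data).
spreadsheet = io.StringIO("product_id,forecast\n1,500\n2,750\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, product_name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, "Beer"), (2, "Cigars")])

# Land the spreadsheet rows in a staging table.
conn.execute("CREATE TABLE forecast_staging (product_id INTEGER, forecast INTEGER)")
rows = [(int(r["product_id"]), int(r["forecast"]))
        for r in csv.DictReader(spreadsheet)]
conn.executemany("INSERT INTO forecast_staging VALUES (?, ?)", rows)

# The spreadsheet data can now be combined with database data in one Report query.
report = conn.execute("""
    SELECT p.product_name, f.forecast
    FROM products p JOIN forecast_staging f ON p.product_id = f.product_id
    ORDER BY p.product_id
""").fetchall()
print(report)  # [('Beer', 500), ('Cigars', 750)]
```

The same pattern applies with an ODBC connection or an SSIS Excel Source: land the spreadsheet rows in a staging table, then integrate them with ordinary SQL.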

How.5 - How do I produce Integrated Performance Reports?

Reports for Senior Management fall into two categories:
- Standard Reports
- On-demand Reports

For Standard Reports it is possible to define Templates. For On-demand Reports, the aim is to define a flexible approach, in order to be able to respond to changes to Requirements in a timely manner.

The key action here is to establish a unified Reporting Data Platform. This will involve aspects previously discussed, including MDM and CMI, and will certainly involve Data Lineage. Senior Management will want to take a view of the integrated data and not focus on details of derivation. Therefore, we have to follow the MDM approach, with Data Lineage for each item in the Integrated Performance Reports.

What are Key Performance Indicators (KPIs)?

Key Performance Indicators (KPIs) are in common use and represent one aspect of Best Practice.



A variation of this approach is Key Quality Indicators (KQIs), which are used to monitor and manage Data Quality. Dashboards and Scorecards are often used in association with KPIs and KQIs.

2.5 Data Mart Layer


This diagram shows the four major Stages in delivering a Single View of the Truth:

2.5.1 Best Practice

An Enterprise Data Warehouse is a repository of all the data within the enterprise that is used in Performance Reports. It is intended to guarantee a Single Version of the Truth. It typically stores very detailed data, such as Sales Receipts, and can maintain historical data going back many years. A Data Mart, on the other hand, stores data for specific functional areas, such as Purchases and Sales; its data is usually limited in timescale and might even have a limited life span. A Data Warehouse can be either a single very large Repository of Data, or it can be built as an interlocking set of Data Marts.

Each Data Mart would store data related to a separate business area, such as Sales, or to a specific Report or family of Reports. In passing, we can mention that there are two well-known authorities in the broad field of Data Warehouses and Data Marts. The first is Bill Inmon, who favours the first approach, of large, all-encompassing Data Warehouses. The second is Ralph Kimball, who favours related Data Marts. Inmon and Kimball both write well and present convincing arguments for their points of view. A sensible approach is to start with a single Data Warehouse and then to create Data Marts for specific business requirements as they occur.

In order to link Data Marts, they need to share the same values for the same Dimensions, such as Stores or Products. These are called Conformed Dimensions. Without Conformed Dimensions, it is impossible to compare and accumulate related values across Data Marts.

2.5.2 An Agile Approach

An Agile Approach is very important because it is inevitable that user requirements will change from time to time. We can predict four Phases in the evolution of User Requirements:

1. Give me everything.
2. Give me these Reports on a regular basis, and give me an ad-hoc Enquiry facility.
3. I want integrated KPIs and Dashboards.
4. I want to be notified automatically if I have any situation requiring urgent attention.

In order to meet these Requirements, we need to put in place an Agile Data Warehouse with flexible Data Marts and an integrated BI Toolkit.



2.5.3 Conformed Dimensions

A Conformed Dimension is one that has the same values across all Subject Areas. Conformed Dimensions are therefore often Master Data, although, of course, a Conformed Dimension is not necessarily Master Data.

The best example is probably Dates. In this example, the Date field (held in the Calendar dimension) is a Conformed Dimension shared by the Purchasing and Sales Data Marts, but the Suppliers, Stores and Ticket Dimensions are not conformed:

- Ticket Dimension: Ticket Number, Date Issued
- Suppliers Dimension: Supplier ID
- Purchasing Data Mart: PO ID, PO Date, Supplier ID
- Calendar: Date
- Sales Data Mart: Date of Sale, Store Number
- Stores Dimension: Store Number



2.5.4 Conformance Analysis

This table conveys the levels of conformance within a Dimension by grouping the base Dimension with conformed rollup totals. The two left-hand columns are the Dimensions (Date: Day, Week, Month; Product: Product, Product Category; Organisation: Warehouse, Store, Division) and the top row shows the Facts (Orders, Shipments, Inventories, Sales, Returns, Demand Forecast). A "Yes" entry indicates a Dimension that has to be conformed in order for the analyses against that Fact to be valid: for example, if we have Product-level data then Product is a conformed dimension. The entries are illustrative, for discussion purposes.



2.5.5 Data Models

2.5.5.1 Phase 2 Data Marts

This shows how the data in the simple example Data Warehouse can be made available in two separate Data Marts: one for Gift Cards data and the other for Damaged Goods data.

2.5.5.2 Sample Data Marts

This shows the design for three representative Online Shopping Data Marts:

- Data Mart 1 for Online Shopping: Record_ID, Shopping_Date, Total_Amount
- Data Mart 2 for Online Shopping: Record_ID, Shopping_Date_Time, Product_Type, Total_Amount
- Data Mart 3 for Online Shopping: Record_ID, Shopping_Date_Time, Customer_ID, Total_Amount



2.5.5.3 Template for Generic Data Mart Design

This design shows two date fields because that is a common pattern; however, there could be more or fewer than two. In a similar way, it shows six Dimensions and six Facts for illustrative purposes, but these, of course, could be any number of attributes.

- Data Mart - Generic Design (fact table): Record_ID, Date_1 (FK), Date_2 (FK), Dimension_1 to Dimension_6 (FKs), Fact_1 to Fact_6
- Ref_Calendar: Day_Date_Time, Week_Number, Month_Number, Year_Number
- Dimension_1 to Dimension_6: each with a key (Dimension_n) and a Description (Dimension_n_Description)


2.5.5.4 Sample Template (in Word)

- DATA_MART_FACTS: Fact ID (PK), Date, Movement ID, Product ID, Product Type Code, Customer ID
- CALENDAR: Day Number (PK)
- PRODUCT TYPES: Product Type Code
- CUSTOMER TYPES: Customer Type Code
- MOVEMENTS: Movement ID
- PRODUCTS: Product ID
- CUSTOMERS: Customer ID

2.5.6 Frequently Asked Questions

These questions are from this page: http://www.databaseanswers.org/data_marts.htm

What.1 - What is a Data Mart?

A Data Mart is a Repository of summary, total and detailed data that meets User Requirements for Reports. It always has a standard structure, called a Dimensional Data Model: a Facts Table, where all the data for analysis is held, together with a number of associated Dimension Tables. This standard structure means that it is possible to use Generic Software and to adopt a common Approach based on Report Templates. Describing a Data Mart is a good way to get User buy-in, because it can easily be explained in a logical manner which is very user-friendly.




What.2 - What are Data Mart Templates?

Data Marts have a common design of Dimension fields and Facts. Templates are important because they represent a tremendous Kick-Start to the design of Data Marts for a specific business area; they are produced by exploiting the common design of Dimensions and Facts. A range of Data Mart diagrams is available in the Case Studies on the Database Answers Web Site.

Why.1 - Why is this Stage important?

It provides a single point of reference for all the data available within the organisation for producing Reports.

How.1 - How do we get Started?

These questions are from this page: http://www.databaseanswers.org/data_marts_questions.htm

To get started, follow these Steps:
1. Get a broad understanding of the Users' Data Requirements.
2. Establish a common view of the Data Platform.
3. Determine the available Data.
4. Reconcile standards and reference data.
5. Choose the product, or use bespoke SQL.
6. Use Templates and agree the design with Users.
7. Populate the Templates with sample data.
8. Get sign-off on demo specs in 1 month; aim for results for a champion in 3 months and final results in 6 months.
9. Adjust timescales in the light of experience.

How.2 - How do we measure progress with Data Marts?

Check the level of the Users' understanding, and check for the existence of Templates.

How.3 - How do I improve the performance of my Data Mart?

Every DBMS produces what is called an Execution Plan for every SELECT statement.



The steps to improving performance involve checking this Execution Plan against the Indexes that exist, and making sure that the Query Optimizer has used the appropriate Indexes to obtain the best performance. This is a specialised area where DBAs spend a lot of their time when they are looking after production databases where speed is a mission-critical factor.

Data Marts are always created to support Business Intelligence, which includes Performance Reports, Balanced Scorecards, Dashboards, Key Performance Indicators and so on. Best practice always requires user involvement and a generic design, to support a flexible approach to meeting changing requirements. Users will always want changes to their first specifications of their requirements: the insight that they obtain from the first Reports helps them identify more precisely what their long-term requirements will be. Therefore flexibility is important, and a well-designed Data Mart will anticipate the areas where flexibility is required.

The design process should always follow two steps:
1. Production of a generic design for the Data Mart.
2. Implementation of the design with a specific Data Mart software product.
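The Execution Plan check can be illustrated with SQLite; the principle is the same in any DBMS. EXPLAIN QUERY PLAN reports a full scan before a suitable Index exists and an Index search afterwards. The table and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (record_id INTEGER, store_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)",
                 [(i, i % 50, 10.0) for i in range(1000)])

def plan(sql):
    # Each row of EXPLAIN QUERY PLAN describes one step of the Execution Plan.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT SUM(amount) FROM facts WHERE store_id = 7"
before = plan(query)   # a full table scan: no suitable Index exists yet

conn.execute("CREATE INDEX idx_facts_store ON facts (store_id)")
after = plan(query)    # the Optimizer now searches via idx_facts_store

print(before)
print(after)
```

A DBA would compare such plans for the Data Mart's heaviest Report queries and add or adjust Indexes until the Optimizer chooses an efficient access path.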


2.6 Data Integration Layer


2.6.1 Best Practice

This Layer contains four major areas of functionality:

* The Data Warehouse
* The Enterprise Data Model
* Master Data Management / Single Version of the Truth
* ETL (Extract, Transform and Load)

2.6.2 The Canonical Enterprise Data Model

This offers a Design Pattern that can apply to any high-level System or Business Function that is providing data to be loaded into the Data Warehouse. The Model is built around six core Subject Areas:

- CUSTOMERS
- PRODUCTS / SERVICES
- EVENTS
- ORGANISATION
- SUPPLIERS
- DOCUMENTS



2.6.3 Phase 1 Data Warehouse

This shows a simple example of a Data Warehouse holding data for Gift Cards and for Vendor Compliance Infractions involving Damaged Goods: all the data available in one single source.

2.6.4 Data Warehouse ERD

This diagram shows the Entities that contribute to the Data Landscape for the Warehouse. A central Data_Warehouse_Facts table is linked to:

- Master entities: Addresses, Customers, Documents, Events, Products, Staff, Stores, Suppliers, Suppliers_Addresses, Suppliers_Products, Warehouses
- Reference entities: Ref_Address_Types, Ref_Colours, Ref_Customer_Types, Ref_Document_Types, Ref_Event_Types, Ref_Payment_Methods, Ref_Sizes, Ref_Supplier_Status



2.6.5 Generic Data Warehouse

This shows the design of the Generic Data Warehouse (DWH):

- DWH_Data_Types: DWH_Data_Type_Code, DWH_Data_Type_Name (e.g. Gift Card Basic Data, Gift Card Totals, Inventory Levels, Vendor Compliance Totals, Online Shopping Totals, Product Markdowns, Promotions Data, Raising Purchase Orders, Transportation of Merchandise)
- Data Warehouse Phase 1 Generic (fact table): Fact_ID, DWH_Data_Type_Code (FK), Customer_ID (FK), Document_ID (FK), Event_ID (FK), Product_ID (FK), Reporting_Day_Date_Time (FK), Staff_ID (FK), Store_ID (FK), Supplier_ID (FK), Warehouse_ID (FK), Date_From, Date_To, Dimension_1 to Dimension_3 (FKs), Amount, Count, Fact_1 to Fact_6
- Ref_Calendar: Day_Date_Time, Week_Number, Month_Number, Year_Number
- Dimension_1 to Dimension_3: each with a key and a Description



2.6.6 Specific Data Warehouse

This diagram shows the Generic Design above applied to Gift Card and Vendor Compliance data in a Retail organisation (the application-specific attributes are shown in red):

- Data Mart for Gift Cards: Fact_ID, Gift_Card_Type (FK), Date_From (FK), Date_To (FK), Total_Card_Count, Total_Card_Amount
- Ref_Gift_Card_Type: Gift_Card_Type, Gift_Card_Type_Description
- Data Mart for Vendor Compliance: Fact_ID, Damage_Category_Code (FK), Date_From (FK), Date_To (FK), Damaged_Goods_Count, Damaged_Goods_Amount
- Ref_Damage_Categories: Damage_Category_Code, Damage_Category_Description
- Ref_Calendar: Day_Date_Time, Week_Number, Month_Number, Year_Number
- DWH_Data_Types: DWH_Data_Type_Code, DWH_Data_Type_Name (e.g. Gift Card Basic Data, Gift Card Totals, Vendor Compliance Totals)
- Data Warehouse Phase 1 (fact table): Fact_ID, DWH_Data_Type_Code (FK), Customer_ID (FK), Document_ID (FK), Event_ID (FK), Product_ID (FK), Reporting_Date_Time (FK), Staff_ID (FK), Store_ID (FK), Supplier_ID (FK), Warehouse_ID (FK), Damage_Category_Code (FK), Gift_Card_Type (FK), Total_Amount, Total_Count, Total_Card_Amount, Total_Card_Count, Damaged_Goods_Amount, Damaged_Goods_Count



2.6.7 Data Model for a Single Version of the Truth

This involves establishing Master Catalogues for Customers, Products and Stores. The Data Model for the Customer Master Catalogue is shown here, and is described in more detail in a separate document. It shows the tables involved in maintaining a Customer Master Index (diagram: Customers and a Customer Master Index, July 29th, 2011):

- Customer_Master_Index: Customer_Master_ID, Address_ID, Customer_Type_Code (FK), PAF_File_ID (FK), Date_of_Verification
- Ref_Customer_Types: Customer_Type_Code, Customer_Type_Description (e.g. Anonymous, TK Maxx Care)
- PAF_File_TBC: PAF_File_ID, PAF_File_Address
- Customer_Data_Feeds: Source_Customer_ID, Customer_Data_Feed_ID, Data_Source_Name (FK), Customer_Master_ID (FK)
- Data_Sources: Data_Source_Name, Data_Source_Description (e.g. Amex, TK Maxx Care)
- Credit_Card_Feed: Source_Customer_ID (FK), Credit_Card_Feed_Details
- Amex_Card_Feed: Amex_Card_Number (FK), Amex_Card_Details
- TK_Maxx_Care: TK_Maxx_Care_ID (FK), TK_Maxx_Care_Details
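A minimal sketch of the Customer Master Index idea in Python: records arriving from different Data Feeds are matched to a single Customer_Master_ID. The matching rule (normalised name plus postcode) is deliberately simplistic and purely illustrative; real MDM matching is far more sophisticated.

```python
# Illustrative sketch of assigning Customer_Master_IDs to records arriving
# from different Data Feeds, as in the Customer_Data_Feeds table above.
master_index = {}   # (normalised name, postcode) -> customer_master_id
feed_links = []     # (data_source_name, source_customer_id, master_id)

def normalise(name, postcode):
    # Crude normalisation so that trivially different spellings match.
    return (name.strip().lower(), postcode.replace(" ", "").upper())

def register(data_source_name, source_customer_id, name, postcode):
    key = normalise(name, postcode)
    if key not in master_index:
        master_index[key] = len(master_index) + 1   # create a new master record
    master_id = master_index[key]
    feed_links.append((data_source_name, source_customer_id, master_id))
    return master_id

# The same person arriving on two feeds resolves to one master record.
a = register("Amex",         "A-1001", "Jo Walter ", "SW1A 1AA")
b = register("TK Maxx Care", "C-77",   "jo walter",  "sw1a1aa")
c = register("Amex",         "A-1002", "Ann Other",  "E1 6AN")
print(a, b, c)  # 1 1 2
```

The feed_links list plays the role of Customer_Data_Feeds: it preserves each Source_Customer_ID while pointing every record at its single Customer_Master_ID.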


2.6.8 ETL Template for Validation Specifications

This shows the validation for the Event entity in the Enterprise Data Model (EDM).

DATA ITEM | TYPE | VALIDATION | COMMENT
event_id | Integer | Unique Internal Identifier for each specific Customer Event. | For example, a Customer makes an Appointment.
Event Reference Number | Text (15) | Unique External ID for each specific Event. Optional and cannot be validated. | For example, a Housing Repair Job Number.
Event Type | Text (15) | Reference Data. | For example, Make an Appointment.
Associate | Integer | Link to Associate Table. | For example, a Court for Youth Offenders or a School for Pupils.
Contact | Integer | Link to Contact Table. | For example, a Social Worker.
Establishment | Integer | Reference Data. |
Staff | Integer | Optional, but links to Staff Table if specified. |
Supplier | Integer | Optional, but links to Supplier Table if specified. | For example, a Housing Contractor for repairs.
Event Address | Integer | Optional; links to a specified Address. | For example, where an Offence took place.
Event Outcome | Text (15) | Reference Data. | For example, Satisfactory.
Event Status | Text (15) | Reference Data. | For example, Cancelled, Pending.
Event Start Date & Time | Date | Mandatory; must be < Today and Now. |
Event End Date & Time | Date | Optional; validation is > Start Date. |
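The rules in the Template can be expressed as executable checks. This hypothetical Python validator covers three of them: a mandatory Start Date in the past, an End Date after the Start Date, and a Status drawn from Reference Data. The list of valid statuses is an assumption for the example.

```python
from datetime import datetime

# Illustrative Reference Data for Event Status (an assumption, not from the EDM).
VALID_STATUSES = {"Cancelled", "Pending", "Completed"}

def validate_event(event):
    """Return a list of validation errors; an empty list means the row passes."""
    errors = []
    start = event.get("start")
    if start is None:
        errors.append("Event Start Date is mandatory")
    elif start >= datetime.now():
        errors.append("Event Start Date must be < Today and Now")
    end = event.get("end")
    if end is not None and start is not None and end <= start:
        errors.append("Event End Date must be > Start Date")
    if event.get("status") not in VALID_STATUSES:
        errors.append("Event Status is not in the Reference Data")
    return errors

ok = validate_event({"start": datetime(2011, 6, 27, 9, 0),
                     "end":   datetime(2011, 6, 27, 10, 0),
                     "status": "Pending"})
bad = validate_event({"start": datetime(2011, 6, 27, 10, 0),
                      "end":   datetime(2011, 6, 27, 9, 0),
                      "status": "???"})
print(ok)        # []
print(len(bad))  # 2
```

In an ETL pipeline such a validator would run against every incoming row, with failing rows diverted to a rejects table for the Data Quality (Validation and Cleanup) step.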



2.6.9 ETL Template for Entity Mapping

Source Table | EDM Entity | Comment
Activity | Event | For example, send a Letter or carry out an Investigation.
Aspect | Event | These are Issues.
Complaint | Event | Contains repeated Options for Gender, Handling Investigator, Stages, etc.
Complaint_People | Customer | Includes Complainants and Contacts, such as Edwina Currie.
Contact | Contact | People contacted with regard to Complaints.
Cost | Event_Notes | For example, Compensation to a Complainant.
PersonInv | Customer | Includes non-Customers, e.g. Contacts who have not made Complaints.
Letters | Documents |
User | Staff |
UserGroups | Team | Teams of Staff, equivalent to Teams of Social Workers.

5.1.2.9 Blank Template for Data Profiling

DATA ITEM | DESCRIPTION | MIN VALUE | MAX VALUE | MOST COMMON VALUE | COMMENTS

5.1.2.10 Completed Template for Data Profiling

DATA ITEM | DESCRIPTION | MIN VALUE | MAX VALUE | MOST COMMON VALUE | COMMENTS
Withdrawn Date | Date Customer's Approval withdrawn | Dec-31-1998 | Jan-1-2010 | Jun-15-2008 |
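The profiling columns in the templates above (minimum, maximum and most common value) can be produced mechanically. A minimal sketch, assuming the values for one data item are already held in a list; the function name and sample dates are illustrative:

```python
from collections import Counter

def profile(values):
    """Return the Data Profiling template columns for one data item:
    min value, max value and most common value (ignoring nulls)."""
    present = [v for v in values if v is not None]
    if not present:
        return {"min": None, "max": None, "most_common": None}
    most_common, _count = Counter(present).most_common(1)[0]
    return {"min": min(present), "max": max(present), "most_common": most_common}

# ISO-format dates sort correctly as strings, so min/max work directly
withdrawn_dates = ["1998-12-31", "2008-06-15", "2008-06-15", "2010-01-01", None]
row = profile(withdrawn_dates)
```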



5.1.2.11 Blank Template for Data Validation

DATA ITEM | DESCRIPTION | NULLABLE | RULES | DATE | % QUALITY

5.1.2.12 Completed Template for Data Validation

DATA ITEM | DESCRIPTION | NULLABLE | RULES | DATE | % QUALITY
Withdrawn Date | Date Customer's Approval is withdrawn | Yes | > Start Date | |
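The % QUALITY column can be calculated by counting the rows that pass the rule. A sketch for the '> Start Date' rule above, with Nullable = Yes treated as a pass; the data is invented for illustration:

```python
def percent_quality(rows, nullable=True):
    """Percentage of rows that satisfy the '> Start Date' rule.
    With nullable=True, a missing value is treated as acceptable."""
    passes = 0
    for value, start_date in rows:
        if value is None:
            passes += nullable  # True counts as 1, False as 0
        elif value > start_date:
            passes += 1
    return round(100.0 * passes / len(rows), 1)

# (Withdrawn Date, Start Date) pairs -- the middle row violates the rule
sample = [("2010-01-01", "1998-12-31"),
          ("1990-01-01", "1998-12-31"),
          (None, "1998-12-31")]
quality = percent_quality(sample)
```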

5.1.2.13 Blank Template for Mapping Specifications

ETL Transformations
Project Title / Known As :
Development End Date :
Additional Comments :
Trigger :

Source (eg Table) | Data Item | Data Type | Target (eg XML File) | Data Item | Data Type | Job Schedule | Rule Specification



5.1.2.14 Completed Template for Mapping Specifications

Specifications taken from migrating sample Customer data.

Mapping Specifications
Project Title : Creation of a Data Extract for Customers
Date : April 1st. 2010
Additional Comments : These Specifications are subject to review by Stakeholders.
Trigger : When CUSTOMERS.DAT_VAL = SYSDATE

Source Table | Column | Data Type | Target Table | Field Name | Data Type | Transf Rule
CUSTOMERS | ID | NVARCHAR2(8) | OFFICE | Unique ID | CHAR(8) | Copy As-is
CUSTOMERS | COUNTRY_ID | NVARCHAR2(2) | OFFICE | Code | CHAR(2) | Copy As-is
CUSTOMERS | DAT_VALID | DATE | | | |
CUSTOMERS | PHON_NUMBER | NVARCHAR2(35) | | | |
CUSTOMERS | FAX_NUMBER | NVARCHAR2(35) | | | |
CUSTOMERS | TELEX_NUMBER | NVARCHAR2(35) | | | |
CUSTOMERS | E_MAIL_ADDRESS | NVARCHAR2(70) | | | |
CUSTOMERS | TRADING_ROLE | NVARCHAR2(1) | | | |
CUSTOMERS | POST_CODE | NVARCHAR2(9) | | | |
CUSTOMERS | REG_CODE | NVARCHAR2(3) | | | |
CUSTOMERS | GEO_INF_CODE | NVARCHAR2(8) | | | |
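A Mapping Specification can also be held as structured data and used to generate the extract SQL. A hedged sketch using two of the CUSTOMERS columns from the template; the pairing of source columns to target fields is an assumption made for illustration:

```python
# Each entry: (source column, source type, target field, transformation rule)
MAPPING = [
    ("ID",         "NVARCHAR2(8)", "OFFICE.Unique_ID", "Copy as-is"),
    ("COUNTRY_ID", "NVARCHAR2(2)", "OFFICE.Code",      "Copy as-is"),
]
TRIGGER = "CUSTOMERS.DAT_VAL = SYSDATE"   # the Trigger from the completed template

def generate_extract_sql(source_table, mapping, trigger):
    """Build a simple SELECT implementing the 'Copy as-is' mappings."""
    cols = ", ".join(src for src, _type, _tgt, _rule in mapping)
    return f"SELECT {cols} FROM {source_table} WHERE {trigger}"

sql = generate_extract_sql("CUSTOMERS", MAPPING, TRIGGER)
```

Holding the mapping as data rather than hand-written SQL makes it easy to review with Stakeholders and to regenerate the extract when the specification changes.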



2.7.3 Frequently Asked Questions

What.1 - What is Data Integration ?
Data Integration is concerned with combining data from various Sources into one consistent stream. It provides a Single View of the Truth for the things of importance to the organisation, such as Customers, Products and Movements, and it includes Data Quality, Master Data Management and Mapping Specifications.

Here is the Web Link for the Questions on the Database Answers Web Site : http://www.databaseanswers.org/data_integration.htm

Some key points :
 Data Integration provides an essential Single View of Data, for example, a Single View of a Customer.
 It also provides a natural point at which Data Quality can be addressed. When Data is of uniform good quality, it can be integrated and made available as a consistent View.
 Details of the Integration, such as Mapping Specifications, are held in a Glossary, which is described in the Information Catalog Stage (Stage 7).
 The current incarnation of Data Integration is Master Data Management (MDM).

What.2 - What is Master Data Management (MDM) ?
MDM can be defined as providing a Single View of the Things of Importance within an organisation. One of the major components in MDM is Customers, but Master Data Management applies the same principles to all the Things of Interest in an organisation, which can typically include Employees, Products and Suppliers. We have discussed a Single View of the Customer, and MDM involves the same kind of operations as a Customer Master Index (CMI) : that is, identification and removal of duplicates, and putting procedures in place to eliminate duplicates in any new data loaded into the Databases. There is a wide choice of software vendors offering MDM products; de-duplication and Address validation is a niche market in this area. On the Database Answers Web Site, there is a Tutorial on Getting Started in MDM, and there is a sister Web Site devoted to the topic of MDM-as-a-Service.
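The identification of duplicates mentioned above can be sketched very simply: normalise the identifying fields and group records that collide on the same match key. This is a deliberately crude illustration; real MDM matching products are far more sophisticated:

```python
from collections import defaultdict

def normalise(name):
    """Crude match key: upper-case, expand '&', strip punctuation and
    spaces, and standardise the LIMITED/LTD company suffix."""
    key = name.upper().replace("&", "AND").replace(".", "").replace(",", "")
    key = key.replace("LIMITED", "LTD")
    return "".join(key.split())

def find_duplicates(customers):
    """Group customer names that normalise to the same match key."""
    groups = defaultdict(list)
    for name in customers:
        groups[normalise(name)].append(name)
    return {k: v for k, v in groups.items() if len(v) > 1}

dupes = find_duplicates(["Smith & Son Ltd", "SMITH AND SON LIMITED", "Jones Ltd."])
```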

What.3 - What are Conceptual, Logical and Physical Data Models ? Wikipedia has some useful entries on Conceptual Models, Logical Models and Data Models. Conceptual Data Models do not conventionally show Foreign Keys and are very useful for making clear the Entities and Relationships in a Data Model without any Keys or Attributes. They are very useful for discussing Requirements with Users because they show only the basics. Logical Data Models add Foreign Keys and Attributes. They are very useful for publishing a complete statement of the data involved. Physical Data Models are very close to the Database design. They are very useful for discussions between the Data Analyst, DBAs and developers.

What.4 - What does ETL stand for ?
Wikipedia has an entry on ETL which is worth a look. ETL stands for Extract, Transform and Load. Extract means extracting data from Data Sources. Transform covers many tasks, including :
o Selection of the data of interest
o Validation and clean-up of the selected data
o Changing the format and content of the data
Load means loading the data into the designated Target.



In practice, there are three options for implementing ETL :
 Develop bespoke SQL
 Use a commercial package, such as Informatica or Microsoft's Integration Services
 Some combination of these two. For example, developing basic SQL to clarify the Requirements and then looking for a commercial product to meet the Requirements.

What.5 - What is Data Lineage ?
Data Lineage can be defined as the ability to trace the derivation of all items of data that appear in any important Performance Reports and Management Information. That includes :
 Who owns the original source data
 What validation and transformations are applied to the data in its life cycle
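The bespoke option for implementing ETL can be illustrated end-to-end in miniature: extract rows, apply a clean-up transformation of the kind discussed elsewhere in this Chapter (LTD to LIMITED), and load the result into a target table. A sketch using an in-memory SQLite database; the table and column names are invented:

```python
import sqlite3

def run_mini_etl(source_rows):
    """Extract -> Transform (standardise company suffixes) -> Load."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE target_customers (id INTEGER, name TEXT)")
    # Transform: replace the abbreviation LTD with LIMITED
    cleaned = [(i, name.replace("LTD", "LIMITED")) for i, name in source_rows]
    # Load into the designated Target
    db.executemany("INSERT INTO target_customers VALUES (?, ?)", cleaned)
    db.commit()
    return list(db.execute("SELECT id, name FROM target_customers ORDER BY id"))

loaded = run_mini_etl([(1, "ACME LTD"), (2, "BLOGGS TRADING")])
```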

Why.1 - Why is this Stage important ?
 It provides one view of the truth.
 It offers a point at which Data Integrity can be measured and User involvement obtained to improve Quality until it meets User standards.

How.1 - How do we get started ?
Data Profiling is a good starting-point for determining the quality of the data and drafting some simple validation and transformation rules that can be used to get started. For example, replace LTD by LIMITED (or vice versa), and & by AND.
The Design Approach requires Data Models for the areas within Scope. It will also require Generic Data Models to support one view of the truth for major entities, such as Traders or Customers. This one view will be implemented as Master Data Management (MDM).
 Get a broad understanding of the data available
 Establish a common view of the Data Platform
 Get a broad understanding of Data Sources
 Determine the available Data
 Choose the MDM product
 Determine a strategy for Clouds, e.g. Reference Data available globally
Milestones :
o In 1 month, produce Generic Data Models
o In 3 months, confirm the GDM with sample data and Facilitated Workshops and choose an MDM product
o In 6 months, implement MDM and publish the GDM and CMI on the Intranet
o Adjust timescales in light of experience

Data Integration covers a number of Steps, each of which can have its own Templates. Examples are included here for Data Profiling and Mapping Specifications.

Best Practice in Data Management 38


Step 1. Start with Data Profiling because it is a good starting-point for determining the quality of the data and drafting some simple validation and transformation rules that can be used to get started. For example, replace LTD by LIMITED (or vice versa), and & by AND.
Step 2. Determine the available Data Models for major areas of the enterprise.
Step 3. Determine whether Generic Data Models are available to support one view of the truth for major entities, such as Customers or Offices. This one view approach will be implemented as Master Data Management.
Step 4. Establish a common view of the Data Platform : Reference Data, Customers, Products, Movements and so on.
Step 5. Determine the available Data.
Step 6. Choose an MDM product or decide on in-house SQL development.

How.2 - How do we follow Best Practice ?
These Steps define a Tutorial of Best Practice :
Step 1. Define the Target, which is usually a Single View Data Model.
Step 2. Define the Data Sources.
Step 3. Define the Mapping Specifications from the Sources to the Target.
Step 4. Define the Data Platform.
Step 5. Identify Standards to be followed.

This Tutorial is described in detail in a separate document, entitled Data_Integration_Tutorial.doc. These questions come from this page : http://www.databaseanswers.org/data_integration_questions.htm If you have a Question that is not addressed here, please feel free to email us your Question.

How.3 - How do we measure progress in Data Integration ?
Look for the existence of the following items :
 Generic Data Models
 An Enterprise Data Platform
 Identification of the Data Sources
 Selection of an MDM Product
 Implementation of a Customer Master Index or appropriate alternative



How.4 - How do we get started ?
See How.1 above; the same starting-points apply : Data Profiling, Generic Data Models and a phased MDM implementation, with timescales adjusted in light of experience.

How.5 - How do I establish a Strategy for Data Quality ?
A successful Strategy for Data Quality as an Enterprise Issue must include both organisational and technical aspects.
Typical organisational aspects are :
 Commitment from senior management
 Establishing the slogan 'Data Quality is an Enterprise Issue' as a top-down edict
 Identification of the Top 20 Applications and Data Owners across the Enterprise
 Agreed sign-off procedures with Data Owners and Users
Technical aspects :
 Establish Key Quality Indicators (KQIs), for example Duplicate Customer records
 Agree a target Data Quality percentage
 Define KQI Reports and Dashboards
 Develop SQL to measure KQIs
 Define procedures to improve KQIs

How.6 - How do I handle multiple types of Databases ?
This could include Oracle, SQL Server and DB2. The key to handling multiple types of Database is to think of them in terms of an Integrated Data Platform, where all types of data are presented in a common fashion. This defines the logical requirement. There are then a number of options to physically meet this logical requirement. The Enterprise-level option is to use an appropriate commercial product, such as Informatica.
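The step 'Develop SQL to measure KQIs' can be sketched for the Duplicate Customer records example; the schema and data are invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "ACME LIMITED"), (2, "ACME LIMITED"), (3, "BLOGGS AND CO")])

# KQI: customer names that appear more than once
duplicate_names = db.execute("""
    SELECT customer_name, COUNT(*) AS occurrences
    FROM customers
    GROUP BY customer_name
    HAVING COUNT(*) > 1
""").fetchall()
```

The same GROUP BY / HAVING pattern can feed a KQI Dashboard: the row count gives the number of duplicated names, which can be tracked against the agreed target percentage.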

2.8 Operational Data Store Layer


2.8.1 Best Practice
Best Practice involves establishing documentation standards. For example, each table name would be prefixed with MIR_ to indicate that the structure of the table mirrors the structure of the Data Source. A Staging Area will need to be established.
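The MIR_ convention can be applied mechanically when the Staging Area tables are created. A small sketch; the helper function and source table are hypothetical:

```python
import sqlite3

def create_mirror_table(db, source_name, columns):
    """Create a Staging Area table whose name and structure mirror the source."""
    mirror_name = f"MIR_{source_name}"
    cols = ", ".join(columns)
    db.execute(f"CREATE TABLE {mirror_name} ({cols})")
    return mirror_name

db = sqlite3.connect(":memory:")
# Mirror a hypothetical CUSTOMERS source into the Staging Area
name = create_mirror_table(db, "CUSTOMERS", ["id INTEGER", "name TEXT"])
```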

2.8.2 Blank Template

SOURCES | CONTACT | TYPE | DATA ITEMS | COMMENTS

2.8.3 Sample Completed Template

SOURCES | CONTACT | TYPE | DATA ITEMS | COMMENTS
CRM | Joe Bloggs | TBD | Customers | Golden Source
Billing Operations | Joe Bloggs | Oracle DB | Stores | Official Source



2.8.4 Frequently Asked Questions

What.1 - What are Data Sources ?
Data Sources include all major places where important data is created or used, including :
 Applications
 Databases
 Spreadsheets
 XML files, and so on
A Repository records these Data Sources, together with data and information related to each Stage in the Best Practice Road Map. This includes details of People, Roles and Responsibilities, Applications and Databases, and is stored within an Information Catalog.

These questions are from this page : http://www.databaseanswers.org/data_sources.htm

Why.1 - Why Is this Stage important ? It is important because it provides the starting-point for a range of utilities which combine to provide very valuable functionality.

How.1 - How do we get started ?
These are the four basic Steps :
Step 1. Agree the initial content and revise at regular intervals.
Step 2. Identify individuals responsible for data gathering and dissemination.
Step 3. Start with a bottom-up Approach and focus on working documents, such as Invoices or Movement Authorisations.
Step 4. Follow a top-down Approach and focus on Reports.


Chapter 3. Data Architectures for the Future


Here is the LinkedIn Group for Data Architecture Professionals.

3.1 Introduction
The Approach is as follows :
1) Establish an overall Data Architecture of integrated Structured and Unstructured Data
2) Use Design Patterns, eg Roles, User Profiles with Access Privileges, and Publish/Subscribe
3) Design a Generic Skeleton Data Architecture and integrate data from sources, eg SAP or Oracle RMS.

3.2 Implementation
Within any large organisation there is a range of structured and unstructured data, including Databases, Email and documents stored in facilities such as Sharepoint. Sharepoint can be used for :
o Project Documentation
o Minutes of Meetings
o Policies and Procedures
Other facilities include :
 Email Archives with Search (eg Epsilon)
 Registered users with profiles
 Publish and Subscribe

3.3 A Skeleton Data Architecture


This diagram shows a suitable Skeleton Data Architecture that can provide a framework for the early development.


3.4 A Detailed Data Architecture


This diagram shows more clearly how Web Services could be used for CRUD operations.

3.5 A Layered Data Architecture


This diagram shows four possible Layers in a Data Architecture :
 Reporting Layer
 Data Warehouse and Mart Layer
 Data Integration Layer (Single Version of the Truth, Enterprise Data Model, ETL)
 Operational Data Store Layer (CRM, Orders, Billing, etc.)

3.6 Progress towards a Layered Data Architecture
This diagram shows the overall BI Architecture, with a Business Requirements and Design Walkthrough feeding the Deliverables (Best Practice and Templates) :
 Reporting Layer (BI, eg Ladder Plan and FBP) : Report Templates.
 Data Mart Layer : Generic Design and Models for Data Marts.
 Atomic Data Layer (linked to the EDM) : Model for the Data Warehouse; Data Transformation Mapping to the DWH in line with the Enterprise Data Model; Data Quality (Validation and Cleanup); Data Profile.
 Operational Data Store Layer : Data Staging Area.

3.7 Evolution of the Data Integration Layer
There will be a number of Phases in the development of the Data Integration Layer (DIL). Six types of facilities can be added, in the sequence shown below.

These facilities can be added in four possible Phases :
Phase 1. Design a Thin Slice from end-to-end for a typical Business Scenario.
Phase 2. Add Entities from analysis of the business functions involved.
Phase 3. Address Data Quality, including data profiling, data consistency, correctness and so on.
Phase 4. Enhance features in MDM, the Data Warehouse and Web Services.

Specific features within the Data Integration Layer are as follows :
 Web Services
 Flexible Data Warehouse
 Single Version of the Truth
 Atomic Data Layer (linked to the EDM)
 Master Data Management
 Data Quality
 Data Profiling

3.8 Progress towards a Layered Data Architecture
This diagram is a simple way to keep track of progress towards the implementation of your Data Architecture. The yellow areas indicate specific Deliverables that have been produced.

Layer | Deliverables produced | Work outstanding
Reporting Layer (BI, eg Ladder Plan and FBP) | Template produced for Sales Audit Report | Work to be Completed
Data Mart Layer | Generic Design completed | Work to be Completed
Data Integration Layer | Data Warehouse design; Enterprise Data Model V.2 (EDM) design produced; Mapping Template for EDM produced; Design produced for Single Version of the Truth for Customers, Organisation, Products and Stores; Template produced for Data Quality Validation and Clean-up; Template produced for Data Profiling | Work to be Completed, eg Information Catalogue
Operational Data Store Layer | Generic Design produced | Review to be completed

3.9 The Methodology


This is the Overall Generic Skeleton Data Architecture. It is very important because it shows on one page the entire range of Data Management activities, starting with Data Sources and ending with Data Marts for BI and end-user Reports.

Stages and Data Sources in the Lifecycle of a Customer :
 Plan : Forecasting System
 Buy : KPI = Vendor Delivery
 Deliver : Inventory
 Sell : eg POS
 Analyse : Performance Reporting
These feed, in sequence :
 ETL : Data Quality, Mapping Entities and Fields to the EDM
 Enterprise Data Model (EDM) : Single Version of the Truth, with Mapping from the EDM to the DWH
 Data Warehouse (DWH)
 Data Marts 1, 2, 3, etc.
 Report Families 1, 2, 3, etc.


3.10 A Data Architecture for the Future


This Section describes an Architecture that can be used as a To-Be target for the future. The diagram shows more clearly how Web Services would be used, with an ESB providing an Abstraction Layer. Data Virtualisation can be defined as the process of providing users with a Business View of the data in an organisation, concealing the technical details of stored data, such as location, storage structure, APIs and storage technology.

The Architecture has the following levels :
 Ubiquitous Data : available at any time, any place, with any device
 Reference Data in the Clouds : Currencies, Languages
 Master Data in the Clouds : Products/Services, Customers
 Design : Web Services, Enterprise Service Bus (ESB), Enterprise Data Model
 Implementation : Data Virtualisation (Middleware) from IBM, Informatica, Microsoft, Oracle, etc.
 Data Sources / Databases : CSV, ODBC, Oracle, SAP, SQL Server, etc.


Chapter 4. Data Management Getting Started


Here is the LinkedIn Group for Enterprise Data Management :-

http://www.linkedin.com/groups?home=&gid=142267&trk=anet_ug_hm

And the Group for Master Data Management :-

http://www.linkedin.com/groups/Master-Data-Management-Interest-Group-53314

4.1 Introduction
This Chapter presents a Strategy for Enterprise Data Management. It presents some reasons why you should have a Strategy and it then recommends a tried and tested approach to putting a Strategy in place.

4.2 Why do you need a Strategy ?


Are you having trouble getting to grips with all the data and information in your organisation ? Then you need a Strategy for Data Management. These are the Benefits :
 You can be sure that your organisation is fully compliant with all the statutory regulations on data protection and governance (Data Governance)
 The performance of your organisation is reported correctly and consistently (Performance Reports)
 You can trust all your data to be accurate and up-to-date when you make your business decisions (Data Governance)
 You are provided with a 'Single Version of the Truth' in your organisation (Data Integration)
 You can access any piece of data at any time from any device or location to support decision-making (Data Integration)
 Your data is protected and secure (Data Governance)
 You will know where all your corporate data is stored (Information Catalogue)

4.3 How do you get Started ?


There are four simple steps to getting started :
Step 1. Establish the As-Is : use our Road Map to check out where you stand in the Maturity Model.
Step 2. Determine the To-Be : then plan your own Strategy for Data Management.
Step 3. Download our Strategy to help you in your planning.
Step 4. Finally, send us an Email to let us know how you get on.

4.4 Establish the As-Is


4.4.1 The Maturity Model
We have defined a Maturity Model for any organisation in its Data Management practice. In this Model there are four possible States :
1. Nothing is in place
2. Basic
3. Average
4. Ideal
These four States are shown in this diagram, along with an indication of the characteristics of each Stage.

4.4.2 Self-Assessment This Section provides more details for the As-Is evaluation. It can be used as a method of Self-Assessment for you to identify where an organisation is situated in the maturity of each Component in Enterprise Data Management.

STAGE : 1) Data Sources
BASIC : Knowledge in the heads of individuals. No Data Models and poor documentation of links between code and databases.
AVERAGE : Top 20 Applications known, with a list of Data Sources and Owners. Basic Data Dictionary in place.
IDEAL : Agile development with refactoring techniques. Data Models and sign-off by DBAs on all changes. User access and sign-off for the Data Dictionary.

STAGE : 2) Data Integration
BASIC : Ad-hoc integration using bespoke SQL Scripts. Nobody understands Master Data Management (MDM).
AVERAGE : Some Templates established and commercial Tools in use. Software Tools linked to the Data Dictionary. MDM is planned.
IDEAL : MDM approved, with data owner sign-off; Data Quality is an Enterprise issue. Clear and reconciled top-down and bottom-up views of data. Data Architecture and Data Models for Sources and Targets. MDM is in place.

STAGE : 3) Data Quality
BASIC : DQ level could be improved; Policy under consideration.
AVERAGE : DQ level sufficient; Policy under implementation.
IDEAL : Data is accurate, valid and relevant; Policy established as an Enterprise Issue.

STAGE : 4) Performance Reports
BASIC : One-off, often independent Dept. Spreadsheets.
AVERAGE : Independent Maps, KPIs and drill-down to detailed Reports.
IDEAL : Integrated Maps, KPIs and drill-downs for the Chief Exec.

STAGE : 5) Mashups
BASIC : None.
AVERAGE : Isolated development.
IDEAL : Users aware.

STAGE : 6) Data Governance
BASIC : None.
AVERAGE : No end-to-end agreement.
IDEAL : Procedures published, with Roles and Responsibilities and Sign-off all in place. Data Lineage known and auditable.

STAGE : 7) Information Catalogue
BASIC : Uncoordinated Word documents and Spreadsheets.
AVERAGE : Stand-alone Tool.
IDEAL : Provided over the Intranet for User Sign-off.

4.5 Determine the To-Be


This Strategy Planner will help you decide where you want to get to in the medium-term future. It is sensible to go one step at a time. In other words, if your As-Is is that nothing is in place, then you should aim for the Basic State. Once you achieve the Basic State, you can aim for the Average State, and after that the Ideal State. It is important to move at a pace that is acceptable to your organisation. This pace will affect your overall timetable. You need buy-in from all stakeholders and interested parties, including management, users and IT people.

4.6 Building Blocks in the Road Map
This diagram shows the six Building Blocks in the Road Map. They start at the top with Data Governance and go down to the Information Catalogue, which is used to record all the important information about Data Management within an organisation.


Chapter 5. Data Warehouses and Data Marts


Here is the link for the LinkedIn Group on Data Warehousing Professionals :-

http://www.linkedin.com/groups/Data-Warehousing-Professionals-Group-is-124955.S.39415157

and Data Warehouse Architecture :-

http://www.linkedin.com/groups/Data-Warehouse-Architecture-1377377?gid=1377377&mostPopular=&trk=tyah

And here is the highly recommended Data Warehousing Institute (TDWI) :-

http://tdwi.org/

5.1 Introduction
This Chapter describes a Data Warehouse Strategy. It is divided into three parts : a Business perspective, a Data Architecture perspective and a Plan. The proposal is to start with Credit Cards in a Retail organisation as the first Project. The format of sample transaction data can be drafted, and sample Reports are also available. Therefore, it will be possible to define both the Data Source and sample User Requirements, and to design the Data Architecture that will integrate the two.

5.2 A Business Perspective


5.2.1 Scope
Our objective is to put in place an Integrated Data Layer to provide data which is integrated across the organisation. This will provide a Single View of the Truth, to ensure that BI and Performance Reports for senior management contain consistent data. This will also support Sarbanes-Oxley and other Statutory requirements.

The diagram covers the Stages Plan, Buy, Promote, Sell and Analyse, supported by systems such as LadderPlan, POTS, POCC, Inventory (SIMS), POS and Sales Audit.


5.2.2 A Strategy for Data Warehousing
This Chapter describes a Data Warehouse Strategy. It recommends two Phases :
Phase 1. Installing a single Data Warehouse to store all enterprise Data. We will also install tuning software to monitor performance, to predict when we need to move to Phase 2.
Phase 2. Adding Data Marts dedicated to specific functional areas. We show examples of the analysis of Gift Cards and of Damaged Goods supplied by Vendors.

5.2.3 Template for User Requirements
There are two aspects to the specification of User Requirements :
 The totals required, and other derived figures. For example, the total count and value of Credit Card usage.
 The headings under which the totals are required. For example, the total for all Credit Cards.
For example, the requirement might be for the total number of transactions and their value on Credit Cards on a weekly basis in a specific period of time. In this way, a sample Report layout can be drafted for approval by the User.

Report Name : Credit Card Weekly Report
Period starting : January 1st. 2011

Credit Card | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6
Count | | | | | |
Total Value | | | | | |
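The totals behind this draft Report (weekly count and value of Credit Card transactions) reduce to a simple aggregation. A sketch with invented sample data, grouping by week number:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE card_transactions (txn_date TEXT, amount REAL)")
db.executemany("INSERT INTO card_transactions VALUES (?, ?)",
               [("2011-01-03", 10.0), ("2011-01-04", 5.0), ("2011-01-10", 20.0)])

# Count and total value per week, as in the Weekly Report layout
weekly = db.execute("""
    SELECT strftime('%W', txn_date) AS week,
           COUNT(*)    AS txn_count,
           SUM(amount) AS total_value
    FROM card_transactions
    GROUP BY week
    ORDER BY week
""").fetchall()
```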

5.2.4 Benchmarks
This Section establishes Benchmarks that will be used to evaluate predicted performance from a Netezza Appliance.

Best Practice in Data Management 58


A three-second response time is acceptable. The size of the User Community, the Roles involved, and the average number of Users and Enquiries per day are estimated.

5.3 A Data Architecture Perspective


This diagram shows the four major Stages in delivering the Single View of the Truth :-

5.4 What are Data Marts and Data Warehouses ?


An Enterprise Data Warehouse is a repository of all the data within the enterprise that is used in Performance Reports. It is intended to guarantee a Single Version of the Truth. It typically stores very detailed data, such as Sales Receipts, and can maintain historical data going back many years. A Data Mart, on the other hand, stores data for specific functional areas, such as Purchases and Sales, and the data is usually limited in timescale and might even have a limited life span. A Data Warehouse can be either a single very large Repository of Data or it can be built as an interlocking set of Data Marts. Each Data Mart would store data related to a separate business area, such as Sales, or for a specific Report or family of Reports. In passing, we can mention that there are two well-known authorities in the broad field of Data Warehouses and Data Marts. The first is Bill Inmon, who favours the first approach of large, all-encompassing Data Warehouses.

Best Practice in Data Management 60


The second is Ralph Kimball, who favours related Data Marts. Inmon and Kimball both write well and present convincing arguments for their points of view. A sensible approach is to start with a single Data Warehouse and then to create Data Marts for specific business requirements as they occur. In order to link Data Marts, they need to share the same values for the same Dimensions, such as Stores or Products. These are called Conformed Dimensions. Without Conformed Dimensions, it is impossible to compare and accumulate related values in Data Marts.
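To make the point about Conformed Dimensions concrete: two Data Marts can only be compared because they share the same Dimension values. A sketch joining a Purchasing mart to a Sales mart on a shared Calendar date; the tables and figures are invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE purchasing_mart (po_date TEXT, po_amount REAL)")
db.execute("CREATE TABLE sales_mart (sale_date TEXT, sale_amount REAL)")
db.executemany("INSERT INTO purchasing_mart VALUES (?, ?)", [("2011-01-03", 100.0)])
db.executemany("INSERT INTO sales_mart VALUES (?, ?)", [("2011-01-03", 150.0)])

# Because Date is conformed (the same values appear in both marts),
# related figures can be compared side by side
compared = db.execute("""
    SELECT p.po_date, p.po_amount, s.sale_amount
    FROM purchasing_mart p
    JOIN sales_mart s ON s.sale_date = p.po_date
""").fetchall()
```

If the two marts used different date encodings, this join would silently return nothing: that is the practical consequence of unconformed Dimensions.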

5.5 An Agile Approach


An Agile Approach is very important because it is inevitable that user requirements will change from time to time. We can predict four Phases in the evolution of User Requirements :
1. Give me everything
2. Give me these Reports on a regular basis and give me an ad-hoc Enquiry facility
3. I want integrated KPIs and Dashboards
4. I want to be notified automatically if I have any situation requiring urgent attention
In order to meet these Requirements, we need to put in place an Agile Data Warehouse with flexible Data Marts and an integrated BI Toolkit.

5.6 Conformed Dimensions


A Conformed Dimension is one that has the same value across all Subject Areas. Conformed Dimensions are therefore often Master Data, although, of course, a Conformed Dimension is not necessarily Master Data. The best example is probably Dates. In this example, the Date field is a Conformed Dimension for the Purchasing and Sales Data Marts, but Suppliers and Stores are not : the Sales Data Mart carries the Date of Sale, Ticket Number and Store Number (with Ticket, Stores and Calendar Dimensions), while the Purchasing Data Mart carries the Date PO Issued, Supplier ID and PO ID (with Calendar and Suppliers Dimensions).


5.6.1 Conformance Analysis
This table conveys the levels of conformance within a Dimension by grouping the base Dimension with conformed rollup totals. The left-hand columns are Dimensions and the top row shows Facts. The 'Yes' cells indicate the Dimensions that have to be conformed in order for the analyses to be valid. They show that if we have Product-level data then the Product is a conformed dimension. They are illustrative for discussion purposes.

                            Orders   Shipments   Inventories   Sales   Returns   Demand Forecast
Date          Day           Yes      Yes                       Yes     Yes
              Week                               Yes                             Yes
              Month
Product       Product       Yes      Yes         Yes           Yes     Yes
              Category                                                           Yes
Organisation  Warehouse              Yes         Yes
              Store         Yes                                Yes     Yes
              Division                                         Yes               Yes

5.7 Data Models


5.9.1 Data Warehouse ERD This diagram shows the Entities that contribute to the Data Landscape for the Warehouse.
[Diagram: the Data_Warehouse_Facts_ Entity linked to Addresses_, Customers_, Documents_, Events, Products_, Staff, Stores, Suppliers_, Suppliers_Addresses_, Suppliers_Products_ and Warehouses, together with the Reference Entities Ref_Address_Types, Ref_Colours, Ref_Customer_Types, Ref_Document_Types, Ref_Event_Types, Ref_Payment_Methods, Ref_Sizes and Ref_Supplier_Status.]



5.9.2 Data Warehouse Generic This diagram shows the Entities that contribute to the Data Landscape for the Warehouse.
[Diagram: Data Warehouse Phase 1 Generic. The central Fact table (Fact_ID; foreign keys DWH_Data_Type_Code, Customer_ID, Document_ID, Event_ID, Product_ID, Reporting_Day_Date_Time, Staff_ID, Store_ID, Supplier_ID, Warehouse_ID, Dimension_1, Dimension_2, Dimension_3; Date_From; Date_To; Amount; Count; Fact_1 to Fact_6) is linked to DWH_Data_Types (DWH_Data_Type_Code, DWH_Data_Type_Name - eg Gift Card Basic Data, Gift Card Totals, Vendor Compliance Totals), to Ref_Calendar (Day_Date_Time, Week_Number, Month_Number, Year_Number) and to the generic Dimension_1, Dimension_2 and Dimension_3 tables, each with its own Description.]
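A Ref_Calendar Dimension like the one in the diagram is usually pre-populated with one row per day. This sketch generates such rows in Python; the column names follow the diagram, but the loading logic itself is an assumption :-

```python
from datetime import date, timedelta

def build_ref_calendar(start, end):
    """Generate Ref_Calendar rows, one per day between start and end inclusive."""
    rows, current = [], start
    while current <= end:
        iso_year, iso_week, _ = current.isocalendar()
        rows.append({
            "Day_Date_Time": current.isoformat(),
            "Week_Number": iso_week,          # ISO week number
            "Month_Number": current.month,
            "Year_Number": current.year,
        })
        current += timedelta(days=1)
    return rows

rows = build_ref_calendar(date(2011, 6, 27), date(2011, 6, 29))
print([r["Day_Date_Time"] for r in rows])  # ['2011-06-27', '2011-06-28', '2011-06-29']
```

Note that the ISO week number is one reasonable choice; a retail organisation might prefer its own fiscal week numbering instead.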



5.9.3 Data Warehouse Example of Specific Design This diagram shows the Generic Design above applied to Gift Card and Vendor Compliance data. Fields in red are specific to Gift Card processing within Retail organisations.
[Diagram: the Data Warehouse Phase 1 Fact table (Fact_ID; foreign keys DWH_Data_Type_Code, Customer_ID, Document_ID, Event_ID, Product_ID, Reporting_Date_Time, Staff_ID, Store_ID, Supplier_ID, Warehouse_ID, Damage_Category_Code, Gift_Card_Type; Total_Amount; Total_Count; Total_Card_Amount; Total_Card_Count; Damage_Goods_Amount; Damage_Goods_Count) feeds a Data Mart for Gift Cards (Fact_ID, Gift_Card_Type, Date_From, Date_To, Total_Card_Count, Total_Card_Amount) and a Data Mart for Vendor Compliance (Fact_ID, Damage_Category_Code, Date_From, Date_To, Damaged_Goods_Count, Damaged_Goods_Amount), supported by the Ref_Calendar, Ref_Gift_Card_Type, Ref_Damage_Categories and DWH_Data_Types tables.]

Appendix 5.A Some Statistics


5.A.1 Introduction This Section defines the information required to study the performance of a Netezza Appliance for some typical data volumes and User demands.

5.A.2 Volumes These statistics can be used in an evaluation of the performance of a Data Warehouse Appliance, such as IBM's Netezza or Oracle's Exadata.

VOLUMES

PHASE     SYSTEM        FREQUENCY          DAILY   WEEKLY   MONTHLY   ANNUAL
Plan      Ladder Plan   Weekly
Buy       POTS, POCC
Promote
Sell      POS           Minute-by-Minute
Analyse   Sales Audit

This table provides a Template; fill in the volume columns for your own Systems.

5.A.3 Growth in Volume This table shows some predicted growth in the volume of data that will need to be stored in the Data Warehouse.

PHASE     SYSTEM        DAILY   WEEKLY   MONTHLY   FIVE YEAR TOTAL
Plan      Ladder Plan
Buy       POTS, POCC
Promote
Sell      POS
Analyse   Sales Audit


5.A.4 User Workload You will need to provide more specifics here, and these can be obtained from discussions with Users. This table shows some typical Enquiries and Reports that Users might want to run.

PHASE     SYSTEM        FREQUENCY          ENQUIRIES           DATA
Plan      Forecasting   Weekly
Buy       PO Tracking
Promote
Sell      POS           Minute-by-Minute   Total Daily Sales   SUM(Sales) GROUP BY Day
Analyse   Sales Audit


Chapter 6. Design Patterns for Data Models


Here's an interesting discussion on LinkedIn, "What is your favourite Data Modelling Tool ?" :-

http://www.linkedin.com/groupItem?view=&srchtype=discussedNews&gid=988447&item=29745443&type=member&trk=eml-anet_dig-b_pd-ttl-cn

And here's the LinkedIn Group on Data Modelling and Meta Data Management :-

http://www.linkedin.com/groups?home=&gid=988447&trk=anet_ug_hm

6.1 Why are they important ?


Design Patterns are very helpful for Quality Assurance of Data Models. This applies particularly to Models produced by Third Parties.

6.2 Concepts
6.2.1 One-to-Many Relationships A Customer can place many Demands or Orders for Products. This defines a One-to-Many Relationship.

A Data Modeller would say "For every Customer, there are many Demands or Orders". This is shown in a Data Model as follows :-

6.3.2 Many-to-Many Relationships We can also say that an Order can request many Products.



A Data Modeller would say "An Order can request many Products, and each Product can be in many Orders". This defines a Many-to-Many Relationship, which is shown in a Data Model as follows :-

A Many-to-Many Relationship cannot be implemented directly in Relational Databases. Therefore we resolve the many-to-many into two one-to-many Relationships, which we show in a Data Model as follows :-

When we look closely at this Data Model, we can see that the Primary Key is composed of the Order_ID and Product_ID fields. This reflects the underlying logic, which states that every combination of Order and Product is unique. In the Database, this will define a new record. When we see this situation in a Database, we can say that this reflects a many-to-many Relationship. However, we can also show the same situation in a slightly different way, which reflects the standard design approach of using a surrogate key as the Primary Key and showing the Order and Product IDs simply as Foreign Keys.

The benefit of this approach is that it avoids compound Primary Keys growing ever larger as more dependent Tables cascade downwards. The benefit of the previous approach is that it avoids the possibility of orphan records in the Products in an Order table.



In other words, invalid records that have invalid Order ID and/or Product ID values.
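The two designs can be sketched in SQL (run here through Python's sqlite3; the table names are illustrative). Option 1 uses the composite Primary Key, and Option 2 uses a surrogate key with a UNIQUE constraint that preserves the underlying business rule :-

```python
import sqlite3

# Resolving the Orders/Products many-to-many relationship in two ways.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders   (order_id   INTEGER PRIMARY KEY);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY);

    -- Option 1: composite primary key - every Order/Product pair is unique.
    CREATE TABLE products_in_order (
        order_id   INTEGER NOT NULL REFERENCES orders,
        product_id INTEGER NOT NULL REFERENCES products,
        quantity   INTEGER,
        PRIMARY KEY (order_id, product_id)
    );

    -- Option 2: surrogate key, with the pair demoted to plain foreign keys.
    CREATE TABLE order_lines (
        order_line_id INTEGER PRIMARY KEY,
        order_id      INTEGER NOT NULL REFERENCES orders,
        product_id    INTEGER NOT NULL REFERENCES products,
        quantity      INTEGER,
        UNIQUE (order_id, product_id)   -- keeps the business rule explicit
    );
""")
db.execute("INSERT INTO orders VALUES (1)")
db.execute("INSERT INTO products VALUES (10)")
db.execute("INSERT INTO products_in_order VALUES (1, 10, 2)")

# A second row for the same Order/Product pair violates the composite key.
duplicate_rejected = False
try:
    db.execute("INSERT INTO products_in_order VALUES (1, 10, 5)")
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate rejected:", duplicate_rejected)
```

Note that the UNIQUE constraint in Option 2 is optional; without it, the surrogate-key design silently permits duplicate Order/Product pairs.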

6.3.3 Rabbit's Ears We start with the definition of a Unit which, at its simplest, looks like this :- In this case, we use a meaningless ID, which is simply a unique number.

Then we think about the fact that every Unit is part of a larger organisation. In other words, every Unit reports to a higher level within the overall organisation.

Fortunately, we can show this in a very simple and economical fashion by creating a relationship that adds a parent ID to every Unit. This is accomplished by adding a relationship that joins the table to itself.

This is formally called a Recursive or Reflexive relationship, and informally called "Rabbit's Ears", and it looks like this :-


The Unit at the very top of the organisation has no-one to report to, and a Unit at the lowest level does not have any other Unit reporting to it. In other words, this relationship is Optional at the top and bottom levels. We show this by the small letter O at each end of the line which marks the relationship.
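The Rabbit's Ears pattern can be sketched as a self-referencing table, with the optional relationship expressed as a nullable parent ID. The names and the recursive query below are illustrative :-

```python
import sqlite3

# The self-referencing ("Rabbit's Ears") Units table: parent_unit_id is
# NULL for the Unit at the very top of the organisation.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE units (
        unit_id        INTEGER PRIMARY KEY,
        unit_name      TEXT,
        parent_unit_id INTEGER REFERENCES units(unit_id)  -- optional at the top
    );
    INSERT INTO units VALUES (1, 'Head Office', NULL);
    INSERT INTO units VALUES (2, 'Northern Region', 1);
    INSERT INTO units VALUES (3, 'Leeds Branch', 2);
""")

# A recursive query walks the hierarchy from the top down.
rows = db.execute("""
    WITH RECURSIVE chain(unit_id, unit_name, level) AS (
        SELECT unit_id, unit_name, 0 FROM units WHERE parent_unit_id IS NULL
        UNION ALL
        SELECT u.unit_id, u.unit_name, c.level + 1
          FROM units u JOIN chain c ON u.parent_unit_id = c.unit_id
    )
    SELECT unit_name, level FROM chain ORDER BY level
""").fetchall()
print(rows)  # [('Head Office', 0), ('Northern Region', 1), ('Leeds Branch', 2)]
```

The same single table handles a hierarchy of any depth, which is the economy the pattern offers over one table per organisational level.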

6.3.3 Inheritance Inheritance is a very simple and very powerful concept. We can see examples of Inheritance in practice when we look around us every day.

For example, when we think about Houses, we implicitly include Bungalows and Ski Lodges, and maybe even Apartments, Beach Huts and House Boats.

In a similar way, when we discuss Aircraft we might be talking about Rotary and Fixed-Wing Aircraft.

However, when we want to design or review a Data Model that includes Aircraft, then we need to analyse how different kinds of Aircraft are shown in the design of the Data Model.

We use the concept of Inheritance to achieve this. Inheritance is exactly what it sounds like. It means that at a high level, we identify the general name of the Thing of Interest and the characteristics that all of these Things share. For example, an Aircraft will have a name for the type of Aircraft, such as Tornado and it will be of a certain type, such as Fixed Wing or Rotary.

At the lower level of Fixed-Wing Aircraft, an Aircraft will have a minimum length for the runway that the Aircraft needs in order to take off.




This situation is shown in the following diagram :-
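One common way to implement this kind of Inheritance in a Relational Database is a supertype table for the shared attributes plus a subtype table that shares the supertype's Primary Key. This is a sketch with illustrative names and columns, not the book's own model :-

```python
import sqlite3

# Supertype/subtype implementation of the Aircraft inheritance.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE aircraft (
        aircraft_id   INTEGER PRIMARY KEY,
        aircraft_name TEXT,                 -- eg 'Tornado'
        aircraft_type TEXT CHECK (aircraft_type IN ('Fixed Wing', 'Rotary'))
    );
    CREATE TABLE fixed_wing_aircraft (
        aircraft_id         INTEGER PRIMARY KEY REFERENCES aircraft,
        min_runway_length_m REAL            -- attribute specific to the subtype
    );
    INSERT INTO aircraft VALUES (1, 'Tornado', 'Fixed Wing');
    INSERT INTO fixed_wing_aircraft VALUES (1, 900.0);
""")
row = db.execute("""
    SELECT a.aircraft_name, a.aircraft_type, f.min_runway_length_m
      FROM aircraft a JOIN fixed_wing_aircraft f USING (aircraft_id)
""").fetchone()
print(row)  # ('Tornado', 'Fixed Wing', 900.0)
```

A Rotary subtype table would be added in the same way, each carrying only the attributes that apply to its own kind of Aircraft.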

6.3.4 Reference Data Reference Data is very important. Wherever possible, it should conform to appropriate external standards, particularly national or international standards. For example, the International Organization for Standardization (ISO) publishes standards for Country Codes, Currency Codes, Language Codes and so on. For Materiel and Products, NATO has published the National Codification Bureau (NCB) code. This is in use within the UK MOD and is administered from Kentigern House, Glasgow.



This diagram shows two examples of Reference data that might apply to Aircraft.



6.3.5 Data Warehouses This Data Model is an Entity-Relationship-Diagram (ERD) for Customers and Orders :-

We could describe it in these terms :- "Customers place Orders for Products of different Types."



This Data Model shows the corresponding Data Warehouse for Customers and Orders :-

The design of this Data Warehouse simply puts all the data into one "big basket".

6.3.6 Reviewing the Design of a Data Warehouse The design of any Data Warehouse will conform to this pattern of Dimensions and Facts. Dimensions correspond to Primary Keys in all the associated Tables (ie the Entities in the ERD), and the Facts are the derived values that are available.

Therefore, reviewing the Design of a Data Warehouse involves looking for this Design Pattern.

With one exception, the Relationships are optional because the Enquiries need not involve any particular Dimension.

The one exception to this rule is that the Relationship to the Calendar is mandatory because an Enquiry will always include a Date. Of course, an Enquiry might include all data since the first records, but the principle still applies.



The purpose of the Data Warehouse is to make it easy to retrieve data in any combination in order to answer questions like this :-

 Which Customers ordered the most Products ?
 Which were the most popular Products in the first week of April ?
 What was the average time it took to respond to Orders for Aircraft Engines ?
 How many Orders did we receive in May ?
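As a sketch of how such questions map onto queries, here is a minimal fact table (the schema and data are illustrative) with two of the questions expressed in SQL :-

```python
import sqlite3

# A tiny fact table of Orders, queried in the combinations the text describes.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE order_facts (
        customer_id INTEGER, product_id INTEGER,
        order_date  TEXT,    quantity   INTEGER
    );
    INSERT INTO order_facts VALUES (1, 10, '2011-05-02', 3);
    INSERT INTO order_facts VALUES (1, 11, '2011-05-09', 2);
    INSERT INTO order_facts VALUES (2, 10, '2011-06-01', 1);
""")

# Which Customers ordered the most Products ?
top = db.execute("""
    SELECT customer_id, SUM(quantity) AS total
      FROM order_facts GROUP BY customer_id ORDER BY total DESC
""").fetchall()

# How many Orders did we receive in May ?
may = db.execute("""
    SELECT COUNT(*) FROM order_facts
     WHERE order_date BETWEEN '2011-05-01' AND '2011-05-31'
""").fetchone()[0]
print(top, may)  # [(1, 5), (2, 1)] 2
```

Each question is simply a different choice of Dimensions to group or filter by, with the Facts aggregated, which is why the star-shaped Design Pattern makes ad-hoc Enquiries easy.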

6.3 Applications

Customers and Orders


The design of the ERD in the Chapter on Data Warehouses shows a typical Customers and Orders Data Model, which represents a widespread kind of application.

Units and Orders


Here is a slightly different Model, showing Units instead of Customers and highlighting the power of Rabbit's Ears.


Deliveries

A Simple Design Pattern


This Data Model is a simple Design Pattern that covers the activities of delivering the items in an Order to a designated address. The process of reviewing a Data Model is to ask :- "How do I describe the Business Rules behind this Model ?" In this case, we could say :- "A Customer can raise an Order for Products to be delivered to a specified Address."



A Complex Design Pattern


This shows a complex Pattern which adds Regular Orders.


Maintenance

The scope of this Data Model is the Maintenance of Assets by Third-Party Companies. The Business Rules state :-
* An Asset can have a Maintenance Contract.
* An Asset consists of Asset Parts.
* Faults occur with these Parts from time to time.
* Third-Party Companies employ Maintenance Engineers to maintain these Assets.
* Engineers pay Visits, which are recorded in a Fault Log.
* They correct the Faults, and this is recorded in the Fault Log.



This page on our Database Answers Web Site monitors the current players :-

http://www.databaseanswers.org/data_warehouse/index.htm

Chapter 7. Enterprise Data Models


7.1 How to use an Enterprise Data Model
This Chapter describes how to use an Enterprise Data Model to load data into the Enterprise Data Warehouse in a consistent manner, to ensure a Single View of the Truth.

7.2 A Canonical Enterprise Data Model (EDM)


This Model is a generic Design Pattern for the business activities of any commercial organisation that sells Products and Services to Customers.

Many organisations have an EDM which is a great deal more complex than this one. We are using this one because it is very simple, and therefore useful for this Chapter, but at the same time it applies to a wide range of complex organisations.

CUSTOMERS

PRODUCTS / SERVICES

EVENTS

ORGANISATION

SUPPLIERS

DOCUMENTS

Examples are :-
 An example of a Document is a Sales Receipt.
 An example of an Event is a Customer making a Purchase using a Gift Card.
 An example of the Organisation is a Store.
 An example of the Organisation is an Associate.

7.3 A Library of Design Patterns


7.5.1 A Customer Purchase This Section shows how our Canonical Enterprise Data Model (EDM) defines a Generic Design Pattern for a Retail organisation. This diagram provides a template. Arrows point from Children to parents. Look for one of these Design Patterns to provide your starting-point. If you do not find one, please let us know so that we can develop one for you.

CUSTOMERS

PRODUCTS

EVENTS - Customer Purchase

ORGANISATION - eg Staff

Payment Method - eg Credit Card

DOCUMENTS - Sales Receipts

7.5.2 Process Purchase Order
This Section discusses how the Generic Design Pattern applies to the processing of a Purchase Order (PO).

7.5.3 Online Sales

7.5.4 A new Application
Enter your new Line of Business (LOB) Application here. This could be POS, Inventory Control and so on.

7.4 Create the LOB Business Data Model
7.6.1 Description The first Step is to create the LOB Business Data Model for the new area. This example provides a template. Create your own Business Data Model here, based on a Design Pattern.

CUSTOMERS
- Anonymous

Tender Types
eg Gift Cards - Card Nr - Date of Issue - Card Balance - Date last Used

Retail Transactions
- Transaction ID - Transaction Type - Card Nr - Store ID - Staff ID - Date of Transaction - Amount - Adjustment Amount

ORGANISATION
- Staff - Stores

Transaction Types
eg Make purchase, Load money on card, ...

Sales Receipts



7.6.2 Step 7. Map the Entities
The objective at this stage is to validate the Entity-level mapping from the Data Source to the EDM. This table provides a Template; put your own Entity details in the second column.

EDM            CREDIT CARD                  COMMENTS
Customers      Customers
Documents      Sales Receipts
Events         Retail Transactions
Organisation   Staff (Associates), Stores
Products       Adjustments, Purchases
Suppliers
Tender Type    Credit Card                  Reference data



7.6.3 Step 8. Map the Fields to the EDM
This table provides a Template. EDM Table name : EVENTS. Put your own Field details in the second column.

EVENT Field           CREDIT CARD TRANSACTION Field
Card Number           Card Number
Customer ID           N/A
Event Date and Time   Transaction Date and Time
Event Outcome         N/A
Event Status          N/A
Event Type            Make Payment
Product ID            N/A
Staff ID              Employee Details
Store ID              Store Details
Supplier ID           N/A
Dimension 1
Dimension 2
Dimension 3
Event Amount          Card Balance
Event Count           N/A
Fact 1
Fact 2
Fact 3

The Dimension and Fact fields are Generic and provide future-proofing.
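One simple way to apply such a mapping during loading is a lookup table from source Field names to EDM Field names, with unmapped (N/A) fields dropped. Everything here is an illustrative sketch, not a prescribed implementation :-

```python
# Lookup table: source Credit Card transaction fields on the left,
# EDM Event fields on the right; fields absent from the map are dropped.
FIELD_MAP = {
    "Card Number": "Card Number",
    "Transaction Date and Time": "Event Date and Time",
    "Employee Details": "Staff ID",
    "Store Details": "Store ID",
    "Card Balance": "Event Amount",
}

def map_to_edm(source_record):
    """Rename source fields to their EDM equivalents, dropping unmapped ones."""
    return {FIELD_MAP[k]: v for k, v in source_record.items() if k in FIELD_MAP}

record = {"Card Number": "1234", "Card Balance": 25.0, "Terminal": "T9"}
print(map_to_edm(record))  # {'Card Number': '1234', 'Event Amount': 25.0}
```

Keeping the mapping in one table like this means every new LOB Application only needs its own FIELD_MAP, while the loading logic into the EDM stays unchanged.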


7.6.4 ERWin Data Model for new LOB This is where you design a more formal version of the Data Model shown above.

7.6.5 Define the Transaction Data

7.6.7.1 Structure of Transaction Data
There will usually be a table of Transaction data to be migrated and loaded into the Data Warehouse. This table can be used as a Template.
COLUMN NAME      DATA TYPE   LENGTH   DESCRIPTION
Acct             varchar     20       Account (Card Number ?)
TranStatus       varchar     2        Transaction Status
TranRespCode     int         4        Transaction Response Code
TranType         varchar     50       Transaction Type (Issue, Load, Sale)
CardType         varchar     10       Card Type
Store            int         4        Store Number
ChainID          varchar     10       Chain ID
Terminal         int         4        Terminal (the Register or Till)
Invoice          decimal     9        Invoice
StrTranNo        decimal     9        Str Transaction Number
AuthCode         varchar     10       Authorisation Code
TranDate         datetime    8        Transaction Date
Trantime         varchar     10       Transaction Time
Amount           decimal     9        Amount
TranAmount       decimal     9        Transaction Amount
Swipe            varchar     2        Swipe (Cash or Key)
GcUsed_On        varchar     10       GC Used On
Reconcile        varchar     2        Reconcile
Reconcile_info   varchar     100      Reconcile Info
GcTran_Amount    decimal     9        GC Tran Amount
McTran_Amount    decimal     9        MC Tran Amount
Notes            varchar     200      Notes
Reason           varchar     200      Reason
User_Updated     varchar     50       User Updated
UserId           varchar     12       User ID (used in Bulk Activation)
Promo_ID         varchar     10       Promotion ID
Actioncode       int         4        Action Code
Flag1            varchar     10       Not used
Flag2            varchar     10       Not used
Flag3            varchar     10       Not used
Flag4            varchar     10       Not used
Flag5            varchar     10       Not used
OrderStatus      varchar     100      Order Status
Skey             int         4        S Key (not used)

Other comments in the original table include "For Bulk Activation", "Generated Register receipt", "Always generated" and "Standard Codes exist". The Dimension/Fact classifications appear in the next table.

7.6.7.2 Define the Transaction Data to be Migrated
This table provides a Template. TBC = To Be Confirmed.

DESCRIPTION                 DIM/FACT     REQUIRED Y/N (TBC)
Action Code                 DIM          Y
Card Number                 DIM          Y
Card Type                   DIM          Y
Chain ID                    TBC          N
GC Used On                  DIM          N (TBC)
Invoice                     TBC          N
Order Status                DIM          Y
Staff ID                    TBC          Y
Store Number                DIM          Y
Swipe                       DIM          N
Till (Terminal)             DIM          N (TBC)
Transaction Status          DIM          Y
Transaction Response Code   DIM          Y
Transaction Type            DIM          Y
Promotion ID                DIM          Y
Notes                       Not used ?   N
Reason                      Not used     N
Reconcile                   DIM          N
Reconcile Info              DIM          N
Swipe                       DIM          N
User Updated                DIM          N (TBC)
User ID                     DIM          N (TBC)
Str Transaction Number      FACT         Y
Authorisation Code          FACT         N
Transaction Date            FACT         Y
Transaction Time            FACT         Y
Amount                      FACT         Y
Transaction Amount          FACT         Y
GC Tran Amount              FACT         Y
MC Tran Amount              FACT         Y

7.5 Data Warehouses


7.7.1 Gift Card DWH
This shows the first draft of the Data Warehouse design.

DESCRIPTION          DIM/FACT   REQUIRED Y/N
Data Type            DIM        Y
Card Number          DIM        Y
Card Type            DIM        Y
Order Status         DIM        Y
Staff ID             TBC        Y
Store Number         DIM        Y
Transaction Status   DIM        Y
Transaction Type     DIM        Y
Promotion ID         DIM        Y
Transaction Date     FACT       Y
Transaction Time     FACT       Y
Amount               FACT       Y
Transaction Amount   FACT       Y
GC Tran Amount       FACT       Y
MC Tran Amount       FACT       Y


7.7.2 ERWin Model for Data Warehouse
This Data Model shows the first draft of the Data Warehouse ("Data Warehouse for Gift Card Data", Barry Williams, June 27th 2011).

[Diagram: the central Data_Warehouse_Facts_jun27th table (Fact_ID, with foreign keys Card_Number, Card_Type_Code, Data_Category_Code, Day_Date, Event_ID, Event_Type_Code, Order_Status_Code, Promotion_ID, Staff_ID, Store_ID, Tender_Type_Code, Transaction_Date, Transaction_Time and Transaction_Type_Code, plus Customer_ID, Dimension_Code, Document_ID and Event_Outcome_Code, and the facts Amount, Transaction Amount, GC Tran Amount, MC Tran Amount, with Totals, Graphs, Trends and other Derived Figures) is linked to the Dimension tables Ref_Calendar_ (Day_Date, Year_Nr, Month_Nr), Data_Categories_ (eg Gift Card Transactions), Cards_, Events_, Promotions_, Staff_ (eg Associate, Buyer, Manager) and Stores_, and to the Reference tables Ref_Card_Type_, Ref_Event_Type_ (eg Customer makes Purchase), Ref_Order_Status_, Ref_Tender_Type_ (eg Gift Card - SVC), Ref_Transaction_Status_ and Ref_Transaction_Type_.]

7.6 Reports and Data Marts


7.8.1 Sample Sales Report Layout The figures in this Report are fictitious and bear no relationship to reality.

DATE        REGION   PRODUCT              SALES (000s)   REFUNDS
June-2011   North    Mens Blue Superior   1,234          2
June-2011   South    Mens Blue Superior   3.345          3
June-2011   East     Mens Blue Superior   6.456          1

Put your Report details here.



7.8.2 Data Model (Word)

[Diagram: the Data Mart for the Sales Report (Year Number - Month - Region - Product - Sales Amount - Refund Count) with its Month, Region and Product Dimensions.]

7.8.3 Data Model (ERWin)

[Diagram: DataMart_for_Aging_Report (Year_Nmber (FK), Month_Number (FK), Card_Count, GC_Balance, MC_Balance, Total_Balance) with its Ref_Year_ (Year_Nmber) and Ref_Month_ (Month_Number) Dimension tables.]


Chapter 8. Knowledge Management


8.1 What is Knowledge Management (KM) ?
Wikipedia (http://en.wikipedia.org/wiki/Knowledge_management) says :- "Knowledge Management (KM) comprises a range of strategies and practices used in an organization to identify, create, represent, distribute, and enable adoption of insights and experiences. Such insights and experiences comprise knowledge, either embodied in individuals or embedded in organizational processes or practice."

The key aspects are the insights and experience. These are specific to a particular problem or area of activity. The way that they are structured and the manner in which they can be searched and enhanced are important for their usability.

KM is built on a Knowledgebase, which is a Repository of insights and experience.

This diagram shows how a Knowledgebase(KB) can be created to answer Questions about Data Management. It is shown on this page of the Database Answers Web Site :-

http://www.databaseanswers.org/A_Self_Service_Web_Site.htm

A Feedback facility is provided so that the Knowledgebase can be enhanced in response to the experience of the Users. A Personal Workspace makes it possible for specific Users to save their results over a series of sessions. In this way, they can learn from their experiences and the Workspace can be analysed to improve the performance in the future.

8.1.1 Why is it important ?


KM represents the ability for professionals to share experience, insights and solutions for common problems. They can work for the same organisation or for different organisations, and very often they make contact indirectly. Some years ago I worked with a major commercial organisation in the UK that wanted to establish a KM function for CRM. The first problem was how to identify the interested parties. I suggested that we put an item on the Intranet Home Page asking interested parties to contact us.



We received responses from 24 different units within the organisation ;-)

8.1.2 What is the State-of-the-Art?


The current State-of-the-Art is represented by LinkedIn, which is the Web Site that KM professionals use to make contact with their peer group :-

http://www.linkedin.com/

There is currently no Group specifically for KM, but there are over 1 million individual Groups, each of which is a KM Group dedicated to a specific area of interest.

8.2 An Architecture for a KM System

8.3 How to be a Knowledge Manager


The key to being a successful Knowledge Manager is to combine :-
 An interest in the subject area, such as Data Management



 An understanding of the reasoning behind a typical User's Question and Answer Scenarios
 Knowing what kind of information can easily be included in the Knowledge Base

8.4 How to use Knowledge Management


KM can be used :-
 to answer Questions
 to predict equipment failures
 to advise on solutions to specific problems
 to provide Contacts with expertise
 to offer Tutorials on complex Topics, such as Data Migration

8.5 A Data Model for KM

This Data Model shows how a Database could be designed to support KM. It shows that the Knowledge is related to Topics, which have Properties and can include Contacts. A typical Topic would be "How do we do Data Migration ?"

8.5 Principles of KM


The principles behind KM include :-
 Integration of all sources of Knowledge. This can include :-
  o Text Content
  o Audio and Videos
  o Email

 The accumulation of knowledge, ideas, experience, heuristics and "what works"
 The structure of Enquiries against the Knowledgebase (KB)
 The design of the KB, which helps enrichment over time
 The ability to tailor the KB to provide prepared results for specific Roles and Questions

8.6 Mistakes to Avoid


Experience has shown that these are some typical mistakes to avoid :-
 Relying on a Tool to do it all
  o Focus on people, process and technology, not just technology alone.
 Putting lipstick on a Pig
  o Trying to use search to conceal poor content.
 Believing that "Build it and they will come"
  o KM needs care and feeding, by nurturing and marketing.
 Creating Review Bottlenecks
  o Aim to get quality content reviewed and out quickly.
 Failing to link each Case to at least one Solution
 Treating KM as a Project
  o It requires an ongoing commitment.

8.7 Tools for KM


Tools to support KM can be at different levels of sophistication :-
1) The most basic is simply documents.
2) The next level up is basic HTML, like this Question about Data Integration :-
http://www.databaseanswers.org/data_integration.htm
Here's an example of FAQs on the Database Answers Website :-

http://www.databaseanswers.com/faqs.htm

3) The next level up is Database-driven Content, like this Tutorial on Master Data Management

http://www.databaseanswers.org/pi_best_practice_display/manual.asp?manual_id=BP_MDM

4) The next level up is a dedicated KM System, which can be installed either on premise or in the Cloud. This can be an adaptation of a System for a related purpose, such as Incident Reporting or Trouble-Shooting. Two examples in the MOD would be the HP CASD System or BMC's Remedy. This approach is attractive because it is cost-effective and it provides an opportunity for clarification of the requirements.
5) The next level up is a Web Site on the Internet, such as Knowledge Plaza :-

http://www.knowledgeplaza.net/

This provides a "one stop shop" to store all file types and resources.
6) The top level is a Social Computing approach, which combines everything and offers very user-friendly facilities for people to provide Feedback. This includes Blogs, Wikis, Emails and Social Networks, like LinkedIn. This is discussed in a recent book entitled "The Wisdom of Crowds".


8.8 Self-Assessment
The Internet offers facilities for organisations to carry out a Self-Assessment to determine how they compare to others in their industry. Here's the Web Link :- http://www.myknowledgeiq.com/?src=kat-web-hi Here is the result of the Analysis of a typical Situation :-

Appendix 8.A Specifications for a Task Log System

8.A.1 Introduction
This document is a starting-point for discussion and covers the Database design, Requirements and Screenshots.

8.A.2 Screenshots
8.A.10.1 Home Page
This Screenshot shows a draft set of facilities for :-
 Feedback and User Comments on any topic
 Enquiries
 Entering and updating Tasks, Lessons Learned and Requirements


8.A.10.2 Enter a new Task


This shows a screenshot for the DCoE Task and Activity Log produced by Nicola Askham. This could be implemented by a Forms Manager that could generate an Email to a designated Email Address for each new Task. This Email could also be produced as a CSV file which could be loaded directly into a Spreadsheet.

8.A.3 Requirements


These Requirements have been derived from the Data Model in the next Section.

8.A.6.1 Essential

ACTION    DATA                CASD   BMC Remedy   Other (TBD)
Enter     A new Task
Enter     Lessons Learned
Enter     Requirements
Enquire   Tasks in Progress
Enquire   Lessons Learned
Enquire   Requirements

8.A.6.2 Optional

ACTION                   DATA             CASD   BMC Remedy   Other (TBD)
Enter, Update            Processes
Generate automatically   Task_Processes
Enter, Update            Staff details

8.A.6.3 Reference Data

ACTION                  DATA            CASD   BMC Remedy   Other (TBD)
Enter, Update, Delete   Roles
Enter, Update, Delete   Task Outcomes
Enter, Update, Delete   Task Status

8.A.4 Data Model for the Task Log

Appendix 8.B. Details of selected Products
8.B.1 Inspire KM
Inspire KM focusses on Self-Service and aims to reduce customer support demand. It offers the following features, which make it stand out from its competition :-

 A completely web-based self-help system
 Categorize information by area, product, etc
 Built-in feedback for customers and staff
 Customers store their own favourites list
 Popular search terms to find help
 Glossary of terms to define technical words
 Active Response System integrates into any website form
 Searchable knowledge items and attachments (Microsoft Office and PDF)
 RSS feeds allow customers to instantly see new knowledge items

8.B.2 Knowledge Plaza Knowledge Plaza is an example of the current generation of Internet-based products that build on what has been learned and also incorporate Web 2.0 features of user involvement and feedback. Knowledge Plaza allows businesses of all sizes to capture what actually matters to them. That's the day-to-day flow of information and knowledge which needs to be saved, retrieved and used to make or support business decisions. Using many of the most talked-about aspects of Web 2.0 (tagging, social networks, RSS, real-time dashboards and social search), Knowledge Plaza enables users to collaborate around information sources and contribute to an ever-growing knowledge base. In search of the great search and customer experience, companies are turning to enterprise search (ES) or enterprise knowledge management (EKM). Both of these solutions are designed to help users search through large bodies of information, but the way in which they do so is fundamentally different and has a profound effect on the user's search experience and success.


8.B.3 OpenKM
OpenKM is an Open Source solution which focusses on Document Management. It supports Search, Thesaurus and Workflow facilities. Here is the link for the Web Site :-

http://www.openkm.com/

Appendix 8.C. Checklist


The answers to these questions generate the Spider diagram shown above. They indicate the purpose of a KM programme. Is there a Knowledge Manager ? Is there a Centralised Repository of information ie a Knowledge Base ? Are Dashboards shared ? How does management think of KM ? o Whats that ? o Ill think about it / Knowledge is our mission Are there any metrics for KM or K Base ? o Individual metrics (emails) / Team effort Self-Service success, etc. How is success measured eg ROI ? Are individuals or Teams credited for authoring articles ? Is time to create Articles measured ? Is the organisations Web Site good for overall Content Messaging depth, relevance effectiveness ? How effective is the Web Site in usability ? Does Website return KB content ? Is there a dedicated portal for Customers enquiries that returns KB Content ? Does Self-Service return KB only, or KB plus other material ? Do you offer RSS Feeds or FAQs or Whats New ? What is the goal of Self-Service ? o To deflect inbound case volume o To help Customer research and select Products and Services o To handle Issues that Customers might not contact us with, thereby increasing loyalty and satisfaction Can Customers open Cases online and will KB provide answers automatically ? Is Agent experience automatically fed into the KB ? Can Customers provide feedback on KB ? Does Website offer supported Community ? What user features are available simple search, natural language search, search based on user intent ? Is KM Tool linked to CRM o Can link Case to a solution o Can start a new KB article from Case o Is it easy to create new Content in KB easy or requires KM expertise Optimising Customer experience o Search Results tuning



  o Customer experience analysis
  o Process wizards : step the user through a series of questions and answers to get to a resolution

- Does the KB support global, multilingual content, including synchronising local content in a master document ?
- Focus of delivery :
  o Assisted service or support : phone, web, chat or email
  o Self-Service
  o Support

- Complexity of Questions resolved by the KB :
  o Low-level : Issues resolved in 15 minutes
  o Medium : Issues resolved in an hour or less
  o High : Issues resolved in a day or less
  o Very high : not more than 25 issues resolved in a month


Chapter 9. MetaData Management


Here is the LinkedIn Group for Metadata Management :-

http://www.linkedin.com/groups?gid=38815&mostPopular=&trk=tyah

9.1 What is Metadata ?


Metadata is commonly described as "Data about Data". For example, if a Database holds data about Customers, then the Metadata tells us things about the Customer data, such as exactly what items we hold, how good it is, where it is derived from and so on. The Metadata can be used to answer questions like these :
- What data do we have ?
- What does it mean ?
- Where is it ?
- How did it get there ?
- How do I get it ?

It is clear from these questions that data is useless without Metadata.

Data Profiling can involve these kinds of analyses :
- Domain : whether the data in the column conforms to the defined values or range of values it is expected to take.
  o For example, ages of children in kindergarten are expected to be between 4 and 5. An age of 7 would be considered bad data.
- Pattern : a Phone number for Inner London should start with 020.
- Frequency Counts : if most of our customers are in London, then the largest number of occurrences of City should be London.
- Useful Statistics include :
  o Minimum value
  o Maximum value
  o Average
- Business Rules for bad data would include :
  o a Start Date later than an End Date
  o a Start Date that is almost invariably wrong, such as a Start Date before the organisation was in existence.
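The kinds of checks listed above can be sketched in a few lines of Python. This is a minimal illustration only; the function names, the kindergarten age range and the phone prefix follow the examples in the text, and the sample data is invented :-

```python
import re
from statistics import mean

def profile_ages(ages, low=4, high=5):
    """Domain check: flag values outside the expected range."""
    return [a for a in ages if not (low <= a <= high)]

def profile_phones(phones, prefix="020"):
    """Pattern check: flag numbers that do not have the expected prefix."""
    pattern = re.compile(r"^" + re.escape(prefix))
    return [p for p in phones if not pattern.match(p)]

def profile_stats(values):
    """Useful statistics: minimum, maximum and average."""
    return {"min": min(values), "max": max(values), "avg": mean(values)}

def check_dates(start, end):
    """Business rule: a Start Date must not be later than an End Date."""
    return start <= end

bad_ages = profile_ages([4, 5, 7])       # the age of 7 is flagged as bad data
bad_phones = profile_phones(["0207 946 0000", "0161 496 0000"])
stats = profile_stats([4, 5, 7])
```

In practice a Profiling Tool runs checks like these against every column in a Database and stores the results, but the logic is no more than this.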

9.2 Types of Tools that can be used, and how
Typical Tools for Data Profiling produce Reports about specific fields in Tables. These Reports look at the values of any field in any Table in a Database. The most common products are available for Data Profiling as part of Data Quality. In other words, a common strategy starts with Data Profiling and builds a Metadata Registry based around the results of the kind of analysis obtained from Profiling.

In this space, there are four different kinds of Vendors :
- The major players, such as IBM
- The niche players, such as DataFlux
- The Open Source players, such as DataCleaner and Talend Open Profiler
- In-the-Cloud Vendors, such as Informatica

Vendors in Gartner's Leaders Quadrant for Data Quality include :
- Data Foundations (a "Cool Vendor")
- DataFlux
- IBM
- Trillium

This is a representative list of Vendors that provides a good starting-point :
- Ab Initio Data Profiler Software
- Astera Software
- Business Data Quality Ltd
- Citrus Technology
- Datamartist Profiler
- Datiris Profiler
- MeasureGroup Automated Data Analysis
- SAP BusinessObjects
- Trillium Software

9.3 How are Tools used ?

Typically, these Tools are used by someone in an Administrator role, sitting at a PC and running software to analyse Data Sources. He or she will update the contents of the Registry as a result of the analysis. Additional content can be obtained by running queries on the System Catalogues for specific Databases. The Metadata Manager will also enter details of mappings between different Schemas.

The diagram below shows "Tomorrow's Data Quality Architecture" and is taken from a Presentation given by Barry Williams as the Data Architect for the London Borough of Ealing. The title was "A Strategy for establishing Enterprise Data Quality". The Data Dictionary could be implemented as a Metadata Registry. "DQ Admin" could be a Data Architect in the role of Metadata Manager.

9.4 Tools and Standards


ISO has established Standard 11179, which applies to Registries and aims to facilitate the exchange of data between Metadata Registries. About ISO 11179, Wikipedia says :-

"Organizations often want to exchange data quickly and precisely between computer systems using enterprise application integration technologies. Completed transactions are also often transferred to separate data warehouse and business rules systems with structures designed to support data for analysis. The industry de facto standard model for data integration platforms is the Common Warehouse Model (CWM). Data integration is often also solved as a data, rather than a metadata, problem, with the use of so-called master data. ISO/IEC 11179 claims that it is a standard for metadata-driven exchange of data in a heterogeneous environment, based on exact definitions of data."

OneData MDR from Data Foundations is currently the only Registry which is 11179-compliant. The price of OneData is several hundred thousand pounds, which provides a useful benchmark against which other options can be evaluated. Here is the Web Link :-

http://www.datafoundations.com/solutions_metadata_registries.jsp

9.5 Open Source Developments


9.5.1 MIKE2.0

An initiative called MIKE2.0 (Method for an Integrated Knowledge Environment) is doing some good work in establishing a Toolkit and Open Source Methodology for MetaData Management :-

http://mike2.openmethodology.org/
http://mike2.openmethodology.org/wiki/Metadata_Management_Foundation_Capabilities_Component



9.5.2 Universal Data Element Framework (UDEF)

Another promising initiative is UDEF, about which Wikipedia says :The Universal Data Element Framework (UDEF) provides the foundation for building an enterprise-wide controlled vocabulary. It is a standard way of indexing enterprise information that can produce big cost savings. UDEF simplifies information management through consistent classification and assignment of a global standard identifier to the data names and then relating them to similar data element concepts defined by other organizations. Though this approach is a small part of the overall picture, it is potentially a crucial enabler of semantic interoperability.

9.6 Data Profiling


9.6.1 What is Data Profiling ?

Data profiling is the process of examining the data in a database and collecting statistics about that data. The purpose of these statistics may be to :
- Find out whether existing data can easily be used for other purposes
- Assess the Data Quality
- Assess the challenges involved in integrating data
- Assess whether metadata accurately describes the actual values
- Understand data challenges early on, so that late project surprises are avoided
- Establish an enterprise view of all data, for uses such as Master Data Management

Data Profiling can be a way to involve business users in providing context about the data, giving meaning to columns of data that were previously poorly defined by metadata and documentation.
9.6.2 Data Profiling Tools

Here are two Open Source products for Data Profiling :-

DataCleaner Talend Open Profiler

Commercial Vendors for data profiling software include :-
- Ab Initio Data Profiler Software
- Astera Software
- Business Data Quality Ltd
- Citrus Technology



- Datamartist Profiler
- Datiris Profiler
- MeasureGroup Automated Data Analysis
- SAP BusinessObjects
- Trillium Software

9.6.3 Storage of Profiling Results

This section discusses the results of profiling activity and how they can be stored in such a way that they are useful and can be used again in the future. A Metadata Registry can be used to record the agreements reached in discussions. An appropriate Registry Tool can provide controlled access on a Publish-and-Subscribe basis. Metadata can be used in many different ways, and the ROI on the time invested in obtaining Metadata is very high. For example, answering Data Profiling questions such as :
- Which sources of physical data have the best quality for master data construction ?
- Contention of physical data types across different instances of the same data. For example, names with different lengths can lead to truncated names.
- Multiple meanings being attached to the same data in different data repositories. For example, does everybody agree to a common definition of a Customer or a Product ?

Also answering Data Governance questions such as :
- An overview of sensitive data, and an opportunity to ensure that all instances are maintained under appropriate security regimes
- Highlighting areas of data ownership contention
- The opportunity to develop master and master reference data

9.7 Data Model for MetaData Management

This Model shows that Metadata Items have Properties and each Property has a series of Values. For a Date, typical Values would be MIN and MAX. These are derived by Profiling and can be used to validate the data.
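As a sketch, the Item-Property-Value structure might be implemented as follows. This is a minimal illustration in SQLite; the table and column names are assumptions for the example, not taken from the Model itself :-

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Metadata_Items (
    Item_ID   INTEGER PRIMARY KEY,
    Item_Name VARCHAR(255) NOT NULL
);
CREATE TABLE Item_Properties (
    Property_ID   INTEGER PRIMARY KEY,
    Item_ID       INTEGER NOT NULL REFERENCES Metadata_Items,
    Property_Name VARCHAR(255) NOT NULL
);
CREATE TABLE Property_Values (
    Property_ID INTEGER NOT NULL REFERENCES Item_Properties,
    Value_Name  VARCHAR(255) NOT NULL,   -- e.g. 'MIN' or 'MAX'
    Value_Text  VARCHAR(255) NOT NULL
);
""")

-- this comment style is SQL; the Python comments below use '#'
""" if False else None  # (placeholder removed below)
```
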

9.8 Integrated Data Model for Knowledge and MetaData Management
This shows an Integrated Data Model that would support both Knowledge and Metadata. It can be used to evaluate the features available in a possible solution.

9.8.1 The Simplified Version

This shows that all Information is composed of Facts that are related to a number of other Facts. Facts can be searched or indexed by Tags.



9.8.2 The Complete Version

This clarifies the Types of Facts, the possible Formats and Types of Relationships. This Data Model could be used to evaluate the features offered by any possible System solution.


Chapter 10. Quality of a Data Model


This Chapter describes how you can check the Quality of a Data Model.

10.1 Create a Top-Level Business Data Model


10.1.1 Types of Data Models

All the Data Models that we will be discussing can be described as Entity-Relationship Diagrams, or ERDs. They all show Relationships between Entities or Tables. At the Conceptual level, the Things of Interest, such as Organizations, are called Entities, and at the Logical or Physical level they are called Tables, because they often appear as Tables in Databases. At the Physical level, Tables are given names in the plural, such as Organizations, whereas at the Conceptual level they often appear in the singular, that is, Organization. At the Logical level they might be either singular or plural.

A Top-Level Business Data Model can be created using Microsoft Word and is intended for business users and a non-technical audience.

The other Models referred to in this document will always be created by a Data Modelling Tool such as ERwin or IBM's Rational Rose. They could be described as Conceptual, Logical or Physical Models.
- Conceptual Models show the Things of Interest which are in Scope, for example, Organizations and Products. They may or may not include Keys and will certainly not include physical Data Types, such as the length of character strings. They may include Many-to-Many relationships.
- Logical Models will include Primary and Foreign Keys, and often the Modelling Tool will provide a facility to generate a Physical Model from a Logical one.
- Physical Models are often close to the actual design of an operational Database. They will always show data types and field lengths.

10.1.2 Example of a simple Business Data Model

This Model was created in Word and shows Organizations, Requisitions and Products. The flow of logic in a Data Model should go from top-left to bottom-right. This means that the more fundamental Things are on the top and to the left.

Best Practice in Data Management 116


This diagram is a good example :-

[Diagram of the Business Data Model, showing Organizations, Stores, Requisitions, Products and Products in a Requisition]

A Requisition can ask for an engine to be supplied, like this Hornet Engine for the US Navy :-

This version shows that Organizations and Products each have a hierarchy so that an Organization is part of a higher Organization. Similarly, a Product can be part of a more complex Product.



Which of these two you choose to use will depend on the audience. In general, it is better to choose the simple option.

[Diagram of the Business Data Model with hierarchies, showing Organisations, Stores, Requisitions, Products and Products in a Requisition]



This diagram shows the organisational structure of the US Naval Air Systems Command which is a simple hierarchy:-

The diagram is shown on this page of the Navy Web Site for the Naval Air Systems Command :-

http://www.navair.navy.mil/index.cfm?fuseaction=home.display&key=OrganizationalStructure

10.2 Draft the Business Rules

Business Rules are valuable because they define, in plain English with business terminology, the underlying relationships between the Terms that appear in a Data Model. The User community will then be able to agree and sign off the Rules. Here is a small example.

D.1  Requisitions and Organizations
     A Requisition must be raised by a valid Organization. Not every Organization will raise a Requisition. Therefore the Relationship must be one-to-many, with a mandatory condition at the Organization end and an optional condition at the Requisition end.

D.2  Requisitions and Products
     A Requisition must refer to valid Products. Therefore the Relationship must be one-to-many, with a mandatory condition at the Product end and an optional condition at the Requisition end, because not every Product will appear in a Requisition.

P.1  Products and Stores
     Products are kept in Stores.
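Rule D.1 can be expressed directly in a physical design: the Foreign Key is declared NOT NULL, which makes the Organization end mandatory, while nothing forces an Organization to have Requisitions. A minimal sketch in SQLite; the table and column names are illustrative :-

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Organizations (
    Organization_ID   INTEGER PRIMARY KEY,
    Organization_Name VARCHAR(255)
);
CREATE TABLE Requisitions (
    Requisition_ID  INTEGER PRIMARY KEY,
    -- NOT NULL implements the mandatory condition at the Organization end
    Organization_ID INTEGER NOT NULL REFERENCES Organizations
);
""")

conn.execute("INSERT INTO Organizations VALUES (1, 'Central Stores')")
conn.execute("INSERT INTO Requisitions VALUES (100, 1)")   # a valid Requisition

# A Requisition without a valid Organization is rejected by the Database
try:
    conn.execute("INSERT INTO Requisitions VALUES (101, 99)")
except sqlite3.IntegrityError as e:
    print("Rejected:", e)
```

The optional condition at the Requisition end needs no declaration at all: an Organization row simply has no matching Requisitions until one is inserted.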



10.3 Draft a Glossary of Terms

It is very important to establish agreed definitions of terms and words in common use. This is a small example.

Order
  A Request or Order for Products to be supplied to the Requesting Organization.
  Comment : Outside the military environment, the word Order is also used to mean a request for a Product from a commercial organisation.

Requisition
  A Request or Requisition for Products to be supplied to the Requesting Organization.

Product
  An Asset that can be separately ordered. It can be a Component and a part of a larger Assembly. It can be very small, such as a Washer, or very large, such as a Tornado aircraft. Very large Products can be subject to a separate Requisition Process.
10.4 Check that the Data Model is correct

There may be errors which have a simple explanation, for example, incorrect use of the Modelling Tool. Any errors should be discussed and resolved with the Modeller and the Users. This is where the Glossary and Business Rules are very valuable.

10.5 Review with Users

At this point, review the Business Rules and the Glossary with Users and aim to get Sign-Off. Make any necessary changes to format and contents.

10.6 Check Normalised Design

10.6.1 Normalised Design

This discussion applies to Entity-Relationship Diagrams (ERDs) and not to Data Warehouses. We will start by defining the Rules for Normalisation, so that we can recognise cases where they have been broken.

Best Practice in Data Management 121


Rules for Normalisation

Rule 1 : A little background is appropriate at this point. The theory which provides the foundation for Data Models and ERDs was developed in 1970 by an Englishman called Ted Codd, who was a research scientist with IBM in California at the time. Here is the page on our Database Answers Web Site devoted to Codd's work :-

http://www.databaseanswers.org/codds_page.htm

One of his rules can be summarised as :-
"The Data in a Table must belong to the Key, the Whole Key and Nothing but the Key, so help me Codd."

This means, for example, that a record in an ORGANIZATIONS Table must contain data only about the Organization, and nothing about people in the Organization or activities of the Organization. It might include things like the name of the Organization and when the Organization was founded.

Check 1 : Can the values of every data item in a table be derived only from the Primary Key ?

Rule 2 : Another of Codd's rules stated that derived data must not be included. For example, the headcount for an Organization would not be included in the Organizations Table, because it can be derived by counting the records of members in the Organization.

Check 2 : Can any data item be derived from other items ?

Rule 3 : There must be no repeating groups in a Table. The one uncomfortable exception is Addresses: they are very often stored as a number of repeated lines called Address_Line_1, Address_Line_2, and so on.

Check 3 : Do any column names repeat in the same table ?


Rule 4 : An item of data must only be in one Table. For example, the name of an Organization would appear only in the Organizations Table.

Check 4 : Does the same item of data appear in more than one table ?
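Checks 3 and 4 lend themselves to simple automation against a database catalogue. This is a hedged sketch, assuming the schema has already been read into a plain dictionary of table names to column lists; the sample tables and columns are invented :-

```python
import re
from collections import defaultdict

# Hypothetical schema, as it might be read from a System Catalogue
schema = {
    "Organizations": ["Organization_ID", "Organization_Name"],
    "Customers": ["Customer_ID", "Organization_Name",      # duplicated item
                  "Address_Line_1", "Address_Line_2"],     # repeating group
}

def repeating_groups(columns):
    """Check 3: column names that differ only by a trailing number."""
    stems = defaultdict(list)
    for col in columns:
        m = re.match(r"(.+?)_?\d+$", col)
        if m:
            stems[m.group(1)].append(col)
    return {s: cols for s, cols in stems.items() if len(cols) > 1}

def duplicated_columns(schema):
    """Check 4: the same column name appearing in more than one table."""
    seen = defaultdict(list)
    for table, columns in schema.items():
        for col in columns:
            seen[col].append(table)
    return {c: ts for c, ts in seen.items() if len(ts) > 1}

print(repeating_groups(schema["Customers"]))
print(duplicated_columns(schema))
```

As the text notes, a hit on either check is a prompt for discussion rather than an automatic failure; Address Lines, for example, are a tolerated repeating group.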

10.6.2 Reference Data

10.6.2.1 Background

A list should be made of the Reference Data referred to in a Data Model. When the list is complete, it should be analysed for consistency. For example, there will not usually be any relationships between the Reference Data. However, if there are any, then they should be sensible and consistent. For example, a Town might be in a County, which would be in a Country. These could all be classified as Reference Data with relationships which should be validated.

Typical Reference Data could include Job Titles and Types of Products. In passing, we should note that Organizations, Job Titles and Products are all examples of hierarchical structures. Job Titles will change only very, very rarely. However, when they are stored in a Table which is joined to itself then, of course, the Table will have a Recursive Relationship to itself. Therefore, wherever these occur, we would expect to find compact Data Models that achieve a great deal with compact and powerful structures.

10.6.2.2 Standards

Any appropriate national or international Standards must be considered when values for Reference Data are decided. These include MOD, NATO and ISO standards. For example, ISO publishes standards for Country Codes, and NATO maintains standards for Product classification. Therefore any Data Model relating to Products should consider this standard, and where appropriate the necessary Tables should be added to the Model.



10.6.3 Slowly-Changing Data

The classic example of Reference Data which never changes is a Calendar: the values are predictable for hundreds of years ahead. There is a category in between, which is usually called Slowly-Changing Data. This applies where the values of the data change on roughly a six-monthly basis.

Data about Categories and Types often consists of fixed values, but some can change infrequently. For example, a new Aircraft Type was introduced with Unmanned Aircraft, and the values then became Fixed-Wing, Rotary and Unmanned. This would be an example of Slowly-Changing Data. This highlights the fact that what constitutes Reference Data can be subjective and may be defined differently in Data Models created by different people or organisations.
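One common way to handle Slowly-Changing Data is to give each Reference value an effective-date range, so that history is preserved when a new value, such as the Unmanned Aircraft Type, is introduced. A minimal sketch; the dates and codes are invented for illustration :-

```python
from datetime import date

# Each row carries the period for which it is valid; an open-ended
# period uses date.max as its end date.
aircraft_types = [
    {"code": "FW", "name": "Fixed-Wing", "from": date(1990, 1, 1), "to": date.max},
    {"code": "RO", "name": "Rotary",     "from": date(1990, 1, 1), "to": date.max},
    # The new Type is introduced without disturbing the existing rows
    {"code": "UA", "name": "Unmanned",   "from": date(2005, 1, 1), "to": date.max},
]

def valid_types(rows, as_of):
    """Return the Reference values that were valid on a given date."""
    return sorted(r["name"] for r in rows if r["from"] <= as_of <= r["to"])

print(valid_types(aircraft_types, date(2000, 6, 1)))   # before Unmanned existed
print(valid_types(aircraft_types, date(2010, 6, 1)))
```

The same effective-date pattern also lets a query reconstruct the Reference Data exactly as it stood on any historical date.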



10.6.4 Check Normalisation

Check 1 (GOOD answer : Y) : Can the values of every data item in a table be derived only from the Primary Key ?
Check 2 (GOOD answer : N) : Can any data item be derived from other items ?
Check 3 (GOOD answer : N) : Do any column names repeat in the same table ?
Check 4 (GOOD answer : N) : Does the same item of data appear in more than one table ?

For each Check, record whether the Model is OK (Y/N).



10.7 Look for Design Patterns

10.7.1 Some Examples

This Data Model shows examples of Design Patterns for One-to-Many and Many-to-Many Relationships, Reflexive Associations and Reference Data. PK stands for Primary Key and FK stands for Foreign Key.

PF, which is shown in the Products_in_a_Requisition Table, stands for Primary and Foreign Key. This is a Primary Key in one Table which is also a link to another Table, where it is also a Primary Key.
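As an illustration of the Many-to-Many pattern, the Products_in_a_Requisition Table resolves the relationship between Requisitions and Products, and its Primary Key is the pair of Foreign Keys (the PF pattern). A sketch in SQLite; the Quantity column and the data are invented for the example :-

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Requisitions (Requisition_ID INTEGER PRIMARY KEY);
CREATE TABLE Products     (Product_ID     INTEGER PRIMARY KEY);
-- The intersection Table: each key column is both part of the Primary
-- Key and a Foreign Key to its parent Table (marked 'PF' in the Model)
CREATE TABLE Products_in_a_Requisition (
    Requisition_ID INTEGER REFERENCES Requisitions,
    Product_ID     INTEGER REFERENCES Products,
    Quantity       INTEGER,
    PRIMARY KEY (Requisition_ID, Product_ID)
);
""")
conn.executemany("INSERT INTO Requisitions VALUES (?)", [(100,), (101,)])
conn.executemany("INSERT INTO Products VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO Products_in_a_Requisition VALUES (?, ?, ?)",
                 [(100, 1, 5), (100, 2, 3), (101, 1, 2)])

# How many Requisitions does each Product appear in ?
rows = conn.execute("""
    SELECT Product_ID, COUNT(*) FROM Products_in_a_Requisition
    GROUP BY Product_ID ORDER BY Product_ID
""").fetchall()
```

The composite Primary Key also enforces a useful rule for free: the same Product cannot appear twice in the same Requisition.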



10.7.2 Inheritance

More details are provided in Chapter 2, "Concepts", in the document entitled "How to Understand a Data Model".

We use the concept of Inheritance where we have Super-Types and Sub-Types. Inheritance is exactly what it sounds like.

It means that at a high level, we identify the general name of the Thing of Interest and the characteristics that all of these Things share.

For example, an Aircraft will have a name for the type of Aircraft, such as Tornado and it will be of a certain type, such as Fixed Wing or Rotary.

At the lower level of Fixed-Wing Aircraft, an Aircraft will have a minimum length for the runway that the Aircraft needs in order to take off.

This situation is shown in the following diagram :-

[Diagram: an Aircraft Super-Type with a Fixed-Wing Aircraft Sub-Type that carries the minimum runway length]

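The Super-Type/Sub-Type pattern described above is often implemented with a shared Primary Key: the Sub-Type Table's key is also a Foreign Key to the Super-Type. A sketch in SQLite; the table names, columns and the 900-metre runway figure are all assumptions for the example :-

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Super-Type: characteristics shared by all Aircraft
CREATE TABLE Aircraft (
    Aircraft_ID   INTEGER PRIMARY KEY,
    Aircraft_Name VARCHAR(255),           -- e.g. 'Tornado'
    Aircraft_Type VARCHAR(255)            -- 'Fixed-Wing' or 'Rotary'
);
-- Sub-Type: characteristics specific to Fixed-Wing Aircraft.
-- Its Primary Key is also a Foreign Key (the 'PF' pattern).
CREATE TABLE Fixed_Wing_Aircraft (
    Aircraft_ID         INTEGER PRIMARY KEY REFERENCES Aircraft,
    Min_Runway_Length_M INTEGER
);
""")

conn.execute("INSERT INTO Aircraft VALUES (1, 'Tornado', 'Fixed-Wing')")
conn.execute("INSERT INTO Fixed_Wing_Aircraft VALUES (1, 900)")

# Joining Super-Type to Sub-Type reassembles the whole picture
row = conn.execute("""
    SELECT a.Aircraft_Name, f.Min_Runway_Length_M
    FROM Aircraft a JOIN Fixed_Wing_Aircraft f USING (Aircraft_ID)
""").fetchone()
```

A Rotary Sub-Type Table would follow the same shape, holding only the characteristics that apply to Rotary Aircraft.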


10.7.3 One-to-One Relationships

We can remind ourselves that Rule 1 above states :-
"The Data in a Table must belong to the Key, the Whole Key and Nothing but the Key, so help me Codd."

One implication is that there should not be a One-to-One Relationship between two Tables in a Model, because the data can be combined into one Table with the same Primary Key.

However, there is an exception to this which is when a one-off event can occur which involves a substantial amount of data. In that case, it would not be good to create a large number of fields which will be blank in the large majority of cases.

For example, when a Soldier joins the Army there might be data that is involved only with the joining details. The basic data for the Soldier will be part of his or her basic records such as Date of Birth and Place of Birth. If a separate Table exists for Joining Details then it would contain such things as date and place of joining. Then the Soldiers Table would have a One-to-One relationship with the Joining Details Table. In other words, it can sometimes be acceptable to see a One-to-One in a Data Model. If that happens, it is necessary to establish the associated Business Rules to clarify the conditions.



10.8 Review any Data Warehouses

This Section is relevant if the Data Model includes a Data Warehouse or Data Mart. A Data Warehouse can have a Star or a Snowflake Design. The diagram below shows a typical Data Warehouse. It is a Star structure, with only one level of related Dimension Tables. The arrows point from Children to Parents.

[Diagram: a Star schema. The central Facts table holds the Dimension keys (Date, Requisition ID, Product ID, Product Type Code, Job Title, Organization ID) together with Facts such as the Date of Requisition, Products in Requisitions, the Organization raising the Requisition, plus Averages, Counts, Totals, KPIs and Dashboards and other Derived Figures. The Dimension Tables are Calendar (Day Date), Requisitions (Requisition ID), Products (Product ID), Product Types (Product Type Code), Job Titles (Job Title Code) and Organizations (Unit ID).]
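A typical query against such a Star structure joins the Facts table to one or more Dimensions and aggregates. A minimal sketch in SQLite; the table and column names follow the diagram, but the data is invented :-

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Organizations (
    Unit_ID   INTEGER PRIMARY KEY,
    Unit_Name VARCHAR(255)
);
CREATE TABLE Facts (
    Requisition_ID   INTEGER,
    Product_ID       INTEGER,
    Unit_ID          INTEGER REFERENCES Organizations,
    Requisition_Date DATE
);
""")
conn.executemany("INSERT INTO Organizations VALUES (?, ?)",
                 [(1, "Central Stores"), (2, "Engineering")])
conn.executemany("INSERT INTO Facts VALUES (?, ?, ?, ?)",
                 [(100, 7, 1, "2010-01-05"),
                  (101, 8, 1, "2010-01-06"),
                  (102, 7, 2, "2010-02-01")])

# Count the Requisitions raised by each Organization
rows = conn.execute("""
    SELECT o.Unit_Name, COUNT(*) AS Requisition_Count
    FROM Facts f JOIN Organizations o ON o.Unit_ID = f.Unit_ID
    GROUP BY o.Unit_Name
    ORDER BY o.Unit_Name
""").fetchall()
```

Adding the other Dimensions (Calendar, Products and so on) extends the same join pattern without changing its shape, which is exactly why the Star design suits reporting.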



10.9 Check Naming Standards

At this Step, we check for compliance with Naming Standards. For example, a typical standard might state that Field names should be specified with underscores linking related words and first letters in capitals, such as Organization_ID. In the absence of any explicit Standard, this should be the default. This is shown in the Model in Section 10.1 and also in this one.

We might say that Naming Standards are "nice to have". In other words, they are not essential, but they reflect Best Practice.
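This check is easy to automate. A hedged sketch, assuming the Capitalised_Words_With_Underscores convention described above; the sample field names are invented :-

```python
import re

# One capitalised word, optionally followed by further capitalised words
# or all-capitals abbreviations such as ID, joined by underscores.
NAME_PATTERN = re.compile(r"^[A-Z][a-z0-9]*(_([A-Z][a-z0-9]*|[A-Z0-9]+))*$")

def non_compliant(field_names):
    """Return the field names that break the naming standard."""
    return [n for n in field_names if not NAME_PATTERN.match(n)]

fields = ["Organization_ID", "Requisition_Date", "productname", "start_date"]
print(non_compliant(fields))
```

Run against every field in the Model, this produces a short exceptions list for the Modeller to tidy up.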



10.10 Check for Consistent Data Types

It is important to check for consistent Data Types and Lengths, for two reasons :-
1. it avoids nasty surprises when a physical Database is generated from the Data Model
2. it is an indication of the professionalism of the manner in which the Data Model was produced, unless it has been reverse-engineered from a Database, in which case these design considerations do not apply.

For example, names should always be the same, or any differences should be handled in a way that ensures consistency. Typically, a longer name should be explicitly truncated to a shorter value where appropriate. In the absence of any explicit Standard, the default for Names or Address Lines should be VARCHAR(255), or VARCHAR2(255) for Oracle. Other character strings should default to Memo or Text.

10.11 Check for Defaults

We would like to see Default Values used wherever possible, because they increase the discipline enforced by the Model and they indicate that a thorough analysis was carried out during the creation of the Data Model. For example, a Start Date could default to the current day, or the System Date.

10.12 Determine the Assurance Level

This could be either :-
i. Acceptable
ii. Acceptable with Reservations
iii. Not Acceptable
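The consistency check of Section 10.10 can also be automated: scan the declared type of every column and flag any column name that is declared differently in different Tables. A sketch over an assumed schema dictionary; the tables, columns and types are invented :-

```python
from collections import defaultdict

# Hypothetical schema: table name -> {column name: declared Data Type}
schema = {
    "Organizations": {"Organization_ID": "INTEGER",
                      "Organization_Name": "VARCHAR(255)"},
    "Requisitions":  {"Requisition_ID": "INTEGER",
                      "Organization_Name": "VARCHAR(80)"},   # inconsistent
}

def inconsistent_types(schema):
    """Column names declared with more than one Data Type or Length."""
    declared = defaultdict(set)
    for table, columns in schema.items():
        for col, col_type in columns.items():
            declared[col].add(col_type)
    return {c: sorted(t) for c, t in declared.items() if len(t) > 1}

print(inconsistent_types(schema))
```

Any hit is exactly the kind of mismatch that causes truncated names when the physical Database is generated.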

Appendix 10.A Checklist for Quality Assurance of a Data Model
This Checklist extends the basic concept of the Data Model Scorecard, which was originated by Steve Hoberman.

Nr 1. Can the Scope of the Model be defined ? (Essential)
There must be clear alignment of the Model with the business and the User community.

Nr 2. Can the Requirements be defined ? (Essential)
Requirements must be defined.

Nr 3. Does the Model meet the Requirements ? (Essential)
The Model must meet the Requirements.

Nr 4. Is there a comprehensive Glossary of Terms ? (Essential)
Very important that the value of a Glossary is recognised.

Nr 5. Have comprehensive Business Rules been defined ? (Essential)
Very important that the value of Rules is recognised.

Nr 6. Normalisation Checks, with the GOOD value of Y or N (Desirable)
1. Can the values of every data item in a table be derived only from the Primary Key ? (Y)
2. Can any data item be derived from other items ? (N)
3. Do any column names repeat in the same table ? (N)
4. Does the same item of data appear in more than one table ? (N)
The Model might be OK even if it fails all of these Checks. The power and flexibility of the Relational Approach makes it possible to handle all sorts of errors and still provide the foundation for an operational Database.

Nr 7. Is there compliance with any relevant Data Standards ? (Desirable)
Not essential.



Nr 8. Does the Model use relevant Design Patterns ? (Desirable)
Might be OK even if Design Patterns can't be identified, but if it isn't laid out well it brings into question the professionalism of the creators of the Model.

Nr 9. Is the Model easy to read and understand ? (Desirable)
The flow of logic in a Data Model should go from top-left to bottom-right. It might be OK even if it can't be read and understood, but if it isn't laid out well it brings into question the professionalism of the creators of the Model.

Nr 10. Can any repeating groups be identified ? (Not critical)
Not critical, because the Database will work OK. This usually indicates denormalisation for performance reasons; in other words, it can be acceptable for a Database design. For example, Address Lines often repeat.

Nr 11. Can any derived data be identified ? (Not critical)
Not critical, because the Database will work OK.

Nr 12. Does the Model follow Naming Standards ? (Not critical)
Cosmetic, but a measure of the professionalism.

TOTAL SCORE



If the answers to all the Essential features are Yes, then the Model is Acceptable.

If any of the Essential Questions have a No answer, then the Model is not Acceptable. Any No answers to Desirable or Not critical Questions do not affect the acceptability of the Model, but mean that it could be improved.

RESULT                           RATING           COMMENT
All Essential Features are Yes   Acceptable
Any Essential Features are No    Not Acceptable   The Essential items are top priority for improvement.

Typical Summary

The QA of a typical Model might result in this summary :-
"Reservations are that the documentation does not demonstrate that the Data Model meets the User Requirements. The Data Model shows some weaknesses, which the supplier has agreed to address."

Follow-Up Remedial Action


A reasonable result of a QA analysis would be the identification of some problems that could be rectified fairly easily and quickly. This applies to things like documentation and naming standards. The appropriate Remedial Action will depend on the context and scope of the Data Model.


For a Health Check :No action is required beyond the presentation of a Report because the QA is simply to establish the As-Is situation.

For a proposed Application :It is essential that the Model accurately meets the User Requirements. If it does not, then it must be corrected in discussion with the Users and the Modeller.

For Data Migration :It is essential that the Model is correct at the detailed level of Tables, Fields and Data Types.

Williams | Best Practice in Data Management 136

Appendix 10.B A Case Study


This Case Study provides an example of the Tutorial in action. It includes blank Templates and sample Templates which are guidelines.

Step 10.B.1 Create a Top-Level Business Data Model

10.B.1.1 Background

Let's assume that a Data Model has been provided by a Third Party, such as a Delivery Partner. The first Step is to understand the Data Model by creating a Top-Level Business Data Model. Here is the Data Model from the Delivery Partner that we will use as an example :-


10.B.1.2 Our Conclusions

Our conclusions are that this is not a good Data Model. Reasons include :-
- It contains Reference Data, which is not appropriate at the Top Level
- There is no description of the functional area that the Model supports

Our first activity, therefore, is to produce an equivalent Business Model that we like and can use as the basis for discussion.

Corrective Actions include :-
1. Create a simple Business Data Model. This should be a Model in Word that does not include Reference Data.
2. Produce a short description.
3. Create a Glossary of Terms.
4. Define the representative Business Rules.
5. Identify the intended Users and the Owners of the Model.

10.B.1.3 Functional Description

In this diagram, arrows point from Children to Parents. The Scope of the Data Model includes Requisitions for Products from Organizations. The Functional Description is a simple one-liner: "Organizations raise Requisitions for Products".


10.B.1.4 A Specific Model in Word

The Specific Version is consistent with the Generic Version and looks like this :-

[Diagram of the Specific Model, showing Ship, Central Stores, Officer, Department, Requisition and Product]

10.B.1.5 A Generic Model in Word

In this diagram, arrows point from Children to Parents. Our Generic Data Model (from Section 10.1.2) looks like this :-

[Diagram of the Generic Model, showing Units, Stores, Requisitions, Products in a Requisition and Products]


10.B.1.6 A Top-Level Generic Data Model

This is a top-level Model that was created using a Data Modelling Tool. It shows useful detail, such as details of the Relationships. It also replaced a Many-to-Many with two One-to-Many Relationships.

This diagram is a generic version of the one below and is useful in providing a higher-level context for lower-level, more specific Models. An additional level of detail shows the "Rabbit's Ears" relationship that implements hierarchical relationships for Organizations and for Products.

This Model corrects an error in the original Model that we were given: in reality, a Ship has Departments independent of Officers.



This diagram shows that Relationship correctly :-


Step 10.B.2 Draft the Business Rules

These Rules must be phrased in unambiguous English. Where possible, the English should make it possible to implement a Rule in a Data Model. For example, Rule 1 makes it clear that there is a One-to-Many Relationship between a Ship and an Officer.

1. A Ship is staffed with many Officers.
2. A Ship's Departments raise Requisitions.
3. A Requisition must be authorised by an Officer.
4. An Officer is assigned to one Ship at any point in time.
5. An Officer is assigned to one or many Ships during the course of their career.
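Several of the Rules above translate directly into database constraints. This is a minimal sketch using SQLite via Python; the table and column names, and the ship name 'HMS Example', are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
CREATE TABLE Ships (
    ship_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- An Officer is assigned to one Ship at any point in time,
-- so Officers carries a single mandatory foreign key to Ships.
CREATE TABLE Officers (
    officer_id INTEGER PRIMARY KEY,
    ship_id    INTEGER NOT NULL REFERENCES Ships
);
-- A Requisition must be authorised by an Officer, hence NOT NULL.
CREATE TABLE Requisitions (
    requisition_id         INTEGER PRIMARY KEY,
    authorising_officer_id INTEGER NOT NULL REFERENCES Officers
);
""")
con.execute("INSERT INTO Ships VALUES (1, 'HMS Example')")
con.execute("INSERT INTO Officers VALUES (1, 1)")
con.execute("INSERT INTO Requisitions VALUES (1, 1)")

# A Requisition with no authorising Officer violates the Rule
# and is rejected by the database.
try:
    con.execute("INSERT INTO Requisitions VALUES (2, NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Phrasing the Rules precisely makes this mapping mechanical: "must be" becomes NOT NULL, and "one Ship at any point in time" becomes a single foreign key column.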

Templates

Here is a sample Template for Business Rules:

Nr      RELATES TO                OWNER     DEFINITION
BR.D.1  Customers, Requisitions   John Doe  A Customer can raise zero, one or many Requisitions.
BR.D.2  Customers, Requisitions   TBD       A Requisition must be associated with a valid Customer.
BR.D.3  Requisitions, Products    John Doe  A Requisition can refer to one or many Products.
BR.D.4  Requisitions, Products    John Doe  A Product can appear in zero, one or many Requisitions. Therefore, there is a Many-to-Many Relationship between Requisitions and Products.


Step 10.B.3 Draft a Glossary of Terms

This is a sample Template:

TERM         AUTHOR     DEFINITION
Customer     John Doe   Any Organization that can raise a Requisition.
Requisition  John Doe   A request for Assets to be supplied. The format of a request can be an electronic message, a paper Form and so on.
Product                 An Asset that can be supplied on request. It can be something small, like a Pencil, or something large, like a Printer. The term Equipment is reserved for major items, such as Printers. The word Asset is used to refer to smaller items, such as Pencils.
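A Glossary in this form lends itself to a simple mechanical quality check: every entity in the Model should have a Glossary entry. A minimal sketch, using illustrative data rather than a real Model:

```python
# Illustrative Glossary and entity list -- not taken from a real Model.
glossary = {
    "Customer":    "Any Organization that can raise a Requisition.",
    "Requisition": "A request for Assets to be supplied.",
    "Product":     "An Asset that can be supplied on request.",
}
model_entities = ["Customer", "Requisition", "Product", "Officer"]

# Every entity in the Model should have a Glossary entry;
# report the ones that do not.
missing = [entity for entity in model_entities if entity not in glossary]
print(missing)  # ['Officer']
```

The same check can be run in reverse, flagging Glossary Terms that no longer appear in the Model.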


Step 10.B.4 Check that the Data Model is correct

The Rules will help in determining whether the Model is correct. In this case, there is an error in that Officers are shown coming in between Ships and Departments. The reality is that Departments exist without Officers. This is corrected in the Top-Level Data Model in Section 10.B.1.6.

Step 10.B.5 Review with Users

Review and revise as necessary.

Step 10.B.6 Check Normalised Design

The design looks normalised and is therefore acceptable. The Reference Data looks appropriate and is not related, and is therefore acceptable.

Step 10.B.7 Look for Design Patterns

This Business Model shows these examples of Design Patterns:
- a One-to-Many Relationship between Ship and Officer
- a Many-to-Many Relationship between Requisition and Product

It does not show Inheritance, but in general we would not expect to find it. There are a number of reasons why Inheritance might not appear. For example:
- Inheritance is not appropriate in this case
- Inheritance does not show in a Data Model for a physical Database

Step 10.B.8 Review any Data Warehouses

In this Case Study, this Step is not necessary because we do not have a Data Warehouse or Data Mart.

Step 10.B.9 Check Naming Standards

Standards that are common include:
1. Initial Capitals with lower case elsewhere, for example Organization_ID
2. All capitals, for example ORGANIZATION_ID
3. Lower case everywhere, for example organization_id
Any of these Standards is acceptable.

If no Standard has been established, then Number 1 is recommended.
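A Naming Standard can be checked mechanically. This sketch tests a list of column names against the Initial-Capitals Standard; treating 'ID' as an accepted all-capitals suffix is an assumption of this sketch, based on the example Organization_ID:

```python
import re

# Initial Capitals with lower case elsewhere, words joined by
# underscores. Accepting 'ID' as an all-capitals suffix is an
# assumption of this sketch.
STANDARD_1 = re.compile(r"^[A-Z][a-z]*(_([A-Z][a-z]*|ID))*$")

columns = ["Organization_ID", "ORGANIZATION_ID",
           "organization_id", "Product_Name"]
nonconforming = [c for c in columns if not STANDARD_1.match(c)]
print(nonconforming)  # ['ORGANIZATION_ID', 'organization_id']
```

Run against the full column list of a Physical Data Model, this produces a work list of names to rename or to grant exceptions for.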



Step 10.B.10 Check for consistent Data Types

This Check requires a Physical Data Model or some other document that includes this level of detail. The procedure then is to visually scan the documents, or to use the Domain feature of the Modelling Tool, or perhaps SQL, to look for discrepancies. The Domain feature allows you to define standard data types for any Data Item that occurs frequently and then use this Domain for every occurrence of the Data Item. For example, a Name could be defined as a Variable-Length Character string with a length of 259. Then whenever a Name appears in a Model, the Modeller can select this Domain as a convenient shorthand and also a simple way to enforce consistency.
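One way to mechanise this Check is to read the schema catalogue and group declared types by column name. A minimal sketch against SQLite's schema metadata (a real project would query the target DBMS's catalogue, or use the Modelling Tool's Domain feature); the tables shown are illustrative:

```python
import sqlite3
from collections import defaultdict

# Two illustrative tables that declare the same Data Item ('name')
# with different data types.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customers (customer_id INTEGER, name VARCHAR(255));
CREATE TABLE Suppliers (supplier_id INTEGER, name VARCHAR(40));
""")

# Collect the declared type of every column, grouped by column name.
types_by_column = defaultdict(set)
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    for _cid, col_name, col_type, *_rest in con.execute(
            f"PRAGMA table_info({table})"):
        types_by_column[col_name].add(col_type)

# A column name declared with more than one type is a discrepancy
# to be analysed and resolved against a standard.
discrepancies = {c: t for c, t in types_by_column.items() if len(t) > 1}
print(discrepancies)  # {'name': {'VARCHAR(255)', 'VARCHAR(40)'}}
```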

It will be necessary to analyse any discrepancies and decide on a standard to resolve them.

Step 10.B.11 Check for Defaults

Default values are a powerful technique for adding values in a Data Model. They can be used to enforce consistency. Probably the most common example is to specify that the date of entry and creation of a new record should be the current System Date. This applies to new Customers, Orders, the Date of any Payment or Adjustment, and so on.

Step 10.B.12 Determine the Assurance Level

Appendix A defines the process to be followed and discusses appropriate remedial follow-up action.
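The System Date default described in Step 10.B.11 can be sketched as a column default. A minimal sketch using SQLite via Python; the table and customer name are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# date_entered defaults to the current System Date, so the record's
# entry date is captured without the inserting program supplying it.
con.execute("""
CREATE TABLE Customers (
    customer_id  INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    date_entered TEXT NOT NULL DEFAULT CURRENT_DATE
)""")
con.execute(
    "INSERT INTO Customers (customer_id, name) VALUES (1, 'Acme')")
(date_entered,) = con.execute(
    "SELECT date_entered FROM Customers WHERE customer_id = 1"
).fetchone()
print(date_entered)  # today's date in YYYY-MM-DD form
```

Because the default lives in the Model rather than in each loading program, every program that inserts a Customer gets a consistent entry date for free.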
