BIA 5000 Introduction To Analytics - Lesson 2

Uploaded by angelchang0126
Copyright © All Rights Reserved

INTRODUCTION TO ANALYTICS
2023 – 2024
LESSON 2. DATA LIFE CYCLE
Learning Objectives
• Name and understand the phases of the data life cycle
• Identify the processes and activities of each phase
• Recognize the DAMA framework knowledge areas
• Interpret a simple context diagram
• Describe how analytics fits into the DAMA framework
• Discuss good and bad data
• Interpret the XML data format
Agenda
1. Data life cycle phases and activities
2. Context diagram example
3. DAMA DMBOK knowledge areas
4. Qualities of good data: the five C’s
5. XML data format
Does the data have a life cycle?

Discuss the article given out as a home assignment.

What happens to the data?

Where does it come from?

Where does it go?


DATA LIFE CYCLE
Module 2
Data Life Cycle
Sourcing → Storage & Preparation → Protection & Usage → Sharing → Archiving → Destruction
Data Life Cycle
Sourcing: collecting and capturing data values from various sources. A.k.a. data capture / data acquisition.
Storage & preparation: storing, maintaining and preparing data for usage. A.k.a. storage & maintenance.
Protection & usage: applying data to the tasks needed to operate the enterprise while protecting the data. A.k.a. permitted use of data.
Sharing: sending data to users or entities that require the data for certain purposes, both inside and outside the enterprise. A.k.a. “publication”.
Archiving: retaining data that is no longer actively used for a defined retention period.
Destruction: removing every copy of a data item from the enterprise. A.k.a. purging / permanent destruction.
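The six phases above form an ordered sequence. The sketch below models the life cycle in Python as an ordered enumeration, in which each phase knows which phase follows it. The class and function names are hypothetical and are used for illustration only.

```python
from enum import Enum
from typing import Optional


class DataLifeCyclePhase(Enum):
    """Hypothetical model of the six data life cycle phases, in order."""
    SOURCING = 1
    STORAGE_AND_PREPARATION = 2
    PROTECTION_AND_USAGE = 3
    SHARING = 4
    ARCHIVING = 5
    DESTRUCTION = 6


def next_phase(phase: DataLifeCyclePhase) -> Optional[DataLifeCyclePhase]:
    """Return the phase that follows `phase`, or None after Destruction."""
    try:
        return DataLifeCyclePhase(phase.value + 1)
    except ValueError:
        # Destruction is the final phase: no copy of the data remains.
        return None


if __name__ == "__main__":
    phase = DataLifeCyclePhase.SOURCING
    while phase is not None:
        print(phase.name)
        phase = next_phase(phase)
```

The strictly linear ordering is a simplification. In practice the same data can be in use and being shared at the same time.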
Data Life Cycle – processes: Sourcing
• Obtain data externally
• Create or enter data
• Receive and capture data signals
Data Life Cycle – processes: Storage & Preparation
• Move and store data
• Cleanse and enrich data
• Transform and synthesise data
• Integrate data from multiple sources
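To make the Storage & Preparation activities concrete, here is a minimal sketch in plain Python. It cleanses, transforms and integrates customer records from two hypothetical sources: a CRM export and a web-form export. Both sources and all field names are invented for illustration.

```python
from typing import Dict, List

# Hypothetical source 1: CRM export with inconsistent formatting.
crm_records = [
    {"customer_id": " 001 ", "name": "tom price", "province": "on"},
    {"customer_id": "002", "name": "Nick Thameson", "province": "ON"},
]

# Hypothetical source 2: web-form export keyed by the same customer ID.
web_records = [
    {"customer_id": "001", "email": "TOM@EXAMPLE.COM"},
    {"customer_id": "003", "email": "no-crm-match@example.com"},
]


def cleanse(record: Dict[str, str]) -> Dict[str, str]:
    """Cleanse: return a copy with stray whitespace stripped from every value."""
    return {key: value.strip() for key, value in record.items()}


def transform_crm(record: Dict[str, str]) -> Dict[str, str]:
    """Transform: standardize name capitalization and province codes."""
    record = cleanse(record)
    record["name"] = record["name"].title()
    record["province"] = record["province"].upper()
    return record


def transform_web(record: Dict[str, str]) -> Dict[str, str]:
    """Transform: store e-mail addresses in lowercase."""
    record = cleanse(record)
    record["email"] = record["email"].lower()
    return record


def integrate(crm: List[Dict[str, str]], web: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Integrate/enrich: add the matching web e-mail to each CRM record (a left join)."""
    emails = {r["customer_id"]: r["email"] for r in map(transform_web, web)}
    merged = []
    for record in map(transform_crm, crm):
        # CRM records without a web match keep an empty e-mail field.
        record["email"] = emails.get(record["customer_id"], "")
        merged.append(record)
    return merged


if __name__ == "__main__":
    for row in integrate(crm_records, web_records):
        print(row)
```

A production pipeline would more likely use a library such as pandas for the join step. Plain dictionaries keep the sketch dependency-free.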
Data Life Cycle – processes: Protection & Usage
• Apply data to enterprise tasks
• Protect, monitor and audit usage
• Search, classify and explore data
• Model and analyse data
Data Life Cycle – processes: Sharing
• Data publication
• Visualization
• Data sharing, moving and copying
• Delivering data products to customers
Data Life Cycle – processes: Archiving
• Copying data into archive
• Removing archived data from active environments
Data Life Cycle – processes: Destruction
• Permanently destroying data
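The Archiving and Destruction processes can be sketched together:

1. Records unused for longer than an active-use window are copied to an archive and removed from the active store.
2. Archived records older than the retention period are permanently purged.

The one-year window, the seven-year retention period, the `Record` type and the function names are all hypothetical. Real retention periods come from legal and business requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical policy values, for illustration only.
ACTIVE_WINDOW = timedelta(days=365)         # keep in the active store ~1 year
RETENTION_PERIOD = timedelta(days=7 * 365)  # keep in the archive ~7 years


@dataclass
class Record:
    record_id: str
    last_used: date


def archive(active: List[Record], archived: List[Record],
            today: date) -> Tuple[List[Record], List[Record]]:
    """Archiving: copy records unused for longer than ACTIVE_WINDOW into the
    archive and remove them from the active environment."""
    cutoff = today - ACTIVE_WINDOW
    to_archive = [r for r in active if r.last_used < cutoff]
    still_active = [r for r in active if r.last_used >= cutoff]
    return still_active, archived + to_archive


def purge(archived: List[Record], today: date) -> List[Record]:
    """Destruction: drop archived records older than RETENTION_PERIOD.
    Only the archive copy is handled here; every other copy (backups,
    exports) would also have to be destroyed."""
    cutoff = today - RETENTION_PERIOD
    return [r for r in archived if r.last_used >= cutoff]


if __name__ == "__main__":
    today = date(2024, 1, 1)
    active = [Record("A", date(2023, 12, 1)), Record("B", date(2022, 6, 1))]
    archived = [Record("C", date(2015, 1, 1))]
    active, archived = archive(active, archived, today)
    archived = purge(archived, today)
    print([r.record_id for r in active], [r.record_id for r in archived])  # ['A'] ['B']
```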


Data life cycle: group discussions

Why do enterprises purge (destroy) data?


Later in the course

Module 7: Analytics project basics
• Phases in analytics projects: how do they relate to the data life cycle?

Module 8: Legislative & security issues
• Permitted uses of data
• Data protection

Module 9: Ethical issues in analytics
• Ethical sharing of data
Data Life Cycle – processes
What knowledge and skills are needed to manage data through its life cycle?
(Figure: Sourcing → Storage & Preparation → Protection & Usage → Sharing → Archiving → Destruction)
DMBOK KNOWLEDGE AREAS
Module 2
DAMA and DMBOK

DAMA International is a not-for-profit, vendor-independent, global association of technical and business professionals dedicated to advancing the concepts and practices of information and data management.

DAMA DMBOK®: Data Management Association (DAMA) Data Management Body of Knowledge

https://dama.org/content/body-knowledge
DMBOK Data Management Knowledge Areas
Data management is an overarching term that describes the processes used to plan, specify, enable, create, acquire, maintain, use, archive, retrieve, control, and purge data. These processes overlap and interact within each data management knowledge area.

DAMA DMBOK Framework


Data Governance
DMBOK definition: Planning, oversight, and control over the management of data and the use of data and data-related resources.
Processes & activities
Enforce:
• Consistent definitions
• Rules
• Business metrics
• Policies and procedures on how to use data
• Reference data
• Data ownership
Data Architecture
DMBOK definition: The overall structure of data and data-related resources as an integral part of the enterprise architecture.
Processes & activities
Define:
• Data needed to meet business needs
• Data, facts and dimensions
• Logical data models
• Enterprise data flows
Examine:
• Completeness and correctness of the source systems needed to obtain data
Context Diagram – example
(Figure) Systems shown: Customer Self-Service App, Customer Relationship Management, Customer Transaction History, Order Management.
Data flows labelled: service requests; request status & notifications; customer information; customer transactions; customer address; order status.
Data Modeling & Design
DMBOK definition: Analysis, design, building, testing, and maintenance of data structures.
Processes & activities
Design and build:
• Conceptual, logical and physical data modeling
• Master data modeling
• Modeling and design for different architectures (data warehouse, data lake, cloud data storage, etc.)
Data Storage & Operations
DMBOK definition: Deployment and management of storage for structured physical data assets.
Processes & activities
Manage:
• Building and operating data storage solutions
• Performance management, backup and recovery of data assets
• Monitoring, archiving and purging of data assets
Data Security
DMBOK definition: Ensuring privacy, confidentiality and appropriate access to data.
Processes & activities
Define:
• Privacy and security
• Access management
• Security governance (monitoring, audit, breach responses)
• Data protection (encryption)
Data Integration & Interoperability
DMBOK definition: Acquisition, extraction, transformation, movement, delivery, replication, federation, virtualization and operational support of data assets.
Processes & activities
Manage:
• Data acquisition and movement
• Transformation
• Interoperability and integration
• Data migration and conversion
Documents & Content
DMBOK definition: Storing, protecting, indexing, and enabling access to data found in unstructured sources (electronic files and physical records), and making this data available for integration and interoperability with structured (database) data.
Processes & activities
Govern:
• Content management (classification, tagging, indexing)
• Managing physical documents
• Managing electronic records (documents, images, scans, multimedia)
Reference & Master Data
DMBOK definition: Managing shared data to reduce redundancy and ensure better data quality through standardized definition and use of data values.
Processes & activities
Govern:
• Establishing and managing systems of record
• Acquiring or creating systems of reference (business, spatial, market data)
• Data business rules
Data Warehousing & Business Intelligence
DMBOK definition: Managing analytical data processing and enabling access to decision support data for reporting and analysis.
Processes & activities
Govern:
• Data profiling and warehousing
• Data discovery, searching and querying
• Operational and analytical reporting
• Analytics
Metadata
DMBOK definition: Collecting, categorizing, maintaining, integrating, controlling, managing, and delivering metadata.
Processes & activities
Manage:
• Business glossary / data dictionary
• Data classification
Describing data: metadata

Image credit: John O’Gorman


Metadata: information about data

Metadata: a description of the data as it is created, stored, transformed, accessed and consumed by the enterprise.

Business metadata: description of the data from the business perspective
• Business definition
• Meaning
• Source of the data

Technical metadata: description of the data as it is processed by software tools
• Format
• Size
• Mapping

Sources: Textbook Chapter 4
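To show the difference between the two kinds of metadata, the sketch below records metadata for one hypothetical order-amount column:

- Business metadata is written down by people.
- Technical metadata can be derived by software from the data itself.

The class names, the field names, the `profile_column` helper and the mapping target are illustrative assumptions, not part of any metadata standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BusinessMetadata:
    business_definition: str
    meaning: str
    source: str


@dataclass
class TechnicalMetadata:
    data_format: str  # Python type name(s) of the values found
    size: int         # number of values in the column
    mapping: str      # where the column is loaded to (hypothetical target)


def profile_column(values: List[object], mapping: str) -> TechnicalMetadata:
    """Derive technical metadata automatically by inspecting the values."""
    type_names = sorted({type(v).__name__ for v in values})
    return TechnicalMetadata(data_format="/".join(type_names), size=len(values), mapping=mapping)


if __name__ == "__main__":
    # Hypothetical business metadata, entered by a data steward.
    order_amount = BusinessMetadata(
        business_definition="Total value of a customer order",
        meaning="Amount charged for the order, in Canadian dollars",
        source="Order Management system",
    )
    technical = profile_column([125.00, 89.5, 240.0], mapping="sales_dw.fact_order.order_amount")
    print(order_amount)
    print(technical)
```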




Metadata - example
Data Quality
DMBOK definition: Defining, monitoring, and maintaining data integrity, and improving data quality.
Processes & activities
Govern:
• Planning data quality
• Implementing data quality measures
• Monitoring data quality
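Monitoring data quality usually means measuring it repeatedly against agreed thresholds. Below is a minimal sketch of one such measure: completeness, the share of records that have a non-blank value in each field. The 95% threshold, the sample rows and the function names are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]

# Hypothetical sample rows (names and dates borrowed from this lesson's XML example).
SAMPLE_ROWS: List[Row] = [
    {"name": "Lucy", "dob": "7/7/2005", "gender": "female"},
    {"name": "Matt", "dob": "7/12/2002", "gender": None},
    {"name": "Preetika", "dob": "7/7/2007", "gender": ""},
]


def completeness(rows: List[Row], fields: List[str]) -> Dict[str, float]:
    """Fraction of rows in which each field is present and not blank."""
    scores = {}
    for field in fields:
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        scores[field] = filled / len(rows) if rows else 0.0
    return scores


def failing_fields(scores: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Fields whose completeness falls below the (hypothetical) threshold."""
    return [field for field, score in scores.items() if score < threshold]


if __name__ == "__main__":
    scores = completeness(SAMPLE_ROWS, ["name", "dob", "gender"])
    print(scores)
    print(failing_fields(scores))
```

Running the same measurement on every data load, and tracking the scores over time, turns this one-off check into monitoring.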
Business Insights & Analytics: how does it fit in?
(Figure: Sourcing → Storage & Preparation → Protection & Usage → Sharing → Archiving → Destruction)
GOOD AND BAD DATA
Module 2
The five C’s of data

Clean: Clean data must be accurate, have no missing data points, conform to the format and contain no invalid entries.

Consistent: Consistent data must follow the same standards and definitions, and use the same codes and ranges of values to reflect the same meaning.

Conformed: Conformed data must be shareable across the same dimensions with the same business meaning.

Current: Current data must be as recent as required for business purposes.

Comprehensive: Comprehensive data must be sufficient and complete for the purpose it is to be used for.

Sources: Textbook Chapter 1
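Several of the five C’s can be turned into automated checks. The sketch below tests a single record against three of them:

- **Clean:** required fields are present and the date is in ISO format.
- **Consistent:** province codes come from one agreed code list.
- **Current:** the record was updated within a maximum age.

The field names, the code list and the 30-day limit are hypothetical rules. Conformed and comprehensive usually need checks across datasets and against business requirements, so they are not covered here.

```python
from datetime import date, datetime, timedelta
from typing import Dict, List

# Hypothetical rules, for illustration only.
REQUIRED_FIELDS = ["customer_id", "province", "last_updated"]
PROVINCE_CODES = {"AB", "BC", "MB", "NB", "NL", "NS", "NT", "NU", "ON", "PE", "QC", "SK", "YT"}
MAX_AGE = timedelta(days=30)


def check_record(record: Dict[str, str], today: date) -> List[str]:
    """Return the problems found; an empty list means every check passed."""
    problems = []
    # Clean: no missing data points.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"clean: missing {field}")
    # Clean: the value conforms to the expected format (YYYY-MM-DD).
    updated = None
    if record.get("last_updated"):
        try:
            updated = datetime.strptime(record["last_updated"], "%Y-%m-%d").date()
        except ValueError:
            problems.append("clean: last_updated is not YYYY-MM-DD")
    # Consistent: the same code list is used everywhere.
    if record.get("province") and record["province"] not in PROVINCE_CODES:
        problems.append("consistent: unknown province code")
    # Current: recent enough for the business purpose.
    if updated is not None and today - updated > MAX_AGE:
        problems.append("current: older than 30 days")
    return problems


if __name__ == "__main__":
    print(check_record({"customer_id": "002", "province": "Ont.", "last_updated": "15/01/2024"},
                       today=date(2024, 1, 31)))
```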


Can data be bad?

Where can bad data come from?

Provide an example of bad data from your personal or professional life.


https://www.dataquest.io/blog/advanced-data-cleaning-r-course/
XML DATA FORMAT
Module 2
Structured / Semi-Structured / Unstructured
Examples
Structured: Numbers; Categories; Codes; Dates; Character strings; Binary (True/False); Rectangular datasets (spreadsheets, database tables)
Semi-structured: XML files; Email; JSON messages; Digital photo files; Accessible PDFs; Website content
Unstructured: Text; Social media; Satellite images; Presentations; PDFs; Audio recordings; Video
XML Basics

XML (eXtensible Markup Language):
• Text-based format used to share data
• Markup language – uses tags to describe pieces of data
• Metalanguage – allows users to define their own markup languages
• A specification for storing information
• A specification for describing the structure of that information
• Has a well-defined structure – must follow a set of rules

Example: https://learning-oreilly-com.ezproxy.humber.ca/library/view/xml-visual-quickstart/9780321602589/ch02.html
XML example

XML Basics by S. Banzal


XML structure

A root element is required
Every XML document must contain one, and only one, root element. This root element contains all the other elements in the document.

All data (values) must be enclosed within tags
Every piece of data must have a defined place in an XML file, between a starting tag and a closing tag. The closing tag has the same name as the starting tag, with a ‘/’ in front: <name>Lucy</name>.

Tags can have any names, but should describe the content
A user can pick any name for a tag (within the naming rules); however, it should describe the element’s purpose and contents.

Closing tags are required
Every element must have a closing tag. An element with no content can instead be written as a single self-closing tag, such as <photo/>.
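A parser enforces these structural rules. The sketch below uses Python’s standard-library `xml.etree.ElementTree` to test whether a string is well-formed. It rejects a document with two root elements and a document with a missing closing tag. The helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed("<family><child>Lucy</child></family>"))    # one root: True
    print(is_well_formed("<child>Lucy</child><child>Matt</child>"))  # two roots: False
    print(is_well_formed("<family><child>Lucy</family>"))            # missing </child>: False
    print(is_well_formed("<photo/>"))                                # self-closing: True
```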
XML structure
Tags can have attributes (zero to many)
Information contained in an attribute is considered metadata: information about the data in the element, as opposed to the data itself.
An element can have as many attributes as desired, as long as each has a unique name.
<book ISBN="1234567890123"> How to train your dragon </book>
<book SeriesNo="3"> How to Speak Dragonese </book>
<OrderAmount Currency="CAD"> 125.00 </OrderAmount>
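Because attributes carry metadata about an element, software reads them separately from the element’s content. Here is a minimal sketch with Python’s standard-library ElementTree, using the three examples from this slide. The `<catalog>` root element is a hypothetical wrapper, needed only because a document must have exactly one root.

```python
import xml.etree.ElementTree as ET

# Examples from this slide, wrapped in a hypothetical <catalog> root element.
CATALOG_XML = """<catalog>
  <book ISBN="1234567890123"> How to train your dragon </book>
  <book SeriesNo="3"> How to Speak Dragonese </book>
  <OrderAmount Currency="CAD"> 125.00 </OrderAmount>
</catalog>"""

root = ET.fromstring(CATALOG_XML)

for book in root.iter("book"):
    # .attrib is a dict of all attributes; .get() returns None if one is absent.
    print(book.text.strip(), book.attrib, book.get("ISBN"))

amount = root.find("OrderAmount")
# The attribute describes the value (metadata); the element text is the value (data).
print(amount.get("Currency"), float(amount.text))
```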

Indentation
It is good practice to indent child elements relative to their parents to make XML documents easier for a human to read and interpret (see examples in the source).
Nesting
Elements must be properly nested: if you start element A and then start element B, you must close element B before closing element A.

Properly nested (root element, child element, grandchild element):
<root>
  <child>
    <grandchild>
      Toopy
    </grandchild>
  </child>
</root>

Improperly nested (the <grandchild> element is not closed before </child>):
<root>
  <child>
    <grandchild>
      Toopy
  </child>
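Because elements are properly nested, an XML document forms a tree: each element except the root has exactly one parent. The sketch below walks the root/child/grandchild example recursively with Python’s standard-library ElementTree and reports each element’s depth. The function name `describe` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NESTED_XML = """<root>
  <child>
    <grandchild>Toopy</grandchild>
  </child>
</root>"""


def describe(element: ET.Element, depth: int = 0) -> List[Tuple[int, str]]:
    """Return (depth, tag) pairs in document order: 0 = root, 1 = child, 2 = grandchild."""
    pairs = [(depth, element.tag)]
    for sub in element:  # iterating an element yields its direct children
        pairs.extend(describe(sub, depth + 1))
    return pairs


if __name__ == "__main__":
    for depth, tag in describe(ET.fromstring(NESTED_XML)):
        print("  " * depth + tag)
```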
XML syntax

XML declaration
Should be included at the beginning of each XML file: <?xml version="1.0"?>
When present, it must be the very first thing in the file.

Case matters
XML is case sensitive. Starting and closing tags must use the same capitalization.

Tag names
Names must begin with a letter, underscore, or colon, and may contain letters, digits, underscores, hyphens, periods, and colons. Spaces are not allowed.
Although colons, hyphens, and periods are valid, it is recommended to avoid them in names.
Names that begin with the letters xml, in any combination of upper- and lowercase, are reserved and should not be used.

Tag contents do not require any additional format
Everything between the starting and closing tags is considered the tag content.

Source: XML: Visual QuickStart Guide, Second Edition, by Kevin Howard Goldberg. Published by Peachpit Press, 2008.
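The case and naming rules are enforced by the parser. A closing tag that differs from its starting tag only in capitalization counts as a different name, so the document is rejected. A name starting with a digit is rejected as well. Here is a quick check with Python’s standard-library ElementTree; the helper name `parses` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Matching capitalization, with the XML declaration first: accepted.
print(parses('<?xml version="1.0"?><DoB>7/12/2002</DoB>'))
# Mismatched capitalization (<DoB> ... </dob>): rejected.
print(parses("<DoB>7/12/2002</dob>"))
# A name may not begin with a digit: rejected.
print(parses("<1st_child>Lucy</1st_child>"))
```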
XML syntax

Attribute values must be enclosed in quotation marks
An attribute’s value must always be enclosed in either matching single or double quotation marks.
Spaces around the = between an attribute name and its value are permitted, but by convention they are omitted: ISBN="1234567890123".

White space
You can add extra white space, including line breaks, around the elements in your XML code to make it easier to edit and view.
Extra white space remains visible in the file and is passed along to other applications, but it is generally treated as insignificant when the data is interpreted.

Language support
Tag and element names do not need to be in English; they can be in any language supported by the software used.

Comments
Comments can be inserted anywhere outside of tags (but not before the XML declaration), enclosed in <!-- and --> (note the double hyphens). A double hyphen may not appear inside the comment text itself.
Special characters in XML

Special character    XML replacement
<                    &lt;
>                    &gt;
&                    &amp;
"                    &quot;
'                    &apos;

Example: Dun & Bradstreet is written in XML as Dun &amp; Bradstreet
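Python’s standard library can apply the replacements in this table. By default, `xml.sax.saxutils.escape` replaces only `&`, `<` and `>`. Its optional dictionary argument adds the two quote characters, and `unescape` reverses the replacements. The wrapper names `to_xml_text` and `from_xml_text` are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > itself; this dict adds the two quote characters.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_xml_text(value: str) -> str:
    """Replace XML special characters with their predefined entity references."""
    return escape(value, QUOTE_ENTITIES)


def from_xml_text(value: str) -> str:
    """Turn entity references back into the original characters."""
    return unescape(value, {"&quot;": '"', "&apos;": "'"})


print(to_xml_text("Dun & Bradstreet"))  # Dun &amp; Bradstreet
print(from_xml_text("Dun &amp; Bradstreet"))  # Dun & Bradstreet
```

When an XML document is built with a library such as ElementTree, this escaping is applied automatically on output. Manual escaping matters mainly when XML is assembled as plain text.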
XML example – dates

Using a date attribute:
<note date="2008-01-10">
  <to>Tove</to>
  <from>Jani</from>
  <subj>Hello there</subj>
</note>

Using a <date> element:
<note>
  <date>2008-01-10</date>
  <to>Tove</to>
  <from>Jani</from>
</note>

Using an expanded <date> element:
<note>
  <date>
    <year>2008</year>
    <month>01</month>
    <day>10</day>
  </date>
  <to>Tove</to>
  <from>Jani</from>
</note>

https://www.w3schools.com/xml/xml_attributes.asp
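The three layouts carry the same date but have to be read differently. The sketch below, using Python’s standard-library ElementTree, accepts any of the three and returns a `datetime.date`. The function name `note_date` is hypothetical, and the sample strings are shortened versions of the slide’s examples.

```python
import xml.etree.ElementTree as ET
from datetime import date

ATTRIBUTE_FORM = '<note date="2008-01-10"><to>Tove</to><from>Jani</from></note>'
ELEMENT_FORM = "<note><date>2008-01-10</date><to>Tove</to><from>Jani</from></note>"
EXPANDED_FORM = (
    "<note><date><year>2008</year><month>01</month><day>10</day></date>"
    "<to>Tove</to><from>Jani</from></note>"
)


def note_date(xml_text: str) -> date:
    """Read the date from any of the three layouts."""
    note = ET.fromstring(xml_text)
    if note.get("date") is not None:                 # date stored as an attribute
        return date.fromisoformat(note.get("date"))
    date_el = note.find("date")
    if date_el is None:
        raise ValueError("note has no date")
    if date_el.find("year") is not None:             # expanded <date> element
        return date(int(date_el.findtext("year")),
                    int(date_el.findtext("month")),
                    int(date_el.findtext("day")))
    return date.fromisoformat(date_el.text.strip())  # single <date> element
```

The expanded form takes the most code to read, but it makes the year, month and day individually addressable.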
XML example
<?xml version="1.0"?>
<family>
  <Parent>Yulia </Parent>
  <child>
    <name>Lucy</name>
    <DoB>7 /7 /2005 </DoB>
    <gender>female</gender>
  </child>

  <child>
    <name>Matt</name>
    <DoB>7/12/2002</DoB>
  </child>
  <child>
    <name>Preetika</name>
    <DoB>7/7/2007</DoB>
  </child>
</family>
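The family example can be read back into records with Python’s standard-library ElementTree. The sketch also shows why inconsistent values matter for good and bad data:

- Lucy’s date of birth contains stray spaces.
- Only Lucy has a `<gender>` element.
- Whether 7/12/2002 means 12 July or 7 December cannot be told from the file.

The expected day/month/year pattern and the function names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

FAMILY_XML = """<?xml version="1.0"?>
<family>
  <Parent>Yulia </Parent>
  <child>
    <name>Lucy</name>
    <DoB>7 /7 /2005 </DoB>
    <gender>female</gender>
  </child>
  <child>
    <name>Matt</name>
    <DoB>7/12/2002</DoB>
  </child>
  <child>
    <name>Preetika</name>
    <DoB>7/7/2007</DoB>
  </child>
</family>"""

# Assumed pattern: 1-2 digits / 1-2 digits / 4-digit year, with no spaces.
DOB_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")


def read_children(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Turn each <child> element into a dictionary."""
    family = ET.fromstring(xml_text)
    return [
        {
            "name": child.findtext("name"),
            "dob": child.findtext("DoB"),
            # findtext() returns None when the element is missing (Matt, Preetika).
            "gender": child.findtext("gender"),
        }
        for child in family.findall("child")
    ]


def badly_formatted_dobs(children: List[Dict[str, Optional[str]]]) -> List[str]:
    """Names of children whose DoB is missing or does not match the pattern exactly."""
    return [c["name"] for c in children
            if c["dob"] is None or not DOB_PATTERN.fullmatch(c["dob"])]


if __name__ == "__main__":
    kids = read_children(FAMILY_XML)
    print(kids)
    print(badly_formatted_dobs(kids))
```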

<Average_daily>

<Average_monthly>
XML vs JSON example

XML:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <student>
    <id>01</id>
    <name>Tom</name>
    <lastname>Price</lastname>
  </student>
  <student>
    <id>02</id>
    <name>Nick</name>
    <lastname>Thameson</lastname>
  </student>
</root>

JSON:
{
  "student": [
    {
      "id": "01",
      "name": "Tom",
      "lastname": "Price"
    },
    {
      "id": "02",
      "name": "Nick",
      "lastname": "Thameson"
    }
  ]
}
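The two formats hold the same information, so one can be converted into the other. The sketch below, using Python’s standard-library `xml.etree.ElementTree` and `json` modules, produces the JSON layout shown above from the XML version. The function name `students_to_json` is hypothetical, and the converter is specific to this example rather than a general XML-to-JSON tool.

```python
import json
import xml.etree.ElementTree as ET

STUDENTS_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <student>
    <id>01</id>
    <name>Tom</name>
    <lastname>Price</lastname>
  </student>
  <student>
    <id>02</id>
    <name>Nick</name>
    <lastname>Thameson</lastname>
  </student>
</root>"""


def students_to_json(xml_text: str) -> str:
    """Convert the <student> elements into the JSON layout shown on the slide."""
    # Parse bytes so the encoding declaration is honoured as written.
    root = ET.fromstring(xml_text.encode("utf-8"))
    students = [
        {field.tag: field.text for field in student}  # one key per child element
        for student in root.findall("student")
    ]
    # Repeated <student> elements become one JSON array.
    return json.dumps({"student": students}, indent=2)


if __name__ == "__main__":
    print(students_to_json(STUDENTS_XML))
```

Two design choices are hard-coded here:

- **Strings:** all values stay as strings (for example, "01"), because XML has no built-in number type.
- **Arrays:** `<student>` always becomes an array. A general converter cannot tell from a single element whether a field is meant to repeat.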

JSON vs XML: What’s the Difference?
