BIA 5000 Introduction To Analytics - Lesson 2
2023 – 2024
LESSON 2.
DATA LIFE CYCLE
Learning Objectives
Sourcing
Storage & Preparation
Protection & Usage
Sharing
Archiving
Destruction
Data Life Cycle
Sourcing: Collecting and capturing data values from various sources. A.k.a. data capture / data acquisition.
Storage & preparation: Storing, maintaining and preparing data for usage. A.k.a. storage & maintenance.
Protection & usage: Applying data to the tasks needed to operate the enterprise while protecting the data. A.k.a. permitted use of data.
Sharing: Sending data to users or entities that require the data for certain purposes, both inside and outside the enterprise. A.k.a. "publication".
Archiving: Keeping data that is no longer actively used for a defined retention period.
Destruction: Removing every copy of a data item from the enterprise. A.k.a. purging / permanently destroying.
Data Life Cycle – processes
Sourcing
• Obtain data externally
• Create or enter data
• Receive and capture data signals
Data Life Cycle – processes
Storage & Preparation
• Move and store data
• Cleanse and enrich data
• Transform and synthesise data
• Integrate data from multiple sources
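As a rough illustration of the preparation processes listed above, the sketch below cleanses two small record sets and then integrates them on a shared key. The record layouts, field names (id, name, city) and helper functions are hypothetical and chosen only for this example; real preparation pipelines are usually far more involved.

```python
# Hypothetical records from two sources; all names and values are illustrative.
crm_records = [
    {"id": "01", "name": "  tom price "},
    {"id": "02", "name": "NICK THAMESON"},
]
billing_records = [
    {"id": "01", "city": "toronto"},
    {"id": "02", "city": " Ottawa"},
]


def cleanse(record):
    """Trim surrounding white space; normalise capitalisation of non-key values."""
    cleaned = {}
    for key, value in record.items():
        value = value.strip()
        cleaned[key] = value if key == "id" else value.title()
    return cleaned


def integrate(left, right, key="id"):
    """Merge records from two sources that share the same key value."""
    right_by_key = {rec[key]: rec for rec in right}
    merged = []
    for rec in left:
        combined = dict(rec)
        combined.update(right_by_key.get(rec[key], {}))
        merged.append(combined)
    return merged


prepared = integrate(
    [cleanse(r) for r in crm_records],
    [cleanse(r) for r in billing_records],
)
print(prepared)
```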
Data Life Cycle – processes
Protection & Usage
• Apply data to enterprise tasks
• Protect, monitor and audit usage
• Search, classify and explore data
• Model and analyse data
Data Life Cycle – processes
Sharing
• Data publication
• Visualization
• Data sharing, moving and copying
• Delivering data products to customers
Module 7: Analytics project basics
• Phases in analytics projects: how do they relate to the data life cycle?
DMBOK KNOWLEDGE AREAS
Module 2
DAMA and DMBOK
DAMA DMBOK ®: Data Management Association (DAMA) Data Management Body of Knowledge
https://dama.org/content/body-knowledge
DMBOK Data Management Knowledge Areas
Data Management is an overarching term that describes the processes used to plan, specify, enable, create, acquire, maintain, use, archive, retrieve, control, and purge data. These processes overlap and interact within each data management knowledge area.
DMBOK Definition: Planning, oversight, and control over management of data and the use of data and data-related resources.
Order
Management
Data Modeling & Design
DMBOK Definition: Managing shared data to reduce redundancy and ensure better data quality through standardized definition and use of data values.
Business Insights & Analytics: how does it fit in?
Sourcing
Storage & Preparation
Protection & Usage
Sharing
Archiving
Destruction
GOOD AND BAD DATA
Module 2
The five C’s of data
Consistent: Consistent data must follow the same standards and definitions, and use the same codes and ranges of values to reflect the same meaning.
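A minimal sketch of what a consistency check could look like in practice. The province codes, the age range and the record fields are assumptions made up for this illustration; an actual standard would come from the enterprise's own data definitions.

```python
# Hypothetical standard: the codes and the value range are assumptions for illustration.
ALLOWED_PROVINCE_CODES = {"ON", "QC", "BC"}
VALID_AGE_RANGE = range(0, 121)  # ages 0 to 120 inclusive


def consistency_issues(record):
    """Return a list of ways the record departs from the agreed codes and ranges."""
    issues = []
    if record["province"] not in ALLOWED_PROVINCE_CODES:
        issues.append("unknown province code: " + record["province"])
    if record["age"] not in VALID_AGE_RANGE:
        issues.append("age out of range: " + str(record["age"]))
    return issues


records = [
    {"province": "ON", "age": 34},        # follows the standard
    {"province": "Ontario", "age": 34},   # same meaning, different code
    {"province": "QC", "age": 340},       # value outside the agreed range
]
for rec in records:
    print(rec, consistency_issues(rec))
```

The second record shows the typical problem: "Ontario" and "ON" mean the same thing, but only one of them follows the agreed code list.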
Numbers, Categories, Codes, Dates, Character strings, Binary (True/False), Rectangular datasets (spreadsheets, database tables)
Text, XML files, Email, JSON messages, Digital photo files, Accessible PDFs, Website content
Social media, Satellite images, Presentations, PDFs, Audio recordings, Video
XML Basics
Example: https://learning-oreilly-com.ezproxy.humber.ca/library/view/xml-visual-quickstart/9780321602589/ch02.html
XML example
Indentation
It is good practice to indent child elements relative to their parents to make XML documents easier for a human to read and interpret (see examples in the source).
Nesting
Elements must be properly nested.
If you start element A, then start element B, you must first close element B before closing element A.
[Diagram: tree of root, child, and grandchild elements]
<root>
  <child>
    <grandchild>
    </grandchild>
  </child>
</root>
<root>
  <child>
    <grandchild>
      Toopy
  </child>
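The nesting rule can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree parser, which raises ParseError for documents that are not well formed. The helper name is_well_formed and the two sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The grandchild is closed before the child: properly nested.
properly_nested = "<root><child><grandchild>Toopy</grandchild></child></root>"
# The child is closed while the grandchild is still open: improperly nested.
badly_nested = "<root><child><grandchild>Toopy</child></grandchild></root>"

print(is_well_formed(properly_nested))
print(is_well_formed(badly_nested))
```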
XML syntax
XML declaration
Should be included at the beginning of each XML file: <?xml version="1.0"?>
Case matters
XML is case sensitive. Starting and closing tags must use the same capitalization.
Tag names
Names must begin with a letter, underscore, or colon, and may contain letters, digits, underscores, hyphens, periods, and colons.
Spaces are not allowed. Although valid, colons, hyphens, and periods are best avoided within names.
Names that begin with the letters xml, in any combination of upper- and lowercase, are reserved and not allowed.
Tag content does not require any additional formatting.
Everything between the starting and closing tags is considered the tag content.
Source: XML: Visual QuickStart Guide, Second Edition, by Kevin Howard Goldberg. Published by Peachpit Press, 2008.
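The tag naming rules above can be approximated with a short check. This is a simplified sketch that only handles ASCII letters (XML itself allows many more characters); the pattern and the function name is_valid_tag_name are assumptions for illustration, not part of any XML library.

```python
import re

# Simplified, ASCII-only approximation of the naming rules:
# first character: letter, underscore, or colon;
# later characters: letters, digits, underscores, hyphens, periods, colons.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")


def is_valid_tag_name(name):
    """Check a candidate tag name against the simplified rules."""
    if name.lower().startswith("xml"):
        return False  # names starting with "xml" in any capitalisation are reserved
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["Average_daily", "2nd_child", "first name", "XmlData", "_id"]:
    print(candidate, is_valid_tag_name(candidate))
```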
XML syntax
Attribute values must be enclosed in quotation marks
An attribute’s value must always be enclosed in either matching single or double
quotation marks.
By convention, leave no spaces between the attribute name, the equals sign, and the value.
White Space
You can add extra white space, including line breaks, around the elements in your XML
code to make it easier to edit and view.
While extra white space is visible in the file and when passed to other applications, it is ignored by the XML processor.
Language support
Tag and element names do not need to be in English; they can be in any language supported by the software used.
Comments
Comments can be inserted anywhere, enclosed in <!-- and --> (double hyphen)
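A small sketch of how a parser treats attributes and comments, using Python's standard xml.etree.ElementTree. The element and attribute names (family, surname, city) are hypothetical. Note that this particular parser discards comments by default; other tools may keep them.

```python
import xml.etree.ElementTree as ET

# One attribute uses single quotes, the other double quotes; both are accepted.
document = """<family surname='Price' city="Toronto">
    <!-- this comment is skipped by the default parser -->
    <child>Matt</child>
</family>"""

root = ET.fromstring(document)
print(root.attrib)                     # attribute names mapped to their values
print([child.tag for child in root])   # the comment does not appear as a child

# An attribute value without quotation marks is rejected.
try:
    ET.fromstring("<family surname=Price></family>")
    unquoted_accepted = True
except ET.ParseError:
    unquoted_accepted = False
print(unquoted_accepted)
```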
Special characters in XML
& &amp;
" &quot;
' &apos;
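Special characters are normally escaped by software rather than by hand. The sketch below uses escape and unescape from Python's standard xml.sax.saxutils module; the sample string and the QUOTE_ENTITIES mapping are made up for this illustration. By default escape handles &, < and >, so the quotation marks are passed in as extra entities.

```python
from xml.sax.saxutils import escape, unescape

# Extra replacements for the two quotation marks.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}

raw = 'Tom & "Nick"'
escaped = escape(raw, QUOTE_ENTITIES)
print(escaped)

# Reverse the mapping to turn the entities back into plain characters.
restored = unescape(escaped, {entity: char for char, entity in QUOTE_ENTITIES.items()})
print(restored)
```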
XML example – dates
<child>
  <name>Matt</name>
  <DoB>7/12/2002</DoB>
</child>
<child>
  <name>Preetika</name>
  <DoB>7/7/2007</DoB>
</child>
</family>
<Average_daily>
<Average_monthly>
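One possible reading of this example, offered as an assumption rather than the slide's stated point: XML places no format requirement on tag content, so a date such as 7/12/2002 can be interpreted in more than one way. The sketch below parses the same text with two different conventions using Python's standard datetime module.

```python
from datetime import datetime

raw = "7/12/2002"  # DoB value taken from the example

# The same text read with two different conventions.
as_month_first = datetime.strptime(raw, "%m/%d/%Y").date()
as_day_first = datetime.strptime(raw, "%d/%m/%Y").date()

print(as_month_first.isoformat())  # 2002-07-12
print(as_day_first.isoformat())    # 2002-12-07
```

Writing dates in one agreed, unambiguous format (for example the ISO 8601 style year-month-day shown in the output) avoids this kind of inconsistency.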
XML vs JSON example
XML:
<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <student>
    <id>01</id>
    <name>Tom</name>
    <lastname>Price</lastname>
  </student>
  <student>
    <id>02</id>
    <name>Nick</name>
    <lastname>Thameson</lastname>
  </student>
</root>
JSON:
{
  "student": [
    {
      "id": "01",
      "name": "Tom",
      "lastname": "Price"
    },
    {
      "id": "02",
      "name": "Nick",
      "lastname": "Thameson"
    }
  ]
}
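To show that the two formats carry the same information, the sketch below reads the XML version with Python's standard xml.etree.ElementTree and prints the equivalent JSON with the json module. The function name students_to_dict is hypothetical, and the conversion only works for this simple shape (no attributes, no deeper nesting).

```python
import json
import xml.etree.ElementTree as ET

XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <student>
    <id>01</id>
    <name>Tom</name>
    <lastname>Price</lastname>
  </student>
  <student>
    <id>02</id>
    <name>Nick</name>
    <lastname>Thameson</lastname>
  </student>
</root>"""


def students_to_dict(xml_bytes):
    """Turn each <student> element into a dict of its child tags and text."""
    root = ET.fromstring(xml_bytes)
    students = []
    for student in root.findall("student"):
        students.append({field.tag: field.text for field in student})
    return {"student": students}


result = students_to_dict(XML_BYTES)
print(json.dumps(result, indent=2))
```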