Chapter 2 Notes

The document discusses the process of mastering data, which involves obtaining the necessary data from relational databases and preparing it for analysis through extraction, transformation, and loading (ETL). Specifically, it covers identifying the relevant data through data dictionaries and relationships between tables, extracting the data, validating and cleaning issues like formatting and inconsistencies, and finally loading into analytical tools. Relational databases help ensure data is complete, non-redundant, and follows business rules through uniquely identified primary keys and related foreign keys between tables.

Uploaded by

Emily Cleveland


ACCT 3130

Chapter 2: Mastering the Data

How Data are Used and Stored in the Accounting Cycle

LO 2-1 Understand how data are organized in an accounting information system.
Understand the data by looking at how they are organized:
▪ Data can be found throughout various systems
▪ In most cases, you need to know which tables and attributes contain the relevant data
▪ Unified Modeling Language (UML) is one way to understand databases

LO 2-2 Understand how data are stored in a relational database

▪ Relational Database:
▪ Flat file:
Relational databases ensure that data:
▪ Are complete or include all data
▪ Are not redundant, so they do not take up too much space
▪ Follow business rules and internal controls
▪ Aid communication and integration of business processes
There are four types of attributes:
1. Primary keys are unique identifiers (for example, a PO number)
2. Foreign keys are attributes that point to a primary key in another table
3. Composite keys are a combination of two foreign keys, used to identify line items
4. Descriptive attributes include everything else (they give additional information)
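
The four attribute types can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the table and column names (supplier, product, purchase_order, po_line_item) are hypothetical and do not come from the course materials.

```python
import sqlite3

# Hypothetical purchasing tables, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,   -- primary key
    supplier_name TEXT NOT NULL          -- descriptive attribute
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,     -- primary key
    description TEXT NOT NULL            -- descriptive attribute
);
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,     -- primary key (PO number)
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),  -- foreign key
    po_date     TEXT NOT NULL            -- descriptive attribute
);
CREATE TABLE po_line_item (
    po_number  INTEGER NOT NULL REFERENCES purchase_order(po_number),  -- foreign key
    product_id INTEGER NOT NULL REFERENCES product(product_id),        -- foreign key
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (po_number, product_id)  -- composite key built from two foreign keys
);
""")

conn.execute("INSERT INTO supplier VALUES (1, 'Acme Supply')")
conn.execute("INSERT INTO product VALUES (10, 'Copy paper')")
conn.execute("INSERT INTO purchase_order VALUES (1001, 1, '2024-01-15')")
conn.execute("INSERT INTO po_line_item VALUES (1001, 10, 25)")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# A repeated primary key, a foreign key with no matching supplier,
# and a repeated composite key should all be refused.
duplicate_pk = rejected("INSERT INTO supplier VALUES (1, 'Duplicate Co')")
dangling_fk = rejected("INSERT INTO purchase_order VALUES (1002, 99, '2024-01-16')")
duplicate_line = rejected("INSERT INTO po_line_item VALUES (1001, 10, 5)")
supplier_count = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]
```

Because the database itself refuses these rows, the keys are one way a relational design keeps data non-redundant and consistent with business rules.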

Data Dictionary
Data dictionaries define what data are acceptable.
▪ For each attribute, we learn:
o What type of key it is
o What data are required
o What data can be stored in it
o How much data are stored
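
As a rough sketch, a data dictionary can be pictured as a lookup that records these four facts for each attribute. The Supplier attributes, their limits, and the helper names AttributeSpec and check_record below are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeSpec:
    key_type: str     # "primary", "foreign", or "descriptive"
    required: bool    # must a value be present?
    data_type: type   # what kind of data can be stored
    max_length: int   # how much data fits, in characters


# Hypothetical data dictionary for a Supplier table.
SUPPLIER_DICTIONARY = {
    "supplier_id": AttributeSpec("primary", True, int, 10),
    "supplier_name": AttributeSpec("descriptive", True, str, 40),
    "supplier_type": AttributeSpec("foreign", False, int, 2),
}


def check_record(record, dictionary):
    """Return a list of ways the record violates the data dictionary."""
    problems = []
    for name, spec in dictionary.items():
        value = record.get(name)
        if value is None:
            if spec.required:
                problems.append(f"{name}: required value is missing")
            continue
        if not isinstance(value, spec.data_type):
            problems.append(f"{name}: expected {spec.data_type.__name__}")
        elif len(str(value)) > spec.max_length:
            problems.append(f"{name}: longer than {spec.max_length} characters")
    return problems


good = {"supplier_id": 1001, "supplier_name": "Acme Supply", "supplier_type": 1}
bad = {"supplier_id": "1001", "supplier_name": None}

good_problems = check_record(good, SUPPLIER_DICTIONARY)
bad_problems = check_record(bad, SUPPLIER_DICTIONARY)
```

Here the second record fails twice: its ID is stored as text rather than a number, and its required name is missing.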
LO 2-3 What does it mean to extract, transform, and load? (ETL)

Requesting data is an iterative practice involving five steps:

1. Determine the purpose and scope of the data request (extract)
2. Obtain the data (extract)
3. Validate the data for completeness and integrity (transform)
4. Clean the data (transform)

5. Load the data in preparation for data analysis (load)

Step 1: Determine the purpose and scope of the data request

▪ Ask a few questions before beginning the process:
• What is the purpose of the data request?
• What do you need the data to solve?
• What business problem will it address?
• What risk exists in data integrity (e.g., reliability, usefulness)?
• What is the mitigation plan?
• What other information will impact the nature, timing, and extent of the data
analysis?

Step 2: Obtain the data

▪ How will data be requested and/or obtained?
• Do you have access to the data yourself, or do you need to request a database
administrator or the information systems department to provide the data for you?
• If you need to request the data, is there a standard data request form that you
should use?
• From whom do you request the data?
• Where are the data located in the financial or other related systems?
• What specific data are needed (tables and fields)?
• What tools will be used to perform data analytic tests or procedures and why?
▪ There are a couple of options:
• Obtain data through a data request to the IT department.
• Obtain data yourself.

Obtain the data yourself

▪ If you have direct access to a data warehouse, you can use SQL and other tools to pull
the data yourself.
1. Identify the tables that contain the information you need. You can do this by
looking through the data dictionary or the relationship model.
2. Identify which attributes, specifically, hold the information you need in each
table.
3. Identify how those tables are related to each other.
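
The three steps above can be sketched as a single SQL query, run here through Python's sqlite3 module. The two tables, their columns, and the sample rows are hypothetical stand-ins for a real data warehouse.

```python
import sqlite3

# Hypothetical stand-in for a data warehouse with two related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT
);
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,
    supplier_id INTEGER REFERENCES supplier(supplier_id),
    po_date     TEXT,
    amount      REAL
);
INSERT INTO supplier VALUES (1, 'Acme Supply'), (2, 'Brightline Paper');
INSERT INTO purchase_order VALUES
    (1001, 1, '2024-01-15', 250.00),
    (1002, 2, '2024-01-20', 80.50),
    (1003, 1, '2024-02-03', 1200.00);
""")

# Step 1: the tables needed are purchase_order and supplier.
# Step 2: the attributes needed are listed in the SELECT clause.
# Step 3: the tables are related through supplier_id (foreign key to primary key).
query = """
SELECT po.po_number, s.supplier_name, po.po_date, po.amount
FROM purchase_order AS po
JOIN supplier AS s ON s.supplier_id = po.supplier_id
WHERE po.po_date >= ?
ORDER BY po.po_number
"""
rows = conn.execute(query, ("2024-01-20",)).fetchall()
```

The JOIN condition is where the relationship between the tables is stated, which is why it helps to check the data dictionary or relationship model before writing the query.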

Step 3: Validate the data for completeness and integrity

▪ Chances are the data you request will not be complete. Before you begin, do a little work to
make sure your data are valid:
1. Compare the number of records
2. Compare descriptive statistics for numeric fields
3. Validate Date/Time fields
4. Compare string limits for text fields
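
One possible way to run these four checks is sketched below. It assumes the source system can report control figures (record count, total, maximum) to compare against. The figures, field names, sample rows, and the 20-character limit are all invented for illustration.

```python
from datetime import datetime

# Hypothetical control figures reported by the source system.
expected = {"record_count": 4, "amount_total": 1555.50, "amount_max": 1200.00}
MAX_NAME_LENGTH = 20  # assumed text field limit taken from a data dictionary

extracted = [
    {"po_number": "1001", "supplier_name": "Acme Supply",
     "po_date": "2024-01-15", "amount": 250.00},
    {"po_number": "1002", "supplier_name": "Brightline Paper",
     "po_date": "2024-01-20", "amount": 80.50},
    {"po_number": "1003", "supplier_name": "Acme Supply",
     "po_date": "2024-02-30", "amount": 1200.00},   # impossible date
    {"po_number": "1004", "supplier_name": "Consolidated Office Products",
     "po_date": "2024-02-11", "amount": 25.00},     # name exceeds the limit
]


def is_valid_date(text, fmt="%Y-%m-%d"):
    """Return True if the text parses as a real calendar date."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


amounts = [row["amount"] for row in extracted]
report = {
    # 1. Compare the number of records
    "record_count_matches": len(extracted) == expected["record_count"],
    # 2. Compare descriptive statistics for numeric fields
    "total_matches": abs(sum(amounts) - expected["amount_total"]) < 0.005,
    "max_matches": max(amounts) == expected["amount_max"],
    # 3. Validate date/time fields
    "bad_dates": [r["po_number"] for r in extracted
                  if not is_valid_date(r["po_date"])],
    # 4. Compare string limits for text fields
    "too_long": [r["po_number"] for r in extracted
                 if len(r["supplier_name"]) > MAX_NAME_LENGTH],
}
```

In this sample the counts and totals agree, but one record carries an invalid date and another has a name longer than the assumed limit, so both would need follow-up before analysis.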
Step 4: Clean the data

▪ Once you have valid data, there is still some work that needs to be done to make sure it
is consistent and ready for analysis:
1. Remove headings or subtotals
2. Clean leading zeroes and nonprintable characters
3. Format negative numbers
4. Correct inconsistencies across data, in general
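
A minimal sketch of the first three cleaning tasks follows. The raw rows are invented, and the choice to strip leading zeroes (rather than restore them) is an assumption about what the analysis tool expects, so adjust it to the data at hand.

```python
# Hypothetical raw export: a heading row, a subtotal row, padded account
# numbers, stray nonprintable characters, and two styles of negative number.
raw_rows = [
    ["Account", "Amount"],
    ["0001023", "1,250.00"],
    ["0001047\u00a0", "(300.50)"],   # non-breaking space, parentheses for negative
    ["Subtotal", "949.50"],
    ["0002001\x00", "75.25-"],       # null character, trailing minus sign
]


def clean_text(value):
    """Drop nonprintable characters and surrounding whitespace."""
    printable = "".join(ch for ch in value if ch.isprintable())
    return printable.strip()


def parse_amount(value):
    """Convert text such as '(300.50)' or '75.25-' to a signed float."""
    text = clean_text(value).replace(",", "")
    negative = (
        (text.startswith("(") and text.endswith(")"))
        or text.startswith("-")
        or text.endswith("-")
    )
    number = float(text.strip("()-"))
    return -number if negative else number


def is_data_row(row):
    """Keep rows whose first field is all digits once cleaned."""
    return clean_text(row[0]).isdigit()


cleaned = [
    # Assumption: the account number should be stored without zero padding.
    {"account": clean_text(account).lstrip("0"), "amount": parse_amount(amount)}
    for account, amount in raw_rows
    if is_data_row([account, amount])
]
```

Filtering on the account field removes both the heading and the subtotal row, which keeps the subtotal from being double counted in later sums.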

Step 5: Load the data for data analysis

▪ Finally, you can now import your data into the tool of your choice and expect the
functions to work properly.
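
As an illustration of the load step, the sketch below imports already-cleaned rows into an in-memory SQLite table and runs two aggregate functions. SQLite stands in here for whatever analysis tool is actually chosen, and the ledger table and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical cleaned rows: (account, amount), with amounts already numeric.
cleaned = [("1023", 1250.00), ("1047", -300.50), ("2001", -75.25)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (account TEXT PRIMARY KEY, amount REAL NOT NULL)"
)
conn.executemany("INSERT INTO ledger VALUES (?, ?)", cleaned)
conn.commit()

# Because the amounts were stored as numbers, aggregate functions behave as expected.
loaded_count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM ledger"
).fetchone()
```

Comparing loaded_count with the number of cleaned rows is a quick check that nothing was lost during the load.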
Question: What are four common issues with data that must be fixed before analysis can take
place?
Answer: Four common fixes are removing headings or subtotals, cleaning leading zeroes or
nonprintable characters, formatting negative numbers, and correcting inconsistencies across
the data.

Summary
• The first step in the IMPACT cycle is to identify the questions that you intend to answer
through your data analysis project. Once a data analysis problem or question has been
identified, the next step in the IMPACT cycle is mastering the data, which can be broken
down to mean obtaining the data needed and preparing it for analysis.
• In order to obtain the right data, it is important to have a firm grasp of what data are
available to you and how that information is stored.
• Data are often stored in a relational database, which helps to ensure that an
organization’s data are complete and to avoid redundancy. Relational databases
are made up of tables with uniquely identified records (this is done through
primary keys) and are related through the usage of foreign keys.
• To obtain the data, you will either have access to extract the data yourself or you will
need to request the data from a database administrator or the information systems
team. If the latter is the case, you will complete a data request form, indicating exactly
which data you need and why.
• Once you have the data, they will need to be validated for completeness and integrity—
that is, you will need to ensure that all of the data you need were extracted, and that all
data are correct. Sometimes when data are extracted, some formatting or sometimes
even entire records will get lost, resulting in inaccuracies. Correcting the errors and
cleaning the data is an integral step in mastering the data.
• Finally, after the data have been cleaned, there may be one last step of mastering the
data, which is to load them into the tool that will be used for analysis. Often, the
cleaning and correcting of data occur in Excel and the analysis will also be done in Excel.
In this case, there is no need to load the data elsewhere. However, if you intend to do
more rigorous statistical analysis than Excel provides, or if you intend to do more robust
data visualization than can be done in Excel, it may be necessary to load the data into
another tool following the transformation process.
Questions
1. Mastering the data can also be described via the ETL process. The ETL process stands
for:
A. Extract, total, and load data
B. Enter, transform, and load data
C. Extract, transform, and load data
D. Enter, total, and load data
2. Which of the following describes part of the goal of the ETL process?
A. Identify which approach to data analytics should be used
B. Load the data into a relational database for storage
C. Communicate the results and insights found through the analysis
D. Identify and obtain the data needed for solving the problem
3. The advantages of storing data in a relational database include which of the following?
A. Help in enforcing business rules
B. Increased information redundancy
C. Integrating business processes
D. All of the above
E. Only A and B
F. Only B and C
G. Only A and C
4. The purpose of transforming data is:
A. To validate the data for completeness and integrity
B. To load the data into the appropriate tool for analysis
C. To obtain the data from the appropriate source
D. To identify which data are necessary to complete the analysis
5. Which attribute is required to exist in each table of a relational database and serves as
the “unique identifier” for each record in a table?
A. Foreign key
B. Unique identifier
C. Primary key
D. Key attribute
6. The metadata that describes each attribute in a database is which of the following?
A. Composite primary key
B. Data dictionary
C. Descriptive attributes
D. Flat file
7. As mentioned in the chapter, which of the following is not a common way that data will
need to be cleaned after extraction and validation?
A. Remove headings and subtotals
B. Format negative numbers
C. Clean up trailing zeroes
D. Correct inconsistencies across data
8. Why is Supplier ID considered to be a primary key for a Supplier table?
A. It contains a unique identifier for each supplier
B. It is a 10-digit number
C. It can either be for a vendor or miscellaneous provider
D. It is used to identify different supplier categories
9. What are attributes that exist in a relational database that are neither primary nor
foreign keys?
A. Nondescript attributes
B. Descriptive attributes
C. Composite key
D. Relational table attributes
10. Which of these is not included in the five steps of the ETL process?
A. Determine the purpose and scope of the data request
B. Obtain the data
C. Validate the data for completeness and integrity
D. Scrub the data
