
Chapter 2 Notes

The document discusses the process of mastering data, which involves obtaining the necessary data from relational databases and preparing it for analysis through extraction, transformation, and loading (ETL). Specifically, it covers identifying the relevant data through data dictionaries and relationships between tables, extracting the data, validating and cleaning issues like formatting and inconsistencies, and finally loading into analytical tools. Relational databases help ensure data is complete, non-redundant, and follows business rules through uniquely identified primary keys and related foreign keys between tables.

Uploaded by

Emily Cleveland

ACCT 3130

Chapter 2: Mastering the Data

How Data are Used and Stored in the Accounting Cycle


LO 2-1 Understand how data are organized in an accounting information system.
Understand the data by looking at how they are organized:
• Data can be found throughout various systems
• In most cases, you need to know which tables and attributes contain the relevant data
• Unified Modeling Language (UML) diagrams are one way to understand how a database is organized

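LO 2-2 Understand how data are stored in a relational database

• Relational database: stores data in multiple tables that are related to one another through keys
• Flat file: stores all data in a single table, such as one spreadsheet or text file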
Relational databases ensure that data:
• Are complete or include all data
• Are not redundant, so they do not take up too much space
• Follow business rules and internal controls
• Aid communication and integration of business processes
There are four types of attributes:
1. Primary keys are unique identifiers (for example, a PO number)
2. Foreign keys are attributes that point to a primary key in another table
3. Composite keys combine two foreign keys to identify a record, as in line items
4. Descriptive attributes include everything else and give additional information
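
The four attribute types can be sketched with Python's built-in sqlite3 module. The purchasing tables and column names below are hypothetical and are not taken from the chapter; the point is only to show where each kind of key sits and how the database rejects rows that break a key rule.

```python
import sqlite3

# Hypothetical purchasing schema; all table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,   -- primary key
    supplier_name TEXT NOT NULL          -- descriptive attribute
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,     -- primary key
    description TEXT NOT NULL            -- descriptive attribute
);
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,     -- primary key (PO number)
    supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),  -- foreign key
    po_date     TEXT NOT NULL            -- descriptive attribute
);
CREATE TABLE po_line_item (
    po_number  INTEGER NOT NULL REFERENCES purchase_order(po_number),  -- foreign key
    product_id INTEGER NOT NULL REFERENCES product(product_id),        -- foreign key
    quantity   INTEGER NOT NULL,         -- descriptive attribute
    PRIMARY KEY (po_number, product_id)  -- composite key for line items
);
""")

conn.execute("INSERT INTO supplier VALUES (1, 'Acme Supply')")
conn.execute("INSERT INTO product VALUES (7, 'Copy paper')")
conn.execute("INSERT INTO purchase_order VALUES (1001, 1, '2024-01-15')")
conn.execute("INSERT INTO po_line_item VALUES (1001, 7, 20)")


def try_insert(sql):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Same PO number twice: violates the primary key.
duplicate_pk = try_insert("INSERT INTO purchase_order VALUES (1001, 1, '2024-02-01')")
# Supplier 99 does not exist: violates the foreign key.
orphan_fk = try_insert("INSERT INTO purchase_order VALUES (1002, 99, '2024-02-01')")
# Same PO and product twice: violates the composite key.
duplicate_line = try_insert("INSERT INTO po_line_item VALUES (1001, 7, 5)")
```

All three `try_insert` calls return False, which is one way a relational database enforces completeness and business rules.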

Data Dictionary
Data dictionaries define what data are acceptable.
• For each attribute, we learn:
  o What type of key it is
  o What data are required
  o What data can be stored in it
  o How much data are stored
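
As a rough illustration, a data dictionary can be written down as a lookup table and used to check a record. The attribute names, types, and field sizes below are hypothetical, chosen only to mirror the four items listed above (key type, required or not, data type, field size).

```python
# Hypothetical data dictionary for a Supplier table (names and sizes are assumptions).
DATA_DICTIONARY = {
    "supplier_id":   {"key": "primary", "required": True,  "type": int, "max_length": 10},
    "supplier_name": {"key": None,      "required": True,  "type": str, "max_length": 30},
    "supplier_type": {"key": "foreign", "required": False, "type": int, "max_length": 2},
}


def check_record(record):
    """Return a list of problems found when a record is compared to the dictionary."""
    problems = []
    for name, rule in DATA_DICTIONARY.items():
        value = record.get(name)
        if value is None:
            if rule["required"]:
                problems.append(f"{name}: required value is missing")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
        elif len(str(value)) > rule["max_length"]:
            problems.append(f"{name}: longer than {rule['max_length']} characters")
    return problems


good = check_record({"supplier_id": 1, "supplier_name": "Acme Supply"})
bad = check_record({"supplier_id": "x", "supplier_name": None})
```

Here `good` is an empty list, while `bad` reports a wrong data type for `supplier_id` and a missing required `supplier_name`.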
LO 2-3 What does it mean to extract, transform, and load (ETL)?

Requesting data is an iterative practice involving five steps:


1. Determine the purpose and scope of the data request (extract)
2. Obtain the data (extract)
3. Validate the data for completeness and integrity (transform)
4. Clean the data (transform)
5. Load the data in preparation for data analysis (load)

Step 1: Determine the purpose and scope of the data request

• Ask a few questions before beginning the process:
  • What is the purpose of the data request?
  • What do you need the data to solve?
  • What business problem will it address?
  • What risk exists in data integrity (e.g., reliability, usefulness)?
  • What is the mitigation plan?
  • What other information will impact the nature, timing, and extent of the data analysis?

Step 2: Obtain the data

• How will data be requested and/or obtained?
  • Do you have access to the data yourself, or do you need to ask a database administrator or the information systems department to provide the data for you?
  • If you need to request the data, is there a standard data request form that you should use?
  • From whom do you request the data?
  • Where are the data located in the financial or other related systems?
  • What specific data are needed (tables and fields)?
  • What tools will be used to perform data analytic tests or procedures and why?
• There are two options:
  • Obtain data through a data request to the IT department.
  • Obtain data yourself.

Obtain the data yourself

• If you have direct access to a data warehouse, you can use SQL and other tools to pull the data yourself.
  1. Identify the tables that contain the information you need. You can do this by looking through the data dictionary or the relationship model.
  2. Identify which attributes, specifically, hold the information you need in each table.
  3. Identify how those tables are related to each other.
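
The three identification steps above map onto the parts of a SQL query: the tables go in FROM and JOIN, the attributes go in SELECT, and the relationship goes in the ON clause. The sketch below uses an in-memory SQLite database with made-up tables and rows (all names and values are assumptions) so that it can be run without access to a real data warehouse.

```python
import sqlite3

# Build a small stand-in for a data warehouse; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (
    supplier_id   INTEGER PRIMARY KEY,
    supplier_name TEXT
);
CREATE TABLE purchase_order (
    po_number   INTEGER PRIMARY KEY,
    supplier_id INTEGER REFERENCES supplier(supplier_id),
    po_amount   REAL
);
INSERT INTO supplier VALUES (1, 'Acme Supply'), (2, 'Birch Traders');
INSERT INTO purchase_order VALUES (1001, 1, 250.0), (1002, 2, 90.5), (1003, 1, 40.0);
""")

query = """
SELECT po.po_number, s.supplier_name, po.po_amount      -- step 2: the attributes needed
FROM purchase_order AS po                               -- step 1: the tables needed
JOIN supplier AS s ON s.supplier_id = po.supplier_id    -- step 3: how the tables relate
ORDER BY po.po_number
"""
rows = conn.execute(query).fetchall()
```

Each row in `rows` pairs a purchase order with its supplier's name, which is stored in a different table and reached through the foreign key.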

Step 3: Validate the data for completeness and integrity

• Chances are the data you request aren’t complete. Before you begin, do a little work to make sure your data are valid:
  1. Compare the number of records extracted with the number in the source
  2. Compare descriptive statistics for numeric fields
  3. Validate Date/Time fields
  4. Compare string limits for text fields
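
One possible way to run these four checks on an extract is sketched below. The field names, the sample rows, and the control figures (expected record count, expected total, and field size) are all hypothetical; in practice the control figures would come from the source system or the person who supplied the data.

```python
from datetime import datetime
from statistics import mean


def validate_extract(rows, expected_count, expected_total, max_name_length):
    """Run four basic validation checks and return the findings as a dict."""
    amounts = [r["amount"] for r in rows]
    findings = {
        # 1. Record count compared with the source's count
        "record_count_matches": len(rows) == expected_count,
        # 2. Descriptive statistics for a numeric field, compared with a control total
        "total_matches": abs(sum(amounts) - expected_total) < 0.005,
        "min": min(amounts),
        "max": max(amounts),
        "mean": mean(amounts),
        "bad_dates": [],
        "possibly_truncated": [],
    }
    for r in rows:
        # 3. Date fields must parse as real calendar dates
        try:
            datetime.strptime(r["date"], "%Y-%m-%d")
        except ValueError:
            findings["bad_dates"].append(r["id"])
        # 4. Text that exactly fills the field size may have been cut off
        if len(r["name"]) >= max_name_length:
            findings["possibly_truncated"].append(r["id"])
    return findings


extract = [
    {"id": 1, "date": "2024-01-15", "amount": 250.0, "name": "Acme Inc"},
    {"id": 2, "date": "2024-02-30", "amount": 90.5, "name": "Birch Trad"},
    {"id": 3, "date": "2024-03-01", "amount": 40.0, "name": "Cedar Co"},
]
report = validate_extract(extract, expected_count=4, expected_total=400.5, max_name_length=10)
```

With these sample figures the report flags a missing record (three rows against an expected four), a total that does not agree with the control total, an impossible date on record 2, and a name on record 2 that fills the whole field and may be truncated.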
Step 4: Clean the data

• Once you have valid data, there is still some work that needs to be done to make sure they are consistent and ready for analysis:
  1. Remove headings or subtotals
  2. Clean leading zeroes and nonprintable characters
  3. Format negative numbers
  4. Correct inconsistencies across data, in general
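
A minimal sketch of the four cleaning tasks follows. The column names, the state lookup, and the sample rows are assumptions made for illustration. Stripping leading zeroes is only appropriate for fields where the zeroes carry no meaning; identifiers such as ZIP codes would need them kept.

```python
# Hypothetical lookup used to make one field consistent.
STATE_NAMES = {"CA": "CA", "CALIFORNIA": "CA", "CALIF.": "CA"}


def clean_text(text):
    """Drop nonprintable characters and surrounding whitespace."""
    return "".join(ch for ch in text if ch.isprintable()).strip()


def clean_amount(text):
    """Convert '(1,250.00)' or '1,250.00-' style text to a signed float."""
    t = clean_text(text).replace(",", "").replace("$", "")
    negative = (t.startswith("(") and t.endswith(")")) or t.startswith("-") or t.endswith("-")
    value = float(t.strip("()-"))
    return -value if negative else value


def clean_rows(rows):
    cleaned = []
    for row in rows:
        label = clean_text(row["account"])
        # 1. Skip heading rows and subtotal rows left over from a report export
        if label.lower() in {"account", ""} or label.lower().startswith(("subtotal", "total")):
            continue
        state = clean_text(row["state"])
        cleaned.append({
            # 2. Leading zeroes and nonprintable characters removed
            "account": label.lstrip("0") or "0",
            # 4. Inconsistent spellings mapped to one value
            "state": STATE_NAMES.get(state.upper(), state),
            # 3. Negative numbers stored with a minus sign
            "amount": clean_amount(row["amount"]),
        })
    return cleaned


raw = [
    {"account": "Account", "state": "State", "amount": "Amount"},
    {"account": "000123", "state": "California", "amount": "(1,250.00)"},
    {"account": "000124\x00", "state": "ca", "amount": "75.50"},
    {"account": "Subtotal", "state": "", "amount": "(1,174.50)"},
]
cleaned = clean_rows(raw)
```

After cleaning, only the two detail rows remain, with account numbers `123` and `124`, state `CA` on both, and amounts of -1250.0 and 75.5.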

Step 5: Load the data for data analysis

• Finally, you can import your data into the tool of your choice and expect the functions to work properly.
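
Loading often amounts to saving the cleaned data in a format the analysis tool can import. The sketch below writes a small, made-up set of cleaned rows to a CSV file in a temporary folder and reads it back; the file name and column names are assumptions.

```python
import csv
import os
import tempfile

# Made-up cleaned rows standing in for the output of the transform steps.
cleaned = [
    {"account": "123", "state": "CA", "amount": -1250.0},
    {"account": "124", "state": "CA", "amount": 75.5},
]

# Write the rows to a CSV file that a spreadsheet or visualization tool could import.
path = os.path.join(tempfile.mkdtemp(), "cleaned_extract.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["account", "state", "amount"])
    writer.writeheader()
    writer.writerows(cleaned)

# Read the file back the way an analysis script might, restoring the numeric type.
with open(path, newline="", encoding="utf-8") as f:
    loaded = [{**row, "amount": float(row["amount"])} for row in csv.DictReader(f)]
```

CSV stores every value as text, so numeric columns have to be converted again on import, as the last line does for `amount`.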
Question: What are four common issues with data that must be fixed before analysis can take place?
Answer: Headings or subtotals must be removed, leading zeroes and nonprintable characters cleaned, negative numbers formatted, and inconsistencies across the data corrected.

Summary
• The first step in the IMPACT cycle is to identify the questions that you intend to answer
through your data analysis project. Once a data analysis problem or question has been
identified, the next step in the IMPACT cycle is mastering the data, which can be broken
down to mean obtaining the data needed and preparing it for analysis.
• In order to obtain the right data, it is important to have a firm grasp of what data are
available to you and how that information is stored.
• Data are often stored in a relational database, which helps to ensure that an
organization’s data are complete and to avoid redundancy. Relational databases
are made up of tables with uniquely identified records (this is done through
primary keys) and are related through the usage of foreign keys.
• To obtain the data, you will either have access to extract the data yourself or you will
need to request the data from a database administrator or the information systems
team. If the latter is the case, you will complete a data request form, indicating exactly
which data you need and why.
• Once you have the data, they will need to be validated for completeness and integrity—
that is, you will need to ensure that all of the data you need were extracted, and that all
data are correct. Sometimes when data are extracted, some formatting or sometimes
even entire records will get lost, resulting in inaccuracies. Correcting the errors and
cleaning the data is an integral step in mastering the data.
• Finally, after the data have been cleaned, there may be one last step of mastering the
data, which is to load them into the tool that will be used for analysis. Often, the
cleaning and correcting of data occur in Excel and the analysis will also be done in Excel.
In this case, there is no need to load the data elsewhere. However, if you intend to do
more rigorous statistical analysis than Excel provides, or if you intend to do more robust
data visualization than can be done in Excel, it may be necessary to load the data into
another tool following the transformation process.
Questions
1. Mastering the data can also be described via the ETL process. The ETL process stands
for:
A. Extract, total, and load data
B. Enter, transform, and load data
C. Extract, transform, and load data
D. Enter, total, and load data
2. Which of the following describes part of the goal of the ETL process?
A. Identify which approach to data analytics should be used
B. Load the data into a relational database for storage
C. Communicate the results and insights found through the analysis
D. Identify and obtain the data needed for solving the problem
3. The advantages of storing data in a relational database include which of the following?
A. Help in enforcing business rules
B. Increased information redundancy
C. Integrating business processes
D. All of the above
E. Only A and B
F. Only B and C
G. Only A and C
4. The purpose of transforming data is:
A. To validate the data for completeness and integrity
B. To load the data into the appropriate tool for analysis
C. To obtain the data from the appropriate source
D. To identify which data are necessary to complete the analysis
5. Which attribute is required to exist in each table of a relational database and serves as
the “unique identifier” for each record in a table?
A. Foreign key
B. Unique identifier
C. Primary key
D. Key attribute
6. The metadata that describes each attribute in a database is which of the following?
A. Composite primary key
B. Data dictionary
C. Descriptive attributes
D. Flat file
7. As mentioned in the chapter, which of the following is not a common way that data will
need to be cleaned after extraction and validation?
A. Remove headings and subtotals
B. Format negative numbers
C. Clean up trailing zeroes
D. Correct inconsistencies across data
8. Why is Supplier ID considered to be a primary key for a Supplier table?
A. It contains a unique identifier for each supplier
B. It is a 10-digit number
C. It can either be for a vendor or miscellaneous provider
D. It is used to identify different supplier categories
9. What are attributes that exist in a relational database that are neither primary nor
foreign keys?
A. Nondescript attributes
B. Descriptive attributes
C. Composite key
D. Relational table attributes
10. Which of these is not included in the five steps of the ETL process?
A. Determine the purpose and scope of the data request
B. Obtain the data
C. Validate the data for completeness and integrity
D. Scrub the data
