Chapter 2 Notes
Data Dictionary
Data dictionaries define what data are acceptable.
For each attribute, we learn:
o What type of key it is
o What data are required
o What data can be stored in it
o How much data can be stored in it
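To make the four points above concrete, here is a minimal sketch of a data dictionary written as a Python structure, with a small check that compares a record against it. The Supplier attributes, types, and sizes are illustrative assumptions and do not come from the chapter; check_record is a hypothetical helper, not a standard function.

```python
# Hypothetical data dictionary for a Supplier table; the attribute names,
# types, and sizes below are illustrative assumptions, not from the chapter.
SUPPLIER_DICTIONARY = {
    "Supplier ID":   {"key": "primary", "required": True,  "type": int, "max_length": 10},
    "Supplier Name": {"key": None,      "required": True,  "type": str, "max_length": 50},
    "Supplier Zip":  {"key": None,      "required": False, "type": str, "max_length": 10},
}


def check_record(record, dictionary):
    """Return a list of problems found when comparing a record to the dictionary."""
    problems = []
    for attribute, rules in dictionary.items():
        value = record.get(attribute)
        if value is None:
            # Missing values are only a problem when the attribute is required.
            if rules["required"]:
                problems.append(f"{attribute}: required value is missing")
            continue
        if not isinstance(value, rules["type"]):
            problems.append(f"{attribute}: expected {rules['type'].__name__}")
        elif len(str(value)) > rules["max_length"]:
            problems.append(f"{attribute}: longer than {rules['max_length']} characters")
    return problems


good = {"Supplier ID": 1001, "Supplier Name": "Acme Paper", "Supplier Zip": "72701"}
bad = {"Supplier ID": "1001", "Supplier Zip": "72701"}  # wrong type, missing name
print(check_record(good, SUPPLIER_DICTIONARY))
print(check_record(bad, SUPPLIER_DICTIONARY))
```

The first record passes with no problems. The second is flagged twice: once for storing the ID as text and once for omitting the required name.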
LO 2-3 What does it mean to extract, transform, and load (ETL)?
Summary
• The first step in the IMPACT cycle is to identify the questions that you intend to answer
through your data analysis project. Once a data analysis problem or question has been
identified, the next step in the IMPACT cycle is mastering the data, which means
obtaining the data needed and preparing them for analysis.
• In order to obtain the right data, it is important to have a firm grasp of what data are
available to you and how that information is stored.
• Data are often stored in a relational database, which helps to ensure that an
organization’s data are complete and to avoid redundancy. Relational databases
are made up of tables whose records are uniquely identified by primary keys
and related to one another through foreign keys.
• To obtain the data, you will either have access to extract the data yourself or you will
need to request the data from a database administrator or the information systems
team. If the latter is the case, you will complete a data request form, indicating exactly
which data you need and why.
• Once you have the data, they must be validated for completeness and integrity; that
is, you must confirm that all of the data you need were extracted and that all of the
data are correct. During extraction, formatting or even entire records can be lost,
resulting in inaccuracies. Correcting the errors and cleaning the data is an integral
step in mastering the data.
• Finally, after the data have been cleaned, there may be one last step of mastering the
data, which is to load them into the tool that will be used for analysis. Often, the
cleaning and correcting of data occur in Excel and the analysis will also be done in Excel.
In this case, there is no need to load the data elsewhere. However, if you intend to do
more rigorous statistical analysis than Excel provides, or if you intend to do more robust
data visualization than can be done in Excel, it may be necessary to load the data into
another tool following the transformation process.
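The extract, validate, transform, and load steps summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the supplier and purchase tables, their columns, the sample rows, and the parenthesized negative-number format are all hypothetical, and an in-memory SQLite database stands in for the organization's relational database.

```python
import csv
import io
import sqlite3

# Build a tiny in-memory relational database. Table and column names are
# hypothetical; they only illustrate primary and foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (
        supplier_id   INTEGER PRIMARY KEY,   -- uniquely identifies each supplier
        supplier_name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),  -- foreign key
        amount      TEXT NOT NULL            -- stored as text to mimic messy source data
    );
    INSERT INTO supplier VALUES (1, 'Acme Paper'), (2, 'Bolt Supply');
    INSERT INTO purchase VALUES
        (10, 1, '1,250.00'), (11, 1, '(75.50)'), (12, 2, '300');
""")


def extract(connection):
    """Extract: join the two tables through the foreign key."""
    query = """
        SELECT p.purchase_id, s.supplier_name, p.amount
        FROM purchase AS p
        JOIN supplier AS s ON s.supplier_id = p.supplier_id
        ORDER BY p.purchase_id
    """
    return connection.execute(query).fetchall()


def validate(rows, connection):
    """Validate completeness: the extract should contain every purchase record."""
    expected = connection.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
    return len(rows) == expected


def clean_amount(text):
    """Transform: strip thousands separators and convert (x) to a negative number."""
    text = text.strip().replace(",", "")
    if text.startswith("(") and text.endswith(")"):
        return -float(text[1:-1])
    return float(text)


def transform(rows):
    """Apply the cleaning rule to the amount column of every extracted row."""
    return [(pid, name, clean_amount(amount)) for pid, name, amount in rows]


def load(rows):
    """Load: write the cleaned rows as CSV text for another analysis tool."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["purchase_id", "supplier_name", "amount"])
    writer.writerows(rows)
    return buffer.getvalue()


raw = extract(conn)
assert validate(raw, conn), "extract is missing records"
cleaned = transform(raw)
print(load(cleaned))
```

The record-count comparison is only one simple completeness check. In practice you might also compare control totals or spot-check individual records against the source.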
ACCT 3130
Questions
1. Mastering the data can also be described via the ETL process. The ETL process stands
for:
A. Extract, total, and load data
B. Enter, transform, and load data
C. Extract, transform, and load data
D. Enter, total, and load data
2. Which of the following describes part of the goal of the ETL process?
A. Identify which approach to data analytics should be used
B. Load the data into a relational database for storage
C. Communicate the results and insights found through the analysis
D. Identify and obtain the data needed for solving the problem
3. The advantages of storing data in a relational database include which of the following?
A. Help in enforcing business rules
B. Increased information redundancy
C. Integrating business processes
D. All of the above
E. Only A and B
F. Only B and C
G. Only A and C
4. The purpose of transforming data is:
A. To validate the data for completeness and integrity
B. To load the data into the appropriate tool for analysis
C. To obtain the data from the appropriate source
D. To identify which data are necessary to complete the analysis
5. Which attribute is required to exist in each table of a relational database and serves as
the “unique identifier” for each record in a table?
A. Foreign key
B. Unique identifier
C. Primary key
D. Key attribute
6. The metadata that describes each attribute in a database is which of the following?
A. Composite primary key
B. Data dictionary
C. Descriptive attributes
D. Flat file
7. As mentioned in the chapter, which of the following is not a common way that data will
need to be cleaned after extraction and validation?
A. Remove headings and subtotals
B. Format negative numbers
C. Clean up trailing zeroes
D. Correct inconsistencies across data
8. Why is Supplier ID considered to be a primary key for a Supplier table?