Data Analytics Notes (Autorecovered)
Course content
Course 1 – Foundations: Data, Data, Everywhere
1. Introducing data analytics: Data helps us make decisions, in everyday life and in
business. In this first part of the course, you will learn how data analysts use tools of
their trade to inform those decisions. You will also get to know more about this course
and the overall program expectations.
2. Thinking analytically: Data analysts balance many different roles in their work. In this
part of the course, you will learn about some of these roles and the key skills that are
required. You will also explore analytical thinking and how it relates to data-driven
decision making.
3. Exploring the wonderful world of data: Data has its own life cycle, and data analysts
use an analysis process that cuts across and leverages this life cycle. In this part of the
course, you will learn about the data life cycle and data analysis process. They are both
relevant to your work in this program and on the job as a future data analyst. You will be
introduced to applications that help guide data through the data analysis process.
4. Setting up a data toolbox: Spreadsheets, query languages, and data visualization tools
are all a big part of a data analyst’s job. In this part of the course, you will learn the basic
concepts to use them for data analysis. You will understand how they work through
examples provided.
5. Discovering data career possibilities: All kinds of businesses value the work that data
analysts do. In this part of the course, you will examine different types of businesses and
the jobs and tasks that analysts do for them. You will also learn how a Google Data
Analytics Certificate will help you meet many of the requirements for a position with
these organizations.
6. Completing the Course Challenge: At the end of this course, you will be able to put
everything you have learned into perspective with the Course Challenge. The Course
Challenge will ask you questions about the main concepts you have learned and then
give you an opportunity to apply those concepts in two scenarios.
1.
Question 1
Optional speed track for those experienced in data analytics
A clothing retailer collects and stores data about its sales revenue. Which of the following would be part of its
data ecosystem? Select all that apply.
1 / 1 point
2.
Question 2
What is the process of guiding business strategy using facts?
1 / 1 point
Data-driven decision-making
Analytical planning
Strategic improvement
Correct
Data-driven decision-making is the process of guiding business strategy using facts.
3.
Question 3
Fill in the blank: Curiosity, understanding context, having a technical mindset, data design, and data strategy are
_____. They enable data analysts to solve problems using facts.
1 / 1 point
thought processes
analytical skills
personal insights
business skills
Correct
Curiosity, understanding context, having a technical mindset, data design, and data strategy are analytical skills.
They enable data analysts to solve problems using facts.
4.
Question 4
The owner of a skate shop notices that every time a certain employee has a shift, there are higher sales numbers
at the end of the day. After some investigation, the owner realizes that since the employee was hired, the store
earns 15% more each month. In this scenario, the owner used which quality of analytical thinking?
1 / 1 point
Visualization
Correlation
Problem-orientation
Big-picture thinking
Correct
The owner used correlation, which involves being able to identify a relationship between two or more pieces of
data. They noticed that there is a correlation between the employee’s presence and the skate shop’s traffic and
monthly income.
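As an illustration of the correlation described above, the strength of a relationship between two pieces of data can be expressed as a correlation coefficient. The following is a minimal sketch in Python. The daily figures are made-up values for illustration (they are not from the course), and the function name pearson_r is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: 1 = the employee worked that day, 0 = did not.
employee_on_shift = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical daily sales in dollars for the same eight days.
daily_sales = [920, 700, 880, 950, 720, 690, 900, 710]

r = pearson_r(employee_on_shift, daily_sales)
print(f"correlation: {r:.2f}")  # a value close to +1 means the two move together
```

A high value only signals a relationship worth investigating; it does not by itself show that the employee causes the higher sales.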
5.
Question 5
Gap analysis is a process that could help accomplish which of the following tasks? Select all that apply.
1 / 1 point
Increase the efficiency of a car manufacturer based on its current assembly process
Correct
Gap analysis is a method for examining and evaluating how a process works currently in order to get where you
want to be in the future. Improving accessibility, increasing efficiency, and reducing carbon emissions are
examples of improvements that gap analysis can help accomplish.
Share
Act
Process
Analyze
Correct
The act phase is when insights are put into action. This involves a company or organization implementing a plan
to solve the original business problem.
7.
Question 7
A data analyst adds descriptive headers to columns of data in a spreadsheet. How does this improve the
spreadsheet?
1 / 1 point
It adds context
8.
Question 8
This is a selection from a spreadsheet that ranks the 10 most populous cities in North Carolina. To alphabetize
the county names in column D, which spreadsheet tool would you use?
     A      B      C            D
1    Rank   Name   Population   County
2    7      Cary   170,282      Wake, Chatham
Name range
Organize range
Sort range
Correct
You can use sort range to alphabetize the county names in column D. Sorting a range of data from A to Z helps
data analysts organize and find data more quickly.
9.
Question 9
You are querying a database of manufacturing company suppliers. The column name for supplier identification
numbers is supplier_id. What is the correct clause to retrieve only data about the supplier with identification
number 85317?
1 / 1 point
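In SQL, the clause that filters rows on a column value is WHERE. The sketch below runs such a query with Python's built-in sqlite3 module. The table name suppliers, the supplier_name column, and the sample rows are assumptions made for illustration; only supplier_id and the number 85317 come from the question.

```python
import sqlite3

# In-memory database with a hypothetical "suppliers" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER, supplier_name TEXT)")
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)",
    [(85317, "Example Supplier A"), (90210, "Example Supplier B")],
)

# The WHERE clause keeps only rows whose supplier_id equals 85317.
rows = conn.execute(
    "SELECT * FROM suppliers WHERE supplier_id = 85317"
).fetchall()
print(rows)
conn.close()
```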
10.
Question 10
Imagine you are sharing your data with a company stakeholder. Why might you display data with a data
visualization instead of a table? Select all that apply.
1 / 1 point
The six steps of the data analysis process that you have been learning in this program are:
ask, prepare, process, analyze, share, and act. These six steps apply to any data analysis.
Continue reading to learn how a team of people analysts used these six steps to answer a
business question.
An organization was experiencing a high turnover rate among new hires. Many employees left
the company before the end of their first year on the job. The analysts used the data analysis
process to answer the following question: how can the organization improve the
retention rate for new employees?
First up, the analysts needed to define what the project would look like and what would
qualify as a successful result. So, to determine these things, they asked effective questions
and collaborated with leaders and managers who were interested in the outcome of their
people analysis. These were the kinds of questions they asked:
• What do you think new employees need to learn to be successful in their first year on
the job?
• Have you gathered data from new employees before? If so, may we have access to the
historical data?
• Do you believe managers with higher retention rates offer new employees something
extra or unique?
• What do you suspect is a leading cause of dissatisfaction among new employees?
• By what percentage would you like employee retention to increase in the next fiscal
year?
It all started with solid preparation. The group built a timeline of three months and decided
how they wanted to relay their progress to interested parties. Also during this step, the
analysts identified what data they needed to achieve the successful result they identified in
the previous step - in this case, the analysts chose to gather the data from an online survey of
new employees. These were the things they did to prepare:
• They developed specific questions to ask about employee satisfaction with different
business processes, such as hiring and onboarding, and their overall compensation.
• They established rules for who would have access to the data collected - in this case,
anyone outside the group wouldn't have access to the raw data, but could view
summarized or aggregated data. For example, an individual's compensation wouldn't
be available, but salary ranges for groups of individuals would be viewable.
• They finalized what specific information would be gathered, and how best to present
the data visually. The analysts brainstormed possible project- and data-related issues
and how to avoid them.
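The access rule above (raw responses stay inside the group, while everyone else sees only summarized values) can be sketched in code. This is a minimal illustration, not the analysts' actual method: the records, the field names, the function summarize_by_group, and the minimum group size of 3 are all hypothetical.

```python
from collections import defaultdict

# Hypothetical raw survey records; only the analyst group would see these.
raw_records = [
    {"team": "Sales", "salary": 52000},
    {"team": "Sales", "salary": 58000},
    {"team": "Sales", "salary": 61000},
    {"team": "Support", "salary": 45000},
    {"team": "Support", "salary": 47000},
]

def summarize_by_group(records, min_group_size=3):
    """Return a salary range per team, hiding teams too small to stay anonymous."""
    by_team = defaultdict(list)
    for record in records:
        by_team[record["team"]].append(record["salary"])
    summary = {}
    for team, salaries in by_team.items():
        if len(salaries) >= min_group_size:
            summary[team] = (min(salaries), max(salaries))
    return summary

# Shareable view: ranges only, no individual compensation.
shareable = summarize_by_group(raw_records)
print(shareable)
```

Dropping groups below a minimum size is one assumed way to keep a range from revealing an individual's compensation.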
The group sent the survey out. Great analysts know how to respect both their data and the
people who provide it. Since employees provided the data, it was important to make sure all
employees gave their consent to participate. The data analysts also made sure employees
understood how their data would be collected, stored, managed, and protected.
Collecting and using data ethically is one of the responsibilities of data analysts. In order to
maintain confidentiality and protect and store the data effectively, these were the steps they
took:
Then, the analysts did what they do best: analyze! From the completed surveys, the data
analysts discovered that an employee’s experience with certain processes was a key
indicator of overall job satisfaction. These were their findings:
• Employees who experienced a long and complicated hiring process were most likely
to leave the company.
• Employees who experienced an efficient and transparent evaluation and feedback
process were most likely to remain with the company.
The group knew it was important to document exactly what they found in the analysis, no
matter what the results. To do otherwise would diminish trust in the survey process and
reduce their ability to collect truthful data from employees in the future.
Just as they made sure the data was carefully protected, the analysts were also careful when
sharing the report. This is how they shared their findings:
• They shared the report with managers who met or exceeded the minimum number of
direct reports with submitted responses to the survey.
• They presented the results to the managers to make sure they had the full picture.
• They asked the managers to personally deliver the results to their teams.
This process gave managers an opportunity to communicate the results with the right
context. As a result, they could have productive team conversations about next steps to
improve employee engagement.
The last stage of the process for the team of analysts was to work with leaders within their
company and decide how best to implement changes and take actions based on the
findings. These were their recommendations:
• Standardize the hiring and evaluation process for employees based on the most
efficient and transparent practices.
• Conduct the same survey annually and compare results with those from the previous
year.
A year later, the same survey was distributed to employees. Analysts anticipated that a
comparison between the two sets of results would indicate that the action plan worked.
Turns out, the changes improved the retention rate for new employees and the actions taken
by leaders were successful!
Data and gut instinct
Detectives and data analysts have a lot in common. Both depend on facts and clues to make
decisions. Both collect and look at the evidence. Both talk to people who know part of the story. And
both might even follow some footprints to see where they lead. Whether you’re a detective or a data
analyst, your job is all about following steps to collect and understand facts.
Analysts use data-driven decision-making and follow a step-by-step process. You have learned that
there are six steps to this process:
Consider an example of a restaurant entrepreneur partnering with a well-known chef to develop a
new restaurant in a bustling part of the city’s central shopping district. The well-known chef has
several restaurants across the city. Banking on their reputation, the restaurant entrepreneur and chef
followed gut instinct and created another uniquely themed restaurant. However, after months of
planning and preparation, fundraising efforts fell short of what was needed to open the restaurant. The
property will go back on the market to be sold at a loss. Had the entrepreneur done more research,
they would have found data showing that prospective customers in this new restaurant location were very
different from the customers at the chef's other restaurants.
The more you understand the data related to a project, the easier it will be to figure out what is
required. These efforts will also help you identify errors and gaps in your data so you can
communicate your findings more effectively. Sometimes past experience helps you make a
connection that no one else would notice. For example, a detective might be able to crack open a case
because they remember an old case just like the one they’re solving today. It's not just gut instinct.
Data + business knowledge = mystery solved
Blending data with business knowledge, plus maybe a touch of gut instinct, will be a common part of
your process as a junior data analyst. The key is figuring out the exact mix for each particular project.
A lot of times, it will depend on the goals of your analysis. That is why analysts often ask, “How do I
define success for this project?”
In addition, try asking yourself these questions about a project to help find the perfect balance:
It is time to enter the data analysis life cycle—the process of going from data to decision. Data goes
through several phases as it gets created, consumed, tested, processed, and reused. With a life cycle
model, all key team members can drive success by planning work both up front and at the end of the
data analysis process. While the data analysis life cycle is well known among experts, there isn't a
single defined structure of those phases. There might not be one single architecture that’s uniformly
followed by every data analysis expert, but there are some shared fundamentals in every data analysis
process. This reading provides an overview of several, starting with the process that forms the
foundation of the Google Data Analytics Certificate.
The process presented as part of the Google Data Analytics Certificate is one that will be valuable to
you as you keep moving forward in your career:
For more information, refer to this e-book, Data Science & Big Data Analytics.
1. Ask
2. Prepare
3. Explore
4. Model
5. Implement
6. Act
7. Evaluate
SAS emphasizes the cyclical nature of its model by visualizing it as an infinity symbol.
The life cycle has seven steps, many of which we have seen in the other models, like Ask, Prepare,
Model, and Act. But this life cycle is also a little different; it includes a step after the act phase designed
to help analysts evaluate their solutions and potentially return to the ask phase again.
For more information, refer to Managing the Analytics Life Cycle for Decisions at Scale.
For more information, refer to Understanding the data analytics project life cycle.
For more information, refer to Big Data Adoption and Planning Considerations.
Key takeaway
From our journey to the pyramids and data in ancient Egypt to now, the way we analyze data has
evolved (and continues to do so). The data analysis process is like real-life architecture: there are
different ways to do things, but the same core ideas still appear in each model of the process. Whether
you use the structure of this Google Data Analytics Certificate or one of the many other iterations you
have learned about, we are here to help guide you as you continue on your data journey.
2.
Question 2
Fill in the blank: In data analytics, the data ecosystem refers to the various elements that interact with one
another to produce, manage, store, _____, analyze, and share data.
1 / 1 point
organize
ingest
locate
merge
Correct
In data analytics, the data ecosystem refers to the various elements that interact with one another to produce,
manage, store, organize, analyze, and share data.
3.
Question 3
Which of the following terms refers to the collection, transformation, and organization of data in order to draw
conclusions, make predictions, and drive informed decision-making?
1 / 1 point
Data insight
Data analysis
Data elements
Correct
Data analysis refers to the collection, transformation, and organization of data in order to draw conclusions,
make predictions, and drive informed decision-making.
4.
Question 4
An airline collects, observes, and analyzes its customers' online behaviors. Then, it uses the insights gained to
choose what new products and services to offer. What business process does this describe?
1 / 1 point
Performance measurement
Data-driven decision-making
Analytical thinking
Correct
An airline collecting, observing, and analyzing its customers' online behaviors, then using the insights gained to
choose what new products and services to offer, describes data-driven decision-making. Data-driven
decision-making is using facts to guide business strategy.
1.
Question 1
The collection, transformation, and organization of data in order to draw conclusions, make predictions, and
drive informed decision-making describes what?
1 / 1 point
Data science
Data analysis
Data ecosystem
Correct
2.
Question 2
Which of the following could be elements of a data ecosystem? Select all that apply.
1 / 1 point
Producing data
Correct
Gaining insights
Managing data
Correct
Sharing data
Correct
3.
Question 3
A data scientist is someone who does what?
1 / 1 point
4.
Question 4
What tactics can a data analyst use to effectively blend gut instinct with facts? Select all that apply.
1 / 1 point
Ask how to define success for a project, but rely most heavily on their own personal perspective.
Focus on intuition to choose which data to collect and how to analyze it.
Use their knowledge of how their company works to better understand a business need.
Correct
Apply their unique past experiences to their current work, while keeping in mind the story the data is
telling.
Correct
5.
Question 5
Sharing your results with subject matter experts and gathering and analyzing data are carried out in data-driven
decision-making. What else is included in this process?
1 / 1 point
Identification of trends
6.
Question 6
You have just received the results of your latest analysis about the effectiveness of your firm’s recent marketing
campaign. However, because you want to follow data-driven decision-making, you share your results with
colleagues from the marketing department for their validation. In this role, these colleagues are acting as what?
1 / 1 point
customers
stakeholders
subject-matter experts
competitors
Correct
7.
Question 7
Consulting with experts in the marketing department about your marketing analysis is an example of what
process?
1 / 1 point
Data analytics
Data management
Data-driven decision-making
Data science
Correct
Analytical skills are qualities and characteristics associated with solving problems using facts.
They are curiosity, understanding context, having a technical mindset, data design, and data strategy.
Curiosity is all about wanting to learn something. Curious people usually seek out new challenges and
experiences.
Context is the condition in which something exists or happens. This can be a structure or an
environment.
A technical mindset involves the ability to break things down into smaller steps or pieces and work
with them in an orderly and logical way.
Data design is how you organize information.
Data strategy is the management of the people, processes, and tools used in data analysis.
1.
Question 1
This practice quiz will help you get a read on the analytical skills
you already have.
Identify the pattern from left to right in the set of blocks below and try to predict which block should replace the
block with the question mark.
1 / 1 point
Correct
This is the missing block. The pattern of the dots increases by one in each block. Therefore, the best answer has
five dots.
2.
Question 2
Here's a more complex pattern. Identify the pattern from left to right in the images below and try to predict
which image should come next.
1st pattern: Octagon with 7 dots
2nd pattern: Heptagon with 6 dots
3rd pattern: Hexagon with 5 dots
4th pattern: Pentagon with 4 dots
5th pattern: Square with 3 dots
6th pattern: Question mark
Based on the images above, which option comes next in the pattern?
1 / 1 point
Correct
This is the next image in the sequence based on two patterns present in the series: the number of sides and the
number of dots. Moving from left to right, both decrease by one. Given these patterns, if the previous block
contained a shape with four sides and three dots, then the next shape should have three sides and two dots.
3.
Question 3
Now, find a pattern in a different format. Select the next number in the sequence:
10
55
25
33
Correct
The correct answer is 33. The numbers in the pattern are all increasing, and the difference between consecutive
numbers is 4.
4.
Question 4
The following numbers are in a sequence from left to right. Determine the pattern and decide which number
should come next:
30
64
62
81
Correct
The next number in the series is 64. There are two patterns in the sequence. One is that each number is squared
and then the number being squared is increased by one (e.g., 2², 3², 4², 5², 6², 7²). The second pattern is in the
difference between the numbers in the sequence: 9 - 4 = 5, 16 - 9 = 7, 25 - 16 = 9, and so on.
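Both patterns named in the explanation above can be checked with a few lines of Python. This is only a sketch, and the variable names are illustrative.

```python
# Squares of 2 through 7, as listed in the explanation.
sequence = [n ** 2 for n in range(2, 8)]  # [4, 9, 16, 25, 36, 49]

# Differences between neighbours: 5, 7, 9, 11, 13 (each grows by 2).
differences = [b - a for a, b in zip(sequence, sequence[1:])]

# Next term via the first pattern: the square of the next base, 8.
next_by_square = 8 ** 2

# Next term via the second pattern: last term plus the next odd difference.
next_by_difference = sequence[-1] + differences[-1] + 2

print(sequence, differences, next_by_square, next_by_difference)
```

Both routes give 64, which matches the stated answer.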
5.
Question 5
The following question is about recognizing and matching patterns in shapes that are the same, but viewed from
different angles.
Two shapes are similar when one can become the other after a rotation clockwise ⟳ or counterclockwise ↺, or a
reflection horizontally ↔ and/or vertically ↕.
Your task is to choose the figure that completes the statement. Pay attention to the pattern by which the first
and second shapes are related, and then figure out which choice matches shape 3. Fill in the blank:
Correct
This image completes the statement. The first image in the statement is reflected in the second image. To
complete the analogy, the answer would be an image that is a side-by-side reflection of the third image.
6.
Question 6
The following question is about recognizing and matching patterns in shapes that are the same, but viewed from
different angles. Two shapes are similar when one can become the other after a rotation clockwise ⟳ or
counterclockwise ↺, or a reflection horizontally ↔ and/or vertically ↕.
Your task is to choose the figure that completes the statement. Fill in the blank:
Which image completes it?
1 / 1 point
Correct
Since the pattern in the first image was rotated 90 degrees counter-clockwise, this image completes the
statement.
7.
Question 7
The following series of codes are in a sequence from left to right. There is a repeating pattern that you will
notice. Determine the pattern and decide which code should come next.
Fill in the blank: A1, B3, C5, D7, E9, F11, G13, _____
1 / 1 point
H15
J15
D17
H16
Correct
The patterns of this series are the letters listed alphabetically and the numbers increasing by two with each new
set. Therefore, following that pattern, the next code would be H15.
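The two rules in the explanation above (letters in alphabetical order, numbers increasing by two) can be written as a small function. This is a sketch only, and the name code_at is hypothetical.

```python
from string import ascii_uppercase

def code_at(position):
    """Return the code at a 0-based position: next letter, number increasing by 2."""
    return f"{ascii_uppercase[position]}{2 * position + 1}"

# The first seven codes reproduce the series given in the question.
series = [code_at(i) for i in range(7)]
# The eighth code is the one that fills the blank.
next_code = code_at(7)
print(series, next_code)
```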
8.
Question 8
The following series of codes are in a sequence from left to right. There is a repeating pattern that you will
notice. Determine the pattern and decide which sequence of letters should come next.
Fill in the blank: A, AA, AAA, B, BA, BAA, BAAA, BB, BBA, BBAA, BBAAA, BBB, ________
1 / 1 point
BBAAA
BBAA
BBBA
BBBB
Correct
The pattern in this sequence follows the letter A. A is added until there are three As, which is when the letter B
takes the place of the previous As, and the pattern continues. Therefore, BBBA is next in the series.
9.
Question 9
Now, identify patterns in a word problem using a data visualization. There are 12 chocolates in a box: eight have
caramel filling, six have coconut filling, and two have both caramel and coconut filling. Choose the best image
that describes this box of chocolates.
1 / 1 point
Correct
This diagram depicts six chocolates with caramel filling only, four chocolates with coconut filling only, two
chocolates with both caramel and coconut filling, and the total number of chocolates is 12.
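The region counts in the diagram above follow from simple subtraction, sketched here in Python. The variable names are illustrative; the numbers come from the question.

```python
total = 12
caramel = 8   # includes chocolates that also have coconut
coconut = 6   # includes chocolates that also have caramel
both = 2

caramel_only = caramel - both   # 6
coconut_only = coconut - both   # 4

# The three regions of the Venn diagram should account for every chocolate.
regions_sum = caramel_only + coconut_only + both
print(caramel_only, coconut_only, both, regions_sum == total)
```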
10.
Question 10
There are 10 children in a class and they have all brought sandwiches for lunch: five children have sandwiches
with peanut butter, six children have sandwiches with jelly, and three children have sandwiches with both
peanut butter and jelly.
Find out how many children have sandwiches with neither peanut butter nor jelly and choose the image that
describes the situation best.
1 / 1 point
Correct
In this diagram, there are six sandwiches with jelly, five sandwiches with peanut butter, and three sandwiches
with both. This means that there are (5 + 6 - 3 = 8) eight sandwiches with either peanut butter or jelly. There are
a total of 10 children. Consider: 10 - 8 = 2. This means two children have neither peanut butter nor jelly in their
sandwiches.
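The same reasoning can be reproduced with set operations. In the sketch below, the numeric labels for the children are hypothetical and chosen only so that the group sizes match the question.

```python
children = set(range(1, 11))           # 10 children, labelled 1..10 (hypothetical labels)
peanut_butter = {1, 2, 3, 4, 5}        # five children
jelly = {3, 4, 5, 6, 7, 8}             # six children; 3, 4, and 5 have both

both = peanut_butter & jelly           # intersection
either = peanut_butter | jelly         # union
neither = children - either            # everyone else

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
assert len(either) == len(peanut_butter) + len(jelly) - len(both)
print(len(both), len(either), len(neither))
```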
Description
The analytical skill that involves breaking processes down into smaller steps and working with them in an orderly,
logical way
Skill
A technical mindset
Description
The qualities and characteristics associated with solving problems using facts
Skill
Analytical skills
Description
The analytical skill that involves how you organize information
Skill
Data design
Description
The analytical skill that has to do with how you group things into categories
Skill
Understanding context
Description
The analytical skill that involves managing the processes and tools used in data analysis
Skill
Data strategy
Fill in the blank: Data visualization involves using _____ to represent and present data. Select all that apply.
charts
Correct
Data visualization involves using graphs, maps, and charts to represent and present data.
maps
Correct
Data visualization involves using graphs, maps, and charts to represent and present data.
reports
graphs
Correct
Data visualization involves using graphs, maps, and charts to represent and present data.
Question 1
What practice involves identifying, defining, and solving a problem by using data in an organized, step-by-step manner?
1 / 1 point
Data design
Analytical thinking
Visualization
Context
Correct
Analytical thinking involves identifying and defining a problem, then solving it by using data in an organized, step-by-step
manner.
2.
Question 2
Which of the following are examples of data visualizations? Select all that apply.
1 / 1 point
Maps
Correct
Graphs, maps, and charts are used in data visualization.
Reports
Charts
Correct
Graphs, maps, and charts are used in data visualization.
Graphs
Correct
Graphs, maps, and charts are used in data visualization.
3.
Question 3
Gap analysis is used to examine and evaluate how a process currently works with the goal of getting to where you want to be
in the future.
1 / 1 point
True False
Correct
Gap analysis is used to examine and evaluate how a process currently works with the goal of getting to where you want to be
in the future.
4.
Question 4
Which aspect of analytical thinking involves being able to identify a relationship between two or more pieces of data?
1 / 1 point
Data design
Context
Correlation
Visualization
Correct
Correlation involves being able to identify a relationship between two or more pieces of data. A correlation is like a
relationship.
1.
Question 1
Fill in the blank: The analytical skill of ______ involves seeking out new experiences in order to gain knowledge.
1 / 1 point
curiosity
data strategy
understanding context
Correct
2.
Question 2
Adding descriptive headers to columns of data in a spreadsheet is an example of which analytical skill?
1 / 1 point
Understanding context
Data strategy
Curiosity
Correct
3.
Question 3
Fill in the blank: A data analyst with a technical mindset would break things down into smaller steps or pieces and work with
them in an orderly and ______ way.
1 / 1 point
curious
creative
logical
clever
Correct
4.
Question 4
As a recently promoted data scientist, one of your responsibilities is the implementation of data strategy. What would this
responsibility include?
0 / 1 point
Evaluating how a process works currently in order to get where you want to be in the future
Breaking things down into smaller steps or pieces and working with them in an orderly and logical way
5.
Question 5
Identifying a relationship between two or more pieces of data is known as what?
1 / 1 point
problem-orientation
detail-oriented thinking
visualization
correlation
Correct
6.
Question 6
Fill in the blank: In order to get to the root cause of a problem, a data analyst should ask “Why?” ________ times.
1 / 1 point
seven
three
five
four
Correct
7.
Question 7
An airport wants to make its luggage-handling process faster and simpler for travelers. A data analyst examines and
evaluates how the process works currently in order to achieve the goal of a more efficient process. What methodology do
they use?
1 / 1 point
Gap analysis
Data visualization
Strategy
Correct
8.
Question 8
Data analysts following data-driven decision-making use the analytical skills of curiosity, having a technical mindset, and
data design. What other two analytical skills would they employ? Select all that apply.
1 / 1 point
knowledge
data strategy
Correct
understanding context
Correct
efficiency
Phase 1
Ask: Define the problem and confirm stakeholder expectations
Phase 2
Prepare: Collect and store data for analysis
Phase 3
Process: Clean and transform data to ensure integrity
Phase 4
Analyze: Use data analysis tools to draw conclusions
Phase 5
Share: Interpret and communicate results to others to make data-driven decisions
Phase 6
Act: Put your insights to work in order to solve the original problem
1. Plan: Decide what kind of data is needed, how it will be managed, and who will be responsible for
it.
2. Capture: Collect or bring in data from a variety of different sources.
3. Manage: Care for and maintain the data. This includes determining how and where it is stored and
the tools used to do so.
4. Analyze: Use the data to solve problems, make decisions, and support business goals.
5. Archive: Keep relevant data stored for long-term and future reference.
6. Destroy: Remove data from storage and delete any shared copies of the data.
Warning: Be careful not to mix up or confuse the six stages of the data life cycle (Plan, Capture, Manage,
Analyze, Archive, and Destroy) with the six phases of the data analysis life cycle (Ask, Prepare,
Process, Analyze, Share, and Act). They shouldn't be used or referred to interchangeably.
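One way to keep the two lists apart is to model them as separate types. The sketch below uses Python's enum module; the class names DataLifeCycle and DataAnalysisProcess are hypothetical, and the member names follow the lists in these notes.

```python
from enum import Enum

class DataLifeCycle(Enum):
    """Stages data passes through from planning to destruction."""
    PLAN = 1
    CAPTURE = 2
    MANAGE = 3
    ANALYZE = 4
    ARCHIVE = 5
    DESTROY = 6

class DataAnalysisProcess(Enum):
    """Phases an analyst follows to answer a business question."""
    ASK = 1
    PREPARE = 2
    PROCESS = 3
    ANALYZE = 4
    SHARE = 5
    ACT = 6

# "Analyze" appears in both lists, but the two members are not interchangeable.
same = DataLifeCycle.ANALYZE == DataAnalysisProcess.ANALYZE
shared_names = {s.name for s in DataLifeCycle} & {p.name for p in DataAnalysisProcess}
print(same, shared_names)
```

Analyze is the only name the two lists share, which is probably why they are easy to confuse.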
The data life cycle provides a generic or common framework for how data is managed. You may recall that
variations of the data analysis life cycle were described in Origins of the data analysis process. The same
can be done for the data life cycle. The rest of this reading provides a glimpse of how government, finance,
and education institutions can view data life cycles a little differently.
U.S. Fish and Wildlife Service
1. Plan
2. Acquire
3. Maintain
4. Access
5. Evaluate
6. Archive
For more information, refer to U.S. Fish and Wildlife's Data Management Life Cycle page.
U.S. Geological Survey (USGS)
1. Plan
2. Acquire
3. Process
4. Analyze
5. Preserve
6. Publish/Share
Several cross-cutting or overarching activities are also performed during each stage of their life cycle:
Financial institutions
Financial institutions may take a slightly different approach to the data life cycle as described in The Data
Life Cycle, an article in Strategic Finance magazine:
1. Capture
2. Qualify
3. Transform
4. Utilize
5. Report
6. Archive
7. Purge
Harvard Business School (HBS)
One final data life cycle informed by Harvard University research has eight stages:
1. Generation
2. Collection
3. Processing
4. Storage
5. Management
6. Analysis
7. Visualization
8. Interpretation
For more information, refer to 8 Steps in the Data Life Cycle.
Key takeaway
Understanding the importance of the data life cycle will set you up for success as a data analyst. Individual
stages in the data life cycle will vary from company to company or by industry or sector. Historical data is
important to both the U.S. Fish and Wildlife Service and the USGS, so their data life cycle focuses on
archiving and backing up data. Harvard's interests are in research and teaching, so its data life cycle
includes visualization and interpretation even though these are more often associated with a data analysis
life cycle. The HBS data life cycle also doesn't call out a stage for purging or destroying data. In contrast,
the data life cycle for finance clearly identifies archive and purge stages. To sum it up, although data life
cycles vary, one data management principle is universal: govern how data is handled so that it is accurate,
secure, and available to meet your organization's needs.
Fill in the blank: During the _____ phase of the data life cycle, a business decides what kind of data it needs, how
it will be managed, who will be responsible for it, and the optimal outcomes.
1 / 1 point
capture
manage
archive
planning
Correct
During the planning phase of the data life cycle, a business decides what kind of data it needs, how it will be
managed, who will be responsible for it, and the optimal outcomes.
2.
Question 2
In the data life cycle, which phase involves gathering data from various sources and bringing it into the
organization?
1 / 1 point
Archive
Analyze
Capture
Manage
Correct
The capture phase involves gathering data from various sources and bringing it into the organization.
3.
Question 3
A data analyst finishes using a dataset, so they erase or shred the files in order to protect private information.
This is called archiving.
0 / 1 point
True
False
Incorrect
Erasing or shredding files describes the destroy phase of the data life cycle. Archiving involves storing files in a
place where they're still available.
4.
Question 4
A dairy farmer decides to open an ice cream shop on her farm. After surveying the local community about
people’s favorite flavors, she takes the data they provided and stores it in a secure hard drive so it can be
maintained safely on her computer. This is part of which phase of the data life cycle?
1 / 1 point
Analyze
Manage
Plan
Archive
Correct
This is the manage phase of the data life cycle. It deals with how data is cared for, how and where it’s stored, the
tools used to keep it safe and secure, and the actions taken to make sure it’s maintained properly.
5.
Question 5
After opening the ice cream shop on her farm, the same dairy farmer then surveys the local community about
people’s favorite flavors. She uses the data she collected to determine that the top five flavors are strawberry,
vanilla, chocolate, mint chip, and peanut butter. She feels confident in her decision to sell these flavors. This is
part of which phase of the data life cycle?
1 / 1 point
Capture
Plan
Analyze
Archive
Correct
This is part of the analyze phase. This phase involves using data to make smart decisions and support business
goals.
Key data analyst tools
As you are learning, the most common programs and solutions used by data analysts include
spreadsheets, query languages, and visualization tools. In this reading, you will learn more about each one.
You will cover when to use them, and why they are so important in data analytics.
Spreadsheets
Data analysts rely on spreadsheets to collect and organize data. Two popular spreadsheet applications you
will probably use a lot in your future role as a data analyst are Microsoft Excel and Google Sheets.
Query languages
Visualization tools
Data analysts use a number of visualization tools, like graphs, maps, tables, charts, and more. Two popular
visualization tools are Tableau and Looker.
These tools
worksheets
- Looker communicates directly with a database, allowing you to connect your data right to the visual
tool you choose
A career as a data analyst also involves using programming languages, like R and Python, which are used a
lot for statistical analysis, visualization, and other data analysis.
Key takeaway
You have a lot of tools as a data analyst. This is a first glance at the possibilities, and you will explore many
of these tools in-depth throughout this program.
Depending on which phase of the data analysis process you’re in, you will need to use different tools. For
example, if you are focusing on creating complex and eye-catching visualizations, then the visualization
tools we discussed earlier are the best choice. But if you are focusing on organizing, cleaning, and
analyzing data, then you will probably be choosing between spreadsheets and databases using queries.
Spreadsheets and databases both offer ways to store, manage, and use data. The basic content for both
tools are sets of values. Yet, there are some key differences, too:
Spreadsheets | Databases
Software applications | Data stores - accessed using a query language (e.g. SQL)
Structure data in a row and column format | Structure data using rules and relationships
Organize information in cells | Organize information in complex collections
Provide access to a limited amount of data | Provide access to huge amounts of data
Manual data entry | Strict and consistent data entry
Generally, one user at a time | Multiple users
Controlled by the user | Controlled by a database management system
You don’t have to choose one or the other because each serves its own purpose. Generally, data analysts
work with a combination of the two, as both tools are very useful in data analytics. For example, you can
store data in a database, then export it to a spreadsheet for analysis. Or, if you are collecting information in
a spreadsheet, and it becomes too much for that particular platform, you can import it into a database.
And, later in this course, you will learn about programming languages like R that give you even greater
control of your data, its analysis, and the visualizations you create.
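The database-to-spreadsheet handoff described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in database, and the flavor_votes table and its rows are hypothetical. The result is written as CSV text, a format that spreadsheet applications can import.

```python
import csv
import io
import sqlite3

# Hypothetical survey table; the name and rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flavor_votes (flavor TEXT, votes INTEGER)")
conn.executemany(
    "INSERT INTO flavor_votes VALUES (?, ?)",
    [("vanilla", 37), ("strawberry", 42), ("chocolate", 35)],
)

# Step 1: pull the data out of the database with a query.
cursor = conn.execute("SELECT flavor, votes FROM flavor_votes ORDER BY votes DESC")
header = [column[0] for column in cursor.description]
rows = cursor.fetchall()
conn.close()

# Step 2: write the result as CSV text, a format spreadsheets can import.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice you would write csv_text to a file and open that file in your spreadsheet application.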
You are in the plan stage of the data lifecycle for your current project. What action might you take during this stage?
1 / 1 point
Decide what kind of data is needed.
Shred paper files.
Validate insights provided by analysts.
Use a formula to perform calculations.
Correct
2.
Question 2
Fill in the blank: Shredding paper files and using data-erasure software would be actions taken by a data analyst in the
_________ stage of the data lifecycle.
1 / 1 point
3.
Question 3
A data analyst uses a spreadsheet function to aggregate data. Then, they add a pivot table to show totals from least to
greatest. This would happen during which phase of the data life cycle?
1 / 1 point
4.
Question 4
Fill in the blank: Data analysis has six process steps whereas the data life cycle has six _____.
1 / 1 point
5.
Question 5
A company takes insights provided by its data analytics team, validates them, and finalizes a strategy. They then
implement a plan to solve the original business problem. This describes which step of the data analysis process?
1 / 1 point
6.
Question 6
In data analysis, a function is a preset command whereas a formula is a set of instructions used to carry out a specific
calculation.
1 / 1 point
True
False
Correct
7.
Question 7
In the course of their current project, a data analyst uses a query to retrieve and request information. Which of the
following are options the analyst can use a query for? Select all that apply.
0.25 / 1 point
Visualizing data
This should not be selected
Review the video on the data analyst’s toolkit.
Deleting data
Collecting data
This should not be selected
Review the video on the data analyst’s toolkit.
Updating data
Correct
8.
Question 8
A data analyst wants to retrieve information from a database. Select the correct tool from the data analyst’s toolkit.
1 / 1 point
Fill in the blank: A data analyst uses a SQL query to retrieve information from a
database. They add a WHERE statement to _____ the data based on certain conditions.
filter
sort
categorize
copy
Correct
They add a WHERE statement to filter the data based on certain conditions.
SQL Guide: Getting started
Just as humans use different languages to communicate with others, so do computers.
Structured Query Language (or SQL, often pronounced “sequel”) enables data analysts to
talk to their databases. SQL is one of the most useful data analyst tools, especially when
working with large datasets in tables. It can help you investigate huge databases, track down
text (referred to as strings) and numbers, and filter for the exact kind of data you need—much
faster than a spreadsheet can.
If you haven’t used SQL before, this reading will help you learn the basics so you can
appreciate how useful SQL is and how useful SQL queries are in particular. You will be writing
SQL queries in no time at all.
What is a query?
A query is a request for data or information from a database. When you query databases, you
use SQL to communicate your question or request. You and the database can always
exchange information as long as you speak the same language.
Every programming language, including SQL, follows a unique set of guidelines known as
syntax. Syntax is the predetermined structure of a language that includes all required words,
symbols, and punctuation, as well as their proper placement. As soon as you enter your
search criteria using the correct syntax, the query starts working to pull the data you’ve
requested from the target database.
Next, enter the table name after the FROM; the table columns you want after the SELECT; and,
finally, the conditions you want to place on your query after the WHERE. Put each of these on a
new, indented line beneath its keyword.
Following this method each time makes it easier to write SQL queries. It can also help you make
fewer syntax errors.
Example of a query
Here is how a simple query would appear in BigQuery, a data warehouse on the Google Cloud
Platform.
The query uses three commands (SELECT, FROM, and WHERE) to locate customers with the first name
Tony. It returns a single column:
first_name
Tony
Tony
Tony
As you can conclude, this query had the correct syntax, but wasn't very useful after the data
was returned.
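A minimal sketch of a single-column SELECT, FROM, WHERE query of this kind follows. It runs from Python against an in-memory SQLite database rather than BigQuery, and the customer_name table and its rows are assumptions made up for the illustration. Each entry sits on its own indented line beneath its keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows; the names are assumptions for this sketch.
conn.execute(
    "CREATE TABLE customer_name (customer_id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customer_name VALUES (?, ?, ?)",
    [(1, "Tony", "Magnolia"), (2, "Tony", "Reyes"), (3, "Ana", "Chavez"), (4, "Tony", "Chen")],
)

query = """
SELECT
    first_name
FROM
    customer_name
WHERE
    first_name = 'Tony'
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

Every returned row contains only the value Tony, which is why a query like this is valid but not very informative.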
If you are requesting multiple data fields from a table, you need to include these columns in
your SELECT command. Each column is separated by a comma as shown below:
Here is an example of how it would appear in BigQuery:
The above query uses three commands to locate customers with the first name Tony.
Notice that unlike the SELECT command that uses a comma to separate
fields/variables/parameters, the WHERE command uses the AND statement to connect
conditions. As you become a more advanced writer of queries, you will make use of other
connectors/operators such as OR and NOT.
Here is a BigQuery example with multiple fields used in a WHERE clause:
The above query uses three commands to locate customers with a valid (greater than 0)
customer ID whose first name is Tony and last name is Magnolia.
If only one customer is named Tony Magnolia, the results from the query could be:
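The multi-column query with several WHERE conditions can be sketched the same way. Again this is an illustration, not the BigQuery example from the reading: it uses Python's sqlite3 module, and the customer_name table, its rows, and the ID values are hypothetical. Columns in SELECT are separated by commas, while conditions in WHERE are joined with AND.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, invented for this sketch.
conn.execute(
    "CREATE TABLE customer_name (customer_id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customer_name VALUES (?, ?, ?)",
    [
        (12, "Tony", "Magnolia"),
        (0, "Tony", "Magnolia"),   # invalid ID, excluded by customer_id > 0
        (7, "Tony", "Reyes"),      # wrong last name
        (8, "Ana", "Magnolia"),    # wrong first name
    ],
)

query = """
SELECT
    customer_id,
    first_name,
    last_name
FROM
    customer_name
WHERE
    customer_id > 0
    AND first_name = 'Tony'
    AND last_name = 'Magnolia'
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

Only the row that satisfies all three conditions is returned.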
Key takeaway
The most important thing to remember is how to use SELECT, FROM, and WHERE in a query.
Queries with multiple fields will become simpler after you practice writing your own SQL
queries later in the program.
Endless SQL possibilities
You have learned that a SQL query uses SELECT, FROM, and WHERE to specify the data to be returned
from the query. This reading provides more detailed information about formatting queries, using WHERE
conditions, selecting all columns in a table, adding comments, and using aliases. All of these make it easier
for you to understand (and write) queries to put SQL in action. The last section of this reading provides an
example of what a data analyst would do to pull employee data for a project.
Notice that the SQL statement shown above has a semicolon at the end. The semicolon is a statement
terminator and is part of the American National Standards Institute (ANSI) SQL-92 standard, which is a
recommended common syntax for adoption by all SQL databases. However, not all SQL databases have
adopted or enforce the semicolon, so it’s possible you may come across some SQL statements that aren’t
terminated with a semicolon. If a statement works without a semicolon, it’s fine.
WHERE conditions
In the query shown above, the SELECT clause identifies the column you want to pull data from by name,
field1, and the FROM clause identifies the table where the column is located by name, table. Finally, the
WHERE clause narrows your query so that the database returns only the data with an exact value match or
the data that matches a certain condition that you want to satisfy.
For example, if you are looking for a specific customer with the last name Chavez, the WHERE clause would
be:
However, if you are looking for all customers with a last name that begins with the letters “Ch," the WHERE
clause would be:
You can conclude that the LIKE operator is very powerful because it allows you to tell the database to look
for a certain pattern! The percent sign (%) is used as a wildcard to match zero or more characters. In the
example above, both Chavez and Chen would be returned. Note that in some databases an asterisk (*) is
used as the wildcard instead of a percent sign (%).
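To compare an exact match with a pattern match, here is a small sketch using Python's sqlite3 module. The column name field1 comes from the reading; the table name customers and the rows are assumptions for the illustration. Note that SQLite's LIKE ignores case for ASCII letters by default, which may differ from the database you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the column name field1 follows the reading, while the
# table name "customers" and the rows are assumptions for this sketch.
conn.execute("CREATE TABLE customers (field1 TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("Chavez",), ("Chen",), ("Cruz",), ("Smith",)],
)

# Exact value match: only Chavez.
exact = conn.execute(
    "SELECT field1 FROM customers WHERE field1 = 'Chavez'"
).fetchall()

# Pattern match: every last name that begins with Ch.
pattern = conn.execute(
    "SELECT field1 FROM customers WHERE field1 LIKE 'Ch%' ORDER BY field1"
).fetchall()
conn.close()
print(exact, pattern)
```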
In the example, if you replace SELECT field1 with SELECT * , you would be selecting all of the columns in
the table instead of the field1 column only. From a syntax point of view, it is a correct SQL statement, but
you should use the asterisk (*) sparingly and with caution. Depending on how many columns a table has,
you could be selecting a tremendous amount of data. Selecting too much data can cause a query to run
slowly.
Comments
Some tables aren’t designed with descriptive enough naming conventions. In the example, field1 was the
column for a customer’s last name, but you wouldn’t know it by the name. A better name would have been
something such as last_name. In these cases, you can place comments alongside your SQL to help you
remember what the name represents. Comments are text placed between certain characters, /* and */, or
after two dashes (--) as shown below.
Comments can also be added outside of a statement as well as within a statement. You can use this
flexibility to provide an overall description of what you are going to do, step-by-step notes about how you
achieve it, and why you set different parameters/conditions.
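Both comment styles can be seen in the sketch below, which places a comment before the statement and two more inside it. It assumes a hypothetical customers table and runs through Python's sqlite3 module; SQLite accepts both the two-dash and the /* */ forms, but check which forms your own database supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (field1 TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO customers VALUES (?)", [("Chavez",), ("Chen",), ("Smith",)]
)

query = """
-- Goal: list customers whose last name starts with Ch
SELECT
    field1      /* field1 holds the customer's last name */
FROM
    customers   -- hypothetical table name
WHERE
    field1 LIKE 'Ch%'
ORDER BY
    field1
"""
# The database ignores the comments and runs only the SQL.
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```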
The more comfortable you get with SQL, the easier it will be to read and understand queries at a glance.
Still, it never hurts to have comments in a query to remind yourself of what you’re trying to do. This also
makes it easier for others to understand your query if your query is shared.
As you develop your skills professionally, depending on the SQL database you use, you can pick the
appropriate comment delimiting symbols you prefer and stick with those as a consistent style. As your
queries become more and more complex, the practice of adding helpful comments will save you a lot of
time and energy when you need to understand queries that you may have written months or years prior.
Aliases
You can also make it easier on yourself by assigning a new name or alias to the column or table names to
make them easier to work with (and avoid the need for comments). This is done with a SQL AS clause. In
the example below, the alias last_name has been assigned to field1 and the alias customers assigned to
table. These aliases are good for the duration of the query only. An alias doesn’t change the actual name of
a column or table in the database.
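A short sketch of column and table aliases, under stated assumptions: it runs on SQLite through Python's sqlite3 module, and the table name tbl stands in for the reading's generic table (the rows are invented). The second half checks that the alias did not rename the stored column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "tbl" stands in for the reading's generic table; name and rows are assumptions.
conn.execute("CREATE TABLE tbl (field1 TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [("Chavez",), ("Smith",)])

cursor = conn.execute(
    """
    SELECT
        customers.field1 AS last_name
    FROM
        tbl AS customers
    ORDER BY
        last_name
    """
)
result_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()

# The alias lasts only for that query: the stored column is still field1.
stored_columns = [row[1] for row in conn.execute("PRAGMA table_info(tbl)")]
conn.close()
print(result_columns, rows, stored_columns)
```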
You want to pull all the columns: empID, firstName, lastName, jobCode, and salary. Because you know
the database isn’t that big, instead of entering each column name in the SELECT clause, you use SELECT
*. This will select all the columns from the Employee table in the FROM clause.
Now, you can get more specific about the data you want from the Employee table. If you want all the data
about employees working in the SFI job code, you can use a WHERE clause to filter out the data based on
this additional requirement.
A portion of the resulting data returned from the SQL query might look like
this:
You create a SQL query similar to below, where <> means "does not equal":
The resulting data from the SQL query might look like the following (interns with the job code INT aren't
returned):
Pulling the data, analyzing it, and implementing a solution might ultimately help improve employee
satisfaction and loyalty. That makes SQL a pretty powerful tool.
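The two filters described in this example (employees in the SFI job code, and everyone except interns) can be sketched as follows. The column names come from the reading; the rows are hypothetical, and the queries run on SQLite through Python's sqlite3 module rather than on a company database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(empID INTEGER, firstName TEXT, lastName TEXT, jobCode TEXT, salary INTEGER)"
)
# Hypothetical employees, invented for this sketch.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ana", "Lopez", "SFI", 30000),
        (2, "Ben", "Okafor", "INT", 12000),
        (3, "Cho", "Kim", "SFI", 28000),
        (4, "Dee", "Patel", "FTE", 52000),
    ],
)

# All columns for employees in the SFI job code.
sfi_ids = [
    row[0]
    for row in conn.execute("SELECT * FROM Employee WHERE jobCode = 'SFI' ORDER BY empID")
]

# <> means "does not equal": everyone except interns (job code INT).
non_intern_ids = [
    row[0]
    for row in conn.execute("SELECT * FROM Employee WHERE jobCode <> 'INT' ORDER BY empID")
]
conn.close()
print(sfi_ids, non_intern_ids)
```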
This is Part 1 of a series of PostgreSQL cheat sheets and will cover SELECT, FROM, WHERE, GROUP
BY, HAVING, ORDER BY, and LIMIT.
The basic structure of a query pulling results from a single table is as follows.
SELECT
COLUMN_NAME(S)
FROM
TABLE_NAME
WHERE
CONDITION
GROUP BY
COLUMN_NAME(S)
HAVING
AGGREGATE_CONDITION
ORDER BY
COLUMN_NAME
LIMIT
N
What is SQL?
SQL (pronounced “ess-que-el”) stands for Structured Query Language. SQL is used to
communicate with a database. It is the standard language for relational database management
systems. SQL statements are used to perform tasks such as update data on a database or retrieve
data from a database.
An RDBMS organizes data into tables with rows and columns. The term relational means that values within each table
are related to each other.
• Columns — also known as fields, have a descriptive name and specific data type.
What is PostgreSQL?
PostgreSQL is a general-purpose, open-source relational database management system. Its project describes it as
the world's most advanced open-source relational database.
Other common database management systems are MySQL, Oracle, IBM Db2, and MS Access.
Let’s begin!
SELECT
The SELECT statement is used to select data from a database. The data returned is stored in a
result table, called the result-set.
Specific columns
SELECT
COLUMN_1,
COLUMN_2
FROM
TABLE_NAME
All columns
Using the * you can query every column in your table
SELECT *
FROM
TABLE_NAME
DISTINCT columns
Finding all the unique values in a column
SELECT
DISTINCT COLUMN_NAME
FROM
TABLE_NAME
WHERE
Using the WHERE clause, you can create conditions that keep the values you want and filter out the ones you don't.
Conditions
There are a variety of conditions that can be used in SQL. Below are some examples for a table
that consists of students’ grades in school. You only need to specify WHERE once; for the sake of the
example, I have included WHERE in each step.
WHERE FIRSTNAME = 'BOB' -- exact match
WHERE FIRSTNAME != 'BOB' -- everything excluding BOB
WHERE NOT FIRSTNAME = 'BOB' -- everything excluding BOB
WHERE FIRSTNAME IN ('BOB', 'JASON') -- either condition is met
WHERE FIRSTNAME NOT IN ('BOB', 'JASON') -- excludes both values
WHERE FIRSTNAME = 'BOB' AND LASTNAME = 'SMITH' -- both conditions
WHERE FIRSTNAME = 'BOB' OR FIRSTNAME = 'JASON' -- either condition
WHERE GRADES > 90 -- greater than 90
WHERE GRADES < 90 -- less than 90
WHERE GRADES >= 90 -- greater than or equal to 90
WHERE GRADES <= 90 -- less than or equal to 90
WHERE SUBJECT IS NULL -- returns rows where SUBJECT is missing
WHERE SUBJECT IS NOT NULL -- returns rows where SUBJECT is not missing
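A few of these conditions can be tried against a small table. The sketch below is an illustration only: it uses SQLite through Python's sqlite3 module instead of PostgreSQL, the STUDENTS rows are invented, and names_where is a hypothetical helper written for this example. The helper pastes fixed, hand-written conditions into the query text, which is fine for a demo but not for untrusted input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical STUDENTS table; the rows are invented for this sketch.
conn.execute(
    "CREATE TABLE STUDENTS (FIRSTNAME TEXT, LASTNAME TEXT, SUBJECT TEXT, GRADES INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?, ?, ?)",
    [
        ("BOB", "SMITH", "MATH", 95),
        ("JASON", "LEE", "MATH", 88),
        ("MARIA", "DIAZ", "ART", 90),
        ("KIM", "PARK", None, 72),  # None is stored as NULL (missing subject)
    ],
)


def names_where(condition):
    """Return first names matching a fixed, hand-written WHERE condition."""
    sql = f"SELECT FIRSTNAME FROM STUDENTS WHERE {condition} ORDER BY FIRSTNAME"
    return [row[0] for row in conn.execute(sql)]


in_list = names_where("FIRSTNAME IN ('BOB', 'JASON')")
both = names_where("FIRSTNAME = 'BOB' AND LASTNAME = 'SMITH'")
at_least_90 = names_where("GRADES >= 90")
missing_subject = names_where("SUBJECT IS NULL")
has_subject = names_where("SUBJECT IS NOT NULL")
print(in_list, both, at_least_90, missing_subject, has_subject)
```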
Conditions — Wildcards
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column. The
pattern inside the quotes ('') is case-sensitive, so upper and lower case matter.
There are two wildcards often used in conjunction with the LIKE operator:
• % matches zero, one, or multiple characters
• _ matches exactly one character
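The two standard SQL wildcards for LIKE are the percent sign (%), which matches any run of characters, and the underscore (_), which matches exactly one character. The sketch below shows both on a hypothetical STUDENTS table. It runs on SQLite through Python's sqlite3 module, where LIKE ignores case for ASCII letters by default (unlike PostgreSQL), so the sample data is all upper case to give the same result either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows, invented for this sketch.
conn.execute("CREATE TABLE STUDENTS (FIRSTNAME TEXT)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?)",
    [("BOB",), ("BOBBY",), ("ROB",), ("JASON",)],
)

# % : BO followed by any number of characters (including none).
starts_with_bo = [
    row[0]
    for row in conn.execute(
        "SELECT FIRSTNAME FROM STUDENTS WHERE FIRSTNAME LIKE 'BO%' ORDER BY FIRSTNAME"
    )
]

# _ : exactly one character, then OB.
one_char_then_ob = [
    row[0]
    for row in conn.execute(
        "SELECT FIRSTNAME FROM STUDENTS WHERE FIRSTNAME LIKE '_OB' ORDER BY FIRSTNAME"
    )
]
conn.close()
print(starts_with_bo, one_char_then_ob)
```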
GROUP BY
The GROUP BY clause helps calculate summary values for each value of the chosen column. It is often used
with aggregate functions (COUNT, SUM, AVG, MAX, MIN).
SELECT
SUBJECT,
AVG(GRADES)
FROM
STUDENTS
GROUP BY
SUBJECT
The query above will group each subject and calculate the average grades.
SELECT
SUBJECT,
COUNT(*)
FROM
STUDENTS
GROUP BY
SUBJECT
The above query will calculate the number (count) of students in each subject.
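Both GROUP BY queries can be run on a small sample to see the shape of the result. This sketch assumes a hypothetical STUDENTS table with invented grades and uses SQLite through Python's sqlite3 module instead of PostgreSQL. An ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows, invented for this sketch.
conn.execute("CREATE TABLE STUDENTS (FIRSTNAME TEXT, SUBJECT TEXT, GRADES INTEGER)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?, ?)",
    [
        ("BOB", "MATH", 95),
        ("JASON", "MATH", 85),
        ("MARIA", "ART", 90),
        ("KIM", "ART", 92),
        ("LEE", "ART", 94),
    ],
)

# One output row per subject, with the average grade for that subject.
averages = conn.execute(
    "SELECT SUBJECT, AVG(GRADES) FROM STUDENTS GROUP BY SUBJECT ORDER BY SUBJECT"
).fetchall()

# One output row per subject, with the number of rows in that subject.
counts = conn.execute(
    "SELECT SUBJECT, COUNT(*) FROM STUDENTS GROUP BY SUBJECT ORDER BY SUBJECT"
).fetchall()
conn.close()
print(averages, counts)
```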
HAVING
The HAVING clause is similar to WHERE but is meant for filtering on the results of aggregate functions.
The HAVING clause comes after the GROUP BY, whereas the WHERE clause comes before the GROUP
BY.
If we wanted to find which subject had an average grade of 90 or more, we could use the
following.
SELECT
SUBJECT,
AVG(GRADES)
FROM
STUDENTS
GROUP BY
SUBJECT
HAVING
AVG(GRADES) >= 90
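The difference between WHERE and HAVING is easiest to see side by side. In this sketch (hypothetical STUDENTS rows, SQLite through Python's sqlite3 module rather than PostgreSQL), the first query filters groups by their average, and the second also filters individual rows before they are grouped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows, invented for this sketch.
conn.execute("CREATE TABLE STUDENTS (SUBJECT TEXT, GRADES INTEGER)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?)",
    [
        ("MATH", 95), ("MATH", 85),
        ("ART", 90), ("ART", 92), ("ART", 94),
        ("HISTORY", 70), ("HISTORY", 80),
    ],
)

# HAVING filters the groups: only subjects whose average is 90 or more.
having_only = conn.execute(
    """
    SELECT SUBJECT, AVG(GRADES)
    FROM STUDENTS
    GROUP BY SUBJECT
    HAVING AVG(GRADES) >= 90
    ORDER BY SUBJECT
    """
).fetchall()

# WHERE runs first and drops individual rows below 90, so the MATH
# average is now computed from the single remaining grade.
where_then_having = conn.execute(
    """
    SELECT SUBJECT, AVG(GRADES)
    FROM STUDENTS
    WHERE GRADES >= 90
    GROUP BY SUBJECT
    HAVING AVG(GRADES) >= 90
    ORDER BY SUBJECT
    """
).fetchall()
conn.close()
print(having_only, where_then_having)
```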
ORDER BY
Using the ORDER BY clause, you can specify how you want your values sorted. The example below continues with
the STUDENTS table from earlier.
SELECT
*
FROM
STUDENTS
ORDER BY
GRADES DESC
By default, ORDER BY sorts in ascending order. If you want descending order,
you need to specify DESC after the column name.
LIMIT
In Postgres, we can use the LIMIT clause to control how many rows the query returns.
For example, the following query finds the 3 students with the highest grades.
SELECT
*
FROM
STUDENTS
ORDER BY
GRADES DESC
LIMIT
3
Because ORDER BY GRADES DESC puts the students with the highest grades on top, limiting the
result to 3 rows returns the top 3.
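Sorting and limiting can be checked on sample data. The sketch below assumes a hypothetical STUDENTS table with invented grades and runs on SQLite through Python's sqlite3 module; LIMIT is written the same way there as in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows, invented for this sketch.
conn.execute("CREATE TABLE STUDENTS (FIRSTNAME TEXT, GRADES INTEGER)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?)",
    [("BOB", 95), ("JASON", 88), ("MARIA", 99), ("KIM", 72), ("LEE", 91)],
)

# Default sort order is ascending, so the lowest grade comes first.
ascending = conn.execute("SELECT * FROM STUDENTS ORDER BY GRADES").fetchall()

# DESC puts the highest grades first; LIMIT 3 keeps only the top three rows.
top_three = conn.execute(
    "SELECT * FROM STUDENTS ORDER BY GRADES DESC LIMIT 3"
).fetchall()
conn.close()
print(ascending[0], top_three)
```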
1.
Question 1
Select all columns from the employee table
Select the LastName column from the employee table
Select all data that meets the criteria as stated in the query
Select all data that meets the criteria as stated in the query, then multiply it
Correct
SELECT * tells the database to select all columns from the employee table. The criteria in the WHERE clause tells
the database what data in those columns the query should return.
2.
Question 2
3.
Question 3
All data from the FTE table, where the employee's LastName is James.
All data from the jobCode table, where the jobCode is FTE and the employee has any last name other than
James.
All data from the employee table, where the jobCode is FTE and the employee has any last name other than
James.
All data from the employee table, where the jobCode is FTE and the last name is James.
Correct
This query will select all data from the employee table, where the jobCode is FTE and the last name is James.
4.
Question 4
You are working with a database table that contains data about music artists. The table is named artist. You
want to review all the columns in the table.
You write the SQL query below. Add a FROM clause that will retrieve the data from the artist table.
| artist_id | name |
+-----------+---------------------------------+
| 1 | AC/DC |
| 2 | Accept |
| 3 | Aerosmith |
| 4 | Alanis Morissette |
| 5 | Alice In Chains |
| 7 | Apocalyptica |
| 8 | Audioslave |
| 9 | BackBeat |
| 10 | Billy Cobham |
| 12 | Black Sabbath |
| 13 | Body Count |
| 14 | Bruce Dickinson |
| 15 | Buddy Guy |
| 16 | Caetano Veloso |
| 17 | Chico Buarque |
| 19 | Cidade Negra |
| 20 | Cláudio Zoli |
| 21 | Various Artists |
| 22 | Led Zeppelin |
| 24 | Marcos Valle |
+-----------+---------------------------------+
2 9 5 8
Correct
The clause FROM artist will retrieve the data from the artist table. The complete query is SELECT * FROM
artist. The FROM clause specifies which database table to select data from. There are two columns in the artist
table.
5.
Question 5
You are working with a database table that contains data about music albums. You are only interested in data
related to the album with ID number 277. The album IDs are listed in the album_id column from the album
table. You write the SQL query below. Add a WHERE clause that will return only data about the album with ID
number 277.
Vivaldi: The Four Seasons
Beethoven: Piano Sonatas
Bach: Goldberg Variations
Mozart: Chamber Music
Correct
The clause WHERE album_id = 277 will return only data about the album with ID number 277. The
complete query is SELECT * FROM album WHERE album_id = 277. The WHERE clause filters results
that meet certain conditions. The WHERE clause includes the name of the column, an equals sign, and the
value(s) in the column to include. The name of the album with ID number 277 is Bach: Goldberg Variations.
While reviewing the data you notice a pattern among those who visit the company’s website most
frequently: geography and larger amounts spent on purchases. With further analysis, this information
might explain why sales are so strong right now in the northeast—and help your company find ways to
make them even stronger through the new website.
• You can use the visualization tools in your spreadsheet to create simple visualizations such as line
and bar charts.
• You can use more advanced tools such as Tableau that allow you to integrate data into dashboard-
style visualizations.
• If you’re working with the programming language R you can use the visualization tools in RStudio.
Your choice of visualization will be driven by a variety of factors, including the size of your data and the
process you used for analyzing your data (spreadsheets, databases and queries, or programming languages). For
now, just consider the basics.
But sometimes you need a more powerful tool to truly bring your data to life. Tableau and RStudio are two
examples of widely used platforms that can help you plan, create, and present effective and compelling
data visualizations.
You can start exploring Tableau from the How-to Video resources. Tableau Public is free, easy to use, and
full of helpful information. The Resources page is a one-stop-shop for how-to videos, examples, and
datasets for you to practice with. To explore what other data analysts are sharing on Tableau, visit the Viz
of the Day page where you will find beautiful visuals ranging from the Hunt for (Habitable) Planets to Who’s
Talking in Popular Films.
You could easily spend days exploring all the resources provided at RStudio.com, but the RStudio
Cheatsheets and the RStudio Visualize Data Primer are great places to start. When you have more time,
check out the webinars and videos which offer advice and helpful perspectives for both beginners and
advanced users.
Key takeaway
The best data analysts use lots of different tools and methods to visualize and share their data. As you
continue learning more about data visualization throughout this course, be sure to stay curious, research
different options, and continuously test new programs and platforms to help you make the most of your
data.
1.
Question 1
In the following spreadsheet, the column labels in row 1 are called what?
- A B C D
1 / 1 point
2.
Question 2
Fill in the blank: In the following spreadsheet, the ________ of High Point describes all of the data in row
10.
- A B C D
1 / 1 point
3.
Question 3
A data analyst wants to list the cities in this spreadsheet alphabetically, instead of numerically. They can
use the feature sort range to do this.
- A B C D
3 2 Raleigh 474,069 Wake (seat), Durham
5 4 Durham 278,993 Durham (seat), Wake, Orange
8 7 Cary 170,282 Wake, Chatham
10 9 High Point 112,791 Guilford, Randolph, Davidson, Forsyth
True
False
Correct
4.
Question 4
To find the average population of the cities in this spreadsheet, what is the correct AVERAGE function
syntax? Type your answer below.
- A B C D
1 Rank Name Population County
2 1 Charlotte 885,708 Mecklenburg
3 2 Raleigh 474,069 Wake (seat), Durham
5.
Question 5
You are working with a database table named genre that contains data about music genres. You want to
review all the columns in the table.
You write the SQL query below. Add a FROM clause that will retrieve the data from the genre table.
SELECT *
FROM genre
+----------+--------------------+
| genre_id | name |
+----------+--------------------+
| 1 | Rock |
| 2 | Jazz |
| 3 | Metal |
| 6 | Blues |
| 7 | Latin |
| 8 | Reggae |
| 9 | Pop |
| 10 | Soundtrack |
| 11 | Bossa Nova |
| 12 | Easy Listening |
| 13 | Heavy Metal |
| 14 | R&B/Soul |
| 15 | Electronica/Dance |
| 16 | World |
| 17 | Hip Hop/Rap |
| 18 | Science Fiction |
| 19 | TV Shows |
| 21 | Drama |
| 22 | Comedy |
| 23 | Alternative |
| 24 | Classical |
| 25 | Opera |
+----------+--------------------+
6.
Question 6
You are working with a database table that contains invoice data. The customer_id column lists the ID
number for each customer. You are interested in invoice data for the customer with ID number 7.
You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID
number 7.
SELECT
*
FROM
invoice
WHERE customer_id = 7
7.
Question 7
Which of the following best describes a bar chart?
1 / 1 point
It is a visualization that uses a circle which is divided into wedges sized based on numerical
proportion.
It is a visualization that represents data with columns, or bars, the heights of which are proportional to
the values that they represent.
It is a visualization that plots a sequence of points and connects them with straight lines or
curves.
8.
Question 8
A data analyst has to demonstrate how the population in Charlotte has increased over time. They create
the chart below. What is this type of chart called?
1 / 1 point
Area chart Column chart Bar chart Line chart
Correct
Data analyst roles and job descriptions
As technology continues to advance, being able to collect and analyze the data from that new technology
has become a huge competitive advantage for a lot of businesses. Everything from websites to social
media feeds is filled with fascinating data that, when analyzed and used correctly, can help inform
business decisions. A company’s ability to thrive now often depends on how well it can leverage data,
apply analytics, and implement new technologies.
This is why skilled data analysts are some of the most sought-after professionals in the world. A study
conducted by IBM estimates that there are over 380,000 job openings in the Data Analytics field in the
United States*. Because the demand is so strong, you’ll be able to find job opportunities in virtually any
industry. Do a quick search on any major job site and you’ll notice that every type of business, from zoos to
health clinics to banks, is seeking talented data professionals. Even if the job title doesn’t use the exact
term “data analyst,” the job description for most roles involving data analysis will likely include a lot of the
skills and qualifications you’ll gain by the end of this program. In this reading, we’ll explore some of the
data analyst-related roles you might find in different companies and industries.
To name a few others that sound similar but may not be the same role:
• Business analyst — analyzes data to help businesses improve processes, products, or services
• Data analytics consultant — analyzes the systems and models for using data
• Data engineer — prepares and integrates data from different sources for analytical use
• Data scientist — uses expert skills in technology and social science to find trends through data
analysis
• Data specialist — organizes or converts data for use in databases or software systems
• Operations analyst — analyzes data to assess the performance of business operations and
workflows
Data analysts, data scientists, and data specialists sound very similar but focus on different tasks. As you
start to browse job listings online, you might notice that companies’ job descriptions seem to combine
these roles or look for candidates who may have overlapping skills. The fact that companies often blur the
lines between them means that you should take special care when reading the job descriptions and the
skills required.
The breakdown below illustrates some of the overlap and distinctions between them:
Decoding the job description
Data analysts
- Problem solving: Use existing tools and methods to solve problems with existing types of data
- Analysis: Analyze collected data to help stakeholders make better decisions
- Other relevant skills: Database queries, data visualization, dashboards, reports, and spreadsheets
Data scientists
- Problem solving: Invent new tools and models, ask open-ended questions, and collect new types of data
- Analysis: Analyze and interpret complex data to make business predictions
- Other relevant skills: Advanced statistics, machine learning, deep learning, data optimization, and
programming
Data specialists
- Problem solving: Use in-depth knowledge of databases as a tool to solve problems and manage data
- Analysis: Organize large volumes of data for use in data analytics or business operations
- Other relevant skills: Data manipulation, information security, data models, scalability of data, and
disaster recovery
We used the role of data specialist as one example of many specializations within data analytics, but you
don’t have to become a data specialist! Specializations can take a number of different turns. For example,
you could specialize in developing data visualizations and likewise go very deep into that area.
Other industry-specific specialist positions that you might come across in your data analyst job search
include:
• Marketing analyst — analyzes market conditions to assess the potential sales of products and
services
• HR/payroll analyst — analyzes payroll data for inefficiencies and errors
• Financial analyst — analyzes financial status by collecting, monitoring, and reviewing data
• Risk analyst — analyzes financial documents, economic conditions, and client data to help
companies determine the level of risk involved in making a particular business decision
• Healthcare analyst — analyzes medical data to improve the business aspect of hospitals and
medical facilities
Key takeaway
Explore data analyst job descriptions and industry-specific analyst roles. You will start to get a better sense
of the different data analyst jobs out there and which types of roles you’re most interested in pursuing.