Data Analytics Notes (Autorecovered)

The document outlines a data analytics course that covers foundational concepts, analytical thinking, and the data analysis process, including tools and career opportunities for data analysts. It emphasizes the importance of data-driven decision-making and ethical data handling, illustrated through a case study on improving employee retention using data analysis. Additionally, it highlights the risks of relying solely on gut instinct in decision-making, advocating for a structured approach to data analysis.

Uploaded by

rudy111995
Copyright
© All Rights Reserved

DATA ANALYTICS NOTES

Course content
Course 1 – Foundations: Data, Data, Everywhere

1. Introducing data analytics: Data helps us make decisions, in everyday life and in
business. In this first part of the course, you will learn how data analysts use tools of
their trade to inform those decisions. You will also get to know more about this course
and the overall program expectations.
2. Thinking analytically: Data analysts balance many different roles in their work. In this
part of the course, you will learn about some of these roles and the key skills that are
required. You will also explore analytical thinking and how it relates to data-driven
decision making.
3. Exploring the wonderful world of data: Data has its own life cycle, and data analysts
use an analysis process that cuts across and leverages this life cycle. In this part of the
course, you will learn about the data life cycle and data analysis process. They are both
relevant to your work in this program and on the job as a future data analyst. You will be
introduced to applications that help guide data through the data analysis process.
4. Setting up a data toolbox: Spreadsheets, query languages, and data visualization tools
are all a big part of a data analyst’s job. In this part of the course, you will learn the basic
concepts to use them for data analysis. You will understand how they work through
examples provided.
5. Discovering data career possibilities: All kinds of businesses value the work that data
analysts do. In this part of the course, you will examine different types of businesses and
the jobs and tasks that analysts do for them. You will also learn how a Google Data
Analytics Certificate will help you meet many of the requirements for a position with
these organizations.
6. Completing the Course Challenge: At the end of this course, you will be able to put
everything you have learned into perspective with the Course Challenge. The Course
Challenge will ask you questions about the main concepts you have learned and then
give you an opportunity to apply those concepts in two scenarios.

Optional speed track for those experienced in data analytics

1.
Question 1

A clothing retailer collects and stores data about its sales revenue. Which of the following would be part of its
data ecosystem? Select all that apply.
1 / 1 point

The database of sales revenue


Correct
The clothing retailer’s data ecosystem would include the database of sales revenue, the cloud that stores the
database, and records of its inventory. A data ecosystem is the various elements that interact with one another
in order to produce, manage, store, organize, analyze, and share data.

The cloud that stores its database


Correct

The databases of competing retailers

Records of its inventory


Correct

2.
Question 2
What is the process of guiding business strategy using facts?
1 / 1 point

Data-driven decision-making

Analytical planning

Identification of data and decisions

Strategic improvement
Correct
Data-driven decision-making is the process of guiding business strategy using facts.

3.
Question 3
Fill in the blank: Curiosity, understanding context, having a technical mindset, data design, and data strategy are
_____. They enable data analysts to solve problems using facts.
1 / 1 point

thought processes

analytical skills

personal insights

business skills
Correct
Curiosity, understanding context, having a technical mindset, data design, and data strategy are analytical skills.
They enable data analysts to solve problems using facts.

4.
Question 4
The owner of a skate shop notices that every time a certain employee has a shift, there are higher sales numbers
at the end of the day. After some investigation, the owner realizes that since the employee was hired, the store
earns 15% more each month. In this scenario, the owner used which quality of analytical thinking?
1 / 1 point

Visualization

Correlation

Problem-orientation

Big-picture thinking
Correct
The owner used correlation, which involves being able to identify a relationship between two or more pieces of
data. They noticed that there is a correlation between the employee’s presence and the skate shop’s traffic and
monthly income.
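To make the idea of correlation concrete, here is a minimal Python sketch that computes a Pearson correlation coefficient by hand. The daily figures and the names `employee_on_shift` and `daily_sales` are invented for illustration; the quiz gives no actual data.

```python
from math import sqrt

def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: 1 = the employee worked that day, 0 = they did not.
employee_on_shift = [1, 0, 1, 1, 0, 0, 1, 0]
daily_sales = [1150, 980, 1200, 1170, 1010, 990, 1180, 1000]

r = pearson(employee_on_shift, daily_sales)
print(f"correlation: {r:.2f}")
```

A value close to 1 means the two series rise together. On its own it does not show that the employee causes the higher sales.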

5.
Question 5
Gap analysis is a process that could help accomplish which of the following tasks? Select all that apply.
1 / 1 point

Reduce a company’s carbon footprint based on its current emissions


Correct
Gap analysis is a method for examining and evaluating how a process works currently in order to get where you
want to be in the future. Improving accessibility, increasing efficiency, and reducing carbon emissions are
examples of improvements that gap analysis can help accomplish.

Improve accessibility for an educational app based on its current functionality


Correct

Increase the efficiency of a car manufacturer based on its current assembly process
Correct

Spread awareness about income inequality based on local salaries


6.
Question 6
An advertising firm has used insights from its analytics team to create a strategy for improving sales. Now, they
implement a plan to increase annual revenue. The firm is at which step of the data analysis process?
1 / 1 point

Share

Act

Process

Analyze
Correct
The act phase is when insights are put into action. This involves a company or organization implementing a plan
to solve the original business problem.

7.
Question 7
A data analyst adds descriptive headers to columns of data in a spreadsheet. How does this improve the
spreadsheet?
1 / 1 point

It adds context

It improves the aesthetic appeal

It eliminates unnecessary details

It clarifies the business strategy


Correct
Adding descriptive headers to columns of data in a spreadsheet adds context. Context is the condition in which
something exists, such as a structure.

8.
Question 8
This is a selection from a spreadsheet that ranks the 10 most populous cities in North Carolina. To alphabetize
the county names in column D, which spreadsheet tool would you use?

     A     B              C           D
1    Rank  Name           Population  County
2    7     Cary           170,282     Wake, Chatham
3    1     Charlotte      885,708     Mecklenburg
4    10    Concord        96,341      Cabarrus
5    4     Durham         278,993     Durham (seat), Wake, Orange
6    6     Fayetteville   211,657     Cumberland
7    3     Greensboro     296,710     Guilford
8    9     High Point     112,791     Guilford, Randolph, Davidson, Forsyth
9    2     Raleigh        474,069     Wake (seat), Durham
10   8     Wilmington     123,784     New Hanover
11   5     Winston-Salem  247,945     Forsyth


1 / 1 point
Alphabetize range

Name range

Organize range

Sort range
Correct
You can use sort range to alphabetize the county names in column D. Sorting a range of data from A to Z helps
data analysts organize and find data more quickly.
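The same A-to-Z sort can be sketched outside a spreadsheet. The Python example below is only an illustration of what sort range does, not part of the quiz: it sorts the rows from the selection above by the county text while keeping each row intact. The variable names are assumptions.

```python
# Rows copied from the spreadsheet selection: (rank, name, population, county).
cities = [
    (7, "Cary", 170282, "Wake, Chatham"),
    (1, "Charlotte", 885708, "Mecklenburg"),
    (10, "Concord", 96341, "Cabarrus"),
    (4, "Durham", 278993, "Durham (seat), Wake, Orange"),
    (6, "Fayetteville", 211657, "Cumberland"),
    (3, "Greensboro", 296710, "Guilford"),
    (9, "High Point", 112791, "Guilford, Randolph, Davidson, Forsyth"),
    (2, "Raleigh", 474069, "Wake (seat), Durham"),
    (8, "Wilmington", 123784, "New Hanover"),
    (5, "Winston-Salem", 247945, "Forsyth"),
]

# Sort whole rows by the county text (column D), A to Z, so each city's
# rank and population stay attached to its county.
by_county = sorted(cities, key=lambda row: row[3])

for rank, name, population, county in by_county:
    print(f"{county:40} {name:15} {population:>10,}")
```

Sorting whole rows rather than the county column alone matters: sorting only one column would detach counties from their cities.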

9.
Question 9
You are querying a database of manufacturing company suppliers. The column name for supplier identification
numbers is supplier_id. What is the correct clause to retrieve only data about the supplier with identification
number 85317?
1 / 1 point

WHERE supplier_id = 85317

COLUMN supplier_id = 85317

SELECT supplier_id 85317

FROM supplier_id 85317


Correct
The correct clause is WHERE supplier_id = 85317. This clause tells the database to return only information about
the supplier whose ID is 85317.
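To see the clause in a complete query, here is a small sketch using Python's built-in `sqlite3` module. The table name `suppliers`, the `name` column, and the sample rows are assumptions made for illustration; only the column `supplier_id` and the ID 85317 come from the question.

```python
import sqlite3

# In-memory database with a hypothetical `suppliers` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE suppliers (supplier_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO suppliers VALUES (?, ?)",
    [
        (85316, "Example Supplier A"),
        (85317, "Example Supplier B"),
        (85318, "Example Supplier C"),
    ],
)

# The WHERE clause keeps only rows whose supplier_id equals 85317.
rows = conn.execute(
    "SELECT * FROM suppliers WHERE supplier_id = 85317"
).fetchall()
print(rows)
conn.close()
```

SELECT and FROM say which columns and which table to read; WHERE is the part that filters the rows.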

10.
Question 10
Imagine you are sharing your data with a company stakeholder. Why might you display data with a data
visualization instead of a table? Select all that apply.
1 / 1 point

It thoroughly describes each data point

It’s easy to understand


Correct
When sharing data with others, you might use a data visualization instead of a table because visualizations are
more aesthetically pleasing, save time when identifying trends, and are easier to understand.

It’s aesthetically pleasing


Correct

It helps them identify trends more quickly


Correct

Fill in the blank: Data is a collection of _____ that can be used to draw conclusions, make predictions,
and assist in decision-making.
concepts
facts
thoughts
ideas
Correct
Data is a collection of facts that can be used to draw conclusions, make predictions, and assist in
decision-making.

The six steps of the data analysis process that you have been learning in this program are:
ask, prepare, process, analyze, share, and act. These six steps apply to any data analysis.
Continue reading to learn how a team of people analysts used these six steps to answer a
business question.

An organization was experiencing a high turnover rate among new hires. Many employees left
the company before the end of their first year on the job. The analysts used the data analysis
process to answer the following question: how can the organization improve the
retention rate for new employees?

First up, the analysts needed to define what the project would look like and what would
qualify as a successful result. So, to determine these things, they asked effective questions
and collaborated with leaders and managers who were interested in the outcome of their
people analysis. These were the kinds of questions they asked:

• What do you think new employees need to learn to be successful in their first year on
the job?
• Have you gathered data from new employees before? If so, may we have access to the
historical data?
• Do you believe managers with higher retention rates offer new employees something
extra or unique?
• What do you suspect is a leading cause of dissatisfaction among new employees?
• By what percentage would you like employee retention to increase in the next fiscal
year?

It all started with solid preparation. The group built a timeline of three months and decided
how they wanted to relay their progress to interested parties. Also during this step, the
analysts identified what data they needed to achieve the successful result they identified in
the previous step. In this case, the analysts chose to gather the data from an online survey of
new employees. These were the things they did to prepare:

• They developed specific questions to ask about employee satisfaction with different
business processes, such as hiring and onboarding, and their overall compensation.
• They established rules for who would have access to the data collected. In this case,
anyone outside the group wouldn't have access to the raw data, but could view
summarized or aggregated data. For example, an individual's compensation wouldn't
be available, but salary ranges for groups of individuals would be viewable.
• They finalized what specific information would be gathered, and how best to present
the data visually. The analysts brainstormed possible project- and data-related issues
and how to avoid them.
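The access rule described above (raw data for the analysts, aggregated data for everyone else) can be sketched in a few lines of Python. The records, team names, and the function `salary_ranges` are hypothetical; the case study does not say how the group implemented this.

```python
from collections import defaultdict

# Hypothetical raw records (team, salary). In the case study, only the
# analysts in the group could see data at this level of detail.
raw_records = [
    ("Sales", 52000), ("Sales", 61000), ("Sales", 58000),
    ("Support", 45000), ("Support", 49500),
]

def salary_ranges(records):
    """Return {team: (lowest, highest)} without exposing individual rows."""
    by_team = defaultdict(list)
    for team, salary in records:
        by_team[team].append(salary)
    return {team: (min(vals), max(vals)) for team, vals in by_team.items()}

# This summary is the kind of view that could be shared outside the group.
summary = salary_ranges(raw_records)
print(summary)
```

With very small groups a range can still reveal an individual's pay, so a minimum group size is a sensible additional safeguard.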

The group sent the survey out. Great analysts know how to respect both their data and the
people who provide it. Since employees provided the data, it was important to make sure all
employees gave their consent to participate. The data analysts also made sure employees
understood how their data would be collected, stored, managed, and protected.
Collecting and using data ethically is one of the responsibilities of data analysts. In order to
maintain confidentiality and protect and store the data effectively, these were the steps they
took:

• They restricted access to the data to a limited number of analysts.


• They cleaned the data to make sure it was complete, correct, and relevant. Certain
data was aggregated and summarized without revealing individual responses.
• They uploaded raw data to an internal data warehouse for an additional layer of
security.
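As a rough illustration of the cleaning and aggregation steps listed above, the following Python sketch drops incomplete or out-of-range survey responses and then reports only averages. The field names, the 1 to 5 scale, and the sample responses are assumptions, not details from the case study.

```python
# Hypothetical survey responses; None marks a skipped question.
responses = [
    {"employee_id": 1, "hiring_score": 4, "onboarding_score": 5},
    {"employee_id": 2, "hiring_score": None, "onboarding_score": 3},  # incomplete
    {"employee_id": 3, "hiring_score": 9, "onboarding_score": 2},     # out of range
    {"employee_id": 4, "hiring_score": 2, "onboarding_score": 1},
]

SCORE_FIELDS = ("hiring_score", "onboarding_score")

def is_clean(response):
    """True when every score is present and within the assumed 1-5 scale."""
    return all(
        response[f] is not None and 1 <= response[f] <= 5
        for f in SCORE_FIELDS
    )

clean = [r for r in responses if is_clean(r)]

# Aggregate so that only averages, not individual answers, are reported.
averages = {f: sum(r[f] for r in clean) / len(clean) for f in SCORE_FIELDS}
print(len(clean), averages)
```

In practice an analyst would usually investigate suspicious rows before discarding them; dropping them outright is a simplification here.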

Then, the analysts did what they do best: analyze! From the completed surveys, the data
analysts discovered that an employee’s experience with certain processes was a key
indicator of overall job satisfaction. These were their findings:
• Employees who experienced a long and complicated hiring process were most likely
to leave the company.
• Employees who experienced an efficient and transparent evaluation and feedback
process were most likely to remain with the company.

The group knew it was important to document exactly what they found in the analysis, no
matter what the results. To do otherwise would diminish trust in the survey process and
reduce their ability to collect truthful data from employees in the future.

Just as they made sure the data was carefully protected, the analysts were also careful
sharing the report. This is how they shared their findings:

• They shared the report with managers who met or exceeded the minimum number of
direct reports with submitted responses to the survey.
• They presented the results to the managers to make sure they had the full picture.
• They asked the managers to personally deliver the results to their teams.

This process gave managers an opportunity to communicate the results with the right
context. As a result, they could have productive team conversations about next steps to
improve employee engagement.

The last stage of the process for the team of analysts was to work with leaders within their
company and decide how best to implement changes and take actions based on the
findings. These were their recommendations:

• Standardize the hiring and evaluation process for employees based on the most
efficient and transparent practices.
• Conduct the same survey annually and compare results with those from the previous
year.

A year later, the same survey was distributed to employees. Analysts anticipated that a
comparison between the two sets of results would indicate that the action plan worked.
Turns out, the changes improved the retention rate for new employees and the actions taken
by leaders were successful!
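A year-over-year comparison like the one described above comes down to simple arithmetic. The sketch below assumes retention rate means the share of new hires still employed after their first year; the counts are invented, since the case study reports no figures.

```python
def retention_rate(hired, still_employed):
    """Share of new hires still employed at the end of their first year."""
    return still_employed / hired

# Hypothetical counts for the year before and the year after the changes.
before = retention_rate(hired=200, still_employed=130)
after = retention_rate(hired=180, still_employed=135)

# Difference expressed in percentage points.
change_in_points = (after - before) * 100
print(f"before: {before:.0%}, after: {after:.0%}, change: {change_in_points:+.1f} points")
```

Reporting the change in percentage points avoids confusion with a relative percentage increase.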
Data and gut instinct
Detectives and data analysts have a lot in common. Both depend on facts and clues to make
decisions. Both collect and look at the evidence. Both talk to people who know part of the story. And
both might even follow some footprints to see where they lead. Whether you’re a detective or a data
analyst, your job is all about following steps to collect and understand facts.

Analysts use data-driven decision-making and follow a step-by-step process. You have learned that
there are six steps to this process:

1. Ask questions and define the problem.
2. Prepare data by collecting and storing the information.
3. Process data by cleaning and checking the information.
4. Analyze data to find patterns, relationships, and trends.
5. Share data with your audience.
6. Act on the data and use the analysis results.

But there are other factors that influence the decision-making process. You may have read mysteries
where the detective used their gut instinct, and followed a hunch that helped them solve the case.
Gut instinct is an intuitive understanding of something with little or no explanation. This isn’t always
something conscious; we often pick up on signals without even realizing. You just have a “feeling” it’s
right.

Why gut instinct can be a problem


At the heart of data-driven decision making is data. Therefore, it's essential that data analysts focus
on the data to ensure they make informed decisions. If you ignore data by preferring to make
decisions based on your own experience, your decisions may be biased. But even worse, decisions
based on gut instinct without any data to back them up can cause mistakes.

Consider an example of a restaurant entrepreneur partnering with a well-known chef to develop a
new restaurant in a bustling part of the city’s central shopping district. The well-known chef has
several restaurants across the city. Banking on the chef's reputation, the entrepreneur and chef
followed gut instinct and created another uniquely themed restaurant. However, after months of
planning and preparation, fundraising efforts fell short of what was needed to open the restaurant. The
property will go back on the market to be sold at a loss. Had the entrepreneur done more research,
they would've found data showing that prospective customers in this new location were very
different from those at the chef's other restaurants.

The more you understand the data related to a project, the easier it will be to figure out what is
required. These efforts will also help you identify errors and gaps in your data so you can
communicate your findings more effectively. Sometimes past experience helps you make a
connection that no one else would notice. For example, a detective might be able to crack open a case
because they remember an old case just like the one they’re solving today. It's not just gut instinct.

Data + business knowledge = mystery solved
Blending data with business knowledge, plus maybe a touch of gut instinct, will be a common part of
your process as a junior data analyst. The key is figuring out the exact mix for each particular project.
A lot of times, it will depend on the goals of your analysis. That is why analysts often ask, “How do I
define success for this project?”

In addition, try asking yourself these questions about a project to help find the perfect balance:

• What kind of results are needed?
• Who will be informed?
• Am I answering the question being asked?
• How quickly does a decision need to be made?

For instance, if you are working on a rush project, you might need to rely on your own knowledge and
experience more than usual. There just isn’t enough time to thoroughly analyze all of the available
data. But if you get a project that involves plenty of time and resources, then the best strategy is to be
more data-driven. It’s up to you, the data analyst, to make the best possible choice. You will probably
blend data and knowledge a million different ways over the course of your data analytics career. And
the more you practice, the better you will get at finding that perfect blend.

It is time to enter the data analysis life cycle—the process of going from data to decision. Data goes
through several phases as it gets created, consumed, tested, processed, and reused. With a life cycle
model, all key team members can drive success by planning work both up front and at the end of the
data analysis process. While the data analysis life cycle is well known among experts, there isn't a
single defined structure of those phases that every data analysis expert follows. There are, however,
some shared fundamentals in every data analysis process. This reading provides an overview of several
life cycle models, starting with the process that forms the
foundation of the Google Data Analytics Certificate.

The process presented as part of the Google Data Analytics Certificate is one that will be valuable to
you as you keep moving forward in your career:

1. Ask: Business Challenge/Objective/Question
2. Prepare: Data generation, collection, storage, and data management
3. Process: Data cleaning/data integrity
4. Analyze: Data exploration, visualization, and analysis
5. Share: Communicating and interpreting results
6. Act: Putting your insights to work to solve the problem

Understanding this process, and all of the iterations that helped make it popular, will be a big part of
guiding your own analysis and your work in this program. Let’s go over a few other variations of the
data analysis life cycle.

EMC's data analysis life cycle


EMC Corporation's data analytics life cycle is cyclical with six steps:
1. Discovery
2. Pre-processing data
3. Model planning
4. Model building
5. Communicate results
6. Operationalize

EMC Corporation is now Dell EMC. This model, created by David Dietrich, reflects the cyclical nature of
real-world projects. The phases aren’t static milestones; each step connects and leads to the next, and
eventually repeats. Key questions help analysts test whether they have accomplished enough to move
forward and ensure that teams have spent enough time on each of the phases and don’t start
modeling before the data is ready. It is a little different from the data analysis life cycle this program is
based on, but it has some core ideas in common: the first phase focuses on discovering and asking
questions; data has to be prepared before it can be analyzed and used; and then findings should be
shared and acted on.

For more information, refer to this e-book, Data Science & Big Data Analytics.

SAS's iterative life cycle


An iterative life cycle was created by a company called SAS, a leading data analytics solutions
provider. It can be used to produce repeatable, reliable, and predictive results:

1. Ask
2. Prepare
3. Explore
4. Model
5. Implement
6. Act
7. Evaluate

SAS emphasizes the cyclical nature of its model by visualizing it as an infinity symbol.
The life cycle has seven steps, many of which we have seen in the other models, like Ask, Prepare,
Model, and Act. But this life cycle is also a little different: it includes a step after the Act phase designed
to help analysts evaluate their solutions and potentially return to the Ask phase again.

For more information, refer to Managing the Analytics Life Cycle for Decisions at Scale.

Project-based data analytics life cycle


A project-based data analytics life cycle has five simple steps:

1. Identifying the problem
2. Designing data requirements
3. Pre-processing data
4. Performing data analysis
5. Visualizing data

This data analytics project life cycle was developed by Vignesh Prajapati. It doesn’t include the sixth
phase, or what we have been referring to as the Act phase. However, it still covers a lot of the same
steps as the life cycles we have already described. It begins with identifying the problem, continues with
preparing and processing data before analysis, and ends with data visualization.

For more information, refer to Understanding the data analytics project life cycle.

Big data analytics life cycle


Authors Thomas Erl, Wajid Khattak, and Paul Buhler proposed a big data analytics life cycle in their
book, Big Data Fundamentals: Concepts, Drivers & Techniques. Their life cycle suggests phases
divided into nine steps:

1. Business case evaluation
2. Data identification
3. Data acquisition and filtering
4. Data extraction
5. Data validation and cleaning
6. Data aggregation and representation
7. Data analysis
8. Data visualization
9. Utilization of analysis results

This life cycle has nine steps, two to four more than the previous life cycle models. But in
reality, the authors have just broken down what we have been referring to as Prepare and Process into
smaller steps. The model emphasizes the individual tasks required for gathering, preparing, and cleaning data
before the analysis phase.

For more information, refer to Big Data Adoption and Planning Considerations.
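To compare the life cycle models side by side, the step lists from this reading can be put into a small Python dictionary. This is only a study aid; the dictionary keys are informal labels chosen here, not official model names.

```python
# Step lists copied from the life cycle models described in this reading.
life_cycles = {
    "Google certificate": ["Ask", "Prepare", "Process", "Analyze", "Share", "Act"],
    "EMC": ["Discovery", "Pre-processing data", "Model planning",
            "Model building", "Communicate results", "Operationalize"],
    "SAS": ["Ask", "Prepare", "Explore", "Model", "Implement", "Act", "Evaluate"],
    "Project-based": ["Identifying the problem", "Designing data requirements",
                      "Pre-processing data", "Performing data analysis",
                      "Visualizing data"],
    "Big data": ["Business case evaluation", "Data identification",
                 "Data acquisition and filtering", "Data extraction",
                 "Data validation and cleaning",
                 "Data aggregation and representation", "Data analysis",
                 "Data visualization", "Utilization of analysis results"],
}

# Count the steps in each model and compare them with the big data model.
step_counts = {name: len(steps) for name, steps in life_cycles.items()}
for name, count in step_counts.items():
    extra = step_counts["Big data"] - count
    print(f"{name}: {count} steps ({extra} fewer than the big data model)")
```

Counting steps says nothing about how much work each phase involves; it only shows how finely each model divides the same overall process.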

Key takeaway
From our journey to the pyramids and data in ancient Egypt to now, the way we analyze data has
evolved (and continues to do so). The data analysis process is like real-life architecture: there are
different ways to do things, but the same core ideas still appear in each model of the process. Whether
you use the structure of this Google Data Analytics Certificate or one of the many other iterations you
have learned about, we are here to help guide you as you continue on your data journey.

1.
Question 1
Which of the following statements best defines data?


1 / 1 point

Data is the use of calculations and statistics.

Data is a collection of facts.


Data is an assortment of questions.

Data is a business process.


Correct
Data is a collection of facts. Through analysis, data can be used to draw conclusions and make predictions.

2.
Question 2
Fill in the blank: In data analytics, the data ecosystem refers to the various elements that interact with one
another to produce, manage, store, _____, analyze, and share data.
1 / 1 point

organize

ingest

locate

merge
Correct
In data analytics, the data ecosystem refers to the various elements that interact with one another to produce,
manage, store, organize, analyze, and share data.

3.
Question 3
Which of the following terms refers to the collection, transformation, and organization of data in order to draw
conclusions, make predictions, and drive informed decision-making?
1 / 1 point

Data insight

Data analysis

Data life cycle

Data elements
Correct
Data analysis refers to the collection, transformation, and organization of data in order to draw conclusions,
make predictions, and drive informed decision-making.

4.
Question 4
An airline collects, observes, and analyzes its customers' online behaviors. Then, it uses the insights gained to
choose what new products and services to offer. What business process does this describe?
1 / 1 point

Performance measurement

Collaboration with stakeholders

Data-driven decision-making

Analytical thinking
Correct
An airline collecting, observing, and analyzing its customers' online behaviors, then using the insights gained to
choose what new products and services to offer, describes data-driven decision-making. Data-driven
decision-making is using facts to guide business strategy.

1.
Question 1
The collection, transformation, and organization of data in order to draw conclusions, make predictions, and
drive informed decision-making describes what?
1 / 1 point

Data science

Data life cycle

Data analysis

Data ecosystem
Correct

2.
Question 2
Which of the following could be elements of a data ecosystem? Select all that apply.
1 / 1 point

Producing data
Correct

Gaining insights

Managing data
Correct

Sharing data
Correct

3.
Question 3
A data scientist is someone who does what?
1 / 1 point

Designs new products

Solves engineering problems

Finds answers to existing questions by creating insights from data sources

Creates new questions using data


Correct

4.
Question 4
What tactics can a data analyst use to effectively blend gut instinct with facts? Select all that apply.
1 / 1 point

Ask how to define success for a project, but rely most heavily on their own personal perspective.

Focus on intuition to choose which data to collect and how to analyze it.

Use their knowledge of how their company works to better understand a business need.
Correct

Apply their unique past experiences to their current work, while keeping in mind the story the data is
telling.
Correct

5.
Question 5
Sharing your results with subject matter experts and gathering and analyzing data are carried out in data-driven
decision-making. What else is included in this process?
1 / 1 point

Identification of trends

Determining the stakeholders

Drawing conclusions from your analysis.

Surveying customers about results, conclusions, and recommendations


Correct

6.
Question 6
You have just received the results of your latest analysis about the effectiveness of your firm’s recent marketing
campaign. However, because you want to follow data-driven decision-making, you share your results with
colleagues from the marketing department for their validation. In this role, these colleagues are acting as what?
1 / 1 point

customers

stakeholders

subject-matter experts

competitors
Correct

7.
Question 7
Consulting with experts in the marketing department about your marketing analysis is an example of what
process?
1 / 1 point

Data analytics

Data management

Data-driven decision-making

Data science
Correct

Analytical skills are qualities and characteristics associated with solving problems using facts.
They are curiosity, understanding context, having a technical mindset, data design, and data strategy.
Curiosity is all about wanting to learn something. Curious people usually seek out new challenges and
experiences.
Context is the condition in which something exists or happens. This can be a structure or an
environment.
A technical mindset involves the ability to break things down into smaller steps or pieces and work
with them in an orderly and logical way.
Data design is how you organize information.
Data strategy is the management of the people, processes, and tools used in data analysis.

1.
Question 1
This practice quiz will help you get a read on the analytical skills
you already have.
Identify the pattern from left to right in the set of blocks below and try to predict which block should replace the
block with the question mark.

1 / 1 point

Correct
This is the missing block. The pattern of the dots increases by one in each block. Therefore, the best answer has
five dots.

2.
Question 2
Here's a more complex pattern. Identify the pattern from left to right in the images below and try to predict
which image should come next.
1st pattern: Octagon with 7 dots
2nd pattern: Heptagon with 6 dots
3rd pattern: Hexagon with 5 dots
4th pattern: Pentagon with 4 dots
5th pattern: Square with 3 dots
6th pattern: Question mark
Based on the images above, which option comes next in the pattern?
1 / 1 point

Correct
This is the next image in the sequence based on two patterns present in the series: the number of sides and the
number of dots. Moving from left to right, both decrease by one. Given these patterns, if the previous block
contained a shape with four sides and three dots, then the next shape should have three sides and two dots.

3.
Question 3
Now, find a pattern in a different format. Select the next number in the sequence:

Fill in the blank: 9, 13, 17, 21, 25, 29, _____


1 / 1 point

10

55

25

33
Correct
The correct answer is 33. The numbers increase steadily, and the difference between consecutive numbers is
4, so the next number is 29 + 4 = 33.

4.
Question 4
The following numbers are in a sequence from left to right. Determine the pattern and decide which number
should come next:

Fill in the blank: 4, 9, 16, 25, 36, 49, _____


1 / 1 point

30

64

62
81
Correct
The next number in the series is 64. There are two patterns in the sequence. One is that the terms are the
squares of consecutive whole numbers (2², 3², 4², 5², 6², 7²), so the next term is 8² = 64. The second pattern is
in the difference between the numbers in the sequence: 9 - 4 = 5, 16 - 9 = 7, 25 - 16 = 9, and so on. Each
difference is two more than the one before it, so the next difference is 15 and 49 + 15 = 64.
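The difference-based reasoning used in the two number-sequence questions can be checked with a few lines of Python. This is only a minimal sketch: the helper name differences is made up for illustration and is not part of the course material.

```python
def differences(sequence):
    """Return the gaps between consecutive terms of a sequence."""
    return [later - earlier for earlier, later in zip(sequence, sequence[1:])]


# Question 3: the gap is constant, so the next term is the last term plus the gap.
arithmetic = [9, 13, 17, 21, 25, 29]
arithmetic_gaps = differences(arithmetic)
next_arithmetic = arithmetic[-1] + arithmetic_gaps[-1]

# Question 4: the gaps themselves grow by a fixed step, so extend the gaps first.
squares = [4, 9, 16, 25, 36, 49]
square_gaps = differences(squares)
gap_step = square_gaps[-1] - square_gaps[-2]
next_square = squares[-1] + square_gaps[-1] + gap_step

print(next_arithmetic, next_square)
```

Looking at the gaps first is a quick way to tell a constant-step pattern from one where the step itself changes.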

5.
Question 5
The following question is about recognizing and matching patterns in shapes that are the same, but viewed from
different angles.

Two shapes are similar when one can become the other after a rotation clockwise ⟳ or counterclockwise ↺, or a
reflection horizontally ↔ and/or vertically ↕.

Your task is to choose the figure that completes the statement. Pay attention to the pattern by which the first
and second shapes are related, and then figure out which choice matches shape 3. Fill in the blank:

Select the image below that completes the statement.


1 / 1 point

Correct
This image completes the statement. The first image in the statement is reflected in the second image. To
complete the analogy, the answer would be an image that is a side-by-side reflection of the third image.

6.
Question 6
The following question is about recognizing and matching patterns in shapes that are the same, but viewed from
different angles. Two shapes are similar when one can become the other after a rotation clockwise ⟳ or
counterclockwise ↺, or a reflection horizontally ↔ and/or vertically ↕.

Your task is to choose the figure that completes the statement. Fill in the blank:
Which image completes it?
1 / 1 point

Correct
Since the pattern in the first image was rotated 90 degrees counter-clockwise, this image completes the
statement.

7.
Question 7
The following series of codes are in a sequence from left to right. There is a repeating pattern that you will
notice. Determine the pattern and decide which code should come next.

Fill in the blank: A1, B3, C5, D7, E9, F11, G13, _____
1 / 1 point

H15

J15

D17

H16
Correct
The patterns of this series are the letters listed alphabetically and the numbers increasing by two with each new
set. Therefore, following that pattern, the next code would be H15.

8.
Question 8
The following series of codes are in a sequence from left to right. There is a repeating pattern that you will
notice. Determine the pattern and decide which sequence of letters should come next.

Fill in the blank: A, AA, AAA, B, BA, BAA, BAAA, BB, BBA, BBAA, BBAAA, BBB, ________

1 / 1 point

BBAAA

BBAA

BBBA
BBBB
Correct
The pattern in this sequence follows the letter A. A is added until there are three As, which is when the letter B
takes the place of the previous As, and the pattern continues. Therefore, BBBA is next in the series.

9.
Question 9
Now, identify patterns in a word problem using a data visualization. There are 12 chocolates in a box: eight have
caramel filling, six have coconut filling, and two have both caramel and coconut filling. Choose the best image
that describes this box of chocolates.
1 / 1 point

Correct
This diagram depicts six chocolates with caramel filling only, four chocolates with coconut filling only, two
chocolates with both caramel and coconut filling, and the total number of chocolates is 12.

10.
Question 10
There are 10 children in a class and they have all brought sandwiches for lunch: five children have sandwiches
with peanut butter, six children have sandwiches with jelly, and three children have sandwiches with both
peanut butter and jelly.

Find out how many children have sandwiches with neither peanut butter nor jelly and choose the image that
describes the situation best.
1 / 1 point
Correct
In this diagram, there are six sandwiches with jelly, five sandwiches with peanut butter, and three sandwiches
with both. This means that 5 + 6 - 3 = 8 sandwiches have either peanut butter or jelly. There are a total of
10 children, and 10 - 8 = 2, so two children have neither peanut butter nor jelly in their
sandwiches.
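The counting used in the chocolate and sandwich questions (add the two groups, then subtract the overlap so it is not counted twice) can be sketched in Python. The function name neither_count is a hypothetical helper chosen for this illustration.

```python
def neither_count(total, in_a, in_b, in_both):
    """Count the items that belong to neither group A nor group B."""
    in_either = in_a + in_b - in_both  # subtract the overlap counted twice
    return total - in_either


# Question 10: 10 children, 5 with peanut butter, 6 with jelly, 3 with both.
children_with_neither = neither_count(total=10, in_a=5, in_b=6, in_both=3)

# Question 9: 12 chocolates, 8 with caramel, 6 with coconut, 2 with both.
caramel_only = 8 - 2
coconut_only = 6 - 2
chocolates_with_neither = neither_count(total=12, in_a=8, in_b=6, in_both=2)

print(children_with_neither, caramel_only, coconut_only, chocolates_with_neither)
```

In the chocolate question every chocolate has at least one filling, which is why the three regions of the diagram add up to the full box of 12.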
Description
The analytical skill that involves breaking processes down into smaller steps and working with them in an orderly,
logical way

Skill
A technical mindset

Description
The qualities and characteristics associated with solving problems using facts

Skill
Analytical skills

Description
The analytical skill that involves how you organize information

Skill
Data design

Description
The analytical skill that has to do with how you group things into categories

Skill
Understanding context

Description
The analytical skill that involves managing the processes and tools used in data analysis

Skill
Data strategy

Fill in the blank: Data visualization involves using _____ to represent and present data. Select all that apply.

charts
Correct
Data visualization involves using graphs, maps, and charts to represent and present data.

maps
Correct
reports

graphs
Correct

Question 1
What practice involves identifying, defining, and solving a problem by using data in an organized, step-by-step manner?
1 / 1 point

Data design

Analytical thinking

Visualization

Context
Correct
Analytical thinking involves identifying and defining a problem, then solving it by using data in an organized, step-by-step
manner.

2.
Question 2
Which of the following are examples of data visualizations? Select all that apply.
1 / 1 point

Maps
Correct
Graphs, maps, and charts are used in data visualization.

Reports

Charts
Correct

Graphs
Correct

3.
Question 3
Gap analysis is used to examine and evaluate how a process currently works with the goal of getting to where you want to be
in the future.
1 / 1 point

True False
Correct
Gap analysis is used to examine and evaluate how a process currently works with the goal of getting to where you want to be
in the future.

4.
Question 4
Which aspect of analytical thinking involves being able to identify a relationship between two or more pieces of data?
1 / 1 point

Data design
Context

Correlation

Visualization
Correct
Correlation involves being able to identify a relationship between two or more pieces of data. A correlation is like a
relationship.

1.
Question 1
Fill in the blank: The analytical skill of ______ involves seeking out new experiences in order to gain knowledge.
1 / 1 point

having a technical mindset

curiosity

data strategy

understanding context
Correct

2.
Question 2
Adding descriptive headers to columns of data in a spreadsheet is an example of which analytical skill?
1 / 1 point

Understanding context

Data strategy

Having a technical mindset

Curiosity
Correct

3.
Question 3
Fill in the blank: A data analyst with a technical mindset would break things down into smaller steps or pieces and work with
them in an orderly and ______ way.
1 / 1 point

curious

creative

logical

clever
Correct

4.
Question 4
As a recently promoted data scientist, one of your responsibilities is the implementation of data strategy. What would this
responsibility include?
0 / 1 point

Evaluating how a process works currently in order to get where you want to be in the future

Identifying a relationship between two or more pieces of data

Breaking things down into smaller steps or pieces and working with them in an orderly and logical way

Managing the people, processes, and tools involved


Incorrect
Review the section on analytical skills.

5.
Question 5
Identifying a relationship between two or more pieces of data is known as what?
1 / 1 point

problem-orientation

detail-oriented thinking

visualization

correlation
Correct

6.
Question 6
Fill in the blank: In order to get to the root cause of a problem, a data analyst should ask “Why?” ________ times.
1 / 1 point

seven

three

five

four
Correct

7.
Question 7
An airport wants to make its luggage-handling process faster and simpler for travelers. A data analyst examines and
evaluates how the process works currently in order to achieve the goal of a more efficient process. What methodology do
they use?
1 / 1 point

Gap analysis

The five whys

Data visualization

Strategy
Correct

8.
Question 8
Data analysts following data-driven decision-making use the analytical skills of curiosity, having a technical mindset, and
data design. What other two analytical skills would they employ? Select all that apply.
1 / 1 point

knowledge

data strategy
Correct

understanding context
Correct

efficiency

Phase 1
Ask: Define the problem and confirm stakeholder expectations
Phase 2
Prepare: Collect and store data for analysis
Phase 3
Process: Clean and transform data to ensure integrity
Phase 4
Analyze: Use data analysis tools to draw conclusions
Phase 5
Share: Interpret and communicate results to others to make data-driven decisions
Phase 6
Act: Put your insights to work in order to solve the original problem

Variations of the data life cycle


You learned that there are six stages to the data life cycle. Here is a recap:

1. Plan: Decide what kind of data is needed, how it will be managed, and who will be responsible for
it.
2. Capture: Collect or bring in data from a variety of different sources.
3. Manage: Care for and maintain the data. This includes determining how and where it is stored and
the tools used to do so.
4. Analyze: Use the data to solve problems, make decisions, and support business goals.
5. Archive: Keep relevant data stored for long-term and future reference.
6. Destroy: Remove data from storage and delete any shared copies of the data.
Warning: Be careful not to mix up or confuse the six stages of the data life cycle (Plan, Capture, Manage,
Analyze, Archive, and Destroy) with the six phases of the data analysis life cycle (Ask, Prepare,
Process, Analyze, Share, and Act). They shouldn't be used or referred to interchangeably.
The data life cycle provides a generic or common framework for how data is managed. You may recall that
variations of the data analysis life cycle were described in Origins of the data analysis process. The same
can be done for the data life cycle. The rest of this reading provides a glimpse of how government, finance,
and education institutions can view data life cycles a little differently.

U.S. Fish and Wildlife Service


The U.S. Fish and Wildlife Service uses the following data life cycle:

1. Plan
2. Acquire
3. Maintain
4. Access
5. Evaluate
6. Archive
For more information, refer to U.S. Fish and Wildlife's Data Management Life Cycle page.

The U.S. Geological Survey (USGS)


The USGS uses the data life cycle below:

1. Plan
2. Acquire
3. Process
4. Analyze
5. Preserve
6. Publish/Share
Several cross-cutting or overarching activities are also performed during each stage of their life cycle:

• Describe (metadata and documentation)
• Manage Quality
• Backup and Secure
For more information, refer to the USGS Data Lifecycle page.

Financial institutions
Financial institutions may take a slightly different approach to the data life cycle as described in The Data
Life Cycle, an article in Strategic Finance magazine:

1. Capture
2. Qualify
3. Transform
4. Utilize
5. Report
6. Archive
7. Purge
Harvard Business School (HBS)
One final data life cycle informed by Harvard University research has eight stages:

1. Generation
2. Collection
3. Processing
4. Storage
5. Management
6. Analysis
7. Visualization
8. Interpretation
For more information, refer to 8 Steps in the Data Life Cycle.

Key takeaway
Understanding the importance of the data life cycle will set you up for success as a data analyst. Individual
stages in the data life cycle will vary from company to company or by industry or sector. Historical data is
important to both the U.S. Fish and Wildlife Service and the USGS, so their data life cycle focuses on
archiving and backing up data. Harvard's interests are in research and teaching, so its data life cycle
includes visualization and interpretation even though these are more often associated with a data analysis
life cycle. The HBS data life cycle also doesn't call out a stage for purging or destroying data. In contrast,
the data life cycle for finance clearly identifies archive and purge stages. To sum it up, although data life
cycles vary, one data management principle is universal: govern how data is handled so that it is accurate,
secure, and available to meet your organization's needs.

Fill in the blank: During the _____ phase of the data life cycle, a business decides what kind of data it needs, how
it will be managed, who will be responsible for it, and the optimal outcomes.
1 / 1 point

capture

manage

archive

planning
Correct
During the planning phase of the data life cycle, a business decides what kind of data it needs, how it will be
managed, who will be responsible for it, and the optimal outcomes.

2.
Question 2
In the data life cycle, which phase involves gathering data from various sources and bringing it into the
organization?
1 / 1 point

Archive

Analyze
Capture

Manage
Correct
The capture phase involves gathering data from various sources and bringing it into the organization.

3.
Question 3
A data analyst finishes using a dataset, so they erase or shred the files in order to protect private information.
This is called archiving.
0 / 1 point

True False
Incorrect
Erasing or shredding files describes the destroy phase of the data life cycle. Archiving involves storing files in a
place where they're still available.

4.
Question 4
A dairy farmer decides to open an ice cream shop on her farm. After surveying the local community about
people’s favorite flavors, she takes the data they provided and stores it on a secure hard drive so it can be
maintained safely on her computer. This is part of which phase of the data life cycle?
1 / 1 point

Analyze

Manage

Plan

Archive
Correct
This is the manage phase of the data life cycle. It deals with how data is cared for, how and where it’s stored, the
tools used to keep it safe and secure, and the actions taken to make sure it’s maintained properly.

5.
Question 5
After opening the ice cream shop on her farm, the same dairy farmer then surveys the local community about
people’s favorite flavors. She uses the data she collected to determine that the top five flavors are strawberry,
vanilla, chocolate, mint chip, and peanut butter. She feels confident in her decision to sell these flavors. This is
part of which phase of the data life cycle?
1 / 1 point

Capture

Plan

Analyze

Archive
Correct
This is part of the analyze phase. This phase involves using data to make smart decisions and support business
goals.
Key data analyst tools
As you are learning, the most common programs and solutions used by data analysts include
spreadsheets, query languages, and visualization tools. In this reading, you will learn more about each one.
You will cover when to use them, and why they are so important in data analytics.

Spreadsheets
Data analysts rely on spreadsheets to collect and organize data. Two popular spreadsheet applications you
will probably use a lot in your future role as a data analyst are Microsoft Excel and Google Sheets.

Spreadsheets structure data in a meaningful way by letting you:

• Collect, store, organize, and sort information
• Identify patterns and piece the data together in a way that works for each specific data project
• Create excellent data visualizations, like graphs and charts

Databases and query languages


A database is a collection of structured data stored in a computer system. Some popular Structured Query
Language (SQL) programs include MySQL, Microsoft SQL Server, and BigQuery.

Query languages

• Allow analysts to isolate specific information from one or more databases
• Make it easier for you to learn and understand the requests made to databases
• Allow analysts to select, create, add, or download data from a database for analysis

Visualization tools
Data analysts use a number of visualization tools, like graphs, maps, tables, charts, and more. Two popular
visualization tools are Tableau and Looker.

These tools

• Turn complex numbers into a story that people can understand
• Help stakeholders come up with conclusions that lead to informed decisions and effective
business strategies
• Have multiple features
- Tableau's simple drag-and-drop feature lets users create interactive graphs in dashboards and
worksheets
- Looker communicates directly with a database, allowing you to connect your data right to the visual
tool you choose

A career as a data analyst also involves using programming languages, like R and Python, which are used a
lot for statistical analysis, visualization, and other data analysis.

Key takeaway
You have a lot of tools as a data analyst. This is a first glance at the possibilities, and you will explore many
of these tools in-depth throughout this program.

Choosing the right tool for the job


As a data analyst, you will usually have to decide which program or solution is right for the particular
project you are working on. In this reading, you will learn more about how to choose which tool you need
and when.

Depending on which phase of the data analysis process you’re in, you will need to use different tools. For
example, if you are focusing on creating complex and eye-catching visualizations, then the visualization
tools we discussed earlier are the best choice. But if you are focusing on organizing, cleaning, and
analyzing data, then you will probably be choosing between spreadsheets and databases using queries.
Spreadsheets and databases both offer ways to store, manage, and use data. The basic content for both
tools are sets of values. Yet, there are some key differences, too:

Spreadsheets                                | Databases
Software applications                       | Data stores accessed using a query language (e.g., SQL)
Structure data in a row and column format   | Structure data using rules and relationships
Organize information in cells               | Organize information in complex collections
Provide access to a limited amount of data  | Provide access to huge amounts of data
Manual data entry                           | Strict and consistent data entry
Generally, one user at a time               | Multiple users
Controlled by the user                      | Controlled by a database management system
You don’t have to choose one or the other because each serves its own purpose. Generally, data analysts
work with a combination of the two, as both tools are very useful in data analytics. For example, you can
store data in a database, then export it to a spreadsheet for analysis. Or, if you are collecting information in
a spreadsheet, and it becomes too much for that particular platform, you can import it into a database.
And, later in this course, you will learn about programming languages like R that give you even greater
control of your data, its analysis, and the visualizations you create.
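The database-to-spreadsheet hand-off described above can be sketched with Python's standard library. This is an illustration under stated assumptions: SQLite stands in for whatever database is actually used, the survey table and its rows are hypothetical, and the CSV text is written to an in-memory buffer rather than to a real file that a spreadsheet would open.

```python
import csv
import io
import sqlite3

# Hypothetical in-memory database holding a small survey table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (respondent_id INTEGER, favorite_flavor TEXT)")
conn.executemany(
    "INSERT INTO survey VALUES (?, ?)",
    [(1, "vanilla"), (2, "chocolate"), (3, "vanilla")],
)

# Query the database, then write the result as CSV text that a
# spreadsheet application could import.
cursor = conn.execute(
    "SELECT respondent_id, favorite_flavor FROM survey ORDER BY respondent_id"
)
header = [column[0] for column in cursor.description]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(cursor.fetchall())
csv_text = buffer.getvalue()
conn.close()

print(csv_text)
```

Going the other direction (spreadsheet to database) would follow the same idea in reverse: read the CSV rows and insert them into a table.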
You are in the plan stage of the data lifecycle for your current project. What action might you take during this stage?
1 / 1 point

Decide what kind of data is needed.
Shred paper files.
Validate insights provided by analysts.
Use a formula to perform calculations.
Correct

2.
Question 2
Fill in the blank: Shredding paper files and using data-erasure software would be actions taken by a data analyst in the
_________ stage of the data lifecycle.
1 / 1 point

Plan Manage Destroy Archive


Correct

3.
Question 3
A data analyst uses a spreadsheet function to aggregate data. Then, they add a pivot table to show totals from least to
greatest. This would happen during which phase of the data life cycle?
1 / 1 point

Plan Manage Capture Analyze


Correct

4.
Question 4
Fill in the blank: Data analysis has six process steps whereas the data life cycle has six _____.
1 / 1 point

stages steps key questions data analytics tools


Correct

5.
Question 5
A company takes insights provided by its data analytics team, validates them, and finalizes a strategy. They then
implement a plan to solve the original business problem. This describes which step of the data analysis process?
1 / 1 point

Process Analyze Share Act


Correct

6.
Question 6
In data analysis, a function is a preset command whereas a formula is a set of instructions used to carry out a specific
calculation.
1 / 1 point

True False
Correct

7.
Question 7
In the course of their current project, a data analyst uses a query to retrieve and request information. Which of the
following are options the analyst can use a query for? Select all that apply.
0.25 / 1 point

Visualizing data
This should not be selected
Review the video on the data analyst’s toolkit.

Deleting data

Collecting data
This should not be selected
Review the video on the data analyst’s toolkit.

Updating data
Correct

8.
Question 8
A data analyst wants to retrieve information from a database. Select the correct tool from the data analyst’s toolkit.
1 / 1 point

Dashboard Query Spreadsheet Visualization


Correct

Fill in the blank: A data analyst uses a SQL query to retrieve information from a
database. They add a WHERE statement to _____ the data based on certain conditions.
filter sort categorize copy
Correct
They add a WHERE statement to filter the data based on certain conditions.
SQL Guide: Getting started
Just as humans use different languages to communicate with others, so do computers.
Structured Query Language (or SQL, often pronounced “sequel”) enables data analysts to
talk to their databases. SQL is one of the most useful data analyst tools, especially when
working with large datasets in tables. It can help you investigate huge databases, track down
text (referred to as strings) and numbers, and filter for the exact kind of data you need—much
faster than a spreadsheet can.

If you haven’t used SQL before, this reading will help you learn the basics so you can
appreciate how useful SQL is and how useful SQL queries are in particular. You will be writing
SQL queries in no time at all.

What is a query?
A query is a request for data or information from a database. When you query databases, you
use SQL to communicate your question or request. You and the database can always
exchange information as long as you speak the same language.

Every programming language, including SQL, follows a unique set of guidelines known as
syntax. Syntax is the predetermined structure of a language that includes all required words,
symbols, and punctuation, as well as their proper placement. As soon as you enter your
search criteria using the correct syntax, the query starts working to pull the data you’ve
requested from the target database.

Basic SQL queries share the same structure:

• Use SELECT to choose the columns you want to return.
• Use FROM to choose the tables where the columns you want are located.
• Use WHERE to filter for certain information.
A SQL query is like filling in a template. You will find that if you are writing a SQL query from
scratch, it is helpful to start a query by writing the SELECT, FROM, and WHERE keywords in the
following format:

Next, enter the table name after the FROM; the table columns you want after the SELECT; and, finally, the
conditions you want to place on your query after the WHERE. Make sure to add a new line and indent when
adding these, as shown below:
Following this method each time makes it easier to write SQL queries. It can also help you make fewer
syntax errors.

Example of a query
Here is how a simple query would appear in BigQuery, a data warehouse on the Google Cloud
Platform.

The above query uses three commands to locate customers with the first name Tony:

1. SELECT the column named first_name
2. FROM a table named customer_name (in a dataset named customer_data) (The
dataset name is always followed by a dot, and then the table name.)
3. But only return the data WHERE the first_name is Tony
The results from the query might be similar to the following:

first_name
Tony
Tony
Tony

As you can conclude, this query had the correct syntax, but wasn't very useful after the data
was returned.
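The simple query described above can be sketched and run locally. This is a hedged illustration, not the BigQuery screenshot from the course: SQLite (through Python's sqlite3 module) stands in for BigQuery, an attached in-memory database named customer_data plays the role of the dataset so the dataset.table form still works, and the sample rows are invented.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery. Attaching a second in-memory
# database named customer_data lets the query keep the dataset.table form.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS customer_data")
conn.execute(
    "CREATE TABLE customer_data.customer_name "
    "(customer_id INTEGER, first_name TEXT, last_name TEXT)"
)
# Hypothetical sample rows, invented for illustration only.
conn.executemany(
    "INSERT INTO customer_data.customer_name VALUES (?, ?, ?)",
    [
        (1967, "Tony", "Magnolia"),
        (7689, "Tony", "Magnolia"),
        (2001, "Tony", "Park"),
        (3050, "Maria", "Chavez"),
    ],
)

# SELECT, FROM, and WHERE each start a new line; their contents are indented.
simple_query = """
SELECT
    first_name
FROM
    customer_data.customer_name
WHERE
    first_name = 'Tony'
"""

tony_rows = conn.execute(simple_query).fetchall()
conn.close()
print(tony_rows)
```

Every returned row contains only the value Tony, which shows why selecting a single column that you already filtered on is syntactically fine but not very informative.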

Multiple columns in a query


In real life, you will need to work with more data beyond customers named Tony. Multiple
columns that are chosen by the same SELECT command can be indented and grouped
together.

If you are requesting multiple data fields from a table, you need to include these columns in
your SELECT command. Each column is separated by a comma as shown below:
Here is an example of how it would appear in BigQuery:

The above query uses three commands to locate customers with the first name Tony.

1. SELECT the columns named customer_id, first_name, and last_name
2. FROM a table named customer_name (in a dataset named customer_data) (The
dataset name is always followed by a dot, and then the table name.)
3. But only return the data WHERE the first_name is Tony
The only difference between this query and the previous one is that more data columns are
selected. The previous query selected first_name only while this query selects customer_id
and last_name in addition to first_name. In general, it is a more efficient use of resources to
select only the columns that you need. For example, it makes sense to select more columns if
you will actually use the additional fields in your WHERE clause. If you have multiple
conditions in your WHERE clause, they may be written like this:

Notice that unlike the SELECT clause, which uses a comma to separate
fields/variables/parameters, the WHERE clause uses the AND operator to connect
conditions. As you become a more advanced writer of queries, you will make use of other
connectors/operators such as OR and NOT.
Here is a BigQuery example with multiple fields used in a WHERE clause:

The above query uses three commands to locate customers with a valid (greater than 0)
customer ID whose first name is Tony and last name is Magnolia.

1. SELECT the columns named customer_id, first_name, and last_name
2. FROM a table named customer_name (in a dataset named customer_data) (The
dataset name is always followed by a dot, and then the table name.)
3. But only return the data WHERE customer_id is greater than 0, first_name is Tony,
and last_name is Magnolia.
Note that one of the conditions is a logical condition that checks to see if customer_id is
greater than zero.

If only one customer is named Tony Magnolia, the results from the query could be:

customer_id  first_name  last_name
1967         Tony        Magnolia
If more than one customer has the same name, the results from the query could be:

customer_id  first_name  last_name
1967         Tony        Magnolia
7689         Tony        Magnolia
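A query with several selected columns and several WHERE conditions, as described in this section, might look like the sketch below. The assumptions are the same kind as before: SQLite stands in for BigQuery, the attached database customer_data plays the role of the dataset, and all rows (including the one with customer ID 0) are invented so that each condition has something to filter out.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; customer_data is an attached
# in-memory database so the dataset.table form can be kept.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS customer_data")
conn.execute(
    "CREATE TABLE customer_data.customer_name "
    "(customer_id INTEGER, first_name TEXT, last_name TEXT)"
)
# Hypothetical sample rows, invented for illustration only.
conn.executemany(
    "INSERT INTO customer_data.customer_name VALUES (?, ?, ?)",
    [
        (1967, "Tony", "Magnolia"),
        (7689, "Tony", "Magnolia"),
        (0, "Tony", "Magnolia"),      # fails the customer_id > 0 condition
        (2001, "Tony", "Park"),       # fails the last_name condition
        (3050, "Maria", "Magnolia"),  # fails the first_name condition
    ],
)

# Columns are separated by commas; conditions are connected with AND.
multi_condition_query = """
SELECT
    customer_id,
    first_name,
    last_name
FROM
    customer_data.customer_name
WHERE
    customer_id > 0
    AND first_name = 'Tony'
    AND last_name = 'Magnolia'
"""

# Sorting in Python keeps the SQL identical to the form described in the text.
matching_customers = sorted(conn.execute(multi_condition_query).fetchall())
conn.close()
print(matching_customers)
```

A row is returned only when all three conditions hold, which is why the rows with the wrong ID, first name, or last name are dropped.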

Key takeaway
The most important thing to remember is how to use SELECT, FROM, and WHERE in a query.
Queries with multiple fields will become simpler after you practice writing your own SQL
queries later in the program.
Endless SQL possibilities
You have learned that a SQL query uses SELECT, FROM, and WHERE to specify the data to be returned
from the query. This reading provides more detailed information about formatting queries, using WHERE
conditions, selecting all columns in a table, adding comments, and using aliases. All of these make it easier
for you to understand (and write) queries to put SQL in action. The last section of this reading provides an
example of what a data analyst would do to pull employee data for a project.

Capitalization, indentation, and semicolons


You can write your SQL queries in all lowercase and don’t have to worry about extra spaces between
words. However, using capitalization and indentation can help you read the information more easily. Keep
your queries neat, and they will be easier to review or troubleshoot if you need to check them later on.

Notice that the SQL statement shown above has a semicolon at the end. The semicolon is a statement
terminator and is part of the American National Standards Institute (ANSI) SQL-92 standard, which is a
recommended common syntax for adoption by all SQL databases. However, not all SQL databases have
adopted or enforce the semicolon, so it’s possible you may come across some SQL statements that aren’t
terminated with a semicolon. If a statement works without a semicolon, it’s fine.
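The points about capitalization, spacing, and semicolons can be tried out with a small sketch. It assumes SQLite (through Python's sqlite3 module) as the database and uses a hypothetical customers table; other databases may be stricter about the semicolon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, invented for illustration only.
conn.execute("CREATE TABLE customers (first_name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [("Tony",), ("Maria",)])

# Neatly formatted: capitalized keywords, indentation, closing semicolon.
tidy_query = """
SELECT
    first_name
FROM
    customers
WHERE
    first_name = 'Tony';
"""

# Same request in lowercase, with extra spaces and no semicolon.
untidy_query = "select first_name   from customers where   first_name = 'Tony'"

tidy_rows = conn.execute(tidy_query).fetchall()
untidy_rows = conn.execute(untidy_query).fetchall()
conn.close()
print(tidy_rows == untidy_rows)
```

Only the keywords and spacing are flexible here; the text inside the quotes ('Tony') is data and is still compared as written.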

WHERE conditions
In the query shown above, the SELECT clause identifies the column you want to pull data from by name,
field1, and the FROM clause identifies the table where the column is located by name, table. Finally, the
WHERE clause narrows your query so that the database returns only the data with an exact value match or
the data that matches a certain condition that you want to satisfy.

For example, if you are looking for a specific customer with the last name Chavez, the WHERE clause would
be:

WHERE field1 = 'Chavez'

However, if you are looking for all customers with a last name that begins with the letters “Ch," the WHERE
clause would be:

WHERE field1 LIKE 'Ch%'

You can conclude that the LIKE operator is very powerful because it allows you to tell the database to look for
a certain pattern! The percent sign (%) is used as a wildcard to match zero or more characters. In the
example above, both Chavez and Chen would be returned. Note that in some databases an asterisk (*) is
used as the wildcard instead of a percent sign (%).
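The difference between an exact match and a LIKE pattern can be seen in a short sketch. It assumes SQLite as the database, a hypothetical customers table, and invented last names. In SQLite the percent sign also matches an empty run of characters, so a value that is exactly Ch is matched as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# field1 holds last names, as in the text; the rows are hypothetical.
conn.execute("CREATE TABLE customers (field1 TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?)",
    [("Chavez",), ("Chen",), ("Ch",), ("Garcia",), ("Michaels",)],
)

# Exact match: only the value 'Chavez' qualifies.
exact_matches = conn.execute(
    "SELECT field1 FROM customers WHERE field1 = 'Chavez'"
).fetchall()

# Pattern match: anything that starts with 'Ch' qualifies.
pattern_matches = sorted(
    conn.execute("SELECT field1 FROM customers WHERE field1 LIKE 'Ch%'").fetchall()
)
conn.close()
print(exact_matches, pattern_matches)
```

Michaels contains the letters ch but does not begin with them, so the pattern 'Ch%' leaves it out.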

SELECT all columns


Can you use SELECT * ?

In the example, if you replace SELECT field1 with SELECT * , you would be selecting all of the columns in
the table instead of the field1 column only. From a syntax point of view, it is a correct SQL statement, but
you should use the asterisk (*) sparingly and with caution. Depending on how many columns a table has,
you could be selecting a tremendous amount of data. Selecting too much data can cause a query to run
slowly.

Comments
Some tables aren’t designed with descriptive enough naming conventions. In the example, field1 was the
column for a customer’s last name, but you wouldn’t know it by the name. A better name would have been
something such as last_name. In these cases, you can place comments alongside your SQL to help you
remember what the name represents. Comments are text placed between certain characters, /* and */, or
after two dashes (--) as shown below.

Comments can also be added outside of a statement as well as within a statement. You can use this
flexibility to provide an overall description of what you are going to do, step-by-step notes about how you
achieve it, and why you set different parameters/conditions.

The more comfortable you get with SQL, the easier it will be to read and understand queries at a glance.
Still, it never hurts to have comments in a query to remind yourself of what you’re trying to do. This also
makes it easier for others to understand your query if your query is shared. As your queries become more
and more complex, this practice will save you a lot of time and energy to understand complex queries you
wrote months or years ago.

Example of a query with comments


Here is an example of how comments could be written in BigQuery:
In the above example, a comment has been added before the SQL statement to explain what the query
does. Additionally, a comment has been added next to each of the column names to describe the column
and its use. Two dashes (--) are supported by most SQL dialects, so it is best to use -- and be consistent with it. You can
use # in place of -- in the above query, but # is not recognized in all SQL versions; for example, PostgreSQL
doesn’t recognize # as a comment marker. You can also place comments between /* and */ if the database you are using
supports it.
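
A minimal sketch of both comment styles, run here through Python's built-in sqlite3 module rather than BigQuery. The customer_data table, its field2 column, and the rows are hypothetical and exist only to make the query runnable.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_data (field1 TEXT, field2 TEXT)")
conn.executemany(
    "INSERT INTO customer_data VALUES (?, ?)",
    [("Rivera", "Austin"), ("Chen", "Boston")],
)

query = """
-- Pull basic information from the customer table
SELECT
    field1, -- customer's last name
    field2  /* customer's city */
FROM
    customer_data
"""

# The database ignores the comments and runs only the SQL statement.
rows = conn.execute(query).fetchall()
print(rows)
```

The comments change nothing about the result; they only document what each poorly named column holds.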

As you develop your skills professionally, depending on the SQL database you use, you can pick the
appropriate comment delimiting symbols you prefer and stick with those as a consistent style. As your
queries become more and more complex, the practice of adding helpful comments will save you a lot of
time and energy to understand queries that you may have written months or years prior.

Aliases
You can also make it easier on yourself by assigning a new name or alias to the column or table names to
make them easier to work with (and avoid the need for comments). This is done with a SQL AS clause. In
the example below, the alias last_name has been assigned to field1 and the alias customers assigned to
table. These aliases are good for the duration of the query only. An alias doesn’t change the actual name of
a column or table in the database.

Example of a query with aliases
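
A minimal sketch of column and table aliases, assuming a hypothetical customer_data table (the notes do not give the real table name) and using Python's built-in sqlite3 module to run the SQL:

```python
import sqlite3

# Hypothetical table with a poorly named column, as in the text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_data (field1 TEXT)")
conn.executemany("INSERT INTO customer_data VALUES (?)", [("Rivera",), ("Chen",)])

query = """
SELECT
    customers.field1 AS last_name
FROM
    customer_data AS customers
ORDER BY
    last_name
"""

cursor = conn.execute(query)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(column_names, rows)

# The alias lasts only for the query; the stored column name is unchanged.
stored_columns = [row[1] for row in conn.execute("PRAGMA table_info(customer_data)")]
print(stored_columns)
```

The result set is labeled last_name, while the table itself still has a column called field1.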

Putting SQL to work as a data analyst
Imagine you are a data analyst for a small business and your manager asks you for some employee data.
You decide to write a query with SQL to get what you need from the database.

You want to pull all the columns: empID, firstName, lastName, jobCode, and salary. Because you know
the database isn’t that big, instead of entering each column name in the SELECT clause, you use SELECT
*. This will select all the columns from the Employee table in the FROM clause.
Now, you can get more specific about the data you want from the Employee table. If you want all the data
about employees working in the SFI job code, you can use a WHERE clause to filter out the data based on
this additional requirement.

Here, you use:

A portion of the resulting data returned from the SQL query might look like this:

empID   firstName   lastName   jobCode   salary
0002    Homer       Simpson    SFI       15000
0003    Marge       Simpson    SFI       30000
0034    Bart        Simpson    SFI       25000
0067    Lisa        Simpson    SFI       38000
0088    Ned         Flanders   SFI       42000
0076    Barney      Gumble     SFI       32000

Suppose you notice a large salary range for the SFI job code. You might like to flag all employees in all
departments with lower salaries for your manager. Because interns are also included in the table and they
have salaries less than $30,000, you want to make sure your results give you only the full time employees
with salaries that are $30,000 or less. In other words, you want to exclude interns with the INT job code
who also earn less than $30,000. The AND operator enables you to test for both conditions.

You create a SQL query similar to below, where <> means "does not equal":

The resulting data from the SQL query might look like the following (interns with the job code INT aren't
returned):

empID   firstName   lastName    jobCode   salary
0002    Homer       Simpson     SFI       15000
0003    Marge       Simpson     SFI       30000
0034    Bart        Simpson     SFI       25000
0108    Edna        Krabappel   TUL       18000
0099    Moe         Szyslak     ANA       28000

With quick access to this kind of data using SQL, you can provide your manager with tons of different
insights about employee data, including whether employee salaries across the business are equitable.
Fortunately, the query shows only an additional two employees might need a salary adjustment and you
share the results with your manager.

Pulling the data, analyzing it, and implementing a solution might ultimately help improve employee
satisfaction and loyalty. That makes SQL a pretty powerful tool.
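
The two queries described in this section can be sketched as follows. This is a hedged reconstruction, not the course's original screenshots: it uses Python's built-in sqlite3 module, the employee rows come from the result tables above, and the intern row (empID 0120) is a hypothetical addition so that the INT filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee "
    "(empID TEXT, firstName TEXT, lastName TEXT, jobCode TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?, ?)",
    [
        ("0002", "Homer", "Simpson", "SFI", 15000),
        ("0003", "Marge", "Simpson", "SFI", 30000),
        ("0034", "Bart", "Simpson", "SFI", 25000),
        ("0067", "Lisa", "Simpson", "SFI", 38000),
        ("0088", "Ned", "Flanders", "SFI", 42000),
        ("0076", "Barney", "Gumble", "SFI", 32000),
        ("0108", "Edna", "Krabappel", "TUL", 18000),
        ("0099", "Moe", "Szyslak", "ANA", 28000),
        ("0120", "Sample", "Intern", "INT", 12000),  # hypothetical intern row
    ],
)

# All columns for employees in the SFI job code.
sfi_rows = conn.execute("SELECT * FROM Employee WHERE jobCode = 'SFI'").fetchall()

# Non-interns earning $30,000 or less; <> means "does not equal".
low_salary_rows = conn.execute(
    "SELECT * FROM Employee WHERE jobCode <> 'INT' AND salary <= 30000"
).fetchall()

print(len(sfi_rows), "SFI employees")
for row in low_salary_rows:
    print(row)
```

Because both conditions are joined with AND, a row is returned only when the job code is not INT and the salary is at most 30000, so the intern is left out even though the salary condition alone would match.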

Standard SQL Structure

This is Part 1 of a series of PostgreSQL cheat sheets and covers SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, and LIMIT.

The basic structure of a query pulling results from a single table is as follows.
SELECT
COLUMN_NAME(S)
FROM
TABLE_NAME
WHERE
CONDITION
GROUP BY
COLUMN_NAME(S)
HAVING
AGGREGATE_CONDITION
ORDER BY
COLUMN_NAME
LIMIT
N

What is SQL?

SQL (pronounced “ess-que-el”) stands for Structured Query Language. SQL is used to communicate with a database. It is the standard language for relational database management systems. SQL statements are used to perform tasks such as updating data in a database or retrieving data from a database.


What is a Relational Database Management System (RDBMS)?

An RDBMS organizes data into tables with rows and columns. The term relational means that values within each table have a relationship with each other.

• Rows — also known as records

• Columns — also known as fields, have a descriptive name and specific data type.

What is PostgreSQL?

PostgreSQL is a general-purpose, open-source relational database management system. The project describes itself as the most advanced open-source relational database.

Other common database management systems are MySQL, Oracle, IBM Db2, and MS Access.

Let’s begin!

SELECT
The SELECT statement is used to select data from a database. The data returned is stored in a
result table, called the result-set.

Specific columns
SELECT
COLUMN_1,
COLUMN_2
FROM
TABLE_NAME

All columns
Using the asterisk (*), you can query every column in your table.
SELECT *
FROM
TABLE_NAME

DISTINCT Columns
Find all the unique values in a column. DISTINCT is a keyword rather than a function, so no parentheses are needed.
SELECT
DISTINCT COLUMN_NAME
FROM
TABLE_NAME

COUNT all rows


If you want to know the number of rows in the entire table, use COUNT(*). You will get a single number.
SELECT
COUNT(*)
FROM
TABLE_NAME

COUNT DISTINCT values


If you want the number of distinct values in a column, use COUNT with DISTINCT. You will get
a single number representing how many unique values the column contains.
SELECT
COUNT (DISTINCT COLUMN_NAME)
FROM
TABLE_NAME
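
The SELECT variants above can be tried on a small table. This is a minimal sketch using Python's built-in sqlite3 module instead of PostgreSQL; the STUDENTS table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENTS (FIRSTNAME TEXT, SUBJECT TEXT)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?)",
    [
        ("BOB", "MATH"),
        ("JASON", "MATH"),
        ("ANA", "ART"),
        ("BELLA", "ART"),
        ("BARB", "MUSIC"),
    ],
)

# DISTINCT: each subject appears once, however many rows contain it.
distinct_subjects = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT SUBJECT FROM STUDENTS")
)

# COUNT(*): the number of rows in the table.
total_rows = conn.execute("SELECT COUNT(*) FROM STUDENTS").fetchone()[0]

# COUNT(DISTINCT ...): the number of unique values in one column.
subject_count = conn.execute(
    "SELECT COUNT(DISTINCT SUBJECT) FROM STUDENTS"
).fetchone()[0]

print(distinct_subjects, total_rows, subject_count)
```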

WHERE
Using the WHERE clause, you can create conditions that keep the rows you want and filter out the rows you don't.

NOTE — WHERE is always used before a GROUP BY (More on this later)


SELECT *
FROM
TABLE_NAME
WHERE
CONDITION

Conditions
There are a variety of conditions that can be used in SQL. Below are some examples based on a table
of students’ grades in school. You only need to specify WHERE once per query; for the sake of the
example, WHERE is included on each line.
WHERE FIRSTNAME = 'BOB' -- exact match
WHERE FIRSTNAME != 'BOB' -- everything excluding BOB
WHERE NOT FIRSTNAME = 'BOB' -- everything excluding BOB
WHERE FIRSTNAME IN ('BOB', 'JASON') -- either condition is met
WHERE FIRSTNAME NOT IN ('BOB', 'JASON') -- excludes both values
WHERE FIRSTNAME = 'BOB' AND LASTNAME = 'SMITH' -- both conditions
WHERE FIRSTNAME = 'BOB' OR FIRSTNAME = 'JASON' -- either condition
WHERE GRADES > 90 -- greater than 90
WHERE GRADES < 90 -- less than 90
WHERE GRADES >= 90 -- greater than or equal to 90
WHERE GRADES <= 90 -- less than or equal to 90
WHERE SUBJECT IS NULL -- returns rows where SUBJECT is missing
WHERE SUBJECT IS NOT NULL -- returns rows where SUBJECT is not missing
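
A few of these conditions can be checked against sample data. The sketch below uses Python's built-in sqlite3 module; the STUDENTS rows and the first_names helper are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENTS "
    "(FIRSTNAME TEXT, LASTNAME TEXT, SUBJECT TEXT, GRADES INTEGER)"
)
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?, ?, ?)",
    [
        ("BOB", "SMITH", "MATH", 95),
        ("JASON", "LEE", "MATH", 91),
        ("ANA", "DIAZ", "ART", 88),
        ("BELLA", "KIM", "ART", 72),
        ("BARB", "STONE", None, 85),  # None is stored as NULL (missing subject)
    ],
)


def first_names(condition):
    """Return sorted first names for a fixed, trusted WHERE condition.

    The condition is pasted into the SQL text, so this helper is only
    suitable for hard-coded examples, never for user input.
    """
    sql = f"SELECT FIRSTNAME FROM STUDENTS WHERE {condition} ORDER BY FIRSTNAME"
    return [row[0] for row in conn.execute(sql)]


for condition in [
    "FIRSTNAME = 'BOB'",
    "FIRSTNAME NOT IN ('BOB', 'JASON')",
    "FIRSTNAME = 'BOB' AND LASTNAME = 'SMITH'",
    "GRADES >= 90",
    "SUBJECT IS NULL",
    "SUBJECT IS NOT NULL",
]:
    print(condition, "->", first_names(condition))
```

Note that a missing value is tested with IS NULL or IS NOT NULL; comparing a column to NULL with = never matches.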
Conditions — Wildcards
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column. In PostgreSQL, the
pattern you pass to LIKE inside the quotes is case-sensitive.

There are two wildcards often used in conjunction with the LIKE operator:

• % - The percent sign represents zero, one, or multiple characters

• _ - The underscore represents a single character


WHERE FIRSTNAME LIKE 'B%' -- finds values starting with uppercase B
WHERE FIRSTNAME LIKE '%b' -- finds values ending with lowercase b
WHERE FIRSTNAME LIKE '%an%' -- finds values that have "an" in any position
WHERE FIRSTNAME LIKE '_n%' -- finds values that have "n" in the second position
WHERE FIRSTNAME LIKE 'B__%' -- finds values that start with "B" and are at least 3 characters long
WHERE FIRSTNAME LIKE 'B%b' -- finds values that start with "B" and end with "b"
WHERE GRADES BETWEEN 80 AND 90 -- finds grades from 80 to 90, inclusive

Bracketed character sets such as [BFL] are a SQL Server extension to LIKE. PostgreSQL's LIKE treats the brackets as literal characters, so in PostgreSQL use SIMILAR TO instead:

WHERE FIRSTNAME SIMILAR TO '[BFL]%' -- finds values that start with 'B', 'F', or 'L'
WHERE FIRSTNAME SIMILAR TO '[B-D]%' -- finds values that start with 'B', 'C', or 'D'
WHERE FIRSTNAME SIMILAR TO '[^BFL]%' -- finds values that do not start with 'B', 'F', or 'L'
WHERE FIRSTNAME NOT SIMILAR TO '[BFL]%' -- also excludes values starting with 'B', 'F', or 'L'
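
The % and _ wildcards can be tried with the sketch below, which runs the patterns through Python's built-in sqlite3 module. One caveat: SQLite's LIKE ignores ASCII case by default, unlike PostgreSQL, so the hypothetical names here were chosen so that both engines would return the same rows. The matching helper is also an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENTS (FIRSTNAME TEXT)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?)",
    [("Bob",), ("Jason",), ("Diana",), ("Barb",), ("Bella",)],
)


def matching(pattern):
    """Return sorted first names matching a LIKE pattern (bound as a parameter)."""
    sql = "SELECT FIRSTNAME FROM STUDENTS WHERE FIRSTNAME LIKE ? ORDER BY FIRSTNAME"
    return [row[0] for row in conn.execute(sql, (pattern,))]


for pattern in ["B%", "%b", "%an%", "_a%", "B__%", "B%b"]:
    print(pattern, "->", matching(pattern))
```

Here % stands for any run of characters (including none), while each _ stands for exactly one character.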

GROUP BY
The GROUP BY clause calculates summary values for each distinct value of the chosen column. It is often used
with aggregate functions (COUNT, SUM, AVG, MAX, MIN).
SELECT
SUBJECT,
AVG(GRADES)
FROM
STUDENTS
GROUP BY
SUBJECT

The query above will group each subject and calculate the average grades.
SELECT
SUBJECT,
COUNT(*)
FROM
STUDENTS
GROUP BY
SUBJECT

The above query will calculate the number (count) of students in each subject.

HAVING
The HAVING clause is similar to WHERE, but it filters on the results of aggregate functions.
HAVING comes after GROUP BY, whereas WHERE comes before GROUP BY.
If we wanted to find which subject had an average grade of 90 or more, we could use the
following.
SELECT
SUBJECT,
AVG(GRADES)
FROM
STUDENTS
GROUP BY
SUBJECT
HAVING
AVG(GRADES) >= 90
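
The GROUP BY and HAVING queries above can be run end to end on sample data. This is a minimal sketch using Python's built-in sqlite3 module; the STUDENTS rows are hypothetical, and ORDER BY SUBJECT is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENTS (FIRSTNAME TEXT, SUBJECT TEXT, GRADES INTEGER)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?, ?)",
    [
        ("BOB", "MATH", 95),
        ("JASON", "MATH", 91),
        ("ANA", "ART", 88),
        ("BELLA", "ART", 72),
        ("BARB", "MUSIC", 90),
    ],
)

# Average grade per subject.
averages = conn.execute(
    "SELECT SUBJECT, AVG(GRADES) FROM STUDENTS GROUP BY SUBJECT ORDER BY SUBJECT"
).fetchall()

# Number of students per subject.
counts = conn.execute(
    "SELECT SUBJECT, COUNT(*) FROM STUDENTS GROUP BY SUBJECT ORDER BY SUBJECT"
).fetchall()

# Only subjects whose average grade is 90 or more.
high_averages = conn.execute(
    "SELECT SUBJECT, AVG(GRADES) FROM STUDENTS "
    "GROUP BY SUBJECT HAVING AVG(GRADES) >= 90 ORDER BY SUBJECT"
).fetchall()

print(averages)
print(counts)
print(high_averages)
```

HAVING removes the ART group because its average of 80 falls below the threshold; a WHERE clause could not do this, since the average does not exist until the rows have been grouped.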

ORDER BY
Using the ORDER BY clause, you can specify how you want your values sorted. The examples continue with
the STUDENTS table from earlier.
SELECT
*
FROM
STUDENTS
ORDER BY
GRADES DESC

By default, ORDER BY sorts in ascending order. To sort in descending order,
specify DESC after the column name.

LIMIT
In Postgres, we can use the LIMIT clause to control how many rows the query returns.
For example, suppose we want to find the top 3 students with the highest grades.
SELECT
*
FROM
STUDENTS
ORDER BY
GRADES DESC
LIMIT
3

Because ORDER BY GRADES DESC puts the students with the highest grades first,
limiting the result to 3 rows returns the top 3.
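
ORDER BY and LIMIT can be sketched together as follows, again with Python's built-in sqlite3 module standing in for Postgres and with hypothetical STUDENTS rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENTS (FIRSTNAME TEXT, GRADES INTEGER)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?)",
    [("BOB", 95), ("JASON", 91), ("ANA", 88), ("BELLA", 72), ("BARB", 85)],
)

# Default sort order is ascending: lowest grade first.
ascending = conn.execute("SELECT * FROM STUDENTS ORDER BY GRADES").fetchall()

# Highest grades first, keeping only the first three rows.
top_three = conn.execute(
    "SELECT * FROM STUDENTS ORDER BY GRADES DESC LIMIT 3"
).fetchall()

print(ascending)
print(top_three)
```

Without the ORDER BY, LIMIT 3 would still return three rows, but nothing would guarantee they were the three highest grades.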

1.
Question 1

SELECT * FROM employee WHERE jobCode = 'FTE' AND LastName = 'James'


What does the asterisk (*) after SELECT tell the database to do in this query?
1 / 1 point

• Select all columns from the employee table
• Select the LastName column from the employee table
• Select all data that meets the criteria as stated in the query
• Select all data that meets the criteria as stated in the query, then multiply it
Correct
SELECT * tells the database to select all columns from the employee table. The criteria in the WHERE clause tells
the database what data in those columns the query should return.

2.
Question 2

SELECT * FROM employee WHERE jobCode = 'FTE' AND LastName = 'James'


In this query, the data analyst wants to retrieve data from which table?
1 / 1 point

• LastName
• jobCode
• James
• employee


Correct
The data analyst wants to retrieve data from the employee table.

3.
Question 3

SELECT * FROM employee WHERE jobCode = 'FTE' AND LastName = 'James'


In this query, what will be retrieved from the database?
1 / 1 point

All data from the FTE table, where the employee's LastName is James.

All data from the jobCode table, where the jobCode is FTE and the employee has any last name other than
James.

All data from the employee table, where the jobCode is FTE and the employee has any last name other than
James.

All data from the employee table, where the jobCode is FTE and the last name is James.
Correct
This query will select all data from the employee table, where the jobCode is FTE and the last name is James.

4.
Question 4
You are working with a database table that contains data about music artists. The table is named artist. You
want to review all the columns in the table.

You write the SQL query below. Add a FROM clause that will retrieve the data from the artist table.

SELECT *
FROM artist

+-----------+---------------------------------+
| artist_id | name                            |
+-----------+---------------------------------+
| 1         | AC/DC                           |
| 2         | Accept                          |
| 3         | Aerosmith                       |
| 4         | Alanis Morissette               |
| 5         | Alice In Chains                 |
| 6         | Antônio Carlos Jobim            |
| 7         | Apocalyptica                    |
| 8         | Audioslave                      |
| 9         | BackBeat                        |
| 10        | Billy Cobham                    |
| 11        | Black Label Society             |
| 12        | Black Sabbath                   |
| 13        | Body Count                      |
| 14        | Bruce Dickinson                 |
| 15        | Buddy Guy                       |
| 16        | Caetano Veloso                  |
| 17        | Chico Buarque                   |
| 18        | Chico Science & Nação Zumbi     |
| 19        | Cidade Negra                    |
| 20        | Cláudio Zoli                    |
| 21        | Various Artists                 |
| 22        | Led Zeppelin                    |
| 23        | Frank Zappa & Captain Beefheart |
| 24        | Marcos Valle                    |
| 25        | Milton Nascimento & Bebeto      |
+-----------+---------------------------------+

(Output limit exceeded, 25 of 275 total rows shown)

How many columns are in the artist table?


1 / 1 point

• 2
• 9
• 5
• 8
Correct
The clause FROM artist will retrieve the data from the artist table. The complete query is SELECT * FROM
artist. The FROM clause specifies which database table to select data from. There are two columns in the artist
table.

5.
Question 5
You are working with a database table that contains data about music albums. You are only interested in data
related to the album with ID number 277. The album IDs are listed in the album_id column from the album
table.

You write the SQL query below. Add a WHERE clause that will return only data about the album with ID
number 277.

SELECT *
FROM album
WHERE album_id = 277

+----------+---------------------------+-----------+
| album_id | title                     | artist_id |
+----------+---------------------------+-----------+
| 277      | Bach: Goldberg Variations | 211       |
+----------+---------------------------+-----------+

What is the name of the album with ID number 277?


1 / 1 point

• Vivaldi: The Four Seasons
• Beethoven: Piano Sonatas
• Bach: Goldberg Variations
• Mozart: Chamber Music
Correct
The clause WHERE album_id = 277 will return only data about the album with ID number 277. The
complete query is SELECT * FROM album WHERE album_id = 277. The WHERE clause filters results
that meet certain conditions. The WHERE clause includes the name of the column, an equals sign, and the
value(s) in the column to include. The name of the album with ID number 277 is Bach: Goldberg Variations.

Planning a data visualization


Earlier, you learned that data visualization is the graphical representation of information. As a data
analyst, you will want to create visualizations that make your data easy to understand and interesting to
look at. Because of the importance of data visualization, most data analytics tools (such as spreadsheets
and databases) have a built-in visualization component while others (such as Tableau) specialize in
visualization as their primary value-add. In this reading, you will explore the steps involved in the data
visualization process and a few of the most common data visualization tools available.

Steps to plan a data visualization


Let’s go through an example of a real-life situation where a data analyst might need to create a data
visualization to share with stakeholders. Imagine you’re a data analyst for a clothing distributor. The
company helps small clothing stores manage their inventory, and sales are booming. One day, you learn
that your company is getting ready to make a major update to its website. To guide decisions for the
website update, you’re asked to analyze data from the existing website and sales records. Let’s go through
the steps you might follow.

Step 1: Explore the data for patterns


First, you ask your manager or the data owner for access to the current sales records and website analytics
reports. This includes information about how customers behave on the company’s existing website, basic
information about who visited, who bought from the company, and how much they bought.

While reviewing the data you notice a pattern among those who visit the company’s website most
frequently: geography and larger amounts spent on purchases. With further analysis, this information
might explain why sales are so strong right now in the northeast—and help your company find ways to
make them even stronger through the new website.

Step 2: Plan your visuals


Next it is time to refine the data and present the results of your analysis. Right now, you have a lot of data
spread across several different tables, which isn’t an ideal way to share your results with management and
the marketing team. You will want to create a data visualization that explains your findings quickly and
effectively to your target audience. Since you know your audience is sales oriented, you already know that
the data visualization you use should:

• Show sales numbers over time


• Connect sales to location
• Show the relationship between sales and website use
• Show which customers fuel growth

Step 3: Create your visuals


Now that you have decided what kind of information and insights you want to display, it is time to start
creating the actual visualizations. Keep in mind that creating the right visualization for a presentation or to
share with stakeholders is a process. It involves trying different visualization formats and making
adjustments until you get what you are looking for. In this case, a mix of different visuals will best
communicate your findings and turn your analysis into the most compelling story for stakeholders. So, you
can use the built-in chart capabilities in your spreadsheets to organize the data and create your visuals.

1) Line charts can track sales over time
2) Maps can connect sales to locations
3) Donut charts can show customer segments
4) Bar charts can compare total visitors that make a purchase

Build your data visualization toolkit


There are many different tools you can use for data visualization.

• You can use the visualizations tools in your spreadsheet to create simple visualizations such as line
and bar charts.
• You can use more advanced tools such as Tableau that allow you to integrate data into dashboard-
style visualizations.
• If you’re working with the programming language R you can use the visualization tools in RStudio.
Your choice of visualization tool will depend on a variety of factors, including the size of your data and the process
you used for analyzing your data (spreadsheets, databases and queries, or programming languages). For
now, just consider the basics.

Spreadsheets (Microsoft Excel or Google Sheets)


In our example, the built-in charts and graphs in spreadsheets made the process of creating visuals quick
and easy. Spreadsheets are great for creating simple visualizations like bar graphs and pie charts, and even
provide some advanced visualizations like maps, and waterfall and funnel diagrams (shown in the
following figures).

But sometimes you need a more powerful tool to truly bring your data to life. Tableau and RStudio are two
examples of widely used platforms that can help you plan, create, and present effective and compelling
data visualizations.

Visualization software (Tableau)


Tableau is a popular data visualization tool that lets you pull data from nearly any system and turn it into
compelling visuals or actionable insights. The platform offers built-in visual best practices, which makes
analyzing and sharing data fast, easy, and (most importantly) useful. Tableau works well with a wide
variety of data and includes an interactive dashboard that lets you and your stakeholders click to explore
the data interactively.

You can start exploring Tableau from the How-to Video resources. Tableau Public is free, easy to use, and
full of helpful information. The Resources page is a one-stop-shop for how-to videos, examples, and
datasets for you to practice with. To explore what other data analysts are sharing on Tableau, visit the Viz
of the Day page where you will find beautiful visuals ranging from the Hunt for (Habitable) Planets to Who’s
Talking in Popular Films.

Programming language (R with RStudio)


A lot of data analysts work with a programming language called R. Most people who work with R end up
also using RStudio, an integrated development environment (IDE), for their data visualization needs. As with
Tableau, you can create dashboard-style data visualizations using RStudio.
Check out their website to learn more about RStudio.

You could easily spend days exploring all the resources provided at RStudio.com, but the RStudio
Cheatsheets and the RStudio Visualize Data Primer are great places to start. When you have more time,
check out the webinars and videos which offer advice and helpful perspectives for both beginners and
advanced users.

Key takeaway
The best data analysts use lots of different tools and methods to visualize and share their data. As you
continue learning more about data visualization throughout this course, be sure to stay curious, research
different options, and continuously test new programs and platforms to help you make the most of your
data.

1.
Question 1
In the following spreadsheet, the column labels in row 1 are called what?

-    A      B               C            D
1    Rank   Name            Population   County
2    1      Charlotte       885,708      Mecklenburg
3    2      Raleigh         474,069      Wake (seat), Durham
4    3      Greensboro      296,710      Guilford
5    4      Durham          278,993      Durham (seat), Wake, Orange
6    5      Winston-Salem   247,945      Forsyth
7    6      Fayetteville    211,657      Cumberland
8    7      Cary            170,282      Wake, Chatham
9    8      Wilmington      123,784      New Hanover
10   9      High Point      112,791      Guilford, Randolph, Davidson, Forsyth
11   10     Concord         96,341       Cabarrus

1 / 1 point

• Criteria
• Characteristics
• Descriptors
• Attributes


Correct

2.
Question 2
Fill in the blank: In the following spreadsheet, the ________ of High Point describes all of the data in row
10.

-    A      B               C            D
1    Rank   Name            Population   County
2    1      Charlotte       885,708      Mecklenburg
3    2      Raleigh         474,069      Wake (seat), Durham
4    3      Greensboro      296,710      Guilford
5    4      Durham          278,993      Durham (seat), Wake, Orange
6    5      Winston-Salem   247,945      Forsyth
7    6      Fayetteville    211,657      Cumberland
8    7      Cary            170,282      Wake, Chatham
9    8      Wilmington      123,784      New Hanover
10   9      High Point      112,791      Guilford, Randolph, Davidson, Forsyth
11   10     Concord         96,341       Cabarrus

1 / 1 point

• criteria
• observation
• dataset
• format


Correct

3.
Question 3
A data analyst wants to list the cities in this spreadsheet alphabetically, instead of numerically. They can
use the feature sort range to do this.

-    A      B               C            D
1    Rank   Name            Population   County
2    1      Charlotte       885,708      Mecklenburg
3    2      Raleigh         474,069      Wake (seat), Durham
4    3      Greensboro      296,710      Guilford
5    4      Durham          278,993      Durham (seat), Wake, Orange
6    5      Winston-Salem   247,945      Forsyth
7    6      Fayetteville    211,657      Cumberland
8    7      Cary            170,282      Wake, Chatham
9    8      Wilmington      123,784      New Hanover
10   9      High Point      112,791      Guilford, Randolph, Davidson, Forsyth
11   10     Concord         96,341       Cabarrus

1 / 1 point

• True
• False
Correct
4.
Question 4
To find the average population of the cities in this spreadsheet, what is the correct AVERAGE function
syntax? Type your answer below.

-    A      B               C            D
1    Rank   Name            Population   County
2    1      Charlotte       885,708      Mecklenburg
3    2      Raleigh         474,069      Wake (seat), Durham
4    3      Greensboro      296,710      Guilford
5    4      Durham          278,993      Durham (seat), Wake, Orange
6    5      Winston-Salem   247,945      Forsyth
7    6      Fayetteville    211,657      Cumberland
8    7      Cary            170,282      Wake, Chatham
9    8      Wilmington      123,784      New Hanover
10   9      High Point      112,791      Guilford, Randolph, Davidson, Forsyth
11   10     Concord         96,341       Cabarrus

1 / 1 point

• AVERAGE(C2:C11)
• =AVERAGE(C2-C11)
• AVERAGE(C2-C11)
• =AVERAGE(C2:C11)


Correct
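
As a cross-check on the arithmetic, the spreadsheet formula =AVERAGE(C2:C11) (a leading equals sign, with a colon marking the range from C2 to C11) adds the ten population values and divides by ten. A minimal Python sketch of the same calculation, using the values from column C above:

```python
# Population values from cells C2 through C11 of the spreadsheet above.
populations = [
    885_708,  # Charlotte
    474_069,  # Raleigh
    296_710,  # Greensboro
    278_993,  # Durham
    247_945,  # Winston-Salem
    211_657,  # Fayetteville
    170_282,  # Cary
    123_784,  # Wilmington
    112_791,  # High Point
    96_341,   # Concord
]

# AVERAGE is the sum of the values divided by how many there are.
average_population = sum(populations) / len(populations)
print(average_population)
```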

5.
Question 5
You are working with a database table named genre that contains data about music genres. You want to
review all the columns in the table.

You write the SQL query below. Add a FROM clause that will retrieve the data from the genre table.

SELECT *
FROM genre

+----------+--------------------+
| genre_id | name               |
+----------+--------------------+
| 1        | Rock               |
| 2        | Jazz               |
| 3        | Metal              |
| 4        | Alternative & Punk |
| 5        | Rock And Roll      |
| 6        | Blues              |
| 7        | Latin              |
| 8        | Reggae             |
| 9        | Pop                |
| 10       | Soundtrack         |
| 11       | Bossa Nova         |
| 12       | Easy Listening     |
| 13       | Heavy Metal        |
| 14       | R&B/Soul           |
| 15       | Electronica/Dance  |
| 16       | World              |
| 17       | Hip Hop/Rap        |
| 18       | Science Fiction    |
| 19       | TV Shows           |
| 20       | Sci Fi & Fantasy   |
| 21       | Drama              |
| 22       | Comedy             |
| 23       | Alternative        |
| 24       | Classical          |
| 25       | Opera              |
+----------+--------------------+

What is the name of the genre with ID number 3?


1 / 1 point
• Jazz
• Metal
• Blues
• Rock
Correct
The clause FROM genre will retrieve the data from the genre table. The complete query is SELECT *
FROM genre. The FROM clause specifies which database table to query. The name of the genre with ID
number 3 is Metal.

6.
Question 6
You are working with a database table that contains invoice data. The customer_id column lists the ID
number for each customer. You are interested in invoice data for the customer with ID number 7.

You write the SQL query below. Add a WHERE clause that will return only data about the customer with ID
number 7.

SELECT
*
FROM
invoice
WHERE
customer_id = 7

+------------+-------------+---------------------+--------------------------------------+--------------+---------------+-----------------+---------------------+-------+
| invoice_id | customer_id | invoice_date        | billing_address                      | billing_city | billing_state | billing_country | billing_postal_code | total |
+------------+-------------+---------------------+--------------------------------------+--------------+---------------+-----------------+---------------------+-------+
| 78         | 7           | 2009-12-08 00:00:00 | Rotenturmstraße 4, 1010 Innere Stadt | Vienne       | None          | Austria         | 1010                | 1.98  |
| 89         | 7           | 2010-01-18 00:00:00 | Rotenturmstraße 4, 1010 Innere Stadt | Vienne       | None          | Austria         | 1010                | 18.86 |
| 144        | 7           | 2010-09-18 00:00:00 | Rotenturmstraße 4, 1010 Innere Stadt | Vienne       | None          | Austria         | 1010                | 8.91  |
| 273        | 7           | 2012-04-24 00:00:00 | Rotenturmstraße 4, 1010 Innere Stadt | Vienne       | None          | Austria         | 1010                | 1.98  |
| 296        | 7           | 2012-07-27 00:00:00 | Rotenturmstraße 4, 1010 Innere Stadt | Vienne       | None          | Austria         | 1010                | 3.96  |
| 318        | 7           | 2012-10-29 00:00:00 | Rotenturmstraße 4, 1010 Innere Stadt | Vienne       | None          | Austria         | 1010                | 5.94  |
| 370        | 7           | 2013-06-19 00:00:00 | Rotenturmstraße 4, 1010 Innere Stadt | Vienne       | None          | Austria         | 1010                | 0.99  |
+------------+-------------+---------------------+--------------------------------------+--------------+---------------+-----------------+---------------------+-------+

After you run your query, use the slider to view all the data presented.

What is the billing country for the customer with ID number 7?


1 / 1 point
• France
• Austria
• Poland
• Brazil
Correct

7.
Question 7
Which of the following best describes a bar chart?
1 / 1 point

It is a visualization that uses a circle which is divided into wedges sized based on numerical
proportion.
It is a visualization that represents data with columns, or bars, the heights of which are proportional to
the values that they represent.

It is a visualization that plots a sequence of points and connects them with straight lines or
curves.

It is a visualization that plots individual points in the Cartesian coordinate plane.


Correct

8.
Question 8
A data analyst has to demonstrate how the population in Charlotte has increased over time. They create
the chart below. What is this type of chart called?

1 / 1 point
• Area chart
• Column chart
• Bar chart
• Line chart
Correct
Data analyst roles and job descriptions
As technology continues to advance, being able to collect and analyze the data from that new technology
has become a huge competitive advantage for a lot of businesses. Everything from websites to social
media feeds is filled with fascinating data that, when analyzed and used correctly, can help inform
business decisions. A company’s ability to thrive now often depends on how well it can leverage data,
apply analytics, and implement new technologies.

This is why skilled data analysts are some of the most sought-after professionals in the world. A study
conducted by IBM estimates that there are over 380,000 job openings in the Data Analytics field in the
United States*. Because the demand is so strong, you’ll be able to find job opportunities in virtually any
industry. Do a quick search on any major job site and you’ll notice that every type of business from zoos, to
health clinics, to banks are seeking talented data professionals. Even if the job title doesn’t use the exact
term “data analyst,” the job description for most roles involving data analysis will likely include a lot of the
skills and qualifications you’ll gain by the end of this program. In this reading, we’ll explore some of the
data analyst-related roles you might find in different companies and industries.

* Burning Glass data, Feb 1, 2021 - Jan 31, 2022, US

Decoding the job description


The data analyst role is one of many job titles that contain the word “analyst.”

To name a few others that sound similar but may not be the same role:

• Business analyst — analyzes data to help businesses improve processes, products, or services
• Data analytics consultant — analyzes the systems and models for using data
• Data engineer — prepares and integrates data from different sources for analytical use
• Data scientist — uses expert skills in technology and social science to find trends through data
analysis
• Data specialist — organizes or converts data for use in databases or software systems
• Operations analyst — analyzes data to assess the performance of business operations and
workflows
Data analysts, data scientists, and data specialists sound very similar but focus on different tasks. As you
start to browse job listings online, you might notice that companies’ job descriptions seem to combine
these roles or look for candidates who may have overlapping skills. The fact that companies often blur the
lines between them means that you should take special care when reading the job descriptions and the
skills required.

The table below illustrates some of the overlap and distinctions between them:
Title: Decoding the job description

Data analysts
• Problem solving: Use existing tools and methods to solve problems with existing types of data
• Analysis: Analyze collected data to help stakeholders make better decisions
• Other relevant skills: database queries, data visualization, dashboards, reports, and spreadsheets

Data scientists
• Problem solving: Invent new tools and models, ask open-ended questions, and collect new types of data
• Analysis: Analyze and interpret complex data to make business predictions
• Other relevant skills: advanced statistics, machine learning, deep learning, data optimization, and programming

Data specialists
• Problem solving: Use in-depth knowledge of databases as a tool to solve problems and manage data
• Analysis: Organize large volumes of data for use in data analytics or business operations
• Other relevant skills: data manipulation, information security, data models, scalability of data, and disaster recovery

We used the role of data specialist as one example of many specializations within data analytics, but you
don’t have to become a data specialist! Specializations can take a number of different turns. For example,
you could specialize in developing data visualizations and likewise go very deep into that area.

Job specializations by industry


We learned that the data specialist role concentrates on in-depth knowledge of databases. In similar
fashion, other specialist roles for data analysts can focus on in-depth knowledge of specific industries. For
example, in a job as a business analyst you might wear some different hats than in a more general position
as a data analyst. As a business analyst, you would likely collaborate with managers, share your data
findings, and maybe explain how a small change in the company’s project management system could save
the company 3% each quarter. Although you would still be working with data all the time, you would focus
on using the data to improve business operations, efficiencies, or the bottom line.

Other industry-specific specialist positions that you might come across in your data analyst job search
include:

• Marketing analyst — analyzes market conditions to assess the potential sales of products and
services
• HR/payroll analyst — analyzes payroll data for inefficiencies and errors
• Financial analyst — analyzes financial status by collecting, monitoring, and reviewing data
• Risk analyst — analyzes financial documents, economic conditions, and client data to help
companies determine the level of risk involved in making a particular business decision
• Healthcare analyst — analyzes medical data to improve the business aspect of hospitals and
medical facilities

Key takeaway
Explore data analyst job descriptions and industry-specific analyst roles. You will start to get a better sense
of the different data analyst jobs out there and which types of roles you’re most interested to go after.
