Analytics Assignment 2
Contents
1 Introduction
4.1 First visualization (total number of bedrooms sold per ZIP code)
4.2 Second visualization (price of house sold vs. house year built)
5 Visualization dashboard
1 Introduction
The assignment is a case study of housing unit sales in King County, US, between 2014 and
2015. The data was obtained from Kaggle, visualizations were created with Tableau 22.0, and
their managerial interpretations are covered. The paper also provides a brief account of a
recent development in visualization, data democratization, given its relevance to the
housing data obtained from Kaggle.
Some insights have also been included for the relevance of the visualization concepts towards
The concept that is relevant to our visualizations and dataset is data democratization. In
many cases, information is put to use even though certain parts of it may be difficult for
everyone to understand. Research by Flaherty, Sturm, and Farries (2022) suggests
that one of the challenges associated with data visualizations is the relatively limited
understanding of individual employees due to them working in silos, a problem that can be
For instance, the chosen dataset, which covers house sales in King County, US, can help us
uncover insights about the housing market in that county. However, some aspects of the
information may not be easily understood without specific knowledge, such as that held by
some of the real estate agents in the country.
Data democratization thus refers to enabling everyone in the company to be comfortable with
the data, so that they can work with it, or at least interact with it, regardless of their
technical knowledge of data analysis or lack thereof. In our present dataset, data
democratization is important because some details of the housing sales in the county can be
better understood by individuals who were involved in the transaction.
According to Sil, Sharma, Jhamb, Marathe, and Sharma (2021), data democratization in
multiple contexts can help us gain a better understanding of the data, including the ways it
is visualized or interpreted as well as the insights it provides.
For the chosen dataset, the housing sales in King County, the student elected to create four
data visualizations with Tableau: (1) total number of bedrooms sold per ZIP code, (2) price
of house sold vs. year of house built, (3) sum price of houses sold by ZIP code, and (4)
median price of houses sold by ZIP code. The four visualizations were then combined into a
dashboard, which is included later in the section. The next section (3b) interprets the
importance of each of the visualizations from
the managerial perspective and how they could be helpful in strategic decision-making in the
4.1 First visualization (total number of bedrooms sold per ZIP code)
The first visualization focuses on the number of bedrooms sold per ZIP code. The student
constructed a heatmap to show which localities purchased the greatest number of bedrooms and,
by extension, houses. The heatmap is shown below. The
‘SUM (Bedrooms)’ – the total number of bedrooms sold, since this is a SUM function of the
Zip code – the respective ZIP code representing the locality in the county
The greater the number of bedrooms sold in a ZIP code, the darker the shade of blue of its
box in the heatmap. A later visualization covers the total price of the houses sold per ZIP
code, but the number of houses sold is not the same as their total price: some neighbourhoods
may have a higher sales volume of housing units, while others may have higher sales revenue
from housing units.
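The distinction between sales volume and sales revenue can be illustrated with a minimal Python sketch of the two aggregations. The rows are invented for illustration and are not taken from the King County dataset. The field names `zipcode`, `bedrooms`, and `price`, the placeholder labels `ZIP_A` and `ZIP_B`, and the helper `sum_by_zip` are assumptions, not part of the Tableau workbook.

```python
from collections import defaultdict

# Illustrative rows only; these are NOT taken from the King County dataset.
# The field names "zipcode", "bedrooms", and "price" are assumptions.
sales = [
    {"zipcode": "ZIP_A", "bedrooms": 3, "price": 300_000},
    {"zipcode": "ZIP_A", "bedrooms": 4, "price": 350_000},
    {"zipcode": "ZIP_A", "bedrooms": 3, "price": 250_000},
    {"zipcode": "ZIP_B", "bedrooms": 5, "price": 1_200_000},
]


def sum_by_zip(rows, field):
    """Return {zipcode: sum of `field`}, similar to a SUM(field) per ZIP code."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["zipcode"]] += row[field]
    return dict(totals)


bedrooms_per_zip = sum_by_zip(sales, "bedrooms")
price_per_zip = sum_by_zip(sales, "price")

# The ZIP code with the most bedrooms sold need not have the highest total price.
top_by_volume = max(bedrooms_per_zip, key=bedrooms_per_zip.get)
top_by_revenue = max(price_per_zip, key=price_per_zip.get)
print(top_by_volume, top_by_revenue)
```

In this toy example the ZIP code leading on bedrooms sold differs from the one leading on total price, which is why the two heatmaps are read separately.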
4.2 Second visualization (price of house sold vs. house year built)
The second visualization was a comparison of two variables, namely ‘Yr_built’ and
Yr_built – the year in which the house was initially built (not renovated)
SUM (price) – the price of each house sold; the SUM function indicates that it is the
The illustration is a line graph with a trend line showing how the year of construction
affects house prices, or what customers are willing to pay for them. The purpose of this
graph is to determine whether there is any trend in how houses are priced based on the year
in which they were constructed.
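As a rough sketch of the series behind such a line graph, the Python below sums price per year built and also counts the houses in each year. The rows are invented for illustration, and the field names `yr_built` and `price` and the helper `price_series_by_year` are assumptions. Because the plotted measure is a sum, each yearly total depends on how many houses from that year were sold as well as on their individual prices.

```python
from collections import defaultdict

# Illustrative rows only; NOT taken from the dataset. Field names are assumptions.
sales = [
    {"yr_built": 1990, "price": 400_000},
    {"yr_built": 1990, "price": 500_000},
    {"yr_built": 2014, "price": 600_000},
    {"yr_built": 2014, "price": 700_000},
    {"yr_built": 2015, "price": 650_000},
]


def price_series_by_year(rows):
    """Return [(year, total_price, houses_sold)] sorted by year built."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        totals[row["yr_built"]] += row["price"]
        counts[row["yr_built"]] += 1
    return [(year, totals[year], counts[year]) for year in sorted(totals)]


series = price_series_by_year(sales)
for year, total, count in series:
    print(year, total, count)
```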
The third visualization is about the sum price of houses sold, classified by ZIP code. The three
variables included here were ‘MEDIAN (Price)’, ‘SUM (Price)’, and Zip code. The definitions for the
SUM (Price) – the sum of the prices of all the houses sold in a ZIP code (in this visualization)
Zip code – the ZIP code for the locality in King County
Since house sales bring profits for the real estate companies in the county, the
visualization uses Tableau's automatic green colour palette. The higher the sum price of the
houses sold in a ZIP code or locality, the greener its box in the heatmap.
For instance, the 98004 ZIP code is greener than the 98039 ZIP code, because the sum price
for 98004 is approximately $429 million, which is greater than that of 98039,
The fourth and final visualization again considers two variables, ‘Zip code’ and
‘MEDIAN (Price)’, which are defined as follows:
Zip code – the ZIP code for the locality in King County
MEDIAN (Price) – the median price of the houses sold in the said locality or ZIP code
The difference between the third and fourth visualizations is that the third is a heatmap
that visualizes the most profitable ZIP codes or localities, that is, those with the highest SUM
(Price), while the fourth visualization is focused on the median price of the houses sold per
The visualization is shown below.
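The contrast between the sum-based third visualization and the median-based fourth one can be sketched in Python as follows. The prices and the labels `ZIP_A` and `ZIP_B` are invented for illustration and do not come from the dataset, and the helper `prices_by_zip` is hypothetical.

```python
from collections import defaultdict
from statistics import median

# Illustrative (zipcode, price) pairs only; NOT taken from the dataset.
sales = [
    ("ZIP_A", 300_000),
    ("ZIP_A", 350_000),
    ("ZIP_A", 400_000),
    ("ZIP_A", 450_000),
    ("ZIP_B", 1_000_000),
]


def prices_by_zip(rows):
    """Group sale prices into a list per ZIP code."""
    grouped = defaultdict(list)
    for zipcode, price in rows:
        grouped[zipcode].append(price)
    return grouped


grouped = prices_by_zip(sales)

# SUM(Price) rewards many sales; MEDIAN(Price) reflects the typical sale.
summary = {
    zipcode: {"sum": sum(prices), "median": median(prices)}
    for zipcode, prices in grouped.items()
}
for zipcode, stats in summary.items():
    print(zipcode, stats["sum"], stats["median"])
```

Here the first ZIP code has the larger total because it has more sales, while the second has the higher median despite a single sale, so the two measures can rank localities differently.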
5 Visualization dashboard
The visualization dashboard combines the four visualizations: the number of bedrooms sold per
ZIP code, the sum price of the houses sold per ZIP code, the median price of houses sold by
ZIP code, and the price of house sold vs. the year the house was built.
6 Part 3(b) – Interpretation of visualizations
The purpose of this section is to explain how and why each of the four visualizations in the
dashboard is important from the managerial perspective. In other words, each of those figures
offers insights that the company could use to make
The first visualization, which examines the number of bedrooms sold per ZIP code, indicates
that ZIP codes 98052, 98038, and 98006 are the three localities with the highest sales volume
(in terms of the number of bedrooms sold), with 2,076, 2,072, and 1,913 units respectively.
It therefore appears that residents in those localities may be purchasing more properties, so
further marketing campaigns could target prospective buyers in those areas. In contrast, ZIP
codes 98148 and 98039 had only 179 and 203 bedrooms sold respectively. It can therefore be
interpreted that those localities have few buyers for housing properties and thus, any
further constructions should be halted or
The second visualization helps us understand how the year of construction affects the price
for which a house may be sold. The illustration shows that, although there is an upward
trend, houses built in specific years may be sold for very little. For instance, houses built
in 2014 sold cumulatively for $382 million, but houses built in 2015 sold cumulatively for
only $28.87 million. However, it is clear that the houses constructed in the 1990s were sold
for much less. The managerial takeaway is that the company should not purchase older
residences and properties with the intention of renovating and reselling them, for two
reasons: (1) the additional costs associated with renovating the older properties, and (2)
the overall drop in the price due to the properties being aged or
The third visualization shows the cumulative sales of houses per ZIP code. Its purpose is to
help us as managers analyse and understand where the total sales revenue from housing units
was the highest. ZIP codes 98004, 98006, and 98052 were clearly among the highest-grossing
neighbourhoods, as the total value of properties sold there was the highest. A connection can
be drawn with the first visualization, in which neighbourhoods 98006 and 98052 also accounted
for some of the highest numbers of bedrooms sold. This further confirms that the company
should target those ZIP codes for property development, because they have buyers with both
the capital and the willingness to purchase houses.
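The overlap between the two rankings can be checked with a short Python sketch that uses only the ZIP codes named in the text above. The variable names are illustrative.

```python
# ZIP codes as reported in the interpretation above.
top_by_bedrooms = {"98052", "98038", "98006"}   # first visualization
top_by_sum_price = {"98004", "98006", "98052"}  # third visualization

# ZIP codes that lead on both sales volume and total sales value.
in_both = sorted(top_by_bedrooms & top_by_sum_price)
# ZIP codes that lead on only one of the two measures.
volume_only = sorted(top_by_bedrooms - top_by_sum_price)
revenue_only = sorted(top_by_sum_price - top_by_bedrooms)

print(in_both)       # ['98006', '98052']
print(volume_only)   # ['98038']
print(revenue_only)  # ['98004']
```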
Finally, the fourth visualization shows the median price of the houses sold per ZIP code. It
was created to help us understand which localities have the wealthiest customers. The median
price was highest in ZIP code 98039, at $1.892 million, though that is a statistical outlier.
Nonetheless, ZIP codes 98004 and 98040 also had high median prices of $1.15 million and
$993,750 respectively, indicating that wealthier customers live in those ZIP codes. The
company could therefore consider investing in luxury residential properties targeted at a
relatively small market segment with adequate capital to purchase those housing units.
Elsewhere, the median price ranges from $235,000 in ZIP code 98002 to $915,000 in ZIP code
98112, which gives us an idea of what the prices of housing properties in the county look
like and what most