Hacking-Airbnb’s-search-rank-algorithm

The document analyzes factors influencing search rankings on Airbnb for rental properties in Cape Town, focusing on listings that accommodate six or more guests. It details a methodology that includes web scraping to gather data, which is then analyzed for correlations with search rank, revealing that guest satisfaction, pricing, and listing activity significantly impact visibility. The findings aim to assist Airbnb hosts in optimizing their listings for better search results and increased bookings.

Uploaded by

rn.nyrgs
Copyright
© © All Rights Reserved

TABLE OF CONTENTS

Foreword
Introduction
Methodology
Software used
Getting the data
    Getting an initial data set
    Diving Deeper
    Put it all together
Results
    Data Validation
    Describing the results
    A word about Correlation vs Causation
    The factors most correlated to page rank
    Airbnb’s changing stance on Instant Book
    The correlation of all the factors tested
    The correlation of amenities to search rank
Conclusions
About the Author
Appendix A
Appendix B

Hacking Airbnb’s search rank algorithm
FOREWORD
1. My work usually involves clients’ proprietary information that I cannot share or publish.
Since this project involved sourcing data on my own and doing analysis for my own benefit, I am
happy to share it. Please feel free to share or use this information as you like.
2. Collecting this data required scraping publicly available data on a large scale using an
automated process. This is an ethical grey area, and steps were taken to ensure that I was not
DDoSing the servers I was scraping information from. More on this in Appendix B.

INTRODUCTION
In my spare time I manage a portfolio of rental properties in Cape Town and host them on
www.airbnb.com, and I would like to use data to assist with marketing them. This project aims to
uncover the factors that influence search results on the Airbnb site. SEO has seen much focus with
the rise of search engines, and this project asks the equivalent question for Airbnb: what drives
its search results? Hopefully we can work out what an Airbnb host can do to achieve a better rank,
since a property that appears closer to the top should get more bookings and earn more money.

The data used is scraped from the Airbnb site itself, specifically in Cape Town where the properties I
host are located. What follows is an analysis of the short-term rental market in Cape Town on
www.airbnb.com, specifically for rental properties that can accommodate 6 or more people (as this is
what the properties I manage offer).

METHODOLOGY
1. Use automated software to download/scrape data from www.airbnb.com
2. Organise the data into a database
3. Analyse what aspects of hosting correlate with listings ranking higher in search results
4. Identify what a host can do to rank higher in the search results

SOFTWARE USED:
Python 3.5.2
Scrapy 1.0.5 (Python Library)
import.io
Microsoft Excel & VBA
Qlikview – (SQL Load script)

GETTING THE DATA

GETTING AN INITIAL DATA SET


Airbnb’s database is not freely available and Airbnb does not provide an API that allows easy
interface with the database. We could get the data by going to each page and copying the
information manually, but this would take a very long time for a sample large enough to analyse.

Instead of manually going to every page to get the data we can make use of automated software to do
some web scraping. This will automatically download data from their site to later process into a
database. To get the initial data for the listings I set up an Extractor on import.io to scan the search page
from www.airbnb.com and scrape the following data (highlighted in the picture):

Page number of the search results


Listing name
Listing price
Link to actual listing
Number of reviews
Number of beds

Figure 1: The data we want from the Airbnb search results page

Note that before scraping this information I had set the search criteria to only include listings that could
accommodate 6+ people. I also set the map to include pockets of greater Cape Town to limit the number
of search results. What import.io allows you to do is create your own API and then feed other URLs into
the API that will grab the same data for you from those pages.

The search result page only contains 18 properties; however, at the bottom of the page it shows there
are 17 pages of results. The import.io software allows you to add multiple URLs to the request when
running the Extractor. All we need to do is replicate the URL for page 1 and feed in the URLs for
pages 2-17. I did this by manipulating the string in Excel to create the URLs for all 17 pages of results.

Below is an example of a URL for page 1 (the page number appears as page=1):
https://www.airbnb.com/s/cape-town?guests=6&adults=6&children=0&infants=0&ss_id=ia2qhgz8&ss_preload=true&source=bb&page=1&allow_override%5B%5D=&ne_lat=-33.92949338148802&ne_lng=18.499001070258828&sw_lat=-33.983999649640914&sw_lng=18.43651632904789&zoom=14&search_by_map=true&s_tag=Bp9FFd3N

You’ll notice that the URL contains the filters we require for the search: 6 adults as well as the latitude
and longitude of the search area on the map.

To get the other 16 URLs I did some string manipulation in Excel that substituted the “1” for the
numbers 2-17:

Figure 2: The Excel formula used to insert the page numbers into the URLs

The formula:
=LEFT(B2,SEARCH("page=",B2)+4)&C2&RIGHT(B2,LEN(B2)-SEARCH("page=",B2)-5)

The above formula searches through the URL and takes everything up to and including “page=”, adds
the page number (in column C) and then adds everything that follows the original single-digit page
number. We can then Autofill down 17 rows and we have our 17 URLs to feed into import.io.
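For readers who prefer Python to Excel, the same substitution can be sketched as below. This is not what I ran for the project; the function name `with_page` and the shortened example URL are assumptions made for illustration.

```python
import re

def with_page(url: str, page: int) -> str:
    """Return `url` with the value of its page= query parameter replaced."""
    new_url, count = re.subn(
        r"([?&]page=)\d+",                 # the parameter name, then the old number
        lambda m: m.group(1) + str(page),  # keep the name, swap the number
        url,
        count=1,
    )
    if count == 0:
        raise ValueError("URL has no page= parameter")
    return new_url

# Shortened, hypothetical version of the search URL shown above.
base = "https://www.airbnb.com/s/cape-town?guests=6&adults=6&page=1&zoom=14"
urls = [with_page(base, n) for n in range(1, 18)]  # pages 1 to 17
```

Unlike the Excel formula, this version does not assume the original page number is a single digit.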

DIVING DEEPER
To get more in-depth data, a more intensive way of scraping was needed. Some data isn’t obviously
available on Airbnb’s site and is hidden away in the HTML and JSON on each listing page. This time
I used Python to run a web spider that scraped each listing’s data from the same points on the map.
To do this I set up a web scraping spider using the Python library Scrapy.

The method I used was adapted code from an incredibly helpful resource by Luca Verginer,
http://www.verginer.eu/blog/web-scraping-airbnb/. The key data I extracted from each listing can be
seen within the code to parse the listing data:

Figure 3: The parse function that scrapes the detailed data from each listing

This data that is parsed from the JSON array within the listing page provides a lot deeper insight into the
listing and more pieces of data to analyse.
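As a rough idea of what parsing embedded JSON involves, here is a minimal standard-library sketch. It is not the Scrapy code from the figure: the `listing-data` tag, the JSON keys and the sample page are all invented for illustration, and the real listing page is structured differently.

```python
import json
import re
from html import unescape

# Hypothetical page fragment: tag id and JSON keys are made up for this sketch.
SAMPLE_HTML = """
<html><body>
<meta id="listing-data" content="{&quot;listing&quot;: {&quot;id&quot;: 123, &quot;min_nights&quot;: 2, &quot;amenities&quot;: [1, 4, 8]}}">
</body></html>
"""

def parse_listing(html: str) -> dict:
    """Pull the JSON blob out of the page and keep the fields we care about."""
    match = re.search(r'<meta id="listing-data" content="([^"]*)"', html)
    if match is None:
        raise ValueError("embedded JSON not found")
    data = json.loads(unescape(match.group(1)))  # undo HTML escaping, then decode
    listing = data["listing"]
    return {
        "room_id": listing["id"],
        "min_nights": listing.get("min_nights"),
        "amenity_count": len(listing.get("amenities", [])),
    }

row = parse_listing(SAMPLE_HTML)
```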

Again, we had to run the spider across multiple areas to get all the listings within the suburbs of greater
Cape Town. The code is adapted to add a suffix to the URL to only get listings for 6 guests, as well as that
area on the map (the GPS co-ordinates). This code loops over all the pages that hold the results of the
search:

Figure 4: Creating the web scraping spider that would collect the detailed information by listing
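The loop over map areas and result pages can be sketched without Scrapy as a plain URL generator. The query parameter names are the ones visible in the example search URL earlier in this report; the second bounding box and the function name `search_urls` are made up for illustration, and the spider in the figure is organised differently.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.airbnb.com/s/cape-town"

# (ne_lat, ne_lng, sw_lat, sw_lng) for each map area. The first box comes from
# the example URL in this report; the second is a hypothetical neighbouring box.
AREAS = [
    (-33.92949338148802, 18.499001070258828, -33.983999649640914, 18.43651632904789),
    (-33.90, 18.44, -33.93, 18.40),
]

def search_urls(areas, pages=17, guests=6):
    """Yield one search URL per (map area, result page) combination."""
    for ne_lat, ne_lng, sw_lat, sw_lng in areas:
        for page in range(1, pages + 1):
            query = urlencode({
                "guests": guests,
                "page": page,
                "ne_lat": ne_lat, "ne_lng": ne_lng,
                "sw_lat": sw_lat, "sw_lng": sw_lng,
                "search_by_map": "true",
            })
            yield f"{SEARCH_URL}?{query}"

urls = list(search_urls(AREAS))
```

In a Scrapy spider, a generator like this could feed the start requests, with each response handed to the parse function.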

PUT IT ALL TOGETHER


In a few minutes, I had several CSV files, each with around 300 listings, from the import.io
extractors. To join the files into one master file with all the listings I ran a VBA script that
combines sheets into one sheet (this is an extract: the part that loops through the sheets and
appends them):

    ' Extract only: cs (the master sheet), NR (next free row on cs), LR (last used
    ' row on ws) and sName (add sheet names?) are declared earlier in the script.
    For Each ws In Worksheets
        If ws.Name <> cs.Name Then                  ' skip the master sheet itself
            LR = ws.Range("A" & ws.Rows.Count).End(xlUp).Row
            If NR = 1 Then                          ' first sheet: copy the header row once
                ws.Range("A1", ws.Cells(1, Columns.Count).End(xlToLeft)).Copy
                If sName Then
                    cs.Range("B1").PasteSpecial xlPasteAll
                Else
                    cs.Range("A1").PasteSpecial xlPasteAll
                End If
                NR = 2
            End If
            ws.Range("A2:BB" & LR).Copy             ' copy the data rows
            If sName Then                           ' paste and add sheet names if required
                cs.Range("B" & NR).PasteSpecial xlPasteValuesAndNumberFormats
                cs.Range("A" & NR, cs.Range("B" & cs.Rows.Count).End(xlUp).Offset(0, -1)) = ws.Name
            Else
                cs.Range("A" & NR).PasteSpecial xlPasteValuesAndNumberFormats
            End If
            NR = cs.Range("A" & cs.Rows.Count).End(xlUp).Row + 1   ' next free row
        End If
    Next ws

Figure 5: VBA Code to concatenate sheets in Excel, making one master file with the basic listing data

We now have a master file with around 3000 listings that can sleep 6 or more people in greater Cape
Town. Since we moved the map to capture certain areas, the areas overlapped and the same listings
appeared in multiple search results. Excel’s Remove Duplicates tool took care of this and left us
with around 2000 listings.
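The combine-and-deduplicate step can also be sketched in Python. This is an alternative to the VBA and Excel route I actually used, not a copy of it: the column names and the sample rows are made up, and I am assuming the listing link is a suitable key for spotting duplicates.

```python
import csv
import io

# Two tiny stand-ins for the exported CSV files (made-up rows).
FILE_A = "name,price,link\nSea View Villa,2500,/rooms/1\nCity Loft,1800,/rooms/2\n"
FILE_B = "name,price,link\nCity Loft,1800,/rooms/2\nMountain House,3200,/rooms/3\n"

def combine_unique(csv_texts, key="link"):
    """Append all rows into one list, keeping the first row seen for each key."""
    seen = set()
    rows = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            if row[key] not in seen:
                seen.add(row[key])
                rows.append(row)
    return rows

master = combine_unique([FILE_A, FILE_B])
```

Because `csv.DictReader` matches values to column names, this approach also copes with files whose columns are in different orders.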

The more detailed Scrapy data was collected into a set of CSV files that had the same fields but in
different orders. The data was loaded into Qlikview, which allowed us to use its SQL functionality
to UNION the tables together into one big table, with the fields matched up automatically.

One of the fields is called Amenities, a list of all the amenity codes a property has. By
separating the list into separate fields in Excel and creating a CrossTable in Qlikview, we created
a further table with amenities by property as well as their descriptions. Unfortunately, the
descriptions of the 60 amenities came from trawling through the HTML code in a very manual way. At
least we only had to do it once!
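The reshaping that the CrossTable performs (one row per property and amenity) can be sketched in Python as follows. I am assuming the amenity codes arrive as a comma-separated string; the room IDs, codes and descriptions below are invented for illustration.

```python
def amenities_long(listings, descriptions):
    """Turn {room_id: "code,code,..."} into (room_id, code, description) rows."""
    rows = []
    for room_id, codes in listings.items():
        for raw in codes.split(","):
            raw = raw.strip()
            if raw:  # ignore empty entries such as a trailing comma
                code = int(raw)
                rows.append((room_id, code, descriptions.get(code, "unknown")))
    return rows

# Made-up sample data: these are not real amenity codes.
listings = {101: "1, 4", 102: "4,9,"}
descriptions = {1: "TV", 4: "Wireless Internet"}

table = amenities_long(listings, descriptions)
```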

The last 2 pieces of data were a list of Cape Town’s suburb names from Wikipedia, as well as a file that
contained a list of first names and whether they were male or female names. This came from a German
site but the data in the available zip file is all I needed to classify the genders of the Airbnb hosts.

We now had 5 tables of data:

1. Top level “io Data” with the listing name, number of beds/guests and the results page it came from
2. Deeper “Scrapy Data” specific to the host and the property, including all the property’s amenities
3. A table of amenities by property which we could use to further analyse the data
4. Suburb Data to check whether listings that mention a suburb in their name fared better
5. Gender Data to see whether male or female hosts fared better

After some cleaning and organising we have now created a database with this structure:

Figure 6: Database Structure – Room ID Primary Key

RESULTS

DATA VALIDATION
The data we have extracted was properly distributed across the 17 result pages as seen below. Since the
listing data were taken from moving a map position on the Airbnb search page we are likely to have
some overlap in results due to the map overlapping certain properties.

The data was still evenly distributed even after removing duplicate results, the drop off on page 17 is
due to certain map areas having fewer properties and therefore fewer than 17 result pages:

Page #         Properties (duplicates removed)    Properties (with duplicates)
1              109                                144
2              111                                144
3              106                                144
4              105                                144
5              104                                144
6              104                                144
7              110                                144
8              96                                 144
9              101                                144
10             95                                 144
11             93                                 144
12             104                                144
13             109                                144
14             100                                140
15             91                                 126
16             92                                 126
17             50                                 116
Grand Total    1680                               2380

Figure 7: Evenly distributed results after removing duplicates

DESCRIBING THE RESULTS
When looking at results we are looking for correlation between search result page and the variable
being tested. We used the Average for each variable and a good way to interpret the table below is to
say: “The average Page 1 listing has a guest satisfaction score of 83.7%”. We will cover the results in
more detail later in the report but perhaps unsurprisingly the most important factor influencing search
rank is the Guest Satisfaction score that is calculated once a guest completes a review for a listing.

To interpret the results, we are looking for correlation (both positive and negative) with result page. As
seen in the table below, as page number increases, average guest satisfaction decreases.

Figure 8: How to interpret results graph
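The per-page averages behind a graph like this can be computed as in the sketch below. The numbers are made up purely to show the calculation and are not results from this study; the field names are assumptions.

```python
from collections import defaultdict

def average_by_page(rows, field):
    """Return {page: mean of `field` over the listings found on that page}."""
    totals = defaultdict(lambda: [0.0, 0])  # page -> [running sum, count]
    for row in rows:
        totals[row["page"]][0] += row[field]
        totals[row["page"]][1] += 1
    return {page: s / n for page, (s, n) in sorted(totals.items())}

# Made-up listings, two per result page.
rows = [
    {"page": 1, "guest_satisfaction": 90},
    {"page": 1, "guest_satisfaction": 80},
    {"page": 2, "guest_satisfaction": 70},
    {"page": 2, "guest_satisfaction": 80},
]

averages = average_by_page(rows, "guest_satisfaction")
```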

A WORD ABOUT CORRELATION VS CAUSATION


Some of the results are intuitive and make sense and some may be surprising. One of the factors highly
correlated to page rank is the number of words a listing has in its description. This may well be
something Airbnb uses in their ranking algorithm or it may be that hosts who have wordy, descriptive
listings are more conscientious with all aspects of hosting and therefore perform better. There is no way
of knowing exactly what the rank algorithm is comprised of but we can give a very good indication as to
what factors tend to result in higher ranking properties. We must be careful not to confuse correlation
with causation.

THE FACTORS MOST CORRELATED TO PAGE RANK
The below table shows the top 5 factors that are most correlated to page rank. We can clearly see the
trends in the graphs:

Figure 9: Factors 1-5 correlated to higher search ranks

Things to note:

- Guest satisfaction score (from guest reviews) is understandably the most correlated factor.
- Price: anecdotally, from my experience, Airbnb has been recommending lower and lower prices
  as suggested prices. Airbnb wants to offer the best deal to its users, so lower prices mean a
  better rank.
- Word count: as described above, this may be a factor that Airbnb values, or it may be that wordy
  descriptions are a characteristic of more conscientious hosts who score well elsewhere too.
- Minimum stay length: the shorter a host’s minimum stay requirement, the higher the listing
  seems to rank. Perhaps shorter stays get more bookings and therefore score better on other
  factors that influence rank.
- Days since calendar updated: the more active a host is in updating the calendar, the better the
  rank of the property. Unsurprisingly, Airbnb rewards active hosts.

Below are factors 6-10 of those most correlated to page rank. Note that we are still looking at
averages for each factor, i.e. on average 62.4% of listings on page 1 are Instant Book listings.

Figure 10: Factors 6-10 correlating to higher search ranks

Things to note:

- Price/Bed: since we have details on the number of beds we can work out price per bed. Again,
  Airbnb rewards cheaper listings.
- Name Length: this field can only be 50 characters long, but listings with more words (average 5)
  seem to rank higher. Again, this may be due to other factors.
- Is Instant Book: see below for a more detailed analysis, but Instant Book listings perform better.
- Reviews: having more reviews is correlated with ranking higher.
- Times Saved to Wishlist: one listing on page 5 was removed from this set as an outlier. It had
  been saved to wish lists over 22 000 times and skewed the results. (It must have made it onto
  Airbnb’s featured page or onto some other site that gets major traffic.)

AIRBNB’S CHANGING STANCE ON INSTANT BOOK
Having a listing set to Instant Book (where a host allows potential guests to book without their
approval) is correlated with a higher search rank in this data set. This wasn’t always the case.
About a year ago, I did some similar research looking for correlation between Instant Book listings
and search rank. The figure below shows how the rank algorithm seems to have changed over the last year:

Figure 11: Airbnb rewarding Instant Book listings with higher ranks in 2017

Clearly there is no real correlation in the 2016 data (0.14 correlation coefficient) but in the 2017 data we
can see that listings that have Instant Book enabled tend to appear higher in the search results.

This may be part of Airbnb’s drive to compete with the hotel industry and their stance that hosts should
not discriminate against potential guests (by not accepting certain bookings as hosts without Instant
Book can do).

From Airbnb’s “Work to fight discrimination and Build Inclusion Report” – Sept 2016:
One Million Instant Book Listings
Instant Book allows certain listings to be booked immediately, without prior host approval
of a specific guest. To achieve these goals, Airbnb will accelerate the use of Instant Book
with a goal of one million listings bookable via Instant Book by January 2017.
More importantly, Instant Book reduces the potential for bias because hosts automatically
accept guests who meet these objective custom settings they have put in place. Airbnb has
already worked to increase the number of Instant Book listings, which has more than doubled
in the past year.

THE CORRELATION OF ALL THE FACTORS TESTED
To quickly see which factors are most correlated with search rank we can look at the statistical
correlation instead of interpreting the graphs as we did above. Below is a table which shows the
correlation coefficients of each factor I tested:

Note: This table shows an absolute correlation coefficient from 0 to 1, 1 being most correlated. I
converted the inversely correlated factors (negatives) for easier interpretation.
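For anyone wanting to reproduce this kind of ranking, the sketch below assumes the coefficient is a Pearson correlation between page number and the per-page average of each factor, which may not match exactly how the table above was produced. The factor names and values are made up for illustration and are not results from this study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

pages = [1, 2, 3, 4]
# Made-up per-page averages for two hypothetical factors.
factors = {
    "price": [100, 120, 110, 150],
    "guest_satisfaction": [90, 85, 80, 75],
}

# Take the absolute value so that inverse correlations rank alongside direct ones.
ranked = sorted(
    ((name, abs(pearson(pages, values))) for name, values in factors.items()),
    key=lambda item: item[1],
    reverse=True,
)
```

Taking the absolute value hides the direction of each relationship, so the sign is worth keeping alongside the ranking if you need to know whether a factor rises or falls with page number.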

Figure 12: Correlation to search rank

Appendix A has a more detailed description of what each of these factors means and how it was tested.

Things to note:

- Being a SuperHost doesn’t seem to make as big a difference as one would think. It ranked 13th
  most correlated to page rank.
- Airbnb hosting businesses and hosts with multiple listings aren’t correlated with higher search
  results, nor are listings that are Business Ready.
- The ratio of male-to-female hosts didn’t correlate with search results. This was admittedly a
  long shot! However, there are almost double the number of female hosts in this data set.
- Smoking and pet friendly properties didn’t seem to be negatively impacted in search results.
- Having the suburb name or the base word “view” in the title didn’t correlate with search rank.
- Age of a host’s account (how long they have used Airbnb) didn’t correlate with search rank.

THE CORRELATION OF AMENITIES TO SEARCH RANK
In the same way that we tested the factors that are correlated to search result rank we can also test the
correlation of amenities. Below is a table that ranks the correlation of amenities with search rank:

Note: This data set includes listings that can accommodate 6 or more people, likely houses not apartments.

Things to note:

- Offering breakfast is not correlated with higher search results. It may be a factor for listings
  that sleep 1 or 2 but not for those accommodating 6 or more.
- Having a TV and having Cable/Satellite TV do not correlate with higher search ranks.
- The amenities required for Business Ready status are more highly correlated with higher search ranks.
- More listings have wireless internet (93%) as an amenity than Internet (57.6%). I can’t explain
  this. Perhaps some hosts don’t know that Wi-Fi isn’t possible without internet itself?

CONCLUSIONS
Based on this research analysing the correlation of host variables and search page rank, there are
several easy things a host can do to get a better search rank:

1. Keep your calendar updated
2. Ask guests to complete a review
3. Lower prices
4. Lower the minimum stay length
5. Enable Instant Book
6. Respond quickly to requests
7. Have the following amenities:
a. Iron
b. Desk
c. Hangers
d. Hair Dryer
e. Essentials
f. Internet – and list it as internet not only as wireless internet.

The results of this research may be skewed because this data set only includes listings that
accommodate 6 or more people. They may also differ from results in other cities/countries. Further
research could apply the same methodology to all listings, regardless of how many people a listing
can accommodate, as well as to different cities/countries.

ABOUT THE AUTHOR


This research project was completed by Nicholas Child and any views and/or opinions are strictly his
own and do not represent those of Airbnb.com

childnick@gmail.com

APPENDIX A
Rank  Description                   Details
1     Guest Satisfaction            The score provided on each listing page
2     Absolute Price                The price on each listing page
3     Listing Word Count            The number of words in each listing's description
4     Minimum Stay Length           The minimum stay length defined by the host on each listing page
5     Days since calendar updated   The date of last calendar update is stored on each listing page
6     Price/Bed                     Price divided by number of beds advertised on each listing page
7     Description Length            The number of words in each listing's title
8     Is Instant Book               Whether the listing has Instant Book enabled
9     Review Count                  Number of reviews for the listing, taken from each listing
10    Saved to Wishlist             Number of times a listing is saved to a potential guest's wishlist, taken from the listing page
11    # of Amenities                Count of each listing's advertised amenities
12    # of Pictures                 The number of pictures each listing has
13    Is SuperHost                  Whether the host of a listing is a SuperHost
14    Response Speed                The speed with which a host responds to requests from potential guests
15    Guest Capacity                How many guests a listing can accommodate
16    Hosted by Business            Whether the listing is hosted by a business (seen by looking at host name/picture)
17    Is Business Ready             Whether the listing has met the requirements Airbnb sets for Business Ready listings
18    # of Other Properties Hosted  Number of other properties hosted by that listing's host (within this data set)
19    # of Beds                     Number of beds available at each listing
20    Cancellation Policy           The cancellation policy of the listing; the higher, the stricter
21    Male-to-Female Ratio          The ratio of males-to-females found by assigning a gender to each host based on first name
22    Has Pets on Property          Whether the listing has pets on the property
23    Beds per Guest                Number of beds divided by number of guests a listing can accommodate (how many double beds)
24    Account Age                   The natural log (ln) of the host's ID; these numbers vary massively and are chronological, so a natural log allows for better analysis
25    Allows Smoking                Whether the listing allows smoking on the premises
26    "suburb name" in Description  Whether a listing uses a suburb name in its listing name
27    "view" in Description         Whether a listing uses the base word "view" in its listing name

APPENDIX B
Scrapy has built-in settings that help ensure you are not guilty of DDoSing Airbnb servers while scraping the data:
http://doc.scrapy.org/en/latest/topics/autothrottle.html

My settings:
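As an illustration only (these are not the values used for this project), a throttled settings.py might look like the sketch below. The setting names are Scrapy's own; every value is an assumption chosen to show the idea of slow, polite crawling.

```python
# settings.py (illustrative values only, not the ones used in this project)

AUTOTHROTTLE_ENABLED = True         # let Scrapy adapt the delay to server latency
AUTOTHROTTLE_START_DELAY = 5        # initial delay between requests, in seconds
AUTOTHROTTLE_MAX_DELAY = 60         # upper bound on the delay when the server is slow
DOWNLOAD_DELAY = 3                  # minimum wait between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # never hit the site with parallel requests
```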
