Capstone Project Final Full Report
Robert Joseph
This report uses the Foursquare API to compare Paris and London and
to assess whether someone should start a business in either of the
two cities. Various machine learning algorithms, together with
geolocation tools, were used to compare and differentiate the two
cities and to check whether either city is a good place to start an
Artificial Intelligence business.
1 - Introduction
The final course of the Data Science Professional Certificate consists
of a capstone project in which all the skills and knowledge gathered
from the 9 intensive courses have to be applied.
The final problem and its analysis are left open for the student to
explore and decide. The idea is to use location data from the
Foursquare API either to define a problem that the location data can
help solve, or simply to compare cities or neighbourhoods of one's
own choice.
London is the capital and largest city of England and the United
Kingdom. Standing on the River Thames in the south-east of England,
at the head of its 50-mile (80 km) estuary leading to the North
Sea, London has been a major settlement for two millennia.
Paris is the capital and most populous city of France, with an
estimated population of 2,150,271 residents as of 2020, in an area of
105 square kilometres (41 square miles). Since the 17th century, Paris
has been one of Europe’s major centres of finance, diplomacy,
commerce, fashion, science and arts.
The main goal of this project is to compare Paris and London and to
point out their differences. A further question is which city would
be the better place to start an Artificial Intelligence company, and
which factors bear on that choice, since both cities are global
hotspots for tech companies.
Target Audience
2 - Business Problem
In this ever-changing world of technology and reform, AI is expected
to reshape most industries as we know them. The question is
therefore: of these two cities, both among the busiest in the world,
in which one should a person start an AI business? Factors such as
pricing, multiculturalism and language barriers, among others, would
influence this decision.
Figure 1: Paris Geolocation Dataset
3 - Data
Various datasets were collected, reformatted and analysed in
order to get the required results. Some of them include:
– http://www.cgedd.developpement-durable.gouv.fr/house-prices-in-france-property-price-index-french-a1117.html - House prices in France
– https://www.kaggle.com/alphaepsilon/housing-prices-dataset - Housing dataset
– https://data.world/datasets/real-estate - Numerous datasets for different categories
– https://data.london.gov.uk/dataset?tag=start-ups - Datasets for London
– https://www.kaggle.com/tags/companies - Various companies and their datasets
More datasets were included and merged to obtain the final dataset
for this idea. Foursquare data was also used, and important features
such as housing prices, locality, famous icons, restaurant prices,
transportation facilities, technological hotspots and access to
high Wi-Fi speeds were all assessed.
London
Almost half a million records were present in the London dataset,
which had to be sampled so that only 160 rows were extracted.
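The sampling code itself is not reproduced in this report. A minimal sketch of how a large table could be reduced to 160 rows with pandas is given here; the toy DataFrame, its column names and the fixed seed are assumptions made for illustration and are not the original London data.

```python
import pandas as pd


def sample_rows(df: pd.DataFrame, n_rows: int = 160, seed: int = 0) -> pd.DataFrame:
    """Return a reproducible random sample of at most n_rows rows."""
    n = min(n_rows, len(df))
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


# Hypothetical stand-in for the large London table (invented values).
toy_london = pd.DataFrame({
    "Postcode": [f"PC{i:05d}" for i in range(5000)],
    "District": [f"District {i % 33}" for i in range(5000)],
})

london_sample = sample_rows(toy_london)
print(london_sample.shape)
```

Fixing the seed keeps the sample reproducible between runs, which matters when later steps (geocoding, venue lookups) are compared across the two cities.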
Figure 2: London Geolocation Dataset
Paris
Similar to the London dataset, there were about 30,000 rows, out of
which only a sample was taken using the same Python code as for the
London dataset.
The Paris dataset had no columns that needed to be dropped, so it
was retained in its original state.
Artificial Intelligence
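import requests
import pandas as pd
# download the page that lists Artificial Intelligence companies
url = 'https://golden.com/list-of-artificial-intelligence-companies/'
html = requests.get(url).content
# parse every HTML table on the page and keep the last one
df_list = pd.read_html(html)
df = df_list[-1]
print(df)
df.to_csv('my data.csv')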
4 - Methodology
The datasets were researched in depth, and the various features and
methods were analysed thoroughly to make the model as accurate as
possible.
GeoLocation
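from geopy.geocoders import Nominatim
# query string is an assumption; the original value of address is not shown
address = 'London, United Kingdom'
geolocator = Nominatim(user_agent="uk_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))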
Folium
Figure 3: Map of Paris with Markers
# Python Code
import folium
# create map of Paris using latitude and longitude values
map_paris = folium.Map(location=[latitude, longitude], zoom_start=11)
map_paris  # show the map of Paris with markers from the dataset
Figure 4: Map of London with Markers
Foursquare API
url =
’https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius=
CLIENT_ID,
CLIENT_SECRET,
VERSION,
neighborhood_latitude,
neighborhood_longitude,
radius,
LIMIT)
results = requests.get(url).json()
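The request URL can also be assembled from a parameter dictionary instead of a long format string. The sketch below is only an illustration: the helper name build_explore_url, the credentials, the version string and the coordinates are placeholders, and the query parameter name limit is an assumption, since the URL printed in this report is cut off after radius=. No network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as it appears in the report's URL.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL; urlencode escapes every value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,  # assumed parameter name (not visible in the report)
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"


# Placeholder credentials and coordinates, for illustration only.
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 48.8566, 2.3522, 500, 100)

# Parse the query string back to check what would be sent.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

Building the query with urlencode avoids miscounting the {} placeholders and keeps each value next to the parameter it belongs to.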
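# extract the category name of a venue
# (the def and try lines are missing from the extracted listing; the function name is assumed)
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']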
venues = results['response']['groups'][0]['items']
# flatten the nested JSON into a table (this step is missing from the extracted listing)
nearby_venues = pd.json_normalize(venues)
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()
paris_onehot.head()
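The name paris_onehot suggests that the venue categories were one-hot encoded before being grouped by neighbourhood. A minimal sketch of that step with pandas follows; the toy venue table and every value in it are invented for illustration, and averaging the indicators per neighbourhood is an assumption about how paris_grouped was built.

```python
import pandas as pd

# Hypothetical venues table: one row per venue found in a neighbourhood.
paris_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Cafe", "Bakery", "Cafe", "Museum", "Cafe"],
})

# One indicator column per venue category.
paris_onehot = pd.get_dummies(paris_venues["Venue Category"], dtype=int)
paris_onehot.insert(0, "Neighborhood", paris_venues["Neighborhood"])

# The mean of each indicator is the share of that category in the neighbourhood.
paris_grouped = paris_onehot.groupby("Neighborhood").mean().reset_index()
print(paris_grouped)
```

Each row of the grouped table then describes a neighbourhood by the relative frequency of its venue categories, which is the kind of numeric profile a clustering algorithm can work with.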
Figure 5: Top 10 Most visited Venues for London
The top 10 most visited venues were extracted from each neighbourhood
and then merged together to form another dataset.
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = paris_grouped['Neighborhood']
neighborhoods_venues_sorted.head()
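The listing above uses a columns list and fills the table in steps that are not shown. One possible way to build both is sketched here on a toy table; the frequencies are invented, the column labels are an assumption, and the table is assembled row by row rather than by filling an empty DataFrame, so this is a variant and not the original code.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighbourhood name) and rank the frequencies.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Hypothetical grouped frequencies (invented values).
paris_grouped = pd.DataFrame({
    "Neighborhood": ["A", "B"],
    "Bakery": [0.2, 0.1],
    "Cafe": [0.5, 0.3],
    "Museum": [0.3, 0.6],
})
num_top_venues = 3

# Column labels such as "1st Most Common Venue", "2nd ...", "4th ...".
indicators = ["st", "nd", "rd"]
columns = ["Neighborhood"]
for ind in range(num_top_venues):
    suffix = indicators[ind] if ind < len(indicators) else "th"
    columns.append(f"{ind + 1}{suffix} Most Common Venue")

# One output row per neighbourhood: its name followed by its top categories.
rows = []
for _, row in paris_grouped.iterrows():
    top = return_most_common_venues(row, num_top_venues)
    rows.append([row["Neighborhood"], *top])

neighborhoods_venues_sorted = pd.DataFrame(rows, columns=columns)
print(neighborhoods_venues_sorted)
```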
K-Means Clustering
After the venues were put into a dataframe, the K-Means clustering
algorithm was trained on the data to obtain the desired clusters.
The first task was finding the optimal K, separately for each of
the two datasets.
For Paris the optimal K was found to be 5.
For London the optimal K was found to be 6.
After finding the optimal K, the data was clustered using KMeans.
# set number of clusters
kclusters = int(len(rlondon["District"].unique()) / 4)
# drop the label column so that only the numeric features are clustered
london_grouped_clustering = london_grouped.drop(columns='Neighborhood')
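The report does not say how the optimal K was chosen. One common approach is the elbow method: fit K-Means for a range of K values and look for the point where the within-cluster sum of squares stops dropping sharply. The sketch below assumes scikit-learn and uses synthetic two-dimensional points in place of the grouped venue frequencies; the helper name inertia_by_k and all data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(features, k_values, seed=0):
    """Fit K-Means for each k and return the within-cluster sum of squares."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(features)
        inertias[k] = model.inertia_
    return inertias


# Synthetic stand-in for the feature table: three tight, well-separated blobs.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
features = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centres])

# Inspect how the inertia falls as k grows; the "elbow" suggests a value for k.
inertias = inertia_by_k(features, range(1, 7))
for k, value in inertias.items():
    print(k, round(value, 2))

# Fit the final model with the chosen k and read one cluster label per row.
kclusters = 3
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(features)
labels = kmeans.labels_
```

In the report's setting, features would be a table such as london_grouped_clustering, and labels would be attached back to the neighbourhoods before plotting them on the map.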
Figure 7: Map of Paris Clustered
Figure 9: Countries with the maximum number of Artificial Intelligence Companies/Startups
Artificial Intelligence
5 - Results and Discussion
Finally, we reach the main part of the report. Let us break it
down into two parts.
Differences
– Looking at the maps, one can observe that Paris is more compact,
so one can get around on foot without needing transport.
– London, on the other hand, requires the use of transport as
it is much larger in scale.
– In terms of population density, Paris exceeds London by a ratio
of about 4:1.
– Judging by the most visited venues, Paris has more restaurants,
by a ratio of almost 3:1, and according to studies restaurants in
Paris have earned more Michelin stars than London’s.
– In terms of leisure and entertainment, London has more spots
than Paris. For example, London has more museums than Paris, in
a ratio of 8:5.
– Paris hosts three of the top 10 most visited attraction sites,
while London has none.
– London has more people from abroad.
– London has a lower average temperature than Paris.
Artificial Intelligence
Figure 10: Table showing venture capital funding into AI companies across
major global cities from January 2013 to August 2018
Discussion
6 - Conclusion
After an in-depth comparison of London and Paris, and of which city
would be the better place to start or invest in an Artificial
Intelligence company, multiple conclusions can be drawn.
One of them is that both cities are diverse in their own ways
and boast a culture unlike any other.
Artificial Intelligence is a booming field: more people have recently
started investing in it, and more companies are automating their
processes.
Both cities offer a wide range of opportunities for anyone looking to
invest in Artificial Intelligence or to start a company, and various
relevant factors were shown.
Finally, a better model could be built with other methods and
stronger machine learning algorithms, for example a KD tree, which
has a much faster running time of O(N log N) compared with
O(N^2) for KNN.
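To make the comparison concrete: a KD tree is an index that speeds up nearest-neighbour queries, while a brute-force search compares every pair of points. The sketch below assumes scikit-learn and random points standing in for venue coordinates; it only checks that both search strategies return the same neighbours and does not measure running time.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical coordinates: 500 indexed points and 5 query points.
rng = np.random.default_rng(0)
points = rng.random((500, 2))
queries = rng.random((5, 2))

# Run the same 3-nearest-neighbour query with both search strategies.
results = {}
for algorithm in ("brute", "kd_tree"):
    index = NearestNeighbors(n_neighbors=3, algorithm=algorithm).fit(points)
    distances, neighbours = index.kneighbors(queries)
    results[algorithm] = (distances, neighbours)

# The index changes how the neighbours are found, not which ones are found.
same_neighbours = np.array_equal(results["brute"][1], results["kd_tree"][1])
print(same_neighbours)
```

The choice between the two is therefore a question of speed on larger datasets rather than of model quality.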
Furthermore, clustering did help us to highlight the best venues
and areas.
Finally, correlation does not imply causation, so any result here is
subject to change with other trends, opinions and datasets.
7 - Acknowledgements
I sincerely thank all the course instructors who have put their
time and effort into making this Professional Certificate worth the
effort. I also want to state that these are my opinions and are
subject to change, and that I am grateful for all the resources and
knowledge that I have gained throughout this course. I also want to
thank God, my family and my friends, who have made this a reality
and supported me throughout. Thank you to all the peer reviewers who
have graded my projects.