lawrencefchan/rea_scraping

rea_scraping

This package contains scripts used to scrape real estate data from:

  • realestate.com.au
  • Allhomes
  • Domain (to be implemented)

realestate_com

scrape_historical_prices

Uses sel_scrape to pull suburb-level historical data for houses/units. Includes:

  • Historical sales/rental data (written to prices_volumes):
    • median: 12 month median price trend for houses/units
    • volume: 12 month total sales/leases
  • Sales/rental summaries (written to current_snapshot):
    • daysOnSite: median time on market in the past 12 months
    • rentalYield: current rental yield (owned properties only)
    • demand: number of buyers/renters interested in the past month
    • supply: number of dwellings available in the past month

Data is saved to ./data/historical_trends.db.
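A minimal sketch for checking what was written, assuming the `.db` file is a standard SQLite database (the extension suggests this, but it is not confirmed here). The column layout of `prices_volumes` and `current_snapshot` is not documented, so the helper only lists tables and row counts. `summarise_db` is a hypothetical name, not part of this package.

```python
import sqlite3
from contextlib import closing


def summarise_db(path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Note: sqlite3.connect creates an empty file if `path` does not exist.
    with closing(sqlite3.connect(path)) as conn:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names are read from sqlite_master, so they are quoted and
        # interpolated directly (identifiers cannot be bound as parameters).
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
```

Calling `summarise_db("./data/historical_trends.db")` after a scrape should show whether both tables received rows.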

TODO

  • fix multiple print statements on write

recent_sales

Scrapes recent sales results for all states, including clearance rates, number of homes sold, and similar figures.

Data is saved to ./data/recent_sales.db.

saved_properties

TODO

Allhomes

scrape_sales_prices.py

Scrapes historical sales prices from allhomes.com.au using requests and bs4. Results are saved separately for each suburb in the following JSON format:

results = {
    'suburb_name': {
        'street_name': {
            'property_name': {
                'transfers': [list of dicts],  # completed properties
                'type': str,  # dwelling type
                'bed': int,  # number of bedrooms
                'bath': int,  # number of bathrooms
                'garage': int,  # number of garages
            },
            'complete': bool  # street is completed
        },
        'complete': bool  # suburb is completed
    }
}

where transfers is a list of dictionaries, one per sale, with keys including (but possibly not limited to) Contract, Transfer, Listed, Days on market, Block size, Transfer type, Purpose, and Price.
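The nested layout mixes data with `complete` progress flags at the street and suburb levels, so a reader has to skip those keys while walking the structure. A minimal sketch of flattening it into one row per sale follows. `iter_transfers` is a hypothetical helper, and the sample values (names, price strings) are invented for illustration only.

```python
def iter_transfers(results):
    """Yield (suburb, street, property_name, transfer) for every recorded sale."""
    for suburb, streets in results.items():
        for street, properties in streets.items():
            if street == "complete":  # suburb-level progress flag, not a street
                continue
            for prop, details in properties.items():
                if prop == "complete":  # street-level progress flag, not a property
                    continue
                for transfer in details.get("transfers", []):
                    yield suburb, street, prop, transfer


# Illustrative data only; real scraped values may be formatted differently.
sample = {
    "Haymarket": {
        "Quay Street": {
            "1/2 Quay Street": {
                "transfers": [{"Price": "$500,000"}, {"Price": "$650,000"}],
                "type": "Unit", "bed": 2, "bath": 1, "garage": 1,
            },
            "complete": True,
        },
        "complete": False,
    }
}
rows = list(iter_transfers(sample))
```

One caveat of this layout: a street or property literally named `complete` would collide with the flag key and be skipped.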

munge_results.py

Various helper functions for allhomes results.

Resources

https://digitalfinanceanalytics.com/blog/mortgage-stress-grinds-higher-before-rate-rises/

https://www.finder.com.au/how-to-find-out-property-past-sales-history

https://www.allhomes.com.au/ah/research/quay-street-haymarket-nsw-2000/1933521212/sale-history

  • TODO: check for a faster way to scrape sales history

Data sources

NSW postcode regions: https://www.training.nsw.gov.au/about_us/postcodes_byregion.html

NSW regions maps: https://www.training.nsw.gov.au/about_us/sts_contacts.html

Sydney postcode regions: https://docs.google.com/spreadsheets/d/1tHCxouhyM4edDvF60VG7nzs5QxID3ADwr3DGJh71qFg

Postcode mapping method: https://greenash.net.au/thoughts/2014/07/australian-lga-to-postcode-mappings-with-postgis-and-intersects/

Archive/old scripts/old notes

Uses Selenium (requires ChromeDriver 90.0.4430.24).
Note: an internet connection is required to render the Mapbox map (house.html and unit.html).

TODOs

EDA - points of interest

  • check correlation between house and unit price changes
  • areas of highest growth (range of years)
  • areas of lowest growth (range of years)
  • areas of highest yield
  • measures of variance: z-score, r^2 against linear fit
  • discard series with low count values or a lot of missing data
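The two variance measures listed above can be sketched in plain Python as follows. This assumes each series is an evenly spaced list of median prices, so the linear fit is taken against the index 0..n-1. The function names are hypothetical and not part of this package.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise a series using its population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a flat series has no spread
    return [(v - mu) / sigma for v in values]


def r_squared_linear(values):
    """R^2 of a least-squares line fitted against the index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    ss_tot = sum((y - y_bar) ** 2 for y in values)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant series")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, values))
    return 1 - ss_res / ss_tot
```

An R^2 close to 1 indicates a series that follows a steady linear trend, while large absolute z-scores flag individual periods that sit far from the series mean.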

Hypotheses:

  • low variance implies stable pricing, possibly low growth
  • house and unit prices should be complementary (both in terms of variance and growth). Deviation from this implies data skewed by outliers (e.g. enormous, one-off land purchases)
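One way to test the second hypothesis is to correlate period-over-period price changes for houses and units in the same suburb. The sketch below uses a Pearson correlation, which is an assumed choice of measure rather than one this README prescribes. The function names are hypothetical.

```python
from math import sqrt


def pct_changes(prices):
    """Fractional change between consecutive periods."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least two")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)
```

For a suburb with house medians `h` and unit medians `u`, `pearson(pct_changes(h), pct_changes(u))` gives a value in [-1, 1]. Suburbs with a low or negative value would be candidates for an outlier check.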

Other:

Sources:

About

scraping historical dwelling price data
