Automate the Hunt for Your Dream Apartment: Harnessing the Power of Web Scraping

Matt Bommer
5 min read · Jun 1, 2023
Photo by Carlos Alfonso on Unsplash

In today’s fast-paced world, finding the perfect apartment can often feel like searching for a needle in an impossibly large haystack. With high demand and limited availability, timing is everything. But what if there was a way to automate the process, to have the latest information on apartment availability delivered right to your fingertips? That’s where web scraping comes in. In this article, we’ll explore how accessible web scraping can be and how it can revolutionize your apartment search, allowing you to stay one step ahead in the hunt for your dream home.

In my recent quest to find a new apartment in Austin, Texas, I discovered a solution that transformed my apartment search experience. I created a web scraping bot that informed me when my favorite units became available. In this post, I’ll share my journey and provide you with insights on how you can do the same.

Scraping the Web:

When people hear about web scraping, the first thing that comes to mind is usually extracting data directly from web pages using tools like Beautiful Soup. While this approach is viable and necessary for sites that use server-side rendering (SSR), there is a simpler way — observing the requests a website sends to its backend servers. By leveraging the developer tools in Chrome, I was able to discover all the necessary information about apartment availability on my favorite complex’s website within minutes.

Using the network section of Chrome’s developer tools, I sifted through the Fetch/XHR requests until I found a response that contained the relevant data (shown below). Armed with this knowledge, I could identify the endpoint to fetch unit details and determine the available information for each unit.

"units": [
{
"id": "1174955",
"unit_disclaimer_id": null,
"floor_id": "21173",
"floor_plan_id": "63835",
"status_id": null,
"unit_number": "1511",
"display_unit_number": "APT 1511",
"display_area": "980 sq. ft.",
"area": 980,
"display_full_price": null,
"display_price": "$3,223",
"price": 3223,
"show_online_leasing": true,
"specials_description": null,
"display_available_on": "Available Aug 4th",
"display_lease_term": "12 Months",
"available_on": "2023-08-04",
...
}
],
...
"floor_plans": [
{
"id": "63830",
"name": "1F",
"bedroom_count": 1,
"bathroom_count": 1,
"bedroom_label": "1 Bed",
"bathroom_label": "1 Bath",
"filter_label": "1F",
...
}
],
...
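Before writing any parsing code, it’s worth replaying the request outside the browser to confirm the endpoint behaves the same way without the page around it. A quick sanity check might look like the sketch below (the URL is a stand-in for the complex’s actual endpoint):

import requests

# Stand-in for the endpoint found in Chrome's Network tab.
url = "https://example-complex.com/api/v1/units"

data = requests.get(url, headers={"Accept": "application/json"}).json()
print(f"{len(data['data']['units'])} units returned")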

Transforming Data into Action:

With neatly organized data, it doesn’t take much code to transform it into a useful format. In just 69 lines of code, I was able to develop a simple but effective solution. By parsing the raw data into custom objects and writing a simple set of filtering rules, I narrowed the results down to the best available units that matched my preferences.

import requests
from dataclasses import dataclass
from datetime import datetime

# Maps floor plan IDs from the scraped JSON to their marketing names.
id_unit_map = {"63844": "1J", "63831": "1D", "63838": "1E"}

@dataclass
class Unit:
    floor_plan: str
    unit_number: int
    floor_number: int
    building_number: int
    unit_price: int
    square_feet: int
    lease_term_for_price: str
    date_available: datetime

    def __init__(self, json_data):
        self.floor_plan = id_unit_map.get(json_data["floor_plan_id"])
        # Unit numbers encode the building and floor, e.g. "1511" is
        # building 1, floor 5.
        unit_str = json_data["unit_number"]
        self.unit_number = int(unit_str)
        self.floor_number = int(unit_str[1])
        self.building_number = int(unit_str[0])
        self.unit_price = json_data["price"]
        self.square_feet = json_data["area"]
        self.lease_term_for_price = json_data["display_lease_term"]
        self.date_available = datetime.strptime(json_data["available_on"], "%Y-%m-%d")

    def __str__(self) -> str:
        return f'''
Unit #{self.unit_number} | Type {self.floor_plan} | {self.square_feet} SqFt.
Price: {self.unit_price}
Level: {self.floor_number}
Lease term: {self.lease_term_for_price}
Available on: {self.date_available.strftime("%b %d, %Y")}
'''

def generate_unit_report():
    # Fetch data from the endpoint discovered in the Network tab.
    response = requests.get("~~redacted_url~~", headers={"Accept": "application/json"})
    response_body = response.json()
    raw_units_data = response_body['data']['units']

    # Parse raw data into our own objects, keeping only my favorite floor plans.
    fav_floor_plans = [Unit(raw_unit_data) for raw_unit_data in raw_units_data if id_unit_map.get(raw_unit_data["floor_plan_id"])]

    # Filter units by my personal criteria so only the best units are left.
    best_units = filter(filter_units_by_personal_criteria, fav_floor_plans)

    # Summarize best available units.
    return "\n".join([str(unit) for unit in best_units])

def filter_units_by_personal_criteria(unit: Unit) -> bool:
    if unit.floor_plan == "1D" and unit.building_number == 1:
        faces_city_skyline = (unit.unit_number % 100) <= 12
        return faces_city_skyline
    elif unit.floor_plan == "1E":
        has_porch = unit.floor_number > 2
        has_natural_light = (unit.unit_number % 100) in [17, 19, 66, 68]
        return has_porch and has_natural_light
    else:
        # ❤️ the unit as is
        return True

if __name__ == "__main__":
    report = generate_unit_report()
    print(report)

Now I can run this bot from the comfort of my own computer and instantly know which of my favorite units are available and which aren’t. That’s still not good enough, though… it requires a few more keystrokes and clicks on my trackpad than I’m willing to spend. We made it this far, so we might as well put this up in the cloud and have it run around the clock, so all we have to do is sit back and let the available units come to us.

Taking it to the Cloud:

To streamline the process further, I decided to deploy the script using AWS Lambda. With AWS Lambda, I could run the script at regular intervals, ensuring that I always had up-to-date information on available units. I used CloudWatch Events to schedule the script to run at specific times during the day, making the automation effortless. (One gotcha: third-party packages like requests aren’t available in the Lambda runtime by default, so they need to be bundled with the deployment package or added as a Lambda layer.)
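For reference, here’s a minimal sketch of how that schedule can be wired up with boto3. The rule name, function name, ARN, and cron times below are placeholders, not my actual setup:

import boto3

events = boto3.client("events")
lambda_client = boto3.client("lambda")

# Placeholder ARN for the deployed Lambda function.
function_arn = "arn:aws:lambda:us-east-1:123456789012:function:unit-report"

# Run the scraper twice a day (times are in UTC).
rule = events.put_rule(
    Name="apartment-hunt-schedule",
    ScheduleExpression="cron(0 14,23 * * ? *)",
)

# Let CloudWatch Events invoke the function...
lambda_client.add_permission(
    FunctionName="unit-report",
    StatementId="allow-cloudwatch-schedule",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)

# ...and point the schedule rule at it.
events.put_targets(
    Rule="apartment-hunt-schedule",
    Targets=[{"Id": "unit-report-target", "Arn": function_arn}],
)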

Delivering Results:

Of course, knowing the results of the search is crucial. To keep myself informed, I leveraged Amazon Simple Notification Service (SNS). By creating a notification topic and subscribing my email address to it, I receive updates about changes to the available units in my favorite complex. The integration with SNS required minimal additional code, and the unit details now land in my inbox automatically.
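Setting up the topic is a one-time step. Here’s roughly what it looks like with boto3; the topic name and email address below are placeholders:

import boto3

sns = boto3.client("sns")

# Create the topic and subscribe my email to it.
topic = sns.create_topic(Name="available-units")

# SNS sends a confirmation email that must be accepted before
# notifications start flowing.
sns.subscribe(
    TopicArn=topic["TopicArn"],
    Protocol="email",
    Endpoint="me@example.com",
)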

Putting it all together, we now have the following masterpiece 🎨:

from dataclasses import dataclass
from datetime import datetime
import requests
import boto3

# SNS client and the topic that delivers the report to my inbox.
client = boto3.client('sns')
sns_arn = "~~redacted-arn~~"
id_unit_map = {"63844": "1J", "63831": "1D", "63838": "1E"}

@dataclass
class Unit:
    floor_plan: str
    unit_number: int
    floor_number: int
    building_number: int
    unit_price: int
    square_feet: int
    lease_term_for_price: str
    date_available: datetime

    def __init__(self, json_data):
        self.floor_plan = id_unit_map.get(json_data["floor_plan_id"])
        # Unit numbers encode the building and floor, e.g. "1511" is
        # building 1, floor 5.
        unit_str = json_data["unit_number"]
        self.unit_number = int(unit_str)
        self.floor_number = int(unit_str[1])
        self.building_number = int(unit_str[0])
        self.unit_price = json_data["price"]
        self.square_feet = json_data["area"]
        self.lease_term_for_price = json_data["display_lease_term"]
        self.date_available = datetime.strptime(json_data["available_on"], "%Y-%m-%d")

    def __str__(self) -> str:
        return f'''
Unit #{self.unit_number} | Type {self.floor_plan} | {self.square_feet} SqFt.
Price: {self.unit_price}
Level: {self.floor_number}
Lease term: {self.lease_term_for_price}
Available on: {self.date_available.strftime("%b %d, %Y")}
'''

def generate_unit_report():
    # Fetch data from the endpoint discovered in the Network tab.
    response = requests.get("~~redacted_url~~", headers={"Accept": "application/json"})
    response_body = response.json()
    raw_units_data = response_body['data']['units']

    # Parse raw data into our own objects, keeping only my favorite floor plans.
    fav_floor_plans = [Unit(raw_unit_data) for raw_unit_data in raw_units_data if id_unit_map.get(raw_unit_data["floor_plan_id"])]

    # Filter units by my personal criteria so only the best units are left.
    best_units = filter(filter_units_by_personal_criteria, fav_floor_plans)

    # Summarize best available units.
    return "\n".join([str(unit) for unit in best_units])

def filter_units_by_personal_criteria(unit: Unit) -> bool:
    if unit.floor_plan == "1D" and unit.building_number == 1:
        faces_city_skyline = (unit.unit_number % 100) <= 12
        return faces_city_skyline
    elif unit.floor_plan == "1E":
        has_porch = unit.floor_number > 2
        has_natural_light = (unit.unit_number % 100) in [17, 19, 66, 68]
        return has_porch and has_natural_light
    else:
        # ❤️ the unit as is
        return True

def lambda_handler(event, context):
    report = generate_unit_report()
    client.publish(TopicArn=sns_arn, Message=report, Subject='Available Units Report')

Conclusion:

Automating the apartment hunting process using web scraping has been a game-changer for me. The ability to receive real-time updates on apartment availability and have them delivered straight to my email has saved me valuable time and effort. I encourage you to explore web scraping as a powerful tool for enhancing your apartment search.

Remember, with a little bit of coding and the right tools, you can revolutionize your apartment hunt and stay one step ahead. Happy apartment hunting 🔎!
