Ned Hermann

StreetEasy Scraper

Advanced web scraper using BrightData, webhooks, and secure tunnels.


✨ Inspiration

I was chatting with a friend who works in real estate, and he mentioned that he was manually going through thousands of StreetEasy listings for work. That's insane! So I decided to build a pipeline that scrapes the listings, transforms the data, and loads it into a CSV file he could use instead.
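The transform-and-load half of that pipeline can be sketched in a few lines of Python. This is a minimal sketch, assuming each scraped listing has already been parsed into a dictionary; the field names `address`, `price`, and `beds`, the helper names, and the price cleanup rule are all hypothetical, not the project's actual schema.

```python
import csv
import io
from typing import Iterable, TextIO

# Hypothetical output columns for the CSV handed to the end user.
FIELDNAMES = ["address", "price", "beds"]


def transform_listing(raw: dict) -> dict:
    """Normalize one scraped listing (field names are assumptions)."""
    # Strip "$" and thousands separators so "$3,450" becomes 3450.
    price_text = raw.get("price", "").replace("$", "").replace(",", "").strip()
    return {
        "address": raw.get("address", "").strip(),
        # Non-numeric prices (e.g. "N/A") become None, written as an empty cell.
        "price": int(price_text) if price_text.isdigit() else None,
        "beds": raw.get("beds", "").strip(),
    }


def listings_to_csv(raw_listings: Iterable[dict], out: TextIO) -> None:
    """Load step: write a header row, then one normalized row per listing."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    for raw in raw_listings:
        writer.writerow(transform_listing(raw))
```

To write a real file, pass `open("listings.csv", "w", newline="")` as `out`; the `newline=""` argument is what the `csv` module documentation recommends so rows are not separated by extra blank lines on Windows.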

Admittedly, this was a bit more involved than I expected because:

📄 Project Details

To bypass CAPTCHAs, I had to use BrightData's Web Unlocker. The service supports two approaches: synchronous and asynchronous scraping. The difference is that the asynchronous approach requires much more infrastructure, and setting it up seamlessly for my non-technical friend was a challenge.

Originally, I built the synchronous approach, but it had its cons:

That's when I decided to build the asynchronous approach. Instead of simply scraping the URLs I wanted, this approach works as follows:

  1. You send URLs to BrightData's Web Unlocker, which returns a job ID.
  2. When the scrape finishes, BrightData sends a POST request to your webhook with the job's status, and your webhook handler must update that job's status in your database.
  3. A background worker continuously monitors job statuses in the database, fetches data for ready jobs from BrightData, processes and saves the results, and then marks each job as "completed" or "failed."
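The three steps above can be sketched in Python as a small job-status state machine. This is a minimal sketch, assuming a SQLite database (via the standard-library `sqlite3` module) as the job store. The table layout, the status values `pending` and `ready`, the webhook payload keys `job_id` and `status`, and every function name here are hypothetical, not BrightData's documented schema or the project's actual code. Fetching results from BrightData and saving them are passed in as callables (`fetch_result`, `save_result`) rather than guessed API calls.

```python
import sqlite3
import time
from typing import Callable, Optional

# Hypothetical status values; the post itself only names "completed" and "failed".
PENDING, READY, COMPLETED, FAILED = "pending", "ready", "completed", "failed"


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a minimal jobs table keyed by the job ID returned on submission."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.commit()


def record_submitted_job(conn: sqlite3.Connection, job_id: str) -> None:
    """Step 1: remember the job ID returned after sending a URL to the unlocker."""
    conn.execute("INSERT INTO jobs (job_id, status) VALUES (?, ?)", (job_id, PENDING))
    conn.commit()


def handle_webhook(conn: sqlite3.Connection, payload: dict) -> None:
    """Step 2: update a job from the parsed JSON body of the webhook POST.

    The keys "job_id" and "status" and the value "ready" are assumptions.
    """
    new_status = READY if payload.get("status") == "ready" else FAILED
    conn.execute(
        "UPDATE jobs SET status = ? WHERE job_id = ?", (new_status, payload["job_id"])
    )
    conn.commit()


def process_ready_jobs(
    conn: sqlite3.Connection,
    fetch_result: Callable[[str], str],
    save_result: Callable[[str, str], None],
) -> int:
    """Step 3 (one pass): fetch, save, and finalize every job marked ready."""
    ready_ids = [
        row[0]
        for row in conn.execute("SELECT job_id FROM jobs WHERE status = ?", (READY,))
    ]
    for job_id in ready_ids:
        try:
            save_result(job_id, fetch_result(job_id))
            final_status = COMPLETED
        except Exception:
            # Any error while fetching or saving marks the job as failed.
            final_status = FAILED
        conn.execute(
            "UPDATE jobs SET status = ? WHERE job_id = ?", (final_status, job_id)
        )
        conn.commit()
    return len(ready_ids)


def run_worker(
    conn: sqlite3.Connection,
    fetch_result: Callable[[str], str],
    save_result: Callable[[str, str], None],
    poll_seconds: float = 5.0,
    max_passes: Optional[int] = None,
) -> None:
    """Step 3 (loop): poll repeatedly; runs forever when max_passes is None."""
    passes = 0
    while max_passes is None or passes < max_passes:
        process_ready_jobs(conn, fetch_result, save_result)
        passes += 1
        time.sleep(poll_seconds)
```

Keeping the status transitions in the database is what lets the webhook and the worker stay decoupled: the webhook only flips a row to ready, and the worker only acts on rows in that state. In a real deployment the webhook server and the worker would typically run as separate processes, each opening its own connection to the same database file.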

That's quite a bit of work to set up seamlessly on my friend's local computer, but the pros were worth it:

🚀 Features

๐Ÿ› ๏ธ Technologies

โš™๏ธ Tooling