Web Scraping for Business Outreach (Google Maps and Yellow Pages)

 

Purpose

A client approached me because they needed to reach specific types of businesses in the US, including pool builders, home builders, landscape architects, and deck builders. Their goal was to offer their services to these companies, but they needed accurate contact information to do so. They also wanted to collect reviews of these businesses to understand them better.

Solution

I built a custom web scraping solution using Python, Selenium, Beautiful Soup 4, and regular expressions to collect contact information and reviews from Yellow Pages, Google Maps, and Google Business. It extracted company names, generic emails and phone numbers, contact persons’ emails and phone numbers (if available), company websites, locations, and reviews.
In addition to the data collected from these websites, I wrote a generic email collector script that gathers email addresses and phone numbers from each company’s own website. Since each company’s website is structured differently, the script uses regular expressions to match email and phone number patterns and extract that information. I also used third-party services such as snov.io and hunter.io to find personal emails of the contact persons at each company, and wrote a custom script to automate those lookups.
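
The pattern-matching step of the generic email collector can be sketched roughly as follows. This is a minimal illustration, not the production script: the function names (extract_contacts, normalize_phone) and both regular expressions are assumptions made for this example, and the phone pattern only targets common US-style formats.

```python
import re
from html import unescape

# Assumed patterns for illustration; real pages usually need more tuning.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# US-style numbers such as (555) 123-4567, 555.123.4567, +1 555 123 4567.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def normalize_phone(raw: str) -> str:
    """Keep digits only and drop a leading country code, if present."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:]


def extract_contacts(html: str) -> dict:
    """Return de-duplicated, sorted emails and phone numbers found in a page."""
    # Decode HTML entities so obfuscations like "&#64;" become "@".
    text = unescape(html)
    emails = sorted({match.lower() for match in EMAIL_RE.findall(text)})
    phones = sorted({normalize_phone(match) for match in PHONE_RE.findall(text)})
    return {"emails": emails, "phones": phones}
```

The page source would typically come from an HTTP client such as Requests (for example, the text of a response) before being passed to extract_contacts. Patterns this loose can produce false positives, such as asset names like "logo@2x.png", so the results usually need a filtering pass before use.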

Technologies

  • Python
  • Google Maps: Selenium and Beautiful Soup 4
  • Yellow Pages: Requests and Beautiful Soup 4
  • Generic email collector: Requests and regular expressions

Result

My client has used this web scraping solution for years, and it has helped them gain new clients and expand their business. They have direct access to my support, which includes ongoing data collection and processing.

Challenges

Scraping data from Google Maps was a challenge because its HTML DOM and selectors change frequently. Google also presents CAPTCHAs, so I had to carefully mimic human behavior to avoid being detected as a bot.
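
One small part of mimicking human behavior is avoiding a fixed, machine-like request rhythm. The sketch below shows randomized pacing between page visits. It is an assumption about how such pacing could look, not the exact logic used in the project, and the names human_delay and paced, as well as the default timings, are hypothetical.

```python
import random
import time


def human_delay(base=2.0, jitter=1.5, rng=random):
    """Return a pause length in seconds: base plus a random 0..jitter."""
    return base + rng.uniform(0.0, jitter)


def paced(items, base=2.0, jitter=1.5, sleep=time.sleep, rng=random):
    """Yield items one by one, pausing a randomized interval between them.

    The sleep function is injectable so the pacing can be tested
    without actually waiting.
    """
    for index, item in enumerate(items):
        if index:  # no pause before the first item
            sleep(human_delay(base, jitter, rng))
        yield item
```

With Selenium, this could wrap the list of pages to visit, for example `for url in paced(urls): driver.get(url)`. Randomized pacing on its own does not guarantee that CAPTCHAs are avoided; it only removes one obvious automation signal.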