Google Maps has become an essential tool for businesses to showcase their locations, services, and contact information. However, manually collecting this data from Google Maps can be tedious and time-consuming. That’s where web scraping comes in handy.
Web scraping is the process of extracting data from websites automatically using software tools. Python and Selenium are two popular technologies for web scraping. In this tutorial, we will walk you through a step-by-step process of using Python and Selenium to scrape business data from Google Maps.
With this tutorial, you’ll learn how to:
- Set up your Python and Selenium environment
- Automate web browser interactions with Selenium
- Use XPaths and CSS selectors to locate elements on a web page
- Extract business data from Google Maps and save it to a CSV file
By the end of this tutorial, you’ll have a powerful tool for collecting business data from Google Maps, which can be used for market research, lead generation, and other business purposes. So, let’s get started!
Installation
Create and activate a Python virtual environment, then install Selenium and webdriver-manager.
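If you don’t already have a virtual environment, a typical way to create and activate one (assuming a Unix-like shell) is:

python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate

Then install the packages: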
pip install selenium webdriver-manager
Importing the dependencies
import csv
import os
import time

from selenium import webdriver
from selenium.common import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
- csv is a built-in Python module that provides functionality for reading from and writing to CSV files.
- time is another built-in module that provides various time-related functions.
- os is another built-in module; we’ll use it to check whether the output CSV file already exists before writing the header row.
- The selenium imports give us the webdriver entry point, the By locator strategies, the Service class for managing the ChromeDriver process, and the NoSuchElementException error we catch while parsing.
- webdriver_manager is a third-party package that provides an easy way to download and install the ChromeDriver executable.
GoogleMapScraper: the main class for our project
class GoogleMapScraper:
    def __init__(self):
        self.output_file_name = "google_map_business_data.csv"
        self.headless = False
        self.driver = None
        self.unique_check = []
- __init__ is a special method that is called when an instance of the GoogleMapScraper class is created. It initializes the instance’s attributes with default values.
- self.output_file_name is a string that represents the name of the CSV file where the scraped data will be saved. By default, it is set to “google_map_business_data.csv”.
- self.headless is a boolean that determines whether the Chrome browser will run in headless mode or not. Headless mode means that the browser will run without a graphical user interface. By default, it is set to False.
- self.driver is a variable that will hold the webdriver.Chrome instance used to interact with the Chrome browser. It is initialized to None because the instance will be created later in the code.
- self.unique_check is a list that will contain the unique identifiers of the businesses that have already been scraped. This is used to avoid scraping duplicate data. It is initialized to an empty list.
Configuring the Selenium WebDriver
def config_driver(self):
    options = webdriver.ChromeOptions()
    if self.headless:
        options.add_argument("--headless")
    s = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=s, options=options)
    self.driver = driver
- config_driver is a method that configures the webdriver used to interact with the Chrome browser. It sets the options for the Chrome browser, such as whether it should run in headless mode or not.
- options is an instance of webdriver.ChromeOptions that is used to set the options for the Chrome browser.
- If self.headless is True, then the --headless argument is added to the options object. This tells the Chrome browser to run in headless mode.
- s is an instance of Service that starts the ChromeDriver server. The ChromeDriverManager().install() method downloads a ChromeDriver build that matches your installed Chrome and returns its path.
- driver is an instance of webdriver.Chrome that is created using the service and options objects.
- self.driver is set to the driver instance so that it can be used to interact with the Chrome browser in other methods.
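Putting these pieces together, a short usage sketch looks like this (the overridden values are just examples):

scraper = GoogleMapScraper()
scraper.headless = True                    # run Chrome without a visible window
scraper.output_file_name = "plumbers.csv"  # override the default output file
scraper.config_driver()                    # creates and stores scraper.driver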
Scrolling down the page to load more data
def load_companies(self, url):
    print("Getting business info", url)
    self.driver.get(url)
    time.sleep(5)
    panel_xpath = '//*[@id="QA0Szd"]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]/div[1]'
    scrollable_div = self.driver.find_element(By.XPATH, panel_xpath)
    # scrolling
    flag = True
    i = 0
    while flag:
        print(f"Scrolling to page {i + 2}")
        self.driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
        time.sleep(2)
        if "You've reached the end of the list." in self.driver.page_source:
            flag = False
        self.get_business_info()
        i += 1
The load_companies method loads the search results. First, it opens the URL and locates the results panel using a pre-defined XPath; this panel is the element we scroll. It then keeps scrolling and calling get_business_info on each newly loaded batch until the text "You've reached the end of the list." appears in the page source. At that point the flag is set to False and the while loop terminates, ending the scraping process.
To get the panel’s XPath yourself, open Chrome DevTools, inspect the results panel, right-click the highlighted element, and choose Copy > Copy XPath.
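Note that the end-of-list sentinel is specific to the English UI and may change. A fallback that doesn’t depend on that string (a sketch, not part of the original script) is to stop once the panel’s scroll height stops growing:

last_height = 0
while True:
    self.driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
    time.sleep(2)
    new_height = self.driver.execute_script('return arguments[0].scrollHeight', scrollable_div)
    self.get_business_info()  # parse the newly loaded batch, as in the original loop
    if new_height == last_height:
        break  # nothing new was loaded, so we have reached the end
    last_height = new_height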
Parsing the Data
def get_business_info(self):
    time.sleep(2)
    for business in self.driver.find_elements(By.CLASS_NAME, 'THOPZb'):
        name = business.find_element(By.CLASS_NAME, 'fontHeadlineSmall').text
        rating, reviews_count = self.parse_rating_and_review_count(business)
        address, category = self.parse_address_and_category(business)
        contact = self.parse_contact(business)
        try:
            website = business.find_element(By.CLASS_NAME, "lcr4fd").get_attribute("href")
        except NoSuchElementException:
            website = ""
        unique_id = "".join([name, rating, reviews_count, address, category, contact, website])
        if unique_id not in self.unique_check:
            data = [name, rating, reviews_count, address, category, contact, website]
            self.save_data(data)
            self.unique_check.append(unique_id)
In this portion of the code, the get_business_info method is defined to extract relevant business information from the Google Maps listing.
For each business, it extracts the name, rating, reviews count, address, category, contact, and website link (if available). It then generates a unique ID using all of these extracted attributes.
First, we find all the loaded business cards on the page:
self.driver.find_elements(By.CLASS_NAME, 'THOPZb')
This finds all the elements with class name THOPZb on the page and returns a list, one element per business card. We can then loop through the list and parse each card’s attributes.
Let’s go through the attributes we are scraping.
Name:
name = business.find_element(By.CLASS_NAME, 'fontHeadlineSmall').text
This extracts the text from the element with class name fontHeadlineSmall.
Rating & Reviews Count:
We wrote a helper method to parse the rating and review count.
def parse_rating_and_review_count(self, business):
    try:
        reviews_block = business.find_element(By.CLASS_NAME, 'AJB7ye').text.split("(")
        rating = reviews_block[0].strip()
        reviews_count = reviews_block[1].split(")")[0].strip()
    except Exception:
        rating = ""
        reviews_count = ""
    return rating, reviews_count
Within this method we extract the text of the element with class name AJB7ye, which holds both values, and split it into the rating and the review count before returning them.
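For instance, assuming the block’s text follows the usual "rating(count)" layout (the value below is made up):

text = "4.5(1,203)"                      # hypothetical text of the AJB7ye block
reviews_block = text.split("(")          # ["4.5", "1,203)"]
rating = reviews_block[0].strip()        # "4.5"
reviews_count = reviews_block[1].split(")")[0].strip()  # "1,203"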
Address & Category:
Another helper method extracts the address and category. Let’s have a look.
def parse_address_and_category(self, business):
    try:
        address_block = business.find_elements(By.CLASS_NAME, "W4Efsd")[2].text.split("·")
        if len(address_block) >= 2:
            address = address_block[1].strip()
            category = address_block[0].strip()
        elif len(address_block) == 1:
            address = ""
            category = address_block[0]
    except Exception:
        address = ""
        category = ""
    return address, category
As you can see, the class name W4Efsd occurs several times within each card. We select all matching elements and take the third one (index 2), whose text contains the category and address separated by a "·" character.
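Here is how the split behaves on a sample block text (a made-up value in the usual "category · address" layout):

text = "Masonry contractor · 123 Main St"  # hypothetical block text
address_block = text.split("·")            # ["Masonry contractor ", " 123 Main St"]
category = address_block[0].strip()        # "Masonry contractor"
address = address_block[1].strip()         # "123 Main St"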
Contact info:
def parse_contact(self, business):
    try:
        contact = business.find_elements(By.CLASS_NAME, "W4Efsd")[3].text.split("·")[-1].strip()
    except Exception:
        contact = ""
    # "+1" is the US country code; if it's missing, the fourth block likely
    # held something else (such as opening hours), so try the fifth block
    if "+1" not in contact:
        try:
            contact = business.find_elements(By.CLASS_NAME, "W4Efsd")[4].text.split("·")[-1].strip()
        except Exception:
            contact = ""
    return contact
Again we select the elements with class name W4Efsd, but this time take the fourth element (index 3) or, failing that, the fifth (index 4), since the contact information appears in one of those two blocks. Checking for "+1" (the US country code) tells us whether the block we grabbed actually holds a phone number.
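The fallback logic on sample block texts (hypothetical values; real cards vary) works like this:

# Hypothetical texts of the fourth and fifth W4Efsd blocks on one card:
blocks = ["Open · Closes 6 PM", "+1 615-555-0123"]
contact = blocks[0].split("·")[-1].strip()      # "Closes 6 PM", no "+1" here
if "+1" not in contact:
    contact = blocks[1].split("·")[-1].strip()  # "+1 615-555-0123"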
Website:
website = business.find_element(By.CLASS_NAME, "lcr4fd").get_attribute("href")
The element with class name lcr4fd holds the website link; the code above reads its href attribute to get the URL.
Saving the data
Finally, we save each record by calling the method below every time we collect a business’s info:
def save_data(self, data):
    header = ['company_name', 'rating', 'reviews_count', 'address', 'category', 'phone', 'website']
    # write the header only when the output file is new or empty
    write_header = not os.path.isfile(self.output_file_name) or os.path.getsize(self.output_file_name) == 0
    with open(self.output_file_name, 'a', newline='', encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        if write_header:
            writer.writerow(header)
        writer.writerow(data)
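Once a run finishes, the CSV can be read back for a quick sanity check, using the column names from the header above:

import csv

with open("google_map_business_data.csv", newline='', encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row["company_name"], row["phone"])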
What’s next?
If you want to scrape Google Maps using Python and Selenium, just remember: don’t be evil (to Google’s terms of service) and always use your powers for good (or at least for something more productive than just finding the nearest taco truck). Happy scraping! You want the full code?
Here you go,
import csv
import os
import time

from selenium import webdriver
from selenium.common import NoSuchElementException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager


class GoogleMapScraper:
    def __init__(self):
        self.output_file_name = "google_map_business_data.csv"
        self.headless = False
        self.driver = None
        self.unique_check = []

    def config_driver(self):
        options = webdriver.ChromeOptions()
        if self.headless:
            options.add_argument("--headless")
        s = Service(ChromeDriverManager().install())
        driver = webdriver.Chrome(service=s, options=options)
        self.driver = driver

    def save_data(self, data):
        header = ['company_name', 'rating', 'reviews_count', 'address', 'category', 'phone', 'website']
        # write the header only when the output file is new or empty
        write_header = not os.path.isfile(self.output_file_name) or os.path.getsize(self.output_file_name) == 0
        with open(self.output_file_name, 'a', newline='', encoding="utf-8") as csvfile:
            writer = csv.writer(csvfile)
            if write_header:
                writer.writerow(header)
            writer.writerow(data)

    def parse_contact(self, business):
        try:
            contact = business.find_elements(By.CLASS_NAME, "W4Efsd")[3].text.split("·")[-1].strip()
        except Exception:
            contact = ""
        # "+1" is the US country code; fall back to the next block if it's missing
        if "+1" not in contact:
            try:
                contact = business.find_elements(By.CLASS_NAME, "W4Efsd")[4].text.split("·")[-1].strip()
            except Exception:
                contact = ""
        return contact

    def parse_rating_and_review_count(self, business):
        try:
            reviews_block = business.find_element(By.CLASS_NAME, 'AJB7ye').text.split("(")
            rating = reviews_block[0].strip()
            reviews_count = reviews_block[1].split(")")[0].strip()
        except Exception:
            rating = ""
            reviews_count = ""
        return rating, reviews_count

    def parse_address_and_category(self, business):
        try:
            address_block = business.find_elements(By.CLASS_NAME, "W4Efsd")[2].text.split("·")
            if len(address_block) >= 2:
                address = address_block[1].strip()
                category = address_block[0].strip()
            elif len(address_block) == 1:
                address = ""
                category = address_block[0]
        except Exception:
            address = ""
            category = ""
        return address, category

    def get_business_info(self):
        time.sleep(2)
        for business in self.driver.find_elements(By.CLASS_NAME, 'THOPZb'):
            name = business.find_element(By.CLASS_NAME, 'fontHeadlineSmall').text
            rating, reviews_count = self.parse_rating_and_review_count(business)
            address, category = self.parse_address_and_category(business)
            contact = self.parse_contact(business)
            try:
                website = business.find_element(By.CLASS_NAME, "lcr4fd").get_attribute("href")
            except NoSuchElementException:
                website = ""
            unique_id = "".join([name, rating, reviews_count, address, category, contact, website])
            if unique_id not in self.unique_check:
                data = [name, rating, reviews_count, address, category, contact, website]
                self.save_data(data)
                self.unique_check.append(unique_id)

    def load_companies(self, url):
        print("Getting business info", url)
        self.driver.get(url)
        time.sleep(5)
        panel_xpath = '//*[@id="QA0Szd"]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]/div[1]'
        scrollable_div = self.driver.find_element(By.XPATH, panel_xpath)
        # scrolling
        flag = True
        i = 0
        while flag:
            print(f"Scrolling to page {i + 2}")
            self.driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
            time.sleep(2)
            if "You've reached the end of the list." in self.driver.page_source:
                flag = False
            self.get_business_info()
            i += 1


urls = [
    "https://www.google.com/maps/search/Nashville+Masonry+contractor/@35.5098836,-87.8700415,8z/data=!3m1!4b1"
]

business_scraper = GoogleMapScraper()
business_scraper.config_driver()
for url in urls:
    business_scraper.load_companies(url)
Hi,
How did you fix the infinite scrolling problem with Google Maps? I used this, but it doesn’t work.
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(3)
    # Calculate new scroll height and compare it with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        # if the new and last height are equal, there isn't any new content to load, so stop scrolling
        break
    else:
        last_height = new_height
Hi Benjamin,
Thank you for your question! The issue with the code you shared lies in how Google Maps handles infinite scrolling. Unlike simple web pages that rely on window.scrollTo or document.body.scrollHeight for scrolling, Google Maps typically uses a specific scrollable container for lists, such as search results or reviews. These containers are not part of the main window object, so scrolling the main page won’t affect them.
Here’s the working code:
panel_xpath = '//*[@id="QA0Szd"]/div/div/div[1]/div[2]/div/div[1]/div/div/div[2]/div[1]'
scrollable_div = self.driver.find_element(By.XPATH, panel_xpath)

flag = True
i = 0
while flag:
    print(f"Scrolling to page {i + 2}")
    self.driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', scrollable_div)
    time.sleep(2)
    if "You've reached the end of the list." in self.driver.page_source:
        flag = False
    self.get_business_info()
    i += 1
Here, scrollable_div is the element containing the scrollable content, which you can locate with Selenium. This method ensures you’re targeting the correct element for scrolling.