Scrape Amazon Product Reviews With Python
Amazon is a well-known e-commerce platform with a large amount of data available in various formats on the web. This data can be invaluable for gaining business insights, particularly by analyzing product reviews to understand the quality of products provided by different vendors.
In this guide, we will walk through the web scraping steps needed to extract Amazon reviews for a particular product and save them in Excel or CSV format. Manually copying information from the web is tedious, which is exactly what scraping automates, and this hands-on exercise will strengthen our practical understanding of web scraping techniques.
Before we start, make sure you have Python installed on your system; you can download it from python.org. The process is very simple: install it like you would any other application.
Now that everything is set let’s proceed:
How to Scrape Amazon Reviews Using Python
Install Anaconda from this link: https://www.anaconda.com/download. Be sure to follow the default settings during installation. For more guidance, please click here.
We can use various IDEs, but to keep it beginner-friendly, let’s start with Jupyter Notebook in Anaconda. You can watch the video linked above to understand and get familiar with the software.
Steps for Web Scraping Amazon Reviews:
Create a new notebook and save it.
Step 1: Let’s start by importing all the necessary modules using the following code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
Step 2: Define headers to avoid getting your IP blocked. Note that you can search `my user agent` on Google to find your own user agent string, then substitute it as the value of `"User-agent"` below.
custom_headers = {
    "Accept-language": "en-GB,en;q=0.9",
    "User-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15",
}
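If you want to confirm exactly which headers a request would carry before sending anything over the network, requests lets you build a prepared request without dispatching it. A small sketch (the URL here is just a placeholder):

```python
import requests

custom_headers = {
    "Accept-language": "en-GB,en;q=0.9",
    "User-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Safari/605.1.15",
}

# Build the request without sending it, then inspect the headers it would carry
prepared = requests.Request("GET", "https://www.amazon.com/", headers=custom_headers).prepare()
print(prepared.headers["User-agent"])
```

Note that `prepared.headers` is case-insensitive, so `"User-agent"` and `"User-Agent"` refer to the same entry.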
Step 3: Create a Python function that fetches the webpage, checks for errors, and returns a BeautifulSoup object for further processing.
# Function to fetch the webpage and return a BeautifulSoup object
def fetch_webpage(url):
    response = requests.get(url, headers=custom_headers)
    if response.status_code != 200:
        print("Error in fetching webpage")
        exit(-1)
    page_soup = BeautifulSoup(response.text, "lxml")
    return page_soup
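`fetch_webpage` gives up on the first failure. For transient network hiccups, a small retry loop can be more forgiving. Here is a sketch where the getter is injectable so it can be exercised without touching the network; the `fetch_with_retries` name, the `attempts`/`delay` parameters, and the fake getter are all illustrative, not part of the original script:

```python
import time

def fetch_with_retries(url, getter, attempts=3, delay=0.0):
    """Call getter(url) up to `attempts` times and return the first
    response with status_code 200, or None if every attempt fails."""
    for _ in range(attempts):
        response = getter(url)
        if response is not None and response.status_code == 200:
            return response
        time.sleep(delay)
    return None

# Stand-in for requests.get: fails twice, then succeeds
class _FakeResponse:
    def __init__(self, code):
        self.status_code = code

_codes = iter([503, 503, 200])

def _fake_get(url):
    return _FakeResponse(next(_codes))

result = fetch_with_retries("https://example.com", _fake_get)
print(result.status_code)  # 200 after two retries
```

In the real script you would pass `lambda u: requests.get(u, headers=custom_headers)` as the getter.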
Step 4: Inspect the page to find the element and attribute from which we want to extract data, then create another function, `extract_reviews`, that selects the review blocks. At this stage the function only identifies review-related elements on the page; it doesn’t yet extract the actual review content. The code to pull out the relevant information from these elements (review text, ratings, etc.) comes in the following steps.
# Function to extract reviews from the webpage
def extract_reviews(page_soup):
    review_blocks = page_soup.select('div[data-hook="review"]')
    reviews_list = []
Step 5: The code below processes each review element in turn. It extracts the customer’s name (if available) and stores it in the `customer` variable; if no customer information is found, `customer` remains `None`. The same pattern is applied to the rating, title, body, date, and review image.
    for review in review_blocks:
        author_element = review.select_one('span.a-profile-name')
        customer = author_element.text if author_element else None
        rating_element = review.select_one('i.review-rating')
        customer_rating = rating_element.text.replace("out of 5 stars", "") if rating_element else None
        title_element = review.select_one('a[data-hook="review-title"]')
        review_title = title_element.text.split('stars\n', 1)[-1].strip() if title_element else None
        content_element = review.select_one('span[data-hook="review-body"]')
        review_content = content_element.text.strip() if content_element else None
        date_element = review.select_one('span[data-hook="review-date"]')
        review_date = date_element.text.replace("Reviewed in the United States on ", "").strip() if date_element else None
        image_element = review.select_one('img.review-image-tile')
        image_url = image_element.attrs["src"] if image_element else None
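The string clean-ups in the loop can be checked in isolation. Assuming rating and title text shaped like Amazon’s typical output (the sample strings below are illustrative stand-ins):

```python
# Typical text of an i.review-rating element
raw_rating = "4.0 out of 5 stars"
rating = raw_rating.replace("out of 5 stars", "").strip()
print(rating)  # "4.0"

# Title links repeat the star rating before the actual title
raw_title = "4.0 out of 5 stars\nGreat sound for the price"
title = raw_title.split("stars\n", 1)[-1].strip()
print(title)  # "Great sound for the price"
```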
Step 6: Next, the extracted fields (customer, customer_rating, review_title, review_content, review_date, and image_url) are bundled into a dictionary for each review and appended to the list, which the function then returns.
        review_data = {
            "customer": customer,
            "customer_rating": customer_rating,
            "review_title": review_title,
            "review_content": review_content,
            "review_date": review_date,
            "image_url": image_url
        }
        reviews_list.append(review_data)

    return reviews_list
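The same selectors can be exercised offline against a minimal HTML fragment that mimics the shape of one review block. The markup below is a simplified stand-in, not real Amazon output, and it uses the built-in `html.parser` so no extra parser is needed:

```python
from bs4 import BeautifulSoup

sample_html = """
<div data-hook="review">
  <span class="a-profile-name">Alice</span>
  <i class="review-rating"><span>5.0 out of 5 stars</span></i>
  <span data-hook="review-body">Works great.</span>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
for review in soup.select('div[data-hook="review"]'):
    author = review.select_one("span.a-profile-name")
    body = review.select_one('span[data-hook="review-body"]')
    print(author.text, "-", body.text)  # Alice - Works great.
```

Building a tiny fixture like this is a handy way to debug selectors without hitting the live site on every run.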
Step 7: Now, let’s write a `main` function that initializes a `review_page_url` variable with an Amazon product review page URL and runs the scrape.
def main():
    review_page_url = "https://www.amazon.com/BERIBES-Cancelling-Transparent-Soft-Earpads-Charging-Black/product-reviews/B0CDC4X65Q/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews"
    page_soup = fetch_webpage(review_page_url)
    scraped_reviews = extract_reviews(page_soup)
Step 8: Now let’s print the scraped review data (stored in the `scraped_reviews` variable) to the console for verification purposes.
    # Print the scraped data to verify
    print("Scraped Data:", scraped_reviews)
Step 9: Next, create a DataFrame from the data, which organizes it into tabular form.
    # Create a DataFrame from the scraped reviews
    reviews_df = pd.DataFrame(data=scraped_reviews)
Step 10: Now export the DataFrame to a CSV file in the current working directory.
    reviews_df.to_csv("reviews.csv", index=False)
    print("CSV file has been created.")
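A quick sanity check is to read the file back and confirm the export round-trips. The sketch below uses a one-row sample list in place of real scraped data:

```python
import pandas as pd

# Illustrative sample standing in for scraped_reviews
sample_reviews = [
    {"customer": "Alice", "customer_rating": "5.0", "review_content": "Works great."},
]
pd.DataFrame(sample_reviews).to_csv("reviews.csv", index=False)

# Read the file back to confirm the export round-trips
check_df = pd.read_csv("reviews.csv")
print(check_df.shape)  # (1, 3)
```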
Step 11: The construct below acts as a guard. It ensures that `main()` runs only when the script is executed directly as a standalone program, rather than when it is imported as a module by another script.
# Ensuring the script runs only when executed directly
if __name__ == '__main__':
    main()
Why Scrape Amazon Product Reviews?
Scraping Amazon product reviews can provide valuable insights for businesses. Here’s why you should consider it:
● Feedback Collection: Every business needs feedback to understand customer requirements and implement changes to improve product quality. Scraping reviews allows businesses to gather large volumes of customer feedback quickly and efficiently.
● Sentiment Analysis: Analyzing the sentiments expressed in reviews can help identify positive and negative aspects of products, leading to informed business decisions.
● Competitor Analysis: Scraping allows businesses to monitor competitors’ pricing and product features, helping to stay competitive in the market.
● Business Expansion Opportunities: By understanding customer needs and preferences, businesses can identify opportunities for expanding their product lines or entering new markets.
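As a toy illustration of the sentiment-analysis point above, here is a simple word-list tally over review text. The word lists are illustrative only; real work would use a proper NLP library:

```python
POSITIVE = {"great", "excellent", "love", "good", "amazing"}
NEGATIVE = {"bad", "poor", "broken", "terrible", "awful"}

def sentiment_score(review_text):
    """Return positive-minus-negative word count for one review."""
    words = review_text.lower().split()
    # Strip basic punctuation so "great!" still matches "great"
    words = [w.strip(".,!?") for w in words]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("Great sound, amazing battery!"))  # 2
print(sentiment_score("Terrible fit and poor build."))   # -2
```

Running this over the `review_content` column of the exported CSV would give a rough positive/negative breakdown per product.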
Manually copying and pasting content is time-consuming and error-prone. This is where web scraping comes in. Using Python to scrape Amazon reviews can automate the process, reduce manual errors, and provide accurate data.
Benefits of Scraping Amazon Reviews
● Efficiency: Automate data extraction to save time and resources.
● Accuracy: Reduce human errors with automated scripts.
● Large Data Volume: Collect extensive data for comprehensive analysis.
● Informed Decision Making: Use customer feedback to make data-driven business decisions.
Conclusion
Now that we’ve covered how to scrape Amazon reviews using Python, you can apply the same techniques to other websites by inspecting their elements. Here are some key points to remember:
● Understanding HTML: Familiarize yourself with HTML structure. Knowing how elements are nested and how to navigate the Document Object Model (DOM) is crucial for finding the data you want to scrape.
● CSS Selectors: Learn how to use CSS selectors to accurately target and extract specific elements from a webpage.
● Python Basics: Understand Python programming, especially how to use libraries like requests for making HTTP requests and BeautifulSoup for parsing HTML content.
● Inspecting Elements: Practice using browser developer tools (right-click on a webpage and select “Inspect” or press Ctrl+Shift+I) to examine the HTML structure. This helps you find the tags and attributes that hold the data you want to scrape.
● Error Handling: Add error handling to your code to deal with possible issues, like network errors or changes in the webpage structure.
● Legal and Ethical Considerations: Always check a website’s robots.txt file and terms of service to ensure compliance with legal and ethical rules of web scraping.
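To practice the CSS-selector point above, it helps to compare `select` (all matches) with `select_one` (first match only) on a small fragment; the markup here is made up for illustration:

```python
from bs4 import BeautifulSoup

html = '<ul><li class="item">a</li><li class="item">b</li></ul>'
soup = BeautifulSoup(html, "html.parser")

all_items = [li.text for li in soup.select("li.item")]  # every match
first = soup.select_one("li.item").text                 # first match only
print(all_items, first)  # ['a', 'b'] a
```

Attribute selectors such as `div[data-hook="review"]` from the scraper follow the same pattern, just targeting an attribute value instead of a class.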
By mastering these areas, you’ll be able to confidently scrape data from various websites, allowing you to gather valuable insights and perform detailed analyses.