Web Unblocker</a></u> proxy solution is among the best-performing options. It automatically manages the proxy pool, headers, cookies, and other browser parameters for you, which greatly reduces the chance of getting blocked. You can also sign up and try it for free before committing.</p>

<p>To integrate it, configure the code as shown below. You will need the Web Unblocker credentials you receive when you register.</p>
<pre><code>import requests

# Replace USERNAME and PASSWORD with your Web Unblocker sub-user credentials
proxy = 'http://{}:{}@unblock.oxylabs.io:60000'.format('USERNAME', 'PASSWORD')
proxies = {
    'http': proxy,
    'https': proxy,
}

page = 'https://www.amazon.com/iPhone-Pro-Max-128GB-Gold/dp/B0BGYDDWDF/'
response = requests.get(page, proxies=proxies, verify=False)</code></pre>

<p>Replace USERNAME and PASSWORD with your sub-user credentials. Also note the additional verify=False parameter in the request, as shown above. It turns off TLS certificate verification, so requests will emit an InsecureRequestWarning when the request is sent.</p>
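<p>If your password contains characters such as @ or :, dropping it into the proxy URL as-is can break the URL. The sketch below is a hypothetical helper (build_proxies is our own name, not part of requests or the Web Unblocker API) that percent-encodes the credentials with Python’s standard urllib.parse.quote before building the same proxies dictionary. requests decodes percent-encoded credentials from the proxy URL when it authenticates with the proxy.</p>

```python
from urllib.parse import quote


def build_proxies(username: str, password: str) -> dict:
    """Return a requests-style proxies dict for the Web Unblocker endpoint.

    Hypothetical helper: quote(..., safe='') percent-encodes characters such
    as '@', ':' or '/' that would otherwise be read as URL structure.
    """
    proxy = 'http://{}:{}@unblock.oxylabs.io:60000'.format(
        quote(username, safe=''),
        quote(password, safe=''),
    )
    # The same proxy URL is used for both plain HTTP and HTTPS targets
    return {'http': proxy, 'https': proxy}


# Example with a password that contains reserved characters
print(build_proxies('USERNAME', 'p@ss:word'))
```

<p>You would then pass the result as before, for example requests.get(page, proxies=build_proxies('USERNAME', 'PASSWORD'), verify=False). Plain alphanumeric credentials come out unchanged, so the helper is safe to use either way.</p>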
<h3><strong>Extracting Product Information</strong></h3>

<p>If you run the code, the response object will now hold the HTML source of the Amazon product page. Before you parse the product information, inspect the target elements in a web browser. To do that:</p>
<ol>
<li>Open the product link in a web browser.</li>
<li><strong>Right-click</strong> anywhere on the page.</li>
<li>Select <strong>Inspect</strong>.</li>
</ol>

<p>This opens the browser’s developer tools, which show the page’s HTML alongside the rendered product page.</p>

<p>Now, let’s use Beautiful Soup to parse this HTML content and extract the elements:</p>
<pre><code>from bs4 import BeautifulSoup

data = []  # collects one dict of fields per product
soup = BeautifulSoup(response.content, 'html.parser')</code></pre>

<h3><strong>Product Title</strong></h3>

<p>Beautiful Soup parses the HTML and creates a soup object, which you can use to extract the product title. Inspect the title element on the product page again.</p>
<p>Notice that the title element has the attribute id="productTitle". You can use this attribute to select it:</p>
<pre><code># .strip() removes leading and trailing whitespace from the title text
title = soup.find('span', {'id': 'productTitle'}).text.strip()</code></pre>

<h3><strong>Product Price</strong></h3>

<p>Next, let’s grab the product price. Inspect the price element in the browser. As you can see, the price is in a span element nested inside another span element with the class a-text-price:</p>
<pre><code>price = soup.find('span', {'class': 'a-text-price'}).find('span').text.strip()</code></pre>

<h3><strong>Product Rating</strong></h3>

<p>Similarly, you can extract the total number of product ratings:</p>
<pre><code>total_ratings = soup.find('span', {'id': 'acrCustomerReviewText'}).text.strip()</code></pre>

<p>Then, use the following code to extract the product rating score:</p>
<pre><code># A class string containing spaces must match the element's class attribute exactly
rating = soup.find('a', {'class': 'a-popover-trigger a-declarative'}).find(
    'span', {'class': 'a-size-base a-color-base'}
).text.strip()</code></pre>

<h2><strong>Storing Data Into CSV</strong></h2>

<p>Once you’ve selected all the elements you want to extract, let’s store this information in a usable format.</p>
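<p>The extracted values are raw strings, for example a price with a currency symbol and thousands separators. If you want numeric columns in the CSV, you could convert them before storing them. The following is a minimal sketch using only Python’s standard library; the helper names (clean_text, parse_price, parse_count, parse_rating) are hypothetical, and the patterns assume US-style number formatting (comma as thousands separator, dot as decimal mark), so adjust them for other Amazon marketplaces. Each helper returns None when it finds no number.</p>

```python
import re
from decimal import Decimal
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    if value is None:
        return None
    return ' '.join(value.split())


def parse_price(value: Optional[str]) -> Optional[Decimal]:
    """Extract a price such as '$1,099.00' as Decimal('1099.00').

    Assumes US-style formatting: ',' groups thousands, '.' marks decimals.
    """
    if value is None:
        return None
    match = re.search(r'\d[\d,]*(?:\.\d+)?', value)
    if match is None:
        return None
    return Decimal(match.group().replace(',', ''))


def parse_count(value: Optional[str]) -> Optional[int]:
    """Extract a count such as '1,234 ratings' as 1234."""
    if value is None:
        return None
    match = re.search(r'\d[\d,]*', value)
    return int(match.group().replace(',', '')) if match else None


def parse_rating(value: Optional[str]) -> Optional[float]:
    """Extract a score such as '4.6' or '4.6 out of 5 stars' as 4.6."""
    if value is None:
        return None
    match = re.search(r'\d+(?:\.\d+)?', value)
    return float(match.group()) if match else None


# Hypothetical raw strings in the shape the scraper returns
row = {
    'title': clean_text('  Example product\n  title  '),
    'price': parse_price('$1,099.00'),
    'total ratings': parse_count('1,234 ratings'),
    'rating': parse_rating('4.6 out of 5 stars'),
}
print(row)
```

<p>Decimal is used for the price so that currency amounts are not subject to floating-point rounding. In the scraper, you could then append 'price': parse_price(price) instead of the raw price string, and likewise for the other fields.</p>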
<p>Using a pandas DataFrame, export the data to a CSV file with the code below. Since you don’t need the row index in the file, set index to False.</p>
<pre><code>import pandas as pd

data.append({
    'title': title,
    'price': price,
    'total ratings': total_ratings,
    'rating': rating,
})
df = pd.DataFrame(data)
df.to_csv('amazon_product_data.csv', index=False)  # index=False: no row-number column</code></pre>

<h3><strong>Full Source Code</strong></h3>

<p>To scrape multiple products, put the product URLs in a list and loop over them. Since Web Unblocker manages request headers automatically, you don’t need to pass any HTTP headers yourself. Here is the full source code:</p>
<pre><code>import requests
from bs4 import BeautifulSoup
import pandas as pd

# Replace USERNAME and PASSWORD with your Web Unblocker sub-user credentials
proxy = 'http://{}:{}@unblock.oxylabs.io:60000'.format('USERNAME', 'PASSWORD')
proxies = {
    'http': proxy,
    'https': proxy,
}

# Add more product URLs to this list to scrape several products
pages = [
    'https://www.amazon.com/iPhone-Pro-Max-128GB-Gold/dp/B0BGYDDWDF/',
]

data = []
for page in pages:
    # Web Unblocker manages headers, so no custom HTTP headers are passed here
    response = requests.get(page, proxies=proxies, verify=False)
    soup = BeautifulSoup(response.content, 'html.parser')

    title = soup.find('span', {'id': 'productTitle'}).text.strip()
    price = soup.find('span', {'class': 'a-text-price'}).find('span').text.strip()
    total_ratings = soup.find('span', {'id': 'acrCustomerReviewText'}).text.strip()
    rating = soup.find('a', {'class': 'a-popover-trigger a-declarative'}).find(
        'span', {'class': 'a-size-base a-color-base'}
    ).text.strip()

    data.append({
        'title': title,
        'price': price,
        'total ratings': total_ratings,
        'rating': rating,
    })

# Write one CSV row per product after the loop finishes
df = pd.DataFrame(data)
df.to_csv('amazon_product_data.csv', index=False)</code></pre>

<h3><strong>Conclusion</strong></h3>

<p>Hopefully, this step-by-step guide has shown you how to fetch Amazon product pages through Web Unblocker to get past anti-bot measures, extract product data such as the title, price, and ratings, and save it to a CSV file. Being able to gather data from Amazon opens up many possibilities for market research, competitor analysis, pricing optimization, and more.</p>