I'm trying to scrape all the HTML elements of a page using requests & beautifulsoup. I'm using ASIN (Amazon Standard Identification Number) to get the product details of a page. My code is as follows:
from urllib.request import urlopen
import requests
from bs4 import BeautifulSoup
url = "http://www.amazon.com/dp/" + 'B004CNH98C'
response = urlopen(url)
soup = BeautifulSoup(response, "html.parser")
print(soup)
But the output doesn't show the entire HTML of the page, so I can't do my further work with product details. Any help on this?
EDIT 1:
From the given answer, It shows the markup of the bot detection page. I researched a bit & found two ways to breach it :
- I might need to add a header in the requests, but I couldn't understand what should be the value of header.
- Use Selenium. Now my question is, do both of the ways provide equal support?