Im trying to scrape one website that requires login with Python and Beautiful Soup. I want to scrape this page (when you click it it will redirect you to login page). : https://www.eurekalert.org/reporter/embargoed.php
This is login page: https://www.eurekalert.org/login.php
On the first link that I provided, there are a lot of news articles that has links like this: https://www.eurekalert.org/emb_releases/2021-01/embl-ebn011121.php
So every 'href' has '/emb_releases/2021-01/embl-ebn011121.php'
The problem is that I can not get HTML of the page (first link) where I can extract hrefs. Wanted hrefs has this css tag 'article.post a'. This is my code:
from bs4 import BeautifulSoup
import requests
url = 'https://www.eurekalert.org/'
login = 'login'
headers = {'origin': url,
'referer': url+login}
s = requests.session()
login_payload = {'login': 'xxx',
'password': 'xxx'}
# Each YT tutorial says that it should be .post here, but on my website the request is get, not post. I have tried both ways, its the same result
login_req = s.post(url+login, headers=headers, data = login_payload)
print(login_req) # returns 200, if i try .get it also returns 200
login_response = s.get(url+'reporter/embargoed.php')
print(login_response) # returns 200
soup = BeautifulSoup(login_response.content, 'html.parser')
print(soup) # prints HTML but not the HTML that I want
I have also tried this, but I get the same result:
login_response = requests.get(url+'reporter/embargoed.php', auth = ('username', 'password'))
soup = BeautifulSoup(login_response.content, 'html.parser')
print(soup) # prints HTML but not the HTML that I want
This is the first time Im trying to scrape website that requires login, so there are probably some stupid stuff on my code. What am I doing bad? I googled a lot, and I tried a lot of different stuf but I always failed.
Thanks for helping me out.