I am trying to create a web scraper that scrapes magnet links and torrent files from the 1337x torrent site and outputs them to me. I have run into a strange issue: my Python Selenium script raises a TimeoutException. I looked for a solution myself and could only find this post:
In that post, the person asking the question is using the Internet Explorer webdriver. Supposedly the error is caused by IE's protected mode, which to my understanding means that IE was not able to spawn a new tab unless that was allowed in its settings.
My situation differs in two ways. First, I am using Mozilla's GeckoDriver for Firefox, not IE. Second, the error is raised even though the logic of my code should not allow it. If you look at my code, you can see that I use an if statement to check whether an element exists on the page, so that the element is only used if it is present. I do this because, as I found out, some torrents don't have the magnet link button and only provide the torrent file download button. I prefer scraping magnet links, but since there doesn't seem to be a way of obtaining them when they are not provided by the user/1337x, I am trying to at least download the torrent file as a replacement for its missing magnet link.
To my understanding, Firefox could have some kind of "protected mode" too, one that doesn't let me spawn new tabs unless I allow it somehow (new tabs are a must, since the torrent file button opens a new tab and automatically starts the download). I have found no information whatsoever about how to allow this.
Even stranger, the error should not occur on the first page at all. The scraper successfully reads the torrent page links from the CSV file and opens the first page, which does contain the magnet link. The if statement should therefore take the magnet link branch, and the torrent file should not be downloaded at all when the magnet link button exists.
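One possibility worth ruling out is that the `if` never actually inspects the page. The sketch below is only an illustration under assumptions: `PresenceCondition` is a hypothetical stand-in, not Selenium's own code, and it returns `False` for a missing element as a simplification. It shows that an object which merely describes a check is truthy on its own, regardless of what the check would find once it is called with a driver.

```python
class PresenceCondition:
    """Hypothetical stand-in for an expected-condition object.

    It only describes a check; nothing is looked up until the object
    is called with a driver.
    """

    def __init__(self, locator):
        self.locator = locator

    def __call__(self, driver):
        # Simplification: return the element, or False when it is missing.
        return driver.get(self.locator, False)


# A "page" modelled as a plain dict that contains no magnet link at all.
page_without_magnet = {}

condition = PresenceCondition("a.magnet")

# The condition object itself is truthy, so `if condition:` always passes.
always_true = bool(condition)

# Calling it against the page is what reports the missing element.
really_present = bool(condition(page_without_magnet))

print(always_true, really_present)  # True False
```

If the same thing applies to `ec.presence_of_element_located(...)` used directly in an `if`, the first branch would run on every page, and `wait.until` would raise a TimeoutException whenever its selector does not match.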
Please keep in mind that the extremely long class names I use do match the button on every single page. I am not sure whether the class names are randomly filled in from some type of user ID or something like that, but for the specific user I am trying to scrape the links from, they work.
I would be extremely grateful if someone could explain why the error happens in the first place, since the if statement is supposed to rule it out whenever the magnet link button exists. Thanks for any help!
My code:
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from time import sleep
from platform import system
from os import getcwd, getlogin
from os.path import expandvars
import csv

cwd = getcwd()
os = system()
user = getlogin()
# Expand the Windows environment variable; the literal string "%appdata%" is not a usable path.
appdata = expandvars("%appdata%")

if os == "Linux":
    driver = webdriver.Firefox(executable_path=cwd + "/geckodriver",
                               firefox_profile="/home/" + user + "/.mozilla/firefox/fejcr1nv.default-esr")
elif os == "Windows":
    driver = webdriver.Firefox(executable_path=cwd + "/geckodriver",
                               firefox_profile=appdata + "\\Mozilla\\Firefox\\Profiles\\g4yuhhlu.default-esr-1")
#elif os == "darwin":

page_number = 1
wait = WebDriverWait(driver, 10)
with open('links.csv', 'w+', newline='') as write:
    writer = csv.writer(write)
    while True:
        try:
            pageNumberAsString = str(page_number)
            page = "https://1337x.to/johncena141-torrents/" + pageNumberAsString + "/"
            driver.get(page)
            links = wait.until(
                ec.presence_of_all_elements_located((By.CSS_SELECTOR, "td.coll-1.name a:nth-of-type(2)")))
            # print(len(links))
            for link in links:
                print(link.get_attribute("href"))
                writer.writerow([link.get_attribute("href")])
            page_number = int(pageNumberAsString)
            page_number = page_number + 1
            sleep(0.5)
        except Exception as e:
            print(e)
            break
with open('links.csv') as read:
    reader = csv.reader(read)
    link_list = list(reader)
with open('magnet-links.csv', 'w+', newline='') as write:
    writer = csv.writer(write)
    for link in link_list:
        driver.get(', '.join(link))
        sleep(5)
        if (ec.presence_of_element_located((By.CSS_SELECTOR, "a.l1694fb2a57e7ac72bd13da7456603ae4a61ac37b.la799628e8f1f6125eb70d43d0c8a6fff015989d9.l935c35cd7973f0c5361f748bd20aa158d8721077"))):
            obtain_reference = wait.until(
                ec.presence_of_element_located((By.CSS_SELECTOR, "a.l1694fb2a57e7ac72bd13da7456603ae4a61ac37b.la799628e8f1f6125eb70d43d0c8a6fff015989d9.l935c35cd7973f0c5361f748bd20aa158d8721077"))
            )
            print(obtain_reference.get_attribute("href"))
            writer.writerow([obtain_reference.get_attribute("href")])
        else:
            wait.until(
                ec.presence_of_element_located((By.CSS_SELECTOR, "a.laa16cee1ce9e1fc2ab0aa0a7adebe1fc61ed7f1c.l815a274f811bafb3888948ae7a70bb0be62c4ff6.l8c8be498c56444ad36384df5fc724e8b10e215b0"))
            )
            obtain_reference = driver.find_element_by_css_selector("a.laa16cee1ce9e1fc2ab0aa0a7adebe1fc61ed7f1c.l815a274f811bafb3888948ae7a70bb0be62c4ff6.l8c8be498c56444ad36384df5fc724e8b10e215b0").click()
            sleep(1)
driver.quit()
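For comparison, here is a minimal sketch of the "use the magnet link if it exists, otherwise fall back to the torrent file" check that the script above is aiming for. It is not the code I am running and rests on assumptions. It relies on `find_elements`, which returns an empty list when nothing matches. The helper `first_href`, the `FakeDriver`/`FakeElement` stand-ins, and both selectors are hypothetical, and I have not verified that these selectors match 1337x's markup. It also reads the link's `href` where my script clicks the button, which is a different technique.

```python
CSS = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def first_href(driver, selectors):
    """Return the href of the first element matched by any selector, else None."""
    for selector in selectors:
        # find_elements returns an empty list when nothing matches.
        matches = driver.find_elements(CSS, selector)
        if matches:
            return matches[0].get_attribute("href")
    return None


# --- Hypothetical stand-ins so the sketch runs without a browser ---
class FakeElement:
    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


class FakeDriver:
    def __init__(self, elements_by_selector):
        self._elements = elements_by_selector

    def find_elements(self, by, value):
        return self._elements.get(value, [])


# Assumption: magnet links are anchors whose href starts with "magnet:".
MAGNET = "a[href^='magnet:']"
# Assumption: torrent-file links are anchors whose href ends with ".torrent".
TORRENT = "a[href$='.torrent']"

with_magnet = FakeDriver({MAGNET: [FakeElement("magnet:?xt=urn:btih:example")]})
torrent_only = FakeDriver({TORRENT: [FakeElement("https://example.invalid/file.torrent")]})
empty_page = FakeDriver({})

print(first_href(with_magnet, [MAGNET, TORRENT]))   # magnet:?xt=urn:btih:example
print(first_href(torrent_only, [MAGNET, TORRENT]))  # https://example.invalid/file.torrent
print(first_href(empty_page, [MAGNET, TORRENT]))    # None
```

With a real driver the call would be `first_href(driver, [MAGNET, TORRENT])` after `driver.get(...)`. Matching on the `href` attribute would also avoid depending on the long generated class names.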