for loop - For statement not working in Selenium scraper

Question

Asked Oct 7, 2021 in Technique by 深蓝

The for statement doesn't work in my article scraper built with Selenium. The goal is to scrape all article-related fields (title, date, office, sort, article) for every article that appears on the page opened from the URL in the code.

However, only the first article is scraped. I suspect a problem with the pandas DataFrame, but I'm not sure.

import time
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36")
chrome_options.add_argument("lang=ko_KR")

wd = webdriver.Chrome(executable_path='c:/chromedriver.exe', options=chrome_options)
wd.implicitly_wait(10)

news_df = pd.DataFrame(columns=('Title', 'Date', 'Office', 'Sort', 'Article'))
idx = 0
news_url = 'https://newslibrary.naver.com/search/searchByKeyword.nhn#%7B%22mode%22%3A1%2C%22sort%22%3A0%2C%22trans%22%3A%221%22%2C%22pageSize%22%3A10%2C%22keyword%22%3A%22%EA%B1%B4%EC%84%A4%EC%82%B0%EC%97%85%22%2C%22status%22%3A%22success%22%2C%22startIndex%22%3A1%2C%22page%22%3A1%2C%22startDate%22%3A%221945-01-01%22%2C%22endDate%22%3A%221945-12-31%22%7D'
wd.get(news_url)

data = wd.find_elements_by_css_selector('#searchlist > ul > li:nth-child(1)')
try:
    for da in data:
        title = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/h3/a').get_attribute('title')
        date = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/ul/li[1]').text
        office = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/ul/li[2]').text
        sort = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/ul/li[4]').text
        article = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/div').text
        # Remove line breaks from the article body
        article = article.replace("\n", "")

        news_df.loc[idx] = [title, date, office, sort, article]
        idx += 1
        
except AttributeError:
    pass

wd.close()
print('Complete!')
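A likely cause, judging only from the code above: the CSS selector '#searchlist > ul > li:nth-child(1)' matches just the first result, so the loop body runs once. In addition, every XPath inside the loop starts with // and hard-codes li[1], so even with a loop over all items each call would search from the document root and return the first article again. The sketch below illustrates the corrected pattern (select every li, then read each field with a path relative to that item) on a small, hypothetical piece of markup shaped like the structure the question's XPaths imply. It is not the real Naver page, and it uses Python's standard-library xml.etree.ElementTree instead of a browser so it can run offline. The helper name scrape_items and the sample data are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, shaped like the structure the question's XPaths
# imply (#searchlist > ul > li > div[2] > h3/a, ul/li[n], div).
# It is NOT the real Naver News Library page; it is well-formed XML so that
# the standard-library parser can read it without a browser.
SAMPLE_HTML = """
<div id="searchlist">
  <ul>
    <li>
      <div>thumbnail</div>
      <div>
        <h3><a title="First headline">First headline</a></h3>
        <ul><li>1945-01-05</li><li>Office A</li><li>p. 1</li><li>Politics</li></ul>
        <div>First article body</div>
      </div>
    </li>
    <li>
      <div>thumbnail</div>
      <div>
        <h3><a title="Second headline">Second headline</a></h3>
        <ul><li>1945-02-10</li><li>Office B</li><li>p. 2</li><li>Economy</li></ul>
        <div>Second article body</div>
      </div>
    </li>
  </ul>
</div>
"""


def scrape_items(searchlist):
    """Return one dict per result item (scrape_items is a hypothetical helper)."""
    rows = []
    # Iterate over EVERY result item, not just li:nth-child(1).
    for item in searchlist.findall("./ul/li"):
        # All paths below start with "./", i.e. they are relative to this item;
        # there is no hard-coded li[1] and no search from the document root.
        body = item.find("./div[2]")
        rows.append({
            "Title": body.find("./h3/a").get("title"),
            "Date": body.find("./ul/li[1]").text,
            "Office": body.find("./ul/li[2]").text,
            "Sort": body.find("./ul/li[4]").text,
            # Same clean-up as the question: drop line breaks from the body text.
            "Article": body.find("./div").text.replace("\n", ""),
        })
    return rows


rows = scrape_items(ET.fromstring(SAMPLE_HTML.strip()))
for row in rows:
    print(row)
```

In Selenium terms the same idea would be wd.find_elements(By.CSS_SELECTOR, '#searchlist > ul > li') (the Selenium 4 spelling of the find_elements_by_css_selector call used in the question) and, inside the loop, calls such as item.find_element(By.XPATH, './div[2]/h3/a'), where the leading ./ keeps the search inside the current item. The DataFrame is unlikely to be the problem, since news_df.loc[idx] = [...] does append one row per iteration; collecting dicts as above and building the frame once with pd.DataFrame(rows) simply removes the need for the manual idx counter.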

Question from: https://stackoverflow.com/questions/65898271/for-statement-not-working-in-seleniums-scraper
