I am Scraping some pages and am trying to use the LinkExtractor to get the URLs from the response. In general that is going quite ok, but the LinkExtractor is not able to extract the relative link to a pdf file that is found at line 111 of the html
I have tried a lot, but haven't been able to figure out how and when the linkextractor drops the relative links, and whether this is how the extractor is supposed to work
scrapy shell "https://www.limburg.nl/algemeen/zoeken/?mode=zoek&ajax=true&zoeken_sortering=Num&pager_page=4"
from scrapy.linkextractors import IGNORED_EXTENSIONS
skip_extensions = list(set(IGNORED_EXTENSIONS) - set("pdf")) + ["gz","txt","csv","cls","and","bib","xml","dat","dpr","cfg","bdsproj","dproj","local","tvsconfig","res","dsk"]
extractor = LinkExtractor(deny_extensions=skip_extensions)
extractor.extract_links(response)