Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I need a command-line tool (or JavaScript/PHP, though I think the command line is the way to go) that renders a URL and returns the rendered content. The important part is that it must execute the JavaScript, not just process the CSS/HTML/images.

For example, a command like "renderengine http://www.google.es outputfile.html" would save the content of the page (HTML parsed and JavaScript executed) in outputfile.html.

I need this to capture the result of a fully JavaScript-driven website like Grooveshark. The site loads everything via JavaScript/AJAX, so crawlers find nothing but an empty basic HTML template (the content is loaded afterwards via AJAX/JavaScript).

Is there a browser engine for Linux with JavaScript support (for example, V8) that can output the result so it can be saved to a file?

1 Answer

  • Selenium: a very complete solution with bindings in many languages
  • Puppeteer: a headless Chrome API, usable from Node.js or as a command-line tool
  • HTTrack: a command-line tool
  • Apache Nutch & webmagic: open-source Java web crawlers
  • pholcus: a "distributed & high concurrency" web crawler written in Go
  • Xvfb: a display server implementing the X11 display server protocol without showing any screen output. I have used it successfully with Travis CI and Protractor, for example. Alternative: Xdummy
  • PhantomJS (first suggested by nvuono): can export the rendered page as non-HTML (pdf, png...). PhantomJS development is suspended until further notice. Closely related: SlimerJS, CasperJS
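
As a rough illustration of the "renderengine URL outputfile" idea from the question, here is a minimal sketch that shells out to headless Chrome/Chromium and uses its `--dump-dom` flag to print the DOM after scripts have run. The binary names in `CANDIDATE_BROWSERS` and the helper names (`find_browser`, `build_command`, `render`) are assumptions made for this example, not part of any of the tools listed above, and pages that keep loading content long after the initial load may still come out incomplete.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence

# Assumed binary names; which one is installed depends on the distribution.
CANDIDATE_BROWSERS = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_browser(candidates: Sequence[str] = CANDIDATE_BROWSERS) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_command(browser: str, url: str) -> List[str]:
    """Build the headless invocation that prints the rendered DOM to stdout."""
    return [browser, "--headless", "--dump-dom", url]


def render(url: str, output_path: str, timeout: float = 60.0) -> None:
    """Render `url` with JavaScript executed and save the HTML to `output_path`."""
    browser = find_browser()
    if browser is None:
        raise RuntimeError("no Chrome/Chromium binary found on PATH")
    result = subprocess.run(
        build_command(browser, url),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the browser exits non-zero
    )
    with open(output_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
```

Calling `render("http://www.google.es", "outputfile.html")` would then play the role of the hypothetical `renderengine` command. For sites that need clicks, logins, or explicit waits, a scriptable tool such as Selenium or Puppeteer is likely a better fit than a one-shot DOM dump.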

There are also many Python web scraping libraries.

