Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I'm trying to use NodeJS to scrape a website that requires logging in with a POST request. Once I'm logged in, I can access a separate page with a GET request.

The first problem right now is logging in. I've tried using request to POST the login information, but the response I get back doesn't look like I'm logged in.

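var request = require('request');

// requesturl and lform (the login form fields) are defined elsewhere
exports.getstats = function (req, res) {
    request.post({url: requesturl, form: lform}, function (err, response, body) {
        if (err) {
            res.writeHead(500, {"Content-Type": "text/plain"});
            return res.end(err.message);
        }
        // Forward whatever page the login POST returned
        res.writeHead(200, {"Content-Type": "text/html"});
        res.end(body);
    });
};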

Here I'm just forwarding the page I get back, but it still shows the login form, and if I then try to access another page, the site says I'm not logged in.

I think I need to maintain the client-side session and cookie data, but I haven't found any resources that explain how to do that.


As a follow-up: I ended up using zombiejs to get the functionality I needed.



1 Answer

You need to make a cookie jar and use the same jar for all related requests.

 var request = require('request');
 var cookieJar = request.jar(); // reuse this jar for every request in the session
 request.post({url : requesturl, jar: cookieJar, form: lform}, ...
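To make the cookie jar less of a black box, here is a minimal sketch of what one does: it remembers the `name=value` pairs from a response's `Set-Cookie` headers and turns them into the `Cookie` header for the next request. `SimpleCookieJar` is a hypothetical class written for illustration, not part of `request`; it assumes cookies are keyed by name only and ignores attributes such as `Domain`, `Path` and `Expires`, which a real jar like the one returned by `request.jar()` also handles.

```javascript
// Hypothetical, deliberately minimal cookie jar for illustration only.
// Assumption: cookies are keyed by name alone; Domain, Path, Expires,
// Secure and HttpOnly attributes are ignored.
class SimpleCookieJar {
  constructor() {
    this.cookies = new Map(); // cookie name -> value
  }

  // Store cookies from a response's Set-Cookie headers
  // (Node's http module exposes these as an array of strings).
  storeFrom(setCookieHeaders) {
    for (const header of setCookieHeaders || []) {
      // Keep only "name=value"; drop attributes after the first ';'
      const pair = header.split(';')[0];
      const eq = pair.indexOf('=');
      if (eq <= 0) continue; // skip malformed entries
      const name = pair.slice(0, eq).trim();
      const value = pair.slice(eq + 1).trim();
      this.cookies.set(name, value); // a newer value replaces an older one
    }
  }

  // Build the Cookie request header to send on the next request.
  header() {
    return Array.from(this.cookies, ([name, value]) => `${name}=${value}`).join('; ');
  }
}

// Usage: simulate a login response that sets a session cookie,
// then build the header a follow-up GET would carry.
const jar = new SimpleCookieJar();
jar.storeFrom(['sessionid=abc123; Path=/; HttpOnly', 'theme=dark']);
console.log(jar.header()); // sessionid=abc123; theme=dark
```

The key point is that the jar lives outside any single request: the login response fills it, and every later request reads from it.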

In principle, that lets you scrape pages with GET as a logged-in user, but only once the login itself works. Judging by your description of the response to your login POST, the login may not be succeeding yet, so the cookie jar won't help until you fix your login code.
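Once the login POST works, the whole flow from the question (POST to log in, then GET a protected page with the same cookies) can be sketched roughly as follows. This version swaps in Node's built-in `http` module instead of `request`, with a plain `Map` standing in for the cookie jar. `send` and `loginThenGet` are hypothetical helper names, and HTTPS, redirects and cookie attributes are deliberately left out.

```javascript
const http = require('http');
const { URL, URLSearchParams } = require('url');

// Hypothetical helper: send one HTTP request, attaching cookies from `jar`
// (a Map of name -> value) and storing any cookies the server sets.
// Plain http only; no HTTPS, no redirect handling.
function send(method, urlString, jar, formFields) {
  return new Promise((resolve, reject) => {
    const url = new URL(urlString);
    const body = formFields ? new URLSearchParams(formFields).toString() : '';
    const headers = {};
    if (jar.size > 0) {
      headers.Cookie = Array.from(jar, ([k, v]) => `${k}=${v}`).join('; ');
    }
    if (formFields) {
      headers['Content-Type'] = 'application/x-www-form-urlencoded';
      headers['Content-Length'] = Buffer.byteLength(body);
    }
    const req = http.request(url, { method, headers }, (res) => {
      // Remember every cookie the server sets (name=value only)
      for (const c of res.headers['set-cookie'] || []) {
        const pair = c.split(';')[0];
        const eq = pair.indexOf('=');
        if (eq > 0) jar.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
      }
      let data = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, body: data }));
    });
    req.on('error', reject);
    if (formFields) req.write(body);
    req.end();
  });
}

// Hypothetical helper: log in with POST, then GET a protected page
// using the same cookies. The URLs and form field names are placeholders.
async function loginThenGet(loginUrl, formFields, pageUrl) {
  const jar = new Map();
  const login = await send('POST', loginUrl, jar, formFields);
  const page = await send('GET', pageUrl, jar);
  return { login, page, jar };
}
```

Keeping the `Map` inside `loginThenGet` means each call starts a fresh session; to reuse one login across several GETs, create the `Map` once and pass it to `send` yourself.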


...