Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have an ASPX page at https://searchlight.cluen.com/E5/CandidateSearch.aspx with a form on it that I'd like to submit, then parse the response for information.

Using Python's urllib and urllib2, I created a POST request with the proper headers and user agent, but the resulting HTML response does not contain the expected table of results. Am I misunderstanding something, or am I missing an obvious detail?

    import urllib
    import urllib2

    headers = {
        'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
        'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
        'Content-Type': 'application/x-www-form-urlencoded'
    }
    # obtained these values from viewing the source of https://searchlight.cluen.com/E5/CandidateSearch.aspx
    viewstate = '/wEPDwULLTE3NTc4MzQwNDIPZBYCAg ... uJRWDs/6Ks1FECco='
    eventvalidation = '/wEWjQMC8pat6g4C77jgxg0CzoqI8wgC3uWinQQCwr/ ... oPKYVeb74='
    url = 'https://searchlight.cluen.com/E5/CandidateSearch.aspx'
    formData = (
        ('__VIEWSTATE', viewstate),
        ('__EVENTVALIDATION', eventvalidation),
        ('__EVENTTARGET',''),
        ('__EVENTARGUMENT',''),
        ('textcity',''),
        ('dropdownlistposition',''),
        ('dropdownlistdepartment',''),
        ('dropdownlistorderby',''),
        ('textsearch',''),
    )

    # change user agent
    from urllib import FancyURLopener
    class MyOpener(FancyURLopener):
        version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

    myopener = MyOpener()

    # encode form data in post-request format
    encodedFields = urllib.urlencode(formData)

    f = myopener.open(url, encodedFields)
    print f.info()

    try:
        fout = open('tmp.htm', 'w')
    except IOError:
        # stop here; otherwise fout is undefined on the lines below
        raise SystemExit('Could not open output file')

    fout.writelines(f.readlines())
    fout.close()
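
For comparison, here is a minimal Python 3 sketch (in Python 3, urllib2's functionality lives in urllib.request) that attaches headers to a POST request. In the code above, the headers dict is never passed to myopener.open(), and its keys are CGI environment-variable names (HTTP_USER_AGENT, HTTP_ACCEPT) rather than HTTP header names (User-Agent, Accept). The helper name build_post_request is hypothetical, the form fields are placeholders, and the request is only built here, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = 'https://searchlight.cluen.com/E5/CandidateSearch.aspx'

# Real HTTP header names, not CGI environment names such as HTTP_USER_AGENT.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) '
                  'Gecko/2009073022 Firefox/3.0.13',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Content-Type': 'application/x-www-form-urlencoded',
}


def build_post_request(url, form_fields, headers=HEADERS):
    """Build (but do not send) a POST request that carries the given headers."""
    body = urlencode(form_fields).encode('ascii')  # Request data must be bytes
    return Request(url, data=body, headers=headers, method='POST')


if __name__ == '__main__':
    req = build_post_request(URL, [('textsearch', ''), ('textcity', '')])
    print(req.get_method(), req.header_items())
    # Sending it would be: urllib.request.urlopen(req).read()
```

Note that Request normalizes header names with str.capitalize(), so the stored keys are 'User-agent' and 'Content-type'.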

There are several questions on this topic that were helpful (such as "how to submit query to .aspx page in python"), but I'm stuck on this and would appreciate additional help, if possible.

The resulting HTML page says I may need to log in, but the ASPX page displays in my browser without any login.

Here are the results from info():

    Connection: close
    Date: Tue, 07 Jun 2011 17:05:26 GMT
    Server: Microsoft-IIS/6.0
    X-Powered-By: ASP.NET
    X-AspNet-Version: 2.0.50727
    Cache-Control: private
    Content-Type: text/html; charset=utf-8
    Content-Length: 1944


1 Answer

ASP.NET uses a security feature that protects the ViewState against tampering by embedding a message authentication code (MAC) in it.

Most likely, the server is rejecting your request because it treats the ViewState as though it had been tampered with. I can't say this with absolute certainty, but ASP.NET has several security features built into the framework (such as ViewState MAC validation and event validation) that may be preventing a direct POST.
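
As a generic illustration of the idea (not ASP.NET's actual ViewState format, key handling, or algorithm), a keyed hash lets a server detect any change to data it round-trips through the client, because the client cannot compute a valid tag without the server's key. SECRET_KEY, sign and verify are hypothetical names for this sketch:

```python
import hashlib
import hmac

# Hypothetical server-side secret. ASP.NET uses its own keys (from its
# machineKey configuration) and its own serialization format.
SECRET_KEY = b'server-only-secret'
TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign(payload: bytes) -> bytes:
    """Append a keyed SHA-256 tag to the payload (generic, not ASP.NET's format)."""
    return payload + hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def verify(blob: bytes) -> bool:
    """Return True only if the trailing tag matches the payload in front of it."""
    payload, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


if __name__ == '__main__':
    blob = sign(b'page state')
    print(verify(blob))  # True
    # Change the payload but keep the old tag, as a client without the key must.
    print(verify(b'PAGE STATE' + blob[-TAG_SIZE:]))  # False
```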

If session state is involved at all, you will need to account for that too. To simulate what the browser is doing, perform the following steps:

  1. Request the page.
  2. Save the collection of cookies to a variable.
  3. Extract the ViewState (and EventValidation) values to variables.
  4. POST the appropriate form values, passing both the saved cookies and the extracted ViewState information along with the request.
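
The steps above can be sketched in Python 3 with only the standard library. This is a minimal sketch under assumptions: it echoes back every hidden input on the page, it assumes the form posts to the same URL it was loaded from, and the names HiddenFieldParser, extract_hidden_fields, build_postback, read_text and submit_form are hypothetical:

```python
import http.cookiejar
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != 'input':
            return
        attrs = dict(attrs)
        if (attrs.get('type') or '').lower() == 'hidden' and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def extract_hidden_fields(html):
    """Step 3: pull __VIEWSTATE, __EVENTVALIDATION, etc. out of the fetched page."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


def build_postback(hidden, form_values):
    """Step 4 body: the page's own hidden fields plus the visible form values."""
    fields = dict(hidden)
    fields.update(form_values)
    return urlencode(fields).encode('ascii')


def read_text(resp):
    """Decode a response body using its declared charset (UTF-8 fallback)."""
    charset = resp.headers.get_content_charset() or 'utf-8'
    return resp.read().decode(charset)


def submit_form(url, form_values):
    """Steps 1-4: GET the page, keep its cookies, then POST back to the same URL."""
    jar = http.cookiejar.CookieJar()  # step 2: stores session cookies between requests
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(url) as resp:  # step 1: a fresh GET, as a browser would do
        html = read_text(resp)
    body = build_postback(extract_hidden_fields(html), form_values)
    with opener.open(url, data=body) as resp:  # step 4: cookies are re-sent automatically
        return read_text(resp)
```

Usage would be along the lines of submit_form('https://searchlight.cluen.com/E5/CandidateSearch.aspx', {'textsearch': '', 'textcity': ''}). The field names are taken from the question and may not match the page's actual input names, and an ASP.NET postback may also need the search button's own name/value pair (or a matching __EVENTTARGET) before the server treats it as a search.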

It's a fair amount of work, I know, but not too difficult. Again, this may not be the only source of your problem, but it is worth reading up on as a starting point for troubleshooting.

