Enhancing WWW::Mechanize

Today I wanted to extract a report from a search appliance. This task involved logging in to the appliance, receiving a cookie, and following a few links. No problem with Mechanize, right?

agent = WWW::Mechanize.new
login_form = agent.get(THUNDERSTONE_URL).forms.first
login_form.set_fields(:iname => LOGIN, :ipass => PASSWORD)
page = agent.submit(login_form)
page = agent.click page.links.text(PROFILE_NAME)
errors = agent.get_file "#{page.uri}/tsverrors.csv"

Only it turned out that Mechanize wasn't receiving the cookie. After a bit of Firebug inspection, I realized that the cookie wasn't being sent in the HTTP headers, but instead in the document itself as a meta http-equiv tag. Evidently Mechanize doesn't pick those up. We can use the nifty built-in Hpricot XPath parsing to extract the value of the cookie:

page.search("//meta[@http-equiv = 'Set-Cookie']").first.attributes['content']
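The content attribute of such a tag follows the Set-Cookie format, so it may carry attributes like path or expires after the name=value pair. Here is a minimal sketch, in plain Ruby with no Mechanize dependency, of trimming the value down to the pair that belongs in a Cookie request header. The helper name cookie_pair and the sample value are made up for illustration; the appliance's real cookie may look different.

```ruby
# Hypothetical helper: keep only the leading "name=value" pair of a
# Set-Cookie style string, dropping attributes such as path or expires.
def cookie_pair(set_cookie_value)
  set_cookie_value.to_s.split(';').first.to_s.strip
end

# Sample value invented for illustration, not taken from the appliance.
sample = 'session=abc123; path=/; expires=Fri, 09-May-2008 12:00:00 GMT'
puts cookie_pair(sample)
```

Passing the untrimmed content also worked against this appliance, so the trimming is a tidiness step rather than a requirement.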

But how do we tell Mechanize to use it? The cookie() method is read-only. Fortunately, set_headers() is a protected method that is available to subclasses. So, I decided to create a new class called Agent that adds a custom cookie to the headers.

class Agent < WWW::Mechanize
  attr_accessor :custom_cookie
 
  def set_headers(uri, request, cur_page)
    super(uri, request, cur_page)
    request.add_field('Cookie', custom_cookie) unless custom_cookie.nil?
    request
  end
end
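One thing to keep in mind with add_field: on a Net::HTTP request it appends a value instead of replacing the existing one. The sketch below uses a bare Net::HTTP::Get as a stand-in for the request that set_headers receives (that it is a Net::HTTP request is my assumption) and shows how two cookie values could be folded into a single header. The cookie names are placeholders.

```ruby
require 'net/http'

# Stand-in for the request object handed to set_headers; assumed here to be
# a Net::HTTP request.
request = Net::HTTP::Get.new('/')
request['Cookie'] = 'jar=1'            # pretend the cookie jar already set this
request.add_field('Cookie', 'meta=2')  # appends a second value, does not replace

p request.get_fields('Cookie')         # both values are kept separately

# Alternative: fold both values into one header, separated by "; ".
merged = request.get_fields('Cookie').join('; ')
request['Cookie'] = merged
p request['Cookie']
```

In my case the jar was empty, so only the custom cookie was sent and the distinction didn't matter.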

Then I tried again with the new Agent class:

agent = Agent.new
login_form = agent.get(THUNDERSTONE_URL).forms.first
login_form.set_fields(:iname => LOGIN, :ipass => PASSWORD)
page = agent.submit(login_form)
agent.custom_cookie =
  page.search("//meta[@http-equiv = 'Set-Cookie']").first.attributes['content']
page = agent.click page.links.text(PROFILE_NAME)
errors = agent.get_file "#{page.uri}/tsverrors.csv"

It worked like a charm. I'm curious, though, why Mechanize didn't grab the meta cookie automatically. Bug or feature?

1 comment so far ↓

#1 apotheon on 05.09.08 at 12:53 pm

Maybe it’s meant to be a security feature. Cross-site scripting exploits usually insert code into the body of a page — and perhaps WWW::Mechanize is trying to protect you from that sort of thing.

Maybe it’s a case of standard enforcement (right or wrong). I’m not sure whether there’s any standard that says cookies shouldn’t be delivered outside the header, and I’m not a huge fan of applications that try to enforce standards (apps should be standards compliant, not standards enforcing), but it’s a possibility.

I suspect the most likely option is simply that it didn’t occur to whoever wrote the offending routine to make it search the whole document for cookies. Assumptions are often more controlling than conscious decisions.
