    my $doc = Labrador::Crawler::Document->new($HTTPresponse);
    my @links = $doc->links;
    my $content_type = $doc->content_type;
    # content returns a reference to the data, so dereference it before printing
    print ${ $doc->content };
This module is a wrapper around HTTP::Response, so that generic methods can be called on a Document object and the correct child class provides the required data. Child classes will be created for common types of document, including HTML, PDF and PostScript.
Constructs a new Document object from the HTTP::Response object $response. If a specific child implementation exists for the given type of document (e.g. PDF or PostScript), it will be used.
Initialises the handler. This should be called from child handlers to ensure that any common functionality is initialised.
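The choice of child implementation described above could be sketched as a lookup from MIME type to handler class. This is only an illustration: the class names under My::Document and the handler_class function are hypothetical, as this documentation does not name the real child classes or show how the constructor selects them.

```perl
use strict;
use warnings;

# Hypothetical mapping from MIME type to handler class. These names are
# assumptions for illustration, not the module's real child classes.
my %handler_for = (
    'text/html'              => 'My::Document::HTML',
    'application/pdf'        => 'My::Document::PDF',
    'application/postscript' => 'My::Document::PostScript',
);

# Pick a handler class for a Content-Type header value, falling back to
# a generic class when no specific implementation is registered.
sub handler_class {
    my ($content_type) = @_;
    # Strip parameters such as "; charset=UTF-8" and normalise case.
    my ($mime) = lc($content_type // '') =~ m{^\s*([^;\s]+)};
    return 'My::Document::Generic' unless defined $mime;
    return $handler_for{$mime} // 'My::Document::Generic';
}

print handler_class('text/html; charset=UTF-8'), "\n";
print handler_class('image/png'), "\n";
```

A table-driven lookup like this keeps the constructor unchanged when a new document type is added; only the mapping grows.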
Returns the original content type of the response.
Passes the call through to HTTP::Headers->header.
Returns the original HTTP::Response object returned by the request. NB: the content() method of HTTP::Response should not be used, as the data may have been compressed.
Returns the URI object used during this request.
Returns an array of links found in this document.
Returns the downloaded content for this Document. Note this content may have been transformed or converted by the document handler. NB: To limit stack size, this returns a reference to the data.
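Because the content comes back as a reference, callers need to dereference it before use. The following minimal sketch uses a stand-in class, Sketch::Document, which is hypothetical and only mimics the reference-returning behaviour described above; it is not the real Labrador::Crawler::Document.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics a content method returning a
# reference to the data rather than a copy of it.
package Sketch::Document;

sub new {
    my ($class, $body) = @_;
    return bless { content => \$body }, $class;
}

sub content { return $_[0]{content} }

package main;

my $doc = Sketch::Document->new("Hello, crawler\n");
my $ref = $doc->content;

# Dereference with ${ ... } (or $$ref) to reach the actual data.
print ${$ref};
print "Length: ", length($$ref), "\n";
```

Printing `$doc->content` directly would show something like `SCALAR(0x...)` instead of the document body, which is the usual sign of a missing dereference.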
Returns a Base64-encoded MD5 digest of this document's content. This can be used to ignore identical content, for example when the same document is served by more than one host.
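A digest of this kind could be used for duplicate detection roughly as follows. This sketch uses md5_base64 from the core Digest::MD5 module; the is_duplicate helper, the %seen table and the example URLs are hypothetical, and the content is assumed to be a byte string (md5_base64 croaks on wide characters).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Digest => first URL seen with that content (hypothetical bookkeeping).
my %seen;

# Return true if content with the same digest has been seen before.
# Takes a reference to the content, matching the reference-returning
# convention for document content.
sub is_duplicate {
    my ($url, $content_ref) = @_;
    my $digest = md5_base64($$content_ref);
    return 1 if exists $seen{$digest};
    $seen{$digest} = $url;
    return 0;
}

my $page = "<html><body>Same page</body></html>";
for my $url ('http://www.example.com/', 'http://mirror.example.com/') {
    printf "%s: %s\n", $url, is_duplicate($url, \$page) ? 'duplicate' : 'new';
}
```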
Load the module named $name.
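Loading a module whose name is only known at run time is commonly done by converting the package name to a file path and passing it to require. The load_module function below is a hypothetical sketch of that technique, not necessarily how this module implements it.

```perl
use strict;
use warnings;

# Load a module by package name at run time. Dies if the name is not a
# plain package name or if the module cannot be found or compiled.
sub load_module {
    my ($name) = @_;
    die "Invalid module name: $name\n"
        unless $name =~ /\A\w+(?:::\w+)*\z/;

    # Convert Foo::Bar to Foo/Bar.pm, the form require expects for a
    # run-time string.
    (my $file = "$name.pm") =~ s{::}{/}g;
    require $file;
    return $name;
}

load_module('Digest::MD5');
print Digest::MD5::md5_hex('abc'), "\n";
```

Validating the name first avoids passing arbitrary strings to require, and wrapping the call in eval { ... } lets a caller fall back to a generic handler when a specific one is not installed.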
$Revision: 1.5 $