NAME

Labrador::Crawler::Document::HTML

SYNOPSIS

	use Labrador::Crawler::Document;
	my $doc = Labrador::Crawler::Document->new(\$data, $req);
	# $doc will be instantiated with the appropriate subclass if one exists

DESCRIPTION

This is the custom subclass of Labrador::Crawler::Document for HTML documents. It provides two ways to extract links from an HTML document: HTML::LinkExtor (which is part of the standard HTML::Parser distribution) or, if it is available, HTML::LinkExtractor. HTML::LinkExtractor is preferred because it also extracts the anchor text of each link.
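
The preference described above can be illustrated with a minimal sketch. This is not the code used by this class: extract_links is a hypothetical helper, the example URLs are made up, and restricting the output to <a> tags is a simplification made for the example. It assumes that HTML::LinkExtractor's constructor takes a callback, a base URL and a strip flag, and that each link record carries tag, href and _TEXT keys, as that module's documentation describes.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper (not part of Labrador): returns a list of
# [url, anchor_text] pairs. anchor_text is undef when the fallback
# extractor is used, because HTML::LinkExtor does not report it.
sub extract_links {
    my ($html_ref, $base) = @_;

    # Prefer HTML::LinkExtractor when it can be loaded at run time.
    if (eval { require HTML::LinkExtractor; 1 }) {
        # Arguments: no callback, base URL, strip markup from _TEXT.
        my $lx = HTML::LinkExtractor->new(undef, $base, 1);
        $lx->parse($html_ref);
        return map  { [ "$_->{href}", $_->{_TEXT} ] }
               grep { $_->{tag} eq 'a' && defined $_->{href} }
               @{ $lx->links };
    }

    # Fallback: HTML::LinkExtor from the HTML::Parser distribution.
    my @links;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # With a base URL, attribute values are absolute URI objects.
            push @links, [ "$attr{href}", undef ]
                if $tag eq 'a' && defined $attr{href};
        },
        $base,
    );
    $parser->parse($$html_ref);
    $parser->eof;
    return @links;
}

my $html = '<p><a href="/about.html">About us</a> '
         . '<a href="http://example.org/">Example</a></p>';

for my $link (extract_links(\$html, 'http://example.com/')) {
    my ($url, $text) = @$link;
    printf "%s\t%s\n", $url, defined $text ? $text : '(no anchor text)';
}
```

Loading HTML::LinkExtractor inside eval keeps it an optional dependency: when it is missing, the sketch still returns the URLs and only the anchor text is lost.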

METHODS

init: Initialise the class.
links: Extract a list of links from the page.

PRIVATE METHODS

_linkextor: Extract links from the HTML document using HTML::LinkExtor.
_linkextractor: Extract links and their anchor text using HTML::LinkExtractor.

REVISION

	$Revision: 1.4 $