NAME

Labrador::Crawler::Document::HTML

SYNOPSIS

	use Labrador::Crawler::Document;
	my $doc = Labrador::Crawler::Document->new(\$data, $req);
	# $doc will be instantiated with the appropriate subclass if one exists

DESCRIPTION

This is the custom subclass of Labrador::Crawler::Document for HTML documents. It provides two ways to extract links from an HTML document: HTML::LinkExtor (part of the standard HTML::Parser distribution) or, if it is available, HTML::LinkExtractor. HTML::LinkExtractor is preferred because it also extracts the anchor text of each link.
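
The preference order described above could be decided at run time roughly as follows. This is only a sketch, not the module's actual code: the helper name pick_extractor is hypothetical, and the real class may make the choice differently (for example once, in init).

```perl
use strict;
use warnings;

# Hypothetical helper: returns the name of the extractor module to use,
# or undef if neither module can be loaded.
sub pick_extractor {
    # Prefer HTML::LinkExtractor, since it also captures anchor text.
    return 'HTML::LinkExtractor' if eval { require HTML::LinkExtractor; 1 };
    # Fall back to HTML::LinkExtor from the HTML::Parser distribution.
    return 'HTML::LinkExtor' if eval { require HTML::LinkExtor; 1 };
    return;
}

my $extractor = pick_extractor();
print defined $extractor
    ? "Using $extractor\n"
    : "No link extractor available\n";
```

Wrapping require in eval keeps a missing optional module from being fatal, which is what allows the fallback to work.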

METHODS

init

Initialise the class.

links

Extract a list of links from the page.

PRIVATE METHODS

_linkextor

Extract links from an HTML document using HTML::LinkExtor.
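
For orientation, a standalone use of HTML::LinkExtor might look like the sketch below. It illustrates the library, not the body of _linkextor; the sample HTML and the base URL http://example.com/ are assumptions made for the example.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

my $html = '<a href="/about.html">About</a> <img src="logo.png">';
my $base = 'http://example.com/';    # assumed base URL for relative links

# No callback is given, so links accumulate inside the parser.
my $parser = HTML::LinkExtor->new(undef, $base);
$parser->parse($html);
$parser->eof;

my @urls;
for my $link ($parser->links) {
    # Each entry is [ $tag, $attr => $url, ... ].
    my ($tag, %attrs) = @$link;
    # Keep only anchor targets; stringify the URI object.
    push @urls, "$attrs{href}" if $tag eq 'a' && defined $attrs{href};
}
print "$_\n" for @urls;
```

HTML::LinkExtor reports only tags and link attributes, so the anchor text is not available on this path.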

_linkextractor

Extract links from an HTML document using HTML::LinkExtractor.
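
A comparable sketch for HTML::LinkExtractor is shown below. Again this is an illustration under assumptions rather than the body of _linkextractor: the sample HTML and base URL are invented, and the module is loaded at run time because it is optional.

```perl
use strict;
use warnings;

my $html = '<p>See <a href="/docs/">the <b>docs</b></a>.</p>';
my @found;

# HTML::LinkExtractor is optional, so load it at run time and skip if absent.
if (eval { require HTML::LinkExtractor; 1 }) {
    # Arguments: callback (none), base URL (assumed), strip markup from _TEXT.
    my $lx = HTML::LinkExtractor->new(undef, 'http://example.com/', 1);
    $lx->parse(\$html);    # a scalar reference is parsed as HTML text

    for my $link (@{ $lx->links }) {
        next unless $link->{tag} eq 'a';
        # Each link is a hash reference; _TEXT holds the anchor text.
        my $text = defined $link->{_TEXT} ? $link->{_TEXT} : '';
        push @found, [ "$link->{href}", $text ];
    }
}
printf "%s (%s)\n", @$_ for @found;
```

The _TEXT entry is what makes this extractor the preferred one, since a crawler can store the anchor text alongside each URL.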

REVISION

	$Revision: 1.4 $