NAME

Labrador::Crawler::Manager::Partitioned

DESCRIPTION

Provides the crawler with all queuing operations. This manager supports URL partitioning.

METHODS

init()

Called automatically by the constructor. Loads the URLState, URLAlloc, and Partitioning modules.

next_url

Returns the next URL to be fetched.

when_next_available

Returns the number of seconds until the next URL is ready to be fetched.
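The following sketch shows one way a partitioned queue can answer next_url and when_next_available. It assumes, purely for illustration, that URLs are partitioned by host and that each partition may be fetched at most once every $DELAY seconds; the helper names (host_of, enqueue) and the explicit $now argument are hypothetical and not part of this module's interface.

```perl
use strict;
use warnings;

# Hypothetical sketch: URLs are partitioned by host, and each partition
# may only be fetched from once every $DELAY seconds.
my $DELAY = 5;
my %queue;      # host => [ URLs waiting to be fetched ]
my %ready_at;   # host => epoch time when the host may be fetched again

# Simplistic host extraction (no userinfo handling; port is dropped).
sub host_of {
    my ($url) = @_;
    return $url =~ m{^[a-z][a-z0-9+.-]*://([^/?#:]+)}i ? lc $1 : '';
}

sub enqueue {
    my ($url) = @_;
    push @{ $queue{ host_of($url) } }, $url;
}

# Return a URL from any partition whose delay has expired, or undef.
sub next_url {
    my ($now) = @_;
    for my $host (sort keys %queue) {
        next unless @{ $queue{$host} };
        next if ($ready_at{$host} // 0) > $now;
        $ready_at{$host} = $now + $DELAY;   # start this host's politeness delay
        return shift @{ $queue{$host} };
    }
    return;
}

# Seconds until some non-empty partition becomes fetchable (0 = now);
# undef when every queue is empty.
sub when_next_available {
    my ($now) = @_;
    my $min;
    for my $host (keys %queue) {
        next unless @{ $queue{$host} };
        my $wait = ($ready_at{$host} // 0) - $now;
        $wait = 0 if $wait < 0;
        $min = $wait if !defined $min || $wait < $min;
    }
    return $min;
}
```

Passing the current time in, rather than calling time() inside, keeps the sketch deterministic and easy to test.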

finished_url($url)

Mark $url as finished.

failed_url($url, $HTTPresponse)

Mark $url as finished. $HTTPresponse is the HTTP::Response object, which can be used to examine the reason for the failure and requeue the URL appropriately.
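A failure handler might use the status code from the HTTP::Response object (available via its code method) to decide whether to requeue. The policy below, retrying on 408, 429, and 5xx responses up to a hypothetical $MAX_RETRIES limit, is an illustrative assumption, not this module's actual behavior; failure_action is a hypothetical helper.

```perl
use strict;
use warnings;

my $MAX_RETRIES = 3;   # hypothetical retry limit

# Hypothetical policy: requeue transient failures (timeouts, rate limiting,
# server errors) until the retry limit is reached; drop everything else.
# With an HTTP::Response object, $code would be $response->code.
sub failure_action {
    my ($code, $attempts) = @_;
    my $transient = $code == 408 || $code == 429 || $code >= 500;
    return ($transient && $attempts < $MAX_RETRIES) ? 'requeue' : 'drop';
}
```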

found_urls($url, @urls)

Enqueue @urls, all of which were found on the page $url.
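Before enqueueing, a manager typically discards links it has already seen. This is a minimal sketch assuming an in-memory %seen hash and an HTTP(S)-only filter; new_urls is a hypothetical helper, and the real module may store or filter URLs differently.

```perl
use strict;
use warnings;

my %seen;   # hypothetical: every URL ever enqueued or fetched

# Return the URLs from @urls that have not been seen before, ignoring
# non-HTTP(S) links such as mailto:. Input order is preserved.
sub new_urls {
    my ($page, @urls) = @_;
    $seen{$page} = 1;              # never re-enqueue the page itself
    my @fresh;
    for my $u (@urls) {
        next unless $u =~ m{^https?://}i;
        next if $seen{$u}++;       # post-increment: false only the first time
        push @fresh, $u;
    }
    return @fresh;
}
```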

queue_status

Returns a string describing the status of the queues. Useful for display when the crawler has no work to do.

get_stats

Returns a hash of statistics. The keys are master, delay, and partition_size.

PRIVATE METHODS

Not intended to be called from outside the class. Only documented for completeness.

_load_module($name)

Load the module named $name.
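Loading a module whose name is only known at run time is usually done by converting the package name to a file path and calling require. The sketch below assumes that is roughly what _load_module does; load_module and its name check are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: load a module whose name is held in a string.
sub load_module {
    my ($name) = @_;
    # Reject anything that is not a plain package name (e.g. "../x").
    die "Invalid module name '$name'\n"
        unless $name =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
    (my $file = "$name.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    require $file;      # dies if the module cannot be found or compiled
    return $name;
}
```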

stripped($uri)

Used for spider trap detection: returns the URL with the fragment and query string removed.
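A minimal sketch of stripped, plus a hypothetical trap heuristic built on it: if too many distinct URLs reduce to the same stripped form (for example, an endless calendar paginated through its query string), the crawler can treat further variants as a likely trap. The $LIMIT value and the looks_like_trap helper are assumptions for illustration.

```perl
use strict;
use warnings;

# Cut at the first "?" or "#": everything after a "?" is query (and possibly
# fragment), and everything after a "#" is fragment.
sub stripped {
    my ($url) = @_;
    (my $s = $url) =~ s/[?#].*\z//s;
    return $s;
}

# Hypothetical heuristic: flag a URL once more than $LIMIT URLs
# share its stripped form.
my $LIMIT = 100;
my %variants;   # stripped URL => number of variants seen

sub looks_like_trap {
    my ($url) = @_;
    return ++$variants{ stripped($url) } > $LIMIT;
}
```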

_update_queue()

Fetches any available URLs from the dispatcher.

_done_url($uri, $url)

Mark $url as finished. Contains the common code shared by the finished and failed URL events (finished_url and failed_url).

REVISION

	$Revision: 1.16 $