NAME

Labrador::Dispatcher::Processing

SYNOPSIS

	use Labrador::Dispatcher::Processing;
	Labrador::Dispatcher::Processing::init($config);
	Labrador::Dispatcher::Processing::get_urls('crawler1', 5);

DESCRIPTION

Main interface between commands and data processing. If running a crawler were a business, this module would contain all the business logic.

FUNCTIONS

init($config)

Imports and loads the URL Allocator, the Crawler Allocator, and all the URLFilters.

register_crawler($crawler)

Register a crawler running on hostname $crawler to be allocated URLs.

crawler_disconnect($crawler)

Note that crawler named $crawler has disconnected.

masterqueue_size()

Returns the size of the master queue.

crawlerqueue_size()

Returns the size of the crawler queues.

new_url($url, $linking_url)

The URL $url has been found on $linking_url. Add it to the queue if it passes the filters.

get_urls($crawler, $count)

Find $count URLs for a subcrawler on $crawler to process.

finished_url($url)

Note that the crawler has finished crawling $url.

failed_url($url)

Note that the crawler has failed to crawl $url.

linking_url($url)

Returns the URL that linked to $url.

link_url($url, $from)

get_stats()

Obtain a hash of all the statistics.

submit_stats($clientname, %stats)

Add the stats %stats submitted by $clientname to the running totals.
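
As an illustration of keeping running totals, the following is a minimal sketch, not the module's actual implementation. The %totals hash, the add_to_totals helper, and the stat names (pages, bytes, errors) are hypothetical and chosen only for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the dispatcher's running totals.
my %totals;

# Add one client's counters to the running totals.
# $clientname is accepted to mirror the documented signature,
# but this sketch does not track per-client figures.
sub add_to_totals {
    my ($clientname, %stats) = @_;
    $totals{$_} += $stats{$_} for keys %stats;
    return;
}

add_to_totals('crawler1', pages => 10, bytes  => 2048);
add_to_totals('crawler2', pages => 5,  errors => 1);

# Print the combined totals in a stable order.
print "$_ = $totals{$_}\n" for sort keys %totals;
```

Counters that only some clients report (errors above) simply appear in the totals the first time they are submitted.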

add_fingerprint($md5)

Add the fingerprint denoted by $md5 to a hash of seen fingerprints.

have_fingerprint($md5)

Returns true if the page denoted by $md5 has been seen before.
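
The fingerprint check can be pictured as a hash lookup keyed on a digest of the page content. The sketch below assumes the fingerprint is a hex MD5 digest computed with the core Digest::MD5 module; the %seen hash and the two helper names are hypothetical, and the module itself may store fingerprints differently.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical store of fingerprints seen so far.
my %seen;

# Record a fingerprint as seen.
sub remember_fingerprint {
    my ($md5) = @_;
    $seen{$md5} = 1;
    return;
}

# Return 1 if the fingerprint was recorded before, 0 otherwise.
sub seen_fingerprint {
    my ($md5) = @_;
    return exists $seen{$md5} ? 1 : 0;
}

my $page = '<html><body>Hello</body></html>';
my $md5  = md5_hex($page);

my $before = seen_fingerprint($md5);   # not yet recorded
remember_fingerprint($md5);
my $after  = seen_fingerprint($md5);   # recorded now

print "before: $before, after: $after\n";
```

A crawler would typically check the fingerprint first and skip further processing of a page whose content has already been seen.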

filter($url)

Run $url through all registered URL filters.
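
One way to picture a filter chain is a list of code references that each accept or reject a URL. This is a sketch under assumptions: the @filters list, the two example filters, and the rule that a URL passes only if every filter accepts it are illustrative, and the real URLFilter interface and return value of filter() may differ.

```perl
use strict;
use warnings;

# Hypothetical filters: each takes a URL and returns true to accept it.
my @filters = (
    sub { $_[0] =~ m{^https?://}i },              # only HTTP(S) URLs
    sub { $_[0] !~ m{\.(?:jpe?g|png|gif)$}i },    # skip common image files
);

# Return 1 if every filter accepts the URL, 0 as soon as one rejects it.
sub passes_filters {
    my ($url) = @_;
    for my $filter (@filters) {
        return 0 unless $filter->($url);
    }
    return 1;
}

for my $url ('http://example.com/index.html',
             'http://example.com/logo.png',
             'ftp://example.com/file.txt') {
    printf "%-32s %s\n", $url, passes_filters($url) ? 'accept' : 'reject';
}
```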

checkpoint

Save all checkpoint supporting data structures to disk.

pause

Pause allocating URLs to clients. Also forces a checkpoint.

start

Resume allocating URLs after a pause.

eval($code)

Allow an administrator connection to execute arbitrary code. The following variables are made available to the code being executed:

$crawleralloc
$urlalloc
$urlstates
$data
$md5
$stats
$referers

$return can be used to give data back to the calling client. $@ is also returned.
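
The $return and $@ convention can be sketched with Perl's string eval, which sees the lexical variables in scope at the point of the call. The run_admin_code helper and the contents of $stats below are hypothetical; this shows the general mechanism, not the module's own code.

```perl
use strict;
use warnings;

# Evaluate a string of code with $stats and $return in scope,
# then hand back both $return and any error left in $@.
sub run_admin_code {
    my ($code, $stats) = @_;
    my $return;              # the evaluated code may assign to this
    eval $code;              # string eval can see $stats and $return
    my $error = $@;          # empty string if the code ran cleanly
    return ($return, $error);
}

# Hypothetical stand-in for one of the exposed variables.
my $stats = { pages => 15 };

my ($result, $error) = run_admin_code('$return = $stats->{pages} * 2;', $stats);
print "result: $result\n" if $error eq '';

# Code that dies leaves $return undefined and the message in $@.
my ($none, $failure) = run_admin_code('die "boom\n";', $stats);
print "error: $failure";
```

Because the evaluated code runs with full access to the process, an interface like this is only appropriate for trusted administrator connections.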