Labrador::Dispatcher::Processing
    use Labrador::Dispatcher::Processing;
    Labrador::Dispatcher::Processing::init($config);
    Labrador::Dispatcher::Processing::get_urls('crawler1', 5);
Main interface between commands and data processing. If running a crawler were a business, this module would contain all the business logic.
Imports and loads the URL Allocator, the Crawler Allocator, and all the URLFilters.
Register a crawler running on hostname $crawler to be allocated URLs.
Note that the crawler named $crawler has disconnected.
Returns the size of the master queue.
Returns the size of the crawler queues.
The URL $url has been found on the page $linking_url. Add it to the queue if it passes the filters.
Find $count URLs for a subcrawler on $crawler to process.
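A minimal sketch of how such an allocation might work, assuming one master queue and one queue per crawler host. The names @master_queue, %crawler_queue and allocate_urls are hypothetical and are not taken from the module; its real internals may differ.

```perl
use strict;
use warnings;

# Hypothetical data structures, for illustration only.
my @master_queue  = map { "http://example.org/page$_" } 1 .. 8;
my %crawler_queue = (crawler1 => []);

# Hand out up to $count URLs to $crawler, topping up from the master queue.
sub allocate_urls {
    my ($crawler, $count) = @_;
    my $queue = $crawler_queue{$crawler}
        or die "unknown crawler: $crawler\n";

    # Refill the per-crawler queue from the master queue if it runs short.
    while (@$queue < $count && @master_queue) {
        push @$queue, shift @master_queue;
    }

    # Never ask splice for more elements than the queue holds.
    my $n = @$queue < $count ? scalar @$queue : $count;
    return splice(@$queue, 0, $n);
}

my @urls = allocate_urls('crawler1', 5);
print scalar(@urls), " URLs allocated, ",
      scalar(@master_queue), " left in the master queue\n";
```

Keeping a separate queue per crawler would let the dispatcher report the master queue size and the crawler queue sizes independently.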
Note that the crawler has finished crawling $url.
Note that the crawler has failed to crawl the URL $url.
What is the URL that linked to $url?
Obtain a hash of all the statistics.
Add the stats %stats submitted by $clientname to the running totals.
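The accumulation of statistics could look roughly like the sketch below. The hashes %totals and %per_client, the function name add_stats, and the stat keys are assumptions made for illustration.

```perl
use strict;
use warnings;

my %totals;      # running totals across all clients (hypothetical layout)
my %per_client;  # running totals broken down by client name

# Add each submitted counter to the global and the per-client totals.
sub add_stats {
    my ($clientname, %stats) = @_;
    while (my ($key, $value) = each %stats) {
        $totals{$key}                  += $value;
        $per_client{$clientname}{$key} += $value;
    }
    return;
}

add_stats('crawler1', pages => 10, bytes => 2048);
add_stats('crawler2', pages => 4,  bytes => 512);

printf "%s=%d\n", $_, $totals{$_} for sort keys %totals;
```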
Add the fingerprint denoted by $md5 to a hash of seen fingerprints.
Returns true if the page denoted by $md5 has been seen before.
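A small sketch of fingerprint tracking, assuming $md5 is the hex MD5 digest of the page content (the source does not say how the fingerprint is computed). The function names add_fingerprint and seen_fingerprint are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my %fingerprints;   # hex digest => 1 for every page seen so far

sub add_fingerprint {
    my ($md5) = @_;
    $fingerprints{$md5} = 1;
    return;
}

sub seen_fingerprint {
    my ($md5) = @_;
    return exists $fingerprints{$md5} ? 1 : 0;
}

my $page = "<html><body>Hello</body></html>";
my $md5  = md5_hex($page);

print "first visit:  ", (seen_fingerprint($md5) ? "seen" : "new"), "\n";
add_fingerprint($md5);
print "second visit: ", (seen_fingerprint($md5) ? "seen" : "new"), "\n";
```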
Run $url through all registered URL filters.
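One way to picture a filter chain is as a list of code references, each returning true to accept a URL. This is an assumption about the filter interface, not the module's documented API, and the three filters below are invented examples.

```perl
use strict;
use warnings;

# Each hypothetical filter takes a URL and returns true to accept it.
my @filters = (
    sub { $_[0] =~ m{^https?://}i },              # HTTP(S) only
    sub { $_[0] !~ m{\.(?:jpe?g|png|gif)$}i },    # skip common image types
    sub { length($_[0]) <= 255 },                 # skip very long URLs
);

# A URL passes only if every filter accepts it; stop at the first rejection.
sub filter_url {
    my ($url) = @_;
    for my $filter (@filters) {
        return 0 unless $filter->($url);
    }
    return 1;
}

for my $url ('http://example.org/index.html',
             'ftp://example.org/file',
             'http://example.org/logo.PNG') {
    printf "%-32s %s\n", $url, filter_url($url) ? 'accepted' : 'rejected';
}
```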
Save all checkpoint supporting data structures to disk.
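The source does not say how checkpoints are serialized. As a sketch only, the core Storable module could persist such structures; the %state layout and the file name below are assumptions.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Hypothetical state worth checkpointing.
my %state = (
    master_queue => ['http://example.org/a', 'http://example.org/b'],
    seen         => { 'd41d8cd98f00b204e9800998ecf8427e' => 1 },
);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/checkpoint.sto";

# Write under a temporary name, then rename, so an interrupted write
# does not replace a previously good checkpoint with a truncated one.
nstore(\%state, "$file.tmp")  or die "cannot store checkpoint: $!";
rename("$file.tmp", $file)    or die "cannot rename checkpoint: $!";

# Restoring after a restart.
my $restored = retrieve($file);
print scalar(@{ $restored->{master_queue} }), " queued URLs restored\n";
```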
Pause allocating URLs to clients. Also forces a checkpoint.
Resume allocating URLs after a pause.
$return can be used to give data back to the calling client. $@ is also returned.
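A sketch of how a value and $@ might be handed back together, assuming commands are run inside an eval block. The command table and run_command are hypothetical; the dispatcher's real command handling is not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical command table, for illustration only.
my %commands = (
    ECHO => sub { return join ' ', @_ },
    FAIL => sub { die "something went wrong\n" },
);

# Run a command and return both its result and any error from $@.
sub run_command {
    my ($name, @args) = @_;
    my $return = eval {
        my $handler = $commands{$name} or die "unknown command: $name\n";
        $handler->(@args);
    };
    my $error = $@;   # empty string when the eval succeeded
    return ($return, $error);
}

my ($return, $error) = run_command('ECHO', 'hello', 'world');
print "return: $return\n" if defined $return;

($return, $error) = run_command('FAIL');
print "error: $error" if $error;
```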