Labrador::Crawler::Dispatcher_Client
    my $client = Labrador::Crawler::Dispatcher_Client->new('localhost', 2460);
    $client->connect;
    my $forkcount = $client->WORK;
    $client->disconnect;
This object provides an easy way for each subcrawler to talk to the dispatcher. It encapsulates all possible VERBs (questions).
For more information about how this class talks to the dispatcher, please refer to docs/protocol.txt
Methods are documented below. Most are named after the VERB they send. Further information can be found in docs/protocol.txt
Instantiates the object, but does NOT open a connection to the dispatcher. This is done separately using connect(). This is a two-state object (connection open, and connection closed).
Opens a (TCP) connection to the dispatcher. Returns 1 if successfully connected and protocol handshake succeeded, 0 otherwise.
Sends QUIT, and closes the (TCP) connection with the dispatcher.
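Because the object has two states, callers need to check the return value of connect() and make sure disconnect() is always reached. The following is only a sketch of one way to do that. Stub_Client and with_connection are hypothetical names invented for illustration, and the stub's return values are made up so the example runs without a dispatcher.

```perl
use strict;
use warnings;

# Hypothetical stub standing in for the real client. Only connect(),
# disconnect() and WORK come from this documentation; the values
# returned here are invented for the example.
package Stub_Client;
sub new        { my ($class, %o) = @_; return bless { up => $o{up}, open => 0 }, $class }
sub connect    { my $s = shift; $s->{open} = $s->{up} ? 1 : 0; return $s->{open} }
sub disconnect { my $s = shift; $s->{open} = 0; return }
sub WORK       { return 4 }

package main;

# Hypothetical helper: run $code with an open connection and always
# disconnect afterwards, even if $code dies.
sub with_connection {
    my ($client, $code) = @_;
    $client->connect or return;      # 0: connect or handshake failed
    my @result = eval { $code->($client) };
    my $err = $@;
    $client->disconnect;             # back to the "closed" state
    die $err if $err;
    return @result;
}

my $client = Stub_Client->new(up => 1);
my ($forks) = with_connection($client, sub { $_[0]->WORK });
print "fork count: $forks\n";
```

With the real class, the stub would be replaced by a Labrador::Crawler::Dispatcher_Client instance; the helper itself relies only on connect() returning 1 or 0.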
This object supports the following verbs, and they may be called as described.
Queries the dispatcher to see how many clients a host should fork.
Obtain the configuration file from the dispatcher. Returns an array of the configuration text file lines, or an empty array on failure.
Obtain the next $n URLs to process from the dispatcher.
Ask the dispatcher to mark $url as finished, and add @links to the master queue.
Returns the result of the dispatcher's URL filters for this URL. The result of filtering $url is cached locally. NB: this cache could grow extremely large over prolonged usage.
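The local caching described here can be pictured as a plain hash keyed by URL. This is a minimal sketch and not the module's actual code: ask_dispatcher, url_allowed and the filter rule are hypothetical stand-ins for the real round trip to the dispatcher.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the request to the dispatcher; the real
# client would send a verb over its connection instead.
my $dispatcher_calls = 0;
sub ask_dispatcher {
    my ($url) = @_;
    $dispatcher_calls++;
    return $url =~ m{^http://} ? 1 : 0;   # made-up filter rule
}

my %filter_cache;   # url => filter result; never pruned, so it only grows

# Hypothetical wrapper: consult the cache first, ask the dispatcher once.
sub url_allowed {
    my ($url) = @_;
    $filter_cache{$url} = ask_dispatcher($url)
        unless exists $filter_cache{$url};
    return $filter_cache{$url};
}

url_allowed('http://example.com/') for 1 .. 3;
url_allowed('ftp://example.com/');
print "dispatcher calls: $dispatcher_calls\n";                 # 2
print "cached entries: ", scalar(keys %filter_cache), "\n";    # 2
```

Every distinct URL adds one entry that is never removed, which is why memory use grows with the length of the crawl.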
Asks the dispatcher for the robots.txt file for the given hostname and port (joined by ':'). Returns ('#') for an empty file, or blank if the dispatcher has not cached the file.
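A caller has to tell the three outcomes apart before deciding whether to fetch robots.txt itself. The sketch below shows one way to do so. hostport_key and robots_state are hypothetical helpers, and it assumes that "blank" means either an empty list or a single empty or undefined element.

```perl
use strict;
use warnings;

# Hypothetical helper: build the "hostname:port" argument.
sub hostport_key {
    my ($host, $port) = @_;
    return join(':', $host, $port);
}

# Hypothetical helper: classify the list returned by the robots.txt query.
# Assumption: "blank" is an empty list or one empty/undefined element.
sub robots_state {
    my @lines = @_;
    return 'not_cached'
        if !@lines
        || (@lines == 1 && (!defined $lines[0] || $lines[0] eq ''));
    return 'empty' if @lines == 1 && $lines[0] eq '#';
    return 'cached';
}

print hostport_key('www.example.com', 80), "\n";
print robots_state(), "\n";
print robots_state('#'), "\n";
print robots_state('User-agent: *', 'Disallow: /private'), "\n";
```

Only the 'not_cached' case would require the subcrawler to download robots.txt and submit it to the dispatcher.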
Submit a robots.txt (@file) for server running on specified $hostnameport to the dispatcher, so that other crawlers can access it.
Submit the stats of this subcrawler to the dispatcher where they can be aggregated.
Inform the dispatcher that the retrieval of $url failed because of $reason.
Just checks a reply can be obtained from the dispatcher. Useful for checking connectivity with a client.
Obtain the stats hash from the dispatcher. Mainly used for monitoring the progress of a crawl.
These are documented for completeness, but should only be used internally.
Sends a command with verb $command to the dispatcher. $arg1 is appended to the verb (following a space). @args are appended as separate lines to the request. Returns ($status_code, @all_returned_lines).
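The way the request is assembled from $command, $arg1 and @args can be sketched as follows. build_request is a hypothetical function, 'VERB' is a placeholder verb, and the "\n" line terminator is an assumption; the actual terminator and any end-of-request marker are defined in docs/protocol.txt.

```perl
use strict;
use warnings;

# Hypothetical sketch of the request layout described above.
# Assumption: each line ends with "\n".
sub build_request {
    my ($command, $arg1, @args) = @_;
    my $first = $command;
    # $arg1 follows the verb on the same line, separated by one space.
    $first .= " $arg1" if defined $arg1 && length $arg1;
    # Each remaining argument becomes its own line.
    return join('', map { $_ . "\n" } $first, @args);
}

print build_request('WORK');
print build_request('VERB', 'arg1', 'line one', 'line two');
```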
Returns the last status code from the last command executed. Returns 0 if no command has yet been executed.
$Revision: 1.16 $