NAME

Labrador::Crawler::Dispatcher_Client

SYNOPSIS

	use Labrador::Crawler::Dispatcher_Client;

	my $client = Labrador::Crawler::Dispatcher_Client->new('localhost', 2460);
	$client->connect or die "Could not connect to the dispatcher\n";
	my $forkcount = $client->WORK;
	$client->disconnect;

DESCRIPTION

This object gives each subcrawler an easy way to talk to the dispatcher. It wraps every protocol VERB (request) in a method.

For more information about how this class talks to the dispatcher, see docs/protocol.txt.

METHODS

Methods are documented below. Most are named after the VERB they send. Further information can be found in docs/protocol.txt.

new($dispatcher_hostname, $dispatcher_port)

Instantiates the object, but does NOT open a connection to the dispatcher; that is done separately with connect(). The object has two states: connection open and connection closed.

connect

Opens a (TCP) connection to the dispatcher. Returns 1 if successfully connected and protocol handshake succeeded, 0 otherwise.

disconnect

Sends QUIT, and closes the (TCP) connection with the dispatcher.
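The two-state lifecycle described for new, connect and disconnect can be sketched as follows. SketchClient and its is_open method are hypothetical names used only for illustration; the sketch tracks nothing but the open/closed state and omits the real socket handling, protocol handshake and QUIT exchange, which are defined in docs/protocol.txt.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the two-state client. It only tracks whether a
# connection is open; real TCP I/O, the handshake and QUIT are omitted.
package SketchClient {
    sub new {
        my ($class, $host, $port) = @_;
        # Construction does NOT connect: the object starts in the "closed" state.
        return bless { host => $host, port => $port, open => 0 }, $class;
    }

    sub connect {
        my ($self) = @_;
        # A real client would open a TCP socket and run the handshake here,
        # returning 0 if either step failed.
        $self->{open} = 1;
        return 1;
    }

    sub disconnect {
        my ($self) = @_;
        return unless $self->{open};
        # A real client would send QUIT and close the socket here.
        $self->{open} = 0;
    }

    # Hypothetical accessor for the current state.
    sub is_open { return $_[0]{open} }
}
```

Keeping construction and connection separate lets a caller create the object early and decide later when to open the connection, and check connect's return value before sending any VERB.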

VERBS

This object supports the following verbs, which may be called as described.

WORK

Queries the dispatcher to find out how many subcrawler clients this host should fork, and returns that number.
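A host might use the number returned by WORK (stored as $forkcount in the SYNOPSIS) to start that many subcrawler processes. The sketch below assumes a plain fork/waitpid design; spawn_subcrawlers and the $child_main callback are hypothetical names standing in for whatever the real subcrawler startup code does.

```perl
use strict;
use warnings;

# Fork $forkcount children, run $child_main in each, then wait for all of them.
# Returns how many children exited with status 0.
# $child_main is a hypothetical stand-in for a subcrawler's main loop.
sub spawn_subcrawlers {
    my ($forkcount, $child_main) = @_;
    my @pids;
    for my $i (1 .. $forkcount) {
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            $child_main->($i);    # child: run one subcrawler
            exit 0;
        }
        push @pids, $pid;         # parent: remember the child's PID
    }
    my $ok = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);         # block until this child exits
        $ok++ if $? == 0;         # $? holds the child's exit status
    }
    return $ok;
}
```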

CONF

Obtain the configuration file from the dispatcher. Returns an array of the configuration file's lines, or an empty array on failure.

NEXT($n)

Obtain the next $n URLs to process from the dispatcher.

FINISHED($url, @links)

Ask the dispatcher to mark $url as finished, and add @links to the master queue.

ALLOWED($url)

Returns the result of applying the dispatcher's URL filters to $url. The result is cached locally, keyed by URL. NB: this cache could grow extremely large over prolonged use.
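The local caching behind ALLOWED is a form of memoization: the dispatcher is asked only the first time a URL is seen. A minimal sketch, assuming a hypothetical ask_dispatcher_allowed function in place of the real network request (here it simply allows http:// URLs):

```perl
use strict;
use warnings;

# Hypothetical stand-in for the remote check; a real client would send an
# ALLOWED request. Counts calls so the caching effect is visible.
my $remote_calls = 0;
sub ask_dispatcher_allowed {
    my ($url) = @_;
    $remote_calls++;
    return $url =~ m{^http://} ? 1 : 0;
}

# url => filter result. This sketch never evicts entries, so the hash grows
# with every distinct URL checked, which is the growth the NB warns about.
my %allowed_cache;
sub allowed {
    my ($url) = @_;
    # exists() (not truth) so that a cached "0" is also reused.
    $allowed_cache{$url} = ask_dispatcher_allowed($url)
        unless exists $allowed_cache{$url};
    return $allowed_cache{$url};
}
```

If memory became a concern, a bounded cache (for example one that evicts the least recently used URLs) would trade some extra dispatcher requests for a fixed memory ceiling.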

ROBOTS($hostnameport)

Asks the dispatcher for the robots.txt file of the given host and port, passed as a single string joined by a colon (hostname:port). Returns ('#') if the file is empty, or a blank result if the dispatcher has not cached the file.
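A caller of ROBOTS has to distinguish three outcomes. The sketch below classifies a reply; robots_status is a hypothetical helper, and treating "blank" as either an empty list or a single empty string is an assumption, since the exact form is not specified here.

```perl
use strict;
use warnings;

# Classify the lines returned by ROBOTS($hostnameport).
# Assumption: "blank" may arrive as () or as ('').
sub robots_status {
    my @lines = @_;
    return 'uncached' if !@lines || (@lines == 1 && $lines[0] eq '');
    return 'empty'    if @lines == 1 && $lines[0] eq '#';   # robots.txt exists but is empty
    return 'cached';                                         # real robots.txt content
}
```

In the 'uncached' case a subcrawler would presumably fetch robots.txt itself and share it through ROBOTSFILE, so that other crawlers can reuse it.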

ROBOTSFILE($hostnameport, @file)

Submit a robots.txt file (@file) for the server running on $hostnameport to the dispatcher, so that other crawlers can access it.

STATS(%stats)

Submit the stats of this subcrawler to the dispatcher where they can be aggregated.

FAILED($url, $reason)

Inform the dispatcher that the retrieval of $url failed because of $reason.
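NEXT, FINISHED and FAILED together suggest a subcrawler work loop: take a batch of URLs, fetch each, then report success (with the links found) or failure. The sketch below runs that loop against MockDispatcher, a hypothetical in-memory object with the same verb names rather than the real client. Stopping when NEXT returns an empty list is an assumption, and the fetcher is passed in as a callback.

```perl
use strict;
use warnings;

# Hypothetical in-memory dispatcher exposing the same verb names.
# It does not de-duplicate links, so cyclic links would loop forever.
package MockDispatcher {
    sub new { my ($class, @urls) = @_; return bless { queue => [@urls], done => [], failed => [] }, $class }
    sub NEXT { my ($self, $n) = @_; return splice(@{ $self->{queue} }, 0, $n) }
    sub FINISHED {
        my ($self, $url, @links) = @_;
        push @{ $self->{done} },  $url;     # mark $url finished
        push @{ $self->{queue} }, @links;   # add discovered links to the queue
    }
    sub FAILED { my ($self, $url, $reason) = @_; push @{ $self->{failed} }, "$url: $reason" }
}

# $fetch returns (1, @links) on success or (0, $reason) on failure.
sub crawl {
    my ($client, $batch_size, $fetch) = @_;
    # Assumption: an empty batch means there is no more work.
    while (my @urls = $client->NEXT($batch_size)) {
        for my $url (@urls) {
            my ($ok, @rest) = $fetch->($url);
            if ($ok) { $client->FINISHED($url, @rest) }
            else     { $client->FAILED($url, $rest[0]) }
        }
    }
}
```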

NOOP

Only checks that a reply can be obtained from the dispatcher. Useful for checking a client's connectivity with the dispatcher.

MONITOR

Obtain the stats hash from the dispatcher. Mainly used for monitoring the progress of a crawl.

Private methods

These are documented for completeness, but should only be used internally.

_command($command, $arg1, @args)

Sends a command with verb $command to the dispatcher. $arg1 is appended to the verb, separated by a space. @args are sent as separate lines of the request. Returns ($status_code, @all_returned_lines).
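The request layout described for _command can be sketched as below. build_request and parse_reply are hypothetical helpers; the newline terminator, the lack of an end-of-request marker, and a reply whose first line begins with a numeric status code are all assumptions, since the real framing is defined in docs/protocol.txt.

```perl
use strict;
use warnings;

# First line: the verb, then a space and $arg1 (if given); then one line per
# element of @args. Assumption: lines end in "\n" with no terminator line.
sub build_request {
    my ($command, $arg1, @args) = @_;
    my $first = (defined $arg1 && length $arg1) ? "$command $arg1" : $command;
    return join '', map { "$_\n" } $first, @args;
}

# Split a reply into ($status_code, @all_returned_lines).
# Assumption: the status code is the leading integer on the first line;
# 0 is returned if none is found.
sub parse_reply {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    my $code  = (@lines && $lines[0] =~ /^(\d+)/) ? $1 : 0;
    return ($code, @lines);
}
```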

last_result_code()

Returns the status code of the most recently executed command, or 0 if no command has been executed yet.

REVISION

	$Revision: 1.16 $