NAME

Labrador::Common::RobotsCache

SYNOPSIS

	use Labrador::Common::RobotsCache;
	my $robotscache = Labrador::Common::RobotsCache->new($config);
	my @file = $robotscache->get_file('www.gla.ac.uk');
	unless (@file)
	{
		# not cached: fetch http://www.gla.ac.uk/robots.txt using HTTP
		# ...
		# save to cache
		$robotscache->set_file('www.gla.ac.uk', @file);
	}

DESCRIPTION

Implements a disk cache of robots.txt files, keyed by host. Cached files expire after 25 days by default.
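
The expiry rule can be illustrated with a minimal sketch. It assumes that freshness is judged by the cached file's modification time, which this documentation does not confirm; is_expired() is a hypothetical helper and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $path is missing or older than $max_days.
# Assumption: a cached file's age is measured from its modification time.
sub is_expired {
    my ($path, $max_days) = @_;
    my $mtime = (stat $path)[9];
    return 1 unless defined $mtime;             # no such file: treat as expired
    my $age_days = (time() - $mtime) / 86_400;  # 86,400 seconds per day
    return $age_days > $max_days ? 1 : 0;
}
```

With the default setting, a call such as is_expired($path, 25) would report true for a cached file last written more than 25 days ago.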

CONFIGURATION

Behaviour can be altered by the following configuration file options:

RobotsTxtExpiry

Number of days to keep a robots.txt file in the cache. Defaults to 25 days.

RobotsTxtCache

Absolute location of the directory to use as the disk cache for robots.txt files. Will be created if it does not exist. Defaults to 'data/robots.txt' relative to the configuration directive Base.

Base

Used as the file base for the default robots.txt cache directory. Defaults to '../'.
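
How the two defaults combine can be sketched as follows. The sketch assumes the directives are available as a plain hash, which is not necessarily how the configuration object exposes them; robots_cache_dir() is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: choose the cache directory from a hash of directives.
# Assumption: directives arrive as key/value pairs named as documented above.
sub robots_cache_dir {
    my (%directive) = @_;
    # An explicit RobotsTxtCache setting takes precedence.
    return $directive{RobotsTxtCache} if defined $directive{RobotsTxtCache};
    # Otherwise fall back to data/robots.txt under Base (default '../').
    my $base = defined $directive{Base} ? $directive{Base} : '../';
    return File::Spec->catdir($base, 'data', 'robots.txt');
}
```

Calling robots_cache_dir() with no directives resolves the cache directory against the default Base of '../'.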

METHODS

new($configuration)

Constructor. Calls init() automatically.

init

Initialises class, loading appropriate directives from configuration file.

cached($hostname)

Returns a boolean indicating whether the cache contains the robots.txt file for the given $hostname.

get_file($hostname)

Retrieves the robots.txt file for $hostname. An empty list means that the file was not found in the cache; a single comment line ('#') means that the given host has no robots.txt file.
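
A caller therefore has three cases to tell apart. The following is a minimal sketch of that check; robots_cache_state() and its return strings are hypothetical, and the tolerance for trailing whitespace after the '#' marker is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the list returned by get_file().
sub robots_cache_state {
    my @file = @_;
    # Empty list: nothing is cached for this host.
    return 'not_cached' unless @file;
    # Assumption: the '#' marker may carry trailing whitespace or a newline.
    return 'no_robots_txt' if @file == 1 && $file[0] =~ /^#\s*$/;
    # Anything else is treated as real robots.txt content.
    return 'cached';
}
```

For example, robots_cache_state($robotscache->get_file($hostname)) lets the caller decide whether a fetch over HTTP is needed at all.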

set_file($hostname, @contents)

Updates the cache with the robots.txt file for $hostname.
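
Taken together, get_file() and set_file() support a read-through pattern, sketched below. robots_lines_for() and the $fetch callback are hypothetical and not part of this module. Storing a single '#' when the host has no robots.txt follows the convention described under get_file(), but whether callers are expected to write that marker themselves is an assumption.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the documented get_file()/set_file() calls.
# $fetch is an assumed callback that returns the robots.txt lines for a
# host, or an empty list when the host has no robots.txt file.
sub robots_lines_for {
    my ($cache, $hostname, $fetch) = @_;
    my @file = $cache->get_file($hostname);
    return @file if @file;            # cache hit, possibly the '#' marker
    @file = $fetch->($hostname);      # cache miss: ask the callback
    @file = ('#') unless @file;       # assumption: record "no robots.txt"
    $cache->set_file($hostname, @file);
    return @file;
}
```

Passing the fetch step in as a callback keeps the HTTP client out of the sketch, so the same wrapper works with whichever fetcher the crawler already uses.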

REVISION

	$Revision: 1.7 $