use Labrador::Common::URLState::Normal;
my $states = Labrador::Common::URLState::Normal->new($data);
$states->url('http://www.gla.ac.uk/#', time);
print "Seen http://www.gla.ac.uk/# before" if $states->url_exists('http://www.gla.ac.uk/#');
Records the URLs that have already been seen. This is an abstract class; a subclass must implement it. Some papers recommend using a Bloom filter or digests of the URLs to save space.
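As a rough illustration of the digest idea mentioned above, the sketch below keys a plain hash by the MD5 digest of each URL rather than by the URL itself. The helper name url_key and the %seen hash are hypothetical and are not part of Labrador; Digest::MD5 is a core Perl module.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical helper (not part of Labrador): map a URL to a
# fixed-size 16-byte binary key.
sub url_key {
    my ($url) = @_;
    return md5($url);
}

my %seen;    # digest => state value
my $url = 'http://www.gla.ac.uk/#';

# Store the state under the digest instead of the full URL.
$seen{ url_key($url) } = time;

print "Seen $url before\n" if exists $seen{ url_key($url) };
```

Every key is 16 bytes regardless of URL length. The trade-offs are that the original URL cannot be recovered from the key, and that two different URLs could in principle collide.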
-1  is in the master queue
-2  is in the crawler queue
-3  is with the crawler
-4  failed
>0  is the time we were informed by the crawler it finished crawling the URL
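The status values listed above could be given symbolic names in calling code. The following is only a sketch: the constant names and the describe_state helper are hypothetical, and only the numeric values come from this documentation. How a value of 0 should be treated is not stated, so the sketch reports it as unknown.

```perl
use strict;
use warnings;

# Hypothetical names; only the numeric values are documented.
use constant {
    IN_MASTER_QUEUE  => -1,
    IN_CRAWLER_QUEUE => -2,
    WITH_CRAWLER     => -3,
    FAILED           => -4,
};

# Turn a stored state value into a human-readable description.
sub describe_state {
    my ($value) = @_;
    return 'in the master queue'  if $value == IN_MASTER_QUEUE;
    return 'in the crawler queue' if $value == IN_CRAWLER_QUEUE;
    return 'with the crawler'     if $value == WITH_CRAWLER;
    return 'failed'               if $value == FAILED;
    # Positive values are the time the crawler reported completion.
    return 'finished crawling at ' . scalar(localtime($value))
        if $value > 0;
    return 'unknown state';
}

print describe_state(-2), "\n";
```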
Constructs a new URLState object
Initialises this module. Automagically called by new()
Adds the $url to the hash if it doesn't already exist. If it was already there, returns the value it had.
Returns 1 if the URL has been seen before.
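Since this class is abstract, a concrete subclass has to supply the storage. The sketch below shows one possible in-memory version backed by a plain hash. The package name My::URLState::Hash is hypothetical and this is not the code of Labrador::Common::URLState::Normal. Two behaviours are assumptions, because the documentation does not state them: url() returns undef when the URL is new, and it leaves an existing entry unchanged.

```perl
use strict;
use warnings;

package My::URLState::Hash;    # hypothetical name, not part of Labrador

# Constructs the object and calls init(), as described above.
sub new {
    my ($class, $data) = @_;
    my $self = bless {}, $class;
    $self->init($data);
    return $self;
}

# Sets up the empty URL hash.
sub init {
    my ($self, $data) = @_;
    $self->{data} = $data;
    $self->{urls} = {};
    return;
}

# Adds $url with $value if it is new; otherwise returns the stored value.
sub url {
    my ($self, $url, $value) = @_;
    return $self->{urls}{$url} if exists $self->{urls}{$url};
    $self->{urls}{$url} = $value;
    return;    # assumption: nothing is returned for a new URL
}

# Returns 1 if the URL has been seen before, 0 otherwise.
sub url_exists {
    my ($self, $url) = @_;
    return exists $self->{urls}{$url} ? 1 : 0;
}

package main;

my $states = My::URLState::Hash->new({});
$states->url('http://www.gla.ac.uk/#', -1);    # -1: in the master queue
print "Seen http://www.gla.ac.uk/# before\n"
    if $states->url_exists('http://www.gla.ac.uk/#');
```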
$Revision: 1.4 $