Labrador::Crawler::ContentFilter
    use Labrador::Crawler::ContentFilter;
    my $filter = Labrador::Crawler::ContentFilter->new(
        'Binary', $config, $dispatcher_client);
    my $privs = {'index' => 0, 'follow' => 0};
    $filter->filter($document, $privs);
    print "May not index\n" unless $privs->{'index'};
    print "May not follow\n" unless $privs->{'follow'};
Abstract class. The filtering logic itself must be implemented by a child class.
Content filters are responsible for looking at content and determining two things: a) if the content should not be indexed and b) if the links in the content should not be followed.
Constructs a new Content Filter object.
Returns the name of this filter - useful for debugging warnings.
Abstract - each child class must provide this method, which alters the filter settings of $privs ('follow', 'index') according to some heuristic on the content.
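As an illustration of the contract described above, here is a minimal, self-contained sketch of a child class. Everything in it is an assumption made for the example: My::ContentFilter is a stand-in for the real base class (so the code runs without Labrador installed), My::ContentFilter::TooShort and its min_length option are hypothetical, and the document is treated as a plain string, which may not match what the crawler actually passes in.

```perl
use strict;
use warnings;

# Stand-in base class (hypothetical), mirroring only the abstract filter() contract.
package My::ContentFilter;

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

# Abstract: child classes are expected to override this.
sub filter { die "filter() must be implemented by a child class\n" }

# Hypothetical child class: refuses indexing for very short content.
package My::ContentFilter::TooShort;
our @ISA = ('My::ContentFilter');

sub filter {
    my ($self, $document, $privs) = @_;
    my $min = $self->{min_length} // 20;    # assumed option, not from Labrador
    # Heuristic: content below the threshold is not worth indexing.
    # 'follow' is deliberately left untouched.
    $privs->{index} = 0 if length($document) < $min;
    return;
}

package main;

my $filter = My::ContentFilter::TooShort->new(min_length => 10);
my $privs  = { index => 1, follow => 1 };
$filter->filter('tiny', $privs);
print "May not index\n"  unless $privs->{index};
print "May not follow\n" unless $privs->{follow};
```

In this sketch the privileges start out as granted and the filter only ever withdraws them, so several filters could be applied in sequence to the same $privs hash without one undoing another's decision.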
An optional method that is called when the class is started, so that any child module can be initialised.
Load the module named $name.
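For readers unfamiliar with loading a module by name at run time, the following sketch shows one common Perl idiom for it. It is not taken from this module's source: load_by_name is a hypothetical helper, and the namespace argument and the name validation are assumptions. The core module File::Spec is used as the target only so that the example runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical helper: build a class name from a namespace and a short
# name, load its file with require, and return the class name.
sub load_by_name {
    my ($namespace, $name) = @_;

    # Assumed safety check: accept only simple identifiers as names.
    die "Invalid module name '$name'\n" unless $name =~ /\A\w+\z/;

    my $class = "${namespace}::${name}";
    (my $file = "$class.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    require $file;                            # dies if the module is missing
    return $class;
}

my $class = load_by_name('File', 'Spec');
print "Loaded $class\n" if $class->can('catfile');
```

Converting the class name to a file path before calling require avoids a string eval, and a missing module surfaces as an ordinary exception that the caller can trap with eval { ... }.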
$Revision: 1.4 $