NAME

Labrador::Crawler::ContentFilter::MetaRobots

DESCRIPTION

This content filter is responsible for filtering out documents whose HTML states that they do not wish to be indexed, or do not wish to have their links followed.

For more information on this standard, which uses META tags embedded within HTML, see http://www.robotstxt.org/wc/faq.html#noindex and http://www.w3.org/pub/WWW/Search/9605-Indexing-Workshop/ReportOutcomes/Spidering.txt

NB: You should not turn this ContentFilter off unless you know exactly what you're doing. Obeying META tags is as important as obeying /robots.txt files.

METHODS

filter($document, $privs)

Disables the follow and index privileges if the document's HTML META tags specify that they should be.
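
The behaviour described above can be sketched roughly as follows. This is only an illustration, not the module's actual code: the function name apply_meta_robots is hypothetical, it takes a raw HTML string rather than a $document object, and it assumes $privs is a hash reference with 'index' and 'follow' keys. It uses a simple regular expression scan rather than a full HTML parser, so unusual markup may not be handled.

```perl
use strict;
use warnings;

# Hypothetical stand-in for filter(): takes raw HTML and a hash ref of
# privileges (the 'index' and 'follow' keys are assumptions) and clears
# whichever privileges a robots META tag denies.
sub apply_meta_robots {
    my ($html, $privs) = @_;

    # Examine each <meta ...> tag in turn.
    while ($html =~ /<meta\b([^>]*)>/gi) {
        my $attrs = $1;

        # Only name="robots" tags are relevant here.
        next unless $attrs =~ /\bname\s*=\s*["']?robots\b/i;

        # Extract the content attribute, e.g. "noindex, nofollow".
        next unless $attrs =~ /\bcontent\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i;
        my $content = defined $1 ? $1 : defined $2 ? $2 : $3;

        for my $directive (map { lc $_ } split /\s*,\s*/, $content) {
            $directive =~ s/^\s+|\s+$//g;
            $privs->{index}  = 0 if $directive eq 'noindex'  || $directive eq 'none';
            $privs->{follow} = 0 if $directive eq 'nofollow' || $directive eq 'none';
        }
    }
    return $privs;
}

# Usage: start with both privileges granted, then apply the filter.
my %privs = (index => 1, follow => 1);
apply_meta_robots(
    '<html><head><meta name="robots" content="noindex,follow"></head></html>',
    \%privs,
);
print "index=$privs{index} follow=$privs{follow}\n";
```

Privileges are only ever removed in this sketch, never granted, so a document without a robots META tag leaves $privs unchanged.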

REVISION

	$Revision: 1.3 $