Understanding the SearchEngineSpiders.config file

The SearchEngineSpiders.config file within the configuration directory contains an editable list of search engine spiders recognized by the forum. If a search engine spider is indexing your board and the spider is specified within this list, the spider will be identified within the Who's On page.

The SearchEngineSpiders.config file is a simple XML document containing a list of the HTTP_USER_AGENT strings spiders use to identify themselves, along with the equivalent text to display for each spider within the forum.
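
For reference, a minimal version of the file might look like the sketch below. The XML declaration and the entries shown are illustrative only; the copy shipped with your installation will already contain its own list of spiders.

<?xml version="1.0" encoding="utf-8"?>
<Spiders>
 <Spider Agent="Googlebot">Google</Spider>
 <Spider Agent="bingbot">Bing</Spider>
</Spiders>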

Identifying Bots

The best way to identify and accommodate popular bots visiting your board is to look at your raw log files. Just like browsers, search engine spiders use the HTTP_USER_AGENT header to transmit their identity, and InstantForum uses this header to identify spiders. For example, suppose that while looking through your logs you identified the following entry as a search engine spider...

FAST-WebCrawler/2.2.6 (crawler@fast.no; https://www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html)

To accommodate this bot you could create the following XML element within the SearchEngineSpiders.config file...

<Spiders>
 <Spider Agent="FAST-WebCrawler">FastWeb</Spider>
</Spiders>

The above example performs a partial match of the HTTP_USER_AGENT string; a partial match is useful if you don't wish to update the SearchEngineSpiders.config file each time a search engine slightly revises its HTTP_USER_AGENT string. You can also provide a full match as shown below, however maintaining full matches can become very time-consuming...

<Spiders>
 <Spider Agent="FAST-WebCrawler/2.2.6 (crawler@fast.no; FastWebhttps://www.fast.no/faq/faqfastwebsearch/faqfastwebcrawler.html)">FastWeb</Spider>
</Spiders>

A number of popular search engines are provided within the SearchEngineSpiders.config file by default.
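
If you are unsure which spiders actually visit your board, a short script along the lines of the sketch below can summarise the distinct user agent strings found in a raw log file. The log file name, and the assumption that the user agent is the last field on each line (as in a typical IIS W3C log where spaces in the agent string are encoded as '+'), are illustrative only; adjust both to match your own server's log format.

from collections import Counter

# Log file name and field layout are assumptions for a typical IIS W3C log;
# adjust both to match your own server's log format.
LOG_FILE = "u_ex240101.log"

agents = Counter()
with open(LOG_FILE, encoding="utf-8", errors="ignore") as log:
    for line in log:
        if line.startswith("#"):       # skip W3C directive lines
            continue
        fields = line.split()
        if fields:
            agents[fields[-1]] += 1    # assumes cs(User-Agent) is the last field

# List the most frequently seen user agents so spiders stand out
for agent, count in agents.most_common(20):
    print(count, agent)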

Excluding Bots

You can stop well-behaved bots from crawling your board by placing a robots.txt file in the root of your web site; compliant crawlers check this file before indexing.
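
For example, the following robots.txt entry asks all compliant crawlers not to index anything beneath a /forum/ path. The path shown is only an example and should be changed to wherever your board is installed...

User-agent: *
Disallow: /forum/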