Updated Spider List · Started by Burke Knight

Updated Spider List

I remember SMF used to have a mod that added to the Spider list and wonder if anyone has made one for ElkArte?

Re: Updated Spider List

Reply #1

That was Karl Benson (sp?). He used to have a long list of spiders that you could add. No idea if it has been updated, but TBH, given today's internet, it's a full-time effort to keep it current.

There is a good overall blocker for either Apache or Nginx available here: https://github.com/mitchellkrogza That's a bit more effort than just a robots.txt list, but it really does block them early in the process.

If you just want a list of bad spiders/bots/AI agents, here is the robots.txt file from the above: https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker/blob/master/robots.txt/robots.txt I think a couple of the blocked ones are arguably SEO useful, but YMMV.

I use that robots.txt file to politely ask them not to scan/scrape (which is all it really does: ask), and then the things I have in the ElkArte spiders list are what I consider "safe" or necessary/good for SEO, just so I can see who is doing what/when.
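For anyone who hasn't looked inside one, here is a rough sketch of what a robots.txt block list boils down to: one `User-agent` / `Disallow` pair per crawler. The helper name `buildRobotsTxt` and the bot names are made up for illustration; the real file linked above is just a much longer version of the same idea, and a crawler is free to ignore it.

```php
<?php
// Hypothetical helper (not part of ElkArte or the linked blocker):
// builds robots.txt rules asking each named crawler to stay out of the whole site.
function buildRobotsTxt(array $userAgents): string
{
	$blocks = [];
	foreach ($userAgents as $agent) {
		// "Disallow: /" is only a request; nothing enforces it.
		$blocks[] = "User-agent: {$agent}\nDisallow: /\n";
	}

	// Separate each crawler's block with a blank line.
	return implode("\n", $blocks);
}

// Placeholder bot names, for illustration only.
echo buildRobotsTxt(['ExampleBadBot', 'AnotherScraper']);
```

Anything that ignores those rules has to be stopped at the web server or firewall level instead.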

If you have a real problem with bots, then installing the full code from above is the way to go, as it uses iptables and fail2ban among others.

I've attempted to keep an updated list of interesting ones (a.k.a. probably SEO useful) for 2.0; here is the current install section:
Code:
		return $this->db->insert('ignore',
			'{db_prefix}spiders',
			array('spider_name' => 'string', 'user_agent' => 'string', 'ip_info' => 'string'),
			// Rows follow the column order above: spider_name, user_agent, ip_info
			array(
				array('Amazon', 'Amazonbot', ''),
				array('Anthropic-AI', 'anthropic-ai', ''),
				array('Anthropic-AI (Bot)', 'ClaudeBot', ''),
				array('Anthropic-AI (Claude)', 'claude-Web', ''),
				array('Apple', 'Applebot', ''),
				array('Baidu', 'Baiduspider', ''),
				array('Bing', 'bingbot', ''),
				array('Bing (Preview)', 'BingPreview', ''),
				array('CCBot', 'CCBot', ''),
				array('Diffbot', 'Diffbot', ''),
				array('DoCoMo', 'DoCoMo', ''),
				array('DuckDuckGo', 'duckduckgo', ''),
				array('DuckDuckGo (Assist)', 'DuckAssistBot', ''),
				array('Ecosia', 'Ecosia', ''),
				array('Exabot', 'Exabot', ''),
				array('Google', 'Googlebot', ''),
				array('Google (AdSense)', 'Mediapartners-Google', ''),
				array('Google (Adwords)', 'AdsBot-Google', ''),
				array('Google (Bard)', 'Google-Extended', ''),
				array('Google (Image)', 'Googlebot-Image', ''),
				array('Google (ImageProxy)', 'GoogleImageProxy', ''),
				array('Google (Mobile)', 'Googlebot-Mobile', ''),
				array('Google (News)', 'Googlebot-News', ''),
				array('Google (Video)', 'Googlebot-Video', ''),
				array('Gravityscan', 'Gravityscan', ''),
				array('InternetArchive', 'ia_archiver-web.archive.org', ''),
				array('Jakarta', 'Jakarta Commons', ''),
				array('Kraken', 'Kraken', ''),
				array('LinkedIn', 'LinkedInBot', ''),
				array('MegaIndex', 'MegaIndex.ru', ''),
				array('Meta/Facebook', 'FacebookBot', ''),
				array('Meta/Facebook', 'meta-externalagent', ''),
				array('Meta/Facebook (Hit)', 'facebookexternalhit', ''),
				array('MSN', 'msnbot', ''),
				array('MSN (Mobile)', 'MSNBOT_Mobile', ''),
				array('Omgili', 'Omgili', ''),
				array('Open-AI (Bot)', 'GPTBot', ''),
				array('Open-AI (SearchBot)', 'OAI-SearchBot', ''),
				array('Open-AI (User)', 'ChatGPT-User', ''),
				array('Perplexity (User)', 'Perplexity-User', ''),
				array('PerplexityBot (Bot)', 'PerplexityBot', ''),
				array('Slack', 'Slackbot', ''),
				array('Sogou', 'Sogou', ''),
				array('Teoma', 'teoma', ''),
				array('Tik-Tok', 'Bytespider', ''),
				array('Timpi', 'TimpiBot', ''),
				array('Twitter', 'TwitterBot', ''),
				array('Yahoo!', 'slurp', ''),
				array('Yahoo! (Blogs)', 'Yahoo-Blogs', ''),
				array('Yahoo! (Feeds)', 'YahooFeedSeeker', ''),
				array('Yahoo! (Image)', 'Yahoo-MMCrawler', ''),
				array('Yahoo! (Mobile)', 'YahooSeeker/M1A1-R2D2', ''),
				array('Yandex', 'YandexBot', ''),
				array('Yandex (Blogs)', 'YandexBlogs', ''),
				array('Yandex (Images)', 'YandexImages', ''),
				array('Yandex (Media)', 'YandexMedia', ''),
				array('Yandex (Video)', 'YandexVideo', '')
			)
		);
	}
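To show how a list like this can be used to label visitors, here is a minimal sketch. It assumes the `user_agent` column is treated as a case-insensitive substring of the visitor's User-Agent header; that is an assumption for illustration, not a copy of ElkArte's actual detection code. The function name `matchSpider`, the longest-match rule, and the sample User-Agent string are all hypothetical.

```php
<?php
// Hypothetical helper, NOT ElkArte's real detection code.
// Each row is [spider_name, user_agent substring, ip_info], like the rows in the list above.
function matchSpider(string $userAgent, array $spiders): ?string
{
	$best = null;
	$bestLen = 0;

	foreach ($spiders as [$name, $needle, $ipInfo]) {
		// Case-insensitive substring match. Prefer the longest needle so that
		// 'Googlebot-Image' wins over plain 'Googlebot'.
		if ($needle !== '' && stripos($userAgent, $needle) !== false && strlen($needle) > $bestLen) {
			$best = $name;
			$bestLen = strlen($needle);
		}
	}

	// null means the visitor did not match any known spider.
	return $best;
}

// A few rows taken from the list above.
$spiders = [
	['Google', 'Googlebot', ''],
	['Google (Image)', 'Googlebot-Image', ''],
	['Open-AI (Bot)', 'GPTBot', ''],
];

// Sample User-Agent string, shortened for illustration.
echo matchSpider('Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)', $spiders), "\n";
```

With substring matching, order or specificity matters: without the longest-match rule, an image crawler would simply be reported as whichever Google row happened to be checked first.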
Last Edit: Today at 09:49:09 am by Spuds

 

Re: Updated Spider List

Reply #2

I'm not really looking to ban spiders, just to have them show up as spiders in the Who's Online list so I know who is visiting.
Could you do a simple addon to update the list some time?