Tips for Bots · Started by vbgamer45

Tips for Bots
Use Cloudflare for geo-blocking countries/ASNs; it works great. You can also challenge users instead of blocking them if you are concerned about false positives.
Code: [Select]
(ip.src.country eq "CN") or (ip.src.country eq "HK") or (ip.src.country eq "VN") or (ip.src.country eq "BR") or (ip.src.country eq "AR") or (ip.src.country eq "EC") or (ip.src.country eq "UY") or (ip.src.country eq "IR") or (ip.src.country eq "SG") or (ip.src.country eq "IQ") or (ip.src.country eq "BD") or (ip.src.country eq "VE") or (ip.src.country eq "CL") or (ip.src.country eq "PY") or (ip.src.country eq "MX") or (ip.src.country eq "PA") or (ip.src.country eq "BG") or (ip.src.asnum eq 136907) or (ip.src.country eq "SN")

Block old Chrome versions (or challenge them if using Cloudflare) and block empty user agents.
For Apache, add the following to httpd.conf:
Code: [Select]
# Block empty user agents
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^$ [NC]
RewriteRule .* - [F,L]

# Block Chrome below 120
RewriteCond %{HTTP_USER_AGENT} Chrome/([1-9][0-9]|10[0-9]|11[0-9])\. [NC]
RewriteRule .* - [F,L]
The Cloudflare equivalent for blocking old Chrome (this rule also catches printpage requests):
Code: [Select]
(http.user_agent contains "Chrome/100." or http.user_agent contains "Chrome/101." or http.user_agent contains "Chrome/102." or http.user_agent contains "Chrome/103." or http.user_agent contains "Chrome/104." or http.user_agent contains "Chrome/105." or http.user_agent contains "Chrome/106." or http.user_agent contains "Chrome/107." or http.user_agent contains "Chrome/108." or http.user_agent contains "Chrome/109." or http.user_agent contains "Chrome/110." or http.user_agent contains "Chrome/111." or http.user_agent contains "Chrome/112." or http.user_agent contains "Chrome/113." or http.user_agent contains "Chrome/114." or http.user_agent contains "Chrome/115." or http.user_agent contains "Chrome/116." or http.user_agent contains "Chrome/117." or http.user_agent contains "Chrome/118." or http.user_agent contains "Chrome/119." or http.request.uri.query contains "action=printpage" or http.request.uri.path contains "printpage")


Turn off certain forum features for guests.

Make sure your site supports HTTP2 for your webserver.

Tweak your PHP/database settings. Use the latest versions.

In general, tweak all settings: webserver, PHP, database. The defaults are not enough for bigger sites.
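As a hedged illustration of the kind of defaults worth revisiting: the directives below are real PHP/OPcache/FPM settings, but the values are placeholders, not recommendations. Size them to your RAM and traffic.

```ini
; php.ini -- opcode caching is the single biggest PHP win for a busy forum
opcache.enable = 1
opcache.memory_consumption = 256
opcache.max_accelerated_files = 20000
realpath_cache_size = 4096k

; php-fpm pool -- the stock pm.max_children is far too low for a busy site
pm = dynamic
pm.max_children = 40
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 12
```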
Last Edit: February 26, 2026, 08:44:49 pm by vbgamer45
ElkarteMods.com - Addons, Products and more!

Re: Tips for Bots

Reply #1

Glad I'm not alone on this; I've been in bot-fighting mode on my sites for the last few days!

Other items that may help, depending on your site, traffic, location, etc.

Many requests are coming in on groups of IPv4 /16s, which is a group of ~65.5K addresses (xxx.xxx.0.0/16). For my sites that is not normal traffic, but YMMV. I wrote a script that groups those /16 hits (from the access log), and if it finds more than some number of IPs in a group (I use 10) in the last 15 minutes, it writes the subnet to a log file and fail2ban blocks that entire xxx.xxx.0.0 sub (via ipset). If you have some really small legit local group you can whitelist that sub. I now have over 400 of those subs blocked.
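A minimal sketch of that grouping idea, assuming a combined-format access log with the client IP in the first field; the function name and threshold default are mine:

```shell
#!/bin/sh
# Count distinct client IPs per /16 in an access log and print every /16
# that exceeds the threshold, one CIDR per line (suitable for feeding to
# fail2ban or straight into an ipset).
flag_noisy_subnets() {
    log="$1"
    thr="${2:-10}"
    awk -v thr="$thr" '
      {
        n = split($1, o, ".")
        if (n != 4) next                      # skip IPv6 / malformed lines
        s = o[1] "." o[2]                     # the /16 prefix
        if (!seen[s SUBSEP $1]++) count[s]++  # count distinct IPs per /16
      }
      END {
        for (s in count)
          if (count[s] >= thr) print s ".0.0/16"
      }
    ' "$log"
}
# Example: flag_noisy_subnets /var/log/nginx/access.log 10
```

In practice you would run this from cron over only the last 15 minutes of the log (e.g. via `tail` or log rotation) rather than the whole file.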

nginx has geoip2 (via MaxMind), so you can use that to geo-fence countries and block the ones you know are not in your zone. I know some folks take issue with that, but honestly, too bad; you have to work through an attack! I will say, however, most of the bot traffic was out of US addresses (proxies), so Virginia, TX, and WA were common whois endpoints, but it still drops some of the crap.
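A sketch of what that looks like in nginx, assuming the third-party ngx_http_geoip2_module is compiled in and a MaxMind GeoLite2 country database lives at the path shown (both the path and the variable names are assumptions):

```nginx
# http {} context: map the client IP to an ISO country code
geoip2 /usr/share/GeoIP/GeoLite2-Country.mmdb {
    $geoip2_country_code country iso_code;
}

# Countries to fence off (illustrative list)
map $geoip2_country_code $blocked_country {
    default 0;
    CN 1;
    HK 1;
    VN 1;
}

# server {} context: 444 closes the connection without sending a response
if ($blocked_country) {
    return 444;
}
```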

Last thing that can be helpful: bots tend to flood connection attempts. Another script groups connection-limit failures (from the error.log) over a given limit/time threshold that also have PHPSESSION in the URL, and -> ban. Guests are not opening 30+ connections to log in or browse a site; to be honest, even with cache off and trying to beat on a site from your own IP, you will not trigger that either.
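For the banning side, ipset is where blocking whole subnets stays cheap: a single iptables rule consults a hash set, however many /16s you feed it. A configuration sketch (the set name "botnets" and the example subnet are mine; requires root):

```shell
# Create the set once, idempotently, then add flagged subnets to it
ipset create botnets hash:net -exist
ipset add botnets 23.95.0.0/16 -exist

# One iptables rule drops everything in the set
iptables -C INPUT -m set --match-set botnets src -j DROP 2>/dev/null ||
  iptables -I INPUT -m set --match-set botnets src -j DROP
```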

I may add that low-Chrome-version check; more bot pain! I've seen high version values, but those are from variants (Vivaldi, for example). I did not consider old cruft; thanks for the idea!

Re: Tips for Bots

Reply #2

Yeah, it was wrecking me, and I had a hardware firewall/software firewall; I had to pull out all the stops. Switching to Cloudflare as well helped a ton.
I have done so much Apache/FastCGI tweaking, along with the database, to handle the loads. The worst is when I would get hit by 50k to 100k bots at one time.
I do a lot of research on the ASNs; I use ip2location.com. If you use Cloudflare, be careful with your MX record proxying/email sending. I run reports via https://mxwhiz.com/ to double-check (mine, btw).

The downside of blocking older Chrome versions is that Windows 7 users would be cut off if they used Chrome.

Re: Tips for Bots

Reply #3

The session_start on every GET request, combined with db session storage, has a dramatic impact on the server. As an immediate mitigation, I forced sessions to use cookies:

Code: [Select]
sources/Session.php:

@ini_set('session.use_only_cookies', true);

I then configured nginx to no longer serve requests that contain the session id. This only helps until the bots stop including the session id in their requests.

I'll probably move the session management to a ramdisk until I can figure out how to lean out the need for sessions by unregistered guests.
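The nginx side of that might look like the following sketch (444 is nginx's close-without-response code, and the pattern assumes the id arrives as a PHPSESSID query parameter):

```nginx
# server {} context: refuse any request still carrying a session id in the URL
if ($args ~* "PHPSESSID=") {
    return 444;
}
```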

Re: Tips for Bots

Reply #4

I simply use cloudflare as my dns server, so never use any of the above. Not sure about bots as I never keep track on them.
Last Edit: April 04, 2026, 05:53:02 am by ahrasis

Re: Tips for Bots

Reply #5

Quote from: nwsw – The session_start on every GET request, combined with db session storage, has a dramatic impact on the server. As an immediate mitigation, I forced sessions to use cookies:

Code: [Select]
sources/Session.php:

@ini_set('session.use_only_cookies', true);

I then configured nginx to no longer serve requests that contain the session id. This only helps until the bots stop including the session id in their requests.

I'll probably move the session management to a ramdisk until I can figure out how to lean out the need for sessions by unregistered guests.

Note that PHP is deprecating the passing of PHPSESSID via URL in 8.x, and it will be removed in 9.0. 

That particular setting, 'use_only_cookies', will be retired soon - mainly because setting it to false is soon to be disallowed.  More here:
https://wiki.php.net/rfc/deprecate-get-post-sessions

So...  The idea is good - don't use PHPSESSID, and, since you're not generating it anymore, you can then block it via .htaccess. 
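A sketch of that .htaccess block, in the same mod_rewrite style as the rules earlier in the thread (assumes mod_rewrite is enabled, and is only safe once the forum has actually stopped emitting PHPSESSID links itself):

```apacheconf
RewriteEngine On
# Any request still carrying a session id in the query string is bot traffic
RewriteCond %{QUERY_STRING} PHPSESSID= [NC]
RewriteRule .* - [F,L]
```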

SMF implementation: https://github.com/SimpleMachines/SMF/pull/8394

One part of the SMF implementation, this commit, can save a LOT of resources. It's causing some issues for forums that have guest browsing disabled, though; those issues are currently being addressed:
https://github.com/SimpleMachines/SMF/pull/8394/changes/2f2a5e0ae404fd1adb408b87896ce00cca1715ec

The basic idea is that, since you cannot pass by URL, you MUST pass by cookie.  So...  When cookies are disabled, there is no way to pass the session.  At all...  So, don't even bother writing it.  Note certain classes of bots either block cookies or don't use them, or pass their own PHPSESSID...  All these variants cause more session writes. 

These changes will be a hard requirement before PHP 9.0.

You are effectively giving bots total control over your DB writes...   One step further, since they can flood you with writes, they can overwhelm your undo/redo logs.  Which can further lead to issues with backups.  Which can cause performance issues & even bring your site down... 

So stop that...

The savings can border on the ridiculous:
(attached CPU chart: VGF-cpu-2025-01-13.png)

In addition, this note outlines even further savings.  The goal is to avoid driving up CPU during bot storms.  I've been testing these on my site.  Check out the CPU charts before/after:
https://www.simplemachines.org/community/index.php?msg=4199062

The more broad notes found here might also help:
https://www.simplemachines.org/community/index.php?topic=593895.0
Last Edit: April 02, 2026, 03:18:54 pm by shawnb61


Re: Tips for Bots

Reply #7

Quote from: ahrasis –
Quote from: "shawnb61" – you can then block it via .htaccess.
What if we are not using apache2, but nginx instead?

This may help...
Last Edit: April 04, 2026, 11:50:12 am by Steeley

// Deep inside every dilemma lies a solution that involves explosives //

Re: Tips for Bots

Reply #8

For the record, my site uses htaccess to make a significant part of it "private". My forum resides "behind the wall". The public-facing pages explain the topic of the site and provide lots of general information. The public side also informs visitors of the existence of the forum (I could certainly provide some sample screenshots if I were fishing for members, but in my case it's not necessary).

There's a link off the main menu of my site for requesting access to the restricted side. It's a two-step process: click the link and it brings up a simple form; you enter your email address and submit. The form submittal generates a reply to the entered address with an embedded 6-digit random code. It also copies me on that email.

If the applicant doesn't get an email, it means the address was typed wrong. Go back and try again. I want to make sure I send the access credentials to a valid address!

Meanwhile, submittal also brings up a second form for the applicant with instructions about what to do next: specifically, retrieve the email just sent, copy the email address AND the 6-digit code into the new form, and fill out a few other fields with information that I use to validate the applicant as someone allowed access to the forum. (In my case, it's people stationed at a particular location during the 'Nam war; things they know about it nobody else would. Your mileage would vary, of course.) I ask for his nickname or call sign; that will inform the username I'll give him if he provides one, otherwise I'll tweak his first or last name.

When the applicant submits that second form, it comes to me in email.

[Note: If I were to use the same credentials for everyone and they got compromised, I'd have a problem - everything "behind the wall" would be compromised. I'd have to change the credentials and inform EVERYONE, and likely the unauthorized person too in the process. With unique credentials for each user, I know which account is compromised and can address that one.]
(Steeley's law: If you be lazy up front, you be hate'n your life later. All non-academic knowledge comes in suppository form).

I'll create a password for him based on info he provides.. something easy he'll remember.. (OK, so he was a hydraulics mech with Squadron 345, and a Sgt, so password "bubbles345sgt" will suffice..) I send his unique access info back..

All of the access request emails and my response emails providing their unique access credentials are stored on my local computer.

Voilà! Now they have access to the forum to register. (I don't allow guest read access, so they must register to get in.)

BOTTOM LINE:

My forum does not get spammed. At all. Ever. No bots, except the spiders looking for something new on the public side. Occasionally, they get lucky and find something new to link and memorialize. All the private personal information the guys share with each other is kept hidden so as not to scare or freak out the civilian visitors; that is back in the restricted area, secure and snug as a bug in an old forum software version.

If you run a "really large" forum with lots and lots of users, just set up a database of users that forbids duplicate username entries to keep them all unique. Otherwise, if you create a duplicate, it overwrites the original and the first username can't get in any more. (And note: username "Mac" is different from "MAC" and "mac" as far as htaccess usernames are concerned.)

Oddly enough, I get very few "fakers". I do get a fair number of first emails, but the second form pretty much stops them in their tracks; they know they ain't gonna fake that stuff (many don't even know what the heck it's asking for in some fields). A couple of squirrels gave it a shot over the past 7+ years, but my BS detector is pretty good, and an email reply back to them requesting "clarification" never gets answered.

Anyway, if you want to see how it works, PM me for the website url.. If you're a legit user and not a bot, I'll "get back to ya.."
Last Edit: April 06, 2026, 02:02:12 am by Steeley

// Deep inside every dilemma lies a solution that involves explosives //

Re: Tips for Bots

Reply #9

I like the idea of what you did. Could we make this one of the ElkArte default features, with an option to disable it, so that once set up and installed, an ElkArte forum is bot-free, or at least has far fewer bots?

What do you think @Spuds?

Re: Tips for Bots

Reply #10

Well, I built the scheme using basic open-source files and basic HTML, and I'm not a programmer by any serious definition.

First thing you need is a directory-privacy feature (htaccess, in Apache, for example). cPanel makes it easy to manage. However, that's not something ElkArte can do; you create the protected directory on your server and then tuck ElkArte behind it. It's the directory-privacy feature that keeps the bots out and away from ElkArte (and anything else you don't want the unwashed masses to access). It's not the most secure way, but I've not yet had anyone try to "hack in" (except valid users that couldn't remember their credentials; eventually they send me an email or another access request).
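For reference, the directory-privacy layer itself is just a few lines of Apache basic auth; the paths below are placeholders (cPanel writes an equivalent pair of files for you):

```apacheconf
# .htaccess in the protected directory
AuthType Basic
AuthName "Members Area"
AuthUserFile /home/account/.htpasswds/forum/passwd
Require valid-user
```

The companion `htpasswd` tool maintains the user file and updates an existing entry rather than duplicating it, e.g. `htpasswd -B /home/account/.htpasswds/forum/passwd username`.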

Then you need some software routines to support your validation of people that will have access to the protected area..

email-code.php (located in cgi-bin)

Code: [Select]
<?php
if (!isset($_POST['submit'])) {
    // This page should not be accessed directly; the form must be submitted.
    echo "error: you need to submit the form!";
    exit;
}
$visitor_email = $_POST['email'];
$message = rand(100000, 999999);

// Validate first
if (empty($visitor_email)) {
    echo "Email address is mandatory!";
    exit;
}
if (!strstr($visitor_email, '@')) {
    echo "Email format is incorrect - missing @ - press back button and try again";
    exit;
}
if (strpos($visitor_email, ',')) {
    echo "Email format is incorrect - contains a comma - press back button and try again";
    exit;
}
if (IsInjected($visitor_email)) {
    echo "Bad email value!";
    exit;
}

$email_from = 'no-reply@domain.url';
$email_subject = "Validation Code";
$email_body =
    "Copy the following validation code into the request form. \n" .
    "(If you did not request a code, someone entered an incorrect email address; \n" .
    "please accept our apologies and delete this email.) \n" .
    "Validation Code: $message. \n" .
    "Submitter address: $visitor_email \n";

$to = $visitor_email;
$headers  = "From: no_reply@domain.url\r\n";
$headers .= "Reply-To: no_reply@domain.url\r\n";
$headers .= "Bcc: access@domain.url\r\n"; // goes to admin to match with the later applicant form
// Send the email!
mail($to, $email_subject, $email_body, $headers);
// Done; redirect to the code-pending page.
header('Location: ../pending.html');
exit;

// Function to validate against any email injection attempts
// (same helper as in form-to-email.php, duplicated so this file stands alone)
function IsInjected($str)
{
    $injections = array('(\n+)', '(\r+)', '(\t+)', '(%0A+)', '(%0D+)', '(%08+)', '(%09+)');
    $inject = '/' . join('|', $injections) . '/i';
    return preg_match($inject, $str) === 1;
}
?>

form-to-email.php   (located in cgi-bin)

Code: [Select]
<?php
if (!isset($_POST['submit'])) {
    // This page should not be accessed directly; the form must be submitted.
    echo "error: you need to submit the form!";
    exit;
}
$Fname = $_POST['First'];
$Lname = $_POST['Last'];
$visitor_email = $_POST['email'];
$code = $_POST['code'];
$username = $_POST['Nick'];
// {other fields you want here}
$message = $_POST['Narrative'];

// Validate first
if (empty($Fname)) {
    echo "First name is required! Please press back button and correct";
    exit;
}
if (strpos($Fname, '_')) {
    echo "Valid first name is required! Please press back button and correct";
    exit;
}
if (empty($Lname)) {
    echo "Last name is required! Please press back button and correct";
    exit;
}
if (strpos($Lname, '_')) {
    echo "Valid last name is required! Please press back button and correct";
    exit;
}
if (empty($visitor_email)) {
    echo "Email is required! Please press back button and correct";
    exit;
}
if (IsInjected($visitor_email)) {
    echo "Bad email value!";
    exit;
}
if (empty($code)) {
    echo "Verification Code is required. Please press back button and correct";
    exit;
}
if (empty($username) || strpos($username, '/')) {
    echo "If you do not provide a nickname/username, you might not like what we assign. Please press back button and correct";
    exit;
}
// if (empty($_POST['otherfield'])) {
//     echo "This data is required! Please press back button and correct";
//     exit;
// }

$email_from = 'no-reply@domain.url';
$email_subject = "Access Request";
$email_body = "$Fname $Lname has submitted the following information for access: \n" .
    "EMail Address: $visitor_email \n" .
    "Validation code: $code \n" .
    "Username: $username \n" .
    // etc., for additional fields
    "$username says: \n  $message \n";

$to = "access@domain.url";
$headers  = "From: $email_from\r\n";
$headers .= "Reply-To: $visitor_email\r\n";
// Send the email!
mail($to, $email_subject, $email_body, $headers);
// Done; redirect to the submitted page.
header('Location: ../submitted.html');
exit;

// Function to validate against any email injection attempts
function IsInjected($str)
{
    $injections = array('(\n+)', '(\r+)', '(\t+)', '(%0A+)', '(%0D+)', '(%08+)', '(%09+)');
    $inject = '/' . join('|', $injections) . '/i';
    return preg_match($inject, $str) === 1;
}
?>

gen_validatorv31.js is located in cgi-bin/scripts for form validation

And of course, html pages

The web form for the email address that drives the above: it collects the email address, and submitting it assigns a code and sends the email.
The working guts are:
Code: [Select]
(snip)
<SCRIPT language=JavaScript type=text/javascript
src="scripts/gen_validatorv31.js"></SCRIPT>
</HEAD>
<BODY >
(your instructions and formatting here..)
<TBODY>
        <TR>
          <TD><!-- Start code for the form-->
            <DIV align=center>
            <FORM method=post name="EMail Address Verification"
            action=../cgi-bin/email-code.php>
            <P>STEP 1</P>
            <P><LABEL for=email>Enter Your Email Address
            (carefully)</LABEL><BR><INPUT style="HEIGHT: 22px; WIDTH: 338px"
            size=21 name=email> </P>
            <P></P><INPUT type=submit value=submit name=submit> </FORM></DIV>
      <SCRIPT language=JavaScript>
           var frmvalidator  = new Validator("EMail Address Verification");
           frmvalidator.addValidation("email","req","Please provide your email");
           frmvalidator.addValidation("email","email","Please enter a valid email address");
      </SCRIPT>

Submit brings up a "pending" webpage, which tells the submitter to retrieve the email just sent and then click "I have the code".
When they click that, it brings up the authorization-data HTML form.

That form has the same structure as the form above, but with different fields, and calls a different form-to-email routine:
 
Code: [Select]
snip
 <TD><!-- Start code for the form-->
            <DIV align=center>
            <FORM method=post name="Access Data"
            action=../cgi-bin/form-to-email.php>

but with other fields (first name, last name, nickname/username, and any other pertinent information you desire to validate the applicant). Upon submit, it calls form-to-email.php instead of email-code.php and sends the applicant to a "success" page, informing them to wait for another email with their authorization data and what to do with it to get into the restricted section behind the htaccess directory check.

Of course, you would want to customize your authorization webpages to your needs, but the crux of the code is above; it's simple stuff, really. For example, I tell users to let their web browser remember the authorization data so it's just one extra mouse click to get into the restricted area. (HOWEVER: the DuckDuckGo browser, and maybe others, doesn't know wtf a directory-access prompt looks like, so it doesn't bring up the username/password like it does for other account-access prompts. Edge, Chrome, Firefox, etc. do.)

EDIT: Oh, and you have to configure mail.php to send SMTP mail through your mail server; simple in concept, but your actual mileage may vary.
Last Edit: April 07, 2026, 02:56:35 pm by Steeley

// Deep inside every dilemma lies a solution that involves explosives //

Re: Tips for Bots

Reply #11

Quote from: "shawnb61" – So...  The idea is good - don't use PHPSESSID, and, since you're not generating it anymore, you can then block it via .htaccess.
Thanks for sharing your changes, good stuff and I'll follow suit and force cookies.

I've been playing with various defensive options on several sites during bot storms, including custom jails for fail2ban and behaviors in crowdsec, but when the bots hit hard those functions can begin to consume a lot of CPU and have memory-growth problems, to the point that your server will likely be overwhelmed. They are, however, effective.

Country-code/ASN blocks help somewhat; it really depends on how aggressive you want to be. Many of the bot blocks come out of US-based residential groups, so things can get ugly if you just block at the ASN level.

Connection and rate limiting in nginx are also effective, but they still have an impact on resources, as the server is still handling the traffic even if it's a 444 response.

Bots also love sending URLs with printpage and prev_next links, a sure sign of crawler activity. I use those as honeypots of sorts to say bu-bye.
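A Cloudflare custom-rule expression for those honeypots might look like this sketch (the exact query fragments are assumptions; match them to the links your own forum actually emits):

```text
(http.request.uri.query contains "action=printpage") or (http.request.uri.query contains "prev_next")
```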

I was often seeing 10-15K guests during a storm. I know for other sites this is trivial, but for my VPS it was just too much; the bots could just roll through ip/24 or ip/20 ranges as needed (some IPv4, some IPv6).

I was (am) using the PHPSESSION trick to detect bot activity as well; you will often see the same PHPSESSION being used across multiple IPs. As I blocked on that over time, those session ids started to not be included in the crawl links. Ah, the gift of AI.

Ultimately I ended up using Cloudflare and its free-tier reverse proxy. I did not want to; you know, you feel like you have been beat! After enabling that and a few custom security rules, I do not see more than 100 guests at a time, which is quite normal. What is nice about the service is you can add country/ASN blocks as wanted, plus various security checks, like URL sniffs, etc. These requests are then dropped at the edge, and your server never has to deal with any of that traffic.

If you do not want to fully block on a specific rule, they have two forms of challenges: the intrusive one with the annoying check box, and a passive one that checks whether cookies are enabled and JS is running. I use the latter; real users/guests are not impacted, and bots fail. They also have the appropriate allow rules for legit bots based on browser id and the correct ASN for them. One last aspect of this: once you have it enabled, you then only accept traffic from Cloudflare IPs (and your own, just in case, LOL), as those are vetted queries. Overall, for me, a massive reduction in site abuse. I still use crowdsec to further protect other areas (mail, ssh, ftp, etc.), but it has a lot less work to do.

None of this is going to get better; the AI bots have access to what is essentially unlimited resources and will adapt to scrape a site down to its bones.