Yes, you would then need parentheses within the one quoted string
for the entire pattern, rather than quoting the individual substring
patterns to be matched...
RewriteCond %{HTTP_USER_AGENT} "(googlebot|bingbot|Baiduspider|AhrefsBot/6.1|Ahrefs|Baiduspider|BLEXBot|SemrushBot|claudebot|YandexBot/3.0|Bytespider)" [NC]
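
If you want to sanity-check an alternation like the one above before
putting it in .htaccess, here is a rough sketch in Python. It assumes
Python's re module treats this simple pattern the same way PCRE does
(which should hold for plain alternation and literal text), with
re.IGNORECASE standing in for [NC]. The helper name is_blocked and the
sample User-Agent strings are made up for illustration.

```python
import re

# The alternation from the RewriteCond, copied as-is.
# re.IGNORECASE plays the role of the [NC] flag.
BOT_PATTERN = re.compile(
    r"(googlebot|bingbot|Baiduspider|AhrefsBot/6.1|Ahrefs|Baiduspider"
    r"|BLEXBot|SemrushBot|claudebot|YandexBot/3.0|Bytespider)",
    re.IGNORECASE,
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains any of the listed bot names."""
    # A RewriteCond pattern is unanchored, so search() (not match())
    # is the closest analogue.
    return BOT_PATTERN.search(user_agent) is not None


# Hypothetical sample strings, not taken from any real log.
samples = [
    "Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]
for ua in samples:
    print(is_blocked(ua), ua)
```

Note that the unescaped "." in "6.1" and "3.0" matches any character,
which is harmless here but slightly looser than a literal dot.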
As for the unknown robot(s), you'd best look at the raw access log files
to see what the actual UserAgent string(s) is/are, as Adam suggested.
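
For pulling the UserAgent strings out of a raw access log, something
along these lines might do as a starting point. It is only a sketch: it
assumes the log uses Apache's "combined" format, where the User-Agent is
the last double-quoted field on each line, and the function name and
sample log lines are invented for the example.

```python
import re
from collections import Counter

# Last double-quoted field on the line; in the "combined" log format
# that field is the User-Agent. Backslash-escaped characters inside
# the quotes are allowed.
UA_FIELD = re.compile(r'"((?:[^"\\]|\\.)*)"\s*$')


def count_user_agents(lines):
    """Tally User-Agent strings from an iterable of access-log lines."""
    counts = Counter()
    for line in lines:
        m = UA_FIELD.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Made-up log lines (documentation IP range), for illustration only.
sample = [
    '203.0.113.5 - - [22/Apr/2025:13:40:01 -0500] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"\n',
    '203.0.113.5 - - [22/Apr/2025:13:40:02 -0500] "GET /a HTTP/1.1" 200 128 "-" "ExampleBot/1.0"\n',
    '203.0.113.9 - - [22/Apr/2025:13:41:10 -0500] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 Firefox/125.0"\n',
]
for ua, n in count_user_agents(sample).most_common():
    print(n, ua)
```

To run it on a real log, pass an open file instead of the sample list,
e.g. count_user_agents(open(path)) with path pointing at your access
log.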
Gilbert
On 2025-04-22 3:05 p.m., Montana Quiring wrote:
> Ahh ok, thanks.
> I actually had the names of a bunch of bots in there, so wouldn't I need
> the parentheses?
> ie:
> RewriteCond %{HTTP_USER_AGENT}
> (googlebot|bingbot|Baiduspider|"AhrefsBot/6.1"|"Ahrefs"|"Baiduspider"|"BLEXBot"|"SemrushBot"|"claudebot"|"YandexBot/3.0"|Bytespider) [NC]
>
> Regards,
> -Montana
>
>
> On Tue, Apr 22, 2025 at 2:56 PM Gilbert Detillieux
> <Gilbert.Detillieux@umanitoba.ca> wrote:
>
> I think Adam is suggesting to use a regex in the RewriteCond, to avoid
> the problematic characters in the pattern...
>
> https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond
>
> ... states that "CondPattern is usually a perl compatible regular
> expression, but there is additional syntax available to perform other
> useful tests against the Teststring:".
>
> So, something like this might work...
>
> RewriteCond %{HTTP_USER_AGENT} "Unknown robot identified by bot.." [NC]
>
> BTW, I don't think you want parentheses around the string, as that's
> probably not supported syntax. (Parentheses within the string will have
> the usual PCRE syntax and semantics.)
>
> Hope this helps.
>
> Gilbert
>
> On 2025-04-22 2:05 p.m., Montana Quiring wrote:
> > Sorry man, excuse my ignorance, but not sure what you are asking.
> > I got the bot name from AWstats, which I assume is just ASCII.
> >
> > Regards,
> > -Montana
> >
> >
> > On Tue, Apr 22, 2025 at 1:58 PM Adam Thompson
> > <athompso@athompso.net> wrote:
> >
> > Urlencode or octal? Or if it's a regex just use ".".
> > -Adam
> >
> > ------------------------------------------------------------------------
> > *From:* Montana Quiring <montanaq@gmail.com>
> > *Sent:* Tuesday, April 22, 2025 1:47:31 PM
> > *To:* Continuation of Round Table discussion <roundtable@muug.ca>
> > *Subject:* [RndTbl] .htaccess file: stopping robot with escape
> > character in name
> > Hello Folks,
> >
> > I'm trying to stop a bot from crawling a site using the .htaccess
> > file. The problem is that it's using the backslash character in its
> > name. Grrr...
> > It's called: Unknown robot identified by bot\*
> > This generates an internal server error:
> > RewriteCond %{HTTP_USER_AGENT} ("Unknown robot identified by bot\*") [NC]
> > I tried this, but it didn't help:
> > RewriteCond %{HTTP_USER_AGENT} ("Unknown robot identified by bot\\*") [NC]
> >
> > Any thoughts?
> >
> > Regards,
> > -Montana
--
Gilbert E. Detillieux E-mail: <gedetil@muug.ca>
Manitoba UNIX User Group Web: http://muug.ca/