Thanks, ya, 
I'll pore over it tonight. :)

Regards,
-Montana


On Tue, Apr 22, 2025 at 3:30 PM Gilbert Detillieux <Gilbert.Detillieux@umanitoba.ca> wrote:
Yes, you would then need parentheses within the one quoted string
for the entire pattern, rather than quoting the individual substring
patterns to be matched...

RewriteCond %{HTTP_USER_AGENT} "(googlebot|bingbot|Baiduspider|AhrefsBot/6.1|Ahrefs|Baiduspider|BLEXBot|SemrushBot|claudebot|YandexBot/3.0|Bytespider)" [NC]
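
For context, here is a rough sketch of how that condition might sit in a complete .htaccess block. The RewriteEngine and RewriteRule lines are my assumptions (they are not from the earlier messages), as is the goal of returning 403 Forbidden to matching agents. It also assumes mod_rewrite is enabled and that overrides are allowed for the directory. The dots are escaped here so that "6.1" and "3.0" match literally; unescaped, "." matches any single character.

```apache
# Sketch only: RewriteEngine and RewriteRule are assumed, not taken from the thread.
RewriteEngine On

# One quoted regex holds the whole alternation; [NC] makes it case-insensitive.
# "\." matches a literal dot rather than any character.
RewriteCond %{HTTP_USER_AGENT} "(googlebot|bingbot|Baiduspider|AhrefsBot/6\.1|Ahrefs|BLEXBot|SemrushBot|claudebot|YandexBot/3\.0|Bytespider)" [NC]

# "-" means no substitution; [F] answers matching requests with 403 Forbidden.
RewriteRule ^ - [F]
```

Note that a directive and all its arguments must stay on one line in the file, so watch for mail clients wrapping it.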

As for the unknown robot(s), you'd best look at the raw access log files
to see what the actual UserAgent string(s) is/are, as Adam suggested.

Gilbert

On 2025-04-22 3:05 p.m., Montana Quiring wrote:
> Ahh ok, thanks.
> I actually had the names of a bunch of bots in there, so wouldn't I need
> the parentheses?
> ie:
> RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|"AhrefsBot/6.1"|"Ahrefs"|"Baiduspider"|"BLEXBot"|"SemrushBot"|"claudebot"|"YandexBot/3.0"|Bytespider) [NC]
>
> Regards,
> -Montana
>
>
> On Tue, Apr 22, 2025 at 2:56 PM Gilbert Detillieux
> <Gilbert.Detillieux@umanitoba.ca> wrote:
>
>     I think Adam is suggesting to use a regex in the RewriteCond, to avoid
>     the problematic characters in the pattern...
>
>     https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond
>
>     ... states that "CondPattern is usually a perl compatible regular
>     expression, but there is additional syntax available to perform other
>     useful tests against the Teststring:".
>
>     So, something like this might work...
>
>     RewriteCond %{HTTP_USER_AGENT} "Unknown robot identified by bot.." [NC]
>
>     BTW, I don't think you want parentheses around the string, as that's
>     probably not supported syntax.  (Parentheses within the string will
>     have the usual PCRE syntax and semantics.)
>
>     Hope this helps.
>
>     Gilbert
>
>     On 2025-04-22 2:05 p.m., Montana Quiring wrote:
>      > Sorry man, excuse my ignorance, but not sure what you are asking.
>      > I got the bot name from AWstats, which I assume is just ASCII.
>      >
>      > Regards,
>      > -Montana
>      >
>      >
>      > On Tue, Apr 22, 2025 at 1:58 PM Adam Thompson <athompso@athompso.net> wrote:
>      >
>      >     Urlencode or octal?  Or if it's a regex just use ".".
>      >     -Adam
>      >
>      >     ------------------------------------------------------------------------
>      >     *From:* Montana Quiring <montanaq@gmail.com>
>      >     *Sent:* Tuesday, April 22, 2025 1:47:31 PM
>      >     *To:* Continuation of Round Table discussion <roundtable@muug.ca>
>      >     *Subject:* [RndTbl] .htaccess file: stopping robot with escape character in name
>      >     Hello Folks,
>      >
>      >     I'm trying to stop a bot from crawling a site using the .htaccess
>      >     file. The problem is that it has a backslash character in its
>      >     name. Grrr...
>      >     It's called: Unknown robot identified by bot\*
>      >     This generates an internal server error:
>      >     RewriteCond %{HTTP_USER_AGENT} ("Unknown robot identified by bot\*") [NC]
>      >     I tried this, but it didn't help:
>      >     RewriteCond %{HTTP_USER_AGENT} ("Unknown robot identified by bot\\*") [NC]
>      >
>      >     Any thoughts?
>      >
>      >     Regards,
>      >     -Montana

--
Gilbert E. Detillieux    E-mail: <gedetil@muug.ca>
Manitoba UNIX User Group   Web:    http://muug.ca/