Yes, you would then need parentheses within the one quoted string for the entire pattern, rather than quoting the individual substring patterns to be matched...
RewriteCond %{HTTP_USER_AGENT} "(googlebot|bingbot|Baiduspider|AhrefsBot/6.1|Ahrefs|BLEXBot|SemrushBot|claudebot|YandexBot/3.0|Bytespider)" [NC]
As for the unknown robot(s), you'd best look at the raw access log files to see what the actual User-Agent string(s) is/are, as Adam suggested.
Gilbert
On 2025-04-22 3:05 p.m., Montana Quiring wrote:
Ahh OK, thanks. I actually had the names of a bunch of bots in there, so wouldn't I need the parentheses? i.e.:
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|"AhrefsBot/6.1"|"Ahrefs"|"Baiduspider"|"BLEXBot"|"SemrushBot"|"claudebot"|"YandexBot/3.0"|Bytespider) [NC]
Regards, -Montana
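A minimal sketch of how the single quoted pattern from Gilbert's reply at the top of this thread might sit in a complete .htaccess file. The `RewriteEngine On` line and the `RewriteRule` that refuses the request with a 403 are assumptions, since the thread only shows the `RewriteCond`. It also assumes mod_rewrite is loaded and that the server's `AllowOverride` setting permits rewrite directives in .htaccess.

```apache
# Sketch only: assumes mod_rewrite is loaded and AllowOverride allows
# rewrite directives here (typically the FileInfo override).
RewriteEngine On

# One quoted PCRE pattern; the alternation "(a|b|c)" lives inside the quotes,
# so no individual bot name needs its own quotes. [NC] = case-insensitive.
RewriteCond %{HTTP_USER_AGENT} "(googlebot|bingbot|Baiduspider|AhrefsBot/6.1|Ahrefs|BLEXBot|SemrushBot|claudebot|YandexBot/3.0|Bytespider)" [NC]

# Assumed action: "^" matches every path, "-" leaves the URL unchanged,
# and [F] answers with 403 Forbidden. The RewriteCond above applies only
# to this immediately following RewriteRule.
RewriteRule ^ - [F]
```

Because the whole list is a single regex, there is no need for one `RewriteCond` per bot chained with `[OR]`.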
On Tue, Apr 22, 2025 at 2:56 PM Gilbert Detillieux <Gilbert.Detillieux@umanitoba.ca> wrote:
I think Adam is suggesting using a regex in the RewriteCond, to avoid the problematic characters in the pattern...
https://httpd.apache.org/docs/current/mod/mod_rewrite.html#rewritecond
... states that "CondPattern is usually a perl compatible regular expression, but there is additional syntax available to perform other useful tests against the Teststring:".
So, something like this might work...
RewriteCond %{HTTP_USER_AGENT} "Unknown robot identified by bot.." [NC]
BTW, I don't think you want parentheses around the string, as that's probably not supported syntax. (Parentheses within the string will have the usual PCRE syntax and semantics.)
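A hedged sketch of the regex approach as a full .htaccess fragment. Each "." matches exactly one character of any kind, so "bot.." matches the literal "bot\*" from the AWStats label without escaping either the backslash or the asterisk. Two assumptions apply: the `RewriteEngine On` / `RewriteRule ^ - [F]` lines are not from the thread, and the AWStats label may not be the literal User-Agent header the bot sends, so the raw access logs should confirm it first.

```apache
# Sketch only: assumes mod_rewrite is enabled for this directory.
RewriteEngine On

# Whole pattern in one pair of quotes, no surrounding parentheses.
# "bot.." = "bot" followed by any two characters (e.g. "\*"); the pattern
# is unanchored, so it matches anywhere in the User-Agent string.
RewriteCond %{HTTP_USER_AGENT} "Unknown robot identified by bot.." [NC]

# Assumed action: refuse every matching request with 403 Forbidden.
RewriteRule ^ - [F]
```

The trade-off is that "." is looser than an exact match: any User-Agent containing "Unknown robot identified by bot" plus two more characters would also be blocked.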
Hope this helps.
Gilbert
On 2025-04-22 2:05 p.m., Montana Quiring wrote:
> Sorry man, excuse my ignorance, but I'm not sure what you are asking.
> I got the bot name from AWStats, which I assume is just ASCII.
>
> Regards,
> -Montana
>
> On Tue, Apr 22, 2025 at 1:58 PM Adam Thompson <athompso@athompso.net> wrote:
>
> URL-encode or octal? Or if it's a regex, just use ".".
> -Adam
>
> ------------------------------------------------------------------------
> *From:* Montana Quiring <montanaq@gmail.com>
> *Sent:* Tuesday, April 22, 2025 1:47:31 PM
> *To:* Continuation of Round Table discussion <roundtable@muug.ca>
> *Subject:* [RndTbl] .htaccess file: stopping robot with escape character in name
> Hello Folks,
>
> I'm trying to stop a bot from crawling a site using the .htaccess file. The problem is that it has a backslash character in its name. Grrr...
> It's called: Unknown robot identified by bot\*
> This generates an internal server error:
> RewriteCond %{HTTP_USER_AGENT} ("Unknown robot identified by bot\*") [NC]
> I tried this, but it didn't help:
> RewriteCond %{HTTP_USER_AGENT} ("Unknown robot identified by bot\\*") [NC]
>
> Any thoughts?
>
> Regards,
> -Montana
-- 
Gilbert E. Detillieux        E-mail: <gedetil@muug.ca>
Manitoba UNIX User Group     Web: http://muug.ca/
_______________________________________________
Roundtable mailing list -- roundtable@muug.ca
To unsubscribe send an email to roundtable-leave@muug.ca